Comments on Mechanical Sympathy: Smart Batching

Martin Thompson (2021-05-18 14:45):
This is a good technique for increasing throughput to a DB, especially when doing inserts.

Sixro (2021-05-17 14:21):
Amazing article.
I have a question: in this article you are talking about IO. I was thinking about another situation: for example, imagine you have a REST API that needs to store some data in a database (very simple). Usually you'll find the REST service working synchronously, storing row by row. I ran some tests, and collecting all requests asynchronously in a queue and doing a single batch update on the database improves performance dramatically (let's say 10x on my dumb macOS machine).

What do you think about using this in the scenario I mentioned? I find it very difficult to sell this idea, because all devs seem in love with the "sync approach".

Anonymous (2014-12-22 14:53):
I apologize if this went too far in the direction of support for the Disruptor. I was trying to best understand the smart batching implementation you highlighted and got carried away.

I'm in the process of reading Agrona; I'll get to Aeron eventually.

Again, thanks for your time, it's really appreciated.

Martin Thompson (2014-12-22 07:05):
It is possible to take control of the cursor and sequence advancement with the Disruptor. I do not contribute to, or support, this project any longer. To get advice I'd recommend you ask in the Disruptor Google Group.

https://groups.google.com/forum/#!forum/lmax-disruptor

These days I use more flexible and higher-performance data structures for such requirements. Examples can be found in the Aeron messaging system.

https://github.com/real-logic/Aeron

Anonymous (2014-12-21 21:31):
I know it's not a complete system, and that's the reason I'm asking these questions. Looking at articles and wikis on it, there aren't many recommendations on how best to interact between the Disruptor and other systems.

The reason I was pointing out asynchronous I/O and errors is that I'm not sure how to keep this reserve in the first place. A sequence advances immediately upon return from the event handler. If I need to retry, it seems the best way so far to keep this reserve is to make sure the event handler does not return before it succeeds or has exhausted the number of retries.

The other thing I can do with these sequences is to chain event handler B on the result of A, etc. Again, it's still not clear to me how a number of retries could be modelled like this.

Looking around the code, I see there is a polling consumer that manipulates the sequences directly and calls another event handler itself. If nothing else, it does seem to indicate it's possible to do creative things here.

Are you saying this cursor, the sequence, can be stopped from advancing under certain conditions?

Martin Thompson (2014-12-21 16:40):
The Disruptor is just a data structure, not a complete system :-)

Let's consider what can go wrong.

If the IO fails it is usually something very bad that cannot be retried, e.g. hardware failure, disk full, etc. So you had better have a resilient system and fail over to a secondary, or be running multi-master. If your storage is remote NFS then you have asked for a world of hurt.

If something can be retried, such as when a receiving network node is restarting, then things need to be replayed. With the Disruptor this can be achieved with a gating Sequence that keeps a reserve. However, what if you go beyond the reserve? For this you need a persistent log to replay from. Even better if you are replaying messages containing CRDTs, so ordering does not matter. If sequence matters then the receiver needs to be able to cope with idempotent delivery.

In a distributed world you need a higher-level protocol to report on the success or failure of operations. The network, with sync or async sends, cannot alone provide that. What if a failure happens before you receive the reply from a sync call? The world is really async in hardware and the lower layers.

Anonymous (2014-12-21 14:47):
For the replay to work, reporting on the success or failure of operations is needed. When it comes to the Disruptor, it's still not clear to me how you report this. Would a second Disruptor be used to signal what succeeded or failed back to the higher-level components?

Equally unclear is what happens if sequences of data have to be honoured, sub-streams within the stream. In such a scenario I worry that, after an occasional failure, I would send later data first.

Martin Thompson (2014-12-21 14:34):
This technique is very useful when dealing with synchronous IO and "hiding" it.

To deal with asynchronous IO and handle failure, it is better to have higher-level application replay. When it comes to distributed systems, I just expect all remote calls to fail occasionally, and being able to cope with that is a first-class design principle.

In either case it allows for very fast non-blocking interactions and the ability to use the IO efficiently by batching.

Anonymous (2014-12-21 14:07):
I'm going through every entry on this blog, and this strategy seems to conflict with the recommendation to use asynchronous I/O in other places. Wouldn't we be forced to stick to synchronous I/O here? Otherwise, if there is an error in the network batcher, won't we already have consumed the next events from the Disruptor? I'm also thinking of the recommendation to use asynchronous database drivers here too.

Sticking to synchronous I/O would allow retrying without too much effort.

Anonymous (2014-05-26 14:05):
For the thread and replies on the Akka user group, see this thread and the links therein: https://groups.google.com/d/msg/akka-user/P0PMxj5zhwM/vv51gncjwTAJ

Anonymous (2014-04-29 15:41):
Hi Martin,

I have a use case for something like this in an Akka system. Are you aware of anyone else who's tried this (in Akka) before?

(I've posted on the Akka user group too!)

Martin Thompson (2014-01-19 18:38):
If your single buffer is well designed and lock-free, contention tends not to be an issue. It also lends itself better to bulk sequential reads, whereas multiple buffers would require scatter/gather operations.

I also find a single buffer allows for a more predictable memory footprint.

The key to all this is having a good lock-free buffer implementation.

Anonymous (2014-01-12 22:44):
Wouldn't it be better for the network writer thread to round-robin over one ring buffer per producer instead of having them contend on a single ring buffer?

Martin Thompson (2013-04-12 07:53):
Not sure I'm answering the right question here, but I'll have a go at what I think you are asking. LBQ has only one wait strategy, based on locks, which causes contention and huge latencies when the OS is used to signal on the condition variables. Rolling your own wait strategy is more development work but can scale significantly better and in some cases achieve lower latencies. Wait strategies are often a conflated concern I like to factor out in a design.

Jonathan Ellis (2013-04-12 02:10):
Why would rolling your own wait strategy with CLQ do better than LinkedBlockingQueue, which designs it in?

Martin Thompson (2011-10-26 09:55):
Xin,

I also meant to point out that even if your messages are larger than the block/MTU size, you may still want to use this technique to avoid contention and allow the worker thread to continue processing without waiting on the IO device.

Martin Thompson (2011-10-24 06:11):
Xin,

The advantages from batching come when the messages are smaller than the MTU or block size. If greater than the MTU or block size, it is best to let the library just write as normal.

Martin Thompson (2011-10-24 06:08):
Taras,

LMAX currently has no plans to open source its collections library.

Xin Wang (2011-10-22 03:28):
I have one question that may be related to batching. Assume we want to write 1M bytes to a TCP socket:
Method 1: split it into small packets with a size close to the MTU and send them one by one.
Method 2: try to send the whole buffer and let the API figure out how much can be sent.
Which method is better? How about writing 1M bytes to disk?

Taras Tielkes (2011-10-20 00:40):
(Unrelated to this blog post.) Is there any news on the LMAX collections library that was mentioned in the InfoQ presentation?