tag:blogger.com,1999:blog-5560209661389175529.post5089162884965261703..comments2021-11-26T19:34:10.855+00:00Comments on Mechanical Sympathy: Single Writer PrincipleMartin Thompsonhttp://www.blogger.com/profile/15893849163924476586noreply@blogger.comBlogger61125tag:blogger.com,1999:blog-5560209661389175529.post-74721475189747847252019-03-08T00:47:13.721+00:002019-03-08T00:47:13.721+00:00Thank you Martin. For sharing this knowledge. This...Thank you Martin. For sharing this knowledge. This is one of the best blogs for understanding micro architecture for applications.amithttps://www.blogger.com/profile/04420874727730243383noreply@blogger.comtag:blogger.com,1999:blog-5560209661389175529.post-31900953002680473352019-02-19T20:53:37.082+00:002019-02-19T20:53:37.082+00:00in short you decided to use single thread to fetch...in short you decided to use single thread to fetch results from network which is scalable up to a point anyway, decided to consume completions from multile threads which are doing independent jobs so consuming queue seperately, which is basically single producer/single consumer which doesnt require synchronization if you use linked list or bounded ring buffer and doesnt block if qeueu empty thus 0 synchronization, but polling.Anonymoushttps://www.blogger.com/profile/02576056128790921860noreply@blogger.comtag:blogger.com,1999:blog-5560209661389175529.post-9865903771599313412019-02-19T20:45:14.853+00:002019-02-19T20:45:14.853+00:00i thought you delegate all write operations into s...i thought you delegate all write operations into single thread without interesting in immediate changes in state in process. and you communicate that thread without locks.<br />after investigating disruptor pattern i see there comes data from outside, one thread is responsible for writing it into fixed sized ring buffer, and other threads doesnt consume anything, just increment their own local counters. this is very specific for you particular needs and i cant get how you invented something it was complex before.Anonymoushttps://www.blogger.com/profile/02576056128790921860noreply@blogger.comtag:blogger.com,1999:blog-5560209661389175529.post-66458748737562434622019-02-14T21:10:58.045+00:002019-02-14T21:10:58.045+00:00On your first question this can be done without lo...On your first question this can be done without locks using a number of techniques such a model changes via a single reference or guard with before and after counters. For the second, if a queue support either multiple producers or multiple consumers then it is not single writer and then you bring in the complexity.Martin Thompsonhttps://www.blogger.com/profile/15893849163924476586noreply@blogger.comtag:blogger.com,1999:blog-5560209661389175529.post-75142681918798951392019-02-13T20:35:49.808+00:002019-02-13T20:35:49.808+00:00first question is about logic, ok single thread wr...first question is about logic, ok single thread writes data and others can read liberally without locks, but what if this write operation is not atomic and/or incures multiple locations? wont other threads see inconsistent state? what if they always require latest state, wont they wait until current writes flushed by writer? writes initiated from a thread wont be visible to others until write operation completed.<br />second about implementation: im sure you didnt replace queue implementation without using CAS/Spinning based approahes. which basically cant make your implementation equal/better than having seperate Spinlocks/MCS Spinlocks around tail and head of queue (in case of single producer or single-consumer there will be lock only on one end).Anonymoushttps://www.blogger.com/profile/02576056128790921860noreply@blogger.comtag:blogger.com,1999:blog-5560209661389175529.post-88457954108256195782017-10-15T02:35:16.783+01:002017-10-15T02:35:16.783+01:00It has been some time since last talk about this, ...It has been some time since last talk about this, but here are some perhaps more mature and better looking results<br /><br />https://docs.google.com/document/d/1X7RobP2tn7OyGZ8VrXzxa5R5fhT38QnO2mx1CrKWXrE/edit?usp=sharing<br /><br />I believe this aligns with the expectation of multiple consumers contention and the single writer (producer) multiplexing benefit.<br /><br />Difference between JDK (BlockingQueue) and JCTools is more than obvious, even though I do not have a powerful enough machine to measure accurately.<br /><br />Benchmarking results from JMH coming soon, although it is not expected to change the difference ratio noticeably.<br /><br />Disruptor version coming soon as well.<br /><br />I intend to publish the project under GitHub soon.<br /><br />I hope this gives some more food for thought :-)<br />Elias Balasishttps://www.blogger.com/profile/17309749755524092300noreply@blogger.comtag:blogger.com,1999:blog-5560209661389175529.post-69555367065828089892017-10-15T02:30:24.894+01:002017-10-15T02:30:24.894+01:00It has been some time since last talk about this, ...It has been some time since last talk about this, but here are some perhaps more mature and better looking results<br /><br />https://docs.google.com/document/d/1X7RobP2tn7OyGZ8VrXzxa5R5fhT38QnO2mx1CrKWXrE/edit?usp=sharing<br /><br />I believe this aligns with the expectation of multiple consumers contention and the single writer (producer) multiplexing benefit.<br /><br />Difference between JDK (BlockingQueue) and JCTools is more than obvious, even though I do not have a powerful enough machine to measure accurately.<br /><br />Benchmarking results from JMH coming soon, although it is not expected to change the difference ratio noticeably.<br /><br />Disruptor version coming soon as well.<br /><br />I intend to publish the project under GitHub soon.<br /><br />I hope this give some food for thought<br />Elias Balasishttps://www.blogger.com/profile/17309749755524092300noreply@blogger.comtag:blogger.com,1999:blog-5560209661389175529.post-6312937562208791322017-05-10T21:46:31.927+01:002017-05-10T21:46:31.927+01:00This comment has been removed by the author.Elias Balasishttps://www.blogger.com/profile/17309749755524092300noreply@blogger.comtag:blogger.com,1999:blog-5560209661389175529.post-31698404796140912862017-05-09T22:47:25.643+01:002017-05-09T22:47:25.643+01:00This comment has been removed by the author.Elias Balasishttps://www.blogger.com/profile/17309749755524092300noreply@blogger.comtag:blogger.com,1999:blog-5560209661389175529.post-17978809339929754052017-05-09T18:20:38.033+01:002017-05-09T18:20:38.033+01:00Unfortunately, I don't have such a machine (8 ...Unfortunately, I don't have such a machine (8 cores only).<br />I am indeed measuring the scheduler significantly I am afraid.<br /><br />Even so, the measurement displays the dramatic difference between queue implementations which has been one of my original goals.<br /><br />The added value here is the measurement of the "multiplexer" (SingleWriter impl)<br /><br />Still, without a good machine I cannot prove much I am afraid.<br /><br />So, I intend to isolate and share my code.<br /><br />I will get back soon.<br /><br />Thanks again for helping, this is good stuff :-)<br />Elias Balasishttps://www.blogger.com/profile/17309749755524092300noreply@blogger.comtag:blogger.com,1999:blog-5560209661389175529.post-42819075848399057802017-05-09T17:52:48.189+01:002017-05-09T17:52:48.189+01:00Did you test on a machine capable of truly running...Did you test on a machine capable of truly running 40+ threads concurrently? If the number of available cores is significantly less then you are measuring the scheduler for a significant part of the test and not the concurrent interactions on the FIFO data structures.Martin Thompsonhttps://www.blogger.com/profile/15893849163924476586noreply@blogger.comtag:blogger.com,1999:blog-5560209661389175529.post-8638179457666152672017-05-09T17:03:03.547+01:002017-05-09T17:03:03.547+01:00Here is a JMH benchmark using
JDK (BlockingQueues)...Here is a JMH benchmark using<br />JDK (BlockingQueues)<br />and<br />JCTools (Bounded Arrays with Yielding IdleStrategy)<br /><br />https://drive.google.com/open?id=0B3tIYUvecvkAZlRDb21NYTBZbEE<br /><br />It looks to me that, as expected, as the number of producers grows JCTools gets the advantage over JDK's BlockingQueue<br /><br />"Disruptor" could be another impl to measure<br /><br />Any thoughts?<br /><br />I am trying to pull out the benchmark code and share it somewhere for convenience<br />Elias Balasishttps://www.blogger.com/profile/17309749755524092300noreply@blogger.comtag:blogger.com,1999:blog-5560209661389175529.post-34147791596798052342017-05-04T08:22:53.923+01:002017-05-04T08:22:53.923+01:00Best to ask at the JCTools.Best to ask at the JCTools.Martin Thompsonhttps://www.blogger.com/profile/15893849163924476586noreply@blogger.comtag:blogger.com,1999:blog-5560209661389175529.post-32462962122894497452017-05-03T21:11:12.202+01:002017-05-03T21:11:12.202+01:00Any direction for "wait strategy" with J...Any direction for "wait strategy" with JCTools?Elias Balasishttps://www.blogger.com/profile/17309749755524092300noreply@blogger.comtag:blogger.com,1999:blog-5560209661389175529.post-14359568926631505472017-05-03T19:58:23.706+01:002017-05-03T19:58:23.706+01:00Yes the Disruptor is another example.Yes the Disruptor is another example.Martin Thompsonhttps://www.blogger.com/profile/15893849163924476586noreply@blogger.comtag:blogger.com,1999:blog-5560209661389175529.post-66532416116215962472017-05-03T19:55:21.619+01:002017-05-03T19:55:21.619+01:00Can "Disruptors" be used as well?
for co...Can "Disruptors" be used as well?<br />for comparison to "JCTools" queues and JDK blocking queues<br />Elias Balasishttps://www.blogger.com/profile/17309749755524092300noreply@blogger.comtag:blogger.com,1999:blog-5560209661389175529.post-28657730155507401092017-04-29T21:08:27.081+01:002017-04-29T21:08:27.081+01:00Try the queues from JCTools or Agrona that are sin...Try the queues from JCTools or Agrona that are single producer with the ability to do batch draining.<br /><br />https://github.com/real-logic/Agrona<br />https://github.com/JCTools/JCTools<br /><br />Running your benchmarks with JMH and publishing the tests for review can help with getting support from others.Martin Thompsonhttps://www.blogger.com/profile/15893849163924476586noreply@blogger.comtag:blogger.com,1999:blog-5560209661389175529.post-2773907761755516022017-04-29T20:32:42.993+01:002017-04-29T20:32:42.993+01:00Thanks Martin,
Trying to get to the bottom of thi...Thanks Martin,<br /><br />Trying to get to the bottom of this and share to all,<br /><br />Assuming that multiplexing means collecting all input from multiple producers into a collection and forward this collection as a single producer/writer to a final collection for consumption by final consumers<br /><br />here are some early calculations<br />(using JDK blocking queues, disruptor variant coming soon)<br /><br />|=============|===========|===========|==============|==================|================|<br />| EVENTS | CONSUMERS | PRODUCERS | MILLIS | TYPE | IMPLEMENTATION |<br />|=============|===========|===========|==============|==================|================|<br />| 10000000 | 1 | 10 | 2087 | MULTIPLE_WRITERS | JDK |<br />| 10000000 | 1 | 1 | 2305 | MULTIPLE_WRITERS | JDK |<br />| 10000000 | 10 | 100 | 2770 | MULTIPLE_WRITERS | JDK |<br />| 10000000 | 10 | 1 | 2922 | MULTIPLE_WRITERS | JDK |<br />| 10000000 | 100 | 10 | 2946 | MULTIPLE_WRITERS | JDK |<br />| 10000000 | 10 | 10 | 3002 | MULTIPLE_WRITERS | JDK |<br />| 10000000 | 100 | 100 | 3061 | MULTIPLE_WRITERS | JDK |<br />| 10000000 | 100 | 10 | 4305 | SINGLE_WRITER | JDK |<br />| 10000000 | 100 | 100 | 4535 | SINGLE_WRITER | JDK |<br />| 10000000 | 10 | 100 | 4651 | SINGLE_WRITER | JDK |<br />| 10000000 | 10 | 10 | 4737 | SINGLE_WRITER | JDK |<br />| 10000000 | 1 | 10 | 5238 | SINGLE_WRITER | JDK |<br />| 10000000 | 10 | 1 | 5807 | SINGLE_WRITER | JDK |<br />| 10000000 | 1 | 1 | 6040 | SINGLE_WRITER | JDK |<br />|=============|===========|===========|==============|==================|================|<br />(sorry for badly aligned output, no support for fixed size fonts)<br /><br />So far, it looks like multiple writers perform better.<br />Elias Balasishttps://www.blogger.com/profile/17309749755524092300noreply@blogger.comtag:blogger.com,1999:blog-5560209661389175529.post-34092016423588933352017-04-27T14:35:39.106+01:002017-04-27T14:35:39.106+01:00Yes :-)Yes :-)Martin Thompsonhttps://www.blogger.com/profile/15893849163924476586noreply@blogger.comtag:blogger.com,1999:blog-5560209661389175529.post-55031213096301898022017-04-27T14:22:32.575+01:002017-04-27T14:22:32.575+01:00Thanks Martin,
It is trial and error finally, as ...Thanks Martin,<br /><br />It is trial and error finally, as usual :-)<br /><br />I intend to measure this anyway.<br /><br />However it sounds like, the larger the contention (number of producers) the more the chances for a multiplexer to be performing better.<br /><br />Is this a valid statement?<br />Elias Balasishttps://www.blogger.com/profile/17309749755524092300noreply@blogger.comtag:blogger.com,1999:blog-5560209661389175529.post-67473875967543559832017-04-27T13:33:44.851+01:002017-04-27T13:33:44.851+01:00Best thing to do it measure for your usecase. I...Best thing to do it measure for your usecase. I've found a multiplexer can often be better performance and easier to reason about. It comes down to levels of contention and number of producers. Other times it is not.Martin Thompsonhttps://www.blogger.com/profile/15893849163924476586noreply@blogger.comtag:blogger.com,1999:blog-5560209661389175529.post-76095300356296222272017-04-27T11:41:29.710+01:002017-04-27T11:41:29.710+01:00This is good stuff guys, well done and well said,
...This is good stuff guys, well done and well said,<br /><br />However, one question crossed my mind while trying to apply the "multiplexer thread" technique above.<br /><br />Is the multiplexing cost equivalent to the loss of single writer advantage?<br />If yes, then perhaps a multiple writer scenario would work equally fast.<br /><br />Any thoughts?<br />Elias Balasishttps://www.blogger.com/profile/17309749755524092300noreply@blogger.comtag:blogger.com,1999:blog-5560209661389175529.post-73081825812850713852017-01-07T15:33:52.611+00:002017-01-07T15:33:52.611+00:00There are few mechanisms for safe publication that...There are few mechanisms for safe publication that you can use that vary in their cost and power. Using "volatile" is definitely an option in this case. If a value is written only once you can use "final" modifier that also guarantees safe publication. If you have multiple writers you can use AtomicInteger (https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/atomic/AtomicInteger.html) that gives volatility and atomicity.Ivan Mushketykhttps://www.blogger.com/profile/02170717132764703938noreply@blogger.comtag:blogger.com,1999:blog-5560209661389175529.post-52737823240932042002017-01-07T01:28:05.678+00:002017-01-07T01:28:05.678+00:00Is there a way to prevent the JVM optimization wit...Is there a way to prevent the JVM optimization without the cost of the memory barriers or volatiles or locks etc? For example, C++ has volatile as well but it simply means to not use compiler optimizations.pokerbot101https://www.blogger.com/profile/10527311039801522583noreply@blogger.comtag:blogger.com,1999:blog-5560209661389175529.post-12613789909416555912017-01-06T11:47:26.100+00:002017-01-06T11:47:26.100+00:00Yes you need to have a synchronization or any othe...Yes you need to have a synchronization or any other mechanism that ensures safe publication: https://shipilev.net/blog/2014/safe-public-construction/#_safe_publication<br /><br />The problem is that if one thread writes data to a variable Java Memory Model does not guarantee that other threads will see the written value unless you use safe publication mechanism. If you don't use synchronization, JVM may assume that a variable is only used by a single thread and may "optimize" write operations in a way that no other threads will ever see any writes.Ivan Mushketykhttps://www.blogger.com/profile/02170717132764703938noreply@blogger.com