<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-5560209661389175529</id><updated>2012-02-25T08:11:18.456Z</updated><category term='C++'/><category term='Modelling'/><category term='Disruptor'/><category term='Performance'/><category term='Memory Barriers'/><category term='Low-Latency'/><category term='IO'/><category term='Refurbishment'/><category term='Networking'/><category term='NIO'/><category term='Locks'/><category term='Atomic'/><category term='Memory'/><category term='Processor Affinity'/><category term='Batching'/><category term='Design'/><category term='Threads'/><category term='DDD'/><category term='Event Sourcing'/><category term='Java'/><category term='Concurrent'/><category term='Refactoring'/><category term='False Sharing'/><category term='Testing'/><title type='text'>Mechanical Sympathy</title><subtitle type='html'>Hardware and software working together in harmony</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://mechanical-sympathy.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://mechanical-sympathy.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Martin Thompson</name><uri>http://www.blogger.com/profile/15893849163924476586</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>19</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-5560209661389175529.post-4208696484085669390</id><published>2011-12-26T19:47:00.001Z</published><updated>2012-01-19T00:25:01.439Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='IO'/><category scheme='http://www.blogger.com/atom/ns#' term='Performance'/><category scheme='http://www.blogger.com/atom/ns#' term='NIO'/><title type='text'>Java Sequential IO Performance</title><content type='html'>&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;Many applications record a series of events to file-based storage for later use.&amp;nbsp; This can be anything from logging and auditing, through to keeping a transaction redo log in an &lt;a href="http://martinfowler.com/eaaDev/EventSourcing.html"&gt;event sourced&lt;/a&gt; design or its close relative &lt;a href="http://martinfowler.com/bliki/CQRS.html"&gt;CQRS&lt;/a&gt;.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;Java has a number of means by which a file can be sequentially written to, or read back again.&amp;nbsp; This article explores some of these mechanisms to understand their performance characteristics.&amp;nbsp; For the scope of this article I will be using pre-allocated files because I want to focus on performance.&amp;nbsp; Constantly extending a file imposes a significant performance overhead and adds jitter to an application resulting in highly variable latency.&amp;nbsp; "Why is a pre-allocated file better performance?", I hear you ask.&amp;nbsp; Well, on disk a file is made up from a series of blocks/pages containing the data.&amp;nbsp; Firstly, it is important that these blocks are contiguous to provide fast sequential access.&amp;nbsp;&amp;nbsp; Secondly, meta-data must be allocated to describe this file on disk and saved within the file-system.&amp;nbsp; A typical large file will have a number of "indirect" blocks allocated to describe the chain of data-blocks containing the file contents that make up part of this meta-data.&amp;nbsp;&amp;nbsp; I'll leave it as an exercise for the reader, or maybe a later article, to explore the performance impact of not preallocating the data files.&amp;nbsp; If you have used a database you may have noticed that it preallocates the files it will require.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;The Test&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I want to experiment with 2 file sizes.&amp;nbsp; One that is sufficiently large to test sequential access, but can easily fit in the file-system cache, and another that is much larger so that the cache subsystem is forced to retire pages so that new ones can be loaded.&amp;nbsp; For these two cases I'll use 400MB and 8GB respectively.&amp;nbsp; I'll also loop over the files a number of times to show the pre and post warm-up characteristics.&lt;br /&gt;&lt;br /&gt;I'll test 4 means of writing and reading back files sequentially:&lt;br /&gt;&lt;ol style="text-align: left;"&gt;&lt;li&gt;&lt;a href="http://docs.oracle.com/javase/6/docs/api/java/io/RandomAccessFile.html"&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;RandomAccessFile&lt;/span&gt;&lt;/a&gt; using a vanilla &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;byte[]&lt;/span&gt; of page size.&lt;/li&gt;&lt;li&gt;Buffered &lt;a href="http://docs.oracle.com/javase/6/docs/api/java/io/FileInputStream.html"&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;FileInputStream&lt;/span&gt;&lt;/a&gt; and &lt;a href="http://docs.oracle.com/javase/6/docs/api/java/io/FileOutputStream.html"&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;FileOutputStream&lt;/span&gt;&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;NIO &lt;a href="http://docs.oracle.com/javase/6/docs/api/java/nio/channels/FileChannel.html"&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;FileChannel&lt;/span&gt;&lt;/a&gt; with &lt;a href="http://docs.oracle.com/javase/6/docs/api/java/nio/ByteBuffer.html"&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;ByteBuffer&lt;/span&gt;&lt;/a&gt; of page size.&lt;/li&gt;&lt;li&gt;Memory mapping a file using NIO and direct &lt;a href="http://docs.oracle.com/javase/6/docs/api/java/nio/MappedByteBuffer.html"&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;MappedByteBuffer&lt;/span&gt;&lt;/a&gt;.&lt;/li&gt;&lt;/ol&gt;The tests are run on a 2.0Ghz Sandybridge CPU with 8GB RAM, an Intel 320 SSD on Fedora Core 15 64-bit Linux with an ext4 file system, and Oracle JDK 1.6.0_30.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;span style="font-size: large;"&gt;The Code &lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;import java.io.*;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;import java.nio.ByteBuffer;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;import java.nio.MappedByteBuffer;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;import java.nio.channels.FileChannel;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;import static java.lang.Integer.MAX_VALUE;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;import static java.lang.System.out;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;import static java.nio.channels.FileChannel.MapMode.READ_ONLY;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;import static java.nio.channels.FileChannel.MapMode.READ_WRITE;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;public final class TestSequentialIoPerf&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;{&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static final int PAGE_SIZE = 1024 * 4;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static final long FILE_SIZE = PAGE_SIZE * 2000L * 1000L;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static final String FILE_NAME = "test.dat";&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static final byte[] BLANK_PAGE = new byte[PAGE_SIZE];&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static void main(final String[] arg) throws Exception&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; preallocateTestFile(FILE_NAME);&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (final PerfTestCase testCase : testCases)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (int i = 0; i &amp;lt; 5; i++)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; System.gc();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; long writeDurationMs = testCase.test(PerfTestCase.Type.WRITE,&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; FILE_NAME);&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; System.gc();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; long readDurationMs = testCase.test(PerfTestCase.Type.READ,&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; FILE_NAME);&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; long bytesReadPerSec = (FILE_SIZE * 1000L) / readDurationMs;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; long bytesWrittenPerSec = (FILE_SIZE * 1000L) / writeDurationMs;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; out.format("%s\twrite=%,d\tread=%,d bytes/sec\n",&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; testCase.getName(),&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; bytesWrittenPerSec, bytesReadPerSec);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; deleteFile(FILE_NAME);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private static void preallocateTestFile(final String fileName)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; throws Exception&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; RandomAccessFile file = new RandomAccessFile(fileName, "rw");&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (long i = 0; i &amp;lt; FILE_SIZE; i += PAGE_SIZE)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; file.write(BLANK_PAGE, 0, PAGE_SIZE);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; file.close();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private static void deleteFile(final String testFileName) throws Exception&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; File file = new File(testFileName);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (!file.delete())&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; out.println("Failed to delete test file=" + testFileName);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; out.println("Windows does not allow mapped files to be deleted.");&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public abstract static class PerfTestCase&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; public enum Type { READ, WRITE }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; private final String name;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; private int checkSum;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; public PerfTestCase(final String name)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; this.name = name;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; public String getName()&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return name;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; public long test(final Type type, final String fileName)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; long start = System.currentTimeMillis();&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; try&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; switch (type)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; case WRITE:&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; checkSum = testWrite(fileName);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; break;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; case READ:&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; final int checkSum = testRead(fileName);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (checkSum != this.checkSum)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; final String msg = getName() +&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; " expected=" + this.checkSum +&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; " got=" + checkSum;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; throw new IllegalStateException(msg);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; break;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; catch (Exception ex)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ex.printStackTrace();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return System.currentTimeMillis() - start;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; public abstract int testWrite(final String fileName) throws Exception;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; public abstract int testRead(final String fileName) throws Exception;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private static PerfTestCase[] testCases =&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; new PerfTestCase("RandomAccessFile")&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; public int testWrite(final String fileName) throws Exception&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; RandomAccessFile file = new RandomAccessFile(fileName, "rw");&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; final byte[] buffer = new byte[PAGE_SIZE];&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; int pos = 0;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; int checkSum = 0;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (long i = 0; i &amp;lt; FILE_SIZE; i++)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; byte b = (byte)i;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; checkSum += b;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; buffer[pos++] = b;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (PAGE_SIZE == pos)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; file.write(buffer, 0, PAGE_SIZE);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; pos = 0;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; file.close();&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return checkSum;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; public int testRead(final String fileName) throws Exception&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; RandomAccessFile file = new RandomAccessFile(fileName, "r");&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; final byte[] buffer = new byte[PAGE_SIZE];&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; int checkSum = 0;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; int bytesRead;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (-1 != (bytesRead = file.read(buffer)))&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (int i = 0; i &amp;lt; bytesRead; i++)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; checkSum += buffer[i];&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; file.close();&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return checkSum;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; },&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; new PerfTestCase("BufferedStreamFile")&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; public int testWrite(final String fileName) throws Exception&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; int checkSum = 0;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; OutputStream out = &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; new BufferedOutputStream(new FileOutputStream(fileName));&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (long i = 0; i &amp;lt; FILE_SIZE; i++)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; byte b = (byte)i;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; checkSum += b;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; out.write(b);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; out.close();&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return checkSum;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; public int testRead(final String fileName) throws Exception&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; int checkSum = 0;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; InputStream in = &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; new BufferedInputStream(new FileInputStream(fileName));&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; int b;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (-1 != (b = in.read()))&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; checkSum += (byte)b;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; in.close();&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return checkSum;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; },&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; new PerfTestCase("BufferedChannelFile")&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; public int testWrite(final String fileName) throws Exception&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; FileChannel channel = &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; new RandomAccessFile(fileName, "rw").getChannel();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ByteBuffer buffer = ByteBuffer.allocate(PAGE_SIZE);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; int checkSum = 0;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (long i = 0; i &amp;lt; FILE_SIZE; i++)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; byte b = (byte)i;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; checkSum += b;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; buffer.put(b);&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (!buffer.hasRemaining())&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; buffer.flip();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; channel.write(buffer);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; buffer.clear();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; channel.close();&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return checkSum;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; public int testRead(final String fileName) throws Exception&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; FileChannel channel = &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; new RandomAccessFile(fileName, "rw").getChannel();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ByteBuffer buffer = ByteBuffer.allocate(PAGE_SIZE);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; int checkSum = 0;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (-1 != (channel.read(buffer)))&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; buffer.flip();&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (buffer.hasRemaining())&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; checkSum += buffer.get();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; buffer.clear();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return checkSum;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; },&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; new PerfTestCase("MemoryMappedFile")&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; public int testWrite(final String fileName) throws Exception&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; FileChannel channel = &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; new RandomAccessFile(fileName, "rw").getChannel();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; MappedByteBuffer buffer = &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; channel.map(READ_WRITE, 0,&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Math.min(channel.size(), MAX_VALUE));&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; int checkSum = 0;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (long i = 0; i &amp;lt; FILE_SIZE; i++)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (!buffer.hasRemaining())&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; buffer = &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; channel.map(READ_WRITE, i,&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Math.min(channel.size() - i , MAX_VALUE));&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; byte b = (byte)i;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; checkSum += b;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; buffer.put(b);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; channel.close();&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return checkSum;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; public int testRead(final String fileName) throws Exception&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; FileChannel channel = &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; new RandomAccessFile(fileName, "rw").getChannel();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; MappedByteBuffer buffer = &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; channel.map(READ_ONLY, 0,&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Math.min(channel.size(), MAX_VALUE));&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; int checkSum = 0;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (long i = 0; i &amp;lt; FILE_SIZE; i++)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (!buffer.hasRemaining())&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; buffer = &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; channel.map(READ_WRITE, i,&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Math.min(channel.size() - i , MAX_VALUE));&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; checkSum += buffer.get();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; channel.close();&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return checkSum;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; },&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; };&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Results&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;400MB file&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;===========&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;RandomAccessFile&amp;nbsp;&amp;nbsp;&amp;nbsp; write=379,610,750&amp;nbsp;&amp;nbsp; read=1,452,482,269 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;RandomAccessFile&amp;nbsp;&amp;nbsp;&amp;nbsp; write=294,041,636&amp;nbsp;&amp;nbsp; read=1,494,890,510 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;RandomAccessFile&amp;nbsp;&amp;nbsp;&amp;nbsp; write=250,980,392&amp;nbsp;&amp;nbsp; read=1,422,222,222 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;RandomAccessFile&amp;nbsp;&amp;nbsp;&amp;nbsp; write=250,366,748&amp;nbsp;&amp;nbsp; read=1,388,474,576 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;RandomAccessFile&amp;nbsp;&amp;nbsp;&amp;nbsp; write=260,394,151&amp;nbsp;&amp;nbsp; read=1,422,222,222 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;BufferedStreamFile&amp;nbsp; write=98,178,331&amp;nbsp;&amp;nbsp;&amp;nbsp; read=286,433,566 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;BufferedStreamFile&amp;nbsp; write=100,244,738&amp;nbsp;&amp;nbsp; read=288,857,545 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;BufferedStreamFile&amp;nbsp; write=82,948,562&amp;nbsp;&amp;nbsp;&amp;nbsp; read=154,100,827 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;BufferedStreamFile&amp;nbsp; write=108,503,311&amp;nbsp;&amp;nbsp; read=153,869,271 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;BufferedStreamFile&amp;nbsp; write=113,055,478&amp;nbsp;&amp;nbsp; read=152,608,047 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace; text-align: -webkit-auto;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family: 'Courier New', Courier, monospace; text-align: -webkit-auto;"&gt;BufferedChannelFile&amp;nbsp;&lt;/span&gt;&lt;span style="font-family: 'Courier New', Courier, monospace; text-align: -webkit-auto;"&gt;write=228,443,948 &amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;span style="font-family: 'Courier New', Courier, monospace; text-align: -webkit-auto;"&gt;read=356,173,913 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace; text-align: -webkit-auto;"&gt;BufferedChannelFile&amp;nbsp;write=265,629,053 &amp;nbsp;&amp;nbsp;read=374,063,926 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;div style="text-align: -webkit-auto;"&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;BufferedChannelFile&amp;nbsp;write=223,825,136 &amp;nbsp;&amp;nbsp;read=1,539,849,624 bytes/sec&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: -webkit-auto;"&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;BufferedChannelFile&amp;nbsp;write=232,992,036 &amp;nbsp;&amp;nbsp;read=1,539,849,624 bytes/sec&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: -webkit-auto;"&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;BufferedChannelFile&amp;nbsp;write=212,779,220 &amp;nbsp;&amp;nbsp;read=1,534,082,397 bytes/sec&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;MemoryMappedFile&amp;nbsp;&amp;nbsp;&amp;nbsp; write=300,955,180&amp;nbsp;&amp;nbsp; read=305,899,925 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;MemoryMappedFile&amp;nbsp;&amp;nbsp;&amp;nbsp; write=313,149,847&amp;nbsp;&amp;nbsp; read=310,538,286 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;MemoryMappedFile&amp;nbsp;&amp;nbsp;&amp;nbsp; write=326,374,501&amp;nbsp;&amp;nbsp; read=303,857,566 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;MemoryMappedFile&amp;nbsp;&amp;nbsp;&amp;nbsp; write=327,680,000&amp;nbsp;&amp;nbsp; read=304,535,315 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;MemoryMappedFile&amp;nbsp;&amp;nbsp;&amp;nbsp; write=326,895,450&amp;nbsp;&amp;nbsp; read=303,632,320 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;8GB File&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;============&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;RandomAccessFile&amp;nbsp;&amp;nbsp;&amp;nbsp; write=167,402,321&amp;nbsp;&amp;nbsp; read=251,922,012 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;RandomAccessFile&amp;nbsp;&amp;nbsp;&amp;nbsp; write=193,934,802&amp;nbsp;&amp;nbsp; read=257,052,307 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;RandomAccessFile&amp;nbsp;&amp;nbsp;&amp;nbsp; write=192,948,159&amp;nbsp;&amp;nbsp; read=248,460,768 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;RandomAccessFile&amp;nbsp;&amp;nbsp;&amp;nbsp; write=191,814,180&amp;nbsp;&amp;nbsp; read=245,225,408 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;RandomAccessFile&amp;nbsp;&amp;nbsp;&amp;nbsp; write=190,635,762&amp;nbsp;&amp;nbsp; read=275,315,073 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;BufferedStreamFile&amp;nbsp; write=154,823,102&amp;nbsp;&amp;nbsp; read=248,355,313 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;BufferedStreamFile&amp;nbsp; write=152,083,913&amp;nbsp;&amp;nbsp; read=253,418,301 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;BufferedStreamFile&amp;nbsp; write=133,099,369&amp;nbsp;&amp;nbsp; read=146,056,197 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;BufferedStreamFile&amp;nbsp; write=131,065,708&amp;nbsp;&amp;nbsp; read=146,217,827 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;BufferedStreamFile&amp;nbsp; write=132,694,052&amp;nbsp;&amp;nbsp; read=148,116,004 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace; text-align: -webkit-auto;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family: 'Courier New', Courier, monospace; text-align: -webkit-auto;"&gt;BufferedChannelFile&amp;nbsp;&lt;/span&gt;&lt;span style="font-family: 'Courier New', Courier, monospace; text-align: -webkit-auto;"&gt;write=186,703,740 &amp;nbsp; &lt;/span&gt;&lt;span style="font-family: 'Courier New', Courier, monospace; text-align: -webkit-auto;"&gt;read=215,075,218 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;BufferedChannelFile&amp;nbsp;write=190,591,410 &amp;nbsp; read=211,030,680 bytes/sec&lt;/span&gt;&lt;/div&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;BufferedChannelFile&amp;nbsp;write=187,220,038 &amp;nbsp; read=223,087,606 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;BufferedChannelFile&amp;nbsp;write=191,585,397 &amp;nbsp; read=221,297,747 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;BufferedChannelFile&amp;nbsp;write=192,653,214 &amp;nbsp; read=211,789,038 bytes/sec&lt;/span&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;MemoryMappedFile&amp;nbsp;&amp;nbsp; &amp;nbsp;write=123,023,322 &amp;nbsp; read=231,530,156 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;MemoryMappedFile&amp;nbsp;&amp;nbsp; &amp;nbsp;write=121,961,023 &amp;nbsp; read=230,403,600 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;MemoryMappedFile&amp;nbsp;&amp;nbsp; &amp;nbsp;write=123,317,778 &amp;nbsp; read=229,899,250 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;MemoryMappedFile&amp;nbsp;&amp;nbsp; &amp;nbsp;write=121,472,738 &amp;nbsp; read=231,739,745 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;MemoryMappedFile&amp;nbsp;&amp;nbsp; &amp;nbsp;write=120,362,615 &amp;nbsp; read=231,190,382 bytes/sec&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Analysis&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;For years I was a big fan of using &lt;a href="http://docs.oracle.com/javase/6/docs/api/java/io/RandomAccessFile.html"&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;RandomAccessFile&lt;/span&gt;&lt;/a&gt; directly because of the control it gives and the predictable execution.&amp;nbsp; I never found using buffered streams to be useful from a performance perspective and this still seems to be the case.&lt;br /&gt;&lt;br /&gt;In more recent testing I've found that using NIO &lt;a href="http://docs.oracle.com/javase/6/docs/api/java/nio/channels/FileChannel.html"&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;FileChannel&lt;/span&gt;&lt;/a&gt; and &lt;a href="http://docs.oracle.com/javase/6/docs/api/java/nio/ByteBuffer.html"&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;ByteBuffer&lt;/span&gt;&lt;/a&gt; are doing much better. With Java 7 the flexibility of this programming approach has been improved for random access with &lt;a href="http://openjdk.java.net/projects/nio/javadoc/java/nio/channels/SeekableByteChannel.html"&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;SeekableByteChannel&lt;/span&gt;&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;It seems that for reading RandomAccessFile and NIO do very well with Memory Mapped files winning for writes in some cases.&lt;br /&gt;&lt;br /&gt;I've seen these results vary greatly depending on platform.&amp;nbsp; File system, OS, storage devices, and available memory all have a significant impact.&amp;nbsp; In a few cases I've seen memory-mapped files perform significantly better than the others but this needs to be tested on your platform because &lt;i&gt;your mileage may vary...&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;A special note should be made for the use of memory-mapped large files when pushing for maximum throughput.&amp;nbsp; I've often found the OS can become unresponsive due the the pressure put on the virtual memory sub-system.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Conclusion&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;There is a significant difference in performance for the different means of doing sequential file IO from Java.&amp;nbsp; Not all methods are even remotely equal.&amp;nbsp; For most IO I've found the use of ByteBuffers and Channels to be the best optimised parts of the IO libraries.&amp;nbsp; If buffered streams are your IO libraries of choice, then it is worth branching out and and getting familiar with the implementations of &lt;a href="http://docs.oracle.com/javase/6/docs/api/java/nio/channels/Channel.html"&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;Channel&lt;/span&gt;&lt;/a&gt; and &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;a href="http://docs.oracle.com/javase/6/docs/api/java/nio/Buffer.html"&gt;Buffer&lt;/a&gt;&amp;nbsp;&lt;/span&gt;&lt;span style="font-family: inherit;"&gt;or even falling back and using the good old&amp;nbsp;&lt;/span&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;a href="http://docs.oracle.com/javase/6/docs/api/java/io/RandomAccessFile.html"&gt;RandomAccessFile&lt;/a&gt;&lt;/span&gt;.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5560209661389175529-4208696484085669390?l=mechanical-sympathy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mechanical-sympathy.blogspot.com/feeds/4208696484085669390/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/12/java-sequential-io-performance.html#comment-form' title='23 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/4208696484085669390'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/4208696484085669390'/><link rel='alternate' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/12/java-sequential-io-performance.html' title='Java Sequential IO Performance'/><author><name>Martin Thompson</name><uri>http://www.blogger.com/profile/15893849163924476586</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>23</thr:total><georss:featurename>London, UK</georss:featurename><georss:point>51.508129 -0.12800500000003012</georss:point><georss:box>51.3644275 -0.3778745000000301 51.651830499999996 0.12186449999996987</georss:box></entry><entry><id>tag:blogger.com,1999:blog-5560209661389175529.post-2694323315109547143</id><published>2011-11-22T16:36:00.003Z</published><updated>2012-01-07T13:36:36.383Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='Testing'/><category scheme='http://www.blogger.com/atom/ns#' term='Locks'/><category scheme='http://www.blogger.com/atom/ns#' term='Low-Latency'/><category scheme='http://www.blogger.com/atom/ns#' term='Performance'/><title type='text'>Biased Locking, OSR, and Benchmarking Fun</title><content type='html'>&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;After my last post on &lt;a href="http://mechanical-sympathy.blogspot.com/2011/11/java-lock-implementations.html"&gt;Java Lock Implementations&lt;/a&gt;, I got a lot of good feedback about my results and micro-benchmark design approach.&amp;nbsp; As a result I now understand JVM warmup, On Stack Replacement (OSR) and Biased Locking somewhat better than before.&amp;nbsp; Special thanks to &lt;a href="http://blogs.oracle.com/dave/"&gt;Dave Dice&lt;/a&gt; from Oracle, and &lt;a href="http://www.azulsystems.com/blog/cliff"&gt;Cliff Click&lt;/a&gt; &amp;amp; Gil Tene from Azul, for their very useful feedback.&lt;br /&gt;&lt;br /&gt;In the last post I concluded, based on my experiments, that biased locking was no longer necessary on modern CPUs.&amp;nbsp; While this conclusion is understandable given the data gathered in the experiment, it was not valid because the experiment did not take account of some JVM warm up behaviour that I was unaware of.&lt;br /&gt;&lt;br /&gt;In this post I will re-run the experiment taking into account the feedback and present some new results.&amp;nbsp; I shall also expand on the changes I've made to the test and why it is important to consider the JVM warm-up behaviour when writing micro-benchmarks, or even very lean Java applications with quick start up time.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;span style="font-size: large;"&gt;On Stack Replacement (OSR)&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Java virtual machines will compile code to achieve greater performance based on runtime profiling.&amp;nbsp; Some VMs run an interpreter for the majority of code and replace hot areas with compiled code following the 80/20 rule.&amp;nbsp; Other VMs compile all code simply at first then replace the simple code with more optimised code based on profiling.&amp;nbsp; Oracle Hotspot and Azul are examples of the first type and Oracle JRockit is an example of the second.&lt;br /&gt;&lt;br /&gt;Oracle Hotspot will count invocations of a method return plus branch backs for loops in that method, and if this exceeds 10K in server mode the method will be compiled.&amp;nbsp; The compiled code on normal JIT'ing can be used when the method is next called.&amp;nbsp; However if a loop is still iterating it may make sense to replace the method before the loop completes, especially if it has many iterations to go.&amp;nbsp; OSR is the means by which a method gets replaced with a compiled version part way through iterating a loop.&lt;br /&gt;&lt;br /&gt;I was under the impression that normal JIT'ing and OSR would result in similar code.&amp;nbsp; Cliff Click pointed out that it is much harder for a runtime to optimise a loop part way through, and especially difficult if nested.&amp;nbsp; For example, bounds checking within the loop may not be possible to eliminate. Cliff will &lt;a href="http://www.azulsystems.com/blog/cliff/2011-11-22-what-the-heck-is-osr-and-why-is-it-bad-or-good"&gt;blog&lt;/a&gt; in more detail on this shortly.&lt;br /&gt;&lt;br /&gt;What this means is that you are likely to get better optimised code by doing a small number of shorter warm ups than a single large one.&amp;nbsp; You can see in the code below how I do 10 shorter runs in a loop before the main large run compared to the last article where I did a single large warm-up run.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Biased Locking&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Dave Dice pointed out that Hotspot does not enable objects for biased locking in the first few seconds (4s at present) of JVM startup. This is because some benchmarks, and NetBeans, have a lot of thread contention on start up and the revocation cost is significant.&amp;nbsp; Other VMs such as Azul use biased locking right from the start which is not an issue because their revocation is cheap.&lt;br /&gt;&lt;br /&gt;All objects by default are created with biased locking enabled in Oracle Hotspot after the first few seconds of start-up delay, and can be configured with &lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;-XX:BiasedLockingStartupDelay=0&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;This point, combined with knowing more about OSR, is important for micro-benchmarks.&amp;nbsp; It is also important to be aware of these points if you have a lean Java application that starts in a few seconds.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;The Code&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;import java.util.concurrent.locks.Lock;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;import java.util.concurrent.locks.ReentrantLock;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;import java.util.concurrent.CyclicBarrier;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;import static java.lang.System.out;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;public final class TestLocks implements Runnable&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;{&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public enum LockType { JVM, JUC }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static LockType lockType;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static final long WARMUP_ITERATIONS = 100L * 1000L;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static final long ITERATIONS = 500L * 1000L * 1000L;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static long counter = 0L;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static final Object jvmLock = new Object();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static final Lock jucLock = new ReentrantLock();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private static int numThreads;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private final long iterationLimit;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private final CyclicBarrier barrier;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public TestLocks(final CyclicBarrier barrier, final long iterationLimit)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; this.barrier = barrier;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; this.iterationLimit = iterationLimit;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static void main(final String[] args) throws Exception&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; lockType = LockType.valueOf(args[0]);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; numThreads = Integer.parseInt(args[1]);&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (int i = 0; i &amp;lt; 10; i++)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; runTest(numThreads, WARMUP_ITERATIONS);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; counter = 0L;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; final long start = System.nanoTime();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; runTest(numThreads, ITERATIONS);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; final long duration = System.nanoTime() - start;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; out.printf("%d threads, duration %,d (ns)\n", numThreads, duration);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; out.printf("%,d ns/op\n", duration / ITERATIONS);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; out.printf("%,d ops/s\n", (ITERATIONS * 1000000000L) / duration);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; out.println("counter = " + counter);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private static void runTest(final int numThreads, final long iterationLimit)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; throws Exception&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CyclicBarrier barrier = new CyclicBarrier(numThreads);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Thread[] threads = new Thread[numThreads];&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (int i = 0; i &amp;lt; threads.length; i++)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; threads[i] = new Thread(new TestLocks(barrier, iterationLimit));&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (Thread t : threads)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; t.start();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (Thread t : threads)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; t.join();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public void run()&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; try&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; barrier.await();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; catch (Exception e)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // don't care&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; switch (lockType)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; case JVM: jvmLockInc(); break;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; case JUC: jucLockInc(); break;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private void jvmLockInc()&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; long count = iterationLimit / numThreads;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (0 != count--)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; synchronized (jvmLock)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ++counter;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private void jucLockInc()&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; long count = iterationLimit / numThreads;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (0 != count--)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; jucLock.lock();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; try&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ++counter;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; finally&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; jucLock.unlock();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Script to run tests:&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;set -x&lt;/span&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;for i in {1..8}&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;do&amp;nbsp;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; java -server -XX:-UseBiasedLocking TestLocks JVM $i&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;done&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;for i in {1..8}&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;do&amp;nbsp;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; java -server -XX:+UseBiasedLocking -XX:BiasedLockingStartupDelay=0 TestLocks JVM $i&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;done&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;for i in {1..8}&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;do&amp;nbsp;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; java -server TestLocks JUC $i&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;done &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Results&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The tests are carried out with 64-bit Linux (Fedora Core 15) and Oracle JDK 1.6.0_29. &lt;br /&gt;&lt;br /&gt;&lt;div align="center"&gt;&lt;table border="1" cellpadding="5"&gt;&lt;tbody&gt;&lt;tr style="background-color: cyan;"&gt;&lt;th colspan="4"&gt;Nehalem 2.8GHz - Ops/Sec&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th style="background-color: #cfe2f3;"&gt;Threads&lt;/th&gt;&lt;th style="background-color: #cfe2f3;"&gt;-UseBiasedLocking&lt;/th&gt;&lt;th style="background-color: #cfe2f3;"&gt;+UseBiasedLocking&lt;/th&gt;&lt;th style="background-color: #cfe2f3;"&gt;ReentrantLock&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;1&lt;/td&gt;&lt;td align="right"&gt;53,283,461&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 126px;"&gt;&lt;colgroup&gt;&lt;col width="126"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody style="color: purple;"&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl63" height="20" style="height: 15.0pt; width: 95pt;" width="126"&gt;&lt;b&gt;450,950,969&lt;/b&gt;&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 97px;"&gt;&lt;colgroup&gt;&lt;col width="97"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 73pt;" width="97"&gt;62,876,566&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;2&lt;/td&gt;&lt;td align="right"&gt;18,519,295&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 126px;"&gt;&lt;colgroup&gt;&lt;col width="126"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 95pt;" width="126"&gt;18,108,615&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 97px;"&gt;&lt;colgroup&gt;&lt;col width="97"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 73pt;" width="97"&gt;10,217,186&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;3&lt;/td&gt;&lt;td align="right"&gt;13,349,605&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 126px;"&gt;&lt;colgroup&gt;&lt;col width="126"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 95pt;" width="126"&gt;13,416,198&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 97px;"&gt;&lt;colgroup&gt;&lt;col width="97"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 73pt;" width="97"&gt;14,108,622&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;4&lt;/td&gt;&lt;td align="right"&gt;8,120,172&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 126px;"&gt;&lt;colgroup&gt;&lt;col width="126"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 95pt;" width="126"&gt;8,040,773&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 97px;"&gt;&lt;colgroup&gt;&lt;col width="97"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 73pt;" width="97"&gt;14,207,310&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;5&lt;/td&gt;&lt;td align="right"&gt;4,725,114&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 126px;"&gt;&lt;colgroup&gt;&lt;col width="126"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 95pt;" width="126"&gt;4,551,766&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 97px;"&gt;&lt;colgroup&gt;&lt;col width="97"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 73pt;" width="97"&gt;14,302,683&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;6&lt;/td&gt;&lt;td align="right"&gt;5,133,706&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 126px;"&gt;&lt;colgroup&gt;&lt;col width="126"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 95pt;" width="126"&gt;5,246,548&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 97px;"&gt;&lt;colgroup&gt;&lt;col width="97"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 73pt;" width="97"&gt;14,676,616&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;7&lt;/td&gt;&lt;td align="right"&gt;5,473,652&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 126px;"&gt;&lt;colgroup&gt;&lt;col width="126"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 95pt;" width="126"&gt;5,585,666&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 97px;"&gt;&lt;colgroup&gt;&lt;col width="97"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 73pt;" width="97"&gt;18,145,525&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;8&lt;/td&gt;&lt;td align="right"&gt;5,514,056&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 126px;"&gt;&lt;colgroup&gt;&lt;col width="126"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 95pt;" width="126"&gt;5,414,171&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 97px;"&gt;&lt;colgroup&gt;&lt;col width="97"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 73pt;" width="97"&gt;19,010,725&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;div align="center"&gt;&lt;table border="1" cellpadding="5"&gt;&lt;tbody&gt;&lt;tr style="background-color: cyan;"&gt;&lt;th colspan="4"&gt;Sandybridge 2.0GHz - Ops/Sec&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th style="background-color: #cfe2f3;"&gt;Threads&lt;/th&gt;&lt;th style="background-color: #cfe2f3;"&gt;-UseBiasedLocking&lt;/th&gt;&lt;th style="background-color: #cfe2f3;"&gt;+UseBiasedLocking&lt;/th&gt;&lt;th style="background-color: #cfe2f3;"&gt;ReentrantLock&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;1&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 124px;"&gt;&lt;colgroup&gt;&lt;col width="124"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 93pt;" width="124"&gt;34,500,407&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 126px;"&gt;&lt;colgroup&gt;&lt;col width="126"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody style="color: purple;"&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl63" height="20" style="height: 15.0pt; width: 95pt;" width="126"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 126px;"&gt;&lt;colgroup&gt;&lt;col width="126"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody style="color: purple;"&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 95pt;" width="126"&gt;&lt;b&gt;396,511,324&lt;/b&gt;&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 97px;"&gt;&lt;colgroup&gt;&lt;col width="97"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 73pt;" width="97"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 97px;"&gt;&lt;colgroup&gt;&lt;col width="97"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 73pt;" width="97"&gt;43,148,808&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;2&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 124px;"&gt;&lt;colgroup&gt;&lt;col width="124"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 93pt;" width="124"&gt;20,899,076&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 126px;"&gt;&lt;colgroup&gt;&lt;col width="126"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 95pt;" width="126"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 126px;"&gt;&lt;colgroup&gt;&lt;col width="126"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 95pt;" width="126"&gt;19,742,639&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 97px;"&gt;&lt;colgroup&gt;&lt;col width="97"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 73pt;" width="97"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 97px;"&gt;&lt;colgroup&gt;&lt;col width="97"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 73pt;" width="97"&gt;6,038,923&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;3&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 124px;"&gt;&lt;colgroup&gt;&lt;col width="124"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 93pt;" width="124"&gt;9,288,039&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 126px;"&gt;&lt;colgroup&gt;&lt;col width="126"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 95pt;" width="126"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 126px;"&gt;&lt;colgroup&gt;&lt;col width="126"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 95pt;" width="126"&gt;11,957,032&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 97px;"&gt;&lt;colgroup&gt;&lt;col width="97"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 73pt;" width="97"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 97px;"&gt;&lt;colgroup&gt;&lt;col width="97"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 73pt;" width="97"&gt;24,147,807&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;4&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 124px;"&gt;&lt;colgroup&gt;&lt;col width="124"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 93pt;" width="124"&gt;5,618,862&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 126px;"&gt;&lt;colgroup&gt;&lt;col width="126"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 95pt;" width="126"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 126px;"&gt;&lt;colgroup&gt;&lt;col width="126"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 95pt;" width="126"&gt;5,589,289&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 97px;"&gt;&lt;colgroup&gt;&lt;col width="97"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 73pt;" width="97"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 97px;"&gt;&lt;colgroup&gt;&lt;col width="97"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 73pt;" width="97"&gt;9,082,961&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;5&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 124px;"&gt;&lt;colgroup&gt;&lt;col width="124"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 93pt;" width="124"&gt;5,609,932&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 126px;"&gt;&lt;colgroup&gt;&lt;col width="126"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 95pt;" width="126"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 126px;"&gt;&lt;colgroup&gt;&lt;col width="126"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 95pt;" width="126"&gt;5,592,574&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 97px;"&gt;&lt;colgroup&gt;&lt;col width="97"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 73pt;" width="97"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 97px;"&gt;&lt;colgroup&gt;&lt;col width="97"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 73pt;" width="97"&gt;9,389,243&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;6&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 124px;"&gt;&lt;colgroup&gt;&lt;col width="124"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 93pt;" width="124"&gt;5,742,907&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 126px;"&gt;&lt;colgroup&gt;&lt;col width="126"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 95pt;" width="126"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 126px;"&gt;&lt;colgroup&gt;&lt;col width="126"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 95pt;" width="126"&gt;5,760,558&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 97px;"&gt;&lt;colgroup&gt;&lt;col width="97"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 73pt;" width="97"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 97px;"&gt;&lt;colgroup&gt;&lt;col width="97"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 73pt;" width="97"&gt;12,518,728&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;7&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 124px;"&gt;&lt;colgroup&gt;&lt;col width="124"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 93pt;" width="124"&gt;6,699,201&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 126px;"&gt;&lt;colgroup&gt;&lt;col width="126"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 95pt;" width="126"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 126px;"&gt;&lt;colgroup&gt;&lt;col width="126"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 95pt;" width="126"&gt;6,641,886&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 97px;"&gt;&lt;colgroup&gt;&lt;col width="97"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 73pt;" width="97"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 97px;"&gt;&lt;colgroup&gt;&lt;col width="97"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 73pt;" width="97"&gt;13,684,475&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;8&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 124px;"&gt;&lt;colgroup&gt;&lt;col width="124"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 93pt;" width="124"&gt;6,957,824&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 126px;"&gt;&lt;colgroup&gt;&lt;col width="126"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 95pt;" width="126"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 126px;"&gt;&lt;colgroup&gt;&lt;col width="126"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 95pt;" width="126"&gt;6,925,410&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;td align="right"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 97px;"&gt;&lt;colgroup&gt;&lt;col width="97"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 73pt;" width="97"&gt;&lt;table border="0" cellpadding="0" cellspacing="0" style="width: 97px;"&gt;&lt;colgroup&gt;&lt;col width="97"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr height="20"&gt;   &lt;td align="right" class="xl65" height="20" style="height: 15.0pt; width: 73pt;" width="97"&gt;14,819,005&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Observations&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Biased locking has a huge benefit in the un-contended single threaded case.&lt;/li&gt;&lt;li&gt;Biased locking when un-contended, and not revoked, only adds 4-5 cycles of cost.&amp;nbsp; This is the cost when having a cache hit for the lock structures, on top of the code protected in the critical section.&lt;/li&gt;&lt;li&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;-XX:BiasedLockingStartupDelay=0&lt;/span&gt; needs to be set for lean applications and micro-benchmarks.&lt;/li&gt;&lt;li&gt;Avoiding OSR does not make a material difference to this set of test results.&amp;nbsp; This is likely to be because the loop is so simple or other costs are dominating.&lt;/li&gt;&lt;li&gt;For the current implementations, ReentrantLocks scale better than synchronised locks under contention, except in the case of 2 contending threads.&lt;/li&gt;&lt;/ol&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Conclusion&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;My tests in the last &lt;a href="http://mechanical-sympathy.blogspot.com/2011/11/java-lock-implementations.html"&gt;post&lt;/a&gt; are invalid for the testing of an un-contended biased lock, because the lock was not actually biased.&amp;nbsp; If you are designing code following the &lt;a href="http://mechanical-sympathy.blogspot.com/2011/09/single-writer-principle.html"&gt;single writer principle&lt;/a&gt;, and therefore having un-contended locks when using 3rd party libraries, then having biased locking enabled is a significant performance boost.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5560209661389175529-2694323315109547143?l=mechanical-sympathy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mechanical-sympathy.blogspot.com/feeds/2694323315109547143/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/11/biased-locking-osr-and-benchmarking-fun.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/2694323315109547143'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/2694323315109547143'/><link rel='alternate' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/11/biased-locking-osr-and-benchmarking-fun.html' title='Biased Locking, OSR, and Benchmarking Fun'/><author><name>Martin Thompson</name><uri>http://www.blogger.com/profile/15893849163924476586</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5560209661389175529.post-2309472209846012228</id><published>2011-11-19T02:57:00.008Z</published><updated>2012-01-07T13:35:49.242Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='Locks'/><category scheme='http://www.blogger.com/atom/ns#' term='Performance'/><title type='text'>Java Lock Implementations</title><content type='html'>&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;We all use 3rd party libraries as a normal part of development.&amp;nbsp; Generally, we have no control over their internals.&amp;nbsp; The libraries provided with the JDK are a typical example. &amp;nbsp; Many of these libraries employ locks to manage contention.&lt;br /&gt;&lt;br /&gt;JDK locks come with two implementations.&amp;nbsp; One uses atomic CAS style instructions to manage the claim process.&amp;nbsp; CAS instructions tend to be the most expensive type of CPU instructions and on x86 have &lt;a href="http://mechanical-sympathy.blogspot.com/2011/07/memory-barriersfences.html"&gt;memory ordering&lt;/a&gt; semantics.&amp;nbsp; Often locks are un-contended which gives rise to a possible optimisation whereby a lock can be &lt;a href="http://home.comcast.net/%7Epjbishop/Dave/QRL-OpLocks-BiasedLocking.pdf"&gt;biased &lt;/a&gt;to the un-contended thread using techniques to avoid the use of atomic instructions.&amp;nbsp; This biasing allows a lock in theory to be quickly reacquired by the same thread.&amp;nbsp; If the lock turns out to be contended by multiple threads the algorithm with revert from being biased and fall back to the standard approach using atomic instructions.&amp;nbsp; Biased locking became the &lt;a href="http://java.sun.com/performance/reference/whitepapers/6_performance.html#2.1.1"&gt;default lock implementation&lt;/a&gt; with Java 6.&lt;br /&gt;&lt;br /&gt;When respecting the &lt;a href="http://mechanical-sympathy.blogspot.com/2011/09/single-writer-principle.html"&gt;single writer principle,&lt;/a&gt; biased locking should be your friend.&amp;nbsp; Lately, when using the sockets API, I decided to measure the lock costs and was surprised by the results.&amp;nbsp; I found that my un-contended thread was incurring a bit more cost than I expected from the lock.&amp;nbsp; I put together the following test to compare the cost of the current lock implementations available in Java 6.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;The Test&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;For the test I shall increment a counter within a lock, and increase the number of contending threads on the lock.&amp;nbsp; This test will be repeated for the 3 major lock implementations available to Java:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Atomic locking on Java language monitors&lt;/li&gt;&lt;li&gt;Biased locking on Java language monitors&lt;/li&gt;&lt;li&gt;&lt;a href="http://download.oracle.com/javase/1,5,0/docs/api/java/util/concurrent/locks/ReentrantLock.html"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;ReentrantLock&lt;/span&gt;&lt;/a&gt; introduced with the java.util.concurrent package in Java 5.&lt;/li&gt;&lt;/ol&gt;I'll also run the tests on the 3 most recent generations of the Intel CPU.&amp;nbsp; For each CPU I'll execute the tests up to the maximum number of concurrent threads the core count will support.&lt;br /&gt;&lt;br /&gt;The tests are carried out with 64-bit Linux (Fedora Core 15) and Oracle JDK 1.6.0_29. &lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;The Code &lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;ol&gt;&lt;/ol&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;import java.util.concurrent.BrokenBarrierException;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;import java.util.concurrent.locks.Lock;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;import java.util.concurrent.locks.ReentrantLock;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;import java.util.concurrent.CyclicBarrier;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;import static java.lang.System.out;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;public final class TestLocks implements Runnable&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;{&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public enum LockType { JVM, JUC }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static LockType lockType;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static final long ITERATIONS = 500L * 1000L *1000L;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static long counter = 0L;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static final Object jvmLock = new Object();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static final Lock jucLock = new ReentrantLock();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private static int numThreads;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private static CyclicBarrier barrier;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static void main(final String[] args) throws Exception&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; lockType = LockType.valueOf(args[0]);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; numThreads = Integer.parseInt(args[1]);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; runTest(numThreads); // warm up&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; counter = 0L;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; final long start = System.nanoTime();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; runTest(numThreads);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; final long duration = System.nanoTime() - start;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; out.printf("%d threads, duration %,d (ns)\n", numThreads, duration);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; out.printf("%,d ns/op\n", duration / ITERATIONS);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; out.printf("%,d ops/s\n", (ITERATIONS * 1000000000L) / duration);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; out.println("counter = " + counter);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private static void runTest(final int numThreads) throws Exception&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; barrier = new CyclicBarrier(numThreads);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Thread[] threads = new Thread[numThreads];&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (int i = 0; i &amp;lt; threads.length; i++)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; threads[i] = new Thread(new TestLocks());&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (Thread t : threads)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; t.start();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (Thread t : threads)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; t.join();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public void run()&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; try&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; barrier.await();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; catch (Exception e)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // don't care&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; switch (lockType)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; case JVM: jvmLockInc(); break;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; case JUC: jucLockInc(); break;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private void jvmLockInc()&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; long count = ITERATIONS / numThreads;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (0 != count--)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; synchronized (jvmLock)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ++counter;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private void jucLockInc()&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; long count = ITERATIONS / numThreads;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (0 != count--)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; jucLock.lock();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; try&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ++counter;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; finally&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; jucLock.unlock();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Script the tests:&lt;br /&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;set -x&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;for i in {1..8}; do java -XX:-UseBiasedLocking TestLocks JVM $i; done&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;for i in {1..8}; do java -XX:+UseBiasedLocking TestLocks JVM $i; done&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;for i in {1..8}; do java TestLocks JUC $i; done&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;The Results&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-SiFwxQfwLZ8/Tsb-T3W12oI/AAAAAAAAABg/Sueahj4UJlE/s1600/nehalem.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/-SiFwxQfwLZ8/Tsb-T3W12oI/AAAAAAAAABg/Sueahj4UJlE/s1600/nehalem.png" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Figure 1.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-rkM8TR8MQrM/Tsb-sv8lHEI/AAAAAAAAABo/aomXaK4Qc4g/s1600/westmere.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/-rkM8TR8MQrM/Tsb-sv8lHEI/AAAAAAAAABo/aomXaK4Qc4g/s1600/westmere.png" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Figure 2.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-UhtDN_zK8gw/Tsb_K4E3QQI/AAAAAAAAABw/geUNmA476Cg/s1600/sandybridge.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/-UhtDN_zK8gw/Tsb_K4E3QQI/AAAAAAAAABw/geUNmA476Cg/s1600/sandybridge.png" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Figure 3.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Observations&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Biased locking, in the un-contended case, is ~10% more expensive than the atomic locking.&amp;nbsp; It seems that for recent CPU generations the cost of atomic instructions is less than the necessary housekeeping for biased locks.&amp;nbsp; Previous to Nehalem, lock instructions would assert a lock on the memory bus to perform the these atomic operations, each would cost more than 100 cycles.&amp;nbsp; Since Nehalem, atomic instructions can be handled local to a CPU core, and typically cost only 10-20 cycles if they do not need to wait on the store buffer to empty while enforcing memory ordering semantics.&lt;/li&gt;&lt;li&gt;As contention increases, language monitor locks quickly reach a throughput limit regardless of thread count.&lt;/li&gt;&lt;li&gt;ReentrantLock gives the best un-contended performance and scales significantly better with increasing contention compared to language monitors using synchronized.&lt;/li&gt;&lt;li&gt;ReentrantLock has an odd characteristic of reduced performance when 2 threads are contending.&amp;nbsp; This deserves further investigation.&lt;/li&gt;&lt;li&gt;Sandybridge suffers from the &lt;a href="http://mechanical-sympathy.blogspot.com/2011/09/adventures-with-atomiclong.html"&gt;increased latency&lt;/a&gt; of atomic instructions I detailed in a previous article when contended thread count is low.&amp;nbsp; As contended thread count continues to increase, the cost of the kernel arbitration tends to dominate and Sandybridge shows its strength with increased memory throughput.&lt;/li&gt;&lt;/ol&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Conclusion&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Biased locking should no longer be the default lock implementation on modern Intel processors.&amp;nbsp; I recommend you measure your applications and experiement with the &lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;-XX:-UseBiasedLocking&lt;/span&gt; JVM option to determine if you can benefit from using atomic lock based algorithm for the un-contended case. &lt;br /&gt;&lt;br /&gt;When developing your own concurrent libraries I would recommend &lt;a href="http://download.oracle.com/javase/1,5,0/docs/api/java/util/concurrent/locks/ReentrantLock.html"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;ReentrantLock&lt;/span&gt;&lt;/a&gt; rather than using the synchronized keyword due to the significantly better performance on x86, if a lock-free alternative algorithm is not a viable option.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;i&gt;Update 20-Nov-2011&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://blogs.oracle.com/dave/"&gt;Dave Dice&lt;/a&gt; has pointed out that biased locking is not implemented for the locks created in the first few seconds of the JVM startup.&amp;nbsp; I'll re-run my tests this week and post the results.&amp;nbsp; I've had some more quality feedback that suggests my results could be potentially invalid.&amp;nbsp; Micro benchmarks can be tricky but the advice of measuring your own application in the large still stands.&lt;br /&gt;&lt;br /&gt;A re-run of the tests can be seen in &lt;a href="http://mechanical-sympathy.blogspot.com/2011/11/biased-locking-osr-and-benchmarking-fun.html"&gt;this&lt;/a&gt; follow-on blog taking account of Dave's feedback.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5560209661389175529-2309472209846012228?l=mechanical-sympathy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mechanical-sympathy.blogspot.com/feeds/2309472209846012228/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/11/java-lock-implementations.html#comment-form' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/2309472209846012228'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/2309472209846012228'/><link rel='alternate' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/11/java-lock-implementations.html' title='Java Lock Implementations'/><author><name>Martin Thompson</name><uri>http://www.blogger.com/profile/15893849163924476586</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-SiFwxQfwLZ8/Tsb-T3W12oI/AAAAAAAAABg/Sueahj4UJlE/s72-c/nehalem.png' height='72' width='72'/><thr:total>5</thr:total><georss:featurename>London, UK</georss:featurename><georss:point>51.5001524 -0.12623619999999391</georss:point><georss:box>51.322796399999994 -0.39052969999999393 51.6775084 0.1380573000000061</georss:box></entry><entry><id>tag:blogger.com,1999:blog-5560209661389175529.post-5997806377376805503</id><published>2011-11-05T13:52:00.006Z</published><updated>2011-11-16T23:36:27.486Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='Locks'/><category scheme='http://www.blogger.com/atom/ns#' term='Low-Latency'/><category scheme='http://www.blogger.com/atom/ns#' term='Performance'/><title type='text'>Locks &amp; Condition Variables - Latency Impact</title><content type='html'>In a previous article on &lt;a href="http://mechanical-sympathy.blogspot.com/2011/08/inter-thread-latency.html"&gt;Inter-Thread Latency&lt;/a&gt; I showed how it is possible to signal a state change between 2 threads with less than 50ns of latency.&amp;nbsp; To many developers, writing concurrent code using locks is a scary experience.&amp;nbsp; Writing concurrent code using lock-free algorithms, i.e. algorithms that rely on the use of memory barriers and an intimate understanding of the underlying memory models, can be totally terrifying.&amp;nbsp; To me lock-free / &lt;a href="http://en.wikipedia.org/wiki/Non-blocking_algorithm"&gt;non-blocking&lt;/a&gt; algorithms are like playing with explosives or corrosive chemicals, if you do not understand what you are doing, or show the ultimate respect, then very bad things can, and most likely will, happen!&lt;br /&gt;&lt;br /&gt;In this article, I'd like to illustrate the impact of using locks and the resulting latency they can impose on your designs.&amp;nbsp; I want to use a very similar algorithm to that used in my previous inter-thread latency article to illustrate the ping-pong effect of handing control back and forth between 2 threads.&amp;nbsp; In this case, rather than using a couple of volatile variables, I will employ a pair of condition variables to signal a state change so control can be passed back and forth.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;The Code&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;import java.util.concurrent.locks.Condition;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;import java.util.concurrent.locks.Lock;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;import java.util.concurrent.locks.ReentrantLock;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;import static java.lang.System.out;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;public final class LockedSignallingLatency&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;{&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private static final int ITERATIONS = 10 * 1000 * 1000;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private static final Lock lock = new ReentrantLock();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private static final Condition sendCondition = lock.newCondition();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private static final Condition echoCondition = lock.newCondition();&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private static long sendValue = -1L;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private static long echoValue = -1L;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static void main(final String[] args)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; throws Exception&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; final Thread sendThread = new Thread(new SendRunner());&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; final Thread echoThread = new Thread(new EchoRunner());&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; final long start = System.nanoTime();&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; echoThread.start();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; sendThread.start();&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; sendThread.join();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; echoThread.join();&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; final long duration = System.nanoTime() - start;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; out.printf("duration %,d (ns)\n", duration);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; out.printf("%,d ns/op\n", duration / (ITERATIONS * 2L));&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; out.printf("%,d ops/s\n", (ITERATIONS * 2L * 1000000000L) / duration);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static final class SendRunner implements Runnable&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; public void run()&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (long i = 0; i &amp;lt; ITERATIONS; i++)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; lock.lock();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; try&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; sendValue = i;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; sendCondition.signal();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; finally&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; lock.unlock();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; lock.lock();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; try&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (echoValue != i)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; echoCondition.await();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; catch (final InterruptedException ex)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; break;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; finally&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; lock.unlock();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static final class EchoRunner implements Runnable&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; public void run()&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (long i = 0; i &amp;lt; ITERATIONS; i++)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; lock.lock();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; try&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (sendValue != i)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; sendCondition.await();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; catch (final InterruptedException ex)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; break;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; finally&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; lock.unlock();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; lock.lock();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; try&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; echoValue = i;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; echoCondition.signal();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; finally&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; lock.unlock();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Test Results&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Windows 7 Professional 64-bit - Oracle JDK 1.6.0 - Nehalem 2.8 GHz&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;$ start /AFFINITY 0x14 /B /WAIT java LockedSignallingLatency&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;duration 41,649,616,343 (ns)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;2,082 ns/op&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;480,196 ops/s&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;$ java LockedSignallingLatency&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;duration 73,789,456,491 (ns)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;3,689 ns/op&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;271,041 ops/s&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Linux Fedora Core 15 64-bit - Oracle JDK 1.6.0 - Nehalem 2.8 GHz&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;$ taskset -c 2,4 java LockedSignallingLatency&lt;br /&gt;duration 40,469,689,559 (ns)&lt;br /&gt;2,023 ns/op&lt;br /&gt;494,197 ops/s&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;$ java LockedSignallingLatency&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;duration 169,795,756,230 (ns)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;8,489 ns/op&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;117,788 ops/s&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Linux Fedora Core 15 64-bit - Oracle JDK 1.6.0 - Sandybridge 2.0 GHz&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;$ taskset -c 2,4 java LockedSignallingLatency&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;duration 47,209,549,484 (ns)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;2,360 ns/op&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;423,643 ops/s&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;$ java LockedSignallingLatency&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;duration 336,168,489,093 (ns)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;16,808 ns/op&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;59,493 ops/s&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Observations&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The above is a typical set of results I've seen in the middle of the range from multiple runs.&amp;nbsp; There are a couple of interesting observations I'd like to expand on.&lt;br /&gt;&lt;br /&gt;Firstly, this is 3 orders of magnitude greater latency than what I illustrated in the previous article using just memory barriers to signal between threads.&amp;nbsp; This cost comes about because the kernel needs to get involved to arbitrate between the threads for the lock, and then manage the scheduling for the threads to awaken when the condition is signalled.&amp;nbsp; The one-way latency to signal a change is pretty much the same as what  is considered current state of the art for network hops between nodes  via a switch.&amp;nbsp; It is possible to get ~1&lt;span style="font-family: &amp;quot;Calibri&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: 11pt; line-height: 115%;"&gt;µ&lt;/span&gt;s latency with &lt;a href="http://en.wikipedia.org/wiki/InfiniBand"&gt;InfiniBand&lt;/a&gt; and less than 5&lt;span style="font-family: &amp;quot;Calibri&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: 11pt; line-height: 115%;"&gt;µ&lt;/span&gt;s with &lt;a href="http://www.solarflare.com/09-14-11-Solarflare-Arista-Complete-Ultra-Low-Latency-Testing"&gt;10GigE and user-space IP stacks&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Secondly, the impact is clear when letting the OS choose what CPUs the threads get scheduled on rather than pinning them manually.&amp;nbsp; I've observed this same issue across many use cases whereby Linux, in default configuration for its scheduler, will greatly impact the performance of a low-latency system by scheduling threads on different cores resulting in cache pollution.&amp;nbsp;&amp;nbsp; Windows by default seems to make a better job of this.&lt;br /&gt;&lt;br /&gt;I recently had an interesting discussion with &lt;a href="http://www.azulsystems.com/blog/cliff"&gt;Cliff Click&lt;/a&gt; about using condition variables and their cost.&amp;nbsp; He pointed out a problem he was seeing.&amp;nbsp; If you look at the case where a sleeping thread gets signalled within the lock, it goes to run and then discovers it cannot get the lock because the signalling thread already has the lock, so it gets put back to sleep until the signalling thread releases the lock, thus causing more work than necessary.&amp;nbsp; Modern schedulers would benefit from being more aware of communication mechanisms between threads to have more efficient location and rescheduling logic.&amp;nbsp; As we go more concurrent and parallel our schedulers need to become more aware of &lt;a href="http://en.wikipedia.org/wiki/Inter-process_communication"&gt;IPC&lt;/a&gt; mechanisms.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Conclusion&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;When designing a low-latency system it is crucial to avoid the use of locks and condition variables for the main transaction flows.&amp;nbsp; Non-blocking or lock-free algorithms are key to achieving ultra-low latency but can be very difficult to prove correct.&amp;nbsp;&amp;nbsp; I would not recommend designing lock-free algorithms for business logic but they can be very effectively employed for low-level infrastructure components.&amp;nbsp; The business logic is best run on single threads following the &lt;a href="http://mechanical-sympathy.blogspot.com/2011/09/single-writer-principle.html"&gt;&lt;i&gt;Single Writer Principle&lt;/i&gt;&lt;/a&gt; from my previous article.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5560209661389175529-5997806377376805503?l=mechanical-sympathy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mechanical-sympathy.blogspot.com/feeds/5997806377376805503/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/11/locks-condition-variables-latency.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/5997806377376805503'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/5997806377376805503'/><link rel='alternate' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/11/locks-condition-variables-latency.html' title='Locks &amp; Condition Variables - Latency Impact'/><author><name>Martin Thompson</name><uri>http://www.blogger.com/profile/15893849163924476586</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total><georss:featurename>London, UK</georss:featurename><georss:point>51.5001524 -0.12623619999999391</georss:point><georss:box>51.322796399999994 -0.39052969999999393 51.6775084 0.1380573000000061</georss:box></entry><entry><id>tag:blogger.com,1999:blog-5560209661389175529.post-2632924219660461165</id><published>2011-10-19T17:44:00.003+01:00</published><updated>2011-11-11T09:50:08.929Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Batching'/><category scheme='http://www.blogger.com/atom/ns#' term='IO'/><category scheme='http://www.blogger.com/atom/ns#' term='Networking'/><category scheme='http://www.blogger.com/atom/ns#' term='Performance'/><title type='text'>Smart Batching</title><content type='html'>How often have we all heard that “batching” will increase latency?&amp;nbsp; As someone with a passion for low-latency systems this surprises me.&amp;nbsp; In my experience when batching is done correctly, not only does it increase throughput, it can also reduce average latency and keep it consistent.&lt;br /&gt;&lt;br /&gt;Well then, how can batching magically reduce latency?&amp;nbsp; It comes down to what algorithm and data structures are employed.&amp;nbsp; In a distributed environment we are often having to batch up messages/events into network packets to achieve greater throughput.&amp;nbsp; We also employ similar techniques in buffering writes to storage to reduce the number of &lt;a href="http://en.wikipedia.org/wiki/IOPS"&gt;IOPS&lt;/a&gt;. That storage could be a block device backed file-system or a relational database.&amp;nbsp; Most IO devices can only handle a modest number of IO operations per second, so it is best to fill those operations efficiently.&amp;nbsp; Many approaches to batching involve waiting for a timeout to occur and this will by its very nature increase latency.&amp;nbsp; The batch can also get filled before the timeout occurs making the latency even more unpredictable.&lt;br /&gt;&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-SSf5hpn4l2g/Tp7TjmYdUwI/AAAAAAAAABQ/bYgRlrSlt90/s1600/batching.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/-SSf5hpn4l2g/Tp7TjmYdUwI/AAAAAAAAABQ/bYgRlrSlt90/s1600/batching.png" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Figure 1.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;Figure 1. above depicts decoupling the access to an IO device, and therefore the contention for access to it, by introducing a queue like structure to stage the messages/events to be sent and a thread doing the batching for writing to the device.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;The Algorithm&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;An approach to batching uses the following algorithm in Java pseudo code:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;public final class NetworkBatcher&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; implements Runnable&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;{&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private final NetworkFacade network;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private final Queue&amp;lt;Message&amp;gt; queue;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private final ByteBuffer buffer;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public NetworkBatcher(final NetworkFacade network,&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; final int maxPacketSize,&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; final Queue&amp;lt;Message&amp;gt; queue)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; this.network = network;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; buffer = ByteBuffer.allocate(maxPacketSize);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; this.queue = queue;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public void run()&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (!Thread.currentThread().isInterrupted())&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (null == queue.peek())&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; employWaitStrategy(); // block, spin, yield, etc.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Message msg;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (null != (msg = queue.poll()))&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (msg.size() &amp;gt; buffer.remaining())&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; sendBuffer();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; buffer.put(msg.getBytes());&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; sendBuffer();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private void sendBuffer()&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; buffer.flip();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; network.send(buffer);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; buffer.clear();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Basically, wait for data to become available and as soon as it is, send it right away.&amp;nbsp; While sending a previous message or waiting on new messages, a burst of traffic may arrive which can all be sent in a batch, up to the size of the buffer, to the underlying resource.&amp;nbsp; This approach can use &lt;a href="http://download.oracle.com/javase/6/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html"&gt;ConcurrentLinkedQueue&lt;/a&gt; which provides low-latency and avoid locks.&amp;nbsp; However it has an issue in not creating back pressure to stall producing/publishing threads if they are outpacing the batcher whereby the queue could grow out of control because it is unbounded.&amp;nbsp; I’ve often had to wrap &lt;a href="http://download.oracle.com/javase/6/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html"&gt;ConcurrentLinkedQueue&lt;/a&gt; to track its size and thus create back pressure.&amp;nbsp; This size tracking can add 50% to the processing cost of using this queue in my experience.&lt;br /&gt;&lt;br /&gt;This algorithm respects the &lt;a href="http://mechanical-sympathy.blogspot.com/2011/09/single-writer-principle.html"&gt;&lt;i&gt;single writer principle&lt;/i&gt;&lt;/a&gt; and can often be employed when writing to a network or storage device, and thus avoid lock contention in third party API libraries.&amp;nbsp; By avoiding the contention we avoid the J-Curve latency profile normally associated with contention on resources, due to the queuing effect on locks.&amp;nbsp; With this algorithm, as load increases, latency stays constant until the underlying device is saturated with traffic resulting in a more "bathtub" profile than the J-Curve.&lt;br /&gt;&lt;br /&gt;Let’s take a worked example of handling 10 messages that arrive as a burst of traffic.&amp;nbsp; In most systems traffic comes in bursts and is seldom uniformly spaced out in time.&amp;nbsp; One approach will assume no batching and the threads write to device API directly as in Figure 1. above.&amp;nbsp; The other will use a lock free data structure to collect the messages plus a single thread consuming messages in a loop as per the algorithm above.&amp;nbsp;&amp;nbsp; For the example let’s assume it takes 100&lt;span style="font-family: &amp;quot;Calibri&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: 11pt; line-height: 115%;"&gt;µ&lt;/span&gt;s to write a single buffer to the network device as a synchronous operation and have it acknowledged.&amp;nbsp; The buffer will ideally be less than the MTU of the network in size when latency is critical.&amp;nbsp; Many network sub-systems are asynchronous and support &lt;a href="http://en.wikipedia.org/wiki/Pipeline_%28computing%29"&gt;pipelining&lt;/a&gt; but we will make the above assumption to clarify the example.&amp;nbsp; If the network operation is using a protocol like &lt;a href="http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol"&gt;HTTP&lt;/a&gt; under &lt;a href="http://en.wikipedia.org/wiki/Representational_state_transfer"&gt;REST&lt;/a&gt; or &lt;a href="http://en.wikipedia.org/wiki/Web_service"&gt;Web Services&lt;/a&gt; then this assumption matches the underlying implementation.&lt;br /&gt;&lt;br /&gt;&lt;div align="center"&gt;&lt;table border="1" cellpadding="5"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;th&gt;&lt;/th&gt;&lt;th&gt;Best (&lt;span style="font-family: &amp;quot;Calibri&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: 11pt; line-height: 115%;"&gt;µ&lt;/span&gt;s)&lt;/th&gt;&lt;th&gt;Average (&lt;span style="font-family: &amp;quot;Calibri&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: 11pt; line-height: 115%;"&gt;µ&lt;/span&gt;s)&lt;/th&gt;&lt;th&gt;Worst (&lt;span style="font-family: &amp;quot;Calibri&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: 11pt; line-height: 115%;"&gt;µ&lt;/span&gt;s)&lt;/th&gt;&lt;th&gt;Packets Sent&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Serial&lt;/td&gt;&lt;td align="right"&gt;100&lt;/td&gt;&lt;td align="right"&gt;500&lt;/td&gt;&lt;td align="right"&gt;1,000&lt;/td&gt;&lt;td align="right"&gt;10&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Smart Batching&lt;/td&gt;&lt;td align="right"&gt;100&lt;/td&gt;&lt;td align="right"&gt;150&lt;/td&gt;&lt;td align="right"&gt;200&lt;/td&gt;&lt;td align="right"&gt;1-2&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;br /&gt;The absolute lowest latency will be achieved if a message is sent from the thread originating the data directly to the resource, if the resource is un-contended.&amp;nbsp;&amp;nbsp; The table above shows what happens when contention occurs and a queuing effect kicks in. With the serial approach 10 individual packets will have to be sent and these typically need to queue on a lock managing access to the resource, therefore they get processed sequentially.&amp;nbsp; The above figures assume the locking strategy works perfectly with no perceivable overhead which is unlikely in a real application.&lt;br /&gt;&lt;br /&gt;For the batching solution it is likely all 10 packets will be picked up in first batch if the concurrent queue is efficient, thus giving the best case latency scenario.&amp;nbsp; In the worst case only one message is sent in the first batch with the other nine following in the next.&amp;nbsp; Therefore in the worst case scenario one message has a latency of 100&lt;span style="font-family: &amp;quot;Calibri&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: 11pt; line-height: 115%;"&gt;µ&lt;/span&gt;s and the following 9 have a latency of 200&lt;span style="font-family: &amp;quot;Calibri&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: 11pt; line-height: 115%;"&gt;µ&lt;/span&gt;s thus giving a worst case average of 190&lt;span style="font-family: &amp;quot;Calibri&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: 11pt; line-height: 115%;"&gt;µ&lt;/span&gt;s which is significantly better than the serial approach.&lt;br /&gt;&lt;br /&gt;This is one good example when the simplest solution is just a bit too simple because of the contention.&amp;nbsp; The batching solution helps achieve consistent low-latency under burst conditions and is best for throughput.&amp;nbsp; It also has a nice effect across the network on the receiving end in that the receiver has to process fewer packets and therefore makes the communication more efficient both ends.&lt;br /&gt;&lt;br /&gt;Most hardware handles data in buffers up to a fixed size for efficiency.&amp;nbsp; For a storage device this will typically be a 4KB block.&amp;nbsp; For networks this will be the MTU and is typically 1500 bytes for Ethernet.&amp;nbsp; When batching, it is best to understand the underlying hardware and write batches down in ideal buffer size to be optimally efficient.&amp;nbsp; However keep in mind that some devices need to envelope the data, e.g. the Ethernet and IP headers for network packets so the buffer needs to allow for this.&lt;br /&gt;&lt;br /&gt;There will always be an increased latency from a thread switch and the cost of exchange via the data structure.&amp;nbsp; However there are a number of very good non-blocking structures available using lock-free techniques.&amp;nbsp; For the &lt;a href="http://code.google.com/p/disruptor/"&gt;Disruptor&lt;/a&gt; this type of exchange can be achieved in as little as 50-100ns thus making the choice of taking the smart batching approach a no brainer for low-latency or high-throughput distributed systems. &lt;br /&gt;&lt;br /&gt;This technique can be employed for many problems and not just IO.&amp;nbsp; The core of the Disruptor uses this technique to help rebalance the system when the publishers burst and outpace the &lt;a href="http://code.google.com/p/disruptor/source/browse/trunk/code/src/main/com/lmax/disruptor/EventProcessor.java" style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;EventProcessor&lt;/a&gt;s.&amp;nbsp; The algorithm can be seen inside the &lt;a href="http://code.google.com/p/disruptor/source/browse/trunk/code/src/main/com/lmax/disruptor/BatchEventProcessor.java" style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;BatchEventProcessor&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Note:&lt;/b&gt; For this algorithm to work the queueing structure must handle the contention better than the underlying resource.&amp;nbsp; Many queue implementations are extremely poor at managing contention.&amp;nbsp; Use science and measure before coming to a conclusion.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Batching with the Disruptor&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The code below shows the same algorithm in action using the Disruptor's &lt;a href="http://code.google.com/p/disruptor/source/browse/trunk/code/src/main/com/lmax/disruptor/EventHandler.java"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;EventHandler&lt;/span&gt;&lt;/a&gt; mechanism.&amp;nbsp; In my experience, this is a very effective technique for handling any IO device efficiently and keeping latency low when dealing with load or burst traffic.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;public final class NetworkBatchHandler&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; implements EventHander&amp;lt;Message&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;{&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private final NetworkFacade network;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private final ByteBuffer buffer;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public NetworkBatchHandler(final NetworkFacade network,&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; final int maxPacketSize)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; this.network = network;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; buffer = ByteBuffer.allocate(maxPacketSize);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public void onEvent(Message msg, long sequence, boolean endOfBatch) &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; throws Exception&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (msg.size() &amp;gt; buffer.remaining())&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; sendBuffer();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; buffer.put(msg.getBytes());&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (endOfBatch)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; sendBuffer();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; } &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private void sendBuffer()&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; buffer.flip();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; network.send(buffer);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; buffer.clear();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The &lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;endOfBatch&lt;/span&gt; parameter greatly simplifies the handling of the batch compared to the double loop in the algorithm above.&lt;br /&gt;&lt;br /&gt;I have simplified the examples to illustrate the algorithm.&amp;nbsp; Clearly error handling and other edge conditions need to be considered.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Separation of IO from Work Processing&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;There is another very good reason to separate the IO from the threads doing the work processing.&amp;nbsp; Handing off the IO to another thread means the worker thread, or threads, can continue processing without blocking in a nice cache friendly manner. I've found this to be critical in achieving high-performance throughput.&lt;br /&gt;&lt;br /&gt;If the underlying IO device or resource becomes briefly saturated then the messages can be queued for the batcher thread allowing the work processing threads to continue.&amp;nbsp; The batching thread then feeds the messages to the IO device in the most efficient way possible allowing the data structure to handle the burst and if full apply the necessary back pressure, thus providing a good separation of concerns in the workflow.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Conclusion&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So there you have it.&amp;nbsp; Smart Batching can be employed in concert with the appropriate data structures to achieve consistent low-latency and maximum throughput.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5560209661389175529-2632924219660461165?l=mechanical-sympathy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mechanical-sympathy.blogspot.com/feeds/2632924219660461165/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/10/smart-batching.html#comment-form' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/2632924219660461165'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/2632924219660461165'/><link rel='alternate' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/10/smart-batching.html' title='Smart Batching'/><author><name>Martin Thompson</name><uri>http://www.blogger.com/profile/15893849163924476586</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-SSf5hpn4l2g/Tp7TjmYdUwI/AAAAAAAAABQ/bYgRlrSlt90/s72-c/batching.png' height='72' width='72'/><thr:total>5</thr:total><georss:featurename>London, UK</georss:featurename><georss:point>51.5001524 -0.12623619999999391</georss:point><georss:box>51.322796399999994 -0.39052969999999393 51.6775084 0.1380573000000061</georss:box></entry><entry><id>tag:blogger.com,1999:blog-5560209661389175529.post-5089162884965261703</id><published>2011-09-22T15:24:00.006+01:00</published><updated>2011-11-13T21:05:36.890Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Concurrent'/><category scheme='http://www.blogger.com/atom/ns#' term='Processor Affinity'/><category scheme='http://www.blogger.com/atom/ns#' term='Performance'/><title type='text'>Single Writer Principle</title><content type='html'>When trying to build a highly scalable system the single biggest limitation on scalability is having multiple writers contend for any item of data or resource.&amp;nbsp; Sure, algorithms can be bad, but let’s assume they have a reasonable &lt;a href="http://en.wikipedia.org/wiki/Big_O_notation"&gt;Big O notation&lt;/a&gt; so we'll focus on the scalability limitations of the systems design.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;I keep seeing people just accept having multiple writers as the norm.&amp;nbsp; There is a lot of research in computer science for managing this contention that boils down to 2 basic approaches.&amp;nbsp; One is to provide mutual exclusion to the contended resource while the mutation takes place; the other is to take an optimistic strategy and swap in the changes if the underlying resource has not changed while you created the new copy.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Mutual Exclusion &lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Mutual exclusion is the means by which only one writer can have access to a protected resource at a time, and is usually implemented with a locking strategy.&amp;nbsp; Locking strategies require an arbitrator, usually the operating system kernel, to get involved when the contention occurs to decide who gains access and in what order.&amp;nbsp; This can be a very expensive process often requiring many more CPU cycles than the actual transaction to be applied to the business logic would use.&amp;nbsp; Those waiting to enter the &lt;a href="http://en.wikipedia.org/wiki/Critical_section"&gt;critical section&lt;/a&gt;, in advance of performing the mutation must queue, and this queuing effect (&lt;a href="http://en.wikipedia.org/wiki/Little%27s_law"&gt;Little's Law&lt;/a&gt;) causes latency to become unpredictable and ultimately restricts throughput.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;span style="font-size: large;"&gt;Optimistic Concurrency Control&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Optimistic strategies involve taking a copy of the data, modifying it, then copying back the changes if data has not mutated in the meantime.&amp;nbsp; If a change has happened in the meantime you repeat the process until successful.&amp;nbsp; This repeating of the process increases with contention and therefore causes a queuing effect just like with mutual exclusion.&amp;nbsp; If you work with a source code control system, such as Subversion or CVS, then you are using this algorithm every day.&amp;nbsp; Optimistic strategies can work with data but do not work so well with resources such as hardware because you cannot take a copy of the hardware!&amp;nbsp; The ability to perform the changes atomically to data is made possible by &lt;a href="http://en.wikipedia.org/wiki/Compare-and-swap"&gt;CAS&lt;/a&gt; instructions offered by the hardware.&lt;br /&gt;&lt;br /&gt;Most locking strategies are composed from optimistic strategies for changing the lock state or mutual exclusion primitive.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Managing Contention vs. Doing Real Work&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;CPUs can typically process one or more instructions per cycle.&amp;nbsp; For example, modern Intel CPU cores each have 6 execution units that can be doing a combination of arithmetic, branch logic, word manipulation and memory loads/stores in parallel.&amp;nbsp; If while doing work the CPU core incurs a cache miss, and has to go to main memory, it will stall for hundreds of cycles until the result of that memory request returns.&amp;nbsp; To try and improve things the CPU will make some speculative guesses as to what a memory request will return to continue processing.&amp;nbsp; If a second miss occurs the CPU will no longer speculate and simply wait for the memory request to return because it cannot typically keep the state for speculative execution beyond 2 cache misses.&amp;nbsp; Managing cache misses is the single largest limitation to scaling the performance of our current generation of CPUs.&lt;br /&gt;&lt;br /&gt;Now what does this have to do with managing contention?&amp;nbsp; Well if two or more threads are using locks to provide mutual exclusion, at best they will be going to the L3 cache, or over a socket interconnect, to access share state of the lock using CAS operations.&amp;nbsp; These lock/CAS instructions cost ~100+ cycles each plus they cause out-of-order execution for the CPU to be suspended and load/store buffers to be flushed.&amp;nbsp; At worst, collisions occur and the kernel will need to get involved and put one or more of the threads to sleep until the lock is released.&amp;nbsp; This rescheduling of the blocked thread will result in cache pollution.&amp;nbsp; The situation can be even worse when the thread is re-scheduled on another core with a cold cache resulting in many cache misses.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;For highly contended data it is very easy to get into a situation whereby the system spends significantly more time managing contention than doing real work.&amp;nbsp; The table below gives an idea of basic costs for managing contention when the program state is very small and easy to reload from the L2/L3 cache, never mind main memory.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;&lt;div align="center"&gt;&lt;table border="1" cellpadding="5"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;th&gt;Method&lt;/th&gt;&lt;th&gt;Time (ms)&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;One Thread&lt;/td&gt;&lt;td align="right"&gt;300&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;One Thread with Memory Barrier&lt;/td&gt;&lt;td align="right"&gt;4,700&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;One Thread with CAS&lt;/td&gt;&lt;td align="right"&gt;5,700&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Two Threads with CAS&lt;/td&gt;&lt;td align="right"&gt;18,000&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;One Thread with Lock&lt;/td&gt;&lt;td align="right"&gt;10,000&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Two Threads with Lock&lt;/td&gt;&lt;td align="right"&gt;118,000&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;br /&gt;This table illustrates the costs of incrementing a 64-bit counter 500 million times using a variety of techniques on a 2.4Ghz Westmere processor.&amp;nbsp;&amp;nbsp; I can hear people coming back with “but this is a trivial example and real-world applications are not that contended”.&amp;nbsp; This is true but remember real-world applications have way more state, and what do you think happens to all that state which is warm in cache when the context switch occurs???&amp;nbsp; By measuring the basic cost of contention it is possible to extrapolate the scalability limits of a system which has contention points.&amp;nbsp; As multi-core becomes ever more significant another approach is required.&amp;nbsp; My last &lt;a href="http://mechanical-sympathy.blogspot.com/2011/09/adventures-with-atomiclong.html"&gt;post&lt;/a&gt; illustrates the micro level effects of CAS operations on modern CPUs, whereby Sandybridge can be worse for CAS and locks.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Single Writer Designs&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Now, what if you could design a system whereby any item of data, or resource, is only mutated by a single writer/thread?&amp;nbsp; It is actually easier than you think in my experience.&amp;nbsp; It is OK if multiple threads, or other execution contexts, read the same data.&amp;nbsp; CPUs can broadcast read only copies of data to other cores via the cache coherency sub-system.&amp;nbsp; This has a cost but it scales very well.&lt;br /&gt;&lt;br /&gt;If you have a system that can honour this single writer principle then each execution context can spend all its time and resources processing the logic for its purpose, and not be wasting cycles and resource on dealing with the contention problem.&amp;nbsp; You can also scale up without limitation until the hardware is saturated.&amp;nbsp; There is also a really nice benefit in that when working on architectures, such as x86/x64, where at a hardware level they have a &lt;a href="http://en.wikipedia.org/wiki/Memory_model_%28computing%29"&gt;memory model&lt;/a&gt;, whereby load/store memory operations have preserved order, thus &lt;a href="http://mechanical-sympathy.blogspot.com/2011/07/memory-barriersfences.html"&gt;memory barriers&lt;/a&gt; are not required if you adhere strictly to the single writer principle.&amp;nbsp; On x86/x64 "loads can be re-ordered with older stores" according to the memory model so memory barriers are required when multiple threads mutate the same data across cores.&amp;nbsp; The single writer principle avoids this issue because it never has to deal with writing the latest version of a data item that may have been written by another thread and currently in the store buffer of another core.&lt;br /&gt;&lt;br /&gt;So how can we drive towards single writer designs?&amp;nbsp; I’ve found it is a very natural thing.&amp;nbsp; Consider how humans, or any other autonomous creatures of nature, operate with their model of the world.&amp;nbsp; We all have our own model of the world contained in our own heads, i.e. We have a copy of the world state for our own use.&amp;nbsp; We mutate the state in our heads based on inputs (events/messages) we receive via our senses.&amp;nbsp; As we process these inputs and apply them to our model we may take action that produces outputs, which others can take as their own inputs.&amp;nbsp; None of us reach directly into each other’s heads and mess with the neurons.&amp;nbsp; If we did this it would be a serious breach of encapsulation!&amp;nbsp; Originally, Object Oriented (OO) design was all about message passing, and somehow along the way we bastardised the message passing to be method calls and even allowed direct field manipulation – Yuk!&amp;nbsp; Who's bright idea was it to allow public access to fields of an object?&amp;nbsp; You deserve your own special hell.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;At university I studied &lt;a href="http://en.wikipedia.org/wiki/Transputer"&gt;transputers&lt;/a&gt; and interesting languages like &lt;a href="http://en.wikipedia.org/wiki/Occam_%28programming_language%29"&gt;Occam&lt;/a&gt;.&amp;nbsp; I thought very elegant designs appeared by having the nodes collaborate via message passing rather than mutating shared state.&amp;nbsp; I’m sure some of this has inspired the &lt;a href="http://code.google.com/p/disruptor/"&gt;Disruptor&lt;/a&gt;.&amp;nbsp; My experience with the Disruptor has shown that is it possible to build systems with one or more orders of magnitude better throughput than locking or contended state based approaches.&amp;nbsp; It also gives much more predictable latency that stays constant until the hardware is saturated rather than the traditional J-curve latency profile.&lt;br /&gt;&lt;br /&gt;It is interesting to see the emergence of numerous approaches that lend themselves to single writer solutions such as Node.js, Erlang, Actor patterns, and SEDA to name a few.&amp;nbsp; Unfortunately most use queue based implementations underneath, which breaks the single writer principle, whereas the Disruptor strives to separate the concerns so that the single writer principle can be preserved for the common cases.&lt;br /&gt;&lt;br /&gt;Now I’m not saying locks and optimistic strategies are bad and should not be used.&amp;nbsp; They are excellent for many problems.&amp;nbsp; For example, bootstrapping a concurrent system or making major state stages in configuration or reference data.&amp;nbsp; However if the main flow of transactions act on contended data, and locks or optimistic strategies have to be employed, then the scalability is fundamentally limited.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;The Principle at Scale&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This principle works at all levels of scale.&amp;nbsp; &lt;a href="http://en.wikipedia.org/wiki/Benoit_Mandelbrot"&gt;Mandelbrot&lt;/a&gt; got this so right.&amp;nbsp; CPU cores are just nodes of execution and the cache system provides message passing for communication.&amp;nbsp; The same patterns apply if the processing node is a server and the communication system is a local network.&amp;nbsp; If a service, in &lt;a href="http://en.wikipedia.org/wiki/Service-oriented_architecture"&gt;SOA&lt;/a&gt; architecture parlance, is the only service that can write to its data store it can be made to scale and perform much better.&amp;nbsp; Let’s say that underlying data is stored in a database and other services can go directly to that data, without sending a message to the service that owns the data, then the data is contended and requires the database to manage the contention and coherence of that data.&amp;nbsp; This prevents the service from caching copies of the data for faster response to the clients and restricts how the data can be sharded.&amp;nbsp; Encapsulation has just been broken at a more macro level when multiple different services write to the same data store.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Summary&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;If a system is decomposed into components that keep their own relevant state model, without a central shared model, and all communication is achieved via message passing then you have a system without contention naturally.&amp;nbsp; This type of system obeys the single writer principle if the messaging passing sub-system is not implemented as queues.&amp;nbsp; If you cannot move straight to a model like this, but are finding scalability issues related to contention, then start by asking the question, “How do I change this code to preserve the &lt;i&gt;Single Writer Principle&lt;/i&gt; and thus avoid the contention?”&lt;br /&gt;&lt;br /&gt;The &lt;i&gt;Single Writer Principle&lt;/i&gt; is that for any item of data, or resource, that item of data should be owned by a single execution context for all mutations.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5560209661389175529-5089162884965261703?l=mechanical-sympathy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mechanical-sympathy.blogspot.com/feeds/5089162884965261703/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/09/single-writer-principle.html#comment-form' title='13 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/5089162884965261703'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/5089162884965261703'/><link rel='alternate' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/09/single-writer-principle.html' title='Single Writer Principle'/><author><name>Martin Thompson</name><uri>http://www.blogger.com/profile/15893849163924476586</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>13</thr:total><georss:featurename>London, UK</georss:featurename><georss:point>51.5001524 -0.12623619999999391</georss:point><georss:box>51.322796399999994 -0.39052969999999393 51.6775084 0.1380573000000061</georss:box></entry><entry><id>tag:blogger.com,1999:blog-5560209661389175529.post-8984907638235554295</id><published>2011-09-11T12:46:00.008+01:00</published><updated>2011-09-20T08:51:02.925+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='Concurrent'/><category scheme='http://www.blogger.com/atom/ns#' term='C++'/><category scheme='http://www.blogger.com/atom/ns#' term='Atomic'/><category scheme='http://www.blogger.com/atom/ns#' term='Performance'/><title type='text'>Adventures with AtomicLong</title><content type='html'>Sequencing events between threads is a common operation for many multi-threaded algorithms.&amp;nbsp; These sequences could be used for assigning identity to orders, trades, transactions, messages, events, etc.&amp;nbsp; Within the &lt;a href="http://code.google.com/p/disruptor/"&gt;Disruptor &lt;/a&gt;we use a monotonic sequence for all events which is implemented as &lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;AtomicLong&lt;/span&gt; &lt;a href="http://download.oracle.com/javase/6/docs/api/java/util/concurrent/atomic/AtomicLong.html#incrementAndGet%28%29" style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;incrementAndGet&lt;/a&gt; for the &lt;a href="http://code.google.com/p/disruptor/source/browse/trunk/code/src/main/com/lmax/disruptor/ClaimStrategy.java"&gt;multi-threaded publishing&lt;/a&gt; scenario.&lt;br /&gt;&lt;br /&gt;While working on the latest version of the Disruptor I made some changes which I was convinced would improve performance, however the results surprised me.&amp;nbsp; I had removed some potentially megamorphic method calls and the performance got worse rather than better.&amp;nbsp; After a lot of investigation, I discovered that the megamorphic method calls were hiding a performance issue with the latest Intel &lt;a href="http://en.wikipedia.org/wiki/Sandy_Bridge"&gt;Sandybridge&lt;/a&gt; processors.&amp;nbsp; With the megamorphic calls out of the way, the contention on the atomic sequence generation increased exposing the issue.&amp;nbsp; I've also observed this performance issue with other Java concurrent structures such as &lt;a href="http://download.oracle.com/javase/6/docs/api/java/util/concurrent/ArrayBlockingQueue.html"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;ArrayBlockingQueue&lt;/span&gt;&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I’ve been running various benchmarks on Sandybridge and have so far been impressed with performance improvements over &lt;a href="http://en.wikipedia.org/wiki/Nehalem_%28microarchitecture%29"&gt;Nehalem&lt;/a&gt;, especially for memory intensive applications due to the changes in its front-end.&amp;nbsp; However with this sequencing benchmark, I discovered that Sandybridge has taken a major step backward in performance with regard to atomic instructions.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://en.wikipedia.org/wiki/Atomic_instruction"&gt;Atomic instructions&lt;/a&gt; enable read-modify-write actions to be combined into an atomic operation.&amp;nbsp; A good example is incrementing a counter.&amp;nbsp; To complete the increment operation a thread must read the current value, increment it, and then write back the results.&amp;nbsp; In a multi-threaded environment these distinct operations could interleave with other threads doing the same with corrupt results as a consequence.&amp;nbsp; The normal way to avoid this interleaving is to take out a lock for mutual exclusion while performing the steps.&amp;nbsp; Locks are very expensive and often require kernel arbitration between threads.&amp;nbsp; Modern CPUs provide a number of atomic instructions which allow operations such as atomically incrementing a counter, or the ability to conditional set a pointer reference if the value is still as expected.&amp;nbsp; These operations are commonly referred to as &lt;a href="http://en.wikipedia.org/wiki/Compare-and-swap"&gt;CAS&lt;/a&gt; (Compare And Swap) instructions.&amp;nbsp; A good way to think of these CAS instructions is like optimistic locks, similar to what you experience when using a version control system like Subversion or CVS.&amp;nbsp; You try to make a change and if the version is what you expect then you succeed, otherwise the action aborts.&lt;br /&gt;&lt;br /&gt;On x86/x64 these instructions are known as “lock” instructions.&amp;nbsp; The "lock" name comes from how a processor, after setting its lock signal, would lock the front-side/memory bus (&lt;a href="http://en.wikipedia.org/wiki/Front-side_bus"&gt;FSB&lt;/a&gt;) for serialising memory access while the three steps of the operation took place atomically.&amp;nbsp; On more recent processors the lock instruction is simply implemented by getting an exclusive lock on the cache-line for modification.&lt;br /&gt;&lt;br /&gt;These instructions are the basic building blocks used for implementing higher-level locks and semaphores.&amp;nbsp; This is, as will be explained shorty, why I've seen performing issues on Sandybridge for &lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;ArrayBlockingQueue &lt;/span&gt;in some of the Disruptor comparative &lt;a href="http://code.google.com/p/disruptor/source/browse/#svn%2Ftrunk%2Fcode%2Fsrc%2Fperf%2Fcom%2Flmax%2Fdisruptor"&gt;performance tests&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Back to my benchmark.&amp;nbsp; The test was spending significantly more time in AtomicLong.incrementAndGet() than I had previously observed.&amp;nbsp; Initially, I suspected an issue with JDK 1.6.0_27 which I had just installed.&amp;nbsp; I ran the following test with various JVMs, including 1.7.0, and kept getting the same results.&amp;nbsp; I then booted different operating systems (Ubuntu, Fedora, Windows 7 - all 64-bit), again the same results.&amp;nbsp; This lead me to write an isolated test which I ran on Nehalem (2.8 GHz  Core i7 860) and Sandybridge (2.2Ghz Core i7-2720QM).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;import java.util.concurrent.atomic.AtomicLong;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;public final class TestAtomicIncrement&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; implements Runnable&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;{&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static final long COUNT = 500L * 1000L * 1000L;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static final AtomicLong counter = new AtomicLong(0L);&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static void main(final String[] args) throws Exception&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; final int numThreads = Integer.parseInt(args[0]);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; final long start = System.nanoTime();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; runTest(numThreads);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; System.out.println("duration = " + (System.nanoTime() - start));&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; System.out.println("counter = " + counter);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private static void runTest(final int numThreads)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; throws InterruptedException&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Thread[] threads = new Thread[numThreads];&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (int i = 0; i &amp;lt; threads.length; i++)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; threads[i] = new Thread(new TestAtomicIncrement());&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (Thread t : threads)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; t.start();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (Thread t : threads)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; t.join();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public void run()&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; long i = 0L;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (i &amp;lt; COUNT)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; i = counter.incrementAndGet();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-rZGHz5ICYoU/Tm4mtGfE4gI/AAAAAAAAABA/lnLZFkzQgpc/s1600/JavaCAS.png" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/-rZGHz5ICYoU/Tm4mtGfE4gI/AAAAAAAAABA/lnLZFkzQgpc/s1600/JavaCAS.png" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Figure 1.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;After running this test on 4 different Sandybridge processors with a range of clock speeds, I concluded that using LOCK CMPXCHG, under contention with increasing numbers of threads, is much less scalable than the previous Nehalem generation of processors.&amp;nbsp; Figure 1. above charts the results in nanoseconds duration to complete 500 million increments of a counter with increasing thread count.&amp;nbsp; Less is better.&lt;br /&gt;&lt;br /&gt;I confirmed the JVM was generating the correct instructions for the CAS loop by getting Hotspot to print the assembler it generated.&amp;nbsp; I also confirmed that Hotspot generated identical assembler instructions for both Nehalem and Sandybridge.&lt;br /&gt;&lt;br /&gt;I then decided to investigate further and write the following C++ program to test the relevant lock instructions to compare Nehalem and Sandybridge.&amp;nbsp; I know from using “&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;objdump -d&lt;/span&gt;” on the binary that the &lt;a href="http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html"&gt;GNU Atomic Builtins&lt;/a&gt; generate the lock instructions for ADD, XADD, and CMPXCHG, for the respectively named functions below.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;#include &amp;lt;time.h&amp;gt;&lt;br /&gt;#include &amp;lt;pthread.h&amp;gt;&lt;br /&gt;#include &amp;lt;stdlib.h&amp;gt;&lt;br /&gt;#include &amp;lt;iostream&amp;gt;&lt;br /&gt;&lt;br /&gt;typedef unsigned long long uint64;&lt;br /&gt;const uint64 COUNT = 500LL * 1000LL * 1000LL;&lt;br /&gt;volatile uint64 counter = 0;&lt;br /&gt;&lt;br /&gt;void* run_add(void* numThreads)&lt;br /&gt;{&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; register uint64 value = (COUNT / *((int*)numThreads)) + 1;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (--value != 0)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; __sync_add_and_fetch(&amp;amp;counter, 1);&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;void* run_xadd(void*)&lt;br /&gt;{&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; register uint64 value = counter;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (value &amp;lt; COUNT)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; value = __sync_add_and_fetch(&amp;amp;counter, 1);&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;void* run_cas(void*)&lt;br /&gt;{&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; register uint64 value = 0;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (value &amp;lt; COUNT)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; do&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; value = counter;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (!__sync_bool_compare_and_swap(&amp;amp;counter, value, value + 1));&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;int main (int argc, char* argv[])&lt;br /&gt;{&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; const int NUM_THREADS = atoi(argv[1]);&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; pthread_t threads[NUM_THREADS];&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; void* status;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; timespec ts_start;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; timespec ts_finish;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; clock_gettime(CLOCK_MONOTONIC, &amp;amp;ts_start);&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (int i = 0; i &amp;lt; NUM_THREADS; i++)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; pthread_create(&amp;amp;threads[i], NULL, run_add, (void*)&amp;amp;NUM_THREADS);&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (int i = 0; i &amp;lt; NUM_THREADS; i++)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; pthread_join(threads[i], &amp;amp;status);&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; clock_gettime(CLOCK_MONOTONIC, &amp;amp;ts_finish);&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; uint64 start = (ts_start.tv_sec * 1000000000LL) + ts_start.tv_nsec;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; uint64 finish = (ts_finish.tv_sec * 1000000000LL) + ts_finish.tv_nsec;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; uint64 duration = finish - start;&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; std::cout &amp;lt;&amp;lt; "threads = "&amp;nbsp; &amp;lt;&amp;lt; NUM_THREADS &amp;lt;&amp;lt; std::endl;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; std::cout &amp;lt;&amp;lt; "duration = " &amp;lt;&amp;lt; duration &amp;lt;&amp;lt; std::endl;&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; std::cout &amp;lt;&amp;lt; "counter = "&amp;nbsp; &amp;lt;&amp;lt; counter &amp;lt;&amp;lt; std::endl;&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; return 0;&lt;br /&gt;}&lt;/div&gt;&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-wVF9DuFkrzk/Tm4nQC-ZNEI/AAAAAAAAABE/MQ6jfEoMc3s/s1600/atomics.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/-wVF9DuFkrzk/Tm4nQC-ZNEI/AAAAAAAAABE/MQ6jfEoMc3s/s1600/atomics.png" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Figure 2.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;It is clear from Figure 2. that Nehalem performs nearly an order of magnitude better for atomic operations as contention increases with threads.&amp;nbsp; I found LOCK ADD and LOCK XADD to be similar so I've only charted XADD for clarity.&amp;nbsp; The CAS operations for C++ and Java are comparable.&lt;br /&gt;&lt;br /&gt;It is also very interesting how XADD greatly outperforms CAS and gives a nice scalable profile.&amp;nbsp; For 3 threads and above, XADD does not degrade further and simply performs at the rate at which the processor can keep the caches coherent.&amp;nbsp; Nehalem and Sandybridge level out respectively at ~100m and ~20m XADD operations per second for 3+ concurrent threads, whereas CAS continues to degrade with increasing thread count because of contention.&amp;nbsp; Naturally, performance degrades when &lt;a href="http://en.wikipedia.org/wiki/Intel_QuickPath_Interconnect"&gt;QPI&lt;/a&gt; links are involved for a multi-socket scenario.&amp;nbsp; Oracle have now accepted that not supporting XADD is a &lt;a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7023898"&gt;bug&lt;/a&gt; and will hopefully fix it soon for the JVM.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;As to the performance I’ve observed with Sandybridge, it would be great if others could confirm my findings so we can all feedback to Intel and have this addressed.&amp;nbsp; I've not been able to get my hands on a server class system with Sandybridge.&amp;nbsp; I can confirm that for the "tick" to Westmere, the performance is similar to Nehalem and not an issue.&amp;nbsp; The "tock" to Sandybridge seems to introduce the issue.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5560209661389175529-8984907638235554295?l=mechanical-sympathy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mechanical-sympathy.blogspot.com/feeds/8984907638235554295/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/09/adventures-with-atomiclong.html#comment-form' title='25 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/8984907638235554295'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/8984907638235554295'/><link rel='alternate' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/09/adventures-with-atomiclong.html' title='Adventures with AtomicLong'/><author><name>Martin Thompson</name><uri>http://www.blogger.com/profile/15893849163924476586</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-rZGHz5ICYoU/Tm4mtGfE4gI/AAAAAAAAABA/lnLZFkzQgpc/s72-c/JavaCAS.png' height='72' width='72'/><thr:total>25</thr:total><georss:featurename>London, UK</georss:featurename><georss:point>51.5001524 -0.12623619999999391</georss:point><georss:box>51.322796399999994 -0.39052969999999393 51.6775084 0.1380573000000061</georss:box></entry><entry><id>tag:blogger.com,1999:blog-5560209661389175529.post-2310721568957726896</id><published>2011-09-02T19:23:00.004+01:00</published><updated>2011-09-03T18:29:25.218+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='DDD'/><category scheme='http://www.blogger.com/atom/ns#' term='Modelling'/><category scheme='http://www.blogger.com/atom/ns#' term='Performance'/><title type='text'>Modelling Is Everything</title><content type='html'>I’m often asked, “What is the best way to learn about building high-performance systems”?&amp;nbsp; There are many perfectly valid answers to this question but there is one thing that stands out for me above everything else, and that is modelling.&amp;nbsp; Modelling what you need to implement is the most important and effective step in the process.&amp;nbsp; I’d go further and say this principle applies to any development and the rest is just typing :-)&lt;br /&gt;&lt;br /&gt;&lt;a href="http://en.wikipedia.org/wiki/Domain-driven_design"&gt;Domain Driven Design&lt;/a&gt; (DDD) advocates modelling the domain and expressing this model in code as fundamental to the successful delivery and ongoing maintenance of software.&amp;nbsp; I wholeheartedly agree with this.&amp;nbsp;&amp;nbsp; How often do we see code that is an approximation of the problem domain?&amp;nbsp; Code that exhibits behaviour which approximates to what is required via inappropriate abstractions and mappings which just about cope.&amp;nbsp; Those mappings between what is in the code and the real domain are only contained in the developers’ heads and this is just not good enough.&lt;br /&gt;&lt;br /&gt;When requiring high-performance, code for parts of the system often have to model what is happening with the CPU, memory, storage sub-systems, or network sub-systems.&amp;nbsp; When we have imperfect abstractions on top of these domains, performance can be very adversely affected.&amp;nbsp; The goal of my “&lt;a href="http://mechanical-sympathy.blogspot.com/"&gt;Mechanical Sympathy&lt;/a&gt;” blog is to peek at what is under the hood so we can improve our abstractions.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;What is a Model?&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;A model does not need to be the result of a 3-year exercise producing UML.&amp;nbsp; It can be, and often is best as, people communicating via various means including speech, drawings, illustrations, metaphors, analogies, etc, to build a mental model for shared understanding.&amp;nbsp;&amp;nbsp; If an accurate and distilled understanding can be reached then this model can be turned into code with great results.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Infrastructure Domain Models&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;If developers writing a concurrent framework do not have a good model of how a typical cache sub-system works, i.e. it uses message passing to exchange cache lines, then the framework is unlikely to perform well or be correct.&amp;nbsp; If their code drives the cache sub-system with mechanical sympathy and understanding, it is less likely to have bugs and more likely to perform well.&lt;br /&gt;&lt;br /&gt;It is much easier to predict performance from a sound model when coming from an understanding of the infrastructure for the underlying platform and its published abilities.&amp;nbsp; For example, if you know how many packets per second a network sub-system can handle, and the size of its transfer unit, then it is easy to extrapolate expected bandwidth.&amp;nbsp; With this model based understanding we can test our code for expectations with confidence.&lt;br /&gt;&lt;br /&gt;I’ve fixed many performance issues whereby a framework treated a storage sub-system as stream-based when it is really a block-based model.&amp;nbsp; If you update part of a file on disk, the block to be updated must be read, the changes applied, and the results written back.&amp;nbsp; Now if you know the system is block based and the boundaries&amp;nbsp; of the blocks, you can write whole blocks back without incurring the read, modify, write back cycle replacing these actions with a single write.&amp;nbsp; This&amp;nbsp; applies even when appending to a file as the last block is likely to have been partially written previously.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Business Domain Models&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The same thinking should be applied to the models we construct for the business domain.&amp;nbsp; If a business process is modelled accurately, then the software will not surprise its end users.&amp;nbsp; When we draw up a model it is important to describe the relationships for cardinality and the characteristics by which they will be traversed.&amp;nbsp; This understanding will guide the selection of data structures to those best suited for implementing the relationships.&amp;nbsp; I often see people use a list for a relationship which is mostly searched by key, for this case a map could be more appropriate.&amp;nbsp; Are the entities at the other end of a relationship ordered?&amp;nbsp; A tree or skiplist implementation may then be a better option.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Identity&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Identity of entities in a model is so important.&amp;nbsp; All models have to be entered in some way, and this normally starts with an entity from which to walk.&amp;nbsp; That entity could be “Customer” by customer ID but could equally be “DiskBlock” by filename and offset in an infrastructure domain.&amp;nbsp; The identity of each entity in the system needs to be clear so the model can be accessed efficiently.&amp;nbsp; If for each interaction with a model we waste precious cycles trying to find our entity as a starting point, then other optimisations can become almost irrelevant.&amp;nbsp; Make identity explicit in your model and, if necessary, index entities by their identity so you can efficiently enter the model for each interaction.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Refine as we learn&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;It is also important to keep refining a model as we learn.&amp;nbsp; If the model grows as a series of extensions without refining and distilling, then we end up with a spaghetti mess that is very difficult to manage when trying to achieve predictable performance.&amp;nbsp; Never mind how difficult it is to maintain and support.&amp;nbsp; Everyday we learn new things.&amp;nbsp; Reflect this in the model and keep it up to date.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Implement no more, but also no less, than what is needed!&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The fastest code is code that just does what is needed and no more.&amp;nbsp; Perform the instructions to complete the task and no more.&amp;nbsp; Really fast code is normally not a weird mess of bit-shifting and complier tricks.&amp;nbsp; It is best to start with something clean and elegant.&amp;nbsp; Then measure to see if you are within performance targets.&amp;nbsp; So often this will be sufficient.&amp;nbsp; Sometimes performance will be a surprise.&amp;nbsp; You then need to apply science to test and measure before jumping to conclusions.&amp;nbsp; A profiler will often tell you where the time is being taken.&amp;nbsp; Once the basic modelling mistakes and assumptions have been corrected, it usually takes just a little &lt;a href="http://mechanical-sympathy.blogspot.com/2011/07/why-mechanical-sympathy.html"&gt;mechanical sympathy&lt;/a&gt; to reach the performance goal.&amp;nbsp; Unused code is waste.&amp;nbsp; Try not to create it.&amp;nbsp; If you happen to create some, then remove it from your codebase as soon as you notice it.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Conclusion&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;When non-functional requirements, such as performance and availability, are critical to success, I’ve found the most important thing is to get the model correct for the domain at all levels.&amp;nbsp;&amp;nbsp; That is, take the principles of DDD and make sure your code is an appropriate reflection of each domain.&amp;nbsp; Be that the domain of business applications, or the domain of interactions with infrastructure, I’ve found modelling is everything.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5560209661389175529-2310721568957726896?l=mechanical-sympathy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mechanical-sympathy.blogspot.com/feeds/2310721568957726896/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/09/modelling-is-everything.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/2310721568957726896'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/2310721568957726896'/><link rel='alternate' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/09/modelling-is-everything.html' title='Modelling Is Everything'/><author><name>Martin Thompson</name><uri>http://www.blogger.com/profile/15893849163924476586</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total><georss:featurename>London, UK</georss:featurename><georss:point>51.5001524 -0.12623619999999391</georss:point><georss:box>51.322796399999994 -0.39052969999999393 51.6775084 0.1380573000000061</georss:box></entry><entry><id>tag:blogger.com,1999:blog-5560209661389175529.post-6782173680429554585</id><published>2011-08-27T09:49:00.021+01:00</published><updated>2011-09-16T15:10:01.396+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='Concurrent'/><category scheme='http://www.blogger.com/atom/ns#' term='Disruptor'/><category scheme='http://www.blogger.com/atom/ns#' term='Performance'/><title type='text'>Disruptor 2.0 Released</title><content type='html'>Significantly improved performance and a cleaner API are the key takeaways for the &lt;a href="http://code.google.com/p/disruptor/"&gt;Disruptor 2.0&lt;/a&gt; concurrent programming framework for Java.&amp;nbsp; This release is the result of all the great feedback we have received from the community.&amp;nbsp; Feedback is very welcome and really improves the end product so please keep it coming.&lt;br /&gt;&lt;br /&gt;You can find the Disruptor project &lt;a href="http://code.google.com/p/disruptor/"&gt;here&lt;/a&gt;, plus we have a &lt;a href="http://code.google.com/p/disruptor/w/list"&gt;wiki&lt;/a&gt; with links to detailed blogs describing how things work.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Naming &amp;amp; API&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Over the lifetime of the Disruptor naming has been a challenge.&amp;nbsp; The funny thing is that with the 2.0 release we have come almost full circle.&amp;nbsp; Originally we considered the &lt;a href="http://code.google.com/p/disruptor/"&gt;Disruptor&lt;/a&gt; as an event processing framework that often got used as a queue replacement.&amp;nbsp; To make it understandable to queue users we adopted the nomenclature of producers and consumers.&amp;nbsp; However the consumers are not true consumers.&amp;nbsp; With this release the consensus is to return to the event processing roots and adopt the following naming changes.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Producer -&amp;gt; Publisher&lt;/b&gt;&lt;br /&gt;Events are claimed in strict sequence and published to the &lt;a href="http://code.google.com/p/disruptor/source/browse/trunk/code/src/main/com/lmax/disruptor/RingBuffer.java"&gt;&lt;span style="font-family: 'Courier New',Courier,monospace;"&gt;RingBuffer&lt;/span&gt;&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Entry -&amp;gt; Event&lt;/b&gt;&lt;br /&gt;Events represent the currency of data exchange through the dependency graph of &lt;a href="http://code.google.com/p/disruptor/source/browse/trunk/code/src/main/com/lmax/disruptor/EventProcessor.java"&gt;&lt;span style="font-family: 'Courier New',Courier,monospace;"&gt;EventProcessor&lt;/span&gt;&lt;/a&gt;s.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Consumer -&amp;gt; EventProcessor&lt;/b&gt;&lt;br /&gt;Events are processed by &lt;a href="http://code.google.com/p/disruptor/source/browse/trunk/code/src/main/com/lmax/disruptor/EventProcessor.java"&gt;&lt;span style="font-family: 'Courier New',Courier,monospace;"&gt;EventProcessor&lt;/span&gt;&lt;/a&gt;s.&amp;nbsp; The processing of an event can be read only, but can also involve mutations on which other &lt;a href="http://code.google.com/p/disruptor/source/browse/trunk/code/src/main/com/lmax/disruptor/EventProcessor.java"&gt;&lt;span style="font-family: 'Courier New',Courier,monospace;"&gt;EventProcessor&lt;/span&gt;&lt;/a&gt;s depend.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;ConsumerBarrier -&amp;gt; DependencyBarrier&lt;/b&gt;&lt;br /&gt;Complex graphs of dependent &lt;a href="http://code.google.com/p/disruptor/source/browse/trunk/code/src/main/com/lmax/disruptor/EventProcessor.java"&gt;&lt;span style="font-family: 'Courier New',Courier,monospace;"&gt;EventProcessor&lt;/span&gt;&lt;/a&gt;s can be constructed for the processing of an Event.&amp;nbsp; The &lt;a href="http://code.google.com/p/disruptor/source/browse/trunk/code/src/main/com/lmax/disruptor/DependencyBarrier.java"&gt;&lt;span style="font-family: 'Courier New',Courier,monospace;"&gt;DependencyBarrier&lt;/span&gt;&lt;/a&gt;s are assembled to represent the dependency graph.&amp;nbsp; This topic is the real value of the Disruptor and often misunderstood.&amp;nbsp; A fun example can be seen playing &lt;a href="http://code.google.com/p/disruptor/source/browse/trunk/code/src/perf/com/lmax/disruptor/OnePublisherToThreeProcessorDiamondThroughputTest.java"&gt;FizzBuzz&lt;/a&gt; in our &lt;a href="http://code.google.com/p/disruptor/source/browse/#svn%2Ftrunk%2Fcode%2Fsrc%2Fperf%2Fcom%2Flmax%2Fdisruptor"&gt;performance tests&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The &lt;span style="font-family: 'Courier New',Courier,monospace;"&gt;ProducerBarrier&lt;/span&gt; was always a one-to-one relationship with the &lt;a href="http://code.google.com/p/disruptor/source/browse/trunk/code/src/main/com/lmax/disruptor/RingBuffer.java"&gt;&lt;span style="font-family: 'Courier New',Courier,monospace;"&gt;RingBuffer&lt;/span&gt;&lt;/a&gt; so for ease of use its behaviour has been merged into the &lt;a href="http://code.google.com/p/disruptor/source/browse/trunk/code/src/main/com/lmax/disruptor/RingBuffer.java"&gt;&lt;span style="font-family: 'Courier New',Courier,monospace;"&gt;RingBuffer&lt;/span&gt;&lt;/a&gt;.&amp;nbsp; This allows direct publishing into the &lt;a href="http://code.google.com/p/disruptor/source/browse/trunk/code/src/main/com/lmax/disruptor/RingBuffer.java"&gt;&lt;span style="font-family: 'Courier New',Courier,monospace;"&gt;RingBuffer&lt;/span&gt;&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;DSL Wizard&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The most complex part of using the Disruptor is the setting up of the dependency graph of &lt;a href="http://code.google.com/p/disruptor/source/browse/trunk/code/src/main/com/lmax/disruptor/EventProcessor.java"&gt;&lt;span style="font-family: 'Courier New',Courier,monospace;"&gt;EventProcessor&lt;/span&gt;&lt;/a&gt;s.&amp;nbsp;&amp;nbsp; To simplify this for the most common cases we have integrated the &lt;a href="http://www.symphonious.net/2011/07/11/lmax-disruptor-high-performance-low-latency-and-simple-too/"&gt;DisruptorWizard&lt;/a&gt; project which provides a &lt;a href="http://en.wikipedia.org/wiki/Domain-specific_language"&gt;DSL&lt;/a&gt; as a fluent API for assembling the graph and assigning threads.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Performance&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Significant performance tuning effort has gone into this release.&amp;nbsp; This effort has resulted in a ~2-3X improvement in throughput depending on CPU architecture.&amp;nbsp; For most use cases it is now an order of magnitude better than queue based approaches. On &lt;a href="http://en.wikipedia.org/wiki/Sandy_Bridge"&gt;Sandybridge&lt;/a&gt; processors I've seen over 50 million events processed per second.&lt;br /&gt;&lt;br /&gt;Sequence tracking has been completely rewritten to reduce the usage of hardware &lt;a href="http://mechanical-sympathy.blogspot.com/2011/07/memory-barriersfences.html"&gt;memory barriers&lt;/a&gt;, indirection layers, and megamorphic method calls resulting in a much more data and instruction cache friendly design.&amp;nbsp; New techniques have been employed to prevent &lt;a href="http://mechanical-sympathy.blogspot.com/2011/08/false-sharing-java-7.html"&gt;false sharing&lt;/a&gt; because the previous ones got optimised out by the Oracle Java 7 JVM.&lt;br /&gt;&lt;br /&gt;The one area not seeing a significant performance increase is the &lt;a href="http://code.google.com/p/disruptor/source/browse/trunk/code/src/perf/com/lmax/disruptor/ThreePublisherToOneProcessorSequencedThroughputTest.java"&gt;sequencer&lt;/a&gt; pattern.&amp;nbsp; The Disruptor is still much faster than queue based approaches for this pattern but a limitation of Java hits us hard here.&amp;nbsp;&amp;nbsp; Java on x86/x64 is using LOCK CMPXCHG for CAS operations to implement the &lt;a href="http://download.oracle.com/javase/6/docs/api/java/util/concurrent/atomic/AtomicLong.html"&gt;&lt;span style="font-family: 'Courier New',Courier,monospace;"&gt;AtomicLong&lt;/span&gt;&lt;/a&gt; &lt;a href="http://download.oracle.com/javase/6/docs/api/java/util/concurrent/atomic/AtomicLong.html#incrementAndGet%28%29" style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;incrementAndGet()&lt;/a&gt; method which, based on my measurements, is ~2-10X slower than using LOCK XADD as contention increases.&amp;nbsp; Hopefully Oracle will see the error of SUNs ways on this and embrace x86/x64 to take advantage of such instructions.&amp;nbsp; Dave Dice at Oracle has &lt;a href="http://blogs.oracle.com/dave/entry/atomic_fetch_and_add_vs"&gt;blogged&lt;/a&gt; on the subject so I live in hope.&lt;br /&gt;&lt;b&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;Memory Barriers&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Of special note for this release is the elimination of hardware memory barriers on x86/x64 for &lt;a href="http://code.google.com/p/disruptor/source/browse/trunk/code/src/main/com/lmax/disruptor/Sequence.java"&gt;Sequence&lt;/a&gt; tracking.&amp;nbsp; The beauty in the Disruptor design is that on CPU architectures that have a memory model &lt;span style="font-size: xx-small;"&gt;[1]&lt;/span&gt; whereby:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;“&lt;i&gt;loads are not reordered with older loads&lt;/i&gt;”, and&lt;/li&gt;&lt;li&gt;“&lt;i&gt;stores are not reordered with older stores&lt;/i&gt;”;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;it is then possible to take advantage of the semantics provided by &lt;a href="http://download.oracle.com/javase/6/docs/api/java/util/concurrent/atomic/AtomicLong.html" style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;AtomicLong&lt;/a&gt; to avoid the use of the Java &lt;span style="font-family: 'Courier New',Courier,monospace;"&gt;volatile&lt;/span&gt; keyword, and thus hardware fences on x86/x64.&amp;nbsp; The one sticky rule for concurrent algorithms, such as Dekker &lt;span style="font-size: x-small;"&gt;[2]&lt;/span&gt; and Peterson &lt;span style="font-size: x-small;"&gt;[3]&lt;/span&gt; locks, on x86/x64 is “&lt;i&gt;loads can be re-ordered with older stores&lt;/i&gt;”.&amp;nbsp; This is not an issue given the design of the Disruptor.&amp;nbsp; The issue relates to the snooping of CPU local store buffers for older writes.&amp;nbsp; I’m likely to blog in more detail about why this is the case at a later date.&amp;nbsp; The code should be safe on other CPU architectures if the JVM implementers get the semantics of &lt;a href="http://download.oracle.com/javase/6/docs/api/java/util/concurrent/atomic/AtomicLong.html"&gt;&lt;span style="font-family: 'Courier New',Courier,monospace;"&gt;AtomicLong&lt;/span&gt;&lt;/a&gt; and &lt;a href="http://www.java2s.com/Open-Source/Java-Document/Apache-Harmony-Java-SE/com-sun-package/sun/misc/Unsafe.java.java-doc.htm"&gt;&lt;span style="font-family: 'Courier New',Courier,monospace;"&gt;Unsafe&lt;/span&gt;&lt;/a&gt; correct, however your mileage may vary for performance on other architectures compared to x64.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Roadmap&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;With this latest release it is becoming increasingly obvious how sensitive some CPU architectures are to processor affinity for threads.&amp;nbsp; When an &lt;a href="http://code.google.com/p/disruptor/source/browse/trunk/code/src/main/com/lmax/disruptor/EventProcessor.java"&gt;&lt;span style="font-family: 'Courier New',Courier,monospace;"&gt;EventProcessor&lt;/span&gt;&lt;/a&gt; gets rescheduled on a different core, after its time-slice is exhausted or it yields, the resulting cache pollution really hits performance.&amp;nbsp; For those who require more extreme and predictable performance I plan to release an &lt;a href="http://download.oracle.com/javase/6/docs/api/java/util/concurrent/Executor.html" style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;Executor&lt;/a&gt; service with the Disruptor to allow the pinning of threads to CPU cores.&lt;br /&gt;&lt;br /&gt;I'm also thinking of adding a progressive back off strategy for waiting &lt;a href="http://code.google.com/p/disruptor/source/browse/trunk/code/src/main/com/lmax/disruptor/EventProcessor.java"&gt;&lt;span style="font-family: 'Courier New',Courier,monospace;"&gt;EventProcessor&lt;/span&gt;&lt;/a&gt;s as a &lt;a href="http://code.google.com/p/disruptor/source/browse/trunk/code/src/main/com/lmax/disruptor/WaitStrategy.java"&gt;&lt;span style="font-family: 'Courier New',Courier,monospace;"&gt;WaitStrategy&lt;/span&gt;&lt;/a&gt;.&amp;nbsp; This strategy would first busy spin, then yield, then eventually sleep in millisecond periods to conserve CPU resource for those applications that burst for a while then go quiet.&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Memory Model: See Section 8.2 of http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-vol-3a-part-1-manual.html&lt;/li&gt;&lt;li&gt;Dekker algorithm: http://en.wikipedia.org/wiki/Dekker%27s_algorithm&lt;/li&gt;&lt;li&gt;Peterson Algorithm: http://en.wikipedia.org/wiki/Peterson%27s_algorithm&lt;/li&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5560209661389175529-6782173680429554585?l=mechanical-sympathy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mechanical-sympathy.blogspot.com/feeds/6782173680429554585/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/08/disruptor-20-released.html#comment-form' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/6782173680429554585'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/6782173680429554585'/><link rel='alternate' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/08/disruptor-20-released.html' title='Disruptor 2.0 Released'/><author><name>Martin Thompson</name><uri>http://www.blogger.com/profile/15893849163924476586</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>9</thr:total><georss:featurename>London, UK</georss:featurename><georss:point>51.5001524 -0.12623619999999391</georss:point><georss:box>51.322796399999994 -0.39052969999999393 51.6775084 0.1380573000000061</georss:box></entry><entry><id>tag:blogger.com,1999:blog-5560209661389175529.post-6453797578539649265</id><published>2011-08-20T17:02:00.006+01:00</published><updated>2011-08-20T18:41:08.247+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Refactoring'/><category scheme='http://www.blogger.com/atom/ns#' term='DDD'/><category scheme='http://www.blogger.com/atom/ns#' term='Refurbishment'/><title type='text'>Code Refurbishment</title><content type='html'>Within our industry we use a huge range of terminology.&amp;nbsp; Unfortunately we don’t all agree on what individual terms actually mean.&amp;nbsp; I so often hear people misuse the term “&lt;a href="http://en.wikipedia.org/wiki/Code_refactoring"&gt;Refactoring&lt;/a&gt;” which has come to make the business in many organisations recoil in fear.&amp;nbsp; The reason for this fear I’ve observed is because of what people often mean when misusing this term.&lt;br /&gt;&lt;br /&gt;I feel we are holding back our industry by not being disciplined in our use of terminology.&amp;nbsp; If one chemist said to another chemist “we are about to perform &lt;a href="http://en.wikipedia.org/wiki/Titration"&gt;titration&lt;/a&gt;”, both would have a good idea what is involved.&amp;nbsp; I believe computing is still a very immature science.&amp;nbsp; As our subject matures hopefully we will become more precise and disciplined in our use of terminology and thus make our communication more accurate and effective.&lt;br /&gt;&lt;br /&gt;Refactoring is a very useful technique for improving code quality and clarity.&amp;nbsp; To be precise it is a behaviour preserving change that improves a code base for future maintenance and understanding.&amp;nbsp; A good example would be extracting a method to remove code duplication and applying this method at every site of the duplication, thus removing the duplication.&amp;nbsp; Refactoring was first discussed in the early 1990s and became mainstream after Martin Fowler’s excellent “&lt;a href="http://www.amazon.co.uk/Refactoring-Improving-Design-Existing-Technology/dp/0201485672/"&gt;Refactoring&lt;/a&gt;” book in 1999.&lt;br /&gt;&lt;br /&gt;Refactoring involves making a number of small internal changes to the code structure.&amp;nbsp; These changes will typically not have any external impact.&amp;nbsp; Well written unit tests that just assert externally observable behaviour will not change when code is refactored.&amp;nbsp; If the external behaviour of code is changing when the structure is being changed then this is not refactoring.&lt;br /&gt;&lt;br /&gt;Now, why do our business folk recoil in fear when this simple and useful technique of “refactoring” is mentioned?&amp;nbsp; I believe this is because developers are actually talking about a much more extensive structural redevelopment technique that does not have a common term.&amp;nbsp; These structural changes are often not a complete ground-up rewrite because much of the existing code will be reused.&amp;nbsp; The reason the business folk have come to recoil is that they fear we are about to head off into uncharted waters with no idea of how long things will take and if any value will come out of the exercise. &lt;br /&gt;&lt;br /&gt;This example of significant structural change reminds me of when a bar or restaurant gets taken over by new management.&amp;nbsp; The new management often undertake a refurbishment exercise to make the place more appealing and suitable for the customers they are targeting.&amp;nbsp; A lot of the building will be preserved and reused thus greatly reducing the costs of a complete rebuild.&amp;nbsp; In my experience when developers use the term “refactoring” what they really mean is that some module, or &lt;a href="http://domaindrivendesign.org/node/91"&gt;bounded context&lt;/a&gt;, in a code base is about to undergo significant refurbishment.&amp;nbsp; If we define this term, and agree the goal and value to the business, we may be able to better plan and manage our projects.&lt;br /&gt;&lt;br /&gt;These code refurbishment exercises should have clear goals defined at the outset and all change must be tested against these goals.&amp;nbsp;&amp;nbsp; For example, we may have discovered that code is not a true reflection of the business domain after new insights.&amp;nbsp; These insights may have been gleaned over a period of time and the code has grown out of step to become an approximation of what the business requires.&amp;nbsp; While performing &lt;a href="http://domaindrivendesign.org/resources/what_is_ddd"&gt;Domain Driven Design&lt;/a&gt; the penny may drop with the essence of the business model becoming clear.&amp;nbsp; After this clarity of understanding the code may need a major overhaul to align it with this new understanding of the business.&amp;nbsp; Code can also drift from being a distilled model of the business domain if quick hacks are put in place to meet a deadline.&amp;nbsp; Over time these hacks can build on each other until the model no longer describes the business, it just about makes itself useful by side effect.&amp;nbsp; During this exercise our tests are likely to see significant change as we tighten up the specification for our new improved understanding of the business domain.&lt;br /&gt;&lt;br /&gt;A code refurbishment is worthwhile to correct the core domain if it's about to undergo significant further development, or if a module is business critical and needs to be occasionally corrected under production pressure to preserve revenue generation.&lt;br /&gt;&lt;br /&gt;I’m interested to know if other folk have observed similar developments and if you think refinement of this concept would be valuable?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5560209661389175529-6453797578539649265?l=mechanical-sympathy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mechanical-sympathy.blogspot.com/feeds/6453797578539649265/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/08/code-refurbishment.html#comment-form' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/6453797578539649265'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/6453797578539649265'/><link rel='alternate' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/08/code-refurbishment.html' title='Code Refurbishment'/><author><name>Martin Thompson</name><uri>http://www.blogger.com/profile/15893849163924476586</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5560209661389175529.post-5683641513359800215</id><published>2011-08-13T10:07:00.007+01:00</published><updated>2011-09-15T08:36:26.787+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='Concurrent'/><category scheme='http://www.blogger.com/atom/ns#' term='Performance'/><title type='text'>False Sharing &amp;&amp; Java 7</title><content type='html'>In my previous post on &lt;a href="http://mechanical-sympathy.blogspot.com/2011/07/false-sharing.html"&gt;False Sharing&lt;/a&gt; I suggested it can be avoided by padding the cache line with unused &lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;long&lt;/span&gt; fields.&amp;nbsp; It seems Java 7 got clever and eliminated or re-ordered the unused fields, thus re-introducing false sharing.&amp;nbsp; I've experimented with a number of techniques on different platforms and found the following code to be the most reliable.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;import java.util.concurrent.atomic.AtomicLong;&lt;br /&gt;&lt;br /&gt;public final class FalseSharing&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; implements Runnable&lt;br /&gt;{&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public final static int NUM_THREADS = 4; // change&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public final static long ITERATIONS = 500L * 1000L * 1000L;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private final int arrayIndex;&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private static PaddedAtomicLong[] longs = new PaddedAtomicLong[NUM_THREADS];&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; static&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (int i = 0; i &amp;lt; longs.length; i++)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; longs[i] = new PaddedAtomicLong();&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public FalseSharing(final int arrayIndex)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; this.arrayIndex = arrayIndex;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static void main(final String[] args) throws Exception&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; final long start = System.nanoTime();&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; runTest();&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; System.out.println("duration = " + (System.nanoTime() - start));&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private static void runTest() throws InterruptedException&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Thread[] threads = new Thread[NUM_THREADS];&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (int i = 0; i &amp;lt; threads.length; i++)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; threads[i] = new Thread(new FalseSharing(i));&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (Thread t : threads)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; t.start();&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (Thread t : threads)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; t.join();&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public void run()&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; long i = ITERATIONS + 1;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (0 != --i)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; longs[arrayIndex].set(i);&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static long sumPaddingToPreventOptimisation(final int index)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; PaddedAtomicLong v = longs[index];&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return v.p1 + v.p2 + v.p3 + v.p4 + v.p5 + v.p6;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static class PaddedAtomicLong extends AtomicLong&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; public volatile long p1, p2, p3, p4, p5, p6 = 7L;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;With this code I get similar performance results to those stated in the previous &lt;a href="http://mechanical-sympathy.blogspot.com/2011/07/false-sharing.html"&gt;False Sharing&lt;/a&gt; article.&amp;nbsp; The padding in &lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;PaddedAtomicLong&lt;/span&gt; above can be commented out to see the false sharing effect.&lt;br /&gt;&lt;br /&gt;I think we should all lobby the powers that be inside Oracle to have intrinsics added to the language so we can have cache line aligned and padded atomic classes.&amp;nbsp; This and some other low-level changes would help make Java a real concurrent programming language.&amp;nbsp; We keep hearing them say multi-core is coming.&amp;nbsp;&amp;nbsp; I say it is here and Java needs to catch up.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5560209661389175529-5683641513359800215?l=mechanical-sympathy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mechanical-sympathy.blogspot.com/feeds/5683641513359800215/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/08/false-sharing-java-7.html#comment-form' title='13 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/5683641513359800215'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/5683641513359800215'/><link rel='alternate' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/08/false-sharing-java-7.html' title='False Sharing &amp;&amp; Java 7'/><author><name>Martin Thompson</name><uri>http://www.blogger.com/profile/15893849163924476586</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>13</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5560209661389175529.post-6199920331281240562</id><published>2011-08-09T20:37:00.016+01:00</published><updated>2011-09-10T09:12:30.106+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='Concurrent'/><category scheme='http://www.blogger.com/atom/ns#' term='C++'/><category scheme='http://www.blogger.com/atom/ns#' term='Performance'/><title type='text'>Inter Thread Latency</title><content type='html'>Message rates between threads are fundamentally determined by the latency of memory exchange between CPU cores.&amp;nbsp;&amp;nbsp; The minimum unit of transfer will be a cache line exchanged via shared caches or socket interconnects.&amp;nbsp; In a previous article I explained &lt;a href="http://mechanical-sympathy.blogspot.com/2011/07/memory-barriersfences.html"&gt;Memory Barriers&lt;/a&gt; and why they are important to concurrent programming between threads.&amp;nbsp; These are the instructions that cause a CPU to make memory visible to other cores in an ordered and timely manner.&lt;br /&gt;&lt;br /&gt;Lately I’ve been asked a lot about how much faster the &lt;a href="http://code.google.com/p/disruptor/"&gt;Disruptor&lt;/a&gt; would be if C++ was used instead of Java.&amp;nbsp; For sure C++ would give more control for memory alignment and potential access to underlying CPU instructions such as memory barriers and lock instructions.&amp;nbsp; In this article I’ll directly compare C++ and Java to measure the cost of signalling a change between threads.&lt;br /&gt;&lt;br /&gt;For the test we'll use two counters each updated by their own thread.&amp;nbsp; A simple ping-pong algorithm will be used to signal from one to the other and back again.&amp;nbsp; The exchange will be repeated millions of times to measure the average latency between cores.&amp;nbsp; This measurement will give us the latency of exchanging a cache line between cores in a serial manner.&lt;br /&gt;&lt;br /&gt;For Java we’ll use volatile counters which the JVM will kindly insert a lock instruction for the update giving us an effective memory barrier.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;public final class InterThreadLatency&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; implements Runnable&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;{&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static final long ITERATIONS = 500L * 1000L * 1000L;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static volatile long s1;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static volatile long s2;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static void main(final String[] args)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Thread t = new Thread(new InterThreadLatency());&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; t.setDaemon(true);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; t.start();&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; long start = System.nanoTime();&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; long value = s1;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (s1 &amp;lt; ITERATIONS)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (s2 != value)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // busy spin&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; value = ++s1;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; long duration = System.nanoTime() - start;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; System.out.println("duration = " + duration);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; System.out.println("ns per op = " + duration / (ITERATIONS * 2));&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; System.out.println("op/sec = " +&amp;nbsp; &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (ITERATIONS * 2L * 1000L * 1000L * 1000L) / duration);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; System.out.println("s1 = " + s1 + ", s2 = " + s2);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public void run()&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; long value = s2;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (true)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (value == s1)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // busy spin&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; value = ++s2;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;For C++ we’ll use the &lt;a href="http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html"&gt;GNU Atomic Builtins&lt;/a&gt; which give us a similar lock instruction insertion to that which the JVM uses.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;#include &amp;lt;time.h&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;#include &amp;lt;pthread.h&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;#include &amp;lt;stdio.h&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;typedef unsigned long long uint64;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;const uint64 ITERATIONS = 500LL * 1000LL * 1000LL;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;volatile uint64 s1 = 0;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;volatile uint64 s2 = 0;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;void* run(void*)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;{&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; register uint64 value = s2;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (true)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (value == s1)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // busy spin&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; value = __sync_add_and_fetch(&amp;amp;s2, 1);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;int main (int argc, char *argv[])&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;{&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; pthread_t threads[1];&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; pthread_create(&amp;amp;threads[0], NULL, run, NULL);&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; timespec ts_start;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; timespec ts_finish;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; clock_gettime(CLOCK_MONOTONIC, &amp;amp;ts_start);&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; register uint64 value = s1;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (s1 &amp;lt; ITERATIONS)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (s2 != value)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // busy spin&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; value = __sync_add_and_fetch(&amp;amp;s1, 1);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; clock_gettime(CLOCK_MONOTONIC, &amp;amp;ts_finish);&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; uint64 start = (ts_start.tv_sec * 1000000000LL) + ts_start.tv_nsec;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; uint64 finish = (ts_finish.tv_sec * 1000000000LL) + ts_finish.tv_nsec;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; uint64 duration = finish - start;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; printf("duration = %lld\n", duration);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; printf("ns per op = %lld\n", (duration / (ITERATIONS * 2)));&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; printf("op/sec = %lld\n",&amp;nbsp; &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ((ITERATIONS * 2L * 1000L * 1000L * 1000L) / duration));&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; printf("s1 = %lld, s2 = %lld\n", s1, s2);&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; return 0;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Results&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;$ taskset -c 2,4 /opt/jdk1.7.0/bin/java InterThreadLatency&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;duration = 50790271150&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;ns per op = 50&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;op/sec = 19,688,810&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;s1 = 500000000, s2 = 500000000&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;$ g++ -O3 -lpthread -lrt -o itl itl.cpp&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;$ taskset -c 2,4 ./itl&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;duration = 45087955393&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;ns per op = 45&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;op/sec = 22,178,872&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;s1 = 500000000, s2 = 500000000&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The C++ version is slightly faster on my Intel Sandybridge laptop.&amp;nbsp; So what does this tell us?&amp;nbsp; Well, that the latency between 2 cores on a 2.2 GHz machine is ~45ns and that you can exchange 22m messages per second in a serial fashion.&amp;nbsp; On an Intel CPU this is fundamentally the cost of the lock instruction enforcing total order and forcing the store buffer and &lt;a href="http://mechanical-sympathy.blogspot.com/2011/07/write-combining.html"&gt;write combining buffers&lt;/a&gt; to drain, followed by the resulting cache coherency traffic between the cores.&amp;nbsp;&amp;nbsp; Note that each core has a 96GB/s port onto the L3 cache ring bus, yet 22m * 64-bytes is only 1.4 GB/s.&amp;nbsp; This is because we have measured latency and not throughput.&amp;nbsp; We could easily fit some nice fat messages between those memory barriers as part of the exchange if the data has been written before the lock instruction was executed.&lt;br /&gt;&lt;br /&gt;So what does this all mean for the Disruptor?&amp;nbsp; Basically, the latency of the Disruptor is about as low as we can get from Java.&amp;nbsp; It would be possible to get a ~10% latency improvement by moving to C++.&amp;nbsp; I’d expect a similar improvement in throughput for C++.&amp;nbsp; The main win with C++ would be the control, and therefore, the predictability that comes with it if used correctly.&amp;nbsp; The JVM gives us nice safety features like garbage collection in complex applications but we pay a little for that with the extra instructions it inserts that can be seen if you get Hotspot to dump the assembler instructions it is generating.&lt;br /&gt;&lt;br /&gt;How does the Disruptor achieve more than 25m messages per second I hear you say???&amp;nbsp;&amp;nbsp; Well that is one of the neat parts of its design.&amp;nbsp; The “&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;waitFor&lt;/span&gt;” semantics on the &lt;a href="http://code.google.com/p/disruptor/source/browse/trunk/code/src/main/com/lmax/disruptor/SequenceBarrier.java"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;SequenceBarrier&lt;/span&gt;&lt;/a&gt; enables a very efficient form of batching, which allows the &lt;a href="http://code.google.com/p/disruptor/source/browse/trunk/code/src/main/com/lmax/disruptor/BatchEventProcessor.java"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;BatchEventProcessor&lt;/span&gt;&lt;/a&gt; to process a series of events that occurred since it last checked in with the &lt;a href="http://code.google.com/p/disruptor/source/browse/trunk/code/src/main/com/lmax/disruptor/RingBuffer.java"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;RingBuffer&lt;/span&gt;&lt;/a&gt;, all without incurring a memory barrier.&amp;nbsp; For real world applications this batching effect is really significant.&amp;nbsp; For micro benchmarks it only makes the results more random,&amp;nbsp; especially when there is little work done other than accepting the message.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Conclusion&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So when processing events in series, the measurements tell us that the current generation of processors can do between 20-30 million exchanges per second at a latency less than 50ns.&amp;nbsp; The Disruptor design allows us to get greater throughput without explicit batching on the publisher side.&amp;nbsp; In addition the Disruptor has an explicit &lt;a href="http://code.google.com/p/disruptor/source/browse/trunk/code/src/perf/com/lmax/disruptor/OnePublisherToOneProcessorUniCastBatchThroughputTest.java"&gt;batching API&lt;/a&gt; on the publisher side that can give over &lt;a href="http://code.google.com/p/disruptor/source/browse/trunk/code/src/perf/com/lmax/disruptor/OnePublisherToOneProcessorUniCastBatchThroughputTest.java"&gt;100 million &lt;/a&gt;messages per second.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5560209661389175529-6199920331281240562?l=mechanical-sympathy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mechanical-sympathy.blogspot.com/feeds/6199920331281240562/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/08/inter-thread-latency.html#comment-form' title='24 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/6199920331281240562'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/6199920331281240562'/><link rel='alternate' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/08/inter-thread-latency.html' title='Inter Thread Latency'/><author><name>Martin Thompson</name><uri>http://www.blogger.com/profile/15893849163924476586</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>24</thr:total><georss:featurename>London, UK</georss:featurename><georss:point>51.5001524 -0.12623619999999391</georss:point><georss:box>51.322796399999994 -0.39052969999999393 51.6775084 0.1380573000000061</georss:box></entry><entry><id>tag:blogger.com,1999:blog-5560209661389175529.post-7473064875854639804</id><published>2011-07-30T09:34:00.014+01:00</published><updated>2011-09-10T17:37:10.331+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='False Sharing'/><category scheme='http://www.blogger.com/atom/ns#' term='Threads'/><category scheme='http://www.blogger.com/atom/ns#' term='Performance'/><title type='text'>False Sharing</title><content type='html'>Memory is stored within the cache system in units know as cache lines.&amp;nbsp; Cache lines are a power of 2 of contiguous bytes which are typically 32-256 in size.&amp;nbsp; The most common cache line size is 64 bytes.&amp;nbsp;&amp;nbsp; False sharing is a term which applies when threads unwittingly impact the performance of each other while modifying independent variables sharing the same cache line.&amp;nbsp; Write contention on cache lines is the single most limiting factor on achieving scalability for parallel threads of execution in an SMP system.&amp;nbsp; I’ve heard false sharing described as the silent performance killer because it is far from obvious when looking at code.&lt;br /&gt;&lt;br /&gt;To achieve linear scalability with number of threads, we must ensure no  two threads write to the same variable or cache line.&amp;nbsp; Two threads  writing to the same variable can be tracked down at a code level.&amp;nbsp;&amp;nbsp; To  be able to know if independent variables share the same cache line we  need to know the memory layout, or we can get a  tool to tell us.&amp;nbsp; Intel VTune is such a profiling tool.&amp;nbsp; In this  article I’ll explain how memory is laid out for Java objects and how we  can pad out our cache lines to avoid false sharing. &lt;br /&gt;&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-MYso5dR5ItE/TjOy1Mbr3NI/AAAAAAAAAAk/uLPjSeGhbmg/s1600/cache-line.png" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/-MYso5dR5ItE/TjOy1Mbr3NI/AAAAAAAAAAk/uLPjSeGhbmg/s1600/cache-line.png" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Figure 1.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;Figure 1. above illustrates the issue of false sharing.&amp;nbsp; A thread running on core 1 wants to update variable X while a thread on core 2 wants to update variable Y.&amp;nbsp; Unfortunately these two hot variables reside in the same cache line.&amp;nbsp; Each thread will race for ownership of the cache line so they can update it.&amp;nbsp; If core 1 gets ownership then the cache sub-system will need to invalidate the corresponding cache line for core 2.&amp;nbsp; When Core 2 gets ownership and performs its update, then core 1 will be told to invalidate its copy of the cache line.&amp;nbsp; This will ping pong back and forth via the L3 cache greatly impacting performance.&amp;nbsp; The issue would be further exacerbated if competing cores are on different sockets and additionally have to cross the socket interconnect.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Java Memory Layout&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;For the Hotspot JVM, all objects have a 2-word header.&amp;nbsp; First is the “mark” word which is made up of 24-bits for the hash code and 8-bits for flags such as the lock state, or it can be swapped for lock objects.&amp;nbsp; The second is a reference to the class of the object.&amp;nbsp; Arrays have an additional word for the size of the array.&amp;nbsp; Every object is aligned to an 8-byte granularity boundary for performance.&amp;nbsp; Therefore to be efficient when packing, the object fields are re-ordered from declaration order to the following order based on size in bytes:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;doubles (8) and longs (8)&lt;/li&gt;&lt;li&gt;ints (4) and floats (4)&lt;/li&gt;&lt;li&gt;shorts (2) and chars (2)&lt;/li&gt;&lt;li&gt;booleans (1) and bytes (1)&lt;/li&gt;&lt;li&gt;references (4/8)&lt;/li&gt;&lt;li&gt;&amp;lt;repeat for sub-class fields&amp;gt; &lt;/li&gt;&lt;/ol&gt;With this knowledge we can pad a cache line between any fields with 7 longs.&amp;nbsp; Within the &lt;a href="http://code.google.com/p/disruptor/"&gt;Disruptor&lt;/a&gt; we pad cache lines around the &lt;a href="http://code.google.com/p/disruptor/source/browse/trunk/code/src/main/com/lmax/disruptor/RingBuffer.java"&gt;RingBuffer &lt;/a&gt;cursor and &lt;a href="http://code.google.com/p/disruptor/source/browse/trunk/code/src/main/com/lmax/disruptor/BatchEventProcessor.java"&gt;BatchEventProcessor&lt;/a&gt; sequences.&lt;br /&gt;&lt;br /&gt;To show the performance impact let’s take a few threads each updating  their own independent counters.&amp;nbsp; These counters will be &lt;i&gt;volatile long&lt;/i&gt;s so  the world can see their progress. &lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;public final class FalseSharing&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; implements Runnable&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;{&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public final static int NUM_THREADS = 4; &lt;b&gt;&lt;span style="color: #cc0000;"&gt;// change&lt;/span&gt;&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public final static long ITERATIONS = 500L * 1000L * 1000L;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private final int arrayIndex;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private static VolatileLong[] longs = new VolatileLong[NUM_THREADS];&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; static&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (int i = 0; i &amp;lt; longs.length; i++)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; longs[i] = new VolatileLong();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public FalseSharing(final int arrayIndex)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; this.arrayIndex = arrayIndex;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static void main(final String[] args) throws Exception&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; final long start = System.nanoTime();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; runTest();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; System.out.println("duration = " + (System.nanoTime() - start));&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private static void runTest() throws InterruptedException&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Thread[] threads = new Thread[NUM_THREADS];&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (int i = 0; i &amp;lt; threads.length; i++)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; threads[i] = new Thread(new FalseSharing(i));&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (Thread t : threads)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; t.start();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (Thread t : threads)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; t.join();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public void run()&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; long i = ITERATIONS + 1;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (0 != --i)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; longs[arrayIndex].value = i;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public final static class VolatileLong&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; public volatile long value = 0L;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; public long p1, p2, p3, p4, p5, p6; &lt;b&gt;&lt;span style="color: #990000;"&gt;// comment out&lt;/span&gt;&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Results&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Running the above code while ramping the number of threads and adding/removing the cache line padding,&amp;nbsp; I get the results depicted in Figure 2. below.&amp;nbsp; This is measuring the duration of test runs on my 4-core Nehalem.&lt;br /&gt;&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-gi_3N6LC26E/TjOzSGIUMxI/AAAAAAAAAAo/5-UaNKmGZzY/s1600/duration.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/-gi_3N6LC26E/TjOzSGIUMxI/AAAAAAAAAAo/5-UaNKmGZzY/s1600/duration.png" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Figure 2.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;The impact of false sharing can clearly be seen by the increased execution time required to complete the test.&amp;nbsp; Without the cache line contention we achieve near linear scale up with threads.&lt;br /&gt;&lt;br /&gt;This is not a perfect test because we cannot be sure where the &lt;i&gt;VolatileLongs &lt;/i&gt;will be laid out in memory.&amp;nbsp; They are independent objects.&amp;nbsp; However experience shows that objects allocated at the same time tend to be co-located.&lt;br /&gt;&lt;br /&gt;So there you have it.&amp;nbsp; False sharing can be a silent performance killer.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Note:&lt;/b&gt; Please read my further adventures with false sharing in this follow on &lt;a href="http://mechanical-sympathy.blogspot.com/2011/08/false-sharing-java-7.html"&gt;blog&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5560209661389175529-7473064875854639804?l=mechanical-sympathy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mechanical-sympathy.blogspot.com/feeds/7473064875854639804/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/07/false-sharing.html#comment-form' title='20 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/7473064875854639804'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/7473064875854639804'/><link rel='alternate' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/07/false-sharing.html' title='False Sharing'/><author><name>Martin Thompson</name><uri>http://www.blogger.com/profile/15893849163924476586</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-MYso5dR5ItE/TjOy1Mbr3NI/AAAAAAAAAAk/uLPjSeGhbmg/s72-c/cache-line.png' height='72' width='72'/><thr:total>20</thr:total><georss:featurename>London, UK</georss:featurename><georss:point>51.5001524 -0.12623619999999391</georss:point><georss:box>51.322796399999994 -0.39052969999999393 51.6775084 0.1380573000000061</georss:box></entry><entry><id>tag:blogger.com,1999:blog-5560209661389175529.post-1940143616890829807</id><published>2011-07-24T19:24:00.013+01:00</published><updated>2011-09-12T08:47:13.139+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='Memory Barriers'/><category scheme='http://www.blogger.com/atom/ns#' term='Performance'/><title type='text'>Memory Barriers/Fences</title><content type='html'>In this article I'll discuss the most fundamental technique in concurrent programming known as memory barriers, or fences, that make the memory state within a processor visible to other processors.&lt;br /&gt;&lt;br /&gt;CPUs have employed many techniques to try and accommodate the fact that  CPU execution unit performance has greatly outpaced main memory  performance.&amp;nbsp; In my “&lt;a href="http://mechanical-sympathy.blogspot.com/2011/07/write-combining.html"&gt;Write Combining&lt;/a&gt;” article I touched on just one of these techniques.&amp;nbsp; The most common technique employed by CPUs to hide memory latency is to pipeline instructions and then spend significant effort, and resource, on trying to re-order these pipelines to minimise stalls related to cache misses.&lt;br /&gt;&lt;br /&gt;When a program is executed it does not matter if its instructions are re-ordered provided the same end result is achieved.&amp;nbsp; For example, within a loop it does not matter when the loop counter is updated if no operation within the loop uses it.&amp;nbsp; The compiler and CPU are free to re-order the instructions to best utilise the CPU provided it is updated by the time the next iteration is about to commence.&amp;nbsp; Also over the execution of a loop this variable may be stored in a register and never pushed out to cache or main memory, thus it is never visible to another CPU.&lt;br /&gt;&lt;br /&gt;CPU cores contain multiple execution units.&amp;nbsp; For example, a modern Intel CPU contains 6 execution units which can do a combination of arithmetic, conditional logic, and memory manipulation.&amp;nbsp; Each execution unit can do some combination of these tasks.&amp;nbsp; These execution units operate in parallel allowing instructions to be executed in parallel.&amp;nbsp; This introduces another level of non-determinism to program order if it was observed from another CPU.&lt;br /&gt;&lt;br /&gt;Finally, when a cache-miss occurs, a modern CPU can make an assumption on the results of a memory load and continue executing based on this assumption until the load returns the actual data.&lt;br /&gt;&lt;br /&gt;Provided “program order” is preserved the CPU, and compiler, are free to do whatever they see fit to improve performance.&lt;br /&gt;&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-aX64aN8wOTE/Tixd9Y4X-4I/AAAAAAAAAAg/FgM0HWTCbUI/s1600/cpu.png" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/-aX64aN8wOTE/Tixd9Y4X-4I/AAAAAAAAAAg/FgM0HWTCbUI/s1600/cpu.png" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Figure 1.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;Loads and stores to the caches and main memory are buffered and re-ordered using the load, store, and write-combining buffers.&amp;nbsp; These buffers are associative queues that allow fast lookup.&amp;nbsp; This lookup is necessary when a later load needs to read the value of a previous store that has not yet reached the cache.&amp;nbsp; Figure 1 above depicts a simplified view of a modern multi-core CPU.&amp;nbsp; It shows how the execution units can use the local registers and buffers to manage memory while it is being transferred back and forth from the cache sub-system.&lt;br /&gt;&lt;br /&gt;In a multi-threaded environment techniques need to be employed for making program results visible in a timely manner.&amp;nbsp; I will not cover cache coherence in this article.&amp;nbsp; Just assume that once memory has been pushed to the cache then a protocol of messages will occur to ensure all caches are coherent for any shared data.&amp;nbsp; The techniques for making memory visible from a processor core are known as memory barriers or fences.&lt;br /&gt;&lt;br /&gt;Memory barriers provide two properties.&amp;nbsp; Firstly, they preserve externally visible program order by ensuring all instructions either side of the barrier appear in the correct program order if observed from another CPU and, secondly, they make the memory visible by ensuring the data is propagated to the cache sub-system. &lt;br /&gt;&lt;br /&gt;Memory barriers are a complex subject.&amp;nbsp; They are implemented very differently across CPU architectures.&amp;nbsp; At one end of the spectrum there is a relatively strong memory model on Intel CPUs that is more simple than say the weak and complex memory model on a DEC Alpha with its partitioned caches in addition to cache layers.&amp;nbsp; Since x86 CPUs are the most common for multi-threaded programming I’ll try and simplify to this level.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Store Barrier&lt;/b&gt;&lt;br /&gt;A store barrier, “&lt;i&gt;sfence&lt;/i&gt;” instruction on x86, forces all store instructions prior to the barrier to happen before the barrier and have the store buffers flushed to cache for the CPU on which it is issued.&amp;nbsp; This will make the program state visible to other CPUs so they can act on it if necessary.&amp;nbsp; A good example of this in action is the following simplified code from the &lt;a href="http://code.google.com/p/disruptor/source/browse/trunk/code/src/main/com/lmax/disruptor/BatchEventProcessor.java"&gt;BatchEventProcessor&lt;/a&gt; in the Disruptor.&amp;nbsp; When the sequence is updated other consumers and producers know how far this consumer has progressed and thus can take appropriate action.&amp;nbsp; All previous updates to memory that happened before the barrier are now visible.&lt;br /&gt;&lt;br /&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span style="font-size: small;"&gt;private volatile long sequence = RingBuffer.INITIAL_CURSOR_VALUE;&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span style="font-size: small;"&gt;// from inside the run() method&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span style="font-size: small;"&gt;T event = null;&lt;br /&gt;long nextSequence = sequence.get() + 1L;&lt;br /&gt;while (running)&lt;br /&gt;{&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; try&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; final long availableSequence = barrier&lt;/span&gt;&lt;span style="font-size: small;"&gt;.waitFor(nextSequence);&lt;/span&gt;&lt;/div&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace; font-size: small;"&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (nextSequence &amp;lt;= availableSequence)&lt;/span&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; event = ringBuffer.get(nextSequence);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace; font-size: small;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; boolean endOfBatch = nextSequence == availableSequence;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; eventHandler.onEvent(event, nextSequence, endOfBatch);&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; nextSequence++;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; sequence.set(nextSequence - 1L); &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace; font-size: small;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;b&gt;&lt;span style="color: red;"&gt;// store barrier inserted here !!!&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; catch (final Exception ex)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; exceptionHandler.handle(ex, nextSequence, event);&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; sequence.set(nextSequence);&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;b&gt;&lt;span style="color: red;"&gt;// store barrier inserted here !!!&lt;/span&gt;&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace; font-size: small;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; nextSequence++;&lt;/span&gt;&lt;br /&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span style="font-size: small;"&gt; }&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span style="font-size: x-small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;b&gt;Load Barrier&lt;/b&gt;&lt;br /&gt;A load barrier, “&lt;i&gt;lfence&lt;/i&gt;” instruction on x86, forces all load instructions after the barrier to happen after the barrier and then wait on the load buffer to drain for that CPU.&amp;nbsp; This makes program state exposed from other CPUs visible to this CPU before making further progress.&amp;nbsp; A good example of this is when the BatchEventProcessor sequence referenced above is read by producers, or consumers, in the corresponding barriers of the Disruptor.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Full Barrier&lt;/b&gt;&lt;br /&gt;A full barrier, "&lt;i&gt;mfence&lt;/i&gt;" instruction on x86, is a composite of both load and store barriers happening on a CPU.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Java Memory Model&lt;/b&gt;&lt;br /&gt;In the &lt;a href="http://en.wikipedia.org/wiki/Java_Memory_Model"&gt;Java Memory Model&lt;/a&gt; a &lt;i&gt;volatile &lt;/i&gt;field has a store barrier inserted after a write to it and a load barrier inserted before a read of it.&amp;nbsp; Qualified &lt;i&gt;final&lt;/i&gt; fields of a class have a store barrier inserted after their initialisation to ensure these fields are visible once the constructor completes when a reference to the object is available.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Atomic Instructions and Software Locks&lt;/b&gt;&lt;br /&gt;Atomic instructions, such as the “&lt;i&gt;lock &lt;/i&gt;...” instructions on x86, are effectively a full barrier as they lock the memory sub-system to perform an operation and have guaranteed total order, even across CPUs.&amp;nbsp; Software locks usually employ memory barriers, or atomic instructions, to achieve visibility and preserve program order.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Performance Impact of Memory Barriers&lt;/b&gt;&lt;br /&gt;Memory barriers prevent a CPU from performing a lot of techniques  to hide memory latency therefore they have a significant performance cost which must be considered.&amp;nbsp; To achieve maximum performance it is best to model the problem so the processor can do units of work, then have all the necessary memory barriers occur on the boundaries of these work units.&amp;nbsp; Taking this approach allows the processor to optimise the units of work without restriction.&amp;nbsp; There is an advantage to grouping necessary memory barriers in that buffers flushed after the first one will be less costly because no work will be under way to refill them.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5560209661389175529-1940143616890829807?l=mechanical-sympathy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mechanical-sympathy.blogspot.com/feeds/1940143616890829807/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/07/memory-barriersfences.html#comment-form' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/1940143616890829807'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/1940143616890829807'/><link rel='alternate' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/07/memory-barriersfences.html' title='Memory Barriers/Fences'/><author><name>Martin Thompson</name><uri>http://www.blogger.com/profile/15893849163924476586</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-aX64aN8wOTE/Tixd9Y4X-4I/AAAAAAAAAAg/FgM0HWTCbUI/s72-c/cpu.png' height='72' width='72'/><thr:total>10</thr:total><georss:featurename>London, UK</georss:featurename><georss:point>51.5001524 -0.12623619999999391</georss:point><georss:box>51.322796399999994 -0.39052969999999393 51.6775084 0.1380573000000061</georss:box></entry><entry><id>tag:blogger.com,1999:blog-5560209661389175529.post-5980356954194735688</id><published>2011-07-19T22:08:00.001+01:00</published><updated>2011-08-10T07:10:38.871+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='Processor Affinity'/><category scheme='http://www.blogger.com/atom/ns#' term='Performance'/><title type='text'>Processor Affinity - Part 1</title><content type='html'>In a series of articles I’ll aim to show the performance impact of &lt;a href="http://en.wikipedia.org/wiki/Processor_affinity"&gt;processor affinity&lt;/a&gt; in a range of use cases.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Background&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;A thread of execution will typically run until it has used up its quantum (aka time slice), at which point it joins the back of the run queue waiting to be re-scheduled as soon as a processor core becomes available.&amp;nbsp; While running the thread will have accumulated a significant amount of state in the processor, including instructions and data in the cache.&amp;nbsp;&amp;nbsp; If the thread can be re-scheduled to run on the same core as last time it can benefit from all that accumulated state.&amp;nbsp;&amp;nbsp; A thread may equally not run to the end of its quantum because it has been pre-empted, or blocked on IO or a lock.&amp;nbsp; After which, when it is ready to run again, the same holds true.&lt;br /&gt;&lt;br /&gt;There are numerous techniques available for pinning threads to a particular core.&amp;nbsp;&amp;nbsp; In this article I’ll illustrate the use of the &lt;a href="http://linuxcommand.org/man_pages/taskset1.html"&gt;taskset &lt;/a&gt;command on two threads exchanging IP multicast messages via a dummy interface.&amp;nbsp; I’ve chosen this as the first example because in a low-latency environment multicast is the preferred IP protocol.&amp;nbsp; For simplicity, I’ve also chosen to not involve the physical network while introducing the concepts.&amp;nbsp;&amp;nbsp; In the next article I’ll expand on this example and the issues involving a real network.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;1. Create the dummy interface&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp; $ su -&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp; $ modprobe dummy&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp; $ ifconfig dummy0 172.16.1.1 netmask 255.255.255.0&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp; $ ifconfig dummy0 multicast&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;2. Get the Java files (&lt;a href="http://code.google.com/p/mechanical-sympathy/source/browse/trunk/multicast/MultiCastSender.java"&gt;Sender&lt;/a&gt; and &lt;a href="http://code.google.com/p/mechanical-sympathy/source/browse/trunk/multicast/MultiCastReceiver.java"&gt;Receiver&lt;/a&gt;) and compile them&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp; $ javac *.java&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;3. Run the tests without CPU pinning&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Window 1:&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp; $ java MultiCastReceiver 230.0.0.1 dummy0&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Window 2:&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp; $ java MultiCastSender 230.0.0.1 dummy0 20000000&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;4. Run the tests with CPU pinning&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Window 1:&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp; $ taskset -c 2 java MultiCastReceiver 230.0.0.1 dummy0&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Window 2:&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp; $ taskset -c 4 java MultiCastSender 230.0.0.1 dummy0 20000000&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Results&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The tests output once per second the number of messages they have managed to send and receive.&amp;nbsp; A typically example run is charted in Figure 1 below.&lt;br /&gt;&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-8kg6COKmAc0/TiXtxcjpNhI/AAAAAAAAAAU/7dfgoT1RXOY/s1600/throughput.png" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/-8kg6COKmAc0/TiXtxcjpNhI/AAAAAAAAAAU/7dfgoT1RXOY/s1600/throughput.png" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Figure 1.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;The interesting thing I've observed is that the unpinned test will follow a step function of unpredictable performance.&amp;nbsp; Across many runs I've seen different patterns but all similar in this step function nature.&amp;nbsp; For the pinned tests I get consistent throughput with no step pattern and always the greatest throughput.&lt;br /&gt;&lt;br /&gt;This test is not particularly CPU intensive, nor does it access the physical network device, yet it shows how critical processor affinity is to not just high performance but also predictable performance.&amp;nbsp; In the next article of this series I'll introduce a network hop and the issues arising from interrupt handling.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5560209661389175529-5980356954194735688?l=mechanical-sympathy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mechanical-sympathy.blogspot.com/feeds/5980356954194735688/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/07/processor-affinity-part-1.html#comment-form' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/5980356954194735688'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/5980356954194735688'/><link rel='alternate' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/07/processor-affinity-part-1.html' title='Processor Affinity - Part 1'/><author><name>Martin Thompson</name><uri>http://www.blogger.com/profile/15893849163924476586</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-8kg6COKmAc0/TiXtxcjpNhI/AAAAAAAAAAU/7dfgoT1RXOY/s72-c/throughput.png' height='72' width='72'/><thr:total>8</thr:total><georss:featurename>London, UK</georss:featurename><georss:point>51.5001524 -0.12623619999999391</georss:point><georss:box>51.322796399999994 -0.39052969999999393 51.6775084 0.1380573000000061</georss:box></entry><entry><id>tag:blogger.com,1999:blog-5560209661389175529.post-3263970411944690746</id><published>2011-07-19T10:08:00.000+01:00</published><updated>2012-01-07T13:37:03.948Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Event Sourcing'/><title type='text'>LMAX Architecture - by Martin Fowler</title><content type='html'>&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;Martin Fowler has written a great &lt;a href="http://martinfowler.com/articles/lmax.html"&gt;article&lt;/a&gt; about the architecture we developed for our exchange &lt;a href="http://www.lmaxtrader.co.uk/"&gt;LMAX&lt;/a&gt;.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;The article explains the origins of the &lt;a href="http://code.google.com/p/disruptor/"&gt;Disruptor&lt;/a&gt; and what is possible using single threaded business logic.&amp;nbsp; I really like how he explains the benefits of this approach, much better than I could :-)&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5560209661389175529-3263970411944690746?l=mechanical-sympathy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mechanical-sympathy.blogspot.com/feeds/3263970411944690746/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/07/lmax-architecture-by-martin-fowler.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/3263970411944690746'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/3263970411944690746'/><link rel='alternate' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/07/lmax-architecture-by-martin-fowler.html' title='LMAX Architecture - by Martin Fowler'/><author><name>Martin Thompson</name><uri>http://www.blogger.com/profile/15893849163924476586</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total><georss:featurename>London, UK</georss:featurename><georss:point>51.5001524 -0.12623619999999391</georss:point><georss:box>51.322796399999994 -0.39052969999999393 51.6775084 0.1380573000000061</georss:box></entry><entry><id>tag:blogger.com,1999:blog-5560209661389175529.post-6224388439279301539</id><published>2011-07-16T16:36:00.001+01:00</published><updated>2012-01-07T13:37:24.610Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='Memory'/><category scheme='http://www.blogger.com/atom/ns#' term='Design'/><category scheme='http://www.blogger.com/atom/ns#' term='Performance'/><title type='text'>Let The Caller Choose</title><content type='html'>&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;&lt;div style="font-family: inherit;"&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="font-family: inherit;"&gt;A programming idiom I often see in managed runtime environments, such as Java, is to return a collection or array from a method that is allocated within the method.&amp;nbsp; Having originally come from a C/C++ language background this always struck me as a bit strange and lazy.&amp;nbsp; In the C language world this idiom would likely be a source of memory leaks.&amp;nbsp; I do however understand how convenient it can be in garbage collected environments.&amp;nbsp; A good example is the split function on a String as below:&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; String str = “Some string that I may want to split”; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; String[] values = str.split(“ &lt;/span&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;;"&gt;”&lt;/span&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;;"&gt;);&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;For this a new array has to be allocated within the method.&amp;nbsp; If the array is small it can be allocated in the young generation.&amp;nbsp; Due to the round-robin nature of the young generation it will most likely involve a cache miss. &amp;nbsp;The OS, in addition, must zero the memory for security before handing it over. &amp;nbsp;What’s worse is the size of many arrays cannot be determined until the method completes, so temporary structures may have to be allocated before the final structure is allocated and copied into.&amp;nbsp; Then after all this, the memory will eventually need to be garbage collected.&amp;nbsp; This feels like the CPU is doing a lot of work to make this idiom work.&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;What is the alternative?&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;Another idiom is to let the caller allocate the structure and pass it into the method like in the example below:&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; String str = “Some string that I may want to split”; &lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Collection&amp;lt;String&amp;gt; values = new ArrayList&amp;lt;String&amp;gt;();&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; str.split(&lt;b style="color: red;"&gt;values&lt;/b&gt;, “ &lt;/span&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;;"&gt;”&lt;/span&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;;"&gt;);&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;With this approach the caller has more control and flexibility.&amp;nbsp; Not only can the caller reuse the collection across multiple calls, they can also choose the most appropriate structure for the end needs.&amp;nbsp; For example, it could be a HashSet if finding distinct words, an ArrayList for cache friendly performance, or a LinkedList if memory is a scarce resource.&amp;nbsp; The method can return the collection allowing a fluent coding pattern.&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;In the Java world this idiom becomes critical to performance when large arrays are involved. &amp;nbsp;An array may be too large to be allocated in the young generation and therefore needs to be allocated in the old generation.&amp;nbsp; Besides being less efficient and contended for large object allocation, the large array may cause a compaction of the old generation resulting in a multi-second stop-the-world pause.&amp;nbsp; Now that is a good topic for another blog post...&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;I can hear folk cry out, “but short lived objects are cheap!”&amp;nbsp; While this is true when compared to longer living objects, they are relatively cheap but certainly not free.&amp;nbsp; I’d prefer to use those extra cycles solving the business problem and, in addition, have the flexibility to choose the most appropriate data structure for the task at hand.&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5560209661389175529-6224388439279301539?l=mechanical-sympathy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mechanical-sympathy.blogspot.com/feeds/6224388439279301539/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/07/let-caller-choose.html#comment-form' title='11 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/6224388439279301539'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/6224388439279301539'/><link rel='alternate' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/07/let-caller-choose.html' title='Let The Caller Choose'/><author><name>Martin Thompson</name><uri>http://www.blogger.com/profile/15893849163924476586</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>11</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5560209661389175529.post-3750655872511316079</id><published>2011-07-15T14:10:00.000+01:00</published><updated>2011-07-16T17:06:57.352+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Memory'/><category scheme='http://www.blogger.com/atom/ns#' term='Performance'/><title type='text'>Write Combining</title><content type='html'>Modern CPUs employ lots of techniques to counteract the latency cost of going to main memory.&amp;nbsp; These days CPUs can process hundreds of instructions in the time it takes to read or write data to the DRAM memory banks.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;The major tool used to hide this latency is multiple layers of SRAM cache.&amp;nbsp; In addition, SMP systems employ message passing protocols to achieve coherence between caches.&amp;nbsp; Unfortunately CPUs are now so fast that even these caches cannot keep up at times.&amp;nbsp; So to further hide this latency a number of less well known buffers are used.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;This article explores “write combining store buffers” and how we can write code that uses them effectively.&lt;br /&gt;&lt;br /&gt;CPU caches are effectively unchained hash maps where each bucket is typically 64-bytes. This is known as a “cache line”.&amp;nbsp; The cache line is the effective unit of memory transfer.&amp;nbsp; For example, an address A in main memory would hash to map to a given cache line C.&lt;br /&gt;&lt;br /&gt;If a CPU needs to work with an address which hashes to a line that is not already in cache, then the existing line that matches that hash needs to be evicted so the new line can take its place.&amp;nbsp; For example if we have two addresses which both map via the hashing algorithm to the same cache line, then the old one must make way for the new cache line.&lt;br /&gt;&lt;br /&gt;When a CPU executes a store operation it will try to write the data to the L1 cache nearest to the CPU.&amp;nbsp; If a cache miss occurs at this stage the CPU goes out to the next layer of cache.&amp;nbsp; At this point on an Intel, and many other, CPUs a technique known as “write combining” comes into play.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;While the request for ownership of the L2 cache line is outstanding the data to be stored is written to one of a number of cache line sized store buffers on the processor itself.&amp;nbsp; These on chip buffers allow the CPU to continue processing instructions while the cache sub-system gets ready to receive and process the data.&amp;nbsp; The biggest advantage comes when the data is not present in any of the other cache layers.&lt;br /&gt;&lt;br /&gt;These buffers become very interesting when subsequent writes happen to require the same cache line.&amp;nbsp; The subsequent writes can be combined into the buffer before it is committed to the L2 cache. These 64-byte buffers maintain a 64-bit field which has the corresponding bit set for each byte that is updated to indicate what data is valid when the buffer is transferred to the outer caches.&lt;br /&gt;&lt;br /&gt;Hang on I hear you say.&amp;nbsp; What happens if the program wants to read some of the data that has been written to a buffer?&amp;nbsp; Well our hardware friends have thought of that and they will snoop the buffers before they read the caches.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;What does all this mean for our programs?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;If we can fill these buffers before they are transferred to the outer caches then we will greatly improve the effective use of the transfer bus at every level.&amp;nbsp; How do we do this?&amp;nbsp; Well programs spend most of their time in loops doing work.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;There are a limited number of these buffers, and they differ by CPU model.&amp;nbsp; For example on an Intel CPU you are only guaranteed to get 4 of them at one time.&amp;nbsp; What this means is that within a loop you should not write to more than 4 distinct memory locations at one time or you will not benefit from the write combining effect.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;What does this look like in code?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: small;"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;import static java.lang.System.out;&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;public final class WriteCombining&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;{&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private static final int ITERATIONS = Integer.MAX_VALUE;&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private static final int ITEMS = 1 &amp;lt;&amp;lt; 24;&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private static final int MASK = ITEMS - 1;&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private static final byte[] arrayA = new byte[ITEMS];&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private static final byte[] arrayB = new byte[ITEMS];&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private static final byte[] arrayC = new byte[ITEMS];&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private static final byte[] arrayD = new byte[ITEMS];&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private static final byte[] arrayE = new byte[ITEMS];&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private static final byte[] arrayF = new byte[ITEMS];&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static void main(final String[] args)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (int i = 1; i &amp;lt;= 3; i++)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; out.println(i + " SingleLoop duration (ns) = " + runCaseOne());&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; out.println(i + " SplitLoop&amp;nbsp; duration (ns) = " + runCaseTwo());&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; int result = arrayA[1] + arrayB[2] + arrayC[3] +&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; arrayD[4] + arrayE[5] + arrayF[6];&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; out.println("result = " + result);&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static long runCaseOne()&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; long start = System.nanoTime();&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; int i = ITERATIONS;&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (--i != 0)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; int slot = i &amp;amp; MASK;&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; byte b = (byte)i;&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; arrayA[slot] = b;&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; arrayB[slot] = b;&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; arrayC[slot] = b;&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; arrayD[slot] = b;&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; arrayE[slot] = b;&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; arrayF[slot] = b;&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return System.nanoTime() - start;&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static long runCaseTwo()&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; long start = System.nanoTime();&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; int i = ITERATIONS;&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (--i != 0)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; int slot = i &amp;amp; MASK;&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; byte b = (byte)i;&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; arrayA[slot] = b;&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; arrayB[slot] = b;&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; arrayC[slot] = b;&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; i = ITERATIONS;&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (--i != 0)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; int slot = i &amp;amp; MASK;&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; byte b = (byte)i;&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; arrayD[slot] = b;&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; arrayE[slot] = b;&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; arrayF[slot] = b;&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return System.nanoTime() - start;&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This program on my Windows 7 64-bit Intel Core i7 860 @ 2.8 GHz system produces the following output:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace; font-size: small;"&gt;1 SingleLoop duration (ns) = 14019753545&lt;br /&gt;1 SplitLoop&amp;nbsp; duration (ns) = 8972368661&lt;br /&gt;2 SingleLoop duration (ns) = 14162455066&lt;br /&gt;2 SplitLoop&amp;nbsp; duration (ns) = 8887610558&lt;br /&gt;3 SingleLoop duration (ns) = 13800914725&lt;br /&gt;3 SplitLoop&amp;nbsp; duration (ns) = 7271752889&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;To spell it out, if we write to 6 array locations (memory addresses) inside one loop we see that the program takes significantly longer than if we split the work up, and write first to 3 array locations, then to the other 3 locations sequentially.&lt;br /&gt;&lt;br /&gt;By splitting the loop we do much more work yet the program completes in much less time!&amp;nbsp; Welcome to the magic of “write combining”.&amp;nbsp; By using our knowledge of the CPU architecture to fill those buffers properly we can use the underlying hardware to accelerate our code by a factor of two.&lt;br /&gt;&lt;br /&gt;Don’t forget that with hyper-threading you can have 2 threads in competition for these buffers on the same core.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5560209661389175529-3750655872511316079?l=mechanical-sympathy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mechanical-sympathy.blogspot.com/feeds/3750655872511316079/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/07/write-combining.html#comment-form' title='22 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/3750655872511316079'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/3750655872511316079'/><link rel='alternate' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/07/write-combining.html' title='Write Combining'/><author><name>Martin Thompson</name><uri>http://www.blogger.com/profile/15893849163924476586</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>22</thr:total><georss:featurename>London, UK</georss:featurename><georss:point>51.5001524 -0.12623619999999391</georss:point><georss:box>51.322796399999994 -0.39052969999999393 51.6775084 0.1380573000000061</georss:box></entry><entry><id>tag:blogger.com,1999:blog-5560209661389175529.post-2156648036159218160</id><published>2011-07-15T08:37:00.000+01:00</published><updated>2011-07-25T07:31:49.776+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Performance'/><title type='text'>Why Mechanical Sympathy?</title><content type='html'>A little while ago I discovered this wonderful quote by Henry Peteroski:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: left;"&gt;&lt;blockquote&gt;&lt;i&gt;"The most amazing achievement of the computer software industry is its   continuing cancellation of the steady and staggering gains made by the   computer hardware industry."&lt;/i&gt;&lt;/blockquote&gt;&lt;/div&gt;&lt;br /&gt;This observation has been a great source of employment for me over the last two decades.&amp;nbsp; In this series of blog posts I will try and share some of the techniques I've learned for getting the most out of our underlying hardware.&amp;nbsp; The name "Mechanical Sympathy" comes from the great racing driver Jackie Stewart, who was a 3 times world Formula 1 champion.&amp;nbsp; He believed the best drivers had enough understanding of how a machine worked so they could work in harmony with it.&lt;br /&gt;&lt;br /&gt;Why does the software we use today not feel any faster than the DOS based applications we used 20 years ago???&amp;nbsp; It does not have to be this way.&amp;nbsp; As a software developer I want to try and produce software which does justice to the wonderful achievements of our hardware friends.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5560209661389175529-2156648036159218160?l=mechanical-sympathy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mechanical-sympathy.blogspot.com/feeds/2156648036159218160/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/07/why-mechanical-sympathy.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/2156648036159218160'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5560209661389175529/posts/default/2156648036159218160'/><link rel='alternate' type='text/html' href='http://mechanical-sympathy.blogspot.com/2011/07/why-mechanical-sympathy.html' title='Why Mechanical Sympathy?'/><author><name>Martin Thompson</name><uri>http://www.blogger.com/profile/15893849163924476586</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total><georss:featurename>London, UK</georss:featurename><georss:point>51.5001524 -0.12623619999999391</georss:point><georss:box>51.322796399999994 -0.39052969999999393 51.6775084 0.1380573000000061</georss:box></entry></feed>
