Financial systems communicate by sending and receiving vast numbers of messages in many different formats. When people use terms like "vast" I normally think, "really... how many?" So let's quantify "vast" for the finance industry. Market data feeds from financial exchanges typically emit tens or hundreds of thousands of messages per second, and aggregate feeds like OPRA can peak at over 10 million messages per second, with volumes growing year-on-year. This presentation gives a good overview.
In this crazy world we still see significant use of ASCII-encoded presentations, such as FIX tag value, and some slightly more sane binary-encoded presentations like FAST. Some markets even commit the sin of sending out market data as XML! Well, I cannot complain too much, as they have at times provided me a good income writing ultra-fast XML parsers.
Last year the CME, who are a member of the FIX community, commissioned Todd Montgomery, of 29West LBM fame, and me to build the reference implementation of the new FIX Simple Binary Encoding (SBE) standard. SBE is a codec aimed at addressing the efficiency issues in low-latency trading, with a specific focus on market data. The CME, working within the FIX community, have done a great job of coming up with an encoding presentation that can be so efficient. Maybe it is a suitable atonement for the sins of past FIX tag value implementations. Todd and I worked on the Java and C++ implementations, and later we were helped on the .NET side by the amazing Olivier Deheurles at Adaptive. Working on a cool technical problem with such a team is a dream job.
SBE Overview
SBE is an OSI layer 6 presentation for encoding and decoding messages in binary format to support low-latency applications. Of the many applications I profile with performance issues, message encoding/decoding is often the most significant cost. I've seen many applications that spend significantly more CPU time parsing and transforming XML and JSON than executing business logic. SBE is designed to make this part of a system the most efficient it can be. SBE follows a number of design principles to achieve this goal. Adhering to these design principles sometimes means that features available in other codecs are not offered. For example, many codecs allow strings to be encoded at any field position in a message; SBE allows variable-length fields, such as strings, only when grouped at the end of a message.
The SBE reference implementation consists of a compiler that takes a message schema as input and then generates language specific stubs. The stubs are used to directly encode and decode messages from buffers. The SBE tool can also generate a binary representation of the schema that can be used for the on-the-fly decoding of messages in a dynamic environment, such as for a log viewer or network sniffer.
The design principles drive the implementation of a codec that ensures messages are streamed through memory without backtracking, copying, or unnecessary allocation. Memory access patterns should not be underestimated in the design of a high-performance application. Low-latency systems in any language especially need to consider all allocation to avoid the resulting issues in reclamation. This applies for both managed runtime and native languages. SBE is totally allocation free in all three language implementations.
The end result of applying these design principles is a codec that has ~16-25 times greater throughput than Google Protocol Buffers (GPB) with very low and predictable latency. This has been observed in micro-benchmarks and real-world application use. A typical market data message can be encoded, or decoded, in ~25ns compared to ~1000ns for the same message with GPB on the same hardware. XML and FIX tag value messages are orders of magnitude slower again.
The sweet spot for SBE is as a codec for structured data that is mostly fixed-size fields such as numbers, bitsets, enums, and arrays. While it does work for strings and blobs, many may find some of the restrictions a usability issue. Those users would be better off with another codec more suited to string encoding.
Message Structure
A message must be capable of being read or written sequentially to preserve the streaming access design principle, i.e. with no need to backtrack. Some codecs insert location pointers for variable length fields, such as string types, that have to be indirected for access. This indirection comes at a cost of extra instructions plus losing the support of the hardware prefetchers. SBE's design allows for pure sequential access and copy-free native access semantics.
SBE messages have a common header that identifies the type and version of the message body to follow. The header is followed by the root fields of the message, which are all fixed length with static offsets. The root fields are very similar to a struct in C. If the message is more complex then one or more repeating groups, similar in structure to the root block, can follow. Repeating groups can nest other repeating group structures. Finally, variable-length strings and blobs come at the end of the message. Fields may also be optional. The XML schema describing the SBE presentation can be found here.

[Figure 1]
SbeTool and the Compiler
To use SBE it is first necessary to define a schema for your messages. SBE provides a language independent type system supporting integers, floating point numbers, characters, arrays, constants, enums, bitsets, composites, grouped structures that repeat, and variable length strings and blobs.
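For a flavour of what a schema looks like, here is a minimal sketch of a message declaration. The field definitions are invented for illustration, and the namespace declarations and the standard messageHeader composite that a real schema must define are omitted for brevity:

<messageSchema package="example.car" id="1" version="0" byteOrder="littleEndian">
    <types>
        <enum name="BooleanType" encodingType="uint8">
            <validValue name="FALSE">0</validValue>
            <validValue name="TRUE">1</validValue>
        </enum>
    </types>
    <message name="Car" id="1" description="A simple car">
        <field name="serialNumber" id="1" type="uint64"/>
        <field name="modelYear" id="2" type="uint16"/>
        <field name="available" id="3" type="BooleanType"/>
    </message>
</messageSchema>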
A message schema can be input into the SbeTool and compiled to produce stubs in a range of languages, or to generate binary metadata suitable for decoding messages on-the-fly.
java [-Doption=value] -jar sbe.jar <message-declarations-file.xml>
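The tool's behaviour can be adjusted with system properties, for example to choose the output directory and target language. As a sketch, the property names below are from the SBE tool but the accepted values vary by release, and the schema file name is invented:

java -Dsbe.output.dir=generated -Dsbe.target.language=Cpp -jar sbe.jar car-schema.xml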
SbeTool and the compiler are written in Java. The tool can currently output stubs in Java, C++, and C#.
Programming with Stubs
A full example of messages defined in a schema with supporting code can be found here. The generated stubs follow a flyweight pattern with instances reused to avoid allocation. The stubs wrap a buffer at an offset and then read it sequentially and natively.
// Write the message header first
MESSAGE_HEADER.wrap(directBuffer, bufferOffset, messageTemplateVersion)
              .blockLength(CAR.sbeBlockLength())
              .templateId(CAR.sbeTemplateId())
              .schemaId(CAR.sbeSchemaId())
              .version(CAR.sbeSchemaVersion());

// Then write the body of the message
car.wrapForEncode(directBuffer, bufferOffset)
   .serialNumber(1234)
   .modelYear(2013)
   .available(BooleanType.TRUE)
   .code(Model.A)
   .putVehicleCode(VEHICLE_CODE, srcOffset);

Messages can be written via the generated stubs in a fluent manner. Each field appears as a generated pair of methods to encode and decode.
// Read the header and lookup the appropriate template to decode
MESSAGE_HEADER.wrap(directBuffer, bufferOffset, messageTemplateVersion);

final int templateId = MESSAGE_HEADER.templateId();
final int actingBlockLength = MESSAGE_HEADER.blockLength();
final int schemaId = MESSAGE_HEADER.schemaId();
final int actingVersion = MESSAGE_HEADER.version();

// Once the template is located then the fields can be decoded.
car.wrapForDecode(directBuffer, bufferOffset, actingBlockLength, actingVersion);

final StringBuilder sb = new StringBuilder();
sb.append("\ncar.templateId=").append(car.sbeTemplateId());
sb.append("\ncar.schemaId=").append(schemaId);
sb.append("\ncar.schemaVersion=").append(car.sbeSchemaVersion());
sb.append("\ncar.serialNumber=").append(car.serialNumber());
sb.append("\ncar.modelYear=").append(car.modelYear());
sb.append("\ncar.available=").append(car.available());
sb.append("\ncar.code=").append(car.code());
The generated code in all languages gives performance similar to casting a C struct over the memory.
On-The-Fly Decoding
The compiler produces an intermediate representation (IR) for the input XML message schema. This IR can be serialised in the SBE binary format to be used for later on-the-fly decoding of messages that have been stored. It is also useful for tools, such as a network sniffer, that will not have been compiled with the stubs. A full example of the IR being used can be found here.
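As a rough sketch of what driving the OTF decoder looks like in Java (the class and method names below are from the reference implementation's otf package as I recall them, and the IR file name and listener are hypothetical, so treat this as illustrative rather than definitive):

// Deserialise the IR previously generated by SbeTool
final Ir ir = new IrDecoder("car-schema.sbeir").decode();

// Decode the header to identify which message follows
final OtfHeaderDecoder headerDecoder = new OtfHeaderDecoder(ir.headerStructure());
final int templateId = headerDecoder.getTemplateId(buffer, offset);
final int actingVersion = headerDecoder.getSchemaVersion(buffer, offset);
final int blockLength = headerDecoder.getBlockLength(buffer, offset);

// Walk the message fields token by token, firing callbacks on a TokenListener
OtfMessageDecoder.decode(
    buffer,
    offset + headerDecoder.encodedLength(),
    actingVersion,
    blockLength,
    ir.getMessage(templateId),
    new PrintingTokenListener()); // hypothetical TokenListener implementation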
Direct Buffers
SBE, via Agrona, provides an abstraction in Java, with the MutableDirectBuffer class, to work with buffers that are byte[], heap or direct ByteBuffer buffers, or off-heap memory addresses returned from Unsafe.allocateMemory(long) or JNI. In low-latency applications, messages are often encoded/decoded in memory-mapped files via MappedByteBuffer and thus can be transferred to a network channel by the kernel, avoiding user-space copies.
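A minimal sketch of wrapping the different buffer types with Agrona's UnsafeBuffer implementation (package names as used at the time; later Agrona releases moved them to org.agrona):

import java.nio.ByteBuffer;
import uk.co.real_logic.agrona.MutableDirectBuffer;
import uk.co.real_logic.agrona.concurrent.UnsafeBuffer;

// The same generated stub code works unchanged over any of these
final MutableDirectBuffer onHeap  = new UnsafeBuffer(new byte[4096]);
final MutableDirectBuffer offHeap = new UnsafeBuffer(ByteBuffer.allocateDirect(4096));
// An UnsafeBuffer can also wrap a MappedByteBuffer over a file region, or a
// raw address and length pair returned from Unsafe.allocateMemory(long)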
C++ and C# have built-in support for direct memory access and do not require such an abstraction as the Java version does. A DirectBuffer abstraction was added for C# to support endianness and encapsulate the unsafe pointer access.
Message Extension and Versioning
SBE schemas carry a version number that allows for message extension. A message can be extended by adding fields at the end of a block. To preserve backwards compatibility, fields cannot be removed or reordered.
Extension fields must be optional, otherwise a newer template reading an older message would not work. Templates carry metadata for min, max, null, timeunit, character encoding, etc., which are accessible via static (class-level) methods on the stubs.
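For example, a version 2 schema might extend the Car message above with an optional field stamped with the version that introduced it (the field itself is invented for illustration; sinceVersion and presence are attributes defined by the SBE schema):

<!-- Added in schema version 2: a version 2 decoder reading a
     version 1 message will see this field's null value -->
<field name="discountPercent" id="20" type="uint8"
       presence="optional" sinceVersion="2"/>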
Byte Ordering and Alignment
The message schema allows for precise alignment of fields by specifying offsets. Fields are by default encoded in little-endian form unless otherwise specified in a schema. For maximum performance, native encoding with fields on word-aligned boundaries should be used. The penalty for accessing non-aligned fields on some processors can be very significant. For alignment one must consider the framing protocol and buffer locations in memory.
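As a sketch of how offsets can be used for alignment (the message and fields are invented for illustration; offset and blockLength are attributes defined by the SBE schema):

<!-- 8-byte prices are placed on word-aligned boundaries and the block
     length is padded so repeating entries would remain aligned -->
<message name="Quote" id="2" blockLength="24">
    <field name="bid"  id="1" type="int64" offset="0"/>
    <field name="ask"  id="2" type="int64" offset="8"/>
    <field name="size" id="3" type="int32" offset="16"/>
</message>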
Message Protocols
I often see people complain that a codec cannot support a particular presentation in a single message. However, this can often be addressed with a protocol of messages. Protocols are a great way to split an interaction into its component parts; these parts are then often composable for many interactions between systems. For example, the IR implementation of schema metadata is more complex than can be supported by the structure of a single message. We encode IR by first sending a template message providing an overview, followed by a stream of messages, each encoding the tokens from the compiler IR. This allows for the design of a very fast OTF decoder which can be implemented as a threaded interpreter with much less branching than the typical switch-based state machines.
Protocol design is an area that most developers don't seem to get an opportunity to learn. I feel this is a great loss. The fact that so many developers will call an "encoding" such as ASCII a "protocol" is very telling. The value of protocols is so obvious when one gets to work with a programmer like Todd who has spent his life successfully designing protocols.
Stub Performance
The stubs provide a significant performance advantage over the dynamic OTF decoding. For accessing primitive fields we believe the performance is reaching the limits of what is possible from a general purpose tool. The generated assembly code is very similar to what a compiler will generate for accessing a C struct, even from Java!
Regarding the general performance of the stubs, we have observed that C++ has a very marginal advantage over Java, which we believe is due to the runtime-inserted safepoint checks in Java. The C# version lags a little further behind due to its runtime not being as aggressive with inlining methods as the Java runtime. Stubs for all three languages are capable of encoding or decoding typical financial messages in tens of nanoseconds. This effectively makes the encoding and decoding of messages almost free for most applications relative to the rest of the application logic.
Feedback
This is the first version of SBE and we would welcome feedback. The reference implementation is constrained by the FIX community specification. It is possible to influence the specification, but please don't expect pull requests to be accepted that go significantly against it. Support for JavaScript, Python, Erlang, and other languages has been discussed and would be very welcome.
Update: 08-May-2014
Thanks to feedback from Kenton Varda, the creator of GPB, we were able to improve the benchmarks to get the best performance out of GPB. Below are the results for the changes to the Java benchmarks.
Optimising the C++ GPB examples showed approximately a doubling of throughput compared to the initial results. It should be noted that to get performance improvements with GPB you often have to do the opposite in Java compared to C++, such as allocating objects rather than reusing them.
Before GPB Optimisation:
Mode Thr Cnt Sec Mean Mean error Units
[exec] u.c.r.protobuf.CarBenchmark.testDecode thrpt 1 30 1 462.817 6.474 ops/ms
[exec] u.c.r.protobuf.CarBenchmark.testEncode thrpt 1 30 1 326.018 2.972 ops/ms
[exec] u.c.r.protobuf.MarketDataBenchmark.testDecode thrpt 1 30 1 1148.050 17.194 ops/ms
[exec] u.c.r.protobuf.MarketDataBenchmark.testEncode thrpt 1 30 1 1242.252 12.248 ops/ms
[exec] u.c.r.sbe.CarBenchmark.testDecode thrpt 1 30 1 10436.476 102.114 ops/ms
[exec] u.c.r.sbe.CarBenchmark.testEncode thrpt 1 30 1 11657.190 65.168 ops/ms
[exec] u.c.r.sbe.MarketDataBenchmark.testDecode thrpt 1 30 1 34078.646 261.775 ops/ms
[exec] u.c.r.sbe.MarketDataBenchmark.testEncode thrpt 1 30 1 29193.600 443.638 ops/ms

After GPB Optimisation:
Mode Thr Cnt Sec Mean Mean error Units
[exec] u.c.r.protobuf.CarBenchmark.testDecode thrpt 1 30 1 619.467 4.429 ops/ms
[exec] u.c.r.protobuf.CarBenchmark.testEncode thrpt 1 30 1 433.711 10.364 ops/ms
[exec] u.c.r.protobuf.MarketDataBenchmark.testDecode thrpt 1 30 1 2088.998 60.619 ops/ms
[exec] u.c.r.protobuf.MarketDataBenchmark.testEncode thrpt 1 30 1 1316.123 19.816 ops/ms
Throughput msg/ms - Before GPB Optimisation

Test               | Protocol Buffers |       SBE | Ratio
-------------------|------------------|-----------|------
Car Encode         |          326.018 | 11657.190 |  35.8
Car Decode         |          462.817 | 10436.476 |  22.6
Market Data Encode |         1242.252 | 29193.600 |  23.5
Market Data Decode |         1148.050 | 34078.646 |  29.7
Throughput msg/ms - After GPB Optimisation

Test               | Protocol Buffers |       SBE | Ratio
-------------------|------------------|-----------|------
Car Encode         |          433.711 | 11657.190 |  26.9
Car Decode         |          619.467 | 10436.476 |  16.8
Market Data Encode |         1316.123 | 29193.600 |  22.2
Market Data Decode |         2088.998 | 34078.646 |  16.3
Martin, thank you for the article. Could you talk a bit more about this "We encode IR by first sending a template message providing an overview, followed by a stream of messages, each encoding the tokens from the compiler IR. This allows for the design of a very fast OTF decoder which can be implemented as a threaded interrupter with much less branching than the typical switch based state machines." Especially interested in the "threaded interrupter vs Switch based state machine" bit.
Might this be "threaded interpreter", which is an alternative to a switch-based interpreter? I always liked Forth, and apparently it can be more CPU-cache friendly (see stuff at http://www.complang.tuwien.ac.at/projects/interpreters.html)
ReplyDeleteRather than encode the IR tokens as a finger tree we encode them as a stream. This stream can then be feed into a parser that, even with a Java implementation, can be implemented without using a single big switch statement. Too much unpredictable branching can really hurt CPU throughput. Branching is OK provided it is mostly predictable based on past statistics. By using recursion in Java it is also possible to make the OTF decoder allocation free. Recursion in this case is safe because we only need to recurse into nested repeating groups.
Debug the following example to see the IR being used and the parser in action.
https://github.com/real-logic/simple-binary-encoding/blob/master/examples/java/uk/co/real_logic/sbe/examples/OtfExample.java
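To illustrate the dispatch difference being discussed, here is a hedged Java sketch (the handler names and the compile step are hypothetical, not SBE's actual OTF internals):

// Switch-based state machine: one large, hard-to-predict branch per token
for (final Token token : tokens)
{
    switch (token.signal())
    {
        case BEGIN_FIELD: onField(token); break; // onField/onGroup are hypothetical handlers
        case BEGIN_GROUP: onGroup(token); break;
        default: break;
    }
}

// Threaded interpreter: resolve each token to its handler once, up front, so
// the decode loop becomes a predictable indirect call instead of a big switch
final Runnable[] program = compileToHandlers(tokens); // hypothetical compile step
for (final Runnable action : program)
{
    action.run();
}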
Thanks Martin. I looked at the project a bit more in detail and had a brief thread on the Cap'n Proto boards too. So even though you do mention it I think a fair comparison would highlight the difference in features especially compared to something like Cap'n Proto where a lot of the same principles are used. Two things especially stick out:
i) No bounds checking in the CPP code as far as I can tell. This means that you probably only support trusted sources. Seems like you could perform heartbleed like attacks if you accept messages from the internet. Maybe the responsibility for these checks lies somewhere else?
ii) The sequential access requirement is a killer for some projects. You mention this very clearly and I understand that this is the norm for trading data, but some applications just can't live with this constraint. For example imagine I want to represent my objects using SBE in a replicated object database. One of the replicas gets a query for only a particular field (and this is unpredictable), I need to iterate every field just to satisfy that query. Further the CPP bindings at least don't prevent you from shooting yourself in the foot. You could easily call car.available() before car.modelYear() and it won't complain.
Cap'n Proto is a good project. There are many others. We just picked GPB as a comparison because of how commonly it is used, to show people a difference. I could have chosen ASN.1 but not so many people know that.
i) The bounds checking reaction is fascinating in how people so misunderstand Heartbleed and the like. Any codec could be used to window over a buffer from the network. However, any externally sourced input should be validated; this is the crux of the problem.
I think a check similar to the Java and C# side should be added to help prevent people being silly, but this is not a security issue. If people need this protection for security then I'd not trust them with any other part of a secure app. If you take your thinking to its conclusion then char* is not allowed in C/C++.
ii) The sequential access is actually more flexible than I outlined. Best that you are totally sequential, but SBE can allow arbitrary access to any field within a given block. Think C structures and how they can move over memory. Each block has a C structure over it. If arbitrary access is required to fields across blocks then maybe you should be considering another codec and accept the costs that implementing that feature requires.
Re (i) If I send someone a buffer saying it can be cast to struct foo {int length; char* data} and they blindly believe the length part and then send the data back to me when requested later, it is a problem. Of course it's their fault and they should have validated the data. In SBE's case, since a separate part of the program (networking code) is allocating the buffer (char*) and knows its length, it needs to be able to tell the decoding code (which is generated) to not exceed the bounds when returning data from the getters. The decoding logic needs to know the size of the buffer and ensure that it doesn't reach for something out of bounds. To your point about char*, it is a pain isn't it? C/C++ allow a lot of things, including returning pointers to stack-allocated data; that doesn't mean it's a good idea.
Re (ii) I see, good to know. So you just prefer sequential access because of locality within a block, but don't require it. That seems pretty workable.
Re re i)
I'm not disagreeing that safety can be improved by bounds checking the access, and do believe it can be done efficiently in the C++ case without noticeable impact for this class of problem. I agree with a sensible level of bounds checking for writing robust and safe code. I also believe in native languages people should have a choice. For the record I think the default should be bounds checking on for SBE. However there is no guarantee that the code calling SBE will pass the correct length.
I just cannot accept that it is an automatic security issue. There are a lot of responsibilities that come with native programming. Heartbleed, which everyone keeps quoting because it is what is front of mind, requires a service to take inputs from the network and return a range of memory without validating the inputs. Taking a packet off the network and reading its contents is a different thing. This is not blindly returning data to the network, nor is it allowing an overrun on write that corrupts the stack. Unfortunately our biases lead us to always want to fight the last war rather than look at the big picture.
I think bounds checking in this case helps developers not make stupid mistakes, like using an insufficiently large buffer when reading from the network; however, it does not prevent them from creating security issues.
Martin, I am talking about the returning case and not the storing case. I am exaggerating here but this is the problem I am talking about:
//Pseudo code.
char* data = malloc(someSize);
read(data, someSize); // All is fine till now.
Decoder* decoder = passDataToSBE(data, someSize); // SBE determines that the data is of the form struct Foo { int n; char* data } where n is more than someSize.
// Now the client demands to read the some byte (in a separate request) that it just wrote and the server depends on SBE for the bytes. Server does this:
write(decoder.getDataAtIndex(i), sizeOf(char));
// SBE returns any byte (as long as it is less than n?) , doesn't check if it goes beyond the boundary of the buffer (of someSize) it was supplied.
As I said I think bounds checking should be the default for C++ and it will help with this sort of scenario. However, and this is a big however, if this is the sort of programming people are doing when responding to network based requests then way more problems are about to come your way. Absolutely no checking at a semantic level of input to output parameters is beyond dumb.
Just think how brain dead it would be to create a method such as decoder.getDataAtIndex(i). The decoder could have checks, but equally the buffer capacity could be passed in wrong by the idiot who wrote such a method. This is the level of craziness that caused Heartbleed. I totally get your point about the value of bounds checking and agree. Do you get mine, that this level of discussion deflects from the real issues around designing secure code?
I totally get your point. My example was intentionally horrendous and I don't claim that code of that quality can be protected merely by SBE or other libraries including extensive bounds checking. The interface of someMethod(void* foo, int length) is error prone and, like you pointed out, someone could pass the wrong buffer capacity. For networking code in serious native projects, I end up using higher-level classes similar to Netty's ByteBuf/Java ByteBuffer and wrap read/write calls in them so as to minimize errors like passing the wrong buffer capacity. But that doesn't protect people completely either. Native code, like you said, is dangerous and even projects by experienced programmers need serious security audits, but my point (and you seem to agree) is that in this day and age bounds checking is a sensible default for most libraries. I do agree with you that bounds checking is a tree-level strategy, and we need to look at the forest to understand how to write secure code.
Thanks Rajiv. It is good to help make people aware of the wider issues surrounding security.
DeleteTodd has now added bounds checking to the C++ codec.
Hi. I just read your blog on SBE. I would like to know how this compares to these: https://github.com/eishay/jvm-serializers since ProtoBuf isn't quite the "fastest contender out there" when it comes to Java serialisation.
The last time I looked, these benchmarks used POJOs which generate a lot of garbage and thus do not fit with the low-latency goals of SBE. If you know of a good low-latency targeted benchmark for encoding I'd love to give it a try.
DeleteWe know Protobuf is not the fastest. We just picked it for illustration because it is one of the most common. I know the likes of Kryo is much faster Protobuf. The Kryo folk have tried techniques from my blog in the past to great success.
OK. I do not know of another "well-known" benchmark about Java serialisation. I think Kryo is, or used to be, usually thought to be generally "the fastest". I think one can serialize manually with it, so I suspect a benchmark against "manual Kryo" would be "interesting".
Anyway, I had another question. I was wondering if the benefits achieved by designing to get the best out of the CPU would "port" to JavaScript? In other words, would a JavaScript version also show (significantly) lower latency than "popular" JS serialisation APIs, or is it only interesting for "compatibility"?
We have discussed doing a JavaScript implementation of SBE. This would be possible given the support for typed binary arrays and would be a great performance boost over websockets.
I think it would be great to have JavaScript and Python support.
Actually fast-serialization is faster than Kryo for many (most) test cases (depends on data) and is mostly compatible with the original JDK implementation.
I am currently adding a raw offheap/byte[] interface as the use of JDK streams is a performance killer. It's also possible to do zero-copy serialization then.
On my i7 laptop (2GHz) I get ~460K encode, ~480K decode/second (see https://github.com/RuedigerMoeller/fast-serialization/blob/master/src/test/ser/CarBench.java).
Actually, replacing String fields with byte[] arrays (as your bench does) should speed this up even further (this probably hurts protobuf also ^^); however, in a real-world application I'd expect data coming in as "strings" rather than byte[] objects (except for very performance critical stuff).
"
static
{
MAKE = "MAKE".getBytes(Car.makeCharacterEncoding());
MODEL = "MODEL".getBytes(Car.modelCharacterEncoding());
ENG_MAN_CODE = "abc".getBytes(Engine.manufacturerCodeCharacterEncoding());
VEHICLE_CODE = "abcdef".getBytes(Car.vehicleCodeCharacterEncoding());
}
"
should be moved inside the benchmark imho.
An advantage of serialization is low effort. Frequently there is no explicit data-to-message conversion, as one may serialize application data structures without doing any in-between transformation/processing. Vice versa, an application frequently works directly with received deserialized messages, with rarely a need for 'parsing'/transformation. It really shines for generic approaches (RMI-alike stuff) and if complex interlinked data structures are transmitted (reference restoration).
Of course the receiving side will have to allocate the deserialized objects, so it's probably not well suited for the ultra-low jitter/latency arena.
Regarding a port to JS: I'd expect JS-world to be "convenience first", so somewhat handcrafted approaches probably would not get too much love (at least from the front end dev's perspective ;) ).
follow up: replacing String by byte[] + replaing enums by byte yields:
encoding: ~530K/s, decoding ~450K/s
same machine proto/sbe:
[java] Benchmark Mode Samples Score Score error Units
[java] u.c.r.protobuf.CarBenchmark.testDecode thrpt 30 464020,055 5002,836 ops/s
[java] u.c.r.protobuf.CarBenchmark.testEncode thrpt 30 315095,113 5463,022 ops/s
[java] u.c.r.protobuf.MarketDataBenchmark.testDecode thrpt 30 1730058,470 29177,252 ops/s
[java] u.c.r.protobuf.MarketDataBenchmark.testEncode thrpt 30 1233034,889 55036,042 ops/s
[java] u.c.r.sbe.CarBenchmark.testDecode thrpt 30 7190887,194 185086,616 ops/s
[java] u.c.r.sbe.CarBenchmark.testEncode thrpt 30 7517854,307 192425,267 ops/s
[java] u.c.r.sbe.MarketDataBenchmark.testDecode thrpt 30 26663111,435 850010,273 ops/s
[java] u.c.r.sbe.MarketDataBenchmark.testEncode thrpt 30 23562488,790 658661,182 ops/s
Any example of decoding a byte array? I'm listening to the instrument feed from CME. I've tried the following, but templateId is always 0?
ByteBuffer encodedMsgBuffer = ByteBuffer.wrap(data, 0, data.length);
encodedMsgBuffer.order(ByteOrder.BIG_ENDIAN);
DirectBuffer buffer = new DirectBuffer(encodedMsgBuffer);
Any help will be much appreciated...
The protocol is LITTLE_ENDIAN by default.
You should reach out to your support contact at the CME. I cannot provide support for their data feed.
I've tried LITTLE_ENDIAN with the same result. Any ideas?
This is far too little information to know what the issue might be. Can you please contact your customer support at the CME.
Can I provide my complete source code then? I've also contacted CME but apparently they do not support language specific implementations.
I do not work for the CME or provide support for their services.
Is SBE compatible with Java 8?
Yes. SBE does work with Java 8.
Thank you for your reply. I notice SBE makes use of the Java Unsafe package. I see all those comments that mention it could be changed from version to version. What do you think about this compatibility issue?
DeleteBTW, do you have any version of SBE that supports java 1.6?
Thank you
A significant number of projects now use Unsafe, plus the core java.util.concurrent classes. There are plans to find an alternative for Java 9. SBE can look to support that alternative at that time.
SBE is designed for low-latency and we do not intend to support Java 6. If someone wanted to build a low-latency application, then one of the first things to do is upgrade to Java 7 or 8 for the performance benefits it brings, and not stay on an old unsupported version.
Martin, Is it possible to access fields by offset without reading all fields in the message? It seems like the functionality is missing from the first look to a generated Java code. Thanks for considering a new feature (if it is missing)!
It was a design decision to stream through messages so the case of reading all fields is fast. Complex messages allowing arbitrary access would require a dictionary.
DeleteBTW you can access in any order and only the fields you want within any block, where block is root fields or each iteration of a repeating group.
Martin, would you be open to a patch offering the ability to have default values for fields? The reason to add them would be to introduce the concept of forward compatibility, as with Avro. Any thoughts on supporting an official set of resolution rules like Avro's?
References:
- http://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
- http://docs.confluent.io/1.0.1/avro.html#serialization-and-evolution
- http://avro.apache.org/docs/1.7.7/spec.html#Schema+Resolution
We would happily consider pull requests to SBE.
Hi Martin, do you remember the details of the machine the benchmarks were run on + were they using all cores or just one?
Single core on an Ivy Bridge 2.2 GHz Processor (i7-3632QM)
How does SBE compare with ASN.1 BER and DER?
I have an SBE project that builds successfully, but I cannot find the C++ code.
There are a lot of Java classes output in
\simple-binary-encoding-master\sbe-benchmarks\build\generated\uk\co\real_logic\sbe\benchmarks\fix.
The Java classes import uk.co.real_logic.agrona.concurrent.UnsafeBuffer; they are Java style.
If my project is a C++ project, how can I build the SBE project to output C++ classes, so I can use them in my C++ project?
Look at the bottom of this page for instructions.
https://github.com/real-logic/simple-binary-encoding
Then find the binaries under
./cppbuild
Well written, very interesting; too bad it does not reflect the Java nightmare the generator code is, as well as the horror of the code it produces for C++ (in terms of readability, performance, maintainability, bloat). My manager unfortunately is sold on this piece of shit, so I am stuck supporting it. Simple example: the so-called token builder, whotf wrote that, it is impossible to troubleshoot. Don't believe me? Try to find (under 5 minutes) the existing bug in the code where individual values for an enum lose their description attribute values while being read from an XML config.
If you have some concrete examples of bugs then please file them as issues. https://github.com/real-logic/simple-binary-encoding/issues
Delete1. Do I have to maintain separate message schema for Little Endian servers and Big Endian servers?
2. When I use SBE with Aeron, is it always required to run on a Little Endian server as it said on design assumptions? (https://github.com/real-logic/aeron/wiki/Protocol-Specification#design-assumptions)
1. You do not need separate schemas.
2. Aeron and SBE can run on little or big endian CPUs and do the necessary conversions. Many existing network protocols assumed big endian but this is no longer the most common platform. Things evolve.
I don't really understand the point of designing SBE when ITCH must be much faster and simpler to decode?
SBE is more general purpose than ITCH. ITCH was designed for a very specific use case.
How does SBE compare to FAST?