Showing posts with label DDD. Show all posts
Showing posts with label DDD. Show all posts

Friday, 2 September 2011

Modelling Is Everything

I’m often asked, “What is the best way to learn about building high-performance systems”? There are many perfectly valid answers to this question but there is one thing that stands out for me above everything else, and that is modelling. Modelling what you need to implement is the most important and effective step in the process. I’d go further and say this principle applies to any development and the rest is just typing :-)

Domain Driven Design (DDD) advocates modelling the domain and expressing this model in code as fundamental to the successful delivery and ongoing maintenance of software. I wholeheartedly agree with this. How often do we see code that is an approximation of the problem domain? Code that exhibits behaviour which approximates to what is required via inappropriate abstractions and mappings which just about cope. Those mappings between what is in the code and the real domain are only contained in the developers’ heads and this is just not good enough.

When requiring high-performance, code for parts of the system often have to model what is happening with the CPU, memory, storage sub-systems, or network sub-systems. When we have imperfect abstractions on top of these domains, performance can be very adversely affected. The goal of my “Mechanical Sympathy” blog is to peek at what is under the hood so we can improve our abstractions.

What is a Model?

A model does not need to be the result of a 3-year exercise producing UML. It can be, and often is best as, people communicating via various means including speech, drawings, illustrations, metaphors, analogies, etc, to build a mental model for shared understanding. If an accurate and distilled understanding can be reached then this model can be turned into code with great results.

Infrastructure Domain Models

If developers writing a concurrent framework do not have a good model of how a typical cache sub-system works, i.e. it uses message passing to exchange cache lines, then the framework is unlikely to perform well or be correct. If their code drives the cache sub-system with mechanical sympathy and understanding, it is less likely to have bugs and more likely to perform well.

It is much easier to predict performance from a sound model when coming from an understanding of the infrastructure for the underlying platform and its published abilities. For example, if you know how many packets per second a network sub-system can handle, and the size of its transfer unit, then it is easy to extrapolate expected bandwidth. With this model based understanding we can test our code for expectations with confidence.

I’ve fixed many performance issues whereby a framework treated a storage sub-system as stream-based when it is really a block-based model. If you update part of a file on disk, the block to be updated must be read, the changes applied, and the results written back. Now if you know the system is block based and the boundaries of the blocks, you can write whole blocks back without incurring the read, modify, write back cycle replacing these actions with a single write. This applies even when appending to a file as the last block is likely to have been partially written previously.

Business Domain Models

The same thinking should be applied to the models we construct for the business domain. If a business process is modelled accurately, then the software will not surprise its end users. When we draw up a model it is important to describe the relationships for cardinality and the characteristics by which they will be traversed. This understanding will guide the selection of data structures to those best suited for implementing the relationships. I often see people use a list for a relationship which is mostly searched by key, for this case a map could be more appropriate. Are the entities at the other end of a relationship ordered? A tree or skiplist implementation may be a better option.

Identity

Identity of entities in a model is so important. All models have to be entered in some way, and this normally starts with an entity from which to walk. That entity could be “Customer” by customer ID but could equally be “DiskBlock” by filename and offset in an infrastructure domain. The identity of each entity in the system needs to be clear so the model can be accessed efficiently. If for each interaction with a model we waste precious cycles trying to find our entity as a starting point, then other optimisations can become almost irrelevant. Make identity explicit in your model and, if necessary, index entities by their identity so you can efficiently enter the model for each interaction.

Refine as we learn

It is also important to keep refining a model as we learn. If the model grows as a series of extensions without refining and distilling, then we end up with a spaghetti mess that is very difficult to manage when trying to achieve predictable performance. Never mind how difficult it is to maintain and support. Everyday we learn new things. Reflect this in the model and keep it up to date.

Implement no more, but also no less, than what is needed!

The fastest code is code that does just what is needed and no more. Perform the instructions to complete the task and no more. Really fast code is normally not a weird mess of bit-shifting and complier tricks. It is best to start with something clean and elegant. Then measure to see if you are within performance targets. So often this will be sufficient. Sometimes performance will be a surprise. You then need to apply science to test and measure before jumping to conclusions. A profiler will often tell you where the time is being taken. Once the basic modelling mistakes and assumptions have been corrected, it usually takes just a little mechanical sympathy to reach the performance goal. Unused code is waste. Try not to create it. If you happen to create some, then remove it from your codebase as soon as you notice it.

Conclusion

When cross-functional requirements, such as performance and availability, are critical to success, I’ve found the most important thing is to get the model correct for the domain at all levels. That is, take the principles of DDD and make sure your code is an appropriate reflection of each domain. Be that the domain of business applications, or the domain of interactions with infrastructure, I’ve found modelling is everything.

Saturday, 20 August 2011

Code Refurbishment

Within our industry we use a huge range of terminology.  Unfortunately we don’t all agree on what individual terms actually mean.  I so often hear people misuse the term “Refactoring” which has come to make the business in many organisations recoil in fear.  The reason for this fear I’ve observed is because of what people often mean when misusing this term.

I feel we are holding back our industry by not being disciplined in our use of terminology.  If one chemist said to another chemist “we are about to perform titration”, both would have a good idea what is involved.  I believe computing is still a very immature science.  As our subject matures hopefully we will become more precise and disciplined in our use of terminology and thus make our communication more accurate and effective.

Refactoring is a very useful technique for improving code quality and clarity.  To be precise it is a behaviour preserving change that improves a code base for future maintenance and understanding.  A good example would be extracting a method to remove code duplication and applying this method at every site of the duplication, thus removing the duplication.  Refactoring was first discussed in the early 1990s and became mainstream after Martin Fowler’s excellent “Refactoring” book in 1999.

Refactoring involves making a number of small internal changes to the code structure.  These changes will typically not have any external impact.  Well written unit tests that just assert externally observable behaviour will not change when code is refactored.  If the external behaviour of code is changing when the structure is being changed then this is not refactoring.

Now, why do our business folk recoil in fear when this simple and useful technique of “refactoring” is mentioned?  I believe this is because developers are actually talking about a much more extensive structural redevelopment technique that does not have a common term.  These structural changes are often not a complete ground-up rewrite because much of the existing code will be reused.  The reason the business folk have come to recoil is that they fear we are about to head off into uncharted waters with no idea of how long things will take and if any value will come out of the exercise.

This example of significant structural change reminds me of when a bar or restaurant gets taken over by new management.  The new management often undertake a refurbishment exercise to make the place more appealing and suitable for the customers they are targeting.  A lot of the building will be preserved and reused thus greatly reducing the costs of a complete rebuild.  In my experience when developers use the term “refactoring” what they really mean is that some module, or bounded context, in a code base is about to undergo significant refurbishment.  If we define this term, and agree the goal and value to the business, we may be able to better plan and manage our projects.

These code refurbishment exercises should have clear goals defined at the outset and all change must be tested against these goals.   For example, we may have discovered that code is not a true reflection of the business domain after new insights.  These insights may have been gleaned over a period of time and the code has grown out of step to become an approximation of what the business requires.  While performing Domain Driven Design the penny may drop with the essence of the business model becoming clear.  After this clarity of understanding the code may need a major overhaul to align it with this new understanding of the business.  Code can also drift from being a distilled model of the business domain if quick hacks are put in place to meet a deadline.  Over time these hacks can build on each other until the model no longer describes the business, it just about makes itself useful by side effect.  During this exercise our tests are likely to see significant change as we tighten up the specification for our new improved understanding of the business domain.

A code refurbishment is worthwhile to correct the core domain if it's about to undergo significant further development, or if a module is business critical and needs to be occasionally corrected under production pressure to preserve revenue generation.

I’m interested to know if other folk have observed similar developments and if you think refinement of this concept would be valuable?