“The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.”
--C.A.R. Hoare
Hardware fails. How should a system respond to a failure? One approach is to build
redundancy into the system. For example, NoSQL databases are designed with a large
amount of redundancy. In the rare case of a hardware failure, the system continues
operating from a different server. But redundancy requires constant information
exchange between the components, and that exchange significantly degrades the
performance of the overall system.
Therefore, at Sokol Systems we chose a very different approach.
We subscribe to the philosophy that a simple architecture, co-located components, and high-quality code prevent many system crashes in the first place. In the unlikely event of a crash, a good system needs two things: a very short restart time and restoration of its state. After a failure, Sokol Systems' Asset Record engine restarts in under 2 minutes and replays a day's worth of activity (assuming 80,000 business transactions per second) in under 5 minutes.
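The idea of restoring state by replaying recorded activity can be illustrated with a minimal sketch. This is not the Asset Record engine's actual implementation: the `Transaction`, `AssetRecord`, and `restore` names are hypothetical, and the journal is an in-memory list standing in for whatever durable log a real system would use.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Transaction:
    """Hypothetical journal entry: adjust an account's balance by `amount`."""
    account: str
    amount: int


@dataclass
class AssetRecord:
    """Toy in-memory state, rebuilt purely from the journal after a restart."""
    balances: dict = field(default_factory=dict)

    def apply(self, tx: Transaction) -> None:
        # Applying a transaction is the only way the state ever changes.
        self.balances[tx.account] = self.balances.get(tx.account, 0) + tx.amount


def restore(journal) -> AssetRecord:
    """Start from empty state and replay every journaled transaction in order."""
    state = AssetRecord()
    for tx in journal:
        state.apply(tx)
    return state


# Assumed example data: three transactions recorded before the crash.
journal = [
    Transaction("alice", 100),
    Transaction("bob", 50),
    Transaction("alice", -30),
]
recovered = restore(journal)
```

Because the state is a pure function of the ordered journal, recovery time in a design like this depends mainly on how fast transactions can be re-applied, which is why replay speed matters as much as restart time.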
To summarize, at Sokol Systems we built our resiliency strategy on the following fundamentals: