Resiliency

“The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.”

--C.A.R. Hoare

Hardware fails. How should a system respond to a failure? One approach is to build redundancy into the system. For example, NoSQL databases are designed with a large amount of redundancy. In the rare case of a hardware failure, the system continues operating from a different server. But redundancy requires constant information exchange between the components, so the performance of the overall system degrades significantly.

Therefore, at Sokol Systems we chose a very different approach.

We subscribe to the philosophy that a simple architecture, co-located components, and high-quality code will prevent many system crashes in the first place. And in the unlikely event of a crash, a good system needs two things: a very short restart time and restoration of its state. In case of a failure, Sokol Systems' Asset Record engine will restart in under 2 minutes, and replay a day's worth of activity (assuming 80,000 business transactions per second) in under 5 minutes.
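The general idea of restoring state by replaying recorded activity can be illustrated with a minimal sketch. It is not the Asset Record engine's actual implementation; the journal file format, the function names (`append_event`, `replay`, `demo`), and the account/amount event shape are all assumptions made up for illustration. Each event is written durably to an append-only journal before it counts as captured, and after a restart the in-memory state is rebuilt by re-applying the journal from the beginning.

```python
import json
import os
import tempfile
from collections import defaultdict


def append_event(journal_path, event):
    """Durably append one event as a JSON line (hypothetical journal format)."""
    with open(journal_path, "a", encoding="utf-8") as journal:
        journal.write(json.dumps(event) + "\n")
        journal.flush()
        os.fsync(journal.fileno())  # ask the OS to push the write to disk


def replay(journal_path):
    """Rebuild in-memory balances by re-applying every journaled event in order."""
    balances = defaultdict(int)
    if not os.path.exists(journal_path):
        return balances  # nothing was captured yet
    with open(journal_path, encoding="utf-8") as journal:
        for line in journal:
            event = json.loads(line)
            balances[event["account"]] += event["amount"]
    return balances


def demo():
    """Capture three events, 'crash', then restore state from the journal alone."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "journal.log")
        append_event(path, {"account": "A", "amount": 100})
        append_event(path, {"account": "B", "amount": 40})
        append_event(path, {"account": "A", "amount": -30})
        # Simulated crash: all in-memory state is gone; only the journal survives.
        return dict(replay(path))


if __name__ == "__main__":
    print(demo())
```

In a design like this, recovery time is governed by how fast the journal can be read and re-applied, which is one reason a simple, single-process architecture can recover quickly.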

To summarize, at Sokol Systems we have built our resiliency strategy on the following fundamentals:

  • Strive for architectural simplicity.
  • Build only upon a foundation of robust, open-source components and frameworks.
  • Guarantee zero loss of captured events.
  • Provide near-instantaneous recovery in the unlikely event of a system failure.
  • Pay great attention to quality, including good design principles and complete test automation.
  • Provide reliable replication (optional).