Architecture Weekly Issue #33. Articles, books, and playlists on architecture and related topics. Every entry has a complexity indicator: 🤟 means hardcore, 👷‍♂️ is technically applicable right away, 🍼 is an introduction to the topic or an overview. Now on Telegram as well.

WARNING 🇺🇦

It's already been 193 days since the start of Russia's crazy, brutal, unjustified war against Ukraine. We condemn this war and want it to stop ASAP. We continue this newsletter so you can advance your skills and help the millions of Ukrainian people in any way possible.

How Discord Stores Billions of Messages 👷‍♂️

Amazing post from the CTO of Discord on storing messages. In the article you will find a business problem statement, the inherited situation, clearly articulated requirements, and a detailed solution. A true example of an architectural blog post. It deserves the first spot in this issue!

How Discord Stores Billions of Messages

Write-ahead logging and the ARIES crash recovery 🤟

As we learned from Designing Data-Intensive Applications, a database's guarantee to either write data durably or not write it at all is crucial in software systems. But underneath that guarantee there is a complex process with tradeoffs between complexity and performance. Kevin Sookocheff wrote a long, detailed post with transaction examples, explaining how it all works under the hood.

Write-ahead logging and the ARIES crash recovery algorithm
How do databases recover from failure?
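
To make the "durably or not at all" idea concrete, here is a minimal sketch of write-ahead logging. It is not ARIES: the real algorithm uses log sequence numbers and separate analysis, redo, and undo phases, while this toy version only replays committed transactions. The class name `TinyWalStore`, the record layout, and the in-memory list standing in for a durable log file are all assumptions made for illustration.

```python
class TinyWalStore:
    """Toy key-value store that logs every change before applying it."""

    def __init__(self, log=None):
        # `log` stands in for the durable log file that survives a crash.
        self.log = log if log is not None else []
        self.data = {}
        self._recover()

    def _recover(self):
        # Redo only the writes of transactions that have a COMMIT record.
        committed = {rec[1] for rec in self.log if rec[0] == "COMMIT"}
        for rec in self.log:
            if rec[0] == "SET" and rec[1] in committed:
                _, _, key, value = rec
                self.data[key] = value

    def commit(self, txn_id, writes):
        # 1. Append every intended change to the log first.
        for key, value in writes.items():
            self.log.append(("SET", txn_id, key, value))
        # 2. The COMMIT record marks the transaction as durable.
        self.log.append(("COMMIT", txn_id))
        # 3. Only now apply the changes to the actual data.
        self.data.update(writes)


store = TinyWalStore()
store.commit(1, {"a": 1, "b": 2})

# Simulate a crash in the middle of transaction 2: its write was logged,
# but the COMMIT record never reached the log.
crashed_log = store.log + [("SET", 2, "a", 99)]
recovered = TinyWalStore(crashed_log)
print(recovered.data)  # {'a': 1, 'b': 2}
```

After the simulated crash, the half-finished transaction 2 leaves no trace in the recovered data, while transaction 1 is fully restored from the log.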

A Data Movement and Processing Platform @ Netflix 👷‍♂️

Netflix originally defined Data Mesh as a fully managed solution for Change Data Capture (CDC). But their understanding evolved to cover getting data from any data source, not only databases. This change required a new architecture for Data Mesh, and that is what the article by Netflix is all about.

Data Mesh — A Data Movement and Processing Platform @ Netflix
By Bo Lei, Guilherme Pires, James Shao, Kasturi Chatterjee, Sujay Jain, Vlad Sydorenko

MySQL to MyRocks Migration in Uber's Distributed Datastores 👷‍♂️

At a scale of petabytes of data, disk storage efficiency became a bottleneck for Uber's databases. The tricks of InnoDB, the default storage engine of MySQL, didn't help much, so they switched to MyRocks, a MySQL storage engine built on RocksDB. But how do you migrate everything to a new engine? Follow the journey in the post on the Uber Blog.

MySQL to MyRocks Migration in Uber’s Distributed Datastores
Uber uses MySQL as the underlying database engine for Schemaless and Docstore, our distributed databases. By default, MySQL uses the most popular InnoDB engine, a B+Tree structure for data storage. MyRocks is a MySQL storage engine that integrates with RocksDB, an open source project. The RocksDB st…

Understanding the Raft consensus algorithm 🤟

Consensus is required in distributed systems for many reasons, including electing a leader and synchronizing replicas. Raft is one such consensus algorithm. Check out the article by Shubheksha Jalan, where its ideas are described and illustrated very well.

Understanding the Raft consensus algorithm: an academic article summary
This post summarizes the Raft consensus algorithm presented in the paper In Search of An Understandable Consensus Algorithm [https://www.usenix.org/system/files/conference/atc14/atc14-paper-ongaro.pdf] by Diego Ongaro and John Ousterhout. All pull quotes are taken from that paper. Credit [https://gi…

Combining cloud CI/CD tools with external cloud mobile app protection services 🍼

Cloud CI/CD is a convenient tool to build and ship mobile applications. However, those applications need protection from various threats, both during delivery and while running on a mobile device. For this, there is a separate set of cloud app protection services. But how risky is it to combine the two? It looks like the danger is real. Find out the details in my new article.

Is it safe to combine CI/CD tools with external cloud mobile app protection solutions?
Cloud CI/CD for mobile development is a popular solution. But developers need to protect their apps, and they use cloud app protection services as well. Is it safe to combine them?

Sagas to maintain data consistency in a microservice architecture 👷‍♂️

Sagas are one way to implement distributed transactions. The tricky part is keeping the data consistent across services and replicas, as each service typically uses its own data storage. Chris Richardson discusses the pros and cons of sagas in this video from the Devoxx UK conference.
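
If you have not met sagas before, the core idea can be sketched in a few lines: every step comes with a compensating action, and when a step fails, the already completed steps are undone in reverse order. This is only an in-process illustration, not the approach from the talk; the helper `run_saga` and the order, stock, and payment steps are hypothetical names, and a real saga would call separate services and persist its progress.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order; undo completed ones on failure."""
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            # Roll back by running compensations in reverse order.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True


events = []


def charge_payment():
    # Hypothetical failing step, standing in for a call to a payment service.
    raise RuntimeError("card declined")


order_saga = [
    (lambda: events.append("order created"), lambda: events.append("order cancelled")),
    (lambda: events.append("stock reserved"), lambda: events.append("stock released")),
    (charge_payment, lambda: events.append("payment refunded")),
]

succeeded = run_saga(order_saga)
print(succeeded)  # False
print(events)     # ['order created', 'stock reserved', 'stock released', 'order cancelled']
```

Note that the failed payment step is never compensated, since it did not complete; only the two earlier steps are undone.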

Patterns for Resilient Architecture 👷‍♂️

We all expect our systems to grow over time. At a certain scale, failures of servers, software, and processes become inevitable. So instead of trying to avoid them, you embrace them and become resilient. To learn how to think about resiliency and which patterns to apply, read the series by Adrian Hornsby.

Patterns for Resilient Architecture — Part 1
The story of embracing failure at scale

Observability Engineering 🍼

The SRE Book by Google tells you that you need to monitor errors, traffic, and resource consumption. But does that really tell you anything business-wise? The three authors of the Observability Engineering book discuss the difference between monitoring and understanding what your system is going through. If you like this video, consider reading the book as well.

Music streaming reference architecture 🍼

A short and fun article on how you could build a music streaming application like Spotify. Not sure about the solution itself, but the requirements list is rather thorough.

Spotify System Architecture

This newsletter is hosted on GCP and uses Mailgun to send emails. The cost is ~$25 per month. Liked it? Consider helping to run this newsletter on Patreon. Big thanks to Nikita, Anatoly, Oleksandr and Dima for already supporting it.