Architecture Weekly Issue #17. Articles, books, and playlists on architecture and related topics. Every entry has a complexity indicator: 🤟 means hardcore, 👷‍♂️ means technically applicable right away, and 🍼 means an introduction to the topic or an overview. Now on Telegram as well.

WARNING 🇺🇦

It has already been two and a half months of the crazy, inhuman, unjustified war of Russia against Ukraine. We condemn this war and want it to stop ASAP. We continue this newsletter so you can advance your skills and help the millions of Ukrainian people in any way possible.

Project Loom to build more reliable distributed systems 👷‍♂️

A story by James Baker on FoundationDB's approach to testing, which virtualizes everything to get deterministic tests of a distributed system, and on how Java's Project Loom can help you build the same for testing your own distributed systems. At the end, there's an example of a homemade Raft implementation.

Using Java’s Project Loom to build more reliable distributed systems · James Baker
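
To make the idea of deterministic tests concrete, here is a minimal sketch in Python rather than Java. It is not the article's code and does not use Loom: Python generators stand in for virtual threads, and the names run_simulation, node, and make_cluster are hypothetical. The only point it illustrates is that when every scheduling decision comes from a seeded random generator on a single thread, the same seed replays the same interleaving.

```python
import random
from typing import Callable, Generator, List

Task = Generator[None, None, None]


def run_simulation(seed: int, make_tasks: Callable[[List[str]], List[Task]]) -> List[str]:
    """Run cooperative tasks on one thread, picking the next task with a seeded RNG."""
    rng = random.Random(seed)
    log: List[str] = []
    runnable = make_tasks(log)
    while runnable:
        task = rng.choice(runnable)  # the scheduling decision depends only on the seed
        try:
            next(task)  # run the task until its next yield point
        except StopIteration:
            runnable.remove(task)  # the task has finished
    return log


def node(name: str, steps: int, log: List[str]) -> Task:
    """A hypothetical 'node' that records each step and then yields to the scheduler."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield


def make_cluster(log: List[str]) -> List[Task]:
    return [node("a", 3, log), node("b", 3, log), node("c", 3, log)]


# Two runs with the same seed produce the same interleaving of steps.
first = run_simulation(42, make_cluster)
second = run_simulation(42, make_cluster)
```

A failing interleaving found this way can be replayed by reusing its seed, which is what makes such tests debuggable.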

Building an AdWords-like system in reality 👷‍♂️

This is a chapter from the Site Reliability Engineering workbook. The chapter shows an example of Non-Abstract Large System Design. Basically, the folks lay out the domain model, make the estimations, and try to create a solution for a single machine. Later they discover it is not going to fit (surprise!) and then evolve the solution to full scale.

Google - Site Reliability Engineering

Delta: A highly available, strongly consistent storage service using chain replication 🤟

Delta is a new object storage service by Meta. It puts simplicity and reliability first. Delta achieves strong consistency with chain replication: it responds to the client only when all the servers in the chain have persisted the update. A comparison with quorum replication is included. The article also describes a few optimizations, for example, reading from all servers in the chain instead of only the tail.

Delta: A highly available, strongly consistent storage service using chain replication
Delta leverages chain replication as an approach to coordinate clusters of fail-stop storage servers.
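
The write path of chain replication can be sketched in a few lines of Python. This is an in-memory illustration under simplifying assumptions (no networking, no failure handling), not Meta's implementation; Server, build_chain, and Client are hypothetical names. It shows the textbook variant where writes enter at the head and reads go to the tail, so the read-from-any-server optimization is not covered.

```python
from typing import Dict, List, Optional


class Server:
    """One hypothetical storage server in the chain."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, str] = {}
        self.next: Optional["Server"] = None

    def write(self, key: str, value: str) -> bool:
        self.store[key] = value  # persist locally first
        if self.next is None:
            return True  # tail reached: every server in the chain has the update
        return self.next.write(key, value)  # forward; the ack flows back up the chain


def build_chain(names: List[str]) -> List[Server]:
    servers = [Server(n) for n in names]
    for current, successor in zip(servers, servers[1:]):
        current.next = successor
    return servers


class Client:
    def __init__(self, chain: List[Server]) -> None:
        self.head, self.tail = chain[0], chain[-1]

    def put(self, key: str, value: str) -> bool:
        # Returns only after the update has travelled through the whole chain.
        return self.head.write(key, value)

    def get(self, key: str) -> Optional[str]:
        # Reading from the tail only ever sees fully replicated updates.
        return self.tail.store.get(key)


chain = build_chain(["head", "middle", "tail"])
client = Client(chain)
acked = client.put("object-1", "payload")
value = client.get("object-1")
```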

Securing Kafka Infrastructure at Uber 👷‍♂️

The Uber Engineering blog is a good source of articles for our newsletter. This time they share how they secure one of the biggest Kafka installations in the world. Starting with authentication and authorization terms from a broker perspective and ending with sequence and component diagrams describing the security flows and infrastructure, this article is a good continuation of the Kafka-related posts in the newsletter.

Securing Kafka® Infrastructure at Uber
Background Uber has one of the largest deployments of Apache Kafka® in the world. It empowers a large number of real-time workflows at Uber, including pub-sub message buses for passing event data from the rider and driver apps, as well as financial transaction events between the backend services. As…

The Adventures of Rendezvous in Heroku’s New Architecture 👷‍♂️

Changing architecture with migration to a platform can lead to unpredictable errors. Here is a detective story from the Heroku team about finding the reason for timeouts. The main character is Rendezvous, a bidirectional proxy that connects clients to dynos through NAT. The story starts with replacing EIP with NLB, which leads to customer reports. But where's the problem?

The Adventures of Rendezvous in Heroku’s New Architecture
A story of how we encountered a critical error for a customer use-case and the root cause analysis that followed.

What makes a Good Software Architect 🍼

The folks from the Software Engineering Institute discuss different aspects of being an Architect at the end of the 20ties. The talk covers how to decide what type of database to go with, how to keep up with new technologies, how to view Functions as a Service from a tradeoff viewpoint (cost and vendor lock-in versus having as little code as required), and many other questions.

State of Mobile App Security 🍼

A good introduction to what makes modern mobile applications (in)secure. It starts with the user landscape and the amount of data shared, and finishes with platform capabilities and trust issues.

State of Mobile App Security Here | Licel
What is the current state of mobile app security? In this visual report we explore the ways mobile device usage habits have evolved and what those changes mean for security.

Dealing with long-running jobs using Apache Kafka 👷‍♂️

Long-running jobs are tricky for Kafka. But if you don't want to introduce a new component like a workflow engine, look at the options in this article. For example, pause the consumer and keep polling as the heartbeat. Looks like a great hack.
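
The pause-and-keep-polling trick can be sketched roughly as follows. This is an illustration under assumptions, not the article's code: the method names (assignment, pause, resume, poll) are modeled on kafka-python's KafkaConsumer, but the block runs against a hypothetical in-memory FakeConsumer so it needs no broker, and run_long_job is a made-up helper. A real implementation would also have to handle rebalances and offset commits.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class FakeConsumer:
    """Hypothetical in-memory stand-in for a Kafka consumer (no broker involved)."""

    def __init__(self, partitions):
        self._partitions = set(partitions)
        self.paused = set()
        self.polls_while_paused = 0

    def assignment(self):
        return set(self._partitions)

    def pause(self, *partitions):
        self.paused.update(partitions)

    def resume(self, *partitions):
        self.paused.difference_update(partitions)

    def poll(self, timeout_ms=0):
        if self.paused and self.paused == self._partitions:
            self.polls_while_paused += 1
        time.sleep(timeout_ms / 1000)
        return {}  # this fake never has records to deliver


def run_long_job(consumer, job, record, poll_timeout_ms=10):
    """Run job(record) in a worker thread while the paused consumer keeps polling."""
    partitions = consumer.assignment()
    consumer.pause(*partitions)  # stop fetching new records for these partitions
    try:
        with ThreadPoolExecutor(max_workers=1) as pool:
            future = pool.submit(job, record)
            while not future.done():
                # Polling keeps the consumer active in its group; paused
                # partitions return nothing, so no new work piles up.
                consumer.poll(timeout_ms=poll_timeout_ms)
            return future.result()
    finally:
        consumer.resume(*partitions)  # fetch again once the job is done


def slow_job(record):
    time.sleep(0.05)  # pretend this takes a long time
    return record.upper()


consumer = FakeConsumer(["orders-0", "orders-1"])
result = run_long_job(consumer, slow_job, "payload")
```

The resume call sits in a finally block so the consumer starts fetching again even if the job raises.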

Evolution of ML Fact Store 🍼

Netflix shares the evolution of its ML Fact Store, Axion. It is interesting how they admitted premature optimization, simplified the logging client, and focused on query optimization. A combination of Iceberg and EVCache helps with the latter. There are so many internal open-sourced components in Netflix... Also, they pay special attention to monitoring the quality of data.

Data Mesh Principles and Logical Architecture 🍼

Data Mesh is the DDD answer for analytics. Look at this solution if you have already felt the pain of a Data Warehouse or Data Lake. There are no ready-made tools in this article, but there is a detailed description of the key principles of Data Mesh: Domain Ownership, Data as a Product, Self-serve data platform, and Federated computational governance. Also, look at the Data Mesh book by Zhamak Dehghani.

Data Mesh Principles and Logical Architecture
Four principles that drive a logical architecture for a data mesh.

Brought to you by Vladimir @vvsevolodovich Ivanov and Ilya @puzan Zonov