Architecture Weekly Issue #66. Articles, books, and playlists on architecture and related topics. Split by sections, highlighted with complexity: 🤟 means hardcore, 👷‍♂️ is technically applicable right away,  🍼 - is an introduction to the topic or an overview. Now in telegram as well.

WARNING 🇺🇦

It's already been a year since Russia's crazy, brutal and unjustified war against Ukraine. We condemn this war and want it to stop ASAP. We continue this newsletter so you can advance your skill and help the millions of Ukrainian people in any way possible. If you want to help directly, visit this fund.

This week I conducted an interview with Anton Malinskiy - an author of Marathon Test Runner and co-founder of Marathon Labs. We discussed Mobile Testing, the challenges behind it and how Marathon helps to resolve them. Watch the video!

Big thanks to Nikita, Anatoly, Oleksandr, Dima, Pavel B, Pavel, Robert, Roman, Iyri, Andrey, Lidia, Vladimir, August, Roman, Egor Roman and Evgeniy for supporting the newsletter. They receive early access to the articles, influence the content and participate in the closed group where we discuss the architecture problems. They also see my daily updates on all the things I am working on. Join them at Patreon or Boosty!  

Highlights

Reducing cost by 90% by rewriting microservices to a monolith 👷‍♂️

Yeah, you read it right! Prime Video - a video streaming product of Amazon with their own technical blog - dropped a piece which exploded over all the technical communities I am participating. Twitter, work slacks, telegram groups - all are referencing this article. So what's the hype? Prime Video has a service called Video Quality Analysis. It is supposed to identify any problems with video streaming and report it for further fix and investigation. The initial architecture leveraged Amazon Lambdas and Step Functions, but most importantly it was distributed by nature, which caused the usage of an S3 bucket for data sharing between the microservices. Apparently, it is very costly! So after some consideration, the team decided to move to a monolith. Find out the details of that story below, and remember, that on our YouTube channel, we kinda told you.

Scaling up the Prime Video audio/video monitoring service and reducing costs by 90%
The move from a distributed microservices architecture to a monolith application helped achieve higher scale, resilience, and reduce costs.

#refactoring #microservice #distributedsystem

Real-time Messaging at Slack 👷‍♂️

Slack handles tens of millions of simultaneously connected clients every second and manages to deliver any message under 500 ms all over the world. They built a pretty sophisticated system consisting of Channel Servers, Edge Proxies, Gateway servers and Web Apps. They posted a good article with the explanations of those in the technical blog, grab the read!

Real-time Messaging - Slack Engineering
Did you know that ground stations transmit signals to satellites 22,236 miles above the equator in geostationary orbits, and that those signals are then beamed down to the entire North American subcontinent? Satellite radios today serve hundreds of channels across 9,540,000 square miles. Unless you’…

#highload #architecture #casestudy

Secure Search Over Encrypted Data 👷‍♂️

The common understanding is that once you encrypted the data, the only way you can do any operations over it, like modification or search, is possible only by decrypting the data first. It leads to a bunch of problems like key management, exposing the plane data to untrusty agents and others. But with the development of homomorphic encryption, you can at least search over encrypted data just find. Our friends from Cossack Labs share the article explaining the hustle.  

Secure Search Over Encrypted Data | Cossack Labs
What is searchable encryption and how to perform secure search over encrypted data.

#encryption #security

Follow Up

Deterministic Simulation: A New Era of Distributed Testing 🤟

Ensuring the correctness of distributed system is hard. Some people tend to use formal verification, while others seek to test all the possible cases. Both approaches are hard. However, deterministic simulation can be a combination of both tactics - and a very powerful one. Find an article on deterministic simulation engine and what it takes to simulate the distributed systems behavior.  

Deterministic Simulation: A New Era of Distributed System Testing (Part 1)
In this article, we will discuss the background and principles of deterministic simulation, introduce our deterministic testing framework Madsim, and share our experience applying deterministic testing to RisingWave.

#distributedsystem

Make Architecture Reviews like Peer Reviews 🍼

Architecture reviews, or committees to be more precise, have the bad reputation of slowing down initiatives with useless templates and discussions. While taking decisions in a silo with a high degree of autonomy is satisfying, it has a high probability of missing critical information that leads to costly reworks afterwards. So the question here is how to ensure the appropriate aligned architecture while not compromising on quality. Find out in the article below. To my taste, it is a bit of an overkill, but can work well even for a small org after an adoption.  

How To Make Architecture Reviews That Feel Like Peer Reviews
It’s about making the right decision, fast.

#architecture #adr #documentation

Kubernetes Security Part 1 - Security Context 👷‍♂️

Kubernetes runs a major part of the work payloads nowadays. And we need to run those securely. I am sharing a very deep detailed guide on adding security context to the container we run there alongside with scanning docker images, configuring network policies, implementing RBAC model and many more!

Kubernetes Security Part 1 - Security Contexts

#security #kubernetes #k8s

The API. The Book 👷‍♂️

My colleague - Sergey Konstantinov - wrote an online book on API-first development principles covering a vast spectrum of topics from authentication and authorization, API Design, Backward Compatibility and API as a product. Start reading while the additional parts of the second edition is being written now!

Sergey Konstantinov. The API
API-first development is one of the hottest technical topics nowadays since many companies started to realize that API serves as a multiplicator to their opportunities—but it also amplifies the design mistakes as well. This book is written to share the expertise and describe the best practices in de…

#api #apidesign

System Design Blue Print 🍼

I promise this is the last system design blueprint or an ultimate guide or you name it. But the folks asking for a consultation are always asking what should I at least be aware of to be ready for a systems design interview... Such articles help. However, they don't give you much detail - rather an overview.  

System Design Blueprint: The Ultimate Guide
Developing a robust, scalable, and efficient system can be daunting. However, understanding the key concepts and components can make the…

#systemdesign

Agility and Architecture 🍼

The talk about how we combine the architecture work with the agile iterative approach is long and controversial. Somebody say, make the big upfront design, others insist of postponing all the decisions to the last possible moment. I am sharing a new article on InfoQ, which explores those takes and explains for example that there are no such "last possible moments" in software development and you rather have some Minimal Viable Architecture, which you can iterate on.

Agility and Architecture: Balancing Minimum Viable Product and Minimum Viable Architecture
Software architecture and agility are often portrayed as incompatible. In reality, they are mutually reinforcing - a sound architecture helps teams build better solutions in a series of short intervals, and gradually evolving a system’s architecture helps by validating and improving it over time.

Next Thursday, I am discussing Disaster Recovery with Misha Druzhinin. Join the live stream!