Architecture Weekly Issue #68. Articles, books, and playlists on architecture and related topics. Split by sections, highlighted with complexity: 🤟 means hardcore, 👷‍♂️ is technically applicable right away, 🍼 is an introduction to the topic or an overview. Now in Telegram as well.

WARNING 🇺🇦

It's already been a year since Russia's crazy, brutal and unjustified war against Ukraine began. We condemn this war and want it to stop ASAP. We continue this newsletter so you can advance your skills and help the millions of Ukrainian people in any way possible. If you want to help directly, visit this fund.

This week I interviewed Vitaly Sharovatov. We discussed team dynamics and what managers can do to improve team performance, and we backed it all up with scientific papers! Watch it here:

Big thanks to Nikita, Anatoly, Oleksandr, Dima, Pavel B, Pavel, Robert, Roman, Iyri, Andrey, Lidia, Vladimir, August, Roman, Egor, Roman, Evgeniy and Nadia for supporting the newsletter. They receive early access to the articles, influence the content and participate in a closed group where we discuss architecture problems. They also see my daily updates on everything I am working on. Join them on Patreon or Boosty!

Highlights

Datadog's long-awaited postmortem 👷‍♂️

Datadog had an outage lasting more than 24 hours, starting on March 8th. As an observability company, Datadog was expected to publish a postmortem fairly soon, but two months later nothing had been published. Some researchers even tried to write their own version, but fortunately the company eventually published the postmortem itself. Read a fascinating story of how Linux upgrades can take you down even when you're deployed across three different cloud providers for reliability.

2023-03-08 Incident: Infrastructure connectivity issue affecting multiple regions
Between March 8, 2023, 06:03 UTC and March 9, 2023, 08:58 UTC, Datadog experienced an infrastructure connectivity issue that caused service degradation across multiple regions.

#pm #reliability #upgrade

What Happens When You Type a URL Into Your Browser?

I remember that several years ago I went through an interview at Amazon. After the questions about the advantages of the cloud, the interviewer asked the question in the title. I think I did pretty well: I described the 21h interrupt, the events in the operating system, the DNS details including local caches, the HTTP protocol... There was no second interview. So in case you get the same question, here is the answer!

What Happens When You Type a URL Into Your Browser?
1. DNS resolution
2. TCP three-way handshake
3. HTTPS upgrade
4. HTTP request/response
5. Browser rendering the response
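To make the steps above concrete, here is a minimal sketch that uses only the Python standard library (`socket`, `ssl`, `urllib.parse`). It assumes that "HTTPS upgrade" means the TLS handshake performed over the already-open TCP connection. It sends a bare HTTP/1.1 GET and stops short of step 5, since rendering is the browser engine's job. The function names `build_request` and `fetch` are illustrative and do not come from the linked article.

```python
import socket
import ssl
from urllib.parse import urlsplit


def build_request(url: str) -> tuple[str, int, bool, bytes]:
    """Parse the URL and build a minimal HTTP/1.1 GET request."""
    parts = urlsplit(url)
    use_tls = parts.scheme == "https"
    host = parts.hostname or ""
    port = parts.port or (443 if use_tls else 80)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # Include the port in the Host header only when the URL states it explicitly
    host_header = host if parts.port is None else f"{host}:{parts.port}"
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    return host, port, use_tls, request


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Walk through steps 1-4 for a single URL and return the raw response bytes."""
    host, port, use_tls, request = build_request(url)

    # 1. DNS resolution: the OS resolver (and its caches) turns the hostname into an address
    family, sock_type, proto, _, address = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]

    # 2. TCP three-way handshake: performed by the OS inside connect()
    sock = socket.socket(family, sock_type, proto)
    sock.settimeout(timeout)
    sock.connect(address)

    # 3. TLS handshake on top of the TCP connection (https only);
    #    the default context verifies the server certificate against the hostname
    if use_tls:
        context = ssl.create_default_context()
        sock = context.wrap_socket(sock, server_hostname=host)

    # 4. HTTP request/response: send the request, read until the server closes
    chunks = []
    with sock:
        sock.sendall(request)
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)

    # 5. A browser would now parse the headers and body, build the DOM and render
    return b"".join(chunks)
```

Calling `fetch("https://example.com/")` returns the raw response, status line and headers included. A real browser layers caching, connection reuse, redirects and HTTP/2 or HTTP/3 on top of this basic flow.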

#systemdesign

How to run a Decision-Making Architecture Board 🍼

Team autonomy in decision-making is a good thing. However, if the organization simply lets everybody do whatever they want, it will soon face a zoo of technologies and approaches. At some point it makes sense to have a board where such decisions can at least be discussed. How do you create and run one? Read a guest post by Hans-Peter Hoidn on Olaf Zimmermann's blog!

How to Build and Run a Decision-Making Architecture Board
Guest post in ZIO’s blog by Hans-Peter Hoidn

#adr #architectureboard

Follow-up

Raft does not guarantee liveness in the face of Network Faults 🍼

You might expect Raft, as a consensus algorithm, to guarantee leader election even during network faults. This post (a rather old one) shows two cases where a stable leader cannot be elected. The article also suggests fixes, so take a closer look.

Raft does not Guarantee Liveness in the face of Network Faults
Last month, Cloudflare published a postmortem of a recent 6-hour outage caused by a partial switch failure which left etcd unavailable as it was unable to establish a stable leader. This outage has understandably led to discussion online about exactly what liveness guarantees are provided by the Raf…
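One mitigation often discussed for this class of problem is Raft's PreVote extension, described in Diego Ongaro's PhD thesis. Before bumping its term, a would-be candidate asks its peers whether they would vote for it, and any peer that still hears from a healthy leader refuses. The Python sketch below shows only that decision rule under simplifying assumptions: timing is passed in as plain numbers and there is no networking. It is not a claim about the article's exact proposal, nor a complete Raft implementation, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    current_term: int
    last_log_term: int
    last_log_index: int
    ms_since_leader_contact: float  # time since the last AppendEntries from a leader
    election_timeout_ms: float


@dataclass
class PreVoteRequest:
    next_term: int  # term the candidate *would* use; it has not incremented its own term yet
    last_log_term: int
    last_log_index: int


def grant_pre_vote(node: NodeState, req: PreVoteRequest) -> bool:
    # Refuse while a leader is still reachable: this stops a partially
    # partitioned node from repeatedly disrupting a healthy cluster.
    if node.ms_since_leader_contact < node.election_timeout_ms:
        return False
    # The proposed term must be ahead of ours.
    if req.next_term <= node.current_term:
        return False
    # Same "log is at least as up to date" check as a real RequestVote.
    if req.last_log_term != node.last_log_term:
        return req.last_log_term > node.last_log_term
    return req.last_log_index >= node.last_log_index


def should_start_election(pre_votes_granted: int, cluster_size: int) -> bool:
    # pre_votes_granted includes the candidate's own vote; a real election
    # (with a term increment) starts only after a majority of pre-votes.
    return pre_votes_granted > cluster_size // 2
```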

#distributedsystem #raft #consensus

Core Solution Architecture Methods 👷‍♂️

I am sharing a chapter from a Solution Architect training course. It covers shared vision: what you actually need to do to build a common understanding of the system, including defining its boundaries, external interfaces, internal components, etc. Get more details inside!

Core Solution Architecture Methods | Web Age Solutions
This chapter is adapted from Web Age course Solution Architect Training. 3.1 Shared Vision. Architecture is a team effort. All stakeholders must collaborate for success. A “shared vision”...

#architecture #documentation

Hotspot performance engineering fails 🍼

Some companies believe software can be made fast by finding a few hotspots in the code and optimizing them. But as an architect, you can easily guess that the biggest performance problems often come from an inappropriate architecture. Daniel Lemire explains this in a little more detail.

Hotspot performance engineering fails
Developers often believe that software performance follows a Pareto distribution: 80% of the running time is spent in 20% of the code. Using this model, you can write most of your code without any care for performance and focus on the narrow pieces of code that are performance sensitive. Engineers l…
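Amdahl's law makes this intuition concrete. It is our illustration and is not taken from Lemire's post. If a fraction p of the running time is sped up by a factor s, the whole program speeds up by 1 / ((1 - p) + p / s). The small Python helper below (the name `overall_speedup` is ours) compares a Pareto-style profile with a flat one.

```python
def overall_speedup(hotspot_fraction: float, hotspot_speedup: float) -> float:
    """Amdahl's law: speedup of the whole program when only a fraction
    of its running time is accelerated by hotspot_speedup."""
    if not 0.0 <= hotspot_fraction <= 1.0:
        raise ValueError("hotspot_fraction must be in [0, 1]")
    if hotspot_speedup <= 0:
        raise ValueError("hotspot_speedup must be positive")
    return 1.0 / ((1.0 - hotspot_fraction) + hotspot_fraction / hotspot_speedup)


# Pareto-style profile: 80% of the time sits in the hotspot, made 10x faster
print(round(overall_speedup(0.8, 10), 2))  # 3.57
# Flat profile: the "hotspot" is only 20% of the time, made 10x faster
print(round(overall_speedup(0.2, 10), 2))  # 1.22
```

With a flat profile, even an infinitely fast hotspot caps the gain at 1 / 0.8 = 1.25x. Deeper changes, often architectural ones, are needed to go further.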

#performance #pareto

Postgres Superpowers in Practice 👷‍♂️

Postgres is the go-to general-purpose database for many small and medium businesses, and Oskar Dudycz's post backs that up: he demonstrates how to turn PostgreSQL into a multi-model database using extensions. Look how easy it is to turn it into, for example, a time-series database!

Postgres Superpowers in Practice - Event-Driven.io
Event-Driven by Oskar Dudycz
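To give a rough idea of the time-series case, here is a hedged Python sketch. It assumes the TimescaleDB extension is available on the server (the best-known time-series extension for Postgres; the post's exact setup may differ). `create_hypertable` and `time_bucket` are TimescaleDB functions. The `sensor_readings` table and the helper names are hypothetical. The helpers accept any DB-API 2.0 connection, such as one from psycopg.

```python
# Assumes a PostgreSQL server where the TimescaleDB extension can be created
# (e.g. listed in shared_preload_libraries). Table and column names are hypothetical.
SETUP_STATEMENTS = [
    "CREATE EXTENSION IF NOT EXISTS timescaledb",
    """
    CREATE TABLE IF NOT EXISTS sensor_readings (
        time        TIMESTAMPTZ      NOT NULL,
        sensor_id   TEXT             NOT NULL,
        temperature DOUBLE PRECISION
    )
    """,
    # Turns the plain table into a hypertable partitioned by the time column
    "SELECT create_hypertable('sensor_readings', 'time', if_not_exists => TRUE)",
]

# Hourly average per sensor over the last day, using TimescaleDB's time_bucket
HOURLY_AVERAGES = """
    SELECT time_bucket('1 hour', time) AS bucket, sensor_id, avg(temperature)
    FROM sensor_readings
    WHERE time > now() - interval '1 day'
    GROUP BY bucket, sensor_id
    ORDER BY bucket
"""


def setup_timeseries(conn) -> None:
    """Create the extension, the table and the hypertable on a DB-API connection."""
    cur = conn.cursor()
    try:
        for statement in SETUP_STATEMENTS:
            cur.execute(statement)
    finally:
        cur.close()
    conn.commit()


def hourly_averages(conn) -> list:
    """Return (bucket, sensor_id, avg_temperature) rows for the last day."""
    cur = conn.cursor()
    try:
        cur.execute(HOURLY_AVERAGES)
        return cur.fetchall()
    finally:
        cur.close()
```

Inserts stay ordinary `INSERT INTO sensor_readings ...` statements. The extension takes care of time-based partitioning behind the scenes.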

#db #postgres #timeseries

I built an AI Avatars Generator using Stable Diffusion 👷‍♂️

AI is all the hype right now. A colleague of mine from Bolt wrote a blog post about how he built his own AI avatar generator. He describes request ingestion, cron jobs, and model training and deployment, and he shares the architecture he used. Check out the post!

I built an AI Avatars Generator using Stable Diffusion. Here’s how to build your own. Part 1.
Step-by-step guide with code open-sourced.

#ai #ml

Connecting Block Business Units with AWS API Gateway 🤟

Acquisitions and mergers can be tricky from a technical perspective: different ecosystems, programming languages, and deployment and runtime approaches are among the complexities. Yet Block (the parent company of Square and Cash App) does this almost routinely. Here is a very thorough post on how AWS API Gateway and Fargate help them integrate newly acquired companies into their infrastructure with minimal effort.

Connecting Block Business Units with AWS API Gateway
In this article, we explore how Block onboards acquisitions within weeks rather than months.
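At its core, a gateway in front of many business units is a routing layer. Each incoming path maps to the service that owns it, so a newly acquired company's backend can be plugged in by adding routes. The Python sketch below illustrates only that idea, longest-prefix path matching, using a hypothetical routing table. It is neither Block's configuration nor the AWS API Gateway API, which also covers authentication, throttling and much more.

```python
from typing import Optional

# Hypothetical routing table: path prefix -> upstream base URL.
# Onboarding an acquisition means adding entries here, not changing clients.
ROUTES = {
    "/payments/": "http://payments.internal",
    "/payments/legacy/": "http://acquired-co-legacy.internal",
    "/identity/": "http://identity.internal",
}


def resolve_upstream(path: str, routes: dict[str, str] = ROUTES) -> Optional[str]:
    """Return the upstream URL for a request path using longest-prefix matching."""
    best = None
    for prefix in routes:
        # The most specific (longest) matching prefix wins
        if path.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
    if best is None:
        return None  # no owner for this path; a gateway would answer 404
    # Strip the routing prefix and forward the remainder to the owning service
    return routes[best] + "/" + path[len(best):]
```

With this table, `/payments/legacy/charges` goes to the acquired company's service and `/payments/charges` stays with the existing one.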

#integration #cloud #aws