Architecture Weekly Issue #73. Articles, books, and playlists on architecture and related topics. Split by sections, highlighted with complexity: 🤟 means hardcore, 👷‍♂️ is technically applicable right away, 🍼 is an introduction to the topic or an overview. Now on Telegram as well.

WARNING 🇺🇦

It's already been a year since Russia started its crazy, brutal and unjustified war against Ukraine. We condemn this war and want it to stop ASAP. We continue this newsletter so you can advance your skills and help the millions of Ukrainian people in any way possible. If you want to help directly, visit this fund.

Big thanks to Nikita, Anatoly, Oleksandr, Dima, Pavel B, Pavel, Robert, Roman, Iyri, Andrey, Lidia, Vladimir, August, Roman, Egor, Roman, Evgeniy, Nadia and Daria for supporting the newsletter. They get early access to the articles, influence the content, and participate in a closed group where we discuss architecture problems. They also see my daily updates on everything I am working on. Join them on Patreon or Boosty!

Highlights

Architecture is like the stock market: selling options 🍼

In finance, an option is the right, but not the obligation, to buy or sell a stock at a predefined price. Gregor Hohpe argues that postponing an architecture decision resembles an option to a certain degree: it has a price, but it lets you defer committing to a particular platform, pattern, etc. until later. Grab a short note in The Architect Elevator.
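The teaser name-checks Black and Scholes, so here is a minimal sketch of their formula for a European call option. It only illustrates the analogy and is not Hohpe's model: reading "volatility" as the uncertainty around an architecture decision is my assumption. Notice that, with everything else fixed, the option becomes more valuable as volatility grows, which is the intuition behind paying to keep a decision open when the future is uncertain.

```python
import math


def norm_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot: float, strike: float, rate: float,
                       volatility: float, years: float) -> float:
    """Black-Scholes price of a European call option on a non-dividend-paying stock.

    spot: current price of the underlying, strike: agreed purchase price,
    rate: continuously compounded risk-free rate, volatility: annualized
    standard deviation of returns, years: time to expiry.
    """
    vol_sqrt_t = volatility * math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * volatility ** 2) * years) / vol_sqrt_t
    d2 = d1 - vol_sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)


if __name__ == "__main__":
    # Same strike, rate, and horizon; only the uncertainty changes.
    for vol in (0.1, 0.3, 0.6):
        value = black_scholes_call(100, 100, 0.02, vol, 1.0)
        print(f"volatility={vol:.1f} -> option value={value:.2f}")
```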

Architecture: Selling Options
How do you explain the value of architecture to business stakeholders? Deferring to the Nobel-prize winning economists Black and Scholes can work surprisingly well.

#architecture #philosophy

Basecamp moves out of the cloud 🍼

37signals is a well-known company behind products like Basecamp and HEY. David Heinemeier Hansson (DHH) wrote a blog post announcing that they have completed their migration from AWS to their own hardware in two data centers. The migration took just six months, and it is expected to save them 1.5 million dollars per year. Check out his thoughts on the migration!

We have left the cloud
Since it took us years to get into the cloud in the first place, I originally imagined it would take us years to get out as well. But all that work to containerize our applications and prepare them for the cloud actually turned out to make it relatively easy to exit. And now, after six months of eff…

#cloud #migration

Partitioning and replication: benefits & challenges 🤟

In distributed systems, partitioning involves dividing data into smaller units assigned to specific machines, aiding in scalability, performance, and fault-tolerance. Replication, on the other hand, duplicates data across different machines for increased fault tolerance. Despite their benefits, challenges exist. Replication requires consistent updates across replicas, while partitioning involves deciding optimal data division and handling multi-item requests. Many systems combine both techniques to maximize benefits while managing the associated challenges.
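To make the two techniques concrete, here is a minimal sketch that combines them, assuming hash partitioning with a fixed partition count and a simple ring placement where each partition is stored on the next `replication_factor` nodes. The node names and parameters are hypothetical; real systems use many other placement schemes.

```python
from __future__ import annotations

import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a stable hash.

    Python's built-in hash() is randomized per process for strings, so a
    SHA-256 digest is used to get the same answer on every machine.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def replicas_for(partition: int, nodes: list[str], replication_factor: int) -> list[str]:
    """Place a partition on `replication_factor` consecutive nodes of a ring."""
    if replication_factor > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return [nodes[(partition + i) % len(nodes)] for i in range(replication_factor)]


def nodes_for_key(key: str, num_partitions: int, nodes: list[str],
                  replication_factor: int) -> list[str]:
    """Partitioning decides where a key lives; replication decides on how many nodes."""
    return replicas_for(partition_for(key, num_partitions), nodes, replication_factor)


if __name__ == "__main__":
    cluster = ["node-a", "node-b", "node-c", "node-d"]
    for key in ("user:1", "user:2", "order:17"):
        print(key, "->", nodes_for_key(key, 8, cluster, 3))
```

One trade-off shows up even in this toy: with `hash % num_partitions`, changing the number of partitions remaps most keys, which is one reason systems often fix the partition count up front or use consistent hashing.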

Partitioning and replication: benefits & challenges
An introduction to partitioning and replication, and the benefits and challenges they bring

#database #replication #distributed

Follow-Up

A beginner's guide to database deadlocks 🤟

This article explains how deadlocks occur in relational database systems and how these systems, such as Oracle, SQL Server, PostgreSQL, or MySQL, recover from such situations. Deadlocks happen when two concurrent transactions can't proceed because each is waiting for the other to release a lock. A separate process in the database engine detects such cycles and resolves the deadlock by aborting one transaction, thereby releasing its locks. The decision on which transaction to abort can vary based on the system, with some considering rollback cost or deadlock priority. The article underscores the importance of understanding and managing deadlocks to handle unexpected transaction rollbacks.
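The detection step is classically modeled as finding a cycle in a wait-for graph. Here is a minimal sketch, assuming the graph is a plain dict and that the victim is the transaction with the lowest rollback cost; the cost numbers are hypothetical, and each real engine applies its own policy, as the article explains.

```python
from __future__ import annotations


def find_cycle(wait_for: dict[str, set[str]]) -> list[str] | None:
    """Return one cycle in a wait-for graph, or None if there is no deadlock.

    An edge T1 -> T2 means "T1 waits for a lock currently held by T2".
    """
    on_path: set[str] = set()  # transactions on the current DFS path
    done: set[str] = set()     # fully explored, no cycle reachable from them
    path: list[str] = []

    def visit(tx: str) -> list[str] | None:
        on_path.add(tx)
        path.append(tx)
        for holder in sorted(wait_for.get(tx, ())):
            if holder in on_path:
                # Back edge: the cycle is the part of the path starting at `holder`.
                return path[path.index(holder):]
            if holder not in done:
                cycle = visit(holder)
                if cycle:
                    return cycle
        on_path.discard(tx)
        path.pop()
        done.add(tx)
        return None

    for tx in sorted(wait_for):
        if tx not in done:
            cycle = visit(tx)
            if cycle:
                return cycle
    return None


def choose_victim(cycle: list[str], rollback_cost: dict[str, int]) -> str:
    """Abort the transaction that is cheapest to roll back (one possible policy)."""
    return min(cycle, key=lambda tx: rollback_cost.get(tx, 0))


if __name__ == "__main__":
    # T1 waits for T2 and T2 waits for T1 (a deadlock); T3 is merely queued behind T1.
    graph = {"T1": {"T2"}, "T2": {"T1"}, "T3": {"T1"}}
    cycle = find_cycle(graph)
    print("deadlock:", cycle, "-> abort", choose_victim(cycle, {"T1": 5, "T2": 2}))
```

Aborting the victim releases its locks, which removes its edges from the graph and lets the remaining transactions proceed.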

A beginner’s guide to database deadlock - Vlad Mihalcea
Learn how deadlocks occur in a relational database system, and how Oracle, SQL Server, PostgreSQL, or MySQL recover from a deadlock situation.

#database #deadlock

SQS vs SNS vs Kinesis vs EventBridge 🍼

This article discusses when to use the AWS messaging services SQS, SNS, EventBridge, and Kinesis. SQS is ideal for 1:1 communication: it acts as a buffer between producer and consumer, and FIFO queues additionally guarantee ordered processing. SNS broadcasts messages to multiple consumers, while EventBridge provides broadcasting, event scheduling, and SaaS integration. Kinesis excels at processing large volumes of real-time streaming data. But which service should you choose for real-time streaming with data persistence? Find the answer in the article!
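The 1:1 versus broadcast distinction is the heart of the SQS versus SNS/EventBridge choice. Here is a minimal in-memory sketch of the two delivery semantics; the class names are hypothetical, and it deliberately ignores everything the real services add, such as visibility timeouts, retries, durability, and filtering.

```python
from __future__ import annotations

from collections import deque
from typing import Callable


class PointToPointQueue:
    """SQS-like semantics: each message is received by exactly one consumer."""

    def __init__(self) -> None:
        self._messages: deque[str] = deque()

    def send(self, message: str) -> None:
        self._messages.append(message)

    def receive(self) -> str | None:
        # Whichever consumer polls first gets the message; nobody else sees it.
        return self._messages.popleft() if self._messages else None


class Topic:
    """SNS/EventBridge-like semantics: every subscriber gets its own copy."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[str], None]] = []

    def subscribe(self, handler: Callable[[str], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, message: str) -> None:
        for handler in self._subscribers:
            handler(message)


if __name__ == "__main__":
    queue = PointToPointQueue()
    queue.send("resize-image-1")
    # Two competing workers: only the first poll gets the message.
    print("worker A got:", queue.receive())
    print("worker B got:", queue.receive())

    topic = Topic()
    topic.subscribe(lambda m: print("billing saw:", m))
    topic.subscribe(lambda m: print("shipping saw:", m))
    topic.publish("order-created")  # both subscribers receive a copy
```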

AWS SQS, SNS, Kinesis, EventBridge : How to choose ?
“SQS, SNS, Kinesis, EventBridge ? Which one should I take ? In which situation ?”. It may be...

#messaging #aws

How Klarna, the BNPL company, uses Amazon Kinesis 🍼

Klarna uses Amazon Kinesis Data Analytics for Apache Flink for real-time decision-making, providing faster and more reliable shopping experiences. Klarna initially ran into high latency with Apache Kafka and AWS Lambda; its current solution combines an API backed by DynamoDB for decision storage with Kinesis Data Analytics for processing. Find out how the fully managed nature of Kinesis Data Analytics has improved Klarna's workflow: new use cases can be onboarded quickly, and auto-scaling supports growth.

How Klarna Bank AB built real-time decision-making with Amazon Kinesis Data Analytics for Apache Flink | Amazon Web Services
This is a joint post co-authored with Nir Tsruya from Klarna Bank AB. Klarna is a leading global payments and shopping service, providing smarter and more flexible shopping and purchase experiences to 150 million active consumers across more than 500,000 merchants in 45 countries. Klarna offers dire…

#aws #streaming

How we learned to improve Kubernetes CronJobs at Scale. Part 1 👷‍♂️

Lyft migrated nearly 500 cron tasks to Kubernetes, aiming for efficiency and containerization. However, the transition presented challenges: Kubernetes CronJobs suffered significant startup delays and complex failure handling, and these delays sometimes caused scheduled runs to be missed. Lyft plans to share how these issues were addressed, improving the reliability and usability of CronJobs, in a future article. To my taste, the scale of the workload is fairly modest, but it's still a valuable article.
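To make the "missed runs" failure mode concrete, here is a deliberately simplified model, not the real CronJob controller logic. A CronJob's `startingDeadlineSeconds` field sets how late a run may still be started; past that deadline the occurrence is skipped. The fixed-interval schedule and the per-run `start_delay` input are hypothetical simplifications.

```python
from __future__ import annotations


def scheduled_times(start: int, end: int, interval: int) -> list[int]:
    """Run times (in seconds) for a fixed-interval schedule such as "every 5 minutes"."""
    return list(range(start, end, interval))


def classify_runs(schedule: list[int], start_delay: dict[int, int],
                  starting_deadline_seconds: int | None) -> tuple[list[int], list[int]]:
    """Split scheduled runs into (started, missed).

    start_delay maps a scheduled time to how many seconds late the job could be
    started; absent entries mean "on time". With no deadline set, this model
    never skips a run (it ignores the controller's other limits).
    """
    started: list[int] = []
    missed: list[int] = []
    for t in schedule:
        delay = start_delay.get(t, 0)
        if starting_deadline_seconds is not None and delay > starting_deadline_seconds:
            missed.append(t)  # too late: this occurrence is skipped
        else:
            started.append(t)
    return started, missed


if __name__ == "__main__":
    runs = scheduled_times(0, 1800, 300)  # every 5 minutes for half an hour
    started, missed = classify_runs(runs, {600: 45, 1200: 200}, starting_deadline_seconds=120)
    print("started:", started)
    print("missed:", missed)
```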

How we learned to improve Kubernetes CronJobs at Scale (Part 1 of 2)
Deep dive into the inner-workings of CronJob and the problems we encountered running them at scale

#kubernetes #k8s

Is it possible to run a huge number of Android UI tests on each PR? 👷‍♂️

Running Android UI tests is no joke: you need to design the tests properly, prepare the infrastructure, rerun failed tests to account for flakiness, etc. My friend Evgeny Matsuk wrote a five-part blog series explaining all those mechanics and offering a solution for running a huge number of tests in limited time. Please read it carefully!
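One common way to fit a large suite into a fixed time budget is to shard it across devices or emulators. Here is a minimal sketch of greedy longest-first sharding, assuming per-test durations from previous runs are available; the test names and numbers are hypothetical, and this is a generic technique rather than the approach described in Evgeny's series.

```python
from __future__ import annotations

import heapq


def shard_tests(durations: dict[str, float], num_shards: int) -> list[list[str]]:
    """Assign tests longest first, always onto the currently least-loaded shard."""
    # Min-heap of (accumulated duration, shard index); this sorted list is already a valid heap.
    loads = [(0.0, i) for i in range(num_shards)]
    shards: list[list[str]] = [[] for _ in range(num_shards)]
    # Sort by descending duration, then by name so the result is deterministic.
    for name in sorted(durations, key=lambda n: (-durations[n], n)):
        load, idx = heapq.heappop(loads)
        shards[idx].append(name)
        heapq.heappush(loads, (load + durations[name], idx))
    return shards


def shard_duration(shard: list[str], durations: dict[str, float]) -> float:
    """Total expected run time of one shard."""
    return sum(durations[name] for name in shard)


if __name__ == "__main__":
    # Hypothetical per-test durations (seconds) from a previous CI run.
    durations = {"login": 40, "checkout": 90, "search": 30, "profile": 20, "settings": 10}
    for shard in shard_tests(durations, num_shards=2):
        print(shard, shard_duration(shard, durations))
```

Longest-first is a heuristic rather than an optimal split, but it is simple and keeps the slowest shard, which bounds the PR wait time, reasonably short.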

I want to run any number of Android UI tests on each PR. Your actions? Part I
In my first article, I’ll explore the essential guidelines to build or choose a reliable and efficient solution.

#qa #automation #mobile

People and Security Incentives 👷‍♂️

Understanding and managing incentives and biases in people, organizations, and AI is crucial for risk management and cybersecurity. Strategies include using force field analysis to identify the forces driving and resisting a change, and managing hidden incentives and conflicting risks. The practice of 'Escalation as a Service', highlighting risks to leadership for resolution, is key. Adopting the qualities of High Reliability Organizations (HROs), such as proactivity, critical thinking, flexibility, open communication, and valuing expertise, can enhance security. Aligning the incentive structure with security goals requires understanding these incentives and being able to influence them.
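Force field analysis, which goes back to Kurt Lewin, lists the forces driving a change and the forces restraining it, each with a subjective strength. Here is a minimal sketch of that bookkeeping, assuming a 1 to 5 scoring scale; the example forces are hypothetical and not taken from the article.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Force:
    description: str
    strength: int  # subjective score on an assumed 1 (weak) to 5 (strong) scale


def force_field_balance(driving: list[Force], restraining: list[Force]) -> int:
    """Net score: positive means the forces for the change outweigh those against it."""
    return sum(f.strength for f in driving) - sum(f.strength for f in restraining)


def strongest(forces: list[Force]) -> Force:
    """The highest-scoring force, often the first candidate to address."""
    return max(forces, key=lambda f: f.strength)


if __name__ == "__main__":
    # Hypothetical example: rolling out mandatory MFA.
    driving = [Force("Upcoming compliance audit", 4), Force("Recent phishing incident", 3)]
    restraining = [
        Force("Feature deadlines reward shipping over hardening", 5),  # a hidden incentive
        Force("Extra login friction for staff", 2),
    ]
    print("net balance:", force_field_balance(driving, restraining))
    print("address first:", strongest(restraining).description)
```

The numbers matter less than the conversation they force: a hidden incentive that scores high on the restraining side is exactly the kind of risk worth escalating.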

People and Security Incentives
Force 6 : People, organizations and AI respond to incentives and inherent biases but not always the ones we think are rational. // Central Idea: Risk management should be driven by incentives - but take care about your assumptions of rationality. Behavioral insights are key to cybersecurity. Conti…

#security