Architecture Weekly Issue #78. Articles, books, and playlists on architecture and related topics. Split by sections, highlighted with complexity: 🤟 means hardcore, 👷‍♂️ is technically applicable right away, 🍼 is an introduction to the topic or an overview. Now in Telegram as well.

WARNING 🇺🇦

It's already been a year since Russia's crazy, brutal and unjustified war against Ukraine. We condemn this war and want it to stop ASAP. We continue this newsletter so you can advance your skills and help the millions of Ukrainian people in any way possible. If you want to help directly, visit this fund.

Big thanks to Nikita, Anatoly, Oleksandr, Dima, Pavel B, Pavel, Robert, Roman, Iyri, Andrey, Lidia, Vladimir, August, Roman, Egor, Roman, Evgeniy, Nadia, Daria and Dzmitry for supporting the newsletter. They receive early access to the articles, influence the content and participate in the closed group where we discuss architecture problems. They also see my daily updates on all the things I am working on. Join them at Patreon or Boosty!

Highlights

Why you should think twice about creating custom security solutions 🍼

As the range of cybersecurity threats widens, it can be tempting to create custom security solutions to deal with them. Sadly, this often means reinventing the wheel, and the end result can actually be a less secure application. The article walks through examples of custom-made solutions and protocols that led to serious problems.

Why you should think twice about creating custom security solutions
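
To make the "don't reinvent the wheel" point concrete, here is a minimal sketch, assuming only Python's standard library, of leaning on vetted primitives instead of a home-grown scheme: password hashing with `hashlib.pbkdf2_hmac` plus a random per-password salt, and comparison with `hmac.compare_digest`, which is meant to reduce exposure to timing attacks that a plain `==` may allow. The function names and the iteration count are illustrative assumptions, not recommendations from the article.

```python
import hashlib
import hmac
import os

# Illustrative assumption: tune the iteration count to your hardware and security policy.
ITERATIONS = 600_000

def hash_password(password: str, *, iterations: int = ITERATIONS) -> tuple[bytes, bytes]:
    """Derive a key with PBKDF2-HMAC-SHA256 and return (salt, derived_key).

    A fresh random salt per password means identical passwords
    still produce different stored hashes.
    """
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, key

def verify_password(password: str, salt: bytes, expected: bytes, *,
                    iterations: int = ITERATIONS) -> bool:
    """Recompute the key with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected)
```

Store the salt, the derived key and the iteration count together, so the cost can be raised later without breaking existing hashes.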

#security

Cloud Automation à la DDD 👷‍♂️

In a new post, Gregor Hohpe reasons about the applicability of Domain-Driven Design to technical domains, takes EventBridge as an example, and shows how a ubiquitous language helps build better technical systems.

Cloud Automation à la DDD: From stringly typed to affordances
Domain-driven design very much applies to technical domains. Let’s try it on event-driven cloud systems to see why I am such a big fan of object-oriented automation languages.
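
As a rough illustration of "stringly typed" versus a typed model, compare a raw JSON string for an EventBridge-style event pattern with a small class that only lets you express valid combinations and renders the same JSON. The `EventSource` enum, the `EventPattern` class and its methods are hypothetical. This is not Gregor's code and not an AWS SDK API.

```python
import json
from dataclasses import dataclass
from enum import Enum

# Stringly typed: a typo such as "aws.s33" or "detail_type" surfaces only at deploy or run time.
STRINGLY_TYPED_PATTERN = '{"source": ["aws.s3"], "detail-type": ["Object Created"]}'

class EventSource(Enum):
    # Hypothetical, deliberately tiny catalogue of event sources.
    S3 = "aws.s3"
    EC2 = "aws.ec2"

@dataclass(frozen=True)
class EventPattern:
    """A typed event pattern: the type itself exposes only valid operations (its affordances)."""
    source: EventSource
    detail_types: tuple[str, ...] = ()

    def with_detail_type(self, detail_type: str) -> "EventPattern":
        # Return a new pattern instead of mutating, so patterns can be shared safely.
        return EventPattern(self.source, self.detail_types + (detail_type,))

    def to_json(self) -> str:
        pattern = {"source": [self.source.value]}
        if self.detail_types:
            pattern["detail-type"] = list(self.detail_types)
        return json.dumps(pattern)

typed = EventPattern(EventSource.S3).with_detail_type("Object Created")
```

Both forms describe the same pattern. With the typed version, a misspelled source such as `EventSource.S33` fails immediately with an `AttributeError`, and an editor can autocomplete the valid options.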

#ddd

HashiCorp State of Cloud 🍼

HashiCorp's State of Cloud report highlights an interesting trend: highly mature organizations are boosting their cloud spending in an attempt to actually save money, while going multi-cloud for efficiency and talent attraction. However, the share of companies wasting money on the cloud remains high. Find more details in the report.

HashiCorp State of Cloud
HashiCorp 2023 State of Cloud Strategy Survey

#cloud #report

Follow-Up

A Modern Approach to Securing APIs 👷‍♂️

This post from The New Stack blog covers the critical role of APIs in modern, cloud-native applications and the increased security risks they pose. It stresses the need for stronger API security measures, given the expanded attack surface and the fast pace of application development. Ory Segal suggests a modern approach to API security that includes API risk profiling, shifting application security left, and implementing multiple layers of defense. The post emphasizes collaboration between developers and security practitioners to ensure scalable, flexible, multilayered security.

A Modern Approach to Securing APIs
Developers and security teams should work together toward a scalable, flexible, multilayered approach for any type of workload in any environment.
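
The "multiple layers of defense" idea can be sketched as a chain of independent checks, each of which can reject a request on its own. Below is a minimal, framework-free Python sketch. The `Request` shape, the API-key store, the fixed-window rate limit and the validation rule are hypothetical placeholders, not Ory Segal's design.

```python
import hmac
import time
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Request:
    api_key: str
    client_id: str
    body: dict

# Hypothetical key store: client id -> API key.
API_KEYS = {"client-a": "s3cr3t-key"}

def check_auth(req: Request) -> Optional[str]:
    """Layer 1: authenticate the caller, comparing keys in constant time."""
    expected = API_KEYS.get(req.client_id)
    if expected is None or not hmac.compare_digest(expected.encode(), req.api_key.encode()):
        return "401 unauthorized"
    return None

class RateLimiter:
    """Layer 2: a fixed-window limiter with one window per client."""

    def __init__(self, limit: int, window_seconds: float = 1.0):
        self.limit = limit
        self.window = window_seconds
        self.counters: dict[str, tuple[float, int]] = {}

    def __call__(self, req: Request) -> Optional[str]:
        now = time.monotonic()
        start, count = self.counters.get(req.client_id, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # the previous window expired; start a new one
        if count >= self.limit:
            return "429 too many requests"
        self.counters[req.client_id] = (start, count + 1)
        return None

def check_schema(req: Request) -> Optional[str]:
    """Layer 3: reject malformed input before business logic ever sees it."""
    qty = req.body.get("quantity")
    if isinstance(qty, bool) or not isinstance(qty, int) or not 1 <= qty <= 100:
        return "400 bad request"
    return None

def handle(req: Request, layers: list[Callable[[Request], Optional[str]]]) -> str:
    for layer in layers:
        error = layer(req)
        if error is not None:
            return error  # the first failing layer rejects the request
    return "200 ok"
```

Keeping the layers independent means that a gap in one of them, say a leaked key, is still bounded by the others (rate limiting and input validation).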

#security

F1: A Distributed SQL Database That Scales 🤟

Murat published a new blog post reviewing the paper on Google's F1, a distributed SQL database that replaced the sharded MySQL implementation behind AdWords. In his view, the paper lacks a clear narrative and doesn't spell out explicit lessons for building a distributed SQL database. F1, built on top of Spanner, offers features like distributed SQL queries, transactionally consistent secondary indexes, and optimistic transactions. Despite higher database latency, F1 keeps user-facing latency similar to that of the previous MySQL system. Murat calls for a follow-up paper with more insights and evaluations of specific optimizations and techniques.

F1: A Distributed SQL Database That Scales
This is a VLDB 2013 paper (appeared earlier at Sigmod′12 it seems) from Google about paying tech-debt. F1 replaces the sharded MySQL hacky ...
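
One of the features mentioned, optimistic transactions, follows a general pattern: read without locks, remember the version of every row read, and check at commit time that none of those rows changed. Below is a simplified, single-process Python sketch of that general technique using per-row version numbers. The `Store`, `OptimisticTxn` and `ConflictError` names are hypothetical. F1 builds this on top of Spanner, which the toy does not model.

```python
from dataclasses import dataclass, field

class ConflictError(Exception):
    """Raised when a row read by the transaction changed before commit."""

@dataclass
class Store:
    # row key -> (version, value); the version is bumped on every committed write
    rows: dict = field(default_factory=dict)

    def read(self, key):
        return self.rows.get(key, (0, None))

class OptimisticTxn:
    def __init__(self, store: Store):
        self.store = store
        self.read_versions = {}  # key -> version observed during the lock-free read phase
        self.writes = {}         # buffered writes, invisible to others until commit

    def get(self, key):
        if key in self.writes:
            return self.writes[key]
        version, value = self.store.read(key)
        self.read_versions.setdefault(key, version)
        return value

    def put(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Validation: every row we read must still carry the version we saw.
        # In this single-threaded toy, validate-then-write is trivially atomic;
        # a real system has to make it so.
        for key, seen in self.read_versions.items():
            current, _ = self.store.read(key)
            if current != seen:
                raise ConflictError(f"row {key!r} changed: saw v{seen}, now v{current}")
        # Write phase: apply buffered writes and bump versions.
        for key, value in self.writes.items():
            version, _ = self.store.read(key)
            self.store.rows[key] = (version + 1, value)
```

If two transactions read the same row and both try to update it, the first commit wins and the second raises `ConflictError`, so the caller can retry instead of silently overwriting.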

#distributedsystems #db #spanner

Building and operating a pretty big storage system called S3 👷‍♂️

How complex could a Simple Storage Service (S3) be? It turns out it can take several years just to grasp the complexity of the whole system. Still, you can take a glance at the problems the S3 folks face to keep their service reliable and efficient, from hard drive error rates to heat distribution. Find this awesome post on the AWS CTO's blog.

Building and operating a pretty big storage system called S3
Three distinct perspectives on scale that come along with building and operating a storage system the size of S3.
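
The heat-distribution problem can be illustrated with a toy simulation. If a bursty customer's objects all sit on one drive, that drive absorbs the whole burst. Spreading the objects across many randomly chosen drives keeps the per-drive peak low. This Python sketch assumes uniform random placement and one I/O per object. The drive and object counts are arbitrary, and it is not how S3 actually places data.

```python
import random
from collections import Counter

def peak_drive_load(placement: list[int]) -> int:
    """Highest number of I/Os landing on any single drive during the burst."""
    return max(Counter(placement).values())

def simulate(num_objects: int = 1_000, num_drives: int = 500, seed: int = 42) -> tuple[int, int]:
    rng = random.Random(seed)
    # Concentrated: every object of the bursty customer lives on drive 0.
    concentrated = [0] * num_objects
    # Spread: each object goes to a drive chosen uniformly at random.
    spread = [rng.randrange(num_drives) for _ in range(num_objects)]
    # The burst reads every object once, so each entry above is one I/O on that drive.
    return peak_drive_load(concentrated), peak_drive_load(spread)

concentrated_peak, spread_peak = simulate()
print(f"peak I/Os per drive: concentrated={concentrated_peak}, spread={spread_peak}")
```

With the defaults, the concentrated layout puts all 1,000 I/Os on one drive, while the spread layout leaves every drive with only a handful.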

#aws #s3

Costwiz: Saving cost for LinkedIn enterprise on Azure 👷‍♂️

LinkedIn developed Costwiz, a tool designed to reduce costs by monitoring budgets and resource utilization on Azure. The tool identifies cost-saving opportunities and under-utilized resources and alerts teams about them. It also gives leaders a unified experience for forecasting Azure budgets more accurately. The blog post discusses the challenges, progress, and lessons learned along the Costwiz journey. The tool has significantly changed the way LinkedIn manages Azure costs, creating a culture of continuous optimization in cloud deployments.

Costwiz: Saving cost for LinkedIn enterprise on Azure
Authors: Deven Walia, Vivek Subramaniam, Simon Desowza, and Karthik Subramanian
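
The core idea of flagging under-utilized resources and checking spend against a budget fits in a few lines. Everything below is a hypothetical illustration: the `Resource` fields, the 10% CPU threshold and the linear month-end forecast. It is not LinkedIn's implementation or an Azure API.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    team: str
    monthly_cost: float     # USD
    avg_cpu_percent: float  # average utilization over the observation period

def underutilized(resources: list[Resource], cpu_threshold: float = 10.0) -> list[Resource]:
    """Resources whose average CPU stays below the threshold are right-sizing candidates."""
    return [r for r in resources if r.avg_cpu_percent < cpu_threshold]

def forecast_month_end(spend_so_far: float, day_of_month: int, days_in_month: int) -> float:
    """Naive linear forecast: assume the current daily burn rate continues."""
    return spend_so_far / day_of_month * days_in_month

def budget_alerts(spend_by_team: dict[str, float], budgets: dict[str, float],
                  day: int, days: int) -> list[str]:
    """One alert per team whose projected month-end spend exceeds its budget."""
    alerts = []
    for team, spent in spend_by_team.items():
        projected = forecast_month_end(spent, day, days)
        budget = budgets.get(team)
        if budget is not None and projected > budget:
            alerts.append(f"{team}: projected ${projected:,.0f} exceeds budget ${budget:,.0f}")
    return alerts
```

For example, a team that has spent $1,500 by day 10 of a 30-day month is on track for $4,500 and would be alerted against a $4,000 budget.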

#azure #cloud #cost

Service Delivery Index: A Driver for Reliability 👷‍♂️

Slack Engineering introduced the Service Delivery Index - Reliability (SDI-R), a metric that measures the success of the jobs-to-be-done by Slack's users and Slack's uptime. This index has become a key driver of the reliability culture across the organization. It helps teams speak the same language of reliability and lets them spot trends and identify regressions and improvements. The SDI-R also serves as an error budget to balance the prioritization of investments across the company. The blog post discusses the evolution of the SDI-R, its implementation, and the lessons learned from using it.

Service Delivery Index: A Driver for Reliability - Slack Engineering
How a simple metric drives a reliability culture across the Slack engineering organization
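
The post defines SDI-R in Slack's own terms. As a generic illustration only, not Slack's formula, here is how a success-ratio metric can double as an error budget. The 99.9% target and the job counts are made-up numbers.

```python
def success_ratio(successful: int, total: int) -> float:
    """Fraction of user jobs (e.g. 'send a message') that succeeded in the period."""
    return successful / total if total else 1.0

def error_budget_remaining(successful: int, total: int, target: float) -> float:
    """Share of the error budget left: 1.0 = untouched, 0.0 = spent, below 0 = overspent.

    With a target of 0.999, up to 0.1% of jobs may fail in the period.
    """
    allowed_failures = (1 - target) * total
    actual_failures = total - successful
    if allowed_failures == 0:
        # Nothing is allowed to fail: any failure means the budget is blown.
        return 1.0 if actual_failures == 0 else float("-inf")
    return 1 - actual_failures / allowed_failures
```

With 1,000,000 jobs and 999,400 successes, the ratio is 99.94%. That is above a 99.9% target, with 40% of the error budget left (600 of the 1,000 allowed failures used). A team burning through its budget is a signal to shift investment toward reliability work.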

#reliability

Refresher on CDC

CDC (Change Data Capture) is an essential mechanism for propagating changes and replicating data. Check out a Twitter post by Aurimas Griciūnas on CDC!
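
Log-based CDC boils down to reading an ordered stream of row-level change events and applying them downstream. Below is a minimal Python sketch of the consuming side. The event shape (`lsn`, `op`, `key`, `after`) is a hypothetical simplification, loosely modelled on what CDC tools emit, and is not any specific tool's format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ChangeEvent:
    lsn: int                # log sequence number in the source database; strictly increasing
    op: str                 # "c" = create, "u" = update, "d" = delete
    key: str                # primary key of the changed row
    after: Optional[dict]   # new row image; None for deletes

class Replica:
    """Applies change events in log order and remembers how far it got."""

    def __init__(self):
        self.rows: dict[str, dict] = {}
        self.last_lsn = 0  # checkpoint, so the consumer can resume after a restart

    def apply(self, event: ChangeEvent) -> None:
        if event.lsn <= self.last_lsn:
            return  # already applied (e.g. redelivered); skipping makes replays idempotent
        if event.op in ("c", "u"):
            self.rows[event.key] = dict(event.after)
        elif event.op == "d":
            self.rows.pop(event.key, None)
        else:
            raise ValueError(f"unknown op {event.op!r}")
        self.last_lsn = event.lsn
```

Because the consumer tracks the last applied log position, replaying the stream after a crash leaves the replica unchanged, which is what makes at-least-once delivery safe here.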

#cdc #db