What a product manager should do to create the right system architecture

Vivek Rajasekaran
Oct 9, 2021

I recently finished reading Designing Data-Intensive Applications by Martin Kleppmann. I had been looking for a summary of the ideas behind architecting large-scale software systems, and this book fit the purpose nicely. Martin strikes the right balance between covering the breadth of ideas behind building reliable, scalable systems and going deep enough to illustrate the finer nuances. The three-word summary of the book would be Know Your Tradeoffs. System architectures, from a single-node, vertically scalable system to a distributed, horizontally scalable, fault-tolerant one, each have their pros and cons. The right choice for any product involves figuring out which aspects of system performance matter most in a given context.

While a large part of a product manager’s job is to ensure that the right user experience and functional needs of a system are defined and delivered, a PM also needs to ensure that the overall architecture chosen for the system is fit for purpose. Errors here can cause reliability and performance gaps and are costly to rectify once a system is live, typically leading to workarounds and slower product evolution. So while choosing the right architecture for any product will definitely be the system architect’s or engineering lead’s decision, the PM needs to help in optimizing for the right tradeoffs. The PM can do that by explicitly focusing on non-functional system requirements as well. This blog post provides some guidelines on the areas that should be covered, based on my own experiences and the learnings from Martin Kleppmann’s excellent book.

Provide a picture of expected product evolution. There is a lot of emphasis today on the MVP. The agile process also results in teams focusing deeply on the specific features that are immediately planned for development. However, a system architecture is built for the long term. Building for the long term is possible when there is some clarity on what the system should look like 1–2 years from today (expected features, system analytics and additional modules).

Know the mission-criticality of the system you are creating. How safety-critical or life-critical is the application? Are there parts of the system which should be treated as hard real-time systems? Typically, such a categorization is relevant for systems such as medical devices, aircraft software and military systems, and not for the internet-enabled consumer or enterprise applications that most of us work on. Hard real-time systems need timing guarantees for practically every process and require an entirely different approach to architecture (specialized operating systems and libraries, dedicated processing, dedicated bandwidth, lots of redundancy). While this blog post covers only soft real-time systems, it is still useful to pose this question when defining any system, as it forces you to think about and clarify the expected system performance in the worst cases.

Identify the key constraints in your system that should always be met. In any system, there will be some strict assumptions about the expected behavior. For example, in flight-booking software, you don’t want two users to select the same seat even if they are booking in parallel (or multiple buyers on an e-commerce site to purchase the same item when only 1 is in stock). While such constraints are simple to implement through locks on a single server, they need care in a distributed system. Adding too many strict constraints can lead to performance issues, so the need for each one should be properly assessed by analyzing the expected probability of the constraint being violated and the cost of that violation. It might be extremely unlikely for two customers to select the same seat on a flight at the same time, and if it does happen, it might not be very difficult to identify it and have a process to communicate (either through the system or offline) with one of the customers that their seat will need to be changed. Such a “cost of apology” is a business decision that a PM needs to drive.
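To make the single-server case concrete, here is a minimal sketch of how a lock keeps two parallel requests from taking the same seat. The SeatMap class and the book_in_parallel helper are hypothetical names made up for this illustration, and the sketch assumes all bookings pass through one process. It is not how a real booking system is built.

```python
import threading


class SeatMap:
    """In-memory seat allocation for one flight on a single server (hypothetical)."""

    def __init__(self, seats):
        self._owner = {seat: None for seat in seats}
        self._lock = threading.Lock()

    def book(self, seat, customer):
        # The check and the write happen under one lock, so two parallel
        # requests for the same seat cannot both succeed.
        with self._lock:
            if self._owner[seat] is not None:
                return False
            self._owner[seat] = customer
            return True

    def owner(self, seat):
        with self._lock:
            return self._owner[seat]


def book_in_parallel(seat_map, seat, customers):
    """Start one booking thread per customer and return those who got the seat."""
    winners = []
    winners_lock = threading.Lock()

    def attempt(customer):
        if seat_map.book(seat, customer):
            with winners_lock:
                winners.append(customer)

    threads = [threading.Thread(target=attempt, args=(c,)) for c in customers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners
```

However many customers race for the seat, exactly one booking succeeds. In a distributed system there is no shared in-process lock like this, which is why the same guarantee takes more care and comes at a performance cost.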

Clarify the expected data access patterns. Is some data going to be accessed by users more frequently? Do we know what that data is in advance? For example, the first few episodes of a newly released show on Netflix will be accessed by many more customers than any random video. Similarly, there will be periods of time (holidays, weekends, evenings, etc.) when access might ebb or flow. The expected distribution of data requests and user actions by resource, time or any other relevant variable is key information that needs to be considered in the system design. While the real world will reveal such information after a system is launched, it is likely to be a painful lesson if it was not contemplated during design.
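One rough way to put a number on such skew is to ask what share of all requests goes to the handful of most popular resources. The sketch below assumes you have (or can estimate) a list of requested resource IDs. The function name top_share and the sample data are made up for illustration.

```python
from collections import Counter


def top_share(requests, k):
    """Fraction of all requests that go to the k most-requested resources."""
    counts = Counter(requests)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(n for _, n in counts.most_common(k))
    return top / total


# Hypothetical request log: a new show's first episodes dominate the traffic.
sample_requests = ["new_ep1"] * 6 + ["new_ep2"] * 2 + ["old_a", "old_b"]
share_of_top_two = top_share(sample_requests, 2)
```

A high share for a small k suggests that a few resources deserve special treatment in the design, and that is easier to plan for before launch than after.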

Define the level of consistency and coherence of customer experiences. Today’s large-scale, internet-enabled businesses run on distributed database and computing technologies, which typically fall under what are called “eventually consistent” systems. There are many system availability and performance benefits to using such a system. But this also means that customers can sometimes encounter scenarios which might seem inconsistent or illogical. For example, you might be checking the live score in a cricket match and see that the match has finished, but the next refresh of the page might show that a couple of balls are yet to be bowled. Or you might be checking your email on both your phone and your laptop and see that they are not in sync. These are addressable through appropriate engineering even when using “eventually consistent” technologies. But a PM needs to come up with the appropriate behaviors in such scenarios and define the level of consistency guarantees to be provided, in order to decide if it’s worth the engineering effort (and the possible performance costs).
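The cricket score example is a case of reads going “back in time” because consecutive refreshes hit replicas that are at different points. One possible guard, sketched below under simplifying assumptions, is for the client to remember the newest version it has shown and ignore anything older. Replica and MonotonicReader are hypothetical classes for illustration, and the sketch assumes every update carries an increasing version number.

```python
class Replica:
    """A copy of the data that may lag behind other copies (hypothetical)."""

    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, version, value):
        # Replicas receive updates at different times; ignore out-of-date ones.
        if version > self.version:
            self.version = version
            self.value = value


class MonotonicReader:
    """Client-side guard: never show data older than what was already shown."""

    def __init__(self):
        self.last_seen = 0
        self.last_value = None

    def read(self, replica):
        # Accept the replica's answer only if it is at least as new as the
        # newest version this client has already displayed.
        if replica.version >= self.last_seen:
            self.last_seen = replica.version
            self.last_value = replica.value
        return self.last_value
```

Here the chosen behavior is to keep showing the newest value already seen when a lagging replica answers. Whether to do that, show a “refreshing” state, or retry another replica is the kind of product behavior the PM should define.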

Share the end-to-end requirements. As an example, security of key details such as users’ credit card data might be a key requirement. Such information will be accessed in the system in different ways. It is quite possible that the details get encrypted only when stored in the database, and the requirement gets checked off. But you would ideally want the credit card details to be encrypted or otherwise secured across the whole system: during transit (from the web application or mobile app to the server), during payment processing, and in analytics and reminder services (to check which cards are expiring in the next month). There will be separate engineering teams looking into these disparate functions, and the product manager needs to ensure that the end-to-end ask is raised and aligned across these teams.

Share the data privacy requirements while understanding the full implications. You might agree in your terms and conditions to delete all of a customer’s data once they terminate their account with you, or you might provide a way for users to delete some of their data (such as their product search history). This might even be mandated by regulations in some geographies where you operate. Deletion of data is tricky. Often, data is only made harder to access (by soft deletion) rather than actually being removed. Even when it is removed, it might be removed only from the current active copy. What about all the earlier database versions which might have been backed up? There can also be some derived data (such as product recommendations) which would have partially used the data being deleted. Does that need to be removed or recomputed as well? A PM should help in making the business calls associated with these decisions (ideally working backward from a customer’s implicit expectations) and call out the requirements clearly.
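The difference between these levels of deletion can be sketched with a toy store. Everything here (the UserStore class, its methods and the in-memory “backups”) is a hypothetical stand-in for a real database and its backup process. It is meant only to show how many copies can survive each kind of delete.

```python
import copy


class UserStore:
    """Toy data store with an active copy and backup snapshots (hypothetical)."""

    def __init__(self):
        self.rows = {}      # user_id -> {"history": [...], "deleted": bool}
        self.backups = []   # earlier snapshots of self.rows

    def add(self, user_id, history):
        self.rows[user_id] = {"history": list(history), "deleted": False}

    def snapshot(self):
        self.backups.append(copy.deepcopy(self.rows))

    def soft_delete(self, user_id):
        # The data is only hidden; it is still physically stored.
        self.rows[user_id]["deleted"] = True

    def visible_history(self, user_id):
        row = self.rows.get(user_id)
        if row is None or row["deleted"]:
            return []
        return list(row["history"])

    def hard_delete(self, user_id, purge_backups=False):
        # Removes the active copy; backups are touched only on request.
        self.rows.pop(user_id, None)
        if purge_backups:
            for snap in self.backups:
                snap.pop(user_id, None)

    def copies_remaining(self, user_id):
        """Number of places (active copy plus backups) still holding the user."""
        return sum(user_id in store for store in [self.rows, *self.backups])
```

After a soft delete the user sees nothing, yet both the active copy and the backup still hold the data. A hard delete on the active copy still leaves the backup. Derived data, such as recommendations, is not modeled here and would need its own decision.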

Originally published at http://vivekrajasekaran.com on October 9, 2021.
