The Software Design Questions

Last year, I published a collection of discussions on software design covering a set of central principles to consider while designing software. Here, I move to a meta-discussion about what questions you should ask when trying to apply those principles. If you have not read the first post, that is okay: it is not required to understand this discussion, so you can read the posts in any order. You only need two definitions from that post. Software design is the process of managing tradeoffs when architecting a system. A system, in this definition, can range from a single-process or single-thread program to a multi-cluster, multi-region distributed system.

I want to start with what this write-up is not. This is not a step-by-step manual on how to design scalable and simple systems. I believe that software design is part knowledge, part art, and part experience. I also want to emphasize that no matter how good the design is, it is bound to have some shortcomings. So, the goal is not to eliminate those shortcomings but rather to minimize their blast radius when we run into them.

This write-up is not going to answer questions; instead, it is going to raise many of them. It is meant to help designers ask the right questions so they can make more educated tradeoffs. Remember, software design is all about making tradeoffs, and making tradeoffs is all about asking the right questions.

I will discuss high-level and low-level ideas simultaneously, so some sections may be purely conceptual, and others may contain code snippets (in Python).

Learning the context

The first step in every software design project is to learn the context in which you are working. I have discussed this more thoroughly in a previous post, but I will summarize it in four main questions and expand a little: What do we do? Who are we serving? What is our goal? How are we trying to reach it?

The first question tells you the business line. So, for example, what we do at Zid is e-commerce. Now, the specifics of that come out of the following three questions.

The second question tells you who your customers are. This is where you specify your customer persona or personas. Keep in mind that if your answer to this question is "everyone," you are doing something wrong. If you do not know who your customer is, assume one. Do not try to serve all humans all at once.

The third question directs your efforts. It is the "mission" of your business. Answering it should tell you what benefits your customers are gaining from your business, which sets your compass for what solutions you should be looking at. For example, when Zid started, we used to say that our goal was to provide an all-in-one e-commerce platform. However, we now believe that we empower retailers to grow their businesses. That change gave us a dramatically different answer to the next question, as I will show with Uber.

The last question dives into the details of your offering, and it builds on your answer to the previous question. Answering this question will determine the kind of solution you will build. For example, Uber's mission is "to help people go anywhere and get anything," and they achieve that by offering a ride-hailing service and an on-demand delivery service. Previously, it was "make transportation as reliable as running water, everywhere, for everyone," which primarily focused on ride-hailing, and that is it. Now, Uber shows you all kinds of transportation methods: buses, bikes, scooters, and cars (including rentals).

The two most important questions in software design

After learning the business context and background, we can now ask the core questions for our specific project. What are we optimizing for? What can we sacrifice?

We can optimize and sacrifice all sorts of values like performance, time to market, scalability, extensibility, user experience, developer velocity, developer experience, etc.

Any project should have some combination of these: things you are optimizing for and things you can sacrifice. This decision should be intentional and explicit. For instance, if you are optimizing for time to market, you will have to sacrifice performance, code quality, or a combination of both.

Some sacrifices can be unintentional (or, in a sense, incidental or byproducts), but optimizations cannot be (otherwise, we do not know what we are doing). For example, say you are sacrificing code quality (maybe you are planning a rewrite if this takes off). You may end up sacrificing performance in the same places, because the cleaner version would also be more performant but would take longer to build.

Priorities do change, so you need to be prepared. Do not bake your assumptions into too many levels of your system.

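# Assumption: ValidationError is defined elsewhere in the codebase;
# a plain Exception subclass stands in for it here.
class ValidationError(Exception):
    pass

class InvalidPlan(ValidationError):
    pass

def validate_plan(user: dict) -> dict:
    if user["plan"] != "pro":
        raise InvalidPlan()
    return user

def perform_billing(user: dict) -> None:
    if user["plan"] != "pro":  # This is a code smell: the check is duplicated
        raise InvalidPlan()
    # Do billing logic

def bill_user(user: dict) -> None:
    validate_plan(user)
    perform_billing(user)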

Your logic should not be tightly coupled to your input validation. In our example, performing billing should not check the plan. What if you wanted to change or remove that check? You would have to hunt through your whole codebase. Imagine changing it in some places but not all of them: you would start to encounter the weirdest behaviors in your system.
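One possible way to decouple the two, as a minimal sketch rather than a definitive fix: the plan check lives in a single validation function at the entry point, and the billing logic trusts its input. The `ValidationError` base class, the `ALLOWED_PLANS` set, the `name` field, and the string returned by `perform_billing` are hypothetical, introduced only for illustration.

```python
class ValidationError(Exception):
    """Hypothetical base class; the example above assumes one exists."""


class InvalidPlan(ValidationError):
    """Raised when a user's plan is not eligible for billing."""


# Assumption: the set of billable plans is kept in exactly one place.
ALLOWED_PLANS = {"pro"}


def validate_plan(user: dict) -> dict:
    # The only function that knows which plans are billable.
    if user["plan"] not in ALLOWED_PLANS:
        raise InvalidPlan(user["plan"])
    return user


def perform_billing(user: dict) -> str:
    # No plan check here: billing trusts that validation already happened.
    return f"billed {user['name']}"


def bill_user(user: dict) -> str:
    # Validation runs once, at the boundary, before any business logic.
    return perform_billing(validate_plan(user))
```

With this shape, changing or removing the plan rule means editing `validate_plan` (or `ALLOWED_PLANS`) and nothing else.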

Note On Sacrificing Extensibility

You may sometimes have to trade extensibility (for simplicity or delivery time), which means that you are making a particular assumption and sticking to it. For that, we need to understand the two kinds of assumptions: baking assumptions and topping assumptions. Baking assumptions are the ones that would force you to significantly rework the system if you were to change them. You can usually spot them when deciding whether to use a one-to-one or a one-to-many relationship. Topping assumptions are the ones that may only require a couple of lines of code to change, for example, changing from a sequential transaction number to a random transaction number.
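As a rough illustration of a topping assumption, the sketch below keeps the transaction-number scheme behind a single iterator, so switching from sequential to random numbers touches one line. The names `sequential_numbers`, `random_numbers`, and `create_transaction` are hypothetical and exist only for this example.

```python
import itertools
import secrets
from typing import Iterator


def sequential_numbers(start: int = 1) -> Iterator[str]:
    # Yields "1", "2", "3", ... as transaction numbers.
    for n in itertools.count(start):
        yield str(n)


def random_numbers(nbytes: int = 8) -> Iterator[str]:
    # Yields random hex strings (two hex characters per byte).
    while True:
        yield secrets.token_hex(nbytes)


def create_transaction(amount: int, numbers: Iterator[str]) -> dict:
    # The rest of the system only sees an opaque string identifier.
    return {"number": next(numbers), "amount": amount}


numbers = sequential_numbers()  # swap for random_numbers() to change the scheme
first = create_transaction(100, numbers)
second = create_transaction(250, numbers)
```

The change stays small only as long as nothing else relies on the numbers being ordered or numeric. Once other code starts depending on that, the assumption begins to look baked.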

Remember: do not shoot yourself in the foot. Do not bake a contested assumption. If there is a debate on whether an assumption is the right one for the long run, avoid baking it into your design.

What technology should I use?

Simply use whatever you know unless you know you shouldn't use it. This could also be seen as some sort of a tradeoff. You are optimizing for business, not technology "coolness." Do you want to build a business or build something with new shiny technology? It is a decision that needs to be made.

Organizational Context

Here, we need to understand the team (or teams) working on our software. What is the smallest team unit size? How much freedom does this unit have? To whom do they report? If this unit messes up, who is supposed to fix the problem? Is anyone going to be blamed? And how is your expertise distributed?

For example, if your teams are small (for some measure of "small") and autonomous (that is, they make all kinds of product and tech decisions themselves), a micro-services architecture (yes, it only took me about a thousand words to mention it) is most probably appropriate. However, if your teams are either too large or very interconnected, a micro-services architecture will become the mega-mess architecture.

One case where interconnection can happen is when only a few experts have a high concentration of knowledge about the system. That means they have to participate in almost every change – teams are not autonomous.

As a side note on the previous example, this is one area where it is common for people to follow the trend just so they can say they implemented * insert buzzword * solution at their organization. Part of the community goes around preaching about an architecture and how it solved all their problems, but they also need to share their organizational context: in what kind of organization was this architecture helpful? Some organizations are inherently slow, not because of technology but because of internal procedures, so implementing X architecture or X methodology will not help and may even make things worse. Remember Conway's law (yes, I had to bring that up).

Design Documentation

Now that we have learned our business context, specified our tradeoffs, chosen our technology, and considered our organizational context, how do we get this into writing? And how do we communicate it?

Unlike most of the previously introduced questions, I have a definitive answer here. You may have heard of class diagrams, use-case diagrams, sequence diagrams, etc. If you have, do not use them. They are unnecessary formalisms that do not add to the conversation. On the contrary, they raise the barrier to entry, because people have to learn all the modeling terminology and tools involved in producing those diagrams.

A simpler approach is to use informal diagrams and supplement them with textual descriptions with minimal usage of context-specific jargon. This way, you can onboard anyone to your documents by simply providing a glossary section that describes all the terminology used throughout the document – no external resource needed.

Deployed! What now?

Start over. Software design is not a linear procedure; it is a feedback loop. As the software is used, we learn more about its behavior and its users' behavior. That shifts our understanding of the business context and the tradeoffs we made, taking us back to the first two sections.