Thoughts on Designing Software Abstractions

"All problems in computer science can be solved by another level of indirection" (attributed to David Wheeler and popularized by Butler Lampson).

I think this quote clearly presents how crucial abstractions are as a tool for solving computer problems. However, as important as this topic is, we do not see proper attention drawn to it when training engineers. Yes, there are a few books (e.g., Clean Code by Robert Martin) and many blog posts, but these usually focus on the technicalities (that is, how to write the code) rather than the thinking and mindset orientation. My goal here is to have a comprehensive discussion of what the engineer should be thinking about before and during the process of designing the abstraction. In other words, in my discussion, code is just a demonstration tool rather than the subject of the discussion.

Before we get started, I highly recommend reading Stripe's post on the first 10 years of Stripe's payments APIs. It tells the story of how a software business naturally progresses: as its offering grew, so did the complexity of its abstractions. Their case is unique because their main product is APIs, so changes in the abstractions are more prominent. I will refer to it throughout, as it has great examples of most of the ideas I will discuss.

Defining the Units of Abstraction

By units of abstraction, I mean what we should expect our users to know and what we expect them never to have to know. This is defining the inputs and outputs of the abstraction. An example of this is Stripe's PaymentIntent API. They expect the API user to learn about a few Stripe objects like Customer, PaymentMethods, and PaymentIntents themselves to be able to use the API. Then, internally they abstract away the complexity of handling the API. They do not expect their API user to know how they store customer info or how they integrate with networks. And this is an intentional design decision.

Another example, this time from a micro perspective, comes from ZidShip, where we had a balance management service. The engineer working on it suggested implementing the balance update as follows (written in Python and Django):

package = merchant.packages.select_for_update().get_active_package()
if package:
    if package.balance_remaining <= 0:
        raise Exception("Insufficient package balance")
    package.balance_remaining = F('balance_remaining') - 1
    package.save()

This will select the active package for update, then check if it's not null, ensure it has balance, and then deduct 1 from it.

I had a few comments on this design. One, the user needs to know the inner workings of the package and be careful to check the balance. Two, the select_for_update part is easy to miss, and I am worried that one might forget to add it and cause a race condition. So, I asked the engineer to try to abstract it.

The engineer's intent was to keep the select_for_update explicit. They wanted to make sure that users of the packages API are aware that they are locking the row when they call it.

That is valid; however, they were not thinking about their units of abstraction. What this engineer had in mind was abstracting loading the packages, but what I suggested was abstracting all balance manipulation. That is, make an abstraction layer that takes the merchant and how much you want to deduct or add to their balance. In other words, we are abstracting the whole snippet as follows:

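from django.db import transaction
from django.db.models import F


def deduct_balance(merchant, amount):
    # select_for_update() must be evaluated inside a transaction;
    # the row lock is held until that transaction ends.
    with transaction.atomic():
        package = merchant.packages.select_for_update().get_active_package()

        if not package:
            raise Exception("Merchant has no active packages")

        # Compare against the requested amount so the balance cannot go negative.
        if package.balance_remaining < amount:
            raise Exception("Insufficient package balance")

        package.balance_remaining = F('balance_remaining') - amount
        package.save()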

Now, every call site only needs to know two things: the merchant and how many units (i.e., the amount) we want to deduct. Loading and locking the row is centralized, and we added a new case for handling merchants that do not have active packages.

Why is this better? First, all operations done on balance HAVE to be atomic and would require locking the row, so this way, we avoid forgetting to select for update. Second, users of the API would require so much less context to be able to interact with the balance. Meaning they will not need to learn about active and inactive packages or how these packages are purchased and activated.
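To make the call-site argument concrete without a database, here is a minimal, framework-free sketch of the same idea. Everything in it (BalanceService, activate_package, the exception names) is hypothetical and is not the ZidShip code; a threading.Lock stands in for the row lock that select_for_update provides.

```python
import threading


class NoActivePackage(Exception):
    """Raised when the merchant has no active package."""


class InsufficientBalance(Exception):
    """Raised when the active package cannot cover the requested amount."""


class BalanceService:
    """Hypothetical in-memory stand-in for the balance layer."""

    def __init__(self):
        self._lock = threading.Lock()  # plays the role of the row lock
        self._active_balance = {}      # merchant id -> remaining units

    def activate_package(self, merchant_id, units):
        with self._lock:
            self._active_balance[merchant_id] = units

    def deduct(self, merchant_id, amount):
        # Locking, lookup, and validation all live here,
        # so no call site can forget any of them.
        with self._lock:
            if merchant_id not in self._active_balance:
                raise NoActivePackage(merchant_id)
            if self._active_balance[merchant_id] < amount:
                raise InsufficientBalance(merchant_id)
            self._active_balance[merchant_id] -= amount
            return self._active_balance[merchant_id]


# A call site needs only the merchant and the amount.
service = BalanceService()
service.activate_package("merchant-1", 10)
remaining = service.deduct("merchant-1", 3)
```

The caller never sees packages, locks, or the active/inactive distinction, which is the point of choosing "balance manipulation" as the unit.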

You can see how defining the units is crucial on all levels, whether you are designing an API with a few hundred endpoints, like Stripe's, or implementing a 10-line function that helps you avoid all sorts of nasty race conditions.

Understanding the Dimensions of the Abstraction

This is where we define what I call the depth and breadth of the abstraction. The wider the abstraction, the more use cases it has, and the deeper it is, the more layers it hides. Exploring these dimensions gives you clarity on the boundaries of your scope. Two common traps that engineers fall into are what I call abstraction bloating and pass-through abstractions.

Abstraction bloating happens when you try to incorporate too many use cases into your units of abstraction. In terms of dimensions, the abstraction is too wide. Going back to Stripe's example, their team tried to force their original credit card (Charges) abstraction to do too much. The Charges API was designed to handle immediately finalized payments that did not require customer action, and then they had to break one or both of these conditions. This is an example of both working on the wrong units of abstraction and bloating it at the same time.

Another case of bloating is Django REST framework (DRF)'s serializers. Serializers in most other contexts convert objects from the language's native format to a format that can be transmitted elsewhere (e.g., a Python object to JSON). However, DRF's serializers do more than that. They handle data validation and state manipulation (i.e., save and update methods). For those who have never used DRF, this is a sample serializer for a user model adapted from DRF's docs.

from rest_framework import serializers

# User is the application's own model class (its import is omitted here).
class UserSerializer(serializers.Serializer):
    email = serializers.EmailField()
    username = serializers.CharField(max_length=100)

    def create(self, validated_data):
        return User(**validated_data)

    def update(self, instance, validated_data):
        ...

This converts User objects to JSON, validates and parses JSON into User objects, and manages state in the create and update methods. In this design, the serializers do too much and can be confusing and hard to learn. Yet, some might argue that this abstraction is appropriate. The DRF maintainers were thinking about making fully functional units that, once mastered, could be used to quickly implement the majority of use cases. I will be discussing this idea further in the next section.
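For contrast only, here is a rough sketch of what narrower units could look like when serialization, validation, and state manipulation are kept apart. This is plain Python, not DRF's API; the User dataclass and the three function names are made up for illustration, and the validation rules are simplified assumptions.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class User:
    email: str
    username: str


def serialize_user(user: User) -> str:
    """Serialization only: native object to JSON text."""
    return json.dumps(asdict(user))


def validate_user_payload(payload: dict) -> dict:
    """Validation only: returns cleaned data or raises ValueError."""
    email = str(payload.get("email", "")).strip()
    username = str(payload.get("username", "")).strip()
    if "@" not in email:
        raise ValueError("email must contain '@'")
    if not 0 < len(username) <= 100:
        raise ValueError("username must be 1 to 100 characters")
    return {"email": email, "username": username}


def create_user(validated_data: dict) -> User:
    """State manipulation only: builds the object (a real system would also persist it)."""
    return User(**validated_data)


# The three narrow units are composed explicitly by the caller.
incoming = json.loads('{"email": "ann@example.com", "username": "ann"}')
user = create_user(validate_user_payload(incoming))
outgoing = serialize_user(user)
```

Each unit is easier to learn on its own, but the caller now has to wire three pieces together, which is exactly the tradeoff the DRF design avoids.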

A pass-through abstraction is an abstraction that does not hide any complexity. It assumes too much knowledge from its user. It occurs when you do not consider the depth of the abstraction. You may recognize this as "shallow abstraction," but I will refer to it as pass-through for my purposes. A very famous example of this is the Java I/O library. I think most of us have had to use this library at some point, yet I can never write any I/O code in Java from memory. The library exposes too much complexity to the developer. For example, to read a file, you must know the difference between a File, FileDescriptor, FileReader, BufferedReader, and a Scanner. The library has different sets of abstractions for parsing versus reading a file, even though parsing can be a subset of reading. With this many concepts, it would not be much different from calling the lower-level APIs directly. Hence, a pass-through abstraction.
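The difference in depth can be sketched in a few lines of Python. Both helper names below (open_file and read_lines) are hypothetical; the first merely forwards its arguments to the built-in open(), while the second hides the decisions a typical caller does not care about. The UTF-8 default is an assumption made for this sketch.

```python
import tempfile
from pathlib import Path
from typing import Iterator


def open_file(path, mode="r", buffering=-1, encoding=None):
    """Pass-through: every decision of the built-in open() is still the caller's."""
    return open(path, mode, buffering, encoding)


def read_lines(path) -> Iterator[str]:
    """Deeper: hides opening, buffering, decoding, closing, and newline stripping."""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            yield line.rstrip("\n")


# Usage: write a small file, then read it back without touching open() directly.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("first\nsecond\n", encoding="utf-8")
    lines = list(read_lines(sample))
```

A caller of open_file has to know exactly as much as a caller of open(), so the wrapper adds a name but no depth.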

Start and End With the User in Mind

It is easy to get caught up in the implementation of the abstraction and forget the user. This is where the abstraction starts leaking. Ideally, the designer of the abstraction should think from the user's perspective. Think about the common use cases and how concentrated or scattered they are. Always make the most common use case the easiest.

I usually do this by laying out the use cases, starting with the simplest (not necessarily the most common) and then expanding the design based on what needs to change for each new use case. For example, let's say I am writing a function to upload files to S3. The files can be either public or private. Private files can be accessed through signed URLs that have an expiration date.

I would do this in three phases. First, consider the simplest case, which is uploading a public file. Then, add support for private files. Lastly, add support for signed URLs. So, our interface for the first case would be something like this:

from typing import IO

def upload_to_s3(file: IO, key: str):
    pass

We take any IO-compatible object and the key (the path in S3 terms). Then, we expand to support private files by simply adding an is_private parameter like so:

def upload_to_s3(file: IO, key: str, is_private=True):
    pass

Notice that I am setting the default to True because, in most cases, the files we upload should not be publicly viewable. We also do not want our engineers to accidentally upload a file publicly, so we make the private case the default and make uploading a public file an active choice.

Now, we want to support signed URL generation. So, we add two new parameters for that: generate_link and link_expiration. generate_link decides whether or not to generate the link, and link_expiration specifies how long (in seconds) this link should be available before it expires.

def upload_to_s3(
    file: IO,
    key: str,
    is_private=True,
    generate_link=False,
    link_expiration=604800,
):
    pass

Again, we are adding defaults to keep the most common case the easiest. So by default, we will not generate a link and will ignore the expiration parameter. And if the generate_link parameter is enabled, we will generate a link with the specified expiration.

Now, notice how we are hiding all complexity of S3 and which buckets are used for which files. There are no S3-specific parameters passed to the function. We can even swap S3 for a different storage backend, for that matter. These are all part of the abstraction. Users of this function who need to upload a private file with a signed link should not have to actively think about which bucket they are uploading to and what permissions should be assigned to the file. These are all implementation details and do not contribute to the user's business value.

Going back to the Java example, I think the Java IO library fails to make the most common use case easy. It focuses on purity at the expense of usability. The most common use case there is reading a local file as a whole or line by line. And for the majority of these cases, a buffered reader is the best choice. So, these should be our defaults. I would leave designing a better API for this in Java as an exercise for the reader.

And for Stripe, this is exactly what they excelled at. They made the common case so easy to implement and get started with that someone might even argue it led them to complicate the less common cases.

Allow Your Users to Escape Your Abstraction

The whole point of abstractions is to make a certain job easier. Sometimes that job is not exactly what the abstraction designer intended or assumed, which is not necessarily bad. Actually, more often than not, it is a good sign: the abstraction has proven useful, and its use cases have expanded.

The difficulty here is that designers sometimes intentionally lock down their abstractions, so they cannot be expanded. That could happen due to a few design decisions.

The first is building the whole abstraction as one unit or layer, where the user cannot use any lower levels of the abstraction to expand the use cases. Let's go back to our S3 upload example and assume that some user wants to generate signed URLs for a pre-existing file. Maybe the original URL has expired, or they might have never generated one in the first place. If we implemented the logic directly in upload_to_s3, that user would not be able to utilize our abstraction. A better approach would be to extract the link generation logic into a separate function. Like so:

def upload_to_s3(
    file: IO,
    key: str,
    is_private=True,
    generate_link=False,
    link_expiration=604800,
):
    # Upload logic
    if generate_link:
        return generate_signed_s3_url(key, link_expiration)


def generate_signed_s3_url(key, link_expiration=604800):
    pass

Yes, this seems obvious and straightforward, but I have seen this so often that I had to show such a simple case.

The second category of design decisions that prevent the user from escaping the abstraction is mainly caused by object-oriented programming (OOP). I believe pushing engineers to be OOP purists leads to designs like Java's IO library. One OOP concept that I want to focus on is member visibility. In 90% of cases, all class members should be public. I would go as far as to say that static analysis tools should have a check for preventing engineers from hiding class members. The reason is that it is common to find a method that does almost the thing you want, but not exactly, so I would copy the method's implementation and change what I need elsewhere. However, I cannot do that if all the members that the method uses are hidden (either private or protected).

One might say, "Just make a subclass and override it." Well, one, that is a new pass-through abstraction. Two, that is not doable if the method I am overriding is private. And three, it is very common that you are trying to debug something in the shell (or maybe running a one-time script), and you want to go through some method and run it line by line, but you cannot do that because members are hidden.

I can go on for hours with more reasons why defaulting to hiding members is a bad practice, but I will stop here. The only case where I think hidden class members make sense is in systems programming, where you want to deliberately prevent the abstraction users from tampering with some state or a resource because that could lead to a series of undefined behaviors. Yet, even then, I would carefully decide what should be hidden and try to leave as much of the interface public as possible.
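The copy-and-adapt argument above can be shown with a small Python sketch. The two retry classes and capped_schedule are hypothetical names made up for this example. Note that Python only name-mangles double-underscore members rather than strictly enforcing privacy the way Java does, so this is an approximation of the problem, not an exact equivalent.

```python
class HiddenRetryClient:
    """Members are name-mangled, so outside code cannot easily reuse them."""

    def __init__(self):
        self.__max_attempts = 3

    def __backoff(self, attempt):
        return 2 ** attempt

    def schedule(self):
        return [self.__backoff(a) for a in range(self.__max_attempts)]


class OpenRetryClient:
    """Same logic, with every member public."""

    def __init__(self):
        self.max_attempts = 3

    def backoff(self, attempt):
        return 2 ** attempt

    def schedule(self):
        return [self.backoff(a) for a in range(self.max_attempts)]


def capped_schedule(client, cap):
    """A caller's adapted copy of schedule(): same steps, but delays are capped."""
    return [min(client.backoff(a), cap) for a in range(client.max_attempts)]


# Works because the pieces that schedule() relies on are reachable.
delays = capped_schedule(OpenRetryClient(), cap=2)
```

Passing a HiddenRetryClient to capped_schedule raises AttributeError, because the members that schedule() uses are not reachable under their plain names from outside the class.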

Closing Remarks

I am only scratching the surface of the considerations that go into designing abstractions. There are many other tradeoffs that should be considered, including the software design questions that I have previously introduced in a separate post.