Ditch The Old Data Access Patterns
It's All Fun and Games Until Somebody Hits Growth Mode
It's In The Name
The most common choice for building data access models in commercial line-of-business software is to create a set of tables to store records and create models that map directly to each table. This approach was cataloged as the "Active Record" pattern by Martin Fowler in Patterns of Enterprise Application Architecture and popularized by Rails circa 2004. The pattern relies on the CRUD model for data access, named for the operations it supports on individual records: create, read, update, and delete. The apparent simplicity of the approach makes it a natural first choice for applications with equally simple requirements. If all your application needs to do is read and present records, as-is, to a user and facilitate the creation of new records, changes to old ones, or the deletion of existing records, an Active Record / CRUD approach to data access seems perfectly reasonable. In the early days, it is easy to mistakenly assume that your application's data access needs will always be so simple. The ubiquity of this assumption has made Active Record the default choice for every application I've ever inherited.
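For readers who have not seen the pattern spelled out, here is a minimal sketch of what an Active Record / CRUD model tends to look like. It is not Rails code: the `customers` table, the `Customer` class, and its method names are hypothetical, and it uses Python's built-in `sqlite3` module with an in-memory database purely for illustration.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

# Hypothetical schema: one table, one model class that mirrors it.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
)


@dataclass
class Customer:
    """One instance maps to exactly one row in the customers table."""
    name: str
    email: str
    id: Optional[int] = None

    def save(self) -> None:
        # Create when the record has no id yet, otherwise update its row.
        if self.id is None:
            cur = conn.execute(
                "INSERT INTO customers (name, email) VALUES (?, ?)",
                (self.name, self.email),
            )
            self.id = cur.lastrowid
        else:
            conn.execute(
                "UPDATE customers SET name = ?, email = ? WHERE id = ?",
                (self.name, self.email, self.id),
            )

    @classmethod
    def find(cls, customer_id: int) -> Optional["Customer"]:
        # Read: fetch one row and wrap it in a model instance.
        row = conn.execute(
            "SELECT name, email, id FROM customers WHERE id = ?",
            (customer_id,),
        ).fetchone()
        return cls(*row) if row else None

    def delete(self) -> None:
        # Delete: remove the row this instance represents.
        conn.execute("DELETE FROM customers WHERE id = ?", (self.id,))
        self.id = None
```

Notice that the shape of the class is dictated entirely by the shape of the table. That coupling is what makes the pattern so quick to start with, and it is also the part that gets expensive later.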
The issue at the heart of this approach, as most engineers discover at least once, is that it buys initial ease of development at the price of long-term inflexibility, longer development cycles, and a growing cost of feature development. For any commercial software system, Active Record / CRUD leads to unsustainable engineering costs.
Tried and True Gets Tired
A common theme in my career has been how often companies lose their ability to evolve their systems. Every change requires more planning, more people, and more calendar time than the last. Common advice amongst software architects is to plan for a rewrite with every 10-fold increase in users. Despite the wisdom of that advice, I have never worked anywhere it could be said around "the management", lest one be cast from an open window.
Why is this such a common pattern if the outcomes are consistently poor? The answers I'm familiar with arrive in the form of platitudes like "horses for courses", "tried and true", "the devil you know", and "right tool for the job". The problem with these answers is that they lazily assert that the status quo the speaker is comfortable with should be, and is, the default and correct choice. Unfortunately, this core assertion does not hold up under scrutiny.
Whether we're working on an existing system or building something new, there are a few things worth considering:
Is our user base supposed to grow as large as possible?
Is our software used by external customers?
Is our business based on a subscription model?
If you answered "yes" to any of these questions, there is a good chance that choosing or sticking with a CRUD/Active Record (CRUDAR?) model is betting against your company's future success.
The purpose of every for-profit software company I've worked for or with is to find a problem domain, propose a solution, and find out whether people will pay for that solution. Founders and funders are especially focused on quick iterations in the early days, when there's no guarantee a market exists for a solution the new venture can deliver. In this case, the mission of the company is to fail and pivot with the smallest expenditure of funds possible. If you can fail and pivot quickly, you increase the odds of finding the right product-market fit and progressing to the next phase: growth mode. In growth mode, the pace doesn't slow. Instead, more funds are provided to grow the company, increase the number of experiments, and grow the exposure (and thereby adoption) of the product. The pressure on engineering teams is generally very high during this phase. It is certainly not a great time for engineers to try to convince their stakeholders and investors that a rewrite of anything is in order. Product organizations want their engineering teams to ship and move to the next thing, not pause and rework the system to handle scale, whether the scale challenges are coming from customers or from the growing number of people attempting to add new features to the system.
The real kick in the teeth here is that our industry's environment incentivizes and rewards engineers for doing the same thing they've always done so they can ship faster. It punishes activity it sees as risk-taking and ignores the actual business outcomes in order to hyper-focus on one or two metrics in each phase at the expense of everything else.
The Wrong Tool For The Job
The tenure of software engineers in early-stage and growth-stage companies averages somewhere between 18 and 24 months. This means that a lot of engineers don't get to see first-hand what the consequences of their decisions are. If they're regularly moving on to new opportunities in the same early stages, it becomes very easy for them to miss the chance to learn from poor outcomes stemming from early technical decisions.
In addition, engineers with longer tenures may suffer from a "frog in the pot" situation. The outcomes they were originally able to provide degrade gradually. The longer they are in an organization, the more changes occur, and the easier it is to misidentify changes outside of their influence as the cause of slower cycles and poorer outcomes.
The end result is that, as an industry, we've stopped thinking critically about how some of our unquestioned norms may be the source of the serious pain points we so often encounter. I was recently reminded that lots of folks are not familiar with the object-relational impedance mismatch problem: the friction that arises when the object graphs in application code have to be translated to and from rows in relational tables. In essence, the relational database model, pioneered by Dr. E.F. Codd in the 1970s, was built during a time when labor was cheap but storage was incredibly expensive. We have the inverse problem today: storage is cheap but labor is incredibly expensive. Ignoring this, or worse, institutionalizing solutions to problems that no longer exist, is all trade-off and no pay-off.
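To make the object-relational impedance mismatch concrete, here is a small sketch under assumed names (`Order`, `LineItem`, and both tables are hypothetical). A single object with a nested list has to be taken apart into rows to be stored relationally and stitched back together from a join to be read. Serializing the same object as one document, shown here with plain JSON as a stand-in for a document store, skips that translation.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class LineItem:
    sku: str
    quantity: int


@dataclass
class Order:
    order_id: int
    customer: str
    items: List[LineItem] = field(default_factory=list)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute("CREATE TABLE line_items (order_id INTEGER, sku TEXT, quantity INTEGER)")


def save_relational(order: Order) -> None:
    # One object is split into rows across two tables.
    conn.execute("INSERT INTO orders VALUES (?, ?)", (order.order_id, order.customer))
    conn.executemany(
        "INSERT INTO line_items VALUES (?, ?, ?)",
        [(order.order_id, item.sku, item.quantity) for item in order.items],
    )


def load_relational(order_id: int) -> Order:
    # The join yields one flat row per line item; the nested object is
    # rebuilt by hand. Assumes the order exists (no error handling here).
    rows = conn.execute(
        "SELECT o.customer, li.sku, li.quantity FROM orders o "
        "LEFT JOIN line_items li ON li.order_id = o.order_id "
        "WHERE o.order_id = ? ORDER BY li.rowid",
        (order_id,),
    ).fetchall()
    items = [LineItem(sku, qty) for _, sku, qty in rows if sku is not None]
    return Order(order_id, rows[0][0], items)


def save_document(order: Order) -> str:
    # The whole object graph is stored as a single document.
    return json.dumps(asdict(order))


def load_document(doc: str) -> Order:
    data = json.loads(doc)
    items = [LineItem(**item) for item in data["items"]]
    return Order(data["order_id"], data["customer"], items)
```

Neither half is a recommendation on its own. The point is only how much of the relational half exists to serve the storage model rather than the application.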
Where Do We Go From Here?
While I don't think there's truly a one-size-fits-all answer, I do believe that teams given sufficient autonomy, time, and tools will produce superior outcomes that are sustainable and don't derail the company's future growth. There are many suitable alternative patterns (CQRS, Event Sourcing) and technologies (Key/Value Stores, Document Databases, Graph Databases, OODBMSs, Mixed-mode Databases, etc.) that I encourage everyone to learn more about. There are some surprising ways to remove the undue influence of storage technology from how we model our applications. My goal in writing this post is to challenge assumptions and encourage learning and exploration.
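As a starting point for that exploration, here is a deliberately tiny sketch of how Event Sourcing and CQRS differ from CRUD. Everything in it is an assumption made for illustration: the bank-account domain, the in-memory `EventStore`, and the function names. Real systems add persistence, concurrency control, and much more. The idea to take away is that state is never updated in place. Commands append immutable events, and queries read from projections derived from those events.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    """An immutable fact about something that already happened."""
    account_id: str
    kind: str  # "deposited" or "withdrawn"
    amount: int


class EventStore:
    """Append-only log: events are added, never updated or deleted."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def events_for(self, account_id: str) -> List[Event]:
        return [e for e in self._events if e.account_id == account_id]

    def all_events(self) -> List[Event]:
        return list(self._events)


def signed(event: Event) -> int:
    # Deposits add to a balance, withdrawals subtract from it.
    return event.amount if event.kind == "deposited" else -event.amount


def current_balance(store: EventStore, account_id: str) -> int:
    # State is rebuilt by folding over the account's event history.
    return sum(signed(e) for e in store.events_for(account_id))


# Command side: validate the request, then record what happened.
def deposit(store: EventStore, account_id: str, amount: int) -> None:
    if amount <= 0:
        raise ValueError("amount must be positive")
    store.append(Event(account_id, "deposited", amount))


def withdraw(store: EventStore, account_id: str, amount: int) -> None:
    if amount <= 0 or current_balance(store, account_id) < amount:
        raise ValueError("invalid or unaffordable withdrawal")
    store.append(Event(account_id, "withdrawn", amount))


# Query side: a read model shaped for the question being asked.
def project_balances(store: EventStore) -> Dict[str, int]:
    balances: Dict[str, int] = {}
    for event in store.all_events():
        balances[event.account_id] = balances.get(event.account_id, 0) + signed(event)
    return balances
```

Because the event log is the source of truth, a new read model can be added later by replaying the same events, without changing how anything is written.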