The Primacy of State
What Is State?
State = short-term memory
Storage = long-term memory
Stateless means short-term memory is not preserved between operations
Stateful means short-term memory is preserved between operations
Stateless really means forgetful. We should call stateless applications forgetful applications.
All computation is stateful. CPUs are literally only capable of executing instructions based on the state of CPU registers. There’s no such thing as stateless computing. It’s just a question of how long you let a system remember stuff before you pull out the Men in Black memory eraser and wipe the slate clean.
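Here is a minimal sketch of the difference, in Python. The names (stateless_count, StatefulCounter) are made up for illustration. Both versions use short-term memory; the only difference is how long it is allowed to live.

```python
def stateless_count(word):
    """Forgetful: every call starts from a blank slate."""
    seen = {}                           # short-term memory, created per call...
    seen[word] = seen.get(word, 0) + 1
    return seen[word]                   # ...and thrown away on return


class StatefulCounter:
    """Remembers: short-term memory survives between calls."""

    def __init__(self):
        self.seen = {}                  # lives as long as the object does

    def count(self, word):
        self.seen[word] = self.seen.get(word, 0) + 1
        return self.seen[word]


counter = StatefulCounter()
stateless_results = [stateless_count("ping") for _ in range(3)]
stateful_results = [counter.count("ping") for _ in range(3)]
print(stateless_results)  # [1, 1, 1]
print(stateful_results)   # [1, 2, 3]
```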
Why stateless?
What’s the benefit of being forgetful?
Let’s take the use case of serving up search results. Being forgetful is a huge benefit. Incoming searches are unlikely to relate to previous searches.
Why stateful?
What’s the downside of being forgetful?
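Now let’s think about predicting catastrophic network failure. Being forgetful becomes a major liability. A failure announces itself as a trend across many observations, and a system that forgets each observation as soon as it has handled it can’t see the trend.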
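A toy sketch of why memory matters here: the detector below can only flag a suspicious latency reading because it remembers the previous few. The class name, the window size, and the "twice the recent average" rule are all assumptions picked for illustration, not a real failure-prediction method.

```python
from collections import deque


class LatencyTrendDetector:
    """Remembers the last `window` latency samples for one link (hypothetical)."""

    def __init__(self, window=5, factor=2.0):
        self.samples = deque(maxlen=window)   # the short-term memory
        self.factor = factor

    def observe(self, latency_ms):
        """Return True if this sample is far above the remembered average."""
        alarming = (
            len(self.samples) == self.samples.maxlen
            and latency_ms > self.factor * (sum(self.samples) / len(self.samples))
        )
        self.samples.append(latency_ms)
        return alarming


detector = LatencyTrendDetector(window=5, factor=2.0)
readings = [10, 11, 9, 10, 10, 12, 35]
alerts = [detector.observe(r) for r in readings]
print(alerts)  # only the final reading (35) raises an alert
```

A forgetful version of this detector would see each reading in isolation and would have no average to compare it against.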
How far do you have to drive to get an answer?
Databases and data lakes try to position as much data as possible the same average distance away. You’re always going to have to drive somewhere to get an answer, but you’ll never have to drive more than a few hundred milliseconds—or a few hours, in human time scales.
This is kind of weird. No human tries to organize their life so that all their possessions are kept the same average distance away. You keep the things you use most on your desk. What doesn’t fit on your desk spills over into a drawer, then a closet, then the garage, then the storage locker, then Craigslist.
Memory hierarchy
The software equivalent of desk -> drawer -> closet -> garage -> storage locker is called the memory hierarchy. In the world of silicon, you might also see the related term NUMA thrown around, short for Non-Uniform Memory Access.
For a chip, you have L1, L2 and L3 caches (desk and drawers), main memory (closet and garage), and disk (the storage locker).
In the cloud, you have “tiered storage”, in-memory databases, and caching servers. But in reality, these are all treated like storage lockers, all miles down the road, because the software is stateless. There is no short-term memory. The contents of the desk are swept into the trash, the drawers emptied, the closet cleared, the garage sold, after every single operation.
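To put rough numbers on the desk-to-storage-locker picture, here is a toy lookup that walks the tiers nearest-first and adds up the cost. The latencies are order-of-magnitude assumptions for illustration, not measurements, and the tier contents are invented.

```python
# Rough, order-of-magnitude costs in nanoseconds; illustrative assumptions only.
TIERS = [
    ("desk (L1 cache)", 1),
    ("drawer (L3 cache)", 40),
    ("closet (main memory)", 100),
    ("storage locker (disk)", 10_000_000),
]


def fetch(key, contents):
    """Walk the tiers nearest-first; return (tier name, total cost in ns)."""
    cost = 0
    for (name, latency), store in zip(TIERS, contents):
        cost += latency            # we pay for every tier we have to check
        if key in store:
            return name, cost
    raise KeyError(key)


# What each tier currently holds, nearest first (hypothetical data).
contents = [{"pen"}, {"stapler"}, {"winter coat"}, {"tax records"}]

print(fetch("pen", contents))          # ('desk (L1 cache)', 1)
print(fetch("tax records", contents))  # ('storage locker (disk)', 10000141)
```

Under these assumed numbers, one trip to the locker costs about as much as ten million glances at the desk, which is why it matters so much what gets to stay on the desk between operations.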
Data Locality
The idea of minimizing how far you have to drive to access the things you use most is called optimizing for data locality.
Data locality just means keep the stuff you use most close by, and let the stuff you use less frequently live further away.
The idea of statefulness is to not have to leave the house to do most day-to-day work.
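One common way to get data locality is a small cache that keeps recently used items close and lets everything else live far away. The sketch below uses Python’s standard functools.lru_cache; the drive_to_storage_locker function and the access pattern are hypothetical stand-ins for a slow remote lookup and a realistic, skewed workload.

```python
from functools import lru_cache

drives = 0  # how many times we had to leave the house


def drive_to_storage_locker(item):
    """Stand-in for a slow remote lookup (hypothetical)."""
    global drives
    drives += 1
    return f"{item} (from the locker)"


@lru_cache(maxsize=2)              # a desk with room for only two items
def get(item):
    return drive_to_storage_locker(item)


# Skewed access pattern: "pen" and "notebook" are used far more than "tent".
requests = ["pen", "notebook", "pen", "pen", "notebook", "tent", "notebook"]
for item in requests:
    get(item)

print(drives, "drives for", len(requests), "requests")  # 3 drives for 7 requests
```

A forgetful version, with no cache at all, would make seven drives for the same seven requests.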
Implications of state on architecture
If you clear the desk after every operation, when a new request comes in, you can assign that work to any available desk! They’re all the same! Such is the benefit of statelessness. You don’t need to keep track of which desk is being used for what.
The downside, of course, is that the software isn’t going to spend much time actually doing work at its desk. It’s going to spend most of its time driving to the storage locker to pick up files.
With stateful applications, you’re going to spend less time driving, and more time working. But you now have the new challenge of having to route incoming requests to the desk that’s handling those requests.
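The routing difference can be sketched in a few lines. The desk names and the two routing functions below are hypothetical; the stateful version uses a simple hash-modulo scheme, which is one of several ways to pin a key to a worker.

```python
import hashlib
from itertools import cycle

DESKS = ["desk-0", "desk-1", "desk-2"]   # hypothetical worker names

_next_desk = cycle(DESKS)


def route_stateless(_request_key):
    """Any desk will do, so just take the next one in rotation."""
    return next(_next_desk)


def route_stateful(request_key):
    """Always send the same key to the same desk, where its state lives."""
    # hashlib gives a hash that is stable across processes, unlike hash(str).
    digest = hashlib.sha256(request_key.encode()).digest()
    return DESKS[int.from_bytes(digest[:8], "big") % len(DESKS)]


keys = ["device-42", "device-42", "device-42"]
print([route_stateless(k) for k in keys])  # ['desk-0', 'desk-1', 'desk-2']
print([route_stateful(k) for k in keys])   # the same desk, three times
```

One caveat with plain hash-modulo routing: adding or removing a desk changes the modulus and moves most keys to a different desk. Schemes such as consistent hashing exist to limit that reshuffling.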
And you have to think a little bit harder about what to do when a roomful of desks spontaneously explodes (server failure). But that’s a discussion for another day (fault tolerance and high availability).
Organizing Work
You can think about distributed software architectures kind of like big corporations. Stateless and stateful are two different schools of thought about how to organize workers to achieve the goals of these soft-corporations. Neither method is right or wrong. Nor are the two approaches mutually exclusive. Each has its pros and cons, strengths and weaknesses.
Stateless architectures break work down by job function. You have the storage department, the messaging department, the analytics department, and the applications department, which is kind of like the management of the company. The benefit of this approach is that each department (each layer of a stateless software stack) can narrowly specialize. The downside is that the communication and management overhead is extremely high, because each digital department is entirely oblivious to the big picture. Indeed, no department is really truly in charge. This makes automation, a key objective of modern enterprises, prohibitively difficult.
Stateful architectures divvy up work by client, if you will. Instead of functional departments, stateful architectures stand up cross-functional teams dedicated to each “customer” or “deliverable”. The same team handles all aspects of what a given client needs. The expertise about the client is localized to the responsible team: it has excellent locality. Each team knows its client well; it can anticipate what the client needs, and deliver great service. Each team is focused, not having to concern itself with what unrelated teams are doing. The downside is that these teams (the stateful application programs) are not yet commoditized the way, say, database departments are. And you can’t swap out whole departments: you can’t fire the whole marketing team, only individual marketers.
So you see, statefulness (how long software’s short-term memory lasts) has an incredibly profound impact on how distributed software is architected, and what it’s capable of achieving. If you look at the trends in the industry though (the drive for automation, for tailored customer experiences), the frontier is moving away from Terry Gilliam Brazil-style software bureaucracies toward more nimble, adaptive, proactive approaches. A lot of pain in the industry right now is caused by trying to apply the rigid, departmental mentality to problems that require treating every customer, product, device, service, etc. uniquely.
And when you break your architecture down into many small cross-functional teams, instead of big bloated departments, you can send the team to the data, instead of trucking the data back and forth between departments—software in motion!