Your Data Has Opinions About How It Wants to Be Stored

Notes from Chapter 2 of Designing Data-Intensive Applications

There’s a particular kind of argument that breaks out on engineering teams roughly once a quarter: SQL or NoSQL? It’s usually framed as a question about technology, or performance, or scalability, but it almost never actually is. It’s a question about shape. Specifically: what shape is your data, and what shape do you want to query it in?

Chapter 2 of Designing Data-Intensive Applications is called “Data Models and Query Languages,” and Martin Kleppmann uses it to make a case that’s surprisingly hard to internalize: the data model you pick has more impact on how your software gets written — and how it ages — than almost any other decision. Pick the wrong shape and you’ll spend years gluing it back together with application code. Pick the right one and a lot of problems just stop existing.

Here’s the tour.

The relational model won, then NoSQL happened, then everyone calmed down

Kleppmann opens with a quick history lesson, and it’s actually load-bearing. The relational model — tables, rows, columns, joins — won decisively in the 1970s and 80s, beating out the hierarchical and network models that came before. It won because it was simpler, more flexible, and because SQL gave you a clean separation between what you wanted and how the database should get it.

Then in the late 2000s, “NoSQL” happened. Suddenly everyone was on MongoDB and the relational model was dead. Except it wasn’t — and the chapter is genuinely useful at explaining why the rebellion happened and why it cooled off.

The NoSQL pitch was real: better scalability for certain workloads, more flexible schemas, query patterns that mapped naturally to specific applications, and a way to dodge the object-relational impedance mismatch — the awkward translation layer between objects in your code and rows in your tables that ORMs spend their entire existence patching over.

But the relational model didn’t die. It absorbed. Today’s databases are increasingly hybrid: PostgreSQL has JSON columns, MongoDB has joins. The interesting question stopped being “which side are you on?” and became “which shape fits your data?”

Documents are great until your data has friends

The core argument for document databases is locality. If your data is shaped like a tree — a user with their profile, their posts, their preferences, all naturally nested inside one logical thing — then storing it as a single document means one read fetches everything. No joins. No N+1 queries. The structure of the data on disk matches the structure your code wants.

For one-to-many relationships, this is genuinely lovely. A blog post and its tags, a resume and its work history, an order and its line items — these are tree-shaped, and trees fit nicely in documents.
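To make the locality argument concrete, here is a minimal sketch of what that looks like, assuming a local MongoDB instance and the pymongo driver. The collection and field names are illustrative, not taken from the book.

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    posts = client.blog.posts

    # The whole tree lives in one document: the post, its tags, and its comments.
    posts.insert_one({
        "title": "Data Models and Query Languages",
        "tags": ["databases", "ddia"],
        "comments": [
            {"author": "alice", "text": "Great summary."},
            {"author": "bob", "text": "What about graphs?"},
        ],
    })

    # One read fetches everything the page needs. No joins, no N+1 queries.
    post = posts.find_one({"title": "Data Models and Query Languages"})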

The problem starts when relationships go many-to-many. Suppose two users went to the same university. In a document model, you either duplicate the university name in both user documents (and now updating it is a nightmare) or you store an ID and look the university up separately (which is a join, just one your application has to do by hand). Either way, you’ve reinvented something the relational model gave you for free.
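Here is the same dilemma sketched in code, again assuming a local MongoDB instance and pymongo, with made-up collection and field names.

    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017").social

    # Option 1: duplicate the university name inside each user document.
    # Reads stay cheap, but renaming the university means updating every copy.
    db.users.insert_one({"name": "alice", "university": "Example State University"})

    # Option 2: store a reference and resolve it yourself. This is a join,
    # just one your application performs by hand with a second query.
    uni_id = db.universities.insert_one({"name": "Example State University"}).inserted_id
    db.users.insert_one({"name": "bob", "university_id": uni_id})

    bob = db.users.find_one({"name": "bob"})
    bobs_university = db.universities.find_one({"_id": bob["university_id"]})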

Kleppmann’s point isn’t that one model is better. It’s that the more interconnected your data is, the worse documents fit and the better relational fits. If your domain is mostly self-contained units, documents win. If everything points to everything else, you want joins.

Schema-on-read is not the same as no schema

One of the most useful clarifications in the chapter is about schemas. Document databases are often called “schemaless,” and Kleppmann pushes back on this hard. There’s always a schema — the question is just where it lives.

Relational databases use schema-on-write: the database enforces the structure when you insert data. Document databases typically use schema-on-read: the structure is implicit, enforced (or assumed) by whatever code reads the data later.

Neither is automatically better. Schema-on-read is genuinely useful when your data is heterogeneous, when the structure comes from external sources you don’t control, or when you’re moving fast and the schema is changing constantly. Schema-on-write is genuinely useful when you want guarantees, when many different applications read the same data, or when you’ve been burned one too many times by a field that was supposed to always be a string and turned out to sometimes be null, sometimes a number, and sometimes the literal string "undefined".
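A small sketch of where the enforcement lives in each case. The schema-on-write half uses Python's built-in sqlite3; the table, its fields, and the defensive reader on the schema-on-read side are all illustrative assumptions.

    import sqlite3

    # Schema-on-write: the database enforces the declared structure at insert time.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
    conn.execute("INSERT INTO users (email) VALUES (?)", ("a@example.com",))
    # Inserting a NULL email here would fail immediately, not three months from now.

    # Schema-on-read: the structure is whatever the reading code assumes it is,
    # so the reading code ends up enforcing it defensively.
    def read_email(doc: dict) -> str:
        email = doc.get("email")
        if not isinstance(email, str):
            raise ValueError(f"expected email to be a string, got {email!r}")
        return email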

The flexibility of schema-on-read isn’t free. It’s a debt you pay later, in defensive code and migration scripts and 3am incidents.

Declarative beats imperative, almost always

The middle of the chapter is about query languages, and the central claim is one I think more programmers should sit with: declarative is better than imperative, and SQL is one of the great triumphs of declarative thinking.

In an imperative query, you tell the database how to find your data: loop through this, filter that, sort the other thing. In a declarative query, you describe what you want and let the database figure out how. SQL is declarative. MapReduce sits somewhere in between: the query logic is expressed as snippets of code, but the framework decides how and where to run them. And CSS, surprisingly, is declarative too: you describe what styled output you want, and the browser figures out how to lay it out.
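Here is the contrast in miniature. Both snippets are generic illustrations rather than examples quoted from the chapter.

    # Imperative: you spell out *how* -- iterate, test, collect, in that order.
    def sharks_imperative(animals):
        results = []
        for animal in animals:
            if animal["family"] == "Sharks":
                results.append(animal)
        return results

    # Declarative: you state *what* you want; the database decides how to get it,
    # which index to use, what order to scan in, whether to parallelize.
    SHARKS_DECLARATIVE = "SELECT * FROM animals WHERE family = 'Sharks'"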

The advantage of declarative isn’t just elegance. It’s that the database is free to optimize. It can add an index, parallelize the query, change the join order, switch algorithms based on table sizes — all without you rewriting anything. Imperative code locks in the how, which means it can’t get faster while you sleep.

This is why ORMs that generate “good enough” SQL beat hand-tuned procedural code in most real systems: the database has more information than your application does about how to actually run the query, and it’s getting smarter every release.

When your data is mostly relationships, use a graph

The third major data model in the chapter is the graph. If documents are great for tree-shaped data and relational is great for tabular data with joins, graphs are great for data where the relationships matter as much as the things being related.

Social networks are the obvious example: who follows whom, who's friends with whom, who reposted whose post. But graphs also fit naturally for road networks, recommendation systems, knowledge bases, fraud detection, and anywhere you find yourself writing recursive SQL queries with five layers of CTEs that no one will ever understand again.

Kleppmann walks through two main flavors. Property graphs (Neo4j, etc.) treat nodes and edges as first-class objects with their own properties, queried with languages like Cypher. Triple-stores (RDF, SPARQL) model everything as (subject, predicate, object) triples, which is conceptually clean and has roots in the Semantic Web movement. Both can express things that would be miserable in SQL — like “find all the people connected to me by at most three hops who work at companies headquartered in cities I’ve visited” — in a few lines.
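For flavor, here is roughly what a property-graph traversal looks like through the official neo4j Python driver. The connection details, labels, and relationship names are assumptions for illustration, and the query is a simpler cousin of the three-hop example above.

    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

    # People within three hops of me who live in the same city I do --
    # the kind of traversal that turns into pages of recursive SQL.
    cypher = """
    MATCH (me:Person {name: $name})-[:KNOWS*1..3]-(other:Person),
          (me)-[:LIVES_IN]->(:City)<-[:LIVES_IN]-(other)
    RETURN DISTINCT other.name AS name
    """

    with driver.session() as session:
        names = [record["name"] for record in session.run(cypher, name="Alice")]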

The historical note he ends on is great: most of these graph query languages descend from Datalog, a declarative logic programming language from the 1980s. Half of computer science is rediscovering ideas from the 1980s with better marketing.

The takeaway

Three data models — relational, document, graph — and they’re not really competitors. They’re tools for different shapes.

  • Tree-shaped, mostly self-contained data with one-to-many relationships → documents.
  • Tabular data with lots of many-to-many relationships → relational.
  • Highly interconnected data where the relationships are the point → graph.

Most real systems eventually use more than one. Your user accounts live in Postgres, your activity feed lives in a document store, your recommendation engine reads from a graph. That’s not a failure of architecture; it’s the architecture working as intended.

The deeper lesson is the one Kleppmann keeps coming back to: data outlives code. The schema you pick today will still be shaping decisions five years from now, long after the framework you used to build the API has been replaced. Choosing the right model early is one of the cheapest performance and maintainability wins in software.

Choose for your data’s shape, not for the conference talk you saw last week.

The Three Words Every Backend Engineer Should Tattoo on Their Forearm

Notes from Chapter 1 of Designing Data-Intensive Applications

If you’ve spent any time around backend engineers, you’ve probably noticed they love to argue. Postgres or MongoDB? Kafka or RabbitMQ? Microservices or “the modulith”? Most of these debates feel like they’re about technology, but they’re almost never really about technology. They’re about tradeoffs — and the tradeoffs only make sense when you know what you’re optimizing for.

Martin Kleppmann opens Designing Data-Intensive Applications by giving us the vocabulary to have those arguments properly. Chapter 1 is called “Reliable, Scalable, and Maintainable Applications,” and those three adjectives are the entire point of the book. Get them right, and the rest of the 500 pages is essentially a tour of how different systems make different bets in service of those three goals.

Here’s what the chapter actually says, and why it’s worth slowing down on before racing into the chapters about replication and consensus.

What even is a “data-intensive” application?

Kleppmann’s framing in the first few pages is small but important: most of the systems we build today are not bottlenecked by raw CPU. They’re bottlenecked by the amount of data, the complexity of it, or the speed at which it changes. A web app that serves a million users isn’t doing hard math. It’s juggling state — reading it, writing it, caching it, indexing it, replicating it, keeping it consistent enough to be useful and inconsistent enough to be fast.

These applications are built out of remarkably standard parts: databases, caches, search indexes, message queues, stream processors, batch processors. The interesting engineering question isn’t usually “which database?” It’s “how do these pieces fit together for this workload?” Two apps with identical tech stacks can have wildly different architectures because they’ve answered that question differently.

That sets up the rest of the chapter. If your job is gluing data systems together, what are you actually trying to achieve?

Reliability: keep working when things go wrong

The first goal is reliability — and Kleppmann gives a definition that sounds obvious but is genuinely useful: a reliable system continues to work correctly even when things go wrong.

The key distinction here is between a fault (one component misbehaving) and a failure (the whole system stopping). The job of a reliable system isn’t to prevent faults — that’s impossible — it’s to prevent faults from cascading into failures. That’s what “fault-tolerant” means.
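A toy sketch of the distinction, not a pattern from the book: catch a transient fault and retry, so one flaky attempt doesn't become a failed request.

    import time

    def call_with_retries(fn, attempts=3, base_delay=0.1):
        """Tolerate transient faults (e.g. a flaky network) by retrying with backoff."""
        for attempt in range(attempts):
            try:
                return fn()
            except ConnectionError:
                if attempt == attempts - 1:
                    raise  # out of retries: the fault has escalated into a failure
                time.sleep(base_delay * 2 ** attempt)  # back off, then try again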

Faults come in three flavors:

  • Hardware faults. Disks die, power cuts out, networks flake. We’ve been dealing with these for decades, mostly through redundancy: RAID arrays, dual power supplies, multiple availability zones. This is the easy category, in the sense that the failure modes are well understood.
  • Software errors. Bugs, runaway processes, cascading failures where one slow service takes down everything that depends on it. These are nastier because they can hit every replica simultaneously — your fancy redundancy won’t save you if all three nodes have the same bug.
  • Human errors. And here’s the punchline: humans cause more outages than hardware. The defenses are good abstractions, sandboxed environments for testing, telemetry that catches problems early, and — crucially — making it easy to roll back when someone inevitably ships something broken at 4pm on a Friday.

It’s tempting to skip reliability work on “non-critical” applications, but Kleppmann pushes back on that: the cost of losing user trust usually exceeds the cost of building things properly the first time. A photo app isn’t life-or-death, but if it loses your wedding photos once, you’re never opening it again.

Scalability: cope with growth

Scalability is the one everyone thinks they understand and almost no one defines properly. Kleppmann’s framing is that “scalable” isn’t a property a system has or doesn’t have — it’s a question, and the question only makes sense if you specify two things: what you mean by load, and what you mean by performance.

Load is whatever parameter actually pressures your system. For a web server it might be requests per second. For a cache it might be the hit rate. For a database it might be the read/write ratio. The chapter’s famous Twitter example shows why this matters: serving home timelines is a hard problem, but the right solution depends entirely on whether you optimize for the read path (fan-out on write, materialize each user’s timeline) or the write path (fan-in on read, query everyone’s posts when the user opens the app). Twitter actually switched approaches as their workload changed. Same problem, different load characteristics, different architecture.
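Here are the two approaches sketched with plain Python dictionaries. The names and data structures are mine; the point is only where the work happens, at read time or at write time.

    follows = {"alice": ["bob", "carol"]}          # whom each user follows
    tweets_by_user = {"bob": [], "carol": []}      # every tweet, keyed by author
    timelines = {"alice": []}                      # materialized per-user timelines

    # Approach 1 -- do the work on read: query every followee when the timeline loads.
    def timeline_on_read(user):
        merged = [t for followee in follows[user] for t in tweets_by_user[followee]]
        return sorted(merged, key=lambda t: t["ts"], reverse=True)

    # Approach 2 -- fan out on write: push each new tweet into every follower's timeline.
    def post_tweet(author, text, ts, followers):
        tweet = {"author": author, "text": text, "ts": ts}
        tweets_by_user.setdefault(author, []).append(tweet)
        for follower in followers:
            timelines.setdefault(follower, []).append(tweet)  # reads become a cheap lookup

Fan-out on write makes reads trivial, but every tweet now costs work proportional to the author's follower count, which is exactly why accounts with millions of followers pushed Twitter toward a hybrid of the two.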

Performance is the other side. And here Kleppmann lays down what I'd argue is the single most important lesson in backend performance work: don't use averages, use percentiles. A system with a 100ms average response time can still be miserable to use if the slowest 1% of requests take 10 seconds. Tail latencies — p95, p99, p999 — are what users actually feel, and they tend to disproportionately hit your most engaged customers, the ones who make the most requests and therefore have the most chances to roll the bad-luck dice.
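A quick illustration of why the average lies, using made-up latency numbers.

    latencies_ms = [100] * 95 + [10_000] * 5        # 95 snappy requests, 5 awful ones

    mean_ms = sum(latencies_ms) / len(latencies_ms)                     # 595 ms: looks tolerable
    p99_ms = sorted(latencies_ms)[int(0.99 * (len(latencies_ms) - 1))]  # 10,000 ms: what unlucky users feel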

Once you know your load and your performance target, scaling becomes a design problem. You can scale up (bigger machine) or out (more machines), and the right answer depends on the workload. There is no universal scalable architecture. Anyone selling you one is selling you something.

Maintainability: make it livable

The last goal is the one engineers love least and pay for most. Most of a system’s lifetime cost is not in writing it; it’s in keeping it running, evolving it, and onboarding new people to it. Maintainability is the property that lets future-you (or the person who replaces you) keep the lights on without losing their mind.

Kleppmann breaks it into three sub-principles:

  • Operability. Make it easy for the operations team to keep the system healthy. Good monitoring. Predictable behavior under load. Useful logs. Documentation that exists. Default behaviors that make sense. The classic “it works on my machine” failure is an operability failure.
  • Simplicity. Manage complexity. Every system accumulates accidental complexity over time, and the main weapon against it is good abstractions — taking something messy and giving it a clean interface. The opposite, which Kleppmann calls a “big ball of mud,” is what happens when you skip this and let everything tangle into everything else.
  • Evolvability. Make it easy to change. Requirements always change. The org changes, the product changes, the regulations change, the scale changes. Systems that can’t evolve get rewritten, and rewrites are expensive and dangerous.

If you’ve ever worked on a codebase where every change feels like defusing a bomb, you’ve experienced the absence of all three at once.

The takeaway

Reliability, scalability, maintainability. Three words, and the entire book is essentially “how do specific tools and techniques affect these three properties for specific workloads?”

What I find genuinely useful about this chapter is that it gives you a way to evaluate technical decisions without falling into religious wars. When someone says “we should use Kafka,” the question isn’t whether Kafka is good. It’s: which of these three properties does Kafka improve, by how much, for our workload, and at what cost to the others? Sometimes the answer is “a lot, cheaply, do it now.” Sometimes the answer is “not really, and it’ll add a fourth on-call rotation.”

There are no silver bullets. There are only tradeoffs against a specific set of requirements — and the first job of any backend engineer is being able to say what those requirements are.

That’s the real lesson of Chapter 1. The rest of the book is just receipts.

The Power of Less: How Scarcity Shapes Every Decision You Make

A Summary of Chapter 6 from Influence by Robert B. Cialdini

What if the secret to wanting something more had nothing to do with what it actually was and everything to do with how available it seemed? In Chapter 6 of Influence, Robert Cialdini unpacks one of the most quietly devastating forces in human psychology: scarcity. The principle is simple. We place greater value on things that are rare, fleeting, or at risk of being taken away. And the less available something becomes, the more desperately we want it.

The Art of the Almost-Lost Deal

Consider a divorce lawyer who spent years struggling to get couples to agree on settlement terms. Despite presenting identical proposals, she found clients stubbornly resistant — until she made one subtle change in how she framed the moment of decision. The old version went: “All you have to do is agree to the proposal, and we will have a deal.” The new version flipped the sequence: “We have a deal. All you have to do is agree to the proposal.”

The result? A near-perfect success rate. The reason is rooted in loss aversion. In the original phrasing, clients imagined themselves agreeing and therefore potentially giving something up. In the revised phrasing, the deal already existed in their minds — and refusing meant losing it. People will fight far harder to keep something they believe they already have than to gain something new. The lawyer didn’t change the terms. She changed what was at stake.

Midnight Lineups and Louis Vuitton Purses

Apple understands scarcity better than almost any company on earth. When a new iPhone launches with “limited supply” in stores, it does not simply create demand — it manufactures urgency. Long lines form overnight. Social media fills with stories of people who camped out, traded favors, and made bizarre sacrifices just to be among the first to get their hands on the device.

One story in particular stands out: a woman waiting in line spotted someone just two spots ahead of her and offered to trade her Louis Vuitton handbag for their place in line. The rational mind would question this trade. But in a scarcity mindset, logic yields to the terror of missing out. The possibility of not getting the iPhone — of losing the opportunity — outweighed the objective value of a luxury bag. That is the power Cialdini is describing: not just desire, but the fear of deprivation.

Loss Looms Larger Than Gain

Research confirms what common experience hints at: the pain of losing something is significantly more motivating than the pleasure of gaining something of equal value. In one striking study, team members were found to be 82% more willing to cheat in order to prevent their team from losing status than they were to cheat in order to gain it. The asymmetry is striking. We are not rational optimizers seeking the best outcome — we are loss-averse creatures wired to protect what we already have.

This is why companies that frame their messaging around what customers stand to lose — rather than what they might gain — consistently outperform those that don't. Health organizations promoting cancer screenings have found dramatically better results when they frame the appeal around loss, urging people not to give up the chance to stay healthy and be present for life's special moments, than when they simply tout the benefits of early detection. The framing of loss is simply more compelling to the human mind.

The eBay Dad, the Countdown Clock, and the Three-Call Con

Scarcity operates through two distinct triggers: limited quantity and limited time. A father selling his collection of rare trading cards on eBay discovered this firsthand. When he listed all his cards at once, bids remained modest and interest was lukewarm. But when he staggered the listings — releasing one card at a time with gaps between each — the sense of rarity transformed his results entirely. The same cards, the same buyers, but a completely different outcome driven by perceived scarcity.

Deadlines exploit the same mechanism. When a window of opportunity appears to be closing, people stop deliberating and start acting. This urgency, Cialdini warns, is precisely what unscrupulous salespeople exploit. One chilling example involves a fraudulent investment scheme built on a “three-call method.” The first call is purely informational, delivered under the name of an impressive-sounding company. The second call reports remarkable profits — but regretfully notes that the investment window has closed. Then comes the third call: an exclusive opportunity, available only now, for a limited time. One man, caught in this manufactured urgency, handed over his entire life savings. The genius of the scheme was not greed — it was the engineered fear of missing out.

Freedom, Toddlers, and the Psychology of Reactance

Why does scarcity work at all? Cialdini points to two deeply rooted psychological forces. The first is a reasonable heuristic: things that are hard to obtain are often genuinely better. Rare materials, exclusive access, and limited editions frequently do represent superior quality. The second force is more primal — we hate losing our freedom to choose.

This psychological reactance — the instinct to push back when options are restricted — explains two of life’s most famously difficult developmental stages. At around age two, children first discover that they have independent will. Take something away, and they want it fiercely. Teenagers experience a second surge of this same impulse as they form their identities against the limits imposed by parents and society. Both stages are marked not by irrationality, but by an acute sensitivity to the loss of autonomy.

New Scarcity Hits Hardest

Cialdini closes with a crucial nuance: it is not just scarcity that inflames desire, but newly emerging scarcity. When something that was once plentiful starts to disappear, people react far more intensely than if it had always been rare. The sense of loss is compounded by the contrast with what was previously available. This is why rising restrictions, shrinking stock, and expiring offers trigger such powerful responses — the mind is not just registering scarcity, it is registering loss in motion.

Understanding scarcity means recognizing it everywhere — in the countdown timer on a checkout page, in the “only 3 left in stock” label, in the exclusive offer expiring at midnight. These are not coincidences. They are carefully engineered triggers aimed at the most ancient part of our decision-making brain: the part that is far more afraid of losing than it is excited about winning.

The Paradox of Success: Lessons from The Innovator’s Dilemma

What if the very practices that made your company successful were the same ones destined to destroy it?

This provocative question lies at the heart of Clayton M. Christensen’s groundbreaking 1997 book, The Innovator’s Dilemma: When New Technologies Cause Great Firms to Fail. In the introduction, Christensen presents a counterintuitive thesis that has reshaped how we think about innovation, management, and corporate survival.

The Puzzle: Why Do Great Companies Fail?

Christensen opens by presenting a mystery that had puzzled business scholars for decades. Companies like Sears, IBM, Xerox, and Digital Equipment Corporation were not run by incompetent managers. In fact, they were widely celebrated as some of the best-managed companies in the world. Fortune magazine praised Sears in 1964 for having an organization where “everybody simply did the right thing, easily and naturally.”

Yet these titans fell. Not because they grew complacent, arrogant, or risk-averse—but rather, Christensen argues, precisely because they followed the rules of good management. They listened to their customers, invested in new technologies, studied market trends, and allocated capital to innovations promising the best returns.

This is the innovator’s dilemma: the management practices that work brilliantly for sustaining existing products can become fatal liabilities when disruptive technologies emerge.

Sustaining vs. Disruptive Technologies

Central to Christensen’s framework is the distinction between two types of technological change. Sustaining technologies improve existing products along dimensions that mainstream customers already value. These innovations—whether incremental or radical—make good products better. Established firms excel at sustaining innovation because it aligns perfectly with their processes: listen to customers, invest in R&D, and deliver enhanced performance.

Disruptive technologies are different. They often underperform established products initially. They’re cheaper, simpler, smaller, or more convenient—but not as powerful. Mainstream customers typically don’t want them, and they offer lower margins than established products. By every rational business metric, investing in disruptive technologies looks like a bad decision.

And therein lies the trap.

The Disk Drive Industry: A Laboratory for Disruption

Christensen built his research on the disk drive industry—an industry characterized by relentless technological change. Between 1976 and 1995, the industry witnessed extraordinary turbulence: all but one of the 17 major firms failed or were acquired, along with 109 of 129 new entrants. Yet these firms didn’t fail because they couldn’t innovate. The established leaders were actually the pioneers in almost every sustaining innovation in the industry’s history.

They failed because each generation of smaller disk drives—from 14-inch to 8-inch to 5.25-inch to 3.5-inch—was a disruptive technology. Each new size initially offered less capacity than the larger drives and didn’t meet the needs of existing customers. But each found new markets (minicomputers, desktop PCs, laptops) that valued different attributes like size and portability. By the time these smaller drives improved enough to compete in mainstream markets, it was too late for the incumbents.

The Management Paradox

What makes Christensen’s argument so powerful—and uncomfortable—is its implication for managers. He’s not saying that failed companies were poorly managed. He’s saying they were excellently managed for the wrong context. The three patterns he identifies are damning:

First, disruptive technologies were often technologically straightforward—the established firms could have built them.

Second, established firms were leaders in sustaining innovations, proving their R&D capabilities were strong.

Third, despite developing working prototypes of disruptive technologies, management repeatedly chose not to commercialize them—because their customers didn’t want them.

In other words, these companies failed not because of technical limitations or lazy leadership, but because their rational resource allocation processes—designed to give customers what they want—systematically starved disruptive innovations of the resources they needed to survive.

Reflection: Why This Still Matters

Reading Christensen’s introduction nearly three decades after its publication, the insights feel more relevant than ever. We’ve watched Kodak, despite inventing digital photography in 1975, file for bankruptcy in 2012 because it protected its profitable film business. We’ve seen Blockbuster pass on acquiring Netflix for $50 million, only to become a cautionary footnote in business history.

What strikes me most is the emotional difficulty of Christensen’s prescription. He’s asking managers to invest in products their best customers explicitly say they don’t want. He’s asking them to pursue lower margins when shareholders demand growth. He’s asking them to cannibalize successful products before competitors do. These are not just strategic challenges—they’re psychological and organizational ones.

The introduction also offers a subtle but important comfort: failure in the face of disruption is not a character flaw. The managers at these companies weren’t villains or fools. They were trapped by systems, incentives, and rational decision-making processes that work beautifully—until they don’t. Understanding this helps us approach disruption with humility rather than hubris.

Key Takeaways

Success can breed failure. The practices that create market leadership can blind companies to disruptive threats.

Listening to customers isn’t always the answer. Current customers will optimize for current solutions, not future ones.

Disruptive technologies look unattractive—by design. Lower margins and smaller markets are features of disruption, not bugs.

Good management is situational. What works for sustaining innovation can be catastrophic for disruptive innovation.

Christensen’s introduction sets the stage for a book that doesn’t just diagnose the problem but offers solutions—creating separate organizations, finding new markets that value disruptive attributes, and learning to fail early and cheaply. But the introduction’s lasting contribution is simpler and more profound: it reframes failure not as the result of incompetence, but as the shadow cast by success itself.

— — —

The Innovator’s Dilemma by Clayton M. Christensen was first published in 1997 and remains one of the most influential business books ever written. Steve Jobs called it one of the few books that deeply influenced his thinking.