Topic

Database Design Mistakes in Growing Digital Products

Insights/ Data Systems & Performance / Database Architecture

08 Feb 2025 - 09 min read

Database Design Mistakes in Growing Digital Products
Listen to article00:00 / 10:55

Most database pain comes from a handful of early decisions

Most of the database pain that growing digital products live with traces back to a handful of schema decisions made early and casually. Each one looked harmless at ten thousand rows. None of them are harmless at ten million. By the time the symptom is visible (slow queries, painful migrations, reporting that takes longer to generate than to read, an ORM that has stopped behaving), the original choice that caused it is buried under three years of features that depend on it.

This article is a working catalogue of the recurring schema-design mistakes I see most often in growing digital products. It pairs with the scalable web platform article, which covers the operational scaling decisions one level above the schema; here, the focus is the modelling layer below them, where the cost of getting it wrong shows up two years later and rarely gets attributed to the original cause.

Mistake 1: Weak primary keys

Choosing the wrong primary key is the single hardest mistake to fix later, because every other table that references this one has now embedded the choice. Three weak choices recur.

Using a natural key (email address, employee number, ISBN) as the primary key. The user changes their email; the foreign keys are silently broken. The employee number gets reused after termination; the historical references now point to the wrong person. Natural keys are useful as unique constraints; they are almost never useful as primary keys.

Using a sequential auto-incrementing integer for entities that will be exposed externally. The URL /orders/4521 tells the world that order 4521 exists, that there are at least 4521 orders, and that whoever loads /orders/4520 will see someone else's order if access control is weak. Sequential IDs leak business information and surface area; UUIDs (or ULIDs, where ordering matters) cost a few bytes and remove the leak.

Using a composite primary key across mutable columns. The unique combination is correct today, but the moment any of the columns becomes mutable, the foreign keys break. A surrogate primary key with a unique constraint on the combination is almost always the cleaner shape.

The fix for new tables is straightforward: surrogate keys, immutable, opaque to outside consumers, with natural keys enforced as separate unique constraints. The fix for existing tables that got this wrong is much more expensive, which is why this mistake earns its place at the top of the catalogue.

Mistake 2: Indexes by accident, not by design

Indexes are the highest-leverage performance work the team will ever do, and they are also the most often left to chance. Two failure modes recur.

The first is missing indexes on the columns that real queries actually use. The schema has indexes on the columns that looked important when the table was created, not on the columns that the application now filters and joins on. Six months in, the slow-query log is full of full table scans on columns that should have been indexed from the start.

The second is over-indexing every column "just in case". Every index speeds up reads and slows down writes. A table with twelve indexes pays the write tax twelve times on every insert. Tables with high write volume and casual indexing become bottlenecks for reasons that look like database problems but are actually schema discipline problems.

The fix is to drive indexing from the actual query patterns the application generates, not from how the schema reads on paper. Periodic review of the slow-query log and the index-usage statistics catches both problems early, while they are still cheap to fix.

Mistake 3: Premature normalisation, or premature denormalisation

Both extremes show up regularly, often in the same codebase, and both are expensive in opposite ways.

Premature normalisation breaks every domain object into a graph of small tables that no real query ever fetches in pieces. A "customer profile" becomes seven joins. A "settings page" becomes twelve. The schema is technically correct in third normal form and operationally a nightmare; the application spends most of its time joining its way back to the shape it actually needed.

Premature denormalisation does the opposite: copies of fields proliferate across tables for reads that have not been measured to need them, and now every write has to update multiple places to keep them consistent. The system has earned the cost of denormalisation without the benefit, and the inconsistency bugs that follow are some of the hardest to debug.

The fix is to start with a normalised schema that follows the domain, then denormalise specific fields when measurement (not intuition) shows the read pattern justifies it. Denormalisation is a deliberate trade, not a default.

Mistake 4: Soft deletes everywhere as a default

Adding deleted_at to every table because "we might want to recover deleted records" is a decision that compounds in three painful ways.

Tables grow without bound, including with records that should be gone. Every query has to remember WHERE deleted_at IS NULL, and the one that forgets returns deleted users to active customers. Indexes get bigger and slower because they index records nobody actually wants. And the GDPR-style "right to be forgotten" requests turn into engineering projects because the deleted records never actually left.

The fix is to default to hard deletes and add soft deletes as a deliberate choice for tables where recovery is a real product requirement. When soft deletes are the right call, enforce the deleted_at filter at the query-layer level (a base query class, an ORM scope, a database view) so that no application query has to remember it.

Mistake 5: JSON columns as a kitchen sink

Modern databases support JSON columns natively, which is a real capability and an irresistible temptation. The temptation is to put everything that "might change later" into a JSON blob and call the schema flexible.

The cost shows up in three places. Querying the JSON is slower and uglier than querying typed columns, and the planner cannot help. Validating the JSON happens in application code, if it happens at all, and the schema-as-source-of-truth quality is lost. Migrating a field out of JSON into its own column is much harder than the reverse, because every reader of the JSON has to be updated.

The fix is to use JSON columns deliberately, for genuinely shape-varying data (per-tenant settings, third-party API payloads, audit logs), and to keep the rest typed. The default for a new field is a typed column with a constraint; the JSON column is the exception, justified case by case.

Mistake 6: Avoiding foreign keys "for performance"

A surprising number of growing systems lack foreign keys, often because someone read once that "foreign keys hurt write performance" or because the ORM did not generate them by default. The cost of going without is not visible until the schema is large enough that referential integrity matters.

Without foreign keys, orphan records accumulate silently: an order points to a customer that no longer exists, a comment to a deleted post. Reports show wrong totals, joins return rows that the application then crashes on, and cleanup scripts become a recurring chore. Cascading deletes have to be implemented in application code, which is fragile and easy to forget when a new path to deletion is added.

The fix is to use foreign keys by default and to treat the absence of one as an exception that needs justification, not the other way around. The write-performance argument is real for very high-throughput systems, but most growing products are not in that regime, and the integrity benefit is large.

Mistake 7: Letting the ORM design the schema

ORMs are useful tools that produce reasonable schemas for simple cases and accidental disasters for everything else. Three patterns recur.

N+1 queries baked into the schema: the ORM lazy-loads relationships, the application iterates, and a page that should be one query becomes three hundred. The schema and the access pattern are mismatched, and the ORM hides the mismatch until production load reveals it.

Schema as a side effect of model classes: the schema is whatever the migration generator produces, with no review. Indexes are missing because nobody added them in the model file. Constraints are absent because the model class did not declare them. The schema documents the application's casual choices rather than a deliberate design.

Polymorphic associations: the ORM offers them, the team uses them, and the database loses the ability to enforce referential integrity because the foreign key target depends on a type column. What was supposed to be a clean abstraction becomes a recurring source of orphan records and data-quality bugs.

The fix is to treat the ORM as a query and mapping tool, not as a schema designer. Migrations are written or reviewed deliberately, indexes are explicit, constraints are declared, and N+1 patterns are caught in code review or with query-counting middleware before they reach production.

Final takeaway

Schema-design mistakes are predictable and they compound. The same handful of patterns recurs in product after product, and each one is much cheaper to avoid up front than to fix later, because the rest of the system grows around the mistake before anyone notices the cost. The discipline is not to be perfect; it is to recognise the recurring patterns, treat each schema decision as a deliberate one, and pay the small upfront cost of getting the foundations right.

The wider context, including how schema design connects to operational performance, indexing strategy and the data systems that grow on top, is collected in the data systems and performance insights cluster. And when the question moves from "are we making one of these mistakes" to "we recognise the symptom and we now need someone to design the index strategy, the migration path or the data-model fix", that is exactly what my database architecture and performance practice is built around.

- Haja Faniry

Related services

Database Architecture & Performance Optimization

Database architecture design and performance optimization services to ensure scalable, secure and high-performance data infrastructure.

Web Application Development

Custom web application development for companies, startups and international organisations.

Previous Post
How Website Performance Affects Business Results
Next Post
Signs Your Database Architecture Needs Improvement
Database Design Mistakes in Growing Digital Products | Haja Faniry