Signs Your Database Architecture Needs Improvement

09 Jan 2025 - 07 min read

When the database is the problem and nobody is looking at it

Most database problems do not announce themselves as "the database is the problem". They appear as application slowness that the team blames on the framework, deploys that have started feeling risky for reasons nobody has named, monthly hosting bills that are climbing faster than usage, and a pattern of incidents that all somehow trace back to the data layer once you actually look. By the time someone says "wait, is this a database problem", the team has often spent weeks chasing the wrong fix.

This article is a working list of the symptoms that usually mean the database architecture is asking for attention, and what each one tends to mean in practice. It pairs with the database design mistakes article, which catalogues the schema-design choices that produce most of these symptoms; here, the focus is the diagnostic side, written so a team can recognise the pattern in its own production before the cost compounds.

Symptom 1: Queries that used to be fast are now slow

A page that loaded in 200ms last quarter loads in two seconds this quarter. Nothing in the application code has changed; the data behind it has grown. The team adds a cache, the symptom hides for a week, the cache misses one Tuesday and the slowness is back, this time with a thundering herd.

What it usually means: a query that was fast at ten thousand rows is no longer fast at three hundred thousand, because the index it needed never existed (or no longer matches the query the application now sends). The first instinct is often to add hardware or scale horizontally; the cheaper fix is almost always to look at the slow-query log, find the offending statement, and design the right index. If a query is full-scanning a million-row table, no amount of horizontal scaling will save it.
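As a minimal sketch of that diagnosis, the snippet below uses Python's built-in sqlite3 module to ask the planner how it will run a statement before and after an index exists. The orders table, the customer_id column and the index name are hypothetical, and the exact wording of the plan differs between database engines and versions; the habit of reading the plan is the point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(50_000)],
)

QUERY = "SELECT id, total FROM orders WHERE customer_id = ?"

def plan(sql, params):
    """Return the planner's description of how it will run the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text

# Without an index the planner has to read every row.
plan_before = plan(QUERY, (42,))

# Hypothetical index matching the WHERE clause the application sends.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(QUERY, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

On SQLite the first plan reports a scan of the table and the second a search using the new index. Other engines expose the same information through their own EXPLAIN output.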

Symptom 2: Migrations have become scary

A schema change that should be a one-line migration is now a project. The team writes a feature flag, prepares a backfill script, schedules a maintenance window, and pages someone for the rollout. Adding a column to an active table is the kind of thing the team avoids unless absolutely necessary.

What it usually means: the table has grown to a size where naive migrations lock production for too long, and the team has not yet built (or formalised) the discipline of online migrations. It can also mean schema entanglement: too many parts of the application read or write to the same tables, so any change has a wide blast radius. The fix is part operational (online migration tooling, backfill patterns, feature-flagged column rollouts) and part architectural (clearer module boundaries on the data layer, less shared mutable state).
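One common backfill pattern can be sketched as follows, again with sqlite3 and a hypothetical users table: add the new column as nullable so the schema change itself is cheap, then fill it in small batches with a commit between each one, so no single transaction holds locks for long. The column name and batch size are illustrative assumptions, and a production engine would need its own online-migration tooling around this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"User{i}@Example.com",) for i in range(2_500)],
)
conn.commit()

# Step 1: add the column as nullable, with no default to compute per row.
conn.execute("ALTER TABLE users ADD COLUMN email_normalised TEXT")
conn.commit()

def backfill(conn, batch_size=500):
    """Fill the new column in small transactions; returns the number of batches."""
    batches = 0
    while True:
        cur = conn.execute(
            """UPDATE users SET email_normalised = lower(email)
               WHERE id IN (SELECT id FROM users
                            WHERE email_normalised IS NULL
                            ORDER BY id LIMIT ?)""",
            (batch_size,),
        )
        conn.commit()  # end the transaction so locks are released between batches
        if cur.rowcount == 0:
            return batches
        batches += 1

# Step 2: backfill while the application keeps running.
batches_run = backfill(conn)
remaining = conn.execute(
    "SELECT COUNT(*) FROM users WHERE email_normalised IS NULL"
).fetchone()[0]
first_value = conn.execute(
    "SELECT email_normalised FROM users WHERE id = 1"
).fetchone()[0]
```

Only once the backfill reports nothing remaining would the application switch to reading the new column, typically behind a feature flag.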

Symptom 3: Recurring incidents trace back to the database

The post-mortem template has started having a "database" section that gets filled out more often than it used to. Connection exhaustion, lock contention, replication lag, slow queries blocking faster ones. Different incidents on the surface, same family of root cause underneath.

What it usually means: the database is operating closer to its limits than the team realises. Connection counts are not pooled correctly, long-running queries are blocking short ones, a write workload is competing with read traffic on the same primary, or the working set has grown beyond what fits comfortably in RAM. The fix begins with measurement (saturation indicators, p99 query latency, lock-wait time) and continues with the boring ladder of database scaling: indexes, pooling, read replicas, caching at the boundary, the things covered in the operational layer.
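To illustrate why p99 is the number to watch rather than the average, here is a small sketch using a nearest-rank percentile over made-up latency samples. The figures are hypothetical: most queries are fast and a small share waited on a lock.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Hypothetical latencies in milliseconds: 980 fast queries, 20 that waited on a lock.
latencies_ms = [5.0] * 980 + [900.0] * 20

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms:.1f}ms p50={p50_ms:.1f}ms p99={p99_ms:.1f}ms")
```

With these samples the mean and the median both look healthy while the p99 sits at the lock-wait figure, which is the pattern that recurring incidents tend to hide behind.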

Symptom 4: Reports and dashboards take longer to load than to read

The executive dashboard that shows last quarter's numbers takes ninety seconds to render. The operational report the team uses every Monday gets exported to a spreadsheet because querying it live is too slow. A "real-time" view turns out to be five minutes behind, then ten, then half an hour.

What it usually means: the analytical workload is being run on the same database, and against the same schema, as the operational workload, and the schema was not designed for analytical queries. Joins across normalised tables that are fine for transactional access become expensive aggregations for reporting, and they are competing for resources with the user-facing traffic. The fix is to separate the read path for analytics: a read replica, a materialised view, or a dedicated analytical store fed by change data capture. Trying to make the operational schema fast for both is usually how both end up slow.
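A lightweight version of that separation can be sketched with a summary table that is rebuilt on a schedule, so the dashboard reads a few precomputed rows instead of aggregating the operational table on every request. SQLite has no native materialised views, so this emulates one; the table names and sample data are hypothetical, and engines that support materialised views or change data capture offer sturdier versions of the same idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        placed_on TEXT NOT NULL,
        total REAL NOT NULL
    );
    CREATE TABLE daily_revenue (
        day TEXT PRIMARY KEY,
        order_count INTEGER NOT NULL,
        revenue REAL NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO orders (placed_on, total) VALUES (?, ?)",
    [("2025-01-06", 40.0), ("2025-01-06", 60.0), ("2025-01-07", 25.0)],
)

def refresh_daily_revenue(conn):
    """Rebuild the summary in one transaction; meant to run on a schedule, not per request."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM daily_revenue")
        conn.execute("""
            INSERT INTO daily_revenue (day, order_count, revenue)
            SELECT placed_on, COUNT(*), SUM(total) FROM orders GROUP BY placed_on
        """)

refresh_daily_revenue(conn)

# The dashboard query touches only the small summary table.
report = conn.execute(
    "SELECT day, order_count, revenue FROM daily_revenue ORDER BY day"
).fetchall()
```

The trade-off is freshness: the report is as old as the last refresh, which is usually acceptable for last quarter's numbers and should be stated on any view labelled real-time.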

Symptom 5: "We cannot easily report on X" keeps happening

The product team wants to know how many users did Y in the last 30 days. The answer takes two engineers, three days, a custom script and a spreadsheet to triangulate. Every variation of the question requires the same effort. The dashboard that was supposed to surface this kind of thing was deprioritised because "the data isn't structured for it".

What it usually means: the data is structured for writes, not for reads. Important state changes are not captured as events, important entities are not modelled as first-class tables, important attributes live in untyped JSON columns where the application reads them but the analyst cannot. The fix is partly about surfacing the events the business actually wants to count (often as an event log alongside the operational tables) and partly about acknowledging that the analytical model is a separate design concern from the transactional one.
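A minimal sketch of such an event log, assuming a hypothetical events table and event names, shows how "how many users did Y in the last 30 days" becomes a single query once the state change is recorded as a row.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        name TEXT NOT NULL,         -- hypothetical event name, e.g. 'report_exported'
        occurred_at TEXT NOT NULL   -- ISO 8601 timestamp, always UTC
    )
""")
conn.execute("CREATE INDEX idx_events_name_time ON events (name, occurred_at)")

def record_event(conn, user_id, name, occurred_at):
    """Append one event; the operational tables are updated separately."""
    conn.execute(
        "INSERT INTO events (user_id, name, occurred_at) VALUES (?, ?, ?)",
        (user_id, name, occurred_at.isoformat()),
    )

def users_who_did(conn, name, since):
    """Count distinct users with at least one matching event since the cutoff."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT user_id) FROM events WHERE name = ? AND occurred_at >= ?",
        (name, since.isoformat()),
    ).fetchone()
    return row[0]

now = datetime(2025, 1, 9, 12, 0, tzinfo=timezone.utc)
record_event(conn, 1, "report_exported", now - timedelta(days=2))
record_event(conn, 1, "report_exported", now - timedelta(days=1))
record_event(conn, 2, "report_exported", now - timedelta(days=45))
record_event(conn, 3, "invoice_paid", now - timedelta(days=3))

active = users_who_did(conn, "report_exported", now - timedelta(days=30))
```

The string comparison on occurred_at works here only because every timestamp is written in the same format and timezone; a native timestamp column would remove that assumption.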

Symptom 6: Database costs are growing faster than usage

The hosting bill for the database has doubled in twelve months. User growth has been forty percent. The team is sized the same, the feature set is mostly the same, and yet the database tier is the line item that keeps going up.

What it usually means: indexes have proliferated without discipline, soft-deleted records that no query needs are inflating tables, JSON columns hold logs that should have aged out, and the working set is growing because nothing is being archived. Sometimes it means a missing read replica strategy is forcing the primary to do work it should not; sometimes it means a single bad query is consuming most of the cost. The fix begins with cost attribution (which tables, which queries, which workload) before adding any capacity.
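For the "which queries" part of cost attribution, one approach is to group logged statements by shape and sum their execution time. The sketch below assumes a hypothetical log of (statement, duration) pairs and a deliberately simple fingerprint function; real slow-query tooling normalises statements far more carefully.

```python
import re
from collections import defaultdict

def fingerprint(sql):
    """Collapse literals so the same statement with different values groups together."""
    sql = re.sub(r"'[^']*'", "?", sql)           # string literals
    sql = re.sub(r"\b\d+(\.\d+)?\b", "?", sql)   # numeric literals
    return re.sub(r"\s+", " ", sql).strip().lower()

def attribute(query_log):
    """Sum execution time per fingerprint, most expensive first."""
    totals = defaultdict(float)
    for sql, duration_ms in query_log:
        totals[fingerprint(sql)] += duration_ms
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical log entries: (statement, duration in milliseconds).
query_log = [
    ("SELECT * FROM orders WHERE customer_id = 17", 420.0),
    ("SELECT * FROM orders WHERE customer_id = 98", 510.0),
    ("SELECT name FROM users WHERE id = 3", 2.0),
    ("SELECT * FROM orders  WHERE customer_id = 5", 390.0),
    ("SELECT name FROM users WHERE email = 'a@example.com'", 4.0),
]

ranking = attribute(query_log)
for shape, total_ms in ranking:
    print(f"{total_ms:8.1f} ms  {shape}")
```

Ranking by total time rather than by slowest single execution is a design choice: a moderately slow statement that runs constantly often costs more than a rare very slow one.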

Symptom 7: The team is blaming the application before checking the database

The application engineers are spending sprint after sprint on "performance work" in the application layer (lazy loading, code splitting, render optimisation) while the actual response-time budget is being eaten by the database underneath. Nobody on the application side has access to the slow-query log, or knows how to read an EXPLAIN plan, and the database is treated as a black box that the platform team owns and the product team uses.

What it usually means: the ability to diagnose the data layer, and the ownership of it, sit with a different team from the one writing the queries. The team will keep working on the wrong layer until someone makes the database measurable to the people who write the queries. The fix is partly cultural (database literacy on the application team, slow-query alerts that page the right person, EXPLAIN reviews in code review) and partly tooling (query monitoring that the application engineer can read without an introduction).
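One small way to make the database measurable from the application side is to time statements where they are issued. The wrapper below is a hypothetical sketch around a sqlite3 connection; the class name, method name and threshold are assumptions, and most drivers and ORMs provide their own hooks for the same purpose.

```python
import sqlite3
import time

class TimedConnection:
    """Wraps a sqlite3 connection and records statements slower than a threshold."""

    def __init__(self, conn, threshold_ms):
        self.conn = conn
        self.threshold_ms = threshold_ms
        self.slow_queries = []  # list of (sql, elapsed_ms) tuples

    def query(self, sql, params=()):
        start = time.perf_counter()
        rows = self.conn.execute(sql, params).fetchall()
        elapsed_ms = (time.perf_counter() - start) * 1000
        if elapsed_ms >= self.threshold_ms:
            self.slow_queries.append((sql, elapsed_ms))
        return rows

raw = sqlite3.connect(":memory:")
raw.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
raw.execute("INSERT INTO users (name) VALUES ('Ada')")

# A threshold of 0 ms records every statement, which suits a demonstration;
# a production value would come from the response-time budget.
db = TimedConnection(raw, threshold_ms=0)
rows = db.query("SELECT name FROM users WHERE id = ?", (1,))

for sql, elapsed_ms in db.slow_queries:
    print(f"{elapsed_ms:.3f} ms  {sql}")
```

Sending the recorded entries to the same place the application team already reads its logs is what closes the gap; a separate dashboard owned by another team tends to recreate the black box.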

Final takeaway

A database that is asking for architectural attention rarely sends a clear message. It sends seven different muffled messages, in different parts of the organisation, that the team has to recognise as connected. The discipline is not to wait for an incident severe enough that the answer is obvious; it is to keep this catalogue in mind, name the symptoms when they show up, and treat each one as a real signal rather than something to work around.

The wider context, including the schema-design and operational scaling decisions that produce these symptoms, is collected in the data systems and performance insights cluster. And when the question moves from "are we seeing one of these symptoms" to "we recognise the pattern and we now need someone to design the index strategy, the migration path or the read-path separation that fixes it", that is exactly what my database architecture and performance practice is built around.

- Haja Faniry
