What a Good BI Setup Looks Like

09 Oct 2024 - 08 min read

The dashboard is the visible part of an iceberg

The dashboard is the part of business intelligence the organisation sees. The BI setup is the iceberg underneath: the warehouse where the data actually lives, the semantic layer where the metrics are defined, the catalogue where definitions and owners are documented, the refresh schedule that decides whether yesterday's number is yesterday's number, and the access model that decides who can write the SQL behind it. Most "the dashboard is wrong" complaints are really "the BI setup is wrong" complaints, and no amount of redesigning the dashboard will fix them.

This article is about the parts of business intelligence that almost never appear on a slide deck and almost always decide whether the dashboards on top can be trusted. It pairs with the data strategy article, which sits one level up and frames the decisions the BI setup is supposed to serve; here, the focus is on the boring infrastructure that makes good dashboards possible in the first place.

The warehouse and the semantic layer: where metrics actually live

A working BI setup separates the operational systems (the CRM, the ERP, the application database where transactions happen) from the analytical store (the warehouse) where reporting and analytics queries actually run. The separation is not optional once the organisation has more than a handful of users and more than a handful of metrics. Running analytics against the operational database produces two predictable failures: the operational system slows down when the report runs, and the analytical schema is wrong because the operational schema was designed for writes, not for reads.

The warehouse is the first quality criterion. Whether it is Snowflake, BigQuery, Postgres-as-warehouse, DuckDB at small scale, or something else, what matters is that there is one place where the analytical version of the truth lives, fed by ingestion pipelines that the team understands and can fix when they break.

Above the warehouse sits the semantic layer: the place where metrics are defined once and reused everywhere. "Active customer", "monthly recurring revenue", "average resolution time" exist as named, versioned definitions in a single artefact (a dbt project, a LookML model, a Cube schema, sometimes just a well-disciplined set of warehouse views), and every dashboard, report or notebook that uses those metrics references the same definition. Without a semantic layer, the same metric gets re-derived in five different ways across five different dashboards, and "active customer" means a different number depending on which page you read.
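As a minimal sketch of "defined once, reused everywhere", the registry below stands in for a semantic layer. Everything in it is an assumption made for illustration: the names `MetricDefinition`, `SEMANTIC_LAYER` and `get_metric`, the `sales_orders` table, and the 90-day rule for "active customer". In practice a dbt project, a LookML model or a set of warehouse views would play this role.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    name: str
    version: int
    sql: str


# Hypothetical registry: the single place where each metric is defined.
# The 90-day window is an illustrative assumption, not a recommended rule.
SEMANTIC_LAYER = {
    "active_customer": MetricDefinition(
        name="active_customer",
        version=1,
        sql=(
            "SELECT COUNT(DISTINCT customer_id) "
            "FROM sales_orders "
            "WHERE order_date >= CURRENT_DATE - INTERVAL '90 days'"
        ),
    ),
}


def get_metric(name: str) -> MetricDefinition:
    """Return the shared definition, or fail if the metric was never defined."""
    if name not in SEMANTIC_LAYER:
        raise KeyError(f"{name!r} is not defined in the semantic layer")
    return SEMANTIC_LAYER[name]


# Two consumers reference the same definition instead of re-deriving it.
dashboard_query = get_metric("active_customer").sql
notebook_query = get_metric("active_customer").sql
```

The point of the lookup failing on an unknown name is that a dashboard cannot quietly invent its own version of a metric: it either uses the shared definition or it breaks.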

The metrics catalogue: one definition, anywhere it appears

A semantic layer is technical infrastructure. A metrics catalogue is the human-readable documentation that pairs with it. For every metric the organisation reports on, the catalogue answers: what is the definition (in one sentence), what is the SQL or formula behind it, who owns it, when was it last changed, what is the refresh cadence, and where does it appear (which dashboards, which reports, which APIs).

A working catalogue is short and current. Long catalogues that nobody updates are worse than no catalogue, because they create a false sense of governance. The discipline is to keep the catalogue narrower than the team would like (only the metrics that genuinely matter) and to enforce that any new metric added to a dashboard must first appear in the catalogue. Without that enforcement, the catalogue drifts and dashboards quietly start defining their own private versions of "active customer" again.
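The catalogue fields and the enforcement rule can be sketched in a few lines. This is an illustration under stated assumptions, not a prescribed tool: `CatalogueEntry`, `uncatalogued_metrics`, the example entry and its owner are all hypothetical, and a real setup might run the same check in a review step before a dashboard is published.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CatalogueEntry:
    name: str
    definition: str        # one-sentence definition
    formula: str           # SQL or formula behind the metric
    owner: str
    last_changed: date
    refresh_cadence: str
    appears_in: list[str] = field(default_factory=list)


# Hypothetical catalogue with a single entry, keyed by metric name.
CATALOGUE = {
    "revenue_eur": CatalogueEntry(
        name="revenue_eur",
        definition="Invoiced revenue in euros, net of refunds.",
        formula="SUM(invoice_amount_eur) - SUM(refund_amount_eur)",
        owner="finance",
        last_changed=date(2024, 10, 1),
        refresh_cadence="daily at 06:00",
        appears_in=["executive_overview"],
    ),
}


def uncatalogued_metrics(dashboard_metrics: list[str], catalogue: dict) -> list[str]:
    """Return the metrics a dashboard uses that the catalogue does not define."""
    return sorted(set(dashboard_metrics) - set(catalogue))


# A new dashboard wants to show two metrics; only one is catalogued.
blocked = uncatalogued_metrics(["revenue_eur", "active_customer"], CATALOGUE)
```

Here `blocked` lists the metrics that would have to be added to the catalogue before the dashboard ships.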

The catalogue is also where business users go when they doubt a number. "What does revenue mean here" should have a one-click answer linked from every dashboard that displays revenue. A BI setup where that answer is "ask the data team" is a BI setup where the data team is the bottleneck for trust.

Refresh discipline: the SLA of the data layer

Every dataset in the warehouse has, implicitly or explicitly, a refresh SLA: how fresh the data is supposed to be when a user reads it. Sales pipeline updated hourly. Marketing attribution updated daily at 06:00. Financial close numbers updated within three business days of period end. The implicit version of these SLAs ("we refresh whenever the pipeline runs") is what produces dashboards the user does not trust because they cannot tell whether they are reading current or stale data.

A working BI setup makes the SLA explicit per dataset, monitors whether the refresh actually meets it, and surfaces a "stale" signal on any artefact that reads a dataset which has missed its window. The signal does not have to be elaborate: a coloured dot, a "last updated 14 hours ago, expected within 4" badge, or an automated alert to the data team is enough. The point is that the user can tell whether the freshness they expect is the freshness they got.
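A freshness check of this kind is small. The sketch below assumes the pipeline records a last-refresh timestamp per dataset; `RefreshSla`, `freshness_badge` and the `support_tickets` dataset with its 4-hour window are hypothetical names and values chosen to reproduce the badge text above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class RefreshSla:
    dataset: str
    max_age: timedelta  # how old the data may be before it counts as stale


def freshness_badge(
    sla: RefreshSla, last_refreshed: datetime, now: datetime
) -> tuple[bool, str]:
    """Return (is_stale, badge text) for one dataset."""
    age = now - last_refreshed
    age_hours = int(age.total_seconds() // 3600)
    expected_hours = int(sla.max_age.total_seconds() // 3600)
    is_stale = age > sla.max_age
    text = f"last updated {age_hours} hours ago, expected within {expected_hours}"
    return is_stale, text


# Hypothetical SLA: this dataset must never be more than 4 hours old.
sla = RefreshSla(dataset="support_tickets", max_age=timedelta(hours=4))
now = datetime(2024, 10, 9, 14, 0, tzinfo=timezone.utc)
last_refreshed = datetime(2024, 10, 9, 0, 0, tzinfo=timezone.utc)

is_stale, badge = freshness_badge(sla, last_refreshed, now)
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the check easy to test and to replay against historical refresh logs.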

The same discipline applies to schema changes. A column being renamed in the source system without notice should not silently change what the warehouse computes; either the change is announced and the downstream definitions are updated together, or the pipeline fails loudly until they are.
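"Fails loudly" can be as simple as comparing the columns the pipeline expects with the columns the source actually delivers. The sketch below is one way to do that, under assumptions: `SchemaDriftError`, `check_schema` and the column names are hypothetical, and the list of actual columns would come from the source system's metadata in a real pipeline.

```python
class SchemaDriftError(RuntimeError):
    """Raised when a source table no longer matches what the pipeline expects."""


def check_schema(table: str, expected: set[str], actual: set[str]) -> None:
    """Stop the pipeline if any expected column has disappeared."""
    missing = sorted(expected - actual)
    unexpected = sorted(actual - expected)
    if missing:
        # A rename shows up as one missing column plus one unexpected column.
        raise SchemaDriftError(
            f"{table}: missing columns {missing}; "
            f"unexpected columns {unexpected} (possible renames)"
        )


EXPECTED = {"order_id", "customer_id", "revenue_eur"}

# Unchanged source: the check passes silently.
check_schema("sales_orders", EXPECTED, {"order_id", "customer_id", "revenue_eur"})

# Source renamed revenue_eur to amount without notice: the check raises.
try:
    check_schema("sales_orders", EXPECTED, {"order_id", "customer_id", "amount"})
    drift_message = ""
except SchemaDriftError as error:
    drift_message = str(error)
```

This version tolerates newly added columns and only stops on missing ones; a stricter team might choose to fail on any difference.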

Naming conventions and access model: the boring parts that decide trust

A BI setup whose tables, views and metrics are named consistently is much faster for new analysts to learn and much harder to misuse. Conventions are small but they compound: tables prefixed by domain (sales_, support_, finance_), staging tables clearly distinguished from production tables, snake_case columns with explicit units (revenue_eur, not revenue), boolean columns prefixed with is_ or has_. None of this is glamorous; all of it saves analysts from running the wrong query against the wrong table because the names looked similar.
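Conventions like these are easy to check automatically. The linter below is a rough sketch covering three of them (domain prefix, snake_case, boolean prefix) plus a crude unit check. The function name `lint_table`, the `UNITLESS_NAMES` set and the type strings are assumptions for illustration; staging tables are not handled here.

```python
import re

DOMAIN_PREFIXES = ("sales_", "support_", "finance_")
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")
# Hypothetical list of column names that hide their unit.
UNITLESS_NAMES = {"revenue", "amount", "price", "cost"}


def lint_table(table: str, columns: dict[str, str]) -> list[str]:
    """Return naming problems for one table; columns maps name -> type."""
    problems = []
    if not table.startswith(DOMAIN_PREFIXES):
        problems.append(f"table {table!r} has no domain prefix")
    for column, column_type in columns.items():
        if not SNAKE_CASE.match(column):
            problems.append(f"column {column!r} is not snake_case")
        if column in UNITLESS_NAMES:
            problems.append(f"column {column!r} has no explicit unit")
        if column_type == "boolean" and not column.startswith(("is_", "has_")):
            problems.append(f"boolean column {column!r} lacks is_/has_ prefix")
    return problems


good = lint_table(
    "sales_orders",
    {"order_id": "integer", "revenue_eur": "numeric", "is_refunded": "boolean"},
)
bad = lint_table("orders", {"Revenue": "numeric", "refunded": "boolean"})
```

Run against every new table in a review step, a check like this catches the look-alike names before analysts start querying them.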

Access matters as much as naming. A working BI setup has clear answers to: who can read each dataset, who can change a metric definition in the semantic layer, who can publish a new dashboard to a shared space, and who can edit one that already exists. Loose access produces five conflicting versions of the same answer; strict access produces a queue at the data team. The right setting is in between, and it is usually a combination of role-based read access on the warehouse, write access to the semantic layer restricted to a small group, and a lightweight review workflow for new dashboards before they appear in shared spaces.
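The "in between" setting can be written down as a small permission table. The roles and action names below are hypothetical, and in practice these rules would live in the warehouse's grants and the BI tool's own permission settings rather than in application code; the sketch only shows the shape of the model.

```python
from typing import Optional

# Hypothetical roles: everyone reads and drafts, few edit metrics or publish.
ROLE_PERMISSIONS = {
    "analyst": {"read_dataset", "draft_dashboard"},
    "metric_maintainer": {"read_dataset", "draft_dashboard", "edit_metric"},
    "dashboard_reviewer": {"read_dataset", "draft_dashboard", "publish_dashboard"},
}


def can(role: str, action: str) -> bool:
    """Return True if the role is allowed to perform the action."""
    return action in ROLE_PERMISSIONS.get(role, set())


def publish_to_shared_space(dashboard: str, reviewer_role: Optional[str]) -> str:
    """Publish only when someone with publish rights has reviewed the dashboard."""
    if reviewer_role is None or not can(reviewer_role, "publish_dashboard"):
        raise PermissionError(f"{dashboard!r} needs review before publication")
    return f"{dashboard} published to shared space"


result = publish_to_shared_space("pipeline_overview", "dashboard_reviewer")
```

Any analyst can draft; only a reviewed dashboard reaches the shared space, and only a small group can touch metric definitions.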

Ownership at every layer

A BI setup that does not name owners at every layer is a BI setup where everything quietly becomes the data team's problem. The warehouse has a technical owner (data engineering). Each domain inside the warehouse has a business owner (sales, finance, product, support) who is accountable for what the data means and whether it is trustworthy. Each metric in the catalogue has a named owner who can answer "did this go up because of X" without escalating. Each dashboard has an owner who is responsible for retiring it when the underlying decision has moved.

Ownership is what turns a BI setup from a stack of tools into a working system. Without it, every change in the source systems is a fire drill, every "the number looks wrong" question becomes a multi-team investigation, and every retirement of an obsolete metric stalls because nobody can be sure no one else still depends on it.

The cheapest version of this is to keep an ownership list current alongside the metrics catalogue: name (or role) per dataset, per metric, per dashboard, with a rotation rule when people leave. The expensive version is what happens when nobody bothers to maintain the cheap one.
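The cheap version fits in a dictionary and two functions. This is a sketch with hypothetical asset names, people and a made-up fallback role; the only idea it illustrates is that unowned assets should be detectable and that a departure should trigger a reassignment rather than leave a gap.

```python
def find_unowned(assets: list[str], owners: dict[str, str]) -> list[str]:
    """Return assets that have no owner, or an empty owner, in the list."""
    return sorted(asset for asset in assets if not owners.get(asset))


def reassign_on_departure(
    owners: dict[str, str], departed: str, fallback: str
) -> dict[str, str]:
    """Rotation rule: hand everything the departed person owned to a fallback."""
    return {
        asset: (fallback if owner == departed else owner)
        for asset, owner in owners.items()
    }


# Hypothetical ownership list covering a dataset, a metric and a dashboard.
owners = {
    "sales_orders": "data engineering",
    "active_customer": "alice",
    "pipeline_overview": "alice",
}
assets = ["sales_orders", "active_customer", "pipeline_overview", "support_tickets"]

unowned = find_unowned(assets, owners)
after_rotation = reassign_on_departure(owners, departed="alice", fallback="head of sales")
```

Running `find_unowned` on a schedule is usually enough to stop the list from drifting out of date.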

Final takeaway

A good BI setup is not the most sophisticated stack; it is the one where definitions live in one place, refresh SLAs are explicit, naming and access are predictable, and every layer has a named owner. The dashboards built on top of that setup can be trusted because the infrastructure underneath is trustworthy. The dashboards built on top of a setup without these criteria can be elegant, well-designed and entirely correct in isolation, and still produce numbers the organisation has reason to doubt.

The wider context, including the strategy that decides which metrics matter, the dashboard design that surfaces them, and the engineering layer that powers the warehouse, is collected in the data systems and performance insights cluster. And when the question moves from "we have a BI tool but the numbers we get are not trusted" to "we need to design the warehouse, the semantic layer, the catalogue and the ownership that turn the tool into a working setup", that is exactly what my data analysis and decision-support practice is built around.

- Haja Faniry
