Why API Integration Matters in Modern Platforms

25 Jun 2024 - 08 min read

An integration is a contract, not a cable

The word "integration" is often used as if it described a cable: connect system A to system B, data flows through, the box is checked. The cable framing is what produces the integrations that work for three months and then break in ways no one is prepared to fix. A more useful framing is that an integration is a contract between two systems, with versions, expectations, failure modes and an owner. Once it is treated as a contract, the right design questions become obvious; while it is treated as a cable, they stay invisible until production incidents force them out.

Modern platforms run on integrations. The CRM talks to the marketing tool, the marketing tool talks to the analytics warehouse, the warehouse feeds the dashboards, the dashboards inform the next campaign. Each link in that chain is a contract someone signed, often implicitly. The platforms that compound over years are the ones whose teams understood the contracts they were signing; the platforms that decay are the ones whose teams thought they were just plugging things in. The NGO platform article covers institutional platform design, and the scalable platform article covers structural decisions; the angle here is narrower, on the integration layer specifically.

The real cost of integration is paid in year two

The first cost of an integration is the build: write the API client, map the fields, run the tests, ship it. That cost is small and visible. The second cost is the operational cost over the following two years, and that cost is large and invisible at the time the contract is signed. It includes the version bump that breaks the response shape, the rate limit that changes without notice, the field that gets renamed, the auth flow that adds an extra step, and the morning when the upstream system is down and three of the dashboards display zero.

Most integration regret is built into the design at the moment of the build, when no one stopped to ask "who maintains this in eighteen months?" The healthier approach is to size the operational cost realistically up front: how often does this contract change, how is the change communicated, who detects a break, and how long can the integration be silently wrong before someone notices? A team that answers those questions before writing the client tends to ship fewer integrations and pay less for them over time. A team that does not tends to ship many and accumulate a slow tax that no one priced.

Choose the system of record before designing the API

The single most consequential decision in any integration is which system is the source of truth for the entity being moved. If the customer record can be edited in the CRM and in the support tool and in the billing system, the integration cannot resolve the conflict; it can only spread it. Most "data quality" issues that get reported as bugs in dashboards are actually system-of-record issues that were not decided at integration time.

The pragmatic move is to name the system of record for each entity (customer, beneficiary, invoice, content item) explicitly, and to design every integration as a one-way flow from that system into the others, with the others treated as read-only mirrors that have to ask the source if they want to change a field. Two-way sync is sometimes necessary, but it should be a deliberate choice with a documented conflict-resolution rule, not the default. The integrations that look simple in diagrams and become unmanageable in practice are almost always two-way syncs that nobody designed as one.
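As a minimal sketch of that rule, assuming a hypothetical registry that names one owning system per entity, a write can be rejected when it does not come from the system of record. The system names ("crm", "billing", "support_tool") and the functions `apply_change` and `sync_to_mirror` are illustrative assumptions, not part of any particular product.

```python
# Hypothetical registry: one owning system per entity (names are illustrative).
SYSTEM_OF_RECORD = {
    "customer": "crm",
    "invoice": "billing",
}


class NotSystemOfRecord(Exception):
    """Raised when a mirror tries to write an entity it does not own."""


def apply_change(entity: str, origin: str, record: dict, change: dict) -> dict:
    """Accept a change only if it originates from the entity's system of record."""
    owner = SYSTEM_OF_RECORD[entity]
    if origin != owner:
        raise NotSystemOfRecord(
            f"{origin} cannot edit {entity}; send the change to {owner}"
        )
    # Return a new dict so the caller's copy is never mutated in place.
    return {**record, **change}


def sync_to_mirror(record: dict) -> dict:
    """One-way flow: the mirror receives a copy and never writes back."""
    return dict(record)
```

In this sketch a support tool that wants to change a customer email gets an explicit error naming the owner, instead of quietly creating a second version of the truth.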

Synchronous, asynchronous, batch: pick once, own the consequences

Three modes do most of the work in real platforms, and each one comes with different operational properties. Synchronous calls (the user clicks, the system asks the upstream API, the response shapes the page) are simple to reason about and brittle to upstream latency. Asynchronous flows (the system queues an event, the upstream eventually receives it) are more resilient and harder to debug. Batch jobs (every fifteen minutes or every night, the systems reconcile) are operationally cheap and produce data that is always slightly stale.

The mistake is to choose one mode for the whole platform, or to mix modes by accident as different developers ship different integrations. The cleaner approach is to decide, per integration, which mode the contract demands, and to make that decision visible: a webhook is asynchronous, and that means certain things about idempotency and retry; a REST call inside a request is synchronous, and that means a budget for upstream latency; a nightly job is batch, and that means downstream consumers cannot expect real-time freshness. When the mode is named, the failure modes are predictable. When it is implicit, every integration is its own surprise.
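One way to make the mode visible is to declare it next to the integration and check that the commitment it implies has been written down. The sketch below is an assumption about how such a declaration might look; `IntegrationContract`, its fields and `missing_commitments` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Mode(Enum):
    SYNCHRONOUS = "synchronous"
    ASYNCHRONOUS = "asynchronous"
    BATCH = "batch"


@dataclass(frozen=True)
class IntegrationContract:
    name: str
    mode: Mode
    latency_budget_ms: Optional[int] = None      # expected for synchronous calls
    idempotent_with_retry: bool = False          # expected for asynchronous flows
    max_staleness_minutes: Optional[int] = None  # expected for batch jobs


def missing_commitments(contract: IntegrationContract) -> List[str]:
    """List the commitments the declared mode implies but the contract omits."""
    problems = []
    if contract.mode is Mode.SYNCHRONOUS and contract.latency_budget_ms is None:
        problems.append("synchronous call needs an upstream latency budget")
    if contract.mode is Mode.ASYNCHRONOUS and not contract.idempotent_with_retry:
        problems.append(
            "asynchronous flow needs idempotent handling and a retry policy"
        )
    if contract.mode is Mode.BATCH and contract.max_staleness_minutes is None:
        problems.append("batch job needs a documented maximum staleness")
    return problems
```

A check like this could run in review or CI, so that an integration with an unnamed mode or an unstated budget is caught before it ships rather than during an incident.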

Reliability lives in the boundaries, not inside the services

Most reliability problems in modern platforms do not live inside the services themselves. They live at the boundaries between services: the webhook that was retried three times then silently dropped, the response that was partially consumed before the connection died, the migration that updated one side of an integration before the other. Treating those boundaries as first-class engineering surfaces, rather than glue code, is the highest-leverage operational discipline a platform team can adopt.

A few patterns do most of the work. Idempotency keys on every write so that a retry does not double-charge. Bounded retries with backoff so a downstream outage does not turn into a thundering herd. Dead-letter queues for events the system genuinely cannot process, instead of silent drops. A contract test on each integration that fails the build when the upstream shape changes in a way the code does not expect. None of this is exotic; all of it is the difference between a platform that holds together at 2am and one that does not.
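The four patterns above can be sketched in a few lines each. This is an illustration under stated assumptions, not a production implementation: the in-memory dictionary stands in for a durable store, the list stands in for a real dead-letter queue, and `IdempotentWriter`, `deliver`, `contract_violations` and `EXPECTED_CUSTOMER_SHAPE` are hypothetical names.

```python
import time
from typing import Any, Callable, Dict, List


class IdempotentWriter:
    """Remembers the result of each write under its idempotency key."""

    def __init__(self, write: Callable[[Any], Any]):
        self._write = write
        self._results: Dict[str, Any] = {}  # a durable store in a real system

    def submit(self, key: str, payload: Any) -> Any:
        if key in self._results:
            # A retry with the same key returns the stored result; no second write.
            return self._results[key]
        result = self._write(payload)
        self._results[key] = result
        return result


def deliver(event, send, dead_letters: List[Any], max_attempts: int = 3,
            base_delay: float = 0.5, sleep: Callable[[float], None] = time.sleep) -> bool:
    """Bounded retries with exponential backoff, then park the event."""
    for attempt in range(max_attempts):
        try:
            send(event)
            return True
        except Exception:
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    dead_letters.append(event)  # kept for inspection instead of silently dropped
    return False


# Assumed response shape, for illustration only.
EXPECTED_CUSTOMER_SHAPE = {"id": str, "email": str}


def contract_violations(response: dict, expected: Dict[str, type]) -> List[str]:
    """Compare an upstream response against the shape the client relies on."""
    problems = []
    for field, field_type in expected.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], field_type):
            problems.append(f"wrong type for {field}")
    return problems
```

A contract test would call `contract_violations` on a recorded or live upstream response and fail the build if the list is not empty. The backoff here is deterministic to keep the sketch short; real retry loops often add random jitter so that many clients do not retry in lockstep.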

Ownership: who pays when an integration breaks at 2am

Integrations break. The question that decides whether the platform survives is whether someone is on the hook to fix them, and whether that someone has the access, the runbook and the authority to do it. Many platforms ship integrations with no named owner: the developer who wrote the original client moved on, the operations team treats it as a developer problem, the developer team treats it as an operations problem, and the integration sits in a state of unowned brittleness until something visible breaks.

A useful pattern is to name an owner per integration in the same place the integration is documented, alongside three things: the runbook for the most common failures, the access credentials the on-call person needs, and the upstream contact (a vendor support address, a partner team) who can confirm whether the upstream is the problem. Without those three, "the integration is broken" becomes a thirty-minute scavenger hunt every time. With them, it is a fifteen-minute fix.
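A lightweight way to keep that record honest is to store it as structured data and flag the gaps. The sketch below assumes a hypothetical `IntegrationOwnership` entry; the field names are illustrative, and `credentials_ref` is meant to hold a pointer into a secrets store, never the secret itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IntegrationOwnership:
    name: str
    owner: str = ""
    runbook: str = ""           # where the runbook for common failures lives
    credentials_ref: str = ""   # pointer into a secrets store, not the secret
    upstream_contact: str = ""  # vendor support address or partner team


def ownership_gaps(entry: IntegrationOwnership) -> List[str]:
    """Return the ownership fields that are still empty for this integration."""
    required = ("owner", "runbook", "credentials_ref", "upstream_contact")
    return [field for field in required if not getattr(entry, field).strip()]
```

Running `ownership_gaps` over every documented integration gives a short list of the ones that would turn into a scavenger hunt on the night they break.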

Monitoring is part of the integration, not bolted on later

The default state of most integrations is that no one notices when they break until a user notices. By that point, hours or days have passed, dashboards are wrong, and the recovery cost is much higher than it would have been. Monitoring an integration is not a separate project; it is part of the integration's definition of done.

The minimal version is small and worth the effort: a synthetic check that exercises the contract on a regular cadence, an alert when the success rate drops below a defined threshold, a dashboard panel that shows the integration's health alongside the things that depend on it. For platforms that span several integrations, a single integrations health page is more useful than a per-tool monitoring system, because it reflects how operators actually think about the platform when something is wrong: not "is the CRM up" but "is the data flowing the way it should".
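The minimal version can be sketched as a probe plus a rolling success rate. The names `HealthWindow` and `run_synthetic_check`, the window size and the threshold are assumptions chosen for illustration; the probe is whatever callable exercises the real contract, and scheduling and alert delivery are left out.

```python
from collections import deque
from typing import Callable


class HealthWindow:
    """Rolling success rate over the most recent synthetic checks."""

    def __init__(self, size: int = 20, threshold: float = 0.9):
        self._results = deque(maxlen=size)  # oldest results fall off automatically
        self.threshold = threshold

    def record(self, ok: bool) -> None:
        self._results.append(ok)

    @property
    def success_rate(self) -> float:
        if not self._results:
            return 1.0  # no data yet; treated as healthy in this sketch
        return sum(self._results) / len(self._results)

    def should_alert(self) -> bool:
        return self.success_rate < self.threshold


def run_synthetic_check(probe: Callable[[], bool], window: HealthWindow) -> bool:
    """Exercise the contract once; an exception counts as a failed check."""
    try:
        ok = bool(probe())
    except Exception:
        ok = False
    window.record(ok)
    return ok
```

Called on a regular cadence by a scheduler, `run_synthetic_check` feeds the window, and `should_alert` is the condition that would page the named owner or turn the health page panel red.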

Bringing this into practice

API integrations are the load-bearing connective tissue of modern platforms, and they are also the place where most operational debt accumulates. The platforms that compound treat every integration as a contract with a named owner, an explicit mode, a system of record, idempotent boundaries and observable health. The platforms that decay treat integrations as glue and pay for that choice in year two. For the structural side of platform design that surrounds these integration decisions, the web architecture and platforms insights cluster collects the related analyses.

If the question has moved from "we need to connect tool X to tool Y" to "we need an integration layer that we can reason about and operate without surprises", that is exactly what my API and system integration practice is built around.

- Haja Faniry
