Data Science in a Glass Box

Trust as a release engineering problem

blog
Published

June 26, 2026

When I wrote about Data Science in a Box, the central idea was that modern analytics doesn’t require a sprawling collection of managed services. A surprisingly capable analytics platform can be built from a relatively small set of tools that emphasize code, reproducibility, and version control.

One criticism of that philosophy is understandable.

Does putting everything into one box simply create a bigger black box?

I think the opposite is true.

The more responsibility a platform takes on, the more transparent it should become. A mature analytics platform shouldn’t just produce datasets. It should produce evidence. Every release should explain where it came from, what changed, and why it deserves to be trusted.

That has led me to think about trust a little differently.

Rather than treating trust as a dashboard problem or a data quality problem, I’ve started thinking of it as a release engineering problem.

Trust is usually treated as an afterthought

Imagine opening a dashboard on Monday morning and noticing that a familiar metric has changed.

The first questions are almost always the same.

  • What changed?
  • Was the change expected?
  • Did the source data change?
  • Did the transformation logic change?
  • Is this a bug?
  • Can I reproduce last week’s numbers?

Most analytics platforms don’t have particularly good answers.

If you’re lucky, there is a pull request describing the code changes. Maybe someone remembers that an upstream provider republished a file. Maybe there’s a CI log showing that the pipeline passed.

Those things are useful, but they aren’t evidence.

Once the next deployment runs, much of that context disappears. The warehouse moves forward. CI logs age out. The “latest” tables overwrite yesterday’s state. Six months later, reconstructing why a release happened often becomes an archaeological exercise.

That feels strange when compared to software engineering.

Software releases are treated as first-class artifacts. They have versions, release notes, provenance, and deployment history. We expect to be able to answer questions about a release long after it has shipped.

Data releases deserve the same treatment.

We tend to think of publishing data as shipping it. The pipeline finishes, the tables are updated, and we move on to the next release.

But software releases aren’t simply shipped—they’re curated. They carry documentation, version history, release notes, and enough context that someone else can understand exactly what changed and why.

I think data releases deserve the same level of care.

Rather than shipping data like sealed crates, we should think about curating releases more like museum exhibits. The data may be the centerpiece, but it’s the surrounding context (the changelog and the validation evidence) that allows someone else to understand what they’re looking at months or even years later.

That’s the kind of transparency I’m after. Transparency into why a release deserves to be trusted.

Transparency is the feature

A black box asks consumers to trust its outputs. A glass box lets them inspect them. A transparent system produces enough evidence that trust becomes a consequence rather than a requirement.

It’s a key distinction.

Transparency isn’t about exposing every SQL statement or expecting analysts to inspect execution plans. It means that every published release carries enough context to answer questions like:

  • What changed compared to the previous release?
  • Were those changes expected?
  • What evidence supported promoting this release?
  • Can this release be reproduced?
  • Can I inspect the reasoning months later?

Those aren’t operational questions. They’re release engineering questions.

A release is more than a collection of tables

Thinking in terms of releases changes the shape of the system. A release is no longer just the current contents of a warehouse. It becomes an artifact with its own identity.

Like a museum exhibit, the data itself is only part of the story. The labels, the historical context, and the documentation are what make the exhibit meaningful.

A data release should work the same way. The tables are only part of the release.

Equally important is the information that explains those tables: what changed, how they were produced, why they were considered safe to publish, and whether they can be reproduced in the future.

That package of contextual information is what I’ll call release provenance.

For a data release, provenance isn’t just an audit log or a lineage graph. It’s the collection of evidence that allows someone months or even years later to understand (and if necessary, reproduce) the release.

That might include:

  • a version identifier
  • a changelog describing what changed
  • validation evidence collected before promotion
  • comparisons to the release being replaced
  • the version of the transformation code used to build the release
  • the versioned source data that fed the pipeline
  • enough metadata to rerun the same comparisons and verify the results

Together, these artifacts answer questions that inevitably arise after publication.

Why did this number change?

Was this an upstream correction or a transformation change?

Can we reproduce exactly what we published last year?

Without release provenance, those questions often become an exercise in digging through Git history, CI logs, and institutional memory.

With it, every release becomes self-describing.

The first principle: build, then promote

One consequence of this way of thinking is that consumers should never observe a warehouse in a partially updated state. A release should be built somewhere isolated. It should be validated completely. Only then should it replace the current production release in a single atomic promotion.

If validation fails, nothing changes. Consumers continue using the previous release.

The important point isn’t blue/green deployment itself. It’s the principle that publication is separate from construction.

That separation creates a natural place to ask an important question: “Is this release ready?”

Evidence should outlive the release process

Passing automated tests is necessary, but it isn’t sufficient.

Tests answer whether a candidate release is internally consistent. They don’t explain how it differs from the version currently in production.

That comparison only exists briefly.

Just before promotion, both the current release and the candidate release coexist.

That’s the moment to capture evidence. Not merely that validation succeeded, but what actually changed.

For me, that evidence includes things like:

  • release-to-release data diffs
  • row count summaries
  • anomaly detection
  • schema changes
  • aggregate statistics
  • release metadata

Unlike CI logs, these artifacts aren’t ephemeral.

They’re stored alongside each release as durable, queryable records.

Months later, they can still answer:

What changed between version v2026r3 and the version it replaced?

Trust should be queryable

One realization that surprised me is that QA itself can become data.

Instead of producing HTML reports or transient CI comments, release validation can produce structured datasets. Those datasets become their own historical record.

Instead of asking people to remember why a release happened, you can ask the system.

Not because the system is infallible. Because it preserved the evidence.

A framework, not a finished methodology

I don’t think this framework is complete. There are some real trade-offs.

  • How much evidence is worth keeping forever?
  • When should validation be exact versus tolerant?
  • How much review should remain human?
  • What constitutes a release in a continuously rebuilt warehouse?

I’m still working through those questions, but the direction feels increasingly clear.

The more I think about analytics platforms, the less convinced I am that trust is something dashboards earn after publication. Trust is something the release process engineers into every published artifact.

That may be the most important lesson I’ve learned since writing Data Science in a Box.

Curation is about more than simply preserving artifacts. It preserves the context needed to understand them.

I think mature analytics platforms should do the same. Not just publishing data, but curating releases.