Semantic layer

The semantic layer is the unit of analysis output. Every dataset has a history of semantic-layer versions; the UI reads the latest one by default. When people on the team say “the analysis,” they mean a semantic-layer version.

What’s in a version

A version is built from per-component artifacts. Each component (column stats, EBM model, EBM graphs, UMAP, etc.) writes its own outputs and registers them in a single manifest.json. The pieces that the UI most directly consumes:

Artifact	Used by	Description
`column_stats.json`	Overview tab, Summand	Per-column means, distributions, missingness, cardinality.
`feature_metadata.json`	Features tab, Summand	EBM feature importance, pairwise interactions, types.
`graphs.json.gz`	Features tab	Shape functions and pairwise heatmaps — gzip-compressed because they’re large.
`umap_embedding.json`	Insights tab, Summand	2-D UMAP projection of rows in feature space.
`model.pkl`	Internal	The trained EBM, used to score new rows.
`manifest.json`	Internal	Index of every component’s outputs and their content hashes.
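
As a rough illustration of how a consumer might use the manifest's content hashes, the sketch below checks a downloaded artifact against its recorded hash and then decompresses a gzip-compressed JSON artifact such as `graphs.json.gz`. The manifest layout (`{"artifacts": {name: {"sha256": ...}}}`), the choice of SHA-256, and the helper names are assumptions made for this example; the actual manifest schema is not documented here.

```python
import gzip
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of raw artifact bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(manifest: dict, name: str, data: bytes) -> bool:
    """Return True if `data` matches the hash recorded for `name`.

    Assumed (hypothetical) manifest layout:
    {"artifacts": {"<file name>": {"sha256": "<hex digest>"}}}
    """
    expected = manifest["artifacts"][name]["sha256"]
    return sha256_hex(data) == expected


def load_gzipped_json(data: bytes) -> dict:
    """Decompress and parse a gzip-compressed JSON artifact."""
    return json.loads(gzip.decompress(data).decode("utf-8"))


# Made-up payload standing in for graphs.json.gz
payload = gzip.compress(
    json.dumps({"revenue": {"x": [0, 1], "y": [0.1, 0.4]}}).encode("utf-8")
)
manifest = {"artifacts": {"graphs.json.gz": {"sha256": sha256_hex(payload)}}}

if verify_artifact(manifest, "graphs.json.gz", payload):
    graphs = load_gzipped_json(payload)
```

Hashing the compressed bytes, rather than the decoded JSON, keeps the check independent of JSON key ordering; whether the real manifest hashes compressed or uncompressed content is not stated in this page.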

Surprise finding writes its top-N output directly to the summand-task-outputs DynamoDB table rather than to S3, so the Insights tab reads those findings from there. Predictions for new rows are produced on demand by reloading the model; no per-row predictions Parquet file is stored. All S3 artifacts live under a versioned path (s3://summand-artifacts/curated/{connectorId}/{datasetId}/{version}/), so older versions remain readable.
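
The versioned path can be assembled mechanically. The following is a minimal sketch; the function names are hypothetical, and it assumes the version is an integer rendered without padding, which this page does not specify.

```python
BUCKET_PREFIX = "s3://summand-artifacts/curated"


def version_prefix(connector_id: str, dataset_id: str, version: int) -> str:
    """Prefix under which every S3 artifact of one version is stored."""
    # Assumption: {version} is the plain integer version number.
    return f"{BUCKET_PREFIX}/{connector_id}/{dataset_id}/{version}/"


def artifact_uri(connector_id: str, dataset_id: str, version: int, artifact: str) -> str:
    """Full URI of a single artifact, e.g. column_stats.json."""
    return version_prefix(connector_id, dataset_id, version) + artifact


# Placeholder identifiers, for illustration only
uri = artifact_uri("conn-1", "ds-1", 3, "column_stats.json")
```

Because the version is part of the prefix, reading an older version only requires changing one path segment.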

Why a semantic layer

Every part of the UI — and the Summand assistant — needs to answer questions like “What’s the mean of revenue by segment?” or “Which features matter most?” without re-querying the source. The semantic layer is the cached, structured answer to those questions. Concretely:

Summand gets grounding facts from the semantic layer instead of hallucinating numbers.
Charts suggested by Summand reference real columns and aggregations from the layer.
Refresh comparisons become straightforward — diff version N against version N–1.
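
To make the refresh comparison concrete, here is a hedged sketch of diffing the column statistics of two versions. It assumes each `column_stats.json` has been parsed into a dict keyed by column name with numeric fields such as `mean` and `missing_rate`; those field names and the function itself are illustrative, not the documented schema.

```python
def diff_column_stats(prev: dict, curr: dict, fields=("mean", "missing_rate")) -> dict:
    """Compare per-column stats of version N-1 (`prev`) and version N (`curr`).

    Returns columns that were added, columns that were removed, and the
    numeric deltas (curr - prev) for fields that changed.
    """
    added = sorted(set(curr) - set(prev))
    removed = sorted(set(prev) - set(curr))
    changed = {}
    for col in sorted(set(prev) & set(curr)):
        deltas = {}
        for field in fields:
            before = prev[col].get(field)
            after = curr[col].get(field)
            # Only report fields present in both versions with different values
            if before is not None and after is not None and before != after:
                deltas[field] = after - before
        if deltas:
            changed[col] = deltas
    return {"added": added, "removed": removed, "changed": changed}


# Made-up stats for two consecutive versions
v1 = {"revenue": {"mean": 100.0, "missing_rate": 0.25}, "region": {"missing_rate": 0.0}}
v2 = {"revenue": {"mean": 120.0, "missing_rate": 0.5}, "segment": {"missing_rate": 0.0}}
report = diff_column_stats(v1, v2)
```

With the sample inputs, `report` lists `segment` as added, `region` as removed, and a change in both `mean` and `missing_rate` for `revenue`.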

Versioning model

A version is identified by the pair (datasetId, version_number). Versions are append-only and immutable: once written, an artifact is never modified. Updating the dataset (re-upload, refresh, target change) creates a new version; it does not mutate the previous one.
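
The append-only, immutable behavior can be modeled in a few lines. This is an in-memory sketch under stated assumptions: the `VersionLog` class is hypothetical, version numbers are assumed to start at 1 and increase by 1, and the real system persists versions in storage rather than in a dict.

```python
from types import MappingProxyType


class VersionLog:
    """In-memory model of append-only versions keyed by (dataset_id, version_number)."""

    def __init__(self):
        self._versions = {}

    def latest(self, dataset_id: str) -> int:
        """Highest version number for a dataset, or 0 if none exists."""
        numbers = [n for (d, n) in self._versions if d == dataset_id]
        return max(numbers, default=0)

    def append(self, dataset_id: str, artifacts: dict) -> int:
        """Record a new version; earlier versions are never touched."""
        number = self.latest(dataset_id) + 1
        # Copy the input and expose it through a read-only view
        self._versions[(dataset_id, number)] = MappingProxyType(dict(artifacts))
        return number

    def get(self, dataset_id: str, number: int):
        """Read-only view of one version's artifacts."""
        return self._versions[(dataset_id, number)]


log = VersionLog()
first = log.append("ds-1", {"column_stats.json": "hash-a"})
second = log.append("ds-1", {"column_stats.json": "hash-b"})  # e.g. after a refresh
```

A refresh produces version 2 while version 1 stays readable and unchanged; attempting to assign into the mapping returned by `get` raises `TypeError`.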

Time travel

The Overview tab shows the latest version by default. Power users on Enterprise plans can pin an older version for comparison — useful for tracking how a model’s understanding of the data has shifted over months.

Storage and retention

Semantic-layer versions count toward dataset retention, not toward tier-based dataset caps. Pro and Enterprise retain all versions for the life of the subscription. Free tier retains only the latest two versions per dataset.
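
The retention rule above can be written as a small function. This sketch assumes tiers are identified by the lowercase strings `free`, `pro`, and `enterprise` and that version numbers are integers; both are illustrative choices, not part of a documented API.

```python
def retained_versions(tier: str, versions: list) -> list:
    """Return the version numbers kept for one dataset under a given tier."""
    ordered = sorted(versions)
    if tier == "free":
        # Free tier keeps only the latest two versions per dataset
        return ordered[-2:]
    if tier in ("pro", "enterprise"):
        # Pro and Enterprise keep every version for the life of the subscription
        return ordered
    raise ValueError(f"unknown tier: {tier!r}")


kept_free = retained_versions("free", [1, 2, 3, 4])
kept_pro = retained_versions("pro", [1, 2, 3, 4])
```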

Relationship to data warehousing

The semantic layer is not a metrics store like Cube or LookML. It’s narrower:

It describes a single dataset, not a curated cross-source business model.
It’s generated, not authored — you don’t write .yml to maintain it.
It exists to power Summand’s analysis surface; it doesn’t expose itself as a SQL endpoint.

If you need a metrics layer, Summand pairs naturally with one — point both at the same warehouse — but the semantic layer described here is internal to a dataset.
