The semantic layer is the unit of analysis output. Every dataset has a history of semantic-layer versions; the UI always reads the latest one. When people on the team say “the analysis,” they mean a semantic-layer version.
What’s in a version
A version is built from per-component artifacts. Each component (column stats, EBM model, EBM graphs, UMAP, etc.) writes its own outputs and registers them in a single manifest.json. The pieces that the UI most directly consumes:
| Artifact | Used by | Description |
|---|---|---|
| column_stats.json | Overview tab, Summand | Per-column means, distributions, missingness, cardinality. |
| feature_metadata.json | Features tab, Summand | EBM feature importance, pairwise interactions, types. |
| graphs.json.gz | Features tab | Shape functions and pairwise heatmaps; gzip-compressed because they’re large. |
| umap_embedding.json | Insights tab, Summand | 2-D UMAP projection of rows in feature space. |
| model.pkl | Internal | The trained EBM, used to score new rows. |
| manifest.json | Internal | Index of every component’s outputs and their content hashes. |
Insight results are written to the summand-task-outputs DynamoDB table, not to S3, so the Insights tab reads them from there. Predictions for new rows are produced on demand by reloading the model; no per-row predictions Parquet file is stored.
All S3 artifacts are stored under a versioned path (s3://summand-artifacts/curated/{connectorId}/{datasetId}/{version}/) so older versions remain readable.
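
As a rough illustration of the versioned layout, the sketch below builds an artifact URI from the path template above and decodes a JSON artifact, gunzipping it when the name ends in .gz. The helper names `artifact_uri` and `decode_artifact` are hypothetical, and treating `{version}` as an integer is an assumption; fetching the bytes from S3 is deliberately left out.

```python
import gzip
import json

# Prefix copied from the path template in the text.
ARTIFACT_ROOT = "s3://summand-artifacts/curated"


def artifact_uri(connector_id: str, dataset_id: str, version: int, name: str) -> str:
    """Build the versioned URI for one artifact (hypothetical helper)."""
    return f"{ARTIFACT_ROOT}/{connector_id}/{dataset_id}/{version}/{name}"


def decode_artifact(name: str, raw: bytes) -> object:
    """Parse a JSON artifact, gunzipping first when the name ends in .gz."""
    if name.endswith(".gz"):
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))
```

Because the version is part of the path, reading an older version only means changing one path segment; nothing is overwritten in place.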
Why a semantic layer
Every part of the UI, as well as the Summand assistant, needs to answer questions like “What’s the mean of revenue by segment?” or “Which features matter most?” without re-querying the source. The semantic layer is the cached, structured answer to those questions.
Concretely:
- Summand gets grounding facts from the semantic layer instead of hallucinating numbers.
- Charts suggested by Summand reference real columns and aggregations from the layer.
- Refresh comparisons become straightforward — diff version N against version N–1.
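
The refresh comparison in the last bullet can be sketched as a plain dictionary diff. This is a minimal sketch, assuming each column_stats.json payload maps a column name to a dict with an optional "mean" key; that shape and the function name `diff_column_means` are illustrative assumptions, not the documented schema.

```python
def diff_column_means(prev: dict, curr: dict) -> dict:
    """Compare two column_stats payloads (version N-1 vs. version N).

    Assumed shape (hypothetical): {"<column>": {"mean": <float>, ...}, ...}
    """
    changes = {}
    for column in sorted(set(prev) | set(curr)):
        if column not in prev:
            changes[column] = {"status": "added"}
        elif column not in curr:
            changes[column] = {"status": "removed"}
        else:
            old = prev[column].get("mean")
            new = curr[column].get("mean")
            # Only report numeric columns whose mean actually moved.
            if old is not None and new is not None and old != new:
                changes[column] = {"status": "changed", "mean_delta": new - old}
    return changes
```

Columns without a mean (for example, categorical ones) are skipped in this sketch unless they were added or removed between versions.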
Versioning model
A version is identified by the pair (datasetId, version_number). Versions are append-only and immutable: once written, an artifact stays put. Updating the dataset (re-upload, refresh, target change) creates a new version; it does not mutate the previous one.
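
The append-only behaviour can be modelled with a small in-memory store. This is a sketch under stated assumptions, not the actual storage code: the `VersionStore` class is hypothetical, and numbering versions from 1 is an assumption.

```python
from types import MappingProxyType
from typing import Mapping, Optional


class VersionStore:
    """In-memory stand-in for an append-only version history (hypothetical)."""

    def __init__(self) -> None:
        # Keyed by (dataset_id, version_number).
        self._versions: dict = {}

    def latest_number(self, dataset_id: str) -> int:
        """Highest version number written for the dataset, or 0 if none."""
        numbers = [n for (d, n) in self._versions if d == dataset_id]
        return max(numbers, default=0)

    def append(self, dataset_id: str, artifacts: Mapping[str, bytes]) -> int:
        """Write a new version; existing versions are never touched."""
        number = self.latest_number(dataset_id) + 1
        # A read-only view over a private copy keeps the stored version immutable.
        self._versions[(dataset_id, number)] = MappingProxyType(dict(artifacts))
        return number

    def get(self, dataset_id: str, number: Optional[int] = None) -> Mapping[str, bytes]:
        """Return a specific version, or the latest one when number is None."""
        if number is None:
            number = self.latest_number(dataset_id)
        return self._versions[(dataset_id, number)]
```

A refresh calls `append` and receives a new version number; attempting to assign into a mapping returned by `get` raises `TypeError`, which mirrors the rule that a written version is never mutated.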
Time travel
The Overview tab shows the latest version by default. Power users on Enterprise plans can pin an older version for comparison, which is useful for tracking how a model’s understanding of the data has shifted over months.
Storage and retention
Semantic-layer versions count toward dataset retention, not toward tier-based dataset caps. Pro and Enterprise retain all versions for the life of the subscription. Free tier retains only the latest two versions per dataset.
Relationship to data warehousing
The semantic layer is not a metrics store like Cube or LookML. It’s narrower:
- It describes a single dataset, not a curated cross-source business model.
- It’s generated, not authored: you don’t write .yml to maintain it.
- It exists to power Summand’s analysis surface; it doesn’t expose itself as a SQL endpoint.