The semantic layer is the unit of analysis output. Every dataset has a history of semantic-layer versions; the UI always reads the latest one. When people on the team say “the analysis,” they mean a semantic-layer version.

What’s in a version

A version is built from per-component artifacts. Each component (column stats, EBM model, EBM graphs, UMAP, etc.) writes its own outputs and registers them in a single manifest.json. The main artifacts and where each is used:
Artifact | Used by | Description
column_stats.json | Overview tab, Summand | Per-column means, distributions, missingness, cardinality.
feature_metadata.json | Features tab, Summand | EBM feature importance, pairwise interactions, types.
graphs.json.gz | Features tab | Shape functions and pairwise heatmaps; gzip-compressed because they're large.
umap_embedding.json | Insights tab, Summand | 2-D UMAP projection of rows in feature space.
model.pkl | Internal | The trained EBM, used to score new rows.
manifest.json | Internal | Index of every component's outputs and their content hashes.
Surprise finding is the exception: it writes its top-N output directly to the summand-task-outputs DynamoDB table rather than to S3, so the Insights tab reads it from there.

Predictions for new rows are produced on demand by re-loading the model; no per-row predictions Parquet file is stored.

All S3 artifacts are stored under a versioned path (s3://summand-artifacts/curated/{connectorId}/{datasetId}/{version}/), so older versions remain readable.
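As a rough illustration of how the versioned path and the manifest's content hashes fit together, here is a minimal Python sketch. The path template comes from the text above; everything else is an assumption made for the example: the manifest shape ({"artifacts": {name: {"sha256": ...}}}), the choice of SHA-256, and the helper names version_prefix, verify_artifacts, and load_json_artifact. The real manifest.json schema is not documented here.

```python
import gzip
import hashlib
import json


def version_prefix(connector_id: str, dataset_id: str, version: int) -> str:
    """Build the versioned S3 prefix from the documented path template."""
    return f"s3://summand-artifacts/curated/{connector_id}/{dataset_id}/{version}/"


def sha256_hex(data: bytes) -> str:
    # Assumption: content hashes are SHA-256 hex digests.
    return hashlib.sha256(data).hexdigest()


def verify_artifacts(manifest: dict, blobs: dict[str, bytes]) -> list[str]:
    """Return the names of artifacts that are missing or fail their hash check.

    Hypothetical manifest shape: {"artifacts": {name: {"sha256": hex_digest}}}
    """
    bad = []
    for name, entry in manifest["artifacts"].items():
        data = blobs.get(name)
        if data is None or sha256_hex(data) != entry["sha256"]:
            bad.append(name)
    return bad


def load_json_artifact(name: str, data: bytes):
    """Decode a JSON artifact, gunzipping first when the name ends in .gz."""
    if name.endswith(".gz"):
        data = gzip.decompress(data)
    return json.loads(data)
```

With the artifact bytes already fetched into a dict keyed by file name, verify_artifacts(manifest, blobs) returns an empty list when every registered artifact matches its hash.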

Why a semantic layer

Every part of the UI — and the Summand assistant — needs to answer questions like “What’s the mean of revenue by segment?” or “Which features matter most?” without re-querying the source. The semantic layer is the cached, structured answer to those questions. Concretely:
  • Summand gets grounding facts from the semantic layer instead of hallucinating numbers.
  • Charts suggested by Summand reference real columns and aggregations from the layer.
  • Refresh comparisons become straightforward — diff version N against version N–1.
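To make the last point concrete, here is a small Python sketch of a version-to-version diff over column statistics. It assumes a simplified column_stats shape of {column: {"mean": value}}, which is a guess for illustration only; the function name diff_column_stats and the 5% relative tolerance are likewise hypothetical.

```python
def diff_column_stats(prev: dict, curr: dict, rel_tol: float = 0.05) -> dict:
    """Compare two column-stats dicts from consecutive versions.

    Assumed shape (hypothetical): {column_name: {"mean": float, ...}}
    Reports added columns, removed columns, and columns whose mean moved
    by more than rel_tol relative to the previous version.
    """
    added = sorted(set(curr) - set(prev))
    removed = sorted(set(prev) - set(curr))
    shifted = {}
    for col in sorted(set(prev) & set(curr)):
        old = prev[col].get("mean")
        new = curr[col].get("mean")
        if old is None or new is None:
            continue  # non-numeric column, nothing to compare
        denom = abs(old) if old != 0 else 1.0  # avoid division by zero
        change = (new - old) / denom
        if abs(change) > rel_tol:
            shifted[col] = {"old": old, "new": new, "rel_change": change}
    return {"added": added, "removed": removed, "shifted": shifted}
```

Because each version is a self-contained snapshot, the diff needs only the two column_stats.json payloads and no access to the source data.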

Versioning model

A version is identified by the pair (datasetId, version_number). Versions are append-only and immutable: once written, an artifact is never modified. Updating the dataset (re-upload, refresh, target change) creates a new version; it does not mutate the previous one.
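The append-only model can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions, not the actual storage code: the class names SemanticLayerVersion and VersionHistory are hypothetical, version numbers are assumed to start at 1 and increase by one, and artifacts are represented as a name-to-hash mapping.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class SemanticLayerVersion:
    """One immutable version, keyed by (dataset_id, version_number)."""
    dataset_id: str
    version_number: int
    artifacts: Mapping[str, str]  # artifact name -> content hash (assumed)


class VersionHistory:
    """Append-only history: new versions are added, old ones never change."""

    def __init__(self) -> None:
        self._versions: dict[str, list[SemanticLayerVersion]] = {}

    def append(self, dataset_id: str, artifacts: dict[str, str]) -> SemanticLayerVersion:
        history = self._versions.setdefault(dataset_id, [])
        version = SemanticLayerVersion(
            dataset_id=dataset_id,
            version_number=len(history) + 1,  # assumption: numbering starts at 1
            artifacts=MappingProxyType(dict(artifacts)),  # read-only copy
        )
        history.append(version)
        return version

    def latest(self, dataset_id: str) -> SemanticLayerVersion:
        return self._versions[dataset_id][-1]

    def get(self, dataset_id: str, version_number: int) -> SemanticLayerVersion:
        if version_number < 1:
            raise KeyError(version_number)
        return self._versions[dataset_id][version_number - 1]
```

A refresh calls append and gets a new version number; latest is what the UI would read by default, while get retrieves any earlier version unchanged.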

Time travel

The Overview tab shows the latest version by default. Power users on Enterprise plans can pin an older version for comparison — useful for tracking how a model’s understanding of the data has shifted over months.

Storage and retention

Semantic-layer versions count toward dataset retention, not toward tier-based dataset caps. Pro and Enterprise retain all versions for the life of the subscription. The Free tier retains only the latest two versions per dataset.
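The retention rule above reduces to a short function. The sketch below is illustrative only: the tier strings "free", "pro", and "enterprise" and the function name versions_to_retain are assumptions, not identifiers from the product.

```python
def versions_to_retain(version_numbers: list[int], tier: str) -> list[int]:
    """Return the version numbers kept for one dataset under the given tier.

    Free keeps the latest two versions; Pro and Enterprise keep everything.
    Tier names are hypothetical labels for this example.
    """
    ordered = sorted(version_numbers)
    if tier == "free":
        return ordered[-2:]
    if tier in ("pro", "enterprise"):
        return ordered
    raise ValueError(f"unknown tier: {tier!r}")
```

For a Free-tier dataset with versions 1 through 4, only versions 3 and 4 would remain; the same dataset on Pro keeps all four.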

Relationship to data warehousing

The semantic layer is not a metrics store like Cube or LookML. It’s narrower:
  • It describes a single dataset, not a curated cross-source business model.
  • It’s generated, not authored — you don’t write .yml to maintain it.
  • It exists to power Summand’s analysis surface; it doesn’t expose itself as a SQL endpoint.
If you need a metrics layer, Summand pairs naturally with one — point both at the same warehouse — but the semantic layer described here is internal to a dataset.