Where a connector points at a source, a dataset is the unit Summand actually works with. One dataset = one table = one curated Parquet copy = one semantic-layer history.
What’s in a dataset
| Field | Purpose |
|---|---|
| datasetId | Stable identifier (ds_…). Survives renames. |
| connectorId | Parent connector. |
| tableName | Source-side name (e.g. public.orders). |
| displayName | Human-readable name in the UI. |
| glueDb / glueTable | Glue catalog references: what views and Athena queries resolve to. |
| enabled | Whether the dataset participates in experiments. Disabled datasets are inert but preserved. |
| status | pending, ready, or failed. |
| datasetContext | Free-text description that grounds Summand’s answers and component prompts. |
There is no targetColumn field on the dataset itself. Predictive components (like the predictor) take the target column as a typed input when you configure an experiment; see Experiments.
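As a rough illustration of the fields above, a dataset record could be modeled like this. This is only a sketch: the Python types, the example values, and the `rename` helper are assumptions made for illustration, not Summand's actual schema or API.

```python
from dataclasses import dataclass, replace
from typing import Literal

# The three status values listed in the field table.
Status = Literal["pending", "ready", "failed"]


@dataclass(frozen=True)
class Dataset:
    datasetId: str        # stable identifier, e.g. "ds_..."
    connectorId: str      # parent connector
    tableName: str        # source-side name, e.g. "public.orders"
    displayName: str      # human-readable name shown in the UI
    glueDb: str           # Glue catalog database
    glueTable: str        # Glue catalog table
    enabled: bool         # whether the dataset participates in experiments
    status: Status
    datasetContext: str = ""  # free-text description
    # No targetColumn here: the target is supplied per experiment.


def rename(dataset: Dataset, new_display_name: str) -> Dataset:
    """Hypothetical helper: change displayName only; datasetId is untouched."""
    return replace(dataset, displayName=new_display_name)


# Example values below are invented for illustration.
orders = Dataset(
    datasetId="ds_example",
    connectorId="conn_example",
    tableName="public.orders",
    displayName="Orders",
    glueDb="example_db",
    glueTable="orders",
    enabled=True,
    status="ready",
)
renamed = rename(orders, "Orders (EU)")
```

The point of the sketch is that a rename only touches `displayName`, so anything keyed on `datasetId` keeps working.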
What you do with a dataset
Three primary actions, in increasing order of commitment:

- Chat: open Summand and ask questions. The dataset’s curated Parquet, schema, column stats, and context are all available to the assistant. It can run one-shot SQL, generate charts, and propose views or experiments.
- Create a view over it: save a SQL transformation, optionally joining other datasets, queryable by chat and usable as an experiment source.
- Set up an experiment: schedule one or more components to run on the dataset (or a view of it) on a cron. Outputs are versioned.
The dataset detail page
Datasets have a configuration page with nine tabs. They’re all about tuning behavior, not browsing analysis output (analysis output lives in chat, the Surprises page, and component-output views):

| Tab | What you do here |
|---|---|
| Overview | Last sync, freshness, summary stats. |
| Schedule | Sync frequency for live connectors; CSV re-upload for CSVs. |
| Schema | Column definitions, types, exclusions. |
| Components | Which components are assigned to this dataset (and run during experiments). |
| Context | Free-text dataset context, surfaced to Summand and component prompts. |
| Access | Sharing and permissions. |
| Notifications | Alerts for sync failures, drift, freshness lag. |
| Run history | Past sync and component-run logs. |
| Advanced | Deletion and danger zone. |
Schema and types
Summand auto-infers types from a sample of the data on first ingest. You can override types and exclude columns from Schema if the inference is wrong. Changes take effect on the next sync or experiment run.

Versioning
Each component run produces a new semantic-layer version keyed by (datasetId, version). Versions are immutable and append-only. The latest version is what the UI reads; older versions remain queryable for audit and comparison.
A new version is created when:
- A connector is refreshed (live source pulls new data)
- A CSV is re-uploaded
- An experiment runs (manual or scheduled)
- Schema or context changes trigger re-runs of dependent components
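The append-only behavior described above can be sketched with a small in-memory log. This is an illustrative model only: the `VersionLog` class, integer version numbers starting at 1, and the `trigger` and `payload` fields are assumptions, not Summand's storage format.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)  # frozen: a stored version cannot be modified
class SemanticVersion:
    datasetId: str
    version: int
    trigger: str   # what caused the version, e.g. "csv_reupload" (hypothetical label)
    payload: str   # stand-in for the semantic-layer content


class VersionLog:
    """Hypothetical append-only store keyed by (datasetId, version)."""

    def __init__(self) -> None:
        self._versions: Dict[Tuple[str, int], SemanticVersion] = {}
        self._latest: Dict[str, int] = {}

    def append(self, dataset_id: str, trigger: str, payload: str) -> SemanticVersion:
        # Never overwrite: each call allocates the next version number.
        next_version = self._latest.get(dataset_id, 0) + 1
        record = SemanticVersion(dataset_id, next_version, trigger, payload)
        self._versions[(dataset_id, next_version)] = record
        self._latest[dataset_id] = next_version
        return record

    def latest(self, dataset_id: str) -> SemanticVersion:
        # What a UI would read by default.
        return self._versions[(dataset_id, self._latest[dataset_id])]

    def get(self, dataset_id: str, version: int) -> SemanticVersion:
        # Older versions stay readable for audit and comparison.
        return self._versions[(dataset_id, version)]


log = VersionLog()
log.append("ds_example", "connector_refresh", "stats v1")
log.append("ds_example", "experiment_run", "stats v2")
```

Reading `log.latest("ds_example")` returns the second record, while `log.get("ds_example", 1)` still returns the first one unchanged.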
Limits
The active-dataset cap applies per account, not per connector:

- Free: 3 active datasets total
- Pro / Education: unlimited
- Enterprise: unlimited
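To make the per-account rule concrete, here is a minimal sketch of how such a cap could be checked. The plan keys, the `can_activate` function, and the per-connector count input are all hypothetical; the only facts taken from this page are the cap values and that the count is summed across the whole account.

```python
from typing import Dict, Optional

# Caps from the list above; None means unlimited.
PLAN_CAPS: Dict[str, Optional[int]] = {
    "free": 3,
    "pro": None,
    "education": None,
    "enterprise": None,
}


def can_activate(plan: str, active_by_connector: Dict[str, int]) -> bool:
    """Hypothetical check: may this account activate one more dataset?

    active_by_connector maps a connector id to its number of active datasets.
    The cap is compared against the account-wide total, not any single connector.
    """
    cap = PLAN_CAPS[plan]
    if cap is None:
        return True
    total_active = sum(active_by_connector.values())
    return total_active < cap


# Two connectors on a Free account: 2 + 1 active datasets reaches the cap of 3.
free_account = {"conn_a": 2, "conn_b": 1}
allowed = can_activate("free", free_account)
```

Here `allowed` is `False` even though neither connector has three datasets on its own, because the total across the account is what counts.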