Datasets

Where a connector points at a source, a dataset is the unit Summand actually works with. One dataset = one table = one curated Parquet copy = one semantic-layer history.

What’s in a dataset

Field	Purpose
`datasetId`	Stable identifier (`ds_…`). Survives renames.
`connectorId`	Parent connector.
`tableName`	Source-side name (e.g. `public.orders`).
`displayName`	Human-readable name in the UI.
`glueDb` / `glueTable`	Glue catalog references — what views and Athena queries resolve to.
`enabled`	Whether the dataset participates in experiments. Disabled datasets are inert but preserved.
`status`	`pending`, `ready`, or `failed`.
`datasetContext`	Free-text description that grounds Summand’s answers and component prompts.

There’s no `targetColumn` field on the dataset itself. Predictive components (like the predictor) take the target column as a typed input when you configure an experiment; see Experiments.
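As a rough illustration, the fields in the table above could be modeled as a small record. This is a sketch, not Summand's actual schema or API: the field names and the three status values come from the table, while the Python types, the defaults, the `rename` helper, and the sample identifiers are assumptions made for the example.

```python
from dataclasses import dataclass, replace
from enum import Enum


class DatasetStatus(Enum):
    # The three status values listed in the field table.
    PENDING = "pending"
    READY = "ready"
    FAILED = "failed"


@dataclass(frozen=True)
class Dataset:
    # Field names mirror the table; the types and defaults are assumptions.
    datasetId: str        # stable identifier, survives renames
    connectorId: str      # parent connector
    tableName: str        # source-side name, e.g. "public.orders"
    displayName: str      # human-readable name in the UI
    glueDb: str
    glueTable: str
    enabled: bool = True
    status: DatasetStatus = DatasetStatus.PENDING
    datasetContext: str = ""
    # Note: no targetColumn here; it is an experiment-level input.


def rename(ds: Dataset, new_display_name: str) -> Dataset:
    """Hypothetical helper: change the display name, keep the identifier."""
    return replace(ds, displayName=new_display_name)


# Hypothetical values, for illustration only.
orders = Dataset(
    datasetId="ds_example",
    connectorId="conn_example",
    tableName="public.orders",
    displayName="Orders",
    glueDb="example_db",
    glueTable="orders",
)
renamed = rename(orders, "Orders (EU)")
```

After the rename, `renamed.datasetId` is still `"ds_example"`, which is the property the table describes as surviving renames.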

What you do with a dataset

Three primary actions, in increasing order of commitment:

Chat — open Summand and ask questions. The dataset’s curated Parquet, schema, column stats, and context are all available to the assistant. It can run one-shot SQL, generate charts, and propose views or experiments.
Create a view over it — save a SQL transformation, optionally joining other datasets, queryable by chat and usable as an experiment source.
Set up an experiment — schedule one or more components to run on the dataset (or a view of it) on a cron. Outputs are versioned.

The dataset detail page

Datasets have a configuration page with nine tabs. They’re all about tuning behavior, not browsing analysis output (analysis output lives in chat, the Surprises page, and component-output views):

Tab	What you do here
Overview	Last sync, freshness, summary stats.
Schedule	Sync frequency for live connectors; re-upload for CSV datasets.
Schema	Column definitions, types, exclusions.
Components	Which components are assigned to this dataset (and run during experiments).
Context	Free-text dataset context, surfaced to Summand and component prompts.
Access	Sharing and permissions.
Notifications	Alerts for sync failures, drift, freshness lag.
Run history	Past sync and component-run logs.
Advanced	Deletion and danger zone.

Most users never visit anything beyond Overview.

Schema and types

Summand auto-infers types from a sample of the data on first ingest. If the inference is wrong, you can override types and exclude columns on the Schema tab. Changes take effect on the next sync or experiment run.
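To make the interaction between inference, overrides, and exclusions concrete, here is a toy sketch. It is not Summand's inference logic: the heuristic, the type names (`integer`, `double`, `string`), and the functions `infer_type` and `resolve_schema` are all assumptions chosen for illustration.

```python
from __future__ import annotations


def infer_type(values: list[str]) -> str:
    """Guess a column type from sampled string values (toy heuristic)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    # Try the narrowest type first; fall back to string if nothing parses.
    for name, parser in (("integer", int), ("double", float)):
        try:
            for v in non_empty:
                parser(v)
            return name
        except ValueError:
            continue
    return "string"


def resolve_schema(
    sample: dict[str, list[str]],
    overrides: dict[str, str] | None = None,
    excluded: set[str] | None = None,
) -> dict[str, str]:
    """Apply exclusions, then overrides, then fall back to inference."""
    overrides = overrides or {}
    excluded = excluded or set()
    schema = {}
    for column, values in sample.items():
        if column in excluded:
            continue  # excluded columns are dropped from the schema
        schema[column] = overrides.get(column, infer_type(values))
    return schema


# Hypothetical sample: a postal code column looks numeric but is not.
sample = {
    "order_id": ["1", "2"],
    "zip": ["02139", "10001"],
    "amount": ["9.5", "12"],
    "notes": ["gift", ""],
}
schema = resolve_schema(sample, overrides={"zip": "string"}, excluded={"notes"})
```

The `zip` column shows why overrides matter: a sample-based guess treats `02139` as an integer, which would drop the leading zero, so the override pins it to a string.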

Versioning

Each component run produces a new semantic-layer version keyed by `(datasetId, version)`. Versions are immutable and append-only. The latest version is what the UI reads; older versions remain queryable for audit and comparison. A new version is created when:

A connector is refreshed (live source pulls new data)
A CSV is re-uploaded
An experiment runs (manual or scheduled)
A schema or context change triggers re-runs of dependent components
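The append-only behavior described above can be sketched with a small in-memory store. This is an illustration under stated assumptions, not Summand's storage layer: the `VersionStore` class, the integer version numbers starting at 1, and the `trigger` payload field are all hypothetical.

```python
from types import MappingProxyType


class VersionStore:
    """Toy append-only store keyed by (dataset_id, version)."""

    def __init__(self):
        self._versions = {}  # (dataset_id, version) -> read-only payload
        self._latest = {}    # dataset_id -> highest version number so far

    def append(self, dataset_id, payload):
        # Assumption: versions are consecutive integers starting at 1.
        version = self._latest.get(dataset_id, 0) + 1
        # Store a read-only view of a copy so callers cannot mutate it.
        self._versions[(dataset_id, version)] = MappingProxyType(dict(payload))
        self._latest[dataset_id] = version
        return version

    def get(self, dataset_id, version):
        # Older versions stay readable for audit and comparison.
        return self._versions[(dataset_id, version)]

    def latest(self, dataset_id):
        # The latest version is the one a UI would read.
        return self.get(dataset_id, self._latest[dataset_id])


store = VersionStore()
v1 = store.append("ds_example", {"trigger": "csv_reupload"})
v2 = store.append("ds_example", {"trigger": "experiment_run"})
```

There is deliberately no update or delete method: new information always lands as a new version, and attempting to assign into a stored payload raises `TypeError`.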

Limits

The active-dataset cap applies per account, not per connector:

Free: 3 active datasets total
Pro / Education: unlimited
Enterprise: unlimited
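A minimal sketch of how a per-account cap differs from a per-connector one. The plan keys, the `can_activate` and `total_active` helpers, and the connector names are hypothetical. How Summand decides that a dataset counts as active is not specified here, so the counts are simply passed in.

```python
# Caps from the list above; None means unlimited. Plan keys are assumptions.
ACTIVE_DATASET_CAPS = {
    "free": 3,
    "pro": None,
    "education": None,
    "enterprise": None,
}


def total_active(active_by_connector):
    """Sum active datasets across all connectors in one account."""
    return sum(active_by_connector.values())


def can_activate(plan, active_by_connector):
    """Return True if the account may activate one more dataset."""
    cap = ACTIVE_DATASET_CAPS[plan]
    if cap is None:
        return True
    # The cap is checked against the account-wide total, not per connector.
    return total_active(active_by_connector) < cap


# Hypothetical account: two connectors, three active datasets in total.
usage = {"conn_postgres": 2, "conn_csv": 1}
free_ok = can_activate("free", usage)
enterprise_ok = can_activate("enterprise", usage)
```

In this example the Free account is already at its cap even though neither connector holds three datasets on its own.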
