Where a connector points at a source, a dataset is the unit Summand actually works with. One dataset = one table = one curated Parquet copy = one semantic-layer history.

What’s in a dataset

| Field | Purpose |
| --- | --- |
| datasetId | Stable identifier (ds_…). Survives renames. |
| connectorId | Parent connector. |
| tableName | Source-side name (e.g. public.orders). |
| displayName | Human-readable name in the UI. |
| glueDb / glueTable | Glue catalog references: what views and Athena queries resolve to. |
| enabled | Whether the dataset participates in experiments. Disabled datasets are inert but preserved. |
| status | pending, ready, or failed. |
| datasetContext | Free-text description that grounds Summand’s answers and component prompts. |

There’s no targetColumn field on the dataset itself. Predictive components (like the predictor) take the target column as a typed input when you configure an experiment; see Experiments.
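
As a rough illustration, the fields above could be modeled as a typed record. This is a sketch, not Summand's actual schema: the field names come from the table, but the Python types, the example values, and the `is_runnable` helper (including its assumption that only `ready` datasets can feed an experiment) are hypothetical.

```python
from typing import Literal, TypedDict

# Status values are taken from the table; the Literal type is an assumption.
DatasetStatus = Literal["pending", "ready", "failed"]


class Dataset(TypedDict):
    datasetId: str        # stable identifier with a "ds_" prefix; survives renames
    connectorId: str      # parent connector
    tableName: str        # source-side name, e.g. "public.orders"
    displayName: str      # human-readable name shown in the UI
    glueDb: str           # Glue catalog database
    glueTable: str        # Glue catalog table
    enabled: bool         # whether the dataset participates in experiments
    status: DatasetStatus
    datasetContext: str   # free-text description
    # Note: no targetColumn here; it is an experiment-level input.


def is_runnable(ds: Dataset) -> bool:
    """Hypothetical rule: the dataset must be enabled and in the 'ready' state."""
    return ds["enabled"] and ds["status"] == "ready"


# Illustrative values only; none of these identifiers are real.
example: Dataset = {
    "datasetId": "ds_example",
    "connectorId": "conn_example",
    "tableName": "public.orders",
    "displayName": "Orders",
    "glueDb": "example_db",
    "glueTable": "orders",
    "enabled": True,
    "status": "ready",
    "datasetContext": "One row per customer order.",
}
```

Keeping `targetColumn` out of the record mirrors the point above: the same dataset can back several experiments that each predict a different column.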

What you do with a dataset

Three primary actions, in increasing order of commitment:
  1. Chat — open Summand and ask questions. The dataset’s curated Parquet, schema, column stats, and context are all available to the assistant. It can run one-shot SQL, generate charts, and propose views or experiments.
  2. Create a view over it — save a SQL transformation, optionally joining other datasets, queryable by chat and usable as an experiment source.
  3. Set up an experiment — schedule one or more components to run on the dataset (or a view of it) on a cron. Outputs are versioned.

The dataset detail page

Datasets have a configuration page with nine tabs. They’re all about tuning behavior, not browsing analysis output (analysis output lives in chat, the Surprises page, and component-output views):

| Tab | What you do here |
| --- | --- |
| Overview | Last sync, freshness, summary stats. |
| Schedule | Sync frequency for live connectors; CSV re-upload for CSVs. |
| Schema | Column definitions, types, exclusions. |
| Components | Which components are assigned to this dataset (and run during experiments). |
| Context | Free-text dataset context, surfaced to Summand and component prompts. |
| Access | Sharing and permissions. |
| Notifications | Alerts for sync failures, drift, freshness lag. |
| Run history | Past sync and component-run logs. |
| Advanced | Deletion and danger zone. |

Most users never visit anything beyond Overview.

Schema and types

Summand auto-infers column types from a sample of the data on first ingest. If the inference is wrong, you can override a column's type or exclude the column on the Schema tab. Changes take effect on the next sync or experiment run.
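
To make the interaction between inference, overrides, and exclusions concrete, here is a minimal sketch. Summand's actual inference rules are not documented on this page, so the "narrowest type that parses every sampled value" strategy, the type names, and the function names are all assumptions for illustration.

```python
from datetime import datetime


def infer_type(values: list[str]) -> str:
    """Return the narrowest type that every non-empty sample value parses as."""
    sample = [v for v in values if v != ""]
    if not sample:
        return "string"

    def all_parse(parser) -> bool:
        for v in sample:
            try:
                parser(v)
            except ValueError:
                return False
        return True

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "double"
    if all_parse(datetime.fromisoformat):
        return "timestamp"
    return "string"


def resolve_schema(samples, overrides=None, excluded=None):
    """Inferred types, with manual overrides applied and excluded columns dropped."""
    overrides = overrides or {}
    excluded = excluded or set()
    return {
        col: overrides.get(col, infer_type(vals))
        for col, vals in samples.items()
        if col not in excluded
    }


# Hypothetical sample: ZIP codes look numeric, so inference gets them wrong.
samples = {
    "order_id": ["1", "2", "3"],
    "zip": ["02139", "10001"],
    "amount": ["9.99", "12"],
    "created_at": ["2024-01-05", "2024-01-06"],
    "notes": ["gift wrap", ""],
}
schema = resolve_schema(samples, overrides={"zip": "string"}, excluded={"notes"})
```

The `zip` column shows why overrides matter: the sampled values all parse as integers, but treating them as numbers would drop the leading zero.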

Versioning

Each component run produces a new semantic-layer version keyed by (datasetId, version). Versions are immutable and append-only. The latest version is what the UI reads; older versions remain queryable for audit and comparison.

A new version is created when:
  • A connector is refreshed (live source pulls new data)
  • A CSV is re-uploaded
  • An experiment runs (manual or scheduled)
  • Schema or context changes trigger re-runs of dependent components
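
The append-only, immutable behavior described above can be sketched with a small in-memory store. This illustrates the keying scheme only and is not Summand's storage layer: the class names, the integer version numbers starting at 1, and the `trigger` labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a stored version cannot be mutated afterwards
class SemanticVersion:
    dataset_id: str
    version: int
    trigger: str   # hypothetical labels, e.g. "connector_refresh", "experiment_run"
    summary: str


class VersionStore:
    """Append-only history keyed by (dataset_id, version)."""

    def __init__(self) -> None:
        self._versions: dict[tuple[str, int], SemanticVersion] = {}
        self._latest: dict[str, int] = {}

    def append(self, dataset_id: str, trigger: str, summary: str) -> SemanticVersion:
        # Assumption: versions are consecutive integers per dataset, starting at 1.
        number = self._latest.get(dataset_id, 0) + 1
        record = SemanticVersion(dataset_id, number, trigger, summary)
        self._versions[(dataset_id, number)] = record
        self._latest[dataset_id] = number
        return record

    def get(self, dataset_id: str, version: int) -> SemanticVersion:
        """Older versions stay readable for audit and comparison."""
        return self._versions[(dataset_id, version)]

    def latest(self, dataset_id: str) -> SemanticVersion:
        """The version a UI would read."""
        return self._versions[(dataset_id, self._latest[dataset_id])]


store = VersionStore()
first = store.append("ds_example", "connector_refresh", "initial sync")
second = store.append("ds_example", "experiment_run", "scheduled run")
```

There is deliberately no update or delete method: each of the triggers listed above adds a new record, and earlier ones are left untouched.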

Limits

The active-dataset cap applies per account, not per connector:
  • Free: 3 active datasets total
  • Pro / Education: unlimited
  • Enterprise: unlimited
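
Because the cap is per account, a check has to sum active datasets across all connectors rather than test each connector separately. A minimal sketch, assuming hypothetical plan keys and a caller that already knows the active count per connector:

```python
from typing import Optional

# Caps from the list above; None means unlimited. The plan keys are hypothetical.
ACTIVE_DATASET_CAP: dict[str, Optional[int]] = {
    "free": 3,
    "pro": None,
    "education": None,
    "enterprise": None,
}


def can_activate(plan: str, active_by_connector: dict[str, int]) -> bool:
    """Return True if the account can activate one more dataset."""
    cap = ACTIVE_DATASET_CAP[plan]
    if cap is None:
        return True
    # Per-account cap: count active datasets across every connector.
    total_active = sum(active_by_connector.values())
    return total_active < cap
```

For example, a Free account with two active datasets on one connector and one on another is already at the cap, even though neither connector has three on its own.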