Components

A component is a self-contained piece of analysis with typed inputs and outputs. Components are the things experiments run; they’re also what the dataset’s Components tab assigns and configures. You don’t write components — they’re built by Summand and exposed through a catalog the in-product editor reads to render forms, validate inputs, and dispatch runs.

The catalog

Nine components are visible in the experiment editor:

Column stats

Per-column distributions, missingness, cardinality, and basic numeric statistics. Runs automatically on first ingest.

Predictors (EBM)

Fits an Explainable Boosting Machine and writes feature importance, shape functions, and pairwise interactions.

Correlation-association matrix

Correlations and other measures of association for every column pair. (Beta)

Forecast

Time-series projection with 95% prediction intervals, using a Holt-Winters model under the hood.

Outlier report

Identifies unusual values and anomalous rows. (Beta)

Feature metadata

LLM-generated, business-friendly descriptions for every dataset column. Powers chat grounding.

UMAP embedding

A 2-D projection of feature space for cluster visualization and “similar rows” queries.

Missing-value patterns

Identifies patterns in the missing data. (Beta)

Distribution profile

Per-column information about the shape of the data distribution. (Beta)

Two other components (ebm_model, forecast_model) exist behind the scenes as dependencies — they hold the heavy fitted state and are consumed by the user-visible ones. You don’t pick them in the editor; they auto-run when their dependent component is selected.
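The auto-run behavior can be pictured as a transitive closure over declared dependencies. Here is a minimal sketch, assuming a hypothetical `DEPENDENCIES` mapping. The edges shown (`ebm_graphs` on `ebm_model`, `forecast` on `forecast_model`) are illustrative guesses, not the real catalog, and `expand_selection` is not a Summand API.

```python
# Hypothetical dependency map: component name -> components that must run first.
# These edges are assumptions for illustration, not the real catalog.
DEPENDENCIES = {
    "column_stats": [],
    "ebm_model": [],
    "ebm_graphs": ["ebm_model"],
    "forecast_model": [],
    "forecast": ["forecast_model"],
}


def expand_selection(selected, dependencies=DEPENDENCIES):
    """Return the selected components plus everything they transitively depend on."""
    resolved = set()
    stack = list(selected)
    while stack:
        name = stack.pop()
        if name in resolved:
            continue
        resolved.add(name)
        stack.extend(dependencies.get(name, []))
    return resolved


print(sorted(expand_selection(["ebm_graphs"])))
# ['ebm_graphs', 'ebm_model']
```

Selecting only the visible component is enough under this model: the hidden one is added to the run without appearing in the editor.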

Anatomy of a component

Every component declares the same metadata:

| Field | Purpose |
| --- | --- |
| `name` | Catalog ID (e.g. `column_stats`, `ebm_graphs`). Stable across versions. |
| `display.label` | Human-readable name shown in the UI. |
| `description` | One-sentence summary. |
| `dependencies` | Other components that must run first. The dispatcher topologically orders them. |
| `inputs` | Typed parameters the user fills in (column references, numeric thresholds, etc.). |
| `compute_profile` | `LambdaProfile` for fast steps; `EcsProfile(tier)` for the heavier ones (EBM fitting, UMAP). |
| `agent_config` | What the Summand chat agent can read and filter. |
| `display.blocks` | How the artifact renders in the UI: tables, charts, key/value pairs. Some components ship with bespoke React viewers instead. |
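To make the fields above concrete, here is a rough sketch of how such a declaration might look as Python dataclasses. Only the field names and the `LambdaProfile` / `EcsProfile(tier)` names come from this page. The class shapes, the `Display` and `ComponentSpec` names, the block names, and the choice of `LambdaProfile` for `column_stats` are all assumptions made for illustration, not the real catalog schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LambdaProfile:
    """Stand-in for the fast-step compute profile (shape is assumed)."""


@dataclass(frozen=True)
class EcsProfile:
    """Stand-in for the heavier compute profile; tier values are hypothetical."""
    tier: str


@dataclass(frozen=True)
class Display:
    label: str
    blocks: tuple = ()  # e.g. ("table", "chart"); block names are assumed


@dataclass(frozen=True)
class ComponentSpec:
    name: str
    display: Display
    description: str
    dependencies: tuple = ()
    inputs: tuple = ()
    compute_profile: object = LambdaProfile()
    agent_config: dict = field(default_factory=dict)


# Illustrative declaration; the values are taken from this page where possible.
column_stats = ComponentSpec(
    name="column_stats",
    display=Display(label="Column stats", blocks=("table",)),
    description="Per-column distributions, missingness, cardinality, and basic numeric statistics.",
    compute_profile=LambdaProfile(),  # assumed to be a fast step
)

print(column_stats.name, column_stats.display.label)
# column_stats Column stats
```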

When you select a component in the experiment editor, the catalog is what tells the form which inputs to render and how to validate them.
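As an illustration of catalog-driven validation, the sketch below checks submitted form values against a list of input declarations. Everything in it is hypothetical: the `kind` names (`column_ref`, `number`), the `min` / `max` keys, and the example inputs `target_column` and `max_interactions` are stand-ins, since this page does not document the actual input schema.

```python
# Hypothetical input declarations for one component, as a form might read them.
INPUT_SPECS = [
    {"name": "target_column", "kind": "column_ref", "required": True},
    {"name": "max_interactions", "kind": "number", "required": False, "min": 0, "max": 50},
]


def validate_inputs(specs, values, dataset_columns):
    """Return a list of error strings; an empty list means the values are valid."""
    errors = []
    for spec in specs:
        name = spec["name"]
        if values.get(name) is None:
            if spec.get("required", False):
                errors.append(f"{name}: required")
            continue
        value = values[name]
        if spec["kind"] == "column_ref":
            # A column reference must name a column that exists in the dataset.
            if value not in dataset_columns:
                errors.append(f"{name}: unknown column {value!r}")
        elif spec["kind"] == "number":
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                errors.append(f"{name}: expected a number")
            elif "min" in spec and value < spec["min"]:
                errors.append(f"{name}: must be >= {spec['min']}")
            elif "max" in spec and value > spec["max"]:
                errors.append(f"{name}: must be <= {spec['max']}")
    return errors


columns = {"price", "region", "churned"}
print(validate_inputs(INPUT_SPECS, {"target_column": "churned"}, columns))
# []
print(validate_inputs(INPUT_SPECS, {"target_column": "revenue", "max_interactions": 99}, columns))
# ["target_column: unknown column 'revenue'", 'max_interactions: must be <= 50']
```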

How components run

Experiment fires (manual / scheduled)
   │
   ▼
trigger-semantic-layer (Lambda)
   │
   ▼
Step Functions dispatcher
   │
   ├─ ResolveComponentPlan (topological sort → waves)
   │
   ├─ For each wave (in order):
   │     └─ Distributed Map fan-out (parallel within wave)
   │           ├─ Lambda components → component-runner Lambda
   │           └─ Fargate components → ECS task
   │
   ├─ WriteManifest (consolidates results → manifest.json)
   │
   └─ UpdateSemanticLayerRecord (marks version completed)

Outputs land in:
   - S3: semantic-layers/{datasetId}/versions/{N}/{component}.json
   - DynamoDB summand-task-outputs (per-run output items)

Components within a wave run in parallel; waves run sequentially because of dependencies.
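The "topological sort → waves" step can be sketched with Python's standard `graphlib` module. This is an illustration of the technique only, not the dispatcher's actual code: `plan_waves` is a made-up name, and the dependency edges in the example are assumptions.

```python
from graphlib import TopologicalSorter


def plan_waves(dependencies):
    """Group components into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    waves = []
    while sorter.is_active():
        # Everything returned here has all of its dependencies already done,
        # so these components can run in parallel.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical dependency map: component -> components that must run first.
deps = {
    "column_stats": [],
    "ebm_model": [],
    "ebm_graphs": ["ebm_model"],
    "forecast_model": [],
    "forecast": ["forecast_model"],
}

for index, wave in enumerate(plan_waves(deps), start=1):
    print(index, wave)
# 1 ['column_stats', 'ebm_model', 'forecast_model']
# 2 ['ebm_graphs', 'forecast']
```

Marking a whole wave as done before asking for the next one is what produces the sequential-waves, parallel-within-wave shape described above.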

Reading component outputs

Outputs surface in three places:

Dataset detail → Components tab. Per-component status plus the latest artifact, rendered with either the component’s declared blocks or a bespoke React viewer (for EBM, UMAP, and Feature metadata).
Summand chat. The analyze tool resolves the latest version of any component and returns its data inline. Filtering by column or feature name is supported through each component’s agent_config.
Downstream views. Components write to S3 paths and DynamoDB items you can join against in custom SQL views.
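If you work with the S3 artifacts directly, the path layout shown under "How components run" can be built and parsed with a couple of small helpers. This is a sketch under assumptions: the helper names are made up, the version is assumed to be an integer, the dataset ID `ds_123` is a placeholder, and the bucket name is left out because this page does not give it.

```python
def artifact_key(dataset_id, version, component):
    """Build the S3 key for one component artifact, following the documented layout."""
    return f"semantic-layers/{dataset_id}/versions/{version}/{component}.json"


def parse_artifact_key(key):
    """Split an artifact key back into its parts; raises ValueError on other shapes."""
    parts = key.split("/")
    well_formed = (
        len(parts) == 5
        and parts[0] == "semantic-layers"
        and parts[2] == "versions"
        and parts[4].endswith(".json")
    )
    if not well_formed:
        raise ValueError(f"not a component artifact key: {key!r}")
    return {
        "dataset_id": parts[1],
        "version": int(parts[3]),  # assumes versions are integers
        "component": parts[4][: -len(".json")],
    }


key = artifact_key("ds_123", 7, "column_stats")
print(key)
# semantic-layers/ds_123/versions/7/column_stats.json
print(parse_artifact_key(key))
# {'dataset_id': 'ds_123', 'version': 7, 'component': 'column_stats'}
```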

What’s not a component

A few analyses run outside the component catalog:

Surprise finding has its own Step Functions pipeline that runs alongside the semantic-layer dispatcher. It’s not selectable in the experiment editor today — surprises surface through their own page in the product.
Sync / curation (the read-from-source-and-write-to-Parquet step) is part of the connector pipeline, not the component catalog.
Chat itself isn’t a component — it’s a separate Lambda (claude-stream) that consumes component outputs via the analyze tool.

Catalog stability

The catalog is small and slow-moving. Most recent commits are bug fixes (cache improvements, edge-case handling) rather than new components. If you have a need for analysis the catalog doesn’t cover, custom components are on the roadmap for Enterprise — email enterprise@summand.com and tell us what you’d build.

Per-component details

Column stats — what runs by default on every dataset.
Predictors (EBM) — interpretable model fitting, with shape functions and pairwise interactions.
Correlation-association matrix — measures of association between every column pair.
Forecast — time-series projection with prediction intervals.
Feature metadata — LLM-generated column descriptions.
UMAP embedding — 2-D projection for clustering and similarity.
