A component is a self-contained piece of analysis with typed inputs and outputs. Experiments run components, and the dataset’s Components tab is where they are assigned and configured. You don’t write components yourself: Summand builds them and exposes them through a catalog that the in-product editor reads to render forms, validate inputs, and dispatch runs.

The catalog

Five components are visible in the experiment editor:

Column stats

Per-column distributions, missingness, cardinality, and basic numeric statistics. Runs automatically on first ingest.

Predictors (EBM)

Fits an Explainable Boosting Machine and writes feature importance, shape functions, and pairwise interactions.

Forecast

Time-series projection with 95% prediction intervals, using a Holt-Winters model under the hood.

Feature metadata

LLM-generated, business-friendly descriptions for every dataset column. Powers chat grounding.

UMAP embedding

A 2-D projection of feature space for cluster visualization and “similar rows” queries.

Two other components (ebm_model, forecast_model) exist behind the scenes as dependencies. They hold the heavy fitted state and are consumed by the user-visible components. You don’t pick them in the editor; they run automatically when a component that depends on them is selected.
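
The auto-run behavior amounts to expanding a selection with everything it depends on. The sketch below is a minimal illustration, not Summand's implementation: the dependency edges and the `forecast` catalog ID are assumptions made for the example, and `expand_selection` is a hypothetical helper.

```python
# Hypothetical dependency map. The text names ebm_model and forecast_model as
# hidden dependencies; the exact edges and the "forecast" ID are assumed here.
DEPENDENCIES = {
    "column_stats": [],
    "ebm_model": [],
    "forecast_model": [],
    "ebm_graphs": ["ebm_model"],
    "forecast": ["forecast_model"],
}


def expand_selection(selected, dependencies=DEPENDENCIES):
    """Return the selected components plus everything they depend on."""
    to_run = set()
    stack = list(selected)
    while stack:
        name = stack.pop()
        if name in to_run:
            continue  # already scheduled, skip to avoid repeated work
        to_run.add(name)
        stack.extend(dependencies.get(name, []))
    return to_run


# Selecting only the user-visible component pulls in its hidden dependency.
chosen = expand_selection(["ebm_graphs"])
```

Walking the map with an explicit stack also picks up transitive dependencies, if a hidden component ever depended on another one.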

Anatomy of a component

Every component declares the same metadata:

| Field | Purpose |
| --- | --- |
| name | Catalog ID (e.g. column_stats, ebm_graphs). Stable across versions. |
| display.label | Human-readable name shown in the UI. |
| description | One-sentence summary. |
| dependencies | Other components that must run first. The dispatcher topologically orders them. |
| inputs | Typed parameters the user fills in (column references, numeric thresholds, etc.). |
| compute_profile | LambdaProfile for fast steps; EcsProfile(tier) for the heavier ones (EBM fitting, UMAP). |
| agent_config | What the Summand chat agent can read and filter. |
| display.blocks | How the artifact renders in the UI — tables, charts, key/value pairs. Some components ship with bespoke React viewers instead. |

When you select a component in the experiment editor, the catalog is what tells the form which inputs to render and how to validate them.
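
To make the shape of a catalog entry concrete, here is a rough Python sketch of the fields listed above. The class layout, the field types, the `"table"` block name, and the choice of `LambdaProfile` for column_stats are all assumptions for illustration; only the field names, the `LambdaProfile` / `EcsProfile(tier)` names, and the `column_stats` ID come from this page.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass(frozen=True)
class LambdaProfile:
    """Marker for fast steps (assumed to carry no parameters)."""


@dataclass(frozen=True)
class EcsProfile:
    """Marker for heavier steps; the tier values are assumed."""
    tier: str


@dataclass
class Display:
    label: str                                   # display.label
    blocks: list = field(default_factory=list)   # display.blocks


@dataclass
class ComponentSpec:
    name: str                                    # catalog ID, stable across versions
    display: Display
    description: str
    dependencies: list = field(default_factory=list)
    inputs: dict = field(default_factory=dict)
    compute_profile: Union[LambdaProfile, EcsProfile] = field(default_factory=LambdaProfile)
    agent_config: dict = field(default_factory=dict)


# Hypothetical entry for the column stats component.
column_stats = ComponentSpec(
    name="column_stats",
    display=Display(label="Column stats", blocks=["table"]),
    description="Per-column distributions, missingness, cardinality, and basic numeric statistics.",
)
```

A form renderer could then walk `inputs` to decide which fields to show, which is the role the catalog plays for the experiment editor.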

How components run

Experiment fires (manual / scheduled)
        │
        ▼
trigger-semantic-layer (Lambda)
        │
        ▼
Step Functions dispatcher
   │
   ├─ ResolveComponentPlan (topological sort → waves)
   │
   ├─ For each wave (in order):
   │     └─ Distributed Map fan-out (parallel within wave)
   │           ├─ Lambda components → component-runner Lambda
   │           └─ Fargate components → ECS task
   │
   ├─ WriteManifest (consolidates results → manifest.json)
   │
   └─ UpdateSemanticLayerRecord (marks version completed)

Outputs land in:
   - S3: semantic-layers/{datasetId}/versions/{N}/{component}.json
   - DynamoDB summand-task-outputs (per-run output items)

Components within a wave run in parallel; waves run sequentially because of dependencies.
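
The "topological sort → waves" step can be sketched with Python's standard `graphlib` module. This is an illustration of the idea, not the dispatcher's actual code: `resolve_waves` is a hypothetical name, and the plan below uses assumed dependency edges and an assumed `forecast` catalog ID.

```python
from graphlib import TopologicalSorter


def resolve_waves(dependencies):
    """Group components into waves.

    `dependencies` maps each component to the components it needs first.
    Every wave contains only components whose dependencies finished in an
    earlier wave, so members of one wave can run in parallel.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the plan has a cycle
    waves = []
    while sorter.is_active():
        ready = sorter.get_ready()   # everything runnable right now
        waves.append(sorted(ready))  # sorted only to make the output stable
        sorter.done(*ready)          # mark the wave finished, unblocking the next
    return waves


# Hypothetical plan: edges and the "forecast" ID are assumptions.
plan = {
    "column_stats": [],
    "ebm_model": [],
    "ebm_graphs": ["ebm_model"],
    "forecast_model": [],
    "forecast": ["forecast_model"],
}
waves = resolve_waves(plan)
# waves[0] holds the components with no dependencies; waves[1] holds the rest.
```

With this plan the model components and column_stats land in the first wave, and the components that consume the fitted models land in the second.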

Reading component outputs

Three places where outputs surface:
  • Dataset detail → Components tab. Per-component status plus the latest artifact, rendered with either the component’s declared blocks or a bespoke React viewer (for EBM, UMAP, and Feature metadata).
  • Summand chat. The analyze tool resolves the latest version of any component and returns its data inline. Filtering by column or feature name is supported through each component’s agent_config.
  • Downstream views. Components write to S3 paths and DynamoDB items you can join against in custom SQL views.
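
If you need to locate an artifact from a downstream job, the S3 layout shown earlier (semantic-layers/{datasetId}/versions/{N}/{component}.json) can be turned into a small helper. This is a sketch under assumptions: `artifact_key` is a hypothetical function, the version is assumed to be an integer, and the bucket name is not given on this page, so it is left out.

```python
def artifact_key(dataset_id: str, version: int, component: str) -> str:
    """Build the S3 object key for one component's output.

    Follows the documented layout:
    semantic-layers/{datasetId}/versions/{N}/{component}.json
    """
    if not dataset_id or not component:
        raise ValueError("dataset_id and component must be non-empty")
    return f"semantic-layers/{dataset_id}/versions/{version}/{component}.json"


# "ds_123" is a made-up dataset ID used only for illustration.
key = artifact_key("ds_123", 4, "column_stats")
```

The resulting key can be passed to whichever S3 client you already use to fetch the JSON artifact.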

What’s not a component

A few analyses run outside the component catalog:
  • Surprise finding has its own Step Functions pipeline that runs alongside the semantic-layer dispatcher. It’s not selectable in the experiment editor today — surprises surface through their own page in the product.
  • Sync / curation (the read-from-source-and-write-to-Parquet step) is part of the connector pipeline, not the component catalog.
  • Chat itself isn’t a component — it’s a separate Lambda (claude-stream) that consumes component outputs via the analyze tool.

Catalog stability

The catalog is small and slow-moving. Most recent commits are bug fixes (cache improvements, edge-case handling) rather than new components. If you have a need for analysis the catalog doesn’t cover, custom components are on the roadmap for Enterprise — email enterprise@summand.com and tell us what you’d build.

Per-component details

  • Column stats — what runs by default on every dataset.
  • Predictors (EBM) — interpretable model fitting, with shape functions and pairwise interactions.
  • Forecast — time-series projection with prediction intervals.
  • Feature metadata — LLM-generated column descriptions.
  • UMAP embedding — 2-D projection for clustering and similarity.