feature_metadata produces one short, human-readable description per column, plus a dataset-level summary. The descriptions are generated by Claude using column statistics and sample values as context. They show up in the dataset’s Configuration sidebar, in column tooltips, and, most importantly, as the grounding the chat agent reads when answering questions about your data.

This component runs once per dataset version. Re-running it regenerates the descriptions only if the schema or context has changed.
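To make the re-run rule concrete, here is a minimal sketch of one way a "has the schema or context changed?" check could work. It is an illustration only: the source does not describe how Summand detects changes, and `ColumnSchema`, `fingerprint`, and `needsRegeneration` are hypothetical names.

```typescript
// Hypothetical shape: the source does not document how change detection works.
interface ColumnSchema {
  name: string;
  dtype: string;
}

// Build an order-independent fingerprint from the schema plus the context string.
function fingerprint(columns: ColumnSchema[], context: string): string {
  const cols = columns
    .map((c) => `${c.name}:${c.dtype}`)
    .sort()
    .join("|");
  return `${cols}#${context.trim()}`;
}

// Regenerate only when the fingerprint differs from the one stored by the last run.
function needsRegeneration(previous: string | undefined, current: string): boolean {
  return previous !== current;
}
```

Under this assumption, reordering columns does not trigger regeneration, while renaming a column or changing its dtype does.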

Why it exists

Most dataset columns have names like ord_amt_usd_ttl or mau_d30 — meaningful to whoever set up the warehouse, opaque to anyone else. Without semantic context:
  • Summand has to guess what a column represents from the name alone.
  • Chart suggestions and view-builder field labels read like database internals.
  • New teammates can’t navigate the dataset without a subject-matter expert walking them through it.
feature_metadata is the bridge. The Anthropic API call sees each column’s name, dtype, sample values, and column stats, and returns a short plain-English description per column plus a dataset-level summary.

Inputs

None. The component reads the curated Parquet sample and the column-stats artifact.

Output shape

{
  "context": "E-commerce orders for an enterprise SaaS company. Each row is an order placed by a customer, with payment, geography, and post-checkout fulfilment status.",
  "features": {
    "order_id": "Unique identifier for the order. Always populated.",
    "customer_id": "Foreign key to the customers table. Always populated.",
    "ord_amt_usd_ttl": "Total order amount in USD, including tax and shipping. Range $0–$84k, mean ~$430.",
    "ship_country": "ISO 3166-1 alpha-2 country code where the order shipped. 78% US.",
    "checkout_at": "Timestamp the customer completed checkout, in UTC."
  }
}
context is the dataset-level summary; features is keyed by column name. The artifact is feature_metadata.json.
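If you consume feature_metadata.json from your own code, the shape above can be captured as a type with a runtime check. This is a sketch based only on the sample shown here; `FeatureMetadata` and `isFeatureMetadata` are hypothetical names, not part of any Summand SDK.

```typescript
// Shape of feature_metadata.json as shown in the sample above.
interface FeatureMetadata {
  context: string;                  // dataset-level summary
  features: Record<string, string>; // column name -> description
}

// Runtime check for parsed JSON: true only if both fields have the expected types.
function isFeatureMetadata(value: unknown): value is FeatureMetadata {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  if (typeof v.context !== "string") return false;
  const features = v.features;
  if (typeof features !== "object" || features === null || Array.isArray(features)) {
    return false;
  }
  const f = features as Record<string, unknown>;
  return Object.keys(f).every((col) => typeof f[col] === "string");
}

// Usage: validate before trusting the parsed artifact.
const raw =
  '{"context": "E-commerce orders.", "features": {"order_id": "Unique identifier for the order."}}';
const parsed: unknown = JSON.parse(raw);
const isValid = isFeatureMetadata(parsed);
```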

Display

The Feature metadata component ships with a bespoke React viewer (FeatureMetadataView):
  • Dataset context rendered at the top.
  • Per-column descriptions in a sortable table alongside the column’s dtype and missingness.
  • Inline editing — you can override any description; overrides take precedence on re-run.

Where the descriptions show up

  • Dataset detail → Components tab — viewable like any other component.
  • Configuration sidebar — column tooltips show the description.
  • View builder — the field picker shows the description as a hint under each column name.
  • Summand chat — the agent reads the dataset context and per-column descriptions on every chat turn, grounding answers in business meaning rather than column-name guesses.
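As an illustration of how the dataset context and per-column descriptions could be turned into grounding text for a chat turn, here is a minimal sketch. The actual prompt format Summand uses is not documented here; `renderGrounding` and the layout of its output are assumptions.

```typescript
interface FeatureMetadata {
  context: string;
  features: Record<string, string>;
}

// Hypothetical helper: flatten the artifact into a plain-text block
// that could be prepended to the agent's prompt.
function renderGrounding(meta: FeatureMetadata): string {
  const columnLines = Object.keys(meta.features).map(
    (col) => `- ${col}: ${meta.features[col]}`
  );
  return ["Dataset context: " + meta.context, "Columns:"]
    .concat(columnLines)
    .join("\n");
}

const grounding = renderGrounding({
  context: "Orders.",
  features: {
    order_id: "Unique identifier.",
    checkout_at: "Checkout time in UTC.",
  },
});
```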

Filtering from chat

Summand can ask for a specific column’s metadata:
analyze({ component: "feature_metadata", target: ..., params: { column_name: "ord_amt_usd_ttl" } })
This returns just that column’s description plus the dataset-level context.
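The filtering behaviour can be sketched as a lookup against the artifact. The response shape below (`ColumnMetadata`) and the `null` result for an unknown column are assumptions made for illustration; the source only says that the column's description and the dataset-level context are returned.

```typescript
interface FeatureMetadata {
  context: string;
  features: Record<string, string>;
}

// Hypothetical response shape for a single-column request.
interface ColumnMetadata {
  context: string;
  column_name: string;
  description: string | null; // null when the column is not in the artifact (assumption)
}

function filterByColumn(meta: FeatureMetadata, columnName: string): ColumnMetadata {
  const description = meta.features[columnName];
  return {
    context: meta.context,
    column_name: columnName,
    description: typeof description === "string" ? description : null,
  };
}

const sampleMeta: FeatureMetadata = {
  context: "E-commerce orders.",
  features: { ord_amt_usd_ttl: "Total order amount in USD, including tax and shipping." },
};
const hit = filterByColumn(sampleMeta, "ord_amt_usd_ttl");
const miss = filterByColumn(sampleMeta, "not_a_column");
```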

Compute profile

Profile   Memory   Timeout
Lambda    2 GB     600 s
The timeout is long because the component calls the Anthropic API to generate the descriptions. A run typically still finishes in under a minute end-to-end; the extra headroom absorbs occasional API slowness.

Privacy

Column descriptions are generated by sending the Anthropic API a small payload: column names, dtypes, basic stats from column_stats, and a handful of sample values. The full dataset is not sent. Anthropic is a listed subprocessor — see Compliance for the data-handling agreement. If your organization has policies against sending column samples to external APIs, contact enterprise@summand.com — Enterprise customers can scope LLM-using components per-org.
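To show what "a small payload" might look like in practice, here is a sketch of a per-column payload builder that caps the number of sample values. The field names, the `stats` layout, and the cap of 5 are all assumptions; the source states only that column names, dtypes, basic stats, and a handful of sample values are sent.

```typescript
// Hypothetical payload shape; the real request format is not documented here.
interface ColumnPayload {
  name: string;
  dtype: string;
  stats: Record<string, number>; // e.g. basic stats taken from column_stats
  sample_values: unknown[];
}

// Assumption: "a handful" is interpreted as at most 5 values.
const MAX_SAMPLE_VALUES = 5;

function buildColumnPayload(
  name: string,
  dtype: string,
  stats: Record<string, number>,
  values: unknown[]
): ColumnPayload {
  return {
    name,
    dtype,
    stats,
    // Only the first few values are included, never the full column.
    sample_values: values.slice(0, MAX_SAMPLE_VALUES),
  };
}

const manyValues: unknown[] = [];
for (let i = 0; i < 100; i++) manyValues.push(i * 10);
const payload = buildColumnPayload("ord_amt_usd_ttl", "float64", { mean: 430 }, manyValues);
```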

Override and refresh

Dataset owners can override any description from the Components tab — overrides persist across re-runs. To regenerate descriptions (e.g. after major schema changes), re-run the component manually from the Components tab or include it in a scheduled experiment.
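The "overrides persist across re-runs" rule amounts to a merge in which an owner's override wins over a freshly generated description. A minimal sketch follows, assuming overrides are stored as a column-name-to-text map; `applyOverrides` is a hypothetical name, and dropping overrides for columns that no longer exist is an assumption, not documented behaviour.

```typescript
// Merge freshly generated descriptions with owner overrides.
// An override, when present, takes precedence over the generated text.
function applyOverrides(
  generated: Record<string, string>,
  overrides: Record<string, string>
): Record<string, string> {
  const merged: Record<string, string> = {};
  Object.keys(generated).forEach((col) => {
    const override = overrides[col];
    const base = generated[col];
    if (typeof override === "string") {
      merged[col] = override;
    } else if (typeof base === "string") {
      merged[col] = base;
    }
  });
  // Assumption: overrides for columns absent from the new schema are ignored.
  return merged;
}

const mergedDescriptions = applyOverrides(
  { order_id: "Unique identifier for the order.", ship_country: "Country code." },
  { ship_country: "Destination country (ISO 3166-1 alpha-2).", old_col: "Stale override." }
);
```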