Column stats

column_stats is the baseline component. It runs on first ingest of any dataset (no experiment configuration needed) and powers the Overview tab, dataset summaries shown to Summand, and the schema-level views the rest of the product builds on.

What it computes

For every column in the dataset:

Common stats — name, dtype, count, nullCount, nullPct, uniqueCount, uniquePct.
Numeric columns — min, max, mean, median, std, first and third quartiles, zeroCount, zeroPct.
Categorical / string columns — top 10 distinct values with counts.

Plus dataset-level totals: totalRows, totalColumns.
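The fields listed above can be sketched in pandas. This is a minimal illustration, not the component's actual code. The helper names `column_summary` and `dataset_summary` are hypothetical. It assumes that percentages are taken over total rows and rounded to two decimals, that `std` is the sample standard deviation (the pandas default), and that nulls are excluded from the top-10 list.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def _pct(part: int, whole: int) -> float:
    # Assumption: percentages are relative to total rows, rounded to 2 places.
    return round(100.0 * part / whole, 2) if whole else 0.0


def column_summary(series: pd.Series) -> dict:
    """Summarise one column using the field names documented on this page."""
    total = len(series)
    non_null = series.dropna()
    null_count = total - len(non_null)
    unique_count = int(non_null.nunique())

    stats = {
        "name": str(series.name),
        "dtype": str(series.dtype),
        "count": len(non_null),
        "nullCount": null_count,
        "nullPct": _pct(null_count, total),
        "uniqueCount": unique_count,
        "uniquePct": _pct(unique_count, total),
    }

    if is_numeric_dtype(series) and not is_bool_dtype(series):
        zero_count = int((non_null == 0).sum())
        stats.update({
            "min": float(non_null.min()),
            "max": float(non_null.max()),
            "mean": float(non_null.mean()),
            "median": float(non_null.median()),
            "std": float(non_null.std()),  # sample std (ddof=1)
            "q1": float(non_null.quantile(0.25)),
            "q3": float(non_null.quantile(0.75)),
            "zeroCount": zero_count,
            "zeroPct": _pct(zero_count, total),
        })
    else:
        # Top 10 distinct values by frequency, most common first.
        top = non_null.value_counts().head(10)
        stats["topValues"] = [
            {"value": value, "count": int(count)} for value, count in top.items()
        ]
    return stats


def dataset_summary(df: pd.DataFrame) -> dict:
    """Assemble the dataset-level totals plus one entry per column."""
    return {
        "totalRows": len(df),
        "totalColumns": len(df.columns),
        "columns": [column_summary(df[name]) for name in df.columns],
    }
```

Calling `dataset_summary(df)` on a small DataFrame returns a dictionary with the same keys as the artifact, which makes it easy to compare against a real `column_stats.json` by eye.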

Inputs

None. column_stats takes no configuration — it always runs on the full dataset.

Output shape

{
  "totalRows": 50000,
  "totalColumns": 24,
  "columns": [
    {
      "name": "revenue",
      "dtype": "float64",
      "count": 49872,
      "nullCount": 128,
      "nullPct": 0.26,
      "uniqueCount": 12451,
      "uniquePct": 24.94,
      "min": 0.0,
      "max": 9842.5,
      "mean": 142.7,
      "median": 89.5,
      "std": 218.3,
      "q1": 32.0,
      "q3": 184.2,
      "zeroCount": 4,
      "zeroPct": 0.01
    },
    {
      "name": "tier",
      "dtype": "object",
      "count": 50000,
      "nullCount": 0,
      "nullPct": 0.0,
      "uniqueCount": 3,
      "uniquePct": 0.01,
      "topValues": [
        { "value": "free",       "count": 38240 },
        { "value": "pro",        "count":  9810 },
        { "value": "enterprise", "count":  1950 }
      ]
    }
  ]
}

The artifact is written as column_stats.json to the dataset’s versioned S3 path.
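Once the artifact is loaded, it is plain JSON and can be consumed with the standard library. The sketch below embeds a trimmed copy of the example above instead of reading from S3, since the exact path is not given here. The helper `columns_with_nulls` is a hypothetical name used only for illustration.

```python
import json

# Trimmed copy of the example artifact from this page (numeric fields omitted).
ARTIFACT_JSON = """
{
  "totalRows": 50000,
  "totalColumns": 24,
  "columns": [
    {"name": "revenue", "dtype": "float64", "count": 49872,
     "nullCount": 128, "nullPct": 0.26},
    {"name": "tier", "dtype": "object", "count": 50000,
     "nullCount": 0, "nullPct": 0.0}
  ]
}
"""


def columns_with_nulls(stats: dict) -> list[str]:
    """Return names of columns that contain nulls, most-missing first."""
    hits = [col for col in stats["columns"] if col["nullCount"] > 0]
    hits.sort(key=lambda col: col["nullPct"], reverse=True)
    return [col["name"] for col in hits]


stats = json.loads(ARTIFACT_JSON)
print(columns_with_nulls(stats))  # ['revenue']
```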

Where it shows up

Dataset detail → Overview tab reads the latest column_stats to render the column-by-column summary.
Summand chat queries it via the analyze tool — “What’s the missingness of revenue?” resolves to a filtered read of the artifact.
The Predictors component uses column stats internally to make feature-engineering decisions (numeric vs. categorical, low- vs. high-cardinality).
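To make the last point concrete, here is one way a consumer could turn a column's stats into a feature-kind decision. This is a hedged sketch, not the Predictors component's real logic: the function name `feature_kind`, the returned labels, and the cardinality cutoff of 20 are all assumptions chosen for illustration.

```python
# Hypothetical cutoff: at or below this many distinct values counts as low-cardinality.
LOW_CARDINALITY_MAX = 20


def feature_kind(column: dict) -> str:
    """Classify one entry from the artifact's "columns" list."""
    # Assumption: numeric columns are identified by a pandas-style dtype name.
    if column["dtype"].startswith(("int", "uint", "float")):
        return "numeric"
    if column["uniqueCount"] <= LOW_CARDINALITY_MAX:
        return "categorical_low_cardinality"
    return "categorical_high_cardinality"
```

With the example artifact above, `revenue` (`float64`) would be classed as numeric and `tier` (`object`, 3 distinct values) as low-cardinality categorical under these assumed rules.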

Compute profile

Profile	Memory	Timeout
Lambda	2 GB	120 s

For most datasets this finishes in a few seconds. The component reads the curated Parquet, computes the stats in pandas, and writes the JSON.

Filtering from chat

Summand can ask for a single column instead of the whole stats blob:

analyze({ component: "column_stats", target: ..., params: { column_name: "revenue" } })

The agent receives only that column’s stats rather than the whole artifact, so the response stays small even on wide tables.
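Conceptually, a filtered read amounts to selecting one entry from the artifact's `columns` list by name. The sketch below shows that selection in isolation. It is an assumption about the behaviour, not the analyze tool's implementation, and `filter_column_stats` is a hypothetical helper.

```python
from typing import Optional


def filter_column_stats(stats: dict, column_name: str) -> Optional[dict]:
    """Return the stats entry for one column, or None if it is not present."""
    for column in stats["columns"]:
        if column["name"] == column_name:
            return column
    return None


# Minimal stand-in for a loaded column_stats.json artifact.
example = {
    "totalRows": 50000,
    "totalColumns": 24,
    "columns": [
        {"name": "revenue", "nullCount": 128, "nullPct": 0.26},
        {"name": "tier", "nullCount": 0, "nullPct": 0.0},
    ],
}

revenue = filter_column_stats(example, "revenue")
print(revenue["nullPct"])  # 0.26
```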
