column_stats is the baseline component. It runs on first ingest of any dataset (no experiment configuration needed) and powers the Overview tab, dataset summaries shown to Summand, and the schema-level views the rest of the product builds on.
What it computes
For every column in the dataset:
- Common stats: `name`, `dtype`, `count`, `nullCount`, `nullPct`, `uniqueCount`, `uniquePct`.
- Numeric columns: `min`, `max`, `mean`, `median`, `std`, first and third quartiles, `zeroCount`, `zeroPct`.
- Categorical / string columns: top 10 distinct values with counts.

Dataset-level totals: `totalRows`, `totalColumns`.
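As a rough illustration of the statistics listed above, here is a minimal standard-library Python sketch. The helper names (`summarize_column`, `summarize_dataset`), the `numeric`/`string` dtype labels, the `q1`/`q3`/`topValues` keys, percentages on a 0-100 scale relative to the row count, the inclusive quartile method, and the sample standard deviation are all assumptions made for this example, not a description of how `column_stats` is actually implemented.

```python
from collections import Counter
import statistics


def summarize_column(name, values, top_n=10):
    """Compute per-column stats for a list of values (None marks a null).

    Hypothetical helper: the exact definitions (quartile method, sample vs.
    population std, percentage base) are assumed for illustration.
    """
    count = len(values)
    non_null = [v for v in values if v is not None]
    null_count = count - len(non_null)
    unique_count = len(set(non_null))
    # bool is a subclass of int, so exclude it from the numeric check.
    is_numeric = bool(non_null) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null
    )
    stats = {
        "name": name,
        "dtype": "numeric" if is_numeric else "string",  # assumed labels
        "count": count,
        "nullCount": null_count,
        "nullPct": 100.0 * null_count / count if count else 0.0,
        "uniqueCount": unique_count,
        "uniquePct": 100.0 * unique_count / count if count else 0.0,
    }
    if is_numeric:
        zero_count = sum(1 for v in non_null if v == 0)
        stats.update({
            "min": min(non_null),
            "max": max(non_null),
            "mean": statistics.fmean(non_null),
            "median": statistics.median(non_null),
            "zeroCount": zero_count,
            "zeroPct": 100.0 * zero_count / count,
        })
        # Quartiles and sample std need at least two data points.
        if len(non_null) >= 2:
            q1, _, q3 = statistics.quantiles(non_null, n=4, method="inclusive")
            stats.update({"std": statistics.stdev(non_null), "q1": q1, "q3": q3})
    else:
        # Top-N distinct values with their counts, most frequent first.
        stats["topValues"] = Counter(non_null).most_common(top_n)
    return stats


def summarize_dataset(columns):
    """columns: dict mapping column name -> list of values (equal lengths)."""
    per_column = [summarize_column(n, v) for n, v in columns.items()]
    total_rows = len(next(iter(columns.values()), []))
    return {
        "totalRows": total_rows,
        "totalColumns": len(columns),
        "columns": per_column,
    }
```

Calling `summarize_dataset({"revenue": [0, 10, None, 30], "region": ["eu", "us", "eu", None]})` yields one entry per column plus the two dataset-level totals.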
Inputs
None. `column_stats` takes no configuration; it always runs on the full dataset.
Output shape
Writes `column_stats.json` to the dataset’s versioned S3 path.
Where it shows up
- Dataset detail → Overview tab reads the latest `column_stats` to render the column-by-column summary.
- Summand chat queries it via the `analyze` tool: “What’s the missingness of `revenue`?” resolves to a filtered read of the artifact.
- The Predictors component uses column stats internally to make feature-engineering decisions (numeric vs. categorical, low- vs. high-cardinality).
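To make the "filtered read" idea concrete, the sketch below looks up one column's missingness in an already-parsed artifact and shows the kind of numeric vs. categorical, low- vs. high-cardinality decision a consumer could derive from it. The artifact layout (a top-level `columns` list), the sample values, the `dtype` labels, and the cardinality threshold of 20 are assumptions for illustration; the real `column_stats.json` schema, the `analyze` tool, and the Predictors logic may differ.

```python
import json

# Hypothetical artifact contents; the real column_stats.json layout may differ.
ARTIFACT = json.loads("""
{
  "totalRows": 1000,
  "totalColumns": 2,
  "columns": [
    {"name": "revenue", "dtype": "float64", "count": 1000,
     "nullCount": 50, "nullPct": 5.0, "uniqueCount": 900, "uniquePct": 90.0},
    {"name": "region", "dtype": "string", "count": 1000,
     "nullCount": 0, "nullPct": 0.0, "uniqueCount": 4, "uniquePct": 0.4}
  ]
}
""")


def find_column(artifact, name):
    """Return the stats entry for one column, or raise KeyError if absent."""
    for col in artifact["columns"]:
        if col["name"] == name:
            return col
    raise KeyError(name)


def missingness(artifact, name):
    """Filtered read: return only the null-related fields for one column."""
    col = find_column(artifact, name)
    return {"nullCount": col["nullCount"], "nullPct": col["nullPct"]}


def feature_kind(col, max_low_cardinality=20):
    """Assumed rule: int/float dtypes are numeric; others split by uniqueCount."""
    if col["dtype"].startswith(("int", "float")):
        return "numeric"
    if col["uniqueCount"] <= max_low_cardinality:
        return "categorical_low"
    return "categorical_high"


print(missingness(ARTIFACT, "revenue"))
print(feature_kind(find_column(ARTIFACT, "region")))
```

Keeping the lookup as a plain filter over the parsed JSON means a question about one column never requires re-scanning the dataset itself.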
Compute profile
| Profile | Memory | Timeout |
|---|---|---|
| Lambda | 2 GB | 120 s |