Summary Statistics
The Summary Statistics node runs R’s `summary()` function on your entire dataset, giving you a quick overview of all variables.
What it does
- Applies `summary()` to the connected data frame
- For numeric columns: shows min, 1st quartile (\(Q_1\)), median, mean (\(\bar{x}\)), 3rd quartile (\(Q_3\)), and max
- For categorical (factor) columns: shows the frequency count of each level
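The sketch below uses made-up values to show how column type shapes the output. One caveat: `summary()` counts levels only for factor columns. A plain character column is reported as Length/Class/Mode instead. This page does not say whether the Input CSV node stores text as factors, so treat that as an assumption to check.

```r
# Hypothetical example data with one numeric and one text column
df <- data.frame(
  height  = c(2.1, 4.3, 5.1, 6.4, 7.9),
  species = c("setosa", "versicolor", "setosa", "virginica", "versicolor")
)

# Since R 4.0.0, data.frame() keeps text as character by default, so
# summary() reports Length/Class/Mode for species, not level counts
summary(df)

# Stored as a factor, the same column is summarised as counts per level
df$species <- factor(df$species)
summary(df)
```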
How to use it
- Connect a data source — drag an edge from an Input CSV node
- Click Run — no additional configuration needed
This is the simplest node to use — just connect data and run it.
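The two steps correspond roughly to the following R, shown as a minimal sketch. It assumes the Input CSV node behaves like `read.csv()` with `stringsAsFactors = TRUE`; that is an assumption, not documented behaviour. Inline CSV text stands in for an uploaded file so the example runs on its own. The name `my_data` matches the generated code shown later on this page.

```r
# Hypothetical CSV contents standing in for an uploaded file
csv_text <- "height,weight,species
2.1,15,setosa
4.3,28,versicolor
5.1,34,setosa
6.4,56,virginica
7.9,79,versicolor"

# Step 1 (Input CSV): read the data; stringsAsFactors = TRUE is an
# assumption so that text columns become factors with countable levels
my_data <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Step 2 (Summary Statistics): summarise every column at once
summary(my_data)
```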
Configuration
| Setting | Required | Description |
|---|---|---|
| Upstream connection | Yes | A node providing data |
| Comment | No | Annotation for generated R code |
There are no user-configurable parameters beyond the upstream connection.
Output
A summary table showing descriptive statistics for each column. For example:
| Statistic | height | weight | species |
|---|---|---|---|
| Min. | 2.10 | 15.0 | setosa: 50 |
| 1st Qu. | 4.30 | 28.0 | versicolor: 50 |
| Median | 5.10 | 34.0 | virginica: 50 |
| Mean | 5.84 | 42.0 | |
| 3rd Qu. | 6.40 | 56.0 | |
| Max. | 7.90 | 79.0 | |
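Each numeric row of the table corresponds to a base R function. The sketch below uses hypothetical values to recompute the rows by hand and compare them with `summary()`. It relies on `quantile()`'s default type 7 method, which is also what `summary()` uses unless told otherwise. The factor counts can be reproduced the same way with `table()`.

```r
# Hypothetical numeric column
x <- c(2.1, 4.3, 5.1, 6.4, 7.9)

# Recompute each numeric row of summary() with base R functions
by_hand <- c(
  "Min."    = min(x),
  "1st Qu." = unname(quantile(x, 0.25)),  # type 7 quantile (the default)
  "Median"  = median(x),
  "Mean"    = mean(x),
  "3rd Qu." = unname(quantile(x, 0.75)),
  "Max."    = max(x)
)
print(by_hand)
print(summary(x))

# Level counts for a hypothetical factor column: same as summary(species)
species <- factor(c("setosa", "versicolor", "setosa", "virginica", "versicolor"))
print(table(species))
```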
Generated R code
`summary(my_data)`
Tips
- This is a great first step after loading data — it quickly reveals the range, central tendency, and distribution of your variables
- Look for unexpected `NA's` entries in the summary; they count the missing values in that column
- For more targeted statistics, use the Mean / SD node with a formula
- The five-number summary (min, \(Q_1\), median, \(Q_3\), max) is the basis of a box plot
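Both tips can be checked directly in R, shown here with made-up values. `summary()` leaves missing values out of its statistics and appends an `NA's` count. `colSums(is.na(...))` gives a per-column count for a whole data frame. For box plots, `boxplot()` draws its box from `boxplot.stats()`, which uses `fivenum()`. Tukey's hinges from `fivenum()` can differ slightly from the type 7 quartiles that `summary()` reports, as this sample shows.

```r
# Hypothetical column with one missing value
x <- c(2.1, 4.3, NA, 5.1, 6.4, 7.9, 12.0)

# summary() ignores the NA in its statistics and adds an "NA's" count
summary(x)

# Per-column count of missing values for a whole data frame
my_data <- data.frame(x = x, y = c(1, NA, NA, 4, 5, 6, 7))
colSums(is.na(my_data))

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max.
# Here the lower hinge (4.3) differs from summary()'s 1st Qu. (4.5)
fivenum(x)

# The box in boxplot() uses the same hinges and median as fivenum()
boxplot.stats(x)$stats
```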