Mean / SD
The Mean / SD node calculates descriptive statistics — mean, standard deviation, variance, count, and standard error — grouped by one or more explanatory variables.
What it does
- Computes grouped summary statistics using R’s `aggregate()` function
- Calculates: mean (\(\bar{x}\)), standard deviation (\(s\)), variance (\(s^2\)), count (\(n\)), and standard error (\(SE = \frac{s}{\sqrt{n}}\))
- Optionally stores the result as a new dataframe for downstream use
How to use it
- Connect a data source: drag an edge from an Input CSV node
- Enter a formula: use the format `response ~ explanatory` (e.g. `weight ~ diet`)
- Optionally name the output: provide a dataframe name to store results
- Click Run
Formula syntax
| Formula | Meaning |
|---|---|
| `weight ~ diet` | Mean/SD of weight grouped by diet |
| `weight ~ diet + time` | Mean/SD of weight grouped by both diet and time |
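As a rough illustration of the two formula forms, the sketch below runs both against R's built-in `ChickWeight` dataset. That dataset and its capitalised column names (`Diet`, `Time`) are assumptions chosen for the example, not the data used elsewhere on this page. The formula is passed positionally because the name of the first argument of `aggregate()` has changed between R versions.

```r
# Built-in example dataset (an assumption for this sketch);
# its columns are weight, Time, Chick and Diet.
data(ChickWeight)

# One explanatory variable: one row per diet.
by_diet <- aggregate(weight ~ Diet, data = ChickWeight, FUN = mean)

# Two explanatory variables: one row per observed diet/time combination.
by_diet_time <- aggregate(weight ~ Diet + Time, data = ChickWeight, FUN = mean)

head(by_diet_time)
```

Adding a second variable after `+` does not change the statistic, only the number of groups it is computed for.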
Configuration
| Setting | Required | Description |
|---|---|---|
| Upstream connection | Yes | A node providing data |
| Formula | Yes | `response ~ explanatory` format |
| Output dataframe name | No | Store results under this name for downstream use |
| Comment | No | Annotation for generated R code |
Output
A table showing the computed statistics for each group:
| Group | mean | sd | var | n | se |
|---|---|---|---|---|---|
| A | 5.03 | 1.21 | 1.46 | 10 | 0.38 |
| B | 7.89 | 0.95 | 0.90 | 10 | 0.30 |
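To reproduce a table of this shape in plain R, it helps to know that when `FUN` returns several values, base R's `aggregate()` stores them in a single matrix column rather than in separate columns. How the node handles this internally is not stated here, so the sketch below is only one possible way to get a flat table; the `ChickWeight` dataset and the helper name `group_stats` are assumptions made for the example.

```r
# Built-in example dataset (an assumption for this sketch).
data(ChickWeight)

# Hypothetical helper returning the five statistics as a named vector.
group_stats <- function(x) {
  c(mean = mean(x),
    sd   = sd(x),
    var  = var(x),
    n    = length(x),
    se   = sd(x) / sqrt(length(x)))
}

raw <- aggregate(weight ~ Diet, data = ChickWeight, FUN = group_stats)

# raw has only two columns: Diet, and a matrix column `weight`
# whose columns are mean, sd, var, n and se.
ncol(raw)

# Flatten the matrix column into ordinary columns
# (weight.mean, weight.sd, ...), e.g. before writing to CSV.
flat <- do.call(data.frame, raw)
names(flat)
```

The flattened version is usually easier to pass to functions that expect one value per column.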
Generated R code
aggregate(
  formula = weight ~ diet,
  data = chick_data,
  FUN = function(x) c(
    mean = mean(x),
    sd = sd(x),
    var = var(x),
    n = length(x),
    se = sd(x) / sqrt(length(x))
  )
)
If you provide an output dataframe name, the result is assigned:
diet_summary <- aggregate(...)
Tips
- The “Uses: weight, diet” line shown below the formula input lists the variables found in the formula, so you can verify that the column names are correct
- Providing an output dataframe name lets you chain this node’s results into downstream nodes like Output CSV
- The standard error \(SE = \frac{s}{\sqrt{n}}\) is useful for constructing confidence intervals: \(\bar{x} \pm t_{\alpha/2} \times SE\)
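The confidence interval formula in the last tip can be sketched in R as follows. The helper name `ci_from_summary` is hypothetical, and the choice of \(n - 1\) degrees of freedom for the \(t\) quantile is an assumption (the usual one for a single group mean) that the tip itself does not state.

```r
# Hypothetical helper: t-based confidence interval from summary statistics.
# Assumes n - 1 degrees of freedom for the t quantile.
ci_from_summary <- function(xbar, se, n, level = 0.95) {
  alpha  <- 1 - level
  t_crit <- qt(1 - alpha / 2, df = n - 1)
  c(lower = xbar - t_crit * se,
    upper = xbar + t_crit * se)
}

# Group A from the example output table: mean 5.03, se 0.38, n 10.
ci_from_summary(xbar = 5.03, se = 0.38, n = 10)
```

Because the inputs are exactly the `mean`, `se` and `n` columns the node reports, the same call can be applied row by row to a stored output dataframe.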