Linear Model

The Linear Model node fits a linear regression model to your data using the R lm() function and displays either a coefficient summary or an ANOVA table.

What it does

Fits a linear model using R formula syntax
Displays regression coefficients with standard errors, t-values, and p-values
Alternatively displays an ANOVA table (Type I, Type II, or Type III sums of squares)
Optionally stores the fitted model for use by downstream nodes

How to use it

Connect a data source — drag an edge from an Input CSV node to this node’s input handle
Choose output type — select either “Summary (Coefficients & R²)” or “ANOVA Table”
If using ANOVA output, choose ANOVA type — Type I (Sequential SS), Type II (Hierarchical SS), or Type III (Marginal SS)
Enter a formula — type an R formula in the format response ~ predictor (e.g. height ~ conc)
Click Run — the model is fitted on the server

Formula syntax

R formula syntax is used to specify the model. Here are some common patterns:

Formula	Meaning
`y ~ x`	Simple linear regression
`y ~ x1 + x2`	Multiple regression
`y ~ x1 * x2`	Main effects plus interaction (same as `x1 + x2 + x1:x2`)
`y ~ x1 + I(x1^2)`	Polynomial (quadratic) term; `I()` makes `^` mean arithmetic power
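
To see what a formula expands to before fitting, you can inspect its term labels in R. This is a minimal sketch: y, x, x1 and x2 are placeholder names, and the helper term_labels() is hypothetical, not part of the node.

```r
# List the terms R derives from each formula pattern.
# y, x, x1 and x2 are placeholder names; no data is needed to parse a formula.
term_labels <- function(f) attr(terms(f), "term.labels")

simple           <- term_labels(y ~ x)
multiple         <- term_labels(y ~ x1 + x2)
with_interaction <- term_labels(y ~ x1 * x2)       # expands to x1, x2 and x1:x2
polynomial       <- term_labels(y ~ x1 + I(x1^2))  # I() keeps ^ as arithmetic

print(with_interaction)
print(polynomial)
```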

Mathematically, the simple model \(y = \beta_0 + \beta_1 x + \epsilon\) is specified as y ~ x, where:

\(\beta_0\) is the intercept
\(\beta_1\) is the slope coefficient
\(\epsilon\) is the error term
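
As an illustration of how the formula maps to these symbols, the sketch below simulates data with known coefficients and fits the model. The column names conc and height follow the example formula used on this page; the numeric values are made up.

```r
# Simulate data from y = beta0 + beta1 * x + epsilon with known coefficients.
# All numeric values here are made up for illustration.
set.seed(1)
beta0 <- 2.0
beta1 <- 0.5
sim <- data.frame(conc = seq(1, 20, length.out = 50))
sim$height <- beta0 + beta1 * sim$conc + rnorm(50, sd = 0.3)

fit <- lm(height ~ conc, data = sim)

# "(Intercept)" estimates beta0 and "conc" estimates beta1.
print(coef(fit))
```

The estimates should land close to the simulated values of 2.0 and 0.5; they will not match exactly because of the error term.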

Configuration

Setting	Required	Description
Upstream connection	Yes	A node providing data
Output type	Yes	“Summary (Coefficients & R²)” or “ANOVA Table”
ANOVA type	Conditional	Type I (Sequential SS), Type II (Hierarchical SS), or Type III (Marginal SS); shown when Output type is ANOVA
Formula	Yes	R formula (e.g. `height ~ conc`)
Output to storr	No	Store the fitted model for downstream use
Model name	Conditional	Required if outputting to storr
Comment	No	Annotation for generated R code
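
How the node writes to storr internally is not described on this page. As a rough sketch of the idea, the storr R package lets you save an object under a key and read it back later. The key "my_model" and the in-memory store are assumptions chosen for illustration, and the example uses R's built-in cars dataset.

```r
# Fit a model, store it under a key, and read it back for prediction.
# "my_model" is a hypothetical model name; the node's real backend may differ.
fit <- lm(dist ~ speed, data = cars)

if (requireNamespace("storr", quietly = TRUE)) {
  st <- storr::storr_environment()  # in-memory store, used here for simplicity
  st$set("my_model", fit)
  restored <- st$get("my_model")
  print(predict(restored, newdata = data.frame(speed = 10)))
}
```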

Output

Summary mode

Displays:

Model fit statistics — \(R^2\) and Adjusted \(R^2\)
Coefficients table — Term, Estimate, Std. Error, t value, \(Pr(>|t|)\)
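
The same quantities can be extracted from a fitted model in R. This sketch uses the built-in cars dataset rather than your own data.

```r
# Pull the Summary-mode quantities out of a fitted model.
# cars is a built-in dataset with columns speed and dist.
fit <- lm(dist ~ speed, data = cars)
s <- summary(fit)

r2       <- s$r.squared      # R^2
adj_r2   <- s$adj.r.squared  # adjusted R^2
coef_tab <- coef(s)          # columns: Estimate, Std. Error, t value, Pr(>|t|)

print(c(r2 = r2, adj_r2 = adj_r2))
print(coef_tab)
```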

ANOVA mode

Displays an ANOVA table with:

Term, Df, Sum Sq, Mean Sq, F value, \(Pr(>F)\)

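By default, base R anova() uses Type I (sequential) sums of squares, so the result for each term depends on the order in which terms appear in the formula. For unbalanced designs, or models where the order of terms is arbitrary, consider Type II or Type III sums of squares; Type III is the usual choice when interactions are present and of interest.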

Differences between Type I, Type II, and Type III sums of squares

Feature	Type I (Sequential)	Type II (Hierarchical)	Type III (Marginal)
Depends on order of terms	Yes	No	No
Adjustment	Each term adjusted only for terms entered before it	Adjusts each main effect for all other main effects, but not for interactions involving the effect	Adjusts each term for all other terms, including interactions
Handling of interactions	Does not adjust main effects for interactions	Does not adjust main effects for interactions	Main effects tested while controlling for interactions involving them
Balanced designs	Same results as Types II and III	Same results as Types I and III	Same results as Types I and II
Unbalanced designs	Order-dependent; can be misleading	Appropriate when interactions are absent or not of interest	Appropriate when interactions are present or full adjustment is required
Typical use	Hierarchical models, polynomial terms, nested designs	Classical factorial ANOVA without interactions	General unbalanced designs; common in SAS/SPSS
Interpretation	Extra variance explained when the term enters the model at that stage	Variance explained by the term, given the other main effects	Unique contribution of each term after adjusting for all other terms
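
The order dependence of Type I can be seen with base R alone. The sketch below uses simulated, made-up data with two correlated predictors, so the design is not orthogonal. It relies on the fact that, for a model with main effects only, drop1() removes each term from the full model and so gives the same sums of squares as Type II.

```r
# Simulated data: x2 is correlated with x1, so term order matters for Type I.
set.seed(42)
n  <- 60
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n, sd = 0.5)
y  <- 1 + 2 * x1 + 1 * x2 + rnorm(n)
d  <- data.frame(y, x1, x2)

fit_12 <- lm(y ~ x1 + x2, data = d)
fit_21 <- lm(y ~ x2 + x1, data = d)

# Type I (sequential): the sum of squares for x1 depends on where it enters.
ss_x1_first <- anova(fit_12)["x1", "Sum Sq"]
ss_x1_last  <- anova(fit_21)["x1", "Sum Sq"]

# Each term adjusted for the other main effect (Type II for this model).
type2 <- drop1(fit_12, test = "F")

print(c(first = ss_x1_first, last = ss_x1_last,
        adjusted = type2["x1", "Sum of Sq"]))
```

The sum of squares for x1 entered last matches the adjusted value, while x1 entered first also absorbs the variance it shares with x2.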

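In the ANOVA table, each row's F value tests whether that term explains variation beyond the terms it is adjusted for. The overall F-statistic reported by R's summary() tests the null hypothesis \(H_0: \beta_1 = \beta_2 = \dots = \beta_p = 0\), that none of the predictors is related to the response.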
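
As a sketch, the overall F-test for \(H_0\) can be reproduced in R in two equivalent ways: from the summary() object, or by comparing the fitted model against an intercept-only model. The example uses the built-in mtcars dataset.

```r
# Overall F-test that all slope coefficients are zero.
fit <- lm(mpg ~ wt + hp, data = mtcars)

f <- summary(fit)$fstatistic  # named vector: value, numdf, dendf
p_value <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)

# The same test expressed as a comparison with the intercept-only model.
null_fit <- lm(mpg ~ 1, data = mtcars)
cmp <- anova(null_fit, fit)

print(f)
print(cmp)
```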

Generated R code

my_data_lm <- lm(height ~ conc, data = my_data)
summary(my_data_lm)

Or for ANOVA output (Type I):

my_data_lm <- lm(height ~ conc, data = my_data)
anova(my_data_lm)

Type II ANOVA:

my_data_lm <- lm(height ~ conc, data = my_data)
car::Anova(my_data_lm, type = 2)

Type III ANOVA:

my_data_lm <- lm(height ~ conc, data = my_data)
car::Anova(my_data_lm, type = 3)
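
When the model contains factors with interactions, Type III tests of main effects depend on how the factors are coded, and sum-to-zero contrasts are the usual recommendation with car::Anova(type = 3). Whether the node sets contrasts for you is not stated on this page, so the following is only a sketch of doing it by hand, using the built-in warpbreaks dataset. The car package is optional here and is skipped if not installed.

```r
# warpbreaks has two factors, wool and tension.
# Sum-to-zero contrasts are set per model through the contrasts argument of lm().
fit_sum <- lm(breaks ~ wool * tension, data = warpbreaks,
              contrasts = list(wool = "contr.sum", tension = "contr.sum"))

# With contr.sum, coefficients are named wool1, tension1, tension2, ...
print(names(coef(fit_sum)))

if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(fit_sum, type = 3))
}
```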

Tips

Variable names in the formula must match column names in your dataset exactly
The available columns are shown as clickable chips above the formula input — use them as a reference
If you enable “Output to storr”, downstream nodes can use the fitted model object
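Check \(R^2\) to assess model fit: values closer to 1 mean the model explains more of the variance in the response. When comparing models with different numbers of predictors, prefer Adjusted \(R^2\), because \(R^2\) never decreases when a predictor is added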
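
As a sketch of the first tip above, you can check in R that every variable in a formula exists as a column before fitting. The data frame my_data and the helper missing_vars() are made-up examples, not part of the node.

```r
# Report formula variables that are not columns of the data frame.
my_data <- data.frame(height = c(1.2, 1.9, 3.1), conc = c(1, 2, 3))

missing_vars <- function(f, data) setdiff(all.vars(f), names(data))

ok  <- missing_vars(height ~ conc, my_data)  # nothing missing
bad <- missing_vars(Height ~ conc, my_data)  # column names are case-sensitive

print(ok)
print(bad)
```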