In 2016 I started thinking about using sum to zero contrasts and wrote a brief blog post about them. Someone read it! I got this email the other day:
Thank you for this post. I have difficulties understanding it though because the text below the first R output seems not to match with the model output. At first, one cannot see which species is species1,2 etc. Second the numbers are also not correct… Would be cool if you could check it and correct it because then it would be a useful post!
oops. Maybe didn’t get all the bugs out. I’ll re-run everything there and fix it. Also, I’ve recently figured out why setting sum to zero contrasts looses the names of the species in the summary, so I’ll add that description here.
So the problem is:
data(iris)
library(dplyr) #Stay in the tidyverse!
iris <- iris %>% mutate(cSepal.Length = Sepal.Length - mean(Sepal.Length))
iris$szSpecies <- iris$Species
contrasts(iris$szSpecies) <- contr.sum(3)
m2 <- lm(Sepal.Width ~ cSepal.Length * szSpecies, data = iris)
(summary_m2 <- summary(m2))
##
## Call:
## lm(formula = Sepal.Width ~ cSepal.Length * szSpecies, data = iris)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.72394 -0.16327 -0.00289 0.16457 0.60954
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.21278 0.04098 78.395 < 2e-16 ***
## cSepal.Length 0.45005 0.04900 9.185 4.11e-16 ***
## szSpecies1 0.88386 0.07086 12.473 < 2e-16 ***
## szSpecies2 -0.47240 0.04680 -10.094 < 2e-16 ***
## cSepal.Length:szSpecies1 0.34848 0.08038 4.335 2.72e-05 ***
## cSepal.Length:szSpecies2 -0.13033 0.06553 -1.989 0.0486 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2723 on 144 degrees of freedom
## Multiple R-squared: 0.6227, Adjusted R-squared: 0.6096
## F-statistic: 47.53 on 5 and 144 DF, p-value: < 2.2e-16
Notice that instead of Speciessetosa I have Species1 in the coefficient names.
contr.matrix <- contr.sum(3)
colnames(contr.matrix) <- c("setosa", "versicolor")
contrasts(iris$szSpecies) <- contr.matrix
m2 <- lm(Sepal.Width ~ cSepal.Length * szSpecies, data = iris)
(summary_m2 <- summary(m2))
##
## Call:
## lm(formula = Sepal.Width ~ cSepal.Length * szSpecies, data = iris)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.72394 -0.16327 -0.00289 0.16457 0.60954
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.21278 0.04098 78.395 < 2e-16 ***
## cSepal.Length 0.45005 0.04900 9.185 4.11e-16 ***
## szSpeciessetosa 0.88386 0.07086 12.473 < 2e-16 ***
## szSpeciesversicolor -0.47240 0.04680 -10.094 < 2e-16 ***
## cSepal.Length:szSpeciessetosa 0.34848 0.08038 4.335 2.72e-05 ***
## cSepal.Length:szSpeciesversicolor -0.13033 0.06553 -1.989 0.0486 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2723 on 144 degrees of freedom
## Multiple R-squared: 0.6227, Adjusted R-squared: 0.6096
## F-statistic: 47.53 on 5 and 144 DF, p-value: < 2.2e-16
And now we have the names back!