Chapter 5 Conduct LME in R: Example 2

Example 2. Data were derived from an experiment to determine how in vivo calcium (Ca++) activity of PV cells (measured longitudinally by the genetically encoded Ca++ indicator GCaMP6s) changes over time after ketamine treatment. We show four mice whose Ca++ event frequencies were measured at 24h, 48h, 72h, and 1 week after ketamine treatment and compare Ca++ event frequency at 24h to the other three time points. In total, Ca++ event frequencies of 1,724 neurons were measured. First let us evaluate the effect of ketamine using LM (or ANOVA, which ignores mouse-specific effect).

###load libraries
library(nlme)
library(lme4)
### read the data
Ex2=read.csv("https://www.ics.uci.edu/~zhaoxia/Data/BeyondTandANOVA/Example2.txt", head=T)
Ex2$treatment_idx=Ex2$treatment_idx-4
Ex2$treatment_idx=as.factor(Ex2$treatment_idx)
### change the variable of mouse IDs to a factor
Ex2$midx=as.factor(Ex2$midx)

5.1 Wrong analysis

### wrong analysis: using the linear model
lm.obj=lm(res~treatment_idx, data=Ex2)
summary(lm.obj)

## 
## Call:
## lm(formula = res ~ treatment_idx, data = Ex2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.66802 -0.10602 -0.00916  0.09028  2.43137 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     0.714905   0.012337  57.946  < 2e-16 ***
## treatment_idx2 -0.078020   0.017011  -4.586 4.84e-06 ***
## treatment_idx3  0.009147   0.017189   0.532  0.59467    
## treatment_idx4  0.049716   0.016332   3.044  0.00237 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2414 on 1720 degrees of freedom
## Multiple R-squared:  0.03715,    Adjusted R-squared:  0.03548 
## F-statistic: 22.12 on 3 and 1720 DF,  p-value: 4.677e-14

The LM (including ANOVA, t-test) analysis results indicate significantly reduced Ca++ activity at 48h relative to 24h with \(p=4.8\times 10^{-6}\), and significantly increased Ca++ event frequency at 1week compared to 24h with \(p=2.4\times 10^{-3}\). However, if we account for repeated measures due to cells clustered in mice using LME, most of p-values are greater than 0.05 except that the overall p-value is 0.04.

5.2 LME

### lme
lme.obj=lme(res~treatment_idx, random= ~ 1| midx, data= Ex2, method="ML")
summary(lme.obj)
anova(lme.obj)

The results (estimates \(\pm\) s.e., and p-values) the Ca++ event frequency data using LM and LME (Example 2).

A TABLE HERE.

5.3 Why pooling data naively is not a good idea?

To understand the discrepancy between the results from LM and LME, we created boxplots using individual mice as well as all the mice (Figure 5.1). Although the pooled data and the corresponding p value from the LM show significant reduction in Ca++ activities from 24h to 48h, we see that the only mouse showing a noticeable reduction was Mouse 2. In fact, a close examination of Figure 5.1 suggests that there might be small increases in the other three mice.

knitr::include_graphics("Fig2S.jpg")

The boxplots of Ca++ event frequencies measured at four time points. (A) Boxplot of Ca++ event frequencies using the pooled neurons from four mice. (B) boxplots of Ca++ event frequencies stratified by individual mice.

Figure 5.1: The boxplots of Ca++ event frequencies measured at four time points. (A) Boxplot of Ca++ event frequencies using the pooled neurons from four mice. (B) boxplots of Ca++ event frequencies stratified by individual mice.

To examine why the pooled data follow the pattern of Mouse 2 and not that of other mice, we checked the number of neurons in each of the mouse-by-time combinations.

# one mouse contributed 43% cells
# the number of cells in each animal-time combination
table(Ex2$midx, Ex2$treatment_idx)

##    
##       1   2   3   4
##   1  81 254  88  43
##   2 206 101 210 222
##   3  33  18  51 207
##   4  63  52  58  37

# compute the percent of cells contributed by each mouse
rowSums(table(Ex2$midx, Ex2$treatment_idx))/1724

##         1         2         3         4 
## 0.2703016 0.4286543 0.1792343 0.1218097

The above table doesn’t work

The last column of the table above shows that Mouse 2 contributed 43% cells, which likely explains why the pooled data are more similar to Mouse 2 than to the other mice. The lesson from this example is that naively pooling data from different animals is a potentially dangerous practice, as the results can be dominated by a single animal that can misrepresent the data. Application of LME solves this troubling potential problem as it takes dependency and weighting into account.

5.4 Remark: on the minimum number of levels for using random-effects

In this example, the number of levels in the random-effects variable is four, as there are four mice. This number may be smaller than the recommended number for using random-effects. However, as discussed in Gelman and Hill (2006), using a random-effects model in this situation of a small sample size might not do much harm. An alternative is to include the animal ID variable as a factor with fixed animal effects in the conventional linear regression. Note that neither of the two analyses is the same as fitting a linear model to the pooled cells together, which erroneously ignores the between-animal heterogeneity and fails to account for the data dependency due to the within-animal similarity. In a more extreme case, for an experiment using only two monkeys for example, naively pooling the neurons from the two animals faces the risk of making conclusions mainly from one animal and unrealistic homogeneous assumptions across animals, as discussed above. A more appropriate approach is to analyze the animals separately and check whether the results from these two animals “replicate” each other. Exploratory analysis such as data visualization is highly recommended to identify potential issues.

References

Gelman, Andrew, and Jennifer Hill. 2006. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge university press.