Chapter 3 Mixed-effects model analysis

The word “mixed” in linear mixed-effects (LME) means that the model contains both fixed and random effects. Fixed effects are fixed but unknown coefficients for the variables of interest and explanatory covariates, as in the traditional linear model (LM). Random effects are attached to variables that are not of direct interest but that may induce correlation among outcomes. A major difference between the two is that fixed effects are treated as parameters, whereas random effects are treated as random variables drawn from a distribution (e.g., a normal distribution).

To apply the LME, it is necessary to understand its inner workings in sufficient detail. Let $Y_{ij}$ denote the $j$th observed response of the $i$th mouse, and let $x_{ij}$ be the treatment label, with $x_{ij}=1$ for baseline, $x_{ij}=2$ for 24 hours, $x_{ij}=3$ for 48 hours, $x_{ij}=4$ for 72 hours, and $x_{ij}=5$ for 1 week after ketamine treatments. In Example 1, Ex1$res contains the responses, and Ex1$treatment_idx is the treatment label variable. It is important to remember that, because the treatment label is a categorical variable, it must be stored as a factor rather than as a numeric vector. In the last chapter, we used the following code to make sure R understands this:

Ex1 = read.csv("https://www.ics.uci.edu/~zhaoxia/Data/BeyondTandANOVA/Example1.txt", header=TRUE)
Ex1$treatment_idx = as.factor(Ex1$treatment_idx)

In the internal computation, four dummy variables, each taking the value 0 or 1, are generated: $x_{ij,1} = 1$ for 24 hours, $x_{ij,2} = 1$ for 48 hours, $x_{ij,3} = 1$ for 72 hours, and $x_{ij,4} = 1$ for 1 week after ketamine treatments, respectively, and 0 otherwise. Remarks. (1) No dummy variable is needed for baseline, because baseline serves as the reference level against which the other groups are compared. (2) Users do NOT need to define the dummy variables; R generates them as an internal step of the computation. We present the inner workings to help readers understand how the parameters ($\beta_0, \beta_1, \beta_2, \beta_3, \beta_4$) are connected to the treatment levels: $\beta_0$ is the mean at baseline, and each $\beta_k$ ($k=1,\ldots,4$) is the difference in mean between the corresponding time point and baseline.
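The dummy coding described above can be inspected directly with `model.matrix()`. The following is a minimal sketch on a toy factor with one observation per treatment level (illustrative values, not the Ex1 data); it assumes R's default treatment contrasts, under which the first level is the reference.

```r
# Toy treatment labels, one observation per level (illustrative, not Ex1)
treatment_idx <- factor(c(1, 2, 3, 4, 5))

# Design matrix built from the factor: an intercept column plus
# one 0/1 dummy column for each non-reference level (levels 2 to 5)
X <- model.matrix(~ treatment_idx)
print(X)

# If the labels are left numeric, R fits a single slope instead:
# the design matrix then has only an intercept and one numeric column
treatment_num <- c(1, 2, 3, 4, 5)
X_num <- model.matrix(~ treatment_num)
print(X_num)
```

The baseline row of `X` has zeros in all four dummy columns, which is why no separate baseline dummy is required. The numeric version shows what goes wrong if `as.factor()` is skipped: the five treatment levels would be forced onto a straight line.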

Because there are multiple observations from the same animal, the data are naturally clustered by animal. We account for the resulting dependence by adding an animal-specific mean to the regression framework discussed in the previous section, as follows:

\[Y_{ij} = \beta_0 + x_{ij,1}\beta_1 + \ldots + x_{ij,4}\beta_4 + u_i + \epsilon_{ij}, \quad i=1, \ldots, 24; \; j=1, \ldots, n_i;\]

where $n_i$ is the number of observations from the $i$th mouse, $u_i$ is the deviation of the mean specific to the $i$th mouse from the overall intercept $\beta_0$, and $\epsilon_{ij}$ is the deviation in pCREB immunoreactivity of observation (cell) $j$ in mouse $i$ from the mean pCREB immunoreactivity of mouse $i$. The coefficients of the fixed-effects component, ($\beta_0, \beta_1, \beta_2, \beta_3, \beta_4$), are assumed to be fixed but unknown, whereas ($u_1, \ldots, u_{24}$) are treated as independent and identically distributed random variables from a normal distribution with mean 0 and a variance parameter that reflects the variation across animals. The cluster/animal-specific means are more generally referred to as random intercepts in an LME. As with the treatment variable, users do not need to define dummy variables for the animal ID variable; R generates them automatically. Thus, equivalently, one could write the previous equation using a vector ($z_{ij,1}, \ldots, z_{ij,24}$) of dummy variables for the cluster/animal memberships such that $z_{ij,k}=1$ for $i=k$ and 0 otherwise:

\[Y_{ij} = \beta_0 + x_{ij,1}\beta_1 + \ldots + x_{ij,4}\beta_4 + z_{ij,1}u_1 + \ldots + z_{ij,24}u_{24} + \epsilon_{ij}, \quad i=1, \ldots, 24; \; j=1, \ldots, n_i;\]

In the model above, $Y_{ij}$ is modeled by four components: the overall intercept $\beta_0$, which is the population mean of the reference group in this example; the fixed effects from the covariates ($x_{ij,1}, \ldots, x_{ij,4}$); the random effects due to the clustering ($z_{ij,1}, \ldots, z_{ij,24}$); and the random errors $\epsilon_{ij}$, assumed to be independently and identically distributed (i.i.d.) from a normal distribution with mean 0.
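To make the four components concrete, the sketch below simulates data from a random-intercept model of this form and fits it with `lme()` from the `nlme` package. Everything in the simulation is an assumption made for illustration: the coefficient values, the two standard deviations, the equal number of cells per mouse, and the assignment of each mouse to a single treatment group. It does not use, or reproduce results from, the Ex1 data.

```r
library(nlme)  # recommended package, shipped with standard R installations

set.seed(1)
n_mice  <- 24
n_cells <- 10   # hypothetical: equal n_i for every mouse, for simplicity

# Animal ID for every observation (cell)
mouse <- factor(rep(seq_len(n_mice), each = n_cells))

# Simulation assumption: each mouse belongs to one treatment group (1 to 5)
trt_by_mouse  <- rep(1:5, length.out = n_mice)
treatment_idx <- factor(trt_by_mouse[as.integer(mouse)])

# Hypothetical true values of beta_0, ..., beta_4
beta <- c(1.0, 0.5, 0.3, 0.2, 0.1)

u   <- rnorm(n_mice, mean = 0, sd = 0.5)            # random intercepts u_i
eps <- rnorm(n_mice * n_cells, mean = 0, sd = 0.3)  # random errors epsilon_ij

# Response = fixed part + animal-specific intercept + error
X   <- model.matrix(~ treatment_idx)
res <- as.vector(X %*% beta) + u[as.integer(mouse)] + eps
sim <- data.frame(res, treatment_idx, mouse)

# Random-intercept LME: fixed treatment effects, one intercept per mouse
fit <- lme(res ~ treatment_idx, random = ~ 1 | mouse, data = sim)

fixef(fit)    # estimates of beta_0, ..., beta_4
VarCorr(fit)  # estimated variances of u_i and of epsilon_ij
```

Here `random = ~ 1 | mouse` is what requests the animal-specific intercepts $u_i$; no dummy variables for the animals are created by hand.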

It is often convenient to write the LME in a general matrix form, which was first derived by Henderson et al. (1959). This form gives a compact expression of the linear mixed-effects model:

$Y = \beta_0 \mathbf{1} + X\beta + Zu + \epsilon,$

where $Y$ is an $n$-by-1 vector of individual observations, $\mathbf{1}$ is the $n$-by-1 vector of ones, the columns of $X$ are predictors whose coefficients $\beta$, a $p$-by-1 vector, are assumed to be fixed but unknown, the columns of $Z$ are the variables whose coefficients $u$, a $q$-by-1 vector, are random variables drawn from a distribution with mean 0 and a partially or completely unknown covariance matrix, and $\epsilon$ is the $n$-by-1 vector of residual random errors.
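As a small check on the notation, the sketch below builds $\mathbf{1}$, $X$, and $Z$ for a toy data set and confirms that the matrix form gives the same responses as the observation-by-observation form. The toy design (4 animals, 3 observations each, 3 treatment levels) and all numeric values are hypothetical and chosen only to keep the matrices small.

```r
set.seed(2)

# Toy design: q = 4 animals with 3 observations each, so n = 12
mouse         <- factor(rep(1:4, each = 3))
treatment_idx <- factor(rep(c(1, 2, 3, 1), each = 3))  # 3 toy treatment levels

# X: dummy columns for the non-reference levels only (n-by-p, p = 2),
# because the intercept is carried separately by the vector of ones
X <- model.matrix(~ treatment_idx)[, -1, drop = FALSE]

# Z: one 0/1 membership column per animal (n-by-q, q = 4)
Z <- model.matrix(~ 0 + mouse)

ones <- rep(1, nrow(X))

# Hypothetical parameter values and random draws
beta0 <- 1
beta  <- c(0.5, 0.3)
u     <- rnorm(ncol(Z), sd = 0.5)
eps   <- rnorm(nrow(X), sd = 0.3)

# Matrix form: Y = beta0 * 1 + X beta + Z u + epsilon
Y <- beta0 * ones + X %*% beta + Z %*% u + eps

# Observation-by-observation form with an animal-specific intercept u_i
Y_scalar <- beta0 + X[, 1] * beta[1] + X[, 2] * beta[2] +
  u[as.integer(mouse)] + eps

# Passes silently when the two forms agree
stopifnot(isTRUE(all.equal(as.vector(Y), as.vector(Y_scalar))))
```

Each row of `Z` contains a single 1, in the column of the animal that the observation came from, so `Z %*% u` simply picks out the right $u_i$ for every observation.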

References

Henderson, Charles R, Oscar Kempthorne, Shayle R Searle, and CM Von Krosigk. 1959. “The Estimation of Environmental and Genetic Trends from Records Subject to Culling.” Biometrics 15 (2): 192–218.