Professor, Department of Statistics
2026-04-07
\[\begin{align*} \boldsymbol \mu & =E[\mathbf X]=\begin{pmatrix} E[X_1]\\ \vdots \\ E[X_p] \end{pmatrix}\\ \boldsymbol \Sigma_{p\times p} &= Cov(\mathbf X) = E[(\mathbf X -\boldsymbol \mu)(\mathbf X - \boldsymbol \mu)^T] \end{align*}\]
\[Y=\mathbf a^T \mathbf X,\] where \(\mathbf a\) is a \(p\times 1\) vector of constants, i.e., \(\mathbf a=(a_1, \cdots, a_p)^T\).
\(Y=\mathbf a^T \mathbf X =\sum_{i}a_i X_i = a_1 X_1 + \cdots + a_p X_p\).
\(Y\) is a random variable, which is a linear combination of the random vector \(\mathbf X\). It is a univariate random variable.
\[a=(1/3, 1/3, 1/3)^T=\begin{pmatrix}1/3 \\ 1/3 \\ 1/3 \end{pmatrix} = \frac{1}{3} \begin{pmatrix}1 \\ 1 \\ 1 \end{pmatrix}= \frac{1}{3}\mathbf 1.\]
The mean of \(Y\) can be expressed as: \[ \begin{aligned} E(Y) &= E(\mathbf{a}^T\mathbf{X}) \\ &= \mathbf{a}^T E(\mathbf{X}) \\ &= \mathbf{a}^T \boldsymbol{\mu} \end{aligned} \]
Intuitively, the mean of \(Y\) is a weighted average of the components of \(\mathbf{X}\), with weights given by the corresponding components of \(\mathbf{a}\).
The variance of \(Y\) depends on the covariance structure of \(\mathbf{X}\), as well as the weights given by \(\mathbf{a}\).
We call forms like \(\mathbf a^T \boldsymbol \Sigma \mathbf a\) quadratic forms.
Note, we can also write the variance of \(Y\) as:
\[ \begin{aligned} \mathbf a^T \boldsymbol \Sigma \mathbf a &= \sum_i\sum_j \sigma_{ij} a_i a_j\\ &= \sigma_{11}a_1^2 + \sigma_{12}a_1 a_2 + \cdots + \sigma_{1p}a_1 a_p + \sigma_{21}a_2 a_1 + \cdots + \sigma_{p1}a_p a_1 + \cdots\\ &= \sigma_{11}a_1^2 + \sigma_{22}a_2^2 + \cdots + \sigma_{pp}a_p^2 + 2(\sigma_{12}a_1 a_2+ \sigma_{13}a_1 a_3 + \cdots + \sigma_{p-1,p}a_{p-1} a_p) \end{aligned}\]
If \(\mathbf X\sim (\boldsymbol \mu, \boldsymbol \Sigma)\), then \(Y=\mathbf a^T\mathbf X\sim (\mathbf a^T\boldsymbol \mu, \mathbf a^T\boldsymbol \Sigma \mathbf a)\).
\(\mathbf a^T \boldsymbol \mu\) is the inner product between \(\mathbf a\) and \(\boldsymbol \mu\), \[\sum_{i=1}^p a_i \mu_i.\]
\(\mathbf a^T \boldsymbol \Sigma \mathbf a\) is a quadratic form: \[\mathbf a^T \boldsymbol \Sigma \mathbf a=\sum_{i=1}^p\sum_{j=1}^p a_i a_j \sigma_{ij},\] where \(\sigma_{ij}\) is the \((i,j)\)th element of \(\boldsymbol \Sigma\).
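As a quick numerical check, the matrix product and the double sum agree. The following R sketch uses a made-up \(\boldsymbol \Sigma\) and \(\mathbf a\), chosen only for illustration:

```r
# Hypothetical 3x3 covariance matrix and weight vector (illustration only)
Sigma <- matrix(c(4,   1,   0.5,
                  1,   3,   0.8,
                  0.5, 0.8, 2), nrow = 3, byrow = TRUE)
a <- c(1/3, 1/3, 1/3)

# Quadratic form via matrix multiplication: a^T Sigma a
qf_matrix <- as.numeric(t(a) %*% Sigma %*% a)

# The same quantity via the double sum  sum_i sum_j a_i a_j sigma_ij
p <- length(a)
qf_sum <- 0
for (i in 1:p) for (j in 1:p) qf_sum <- qf_sum + a[i] * a[j] * Sigma[i, j]

all.equal(qf_matrix, qf_sum)  # TRUE
```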
Recall that if \(\mathbf X_1, \cdots, \mathbf X_n\overset{iid}\sim (\boldsymbol \mu, \boldsymbol \Sigma)\), then
We can estimate its mean \(\mathbf a^T \boldsymbol \mu\) by \(\mathbf a^T \bar{\mathbf X}\), which is a linear combination of the sample mean vector.
We can estimate its variance, which is \(\mathbf a^T \boldsymbol \Sigma \mathbf a\), by \(\mathbf a^T \mathbf S \mathbf a\), which is a quadratic form of the sample covariance matrix.
Let \(\mathbf a=(1/2, 1/2)^T\). Then \[Y=\mathbf a^T\mathbf X=\frac{1}{2}(X_1 + X_2)\]
\(E(Y)=\frac{1}{2}(\mu_1 + \mu_2)\).
\(Var(Y) = \frac{1}{4} (\sigma_1^2 + \sigma_2^2 + 2\sigma_{12})= \sum_{i=1}^2 \sum_{j=1}^2 a_i a_j \sigma_{ij}\). Note that \(\sigma_{12}=\sigma_{21}\).
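A small simulation can confirm this formula. The population values below (\(\sigma_1=2\), \(\sigma_2=3\), correlation \(0.4\)) are made up for illustration:

```r
set.seed(1)
n <- 1e5
s1 <- 2; s2 <- 3; rho <- 0.4   # assumed population values (illustration only)
s12 <- rho * s1 * s2            # covariance sigma_12

# Generate correlated (X1, X2) from independent standard normals
Z1 <- rnorm(n); Z2 <- rnorm(n)
X1 <- s1 * Z1
X2 <- s2 * (rho * Z1 + sqrt(1 - rho^2) * Z2)

Y <- (X1 + X2) / 2
c(theory = (s1^2 + s2^2 + 2 * s12) / 4, simulated = var(Y))
```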
Assume we have a random sample from a distribution with mean \(\mu\) and variance \(\sigma^2\), i.e., \(X_1, \cdots, X_n\overset{iid}\sim (\mu, \sigma^2)\).
We often stack the random variables vertically: \[\mathbf X_{n\times 1}=\begin{pmatrix} X_1 \\ \vdots \\ X_n\end{pmatrix}.\]
An equivalent expression, \(\mathbf X=(X_1, \cdots, X_n)^T\).
\[\begin{align*} E[\mathbf X] &=\mu \mathbf 1_n =\begin{pmatrix}\mu \\ \vdots \\ \mu\end{pmatrix}\\ Cov(\mathbf X) &=\sigma^2 \mathbf I_n = \begin{bmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{bmatrix} \end{align*}\]
\[ \mathbf b=\begin{pmatrix} 1/4 \\ 1/4 \\ 1/4 \\ 1/4 \end{pmatrix} \]
\[\begin{aligned} \mathbf Y=\mathbf X\mathbf b &= \begin{pmatrix} \mathbf X_1^T \\ \vdots \\ \mathbf X_n^T \end{pmatrix} \mathbf b = \begin{pmatrix} \mathbf X_1^T\mathbf b \\ \vdots \\ \mathbf X_n^T\mathbf b \end{pmatrix} =\begin{pmatrix} \mathbf b^T\mathbf X_1 \\ \vdots \\ \mathbf b^T\mathbf X_n \end{pmatrix}\\ & =\begin{pmatrix} \frac{X_{11} +X_{12} + X_{13} + X_{14}}{4} \\ \vdots \\ \frac{X_{n1} +X_{n2} + X_{n3} + X_{n4}}{4} \end{pmatrix} \end{aligned} \]
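As a sketch, the row-averaging above can be checked against R's built-in `rowMeans()` on a toy data matrix:

```r
set.seed(42)
# Toy 5 x 4 data matrix standing in for n subjects and 4 protein sources
X <- matrix(rnorm(5 * 4, mean = 20, sd = 2), nrow = 5)
b <- rep(1/4, 4)
Y <- X %*% b                # n x 1: average of the 4 sources for each row
all.equal(as.numeric(Y), rowMeans(X))  # TRUE
```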
\(\sigma^2\) is unknown. But the sample variance \(s^2\) is an unbiased estimator of \(\sigma^2\), i.e., \(E[s^2]=\sigma^2\). We often use \(\hat\sigma^2=s^2\) to estimate \(\sigma^2\).
The sample variance \(s^2\) is defined as \(s^2=\frac{1}{n-1}\sum_{i=1}^n (X_i-\bar X)^2\).
The standard error of \(\bar X\) is defined as \(se(\bar X)=\sqrt{Var(\bar X)}=\frac{\sigma}{\sqrt{n}}\).
We can estimate it by \(\hat{se}(\bar X)=\frac{s}{\sqrt{n}}\).
In many situations, the parameter of interest is a function of the means.
For example, we may be interested in the mean of a linear combination of the means, i.e., \(\mathbf a^T \boldsymbol \mu = \sum_{i=1}^p a_i \mu_i\), where \(\mathbf a=(a_1, \cdots, a_p)^T\) is a \(p\times 1\) vector.
In the following simulated study, we will show how to construct a large-sample confidence interval for \(\mathbf a^T \boldsymbol \mu\).
It is a random vector with
mean vector \(E[\bar{\mathbf X}]=\boldsymbol \mu\), i.e., the sample mean vector is unbiased for the population mean vector. \(\bar{\mathbf X}\) can be used to estimate \(\boldsymbol \mu\).
covariance matrix \(Cov(\bar{\mathbf X}) = \frac{1}{n} \boldsymbol \Sigma\)
The sample covariance matrix is \[\mathbf S = \frac{1}{n-1}\sum_{i=1}^n(\mathbf X_i-\bar{\mathbf X})(\mathbf X_i-\bar{\mathbf X})^T\]
It is unbiased for \(\boldsymbol \Sigma\), i.e., \(E[\mathbf S]=\boldsymbol \Sigma\).
We showed that \[\mathbf S= \frac{1}{n-1} \mathbf X^T \mathbf C \mathbf X \] where \(\mathbf C_{n\times n}=\mathbf I-\frac{1}{n}\mathbf J=\mathbf I-\frac{1}{n}\mathbf 1 \mathbf 1^T\).
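This identity is easy to verify numerically; the sketch below compares \(\frac{1}{n-1}\mathbf X^T\mathbf C\mathbf X\) with R's built-in `cov()` on a toy data matrix:

```r
set.seed(3)
n <- 6; p <- 3
X <- matrix(rnorm(n * p), nrow = n)        # toy n x p data matrix
C <- diag(n) - matrix(1, n, n) / n          # centering matrix I - (1/n) 1 1^T
S_qf  <- t(X) %*% C %*% X / (n - 1)         # S via the quadratic-form identity
S_cov <- cov(X)                             # R's built-in sample covariance
all.equal(S_qf, S_cov, check.attributes = FALSE)  # TRUE
```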
This expression is helpful when we derive the distribution of \(\mathbf S\).
Two basic questions:
How to estimate it?
What is the standard error of the estimator?
We have shown that \(\bar{\mathbf X}\sim (\boldsymbol \mu, \frac{1}{n}\boldsymbol \Sigma)\), which implies that \[\mathbf a^T \bar{\mathbf X} \sim (\mathbf a^T \boldsymbol \mu, \mathbf a^T \frac{1}{n}\boldsymbol \Sigma \mathbf a)\]
An unbiased estimator of \(\mathbf a^T \boldsymbol \mu\) is \(\mathbf a^T \bar{\mathbf X}\), which is a linear combination of the sample mean vector.
The standard error of \(\mathbf a^T \bar{\mathbf X}\) is defined as \(se(\mathbf a^T \bar{\mathbf X})=\sqrt{Var(\mathbf a^T \bar{\mathbf X})}=\sqrt{\mathbf a^T \frac{1}{n}\boldsymbol \Sigma \mathbf a}\).
We can estimate it by \(\hat{se}(\mathbf a^T \bar{\mathbf X})=\sqrt{\mathbf a^T \frac{1}{n}\mathbf S \mathbf a}\).
Confidence intervals can be constructed as \(\mathbf a^T \bar{\mathbf X} \pm z_{\alpha/2}\, se(\mathbf a^T \bar{\mathbf X})\) for a large-sample C.I., and \(\mathbf a^T \bar{\mathbf X} \pm t_{\alpha/2, n-1}\, se(\mathbf a^T \bar{\mathbf X})\) for a small-sample C.I.
We will explain exact and asymptotic distributions of \(\mathbf a^T \bar{\mathbf X}\) later.
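A minimal R sketch of the large-sample interval, using toy data and an arbitrary contrast vector \(\mathbf a\) (both are assumptions for illustration):

```r
set.seed(11)
n <- 200; p <- 3
X <- matrix(rnorm(n * p, mean = 5), nrow = n)  # toy data; true mu = (5, 5, 5)
a <- c(1, -1, 0)                                # contrast mu_1 - mu_2 (illustration)

xbar <- colMeans(X)
S <- cov(X)
est <- sum(a * xbar)                                    # a^T xbar
se  <- as.numeric(sqrt(t(a) %*% (S / n) %*% a))          # sqrt(a^T (S/n) a)
ci  <- est + c(-1, 1) * qnorm(0.975) * se                # approximate 95% C.I.
ci
```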
This is a simulated data set
For adults, the recommended range of daily protein intake is between 0.8 g/kg and 1.8 g/kg of body weight
60 observations
4 sources of proteins
meat
dairy
vegetables / nuts / tofu
other
eigen() decomposition
$values
[1] 8.8 4.0 4.0 4.0
$vectors
[,1] [,2] [,3] [,4]
[1,] -0.5 0.8660254 0.0000000 0.0000000
[2,] -0.5 -0.2886751 -0.5773503 -0.5773503
[3,] -0.5 -0.2886751 -0.2113249 0.7886751
[4,] -0.5 -0.2886751 0.7886751 -0.2113249
meat dairy veg other
[1,] 29.08891 17.54865 5.814221 7.264953
[2,] 23.65965 13.06336 8.734581 9.452868
[3,] 26.43410 16.83504 9.278807 8.409798
[4,] 21.68232 15.51922 3.379171 5.954558
[5,] 22.22387 15.45446 8.804571 7.562144
[6,] 25.54395 16.46835 8.556332 10.299174
[7,] 20.15075 14.71290 10.660378 7.584075
[8,] 25.44330 14.98680 4.866275 6.323171
[9,] 23.41142 16.34138 6.667006 6.164109
[10,] 28.21604 16.64242 5.874860 7.078538
[11,] 22.58127 13.61817 5.178349 5.652878
[12,] 22.19211 16.04745 8.714666 6.732854
[13,] 25.97926 16.80008 7.189986 9.716474
[14,] 25.66703 20.61869 13.775770 9.078236
[15,] 20.16010 16.09623 5.020107 8.049388
[16,] 24.57145 18.88263 6.894722 5.917792
[17,] 23.25621 14.96338 10.680367 7.196099
[18,] 22.60198 16.38243 5.220357 6.195492
[19,] 22.91070 15.01628 7.664447 5.536308
[20,] 22.09802 15.96519 6.882419 7.530778
[21,] 21.65197 16.70303 8.420131 3.772608
[22,] 22.60577 11.60921 9.084612 8.060032
[23,] 25.92991 15.29974 10.415539 3.912418
[24,] 24.31179 20.74766 11.504271 11.239025
[25,] 24.10939 18.66505 3.604761 5.943392
[26,] 24.65994 13.87394 12.148044 5.651083
[27,] 26.07243 12.43699 7.787358 10.627558
[28,] 25.65462 17.71194 11.203610 10.155746
[29,] 25.35010 17.99635 9.018100 6.472298
[30,] 23.84272 16.53127 8.723904 4.422476
[31,] 21.04508 13.96795 5.848427 7.077553
[32,] 26.24455 14.99075 8.126523 7.248014
[33,] 25.43487 15.58447 6.258134 6.422490
[34,] 25.29261 18.33085 5.907034 6.788726
[35,] 28.79099 17.70326 11.313192 6.362595
[36,] 25.58286 15.77977 11.144694 5.954819
[37,] 22.37370 16.52364 8.412213 11.029752
[38,] 23.09505 18.58350 6.704652 7.968698
[39,] 20.24731 15.93243 8.673596 4.620260
[40,] 22.04807 13.03450 6.280768 10.108768
[41,] 23.16952 18.11268 5.688636 10.005273
[42,] 24.44874 16.62458 8.455707 7.974149
[43,] 21.38847 14.99773 6.050585 9.428152
[44,] 23.44805 14.45326 6.170624 8.625406
[45,] 23.88782 19.45005 7.836918 8.911576
[46,] 28.11042 11.39956 9.940819 10.746740
[47,] 24.70061 15.72228 6.630922 6.783132
[48,] 24.43655 17.83334 4.404197 4.766235
[49,] 24.83206 15.88387 8.316900 7.633714
[50,] 25.60672 13.17837 6.049305 5.938031
[51,] 22.30839 14.24987 5.246096 11.833706
[52,] 24.10819 20.38798 4.572755 10.562201
[53,] 25.97482 14.87898 5.463695 7.658656
[54,] 24.54808 17.48112 10.983295 9.687974
[55,] 21.51529 14.44885 6.041175 5.492619
[56,] 20.38223 13.44285 5.149195 5.276100
[57,] 23.99043 15.17236 9.281141 9.734778
[58,] 25.06526 16.68357 6.961285 13.484693
[59,] 24.01093 12.41687 8.165424 8.026655
[60,] 23.89317 14.91410 7.783749 10.210253
Sample mean vector \(\bar{\mathbf X}\) (printed as a \(1\times 4\) matrix):
[,1] [,2] [,3] [,4]
[1,] 24.03403 15.92836 7.66049 7.738634
Sample covariance matrix \(\mathbf S\):
meat dairy veg other
meat 4.2956426 0.8150757 1.1294478 0.5532420
dairy 0.8150757 4.4052993 0.3497889 0.2337300
veg 1.1294478 0.3497889 5.1705794 0.5897121
other 0.5532420 0.2337300 0.5897121 4.5287293
Suppose we only have a random sample and we would like to make inference about the following:
Q1: Construct a large-sample (approximate) C.I. for protein from meat. In other words, the parameter of interest is \(\mu_1\).
Q2: Construct a large-sample C.I. for the total protein intake.
Q3: Construct a large-sample C.I. for the difference in protein intake between meat and vegetables.
Q1: Construct a large-sample (approximate) C.I. for protein from meat. In other words, the parameter of interest is \(\mu_1\).
Estimate: \(\bar X_{(1)}=24.0\).
We need to compute the standard error (s.e.) of \(\bar X_{(1)}\), which is defined as \(se(\bar X_{(1)})=\sqrt{\hat{Var}(\bar X_{(1)})}\).
There are two ways to compute the s.e.:
\(se(\bar X_{(1)})=\sqrt{4.2956/60}=0.27\)
The calculation can also be done by noticing that \(\bar X_{(1)}\) is a linear combination of \(\bar{\mathbf X}\): \(\bar X_{(1)} =\mathbf a^T \bar{\mathbf X}\), where \(\mathbf a^T=(1, 0, 0, 0)\). Thus,
\[\hat{Var}(\bar X_{(1)})=\mathbf a^T \frac{\mathbf S}{60} \mathbf a\]
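The two computations can be sketched in R as follows; since the original simulated data are not reproduced here, a toy stand-in data matrix `X` is generated instead:

```r
set.seed(2026)
# Toy stand-in for the 60 x 4 protein data (illustration only)
X <- matrix(rnorm(60 * 4, mean = 24, sd = 2), nrow = 60)
S <- cov(X); n <- nrow(X)

# Way 1: directly from the first diagonal element of S
se1 <- sqrt(S[1, 1] / n)

# Way 2: as a quadratic form with a = (1, 0, 0, 0)^T
a <- c(1, 0, 0, 0)
se2 <- as.numeric(sqrt(t(a) %*% (S / n) %*% a))

all.equal(se1, se2)  # TRUE
```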
Q2: Construct a large-sample C.I. for the total protein intake.
The parameter of interest is \(\mu_1+\mu_2+\mu_3+\mu_4=\mathbf a^T \boldsymbol \mu\), where \(\mathbf a=(1,1,1,1)^T\).
Estimate: \(\mathbf a^T \bar{\mathbf X}\)
Standard error: \[\sqrt{\mathbf a^T\frac{\mathbf S}{n} \mathbf a}\]
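A sketch of this computation on a toy stand-in data matrix; note that with \(\mathbf a=(1,1,1,1)^T\), the quadratic form \(\mathbf a^T\mathbf S\mathbf a\) is simply the sum of all entries of \(\mathbf S\):

```r
set.seed(8)
# Toy stand-in for the 60 x 4 protein data (illustration only)
X <- matrix(rnorm(60 * 4, mean = 14, sd = 2), nrow = 60)
a <- rep(1, 4)
n <- nrow(X); S <- cov(X)

est <- sum(a * colMeans(X))                       # a^T xbar: estimated total intake
se  <- as.numeric(sqrt(t(a) %*% (S / n) %*% a))

# With a = (1,1,1,1)^T the quadratic form equals the sum of all entries of S
all.equal(as.numeric(t(a) %*% S %*% a), sum(S))   # TRUE

ci <- est + c(-1, 1) * qnorm(0.975) * se          # approximate 95% C.I.
ci
```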
Method 1 (direct):
[1] 0.2675706
Method 2 (quadratic form):
[,1]
[1,] 0.2675706
Both methods give \(se(\bar X_{(1)})\approx 0.27\).
An approximate 95% C.I. for \(\mu_1\) is \(24.0 \pm 1.96\times 0.27\).
Estimate \(\bar X_{(1)}-\bar X_{(3)}\) of \(\mu_1-\mu_3\):
[,1]
[1,] 16.37354
Its standard error:
[,1]
[1,] 0.3465864
An approximate 95% C.I. for \(\mu_1-\mu_3\):
[1] 15.69423 17.05285
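The Q3 pattern with \(\mathbf a=(1, 0, -1, 0)^T\) can be sketched as follows, again on a toy stand-in for the original simulated data:

```r
set.seed(9)
# Toy stand-in for the 60 x 4 protein data (illustration only)
X <- matrix(rnorm(60 * 4, mean = 16, sd = 2), nrow = 60)
a <- c(1, 0, -1, 0)                     # picks out mu_1 - mu_3 (meat minus veg)
n <- nrow(X); S <- cov(X)

est <- sum(a * colMeans(X))                       # a^T xbar
se  <- as.numeric(sqrt(t(a) %*% (S / n) %*% a))   # sqrt(a^T (S/n) a)
ci  <- est + c(-1, 1) * qnorm(0.975) * se         # approximate 95% C.I.
ci
```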