Professor, Department of Statistics
2026-04-07
\[\begin{align*} \boldsymbol \mu & =E[\mathbf X]=\begin{pmatrix} E[X_1]\\ \vdots \\ E[X_p] \end{pmatrix}\\ \boldsymbol \Sigma_{p\times p} &= Cov(\mathbf X) = E[(\mathbf X -\boldsymbol \mu)(\mathbf X - \boldsymbol \mu)^T] \end{align*}\]
\[Y=\mathbf a^T \mathbf X,\] where \(\mathbf a\) is a \(p\times 1\) vector of constants, i.e., \(\mathbf a=(a_1, \cdots, a_p)^T\).
\(Y=\mathbf a^T \mathbf X =\sum_{i}a_i X_i = a_1 X_1 + \cdots + a_p X_p\).
\(Y\) is a random variable, which is a linear combination of the random vector \(\mathbf X\). It is a univariate random variable.
\[a=(1/3, 1/3, 1/3)^T=\begin{pmatrix}1/3 \\ 1/3 \\ 1/3 \end{pmatrix} = \frac{1}{3} \begin{pmatrix}1 \\ 1 \\ 1 \end{pmatrix}= \frac{1}{3}\mathbf 1.\]
The mean of \(Y\) can be expressed as: \[ \begin{aligned} E(Y) &= E(\mathbf{a}^T\mathbf{X}) \\ &= \mathbf{a}^T E(\mathbf{X}) \\ &= \mathbf{a}^T \boldsymbol{\mu} \end{aligned} \]
Intuitively, the mean of \(Y\) is a weighted average of the components of \(\mathbf{X}\), with weights given by the corresponding components of \(\mathbf{a}\).
The variance of \(Y\) depends on the covariance structure of \(\mathbf{X}\), as well as the weights given by \(\mathbf{a}\).
We call forms like \(\mathbf a^T \boldsymbol \Sigma \mathbf a\) quadratic forms.
Note, we can also write the variance of \(Y\) as:
\[ \begin{aligned} \mathbf a^T \boldsymbol \Sigma \mathbf a &= \sum_i\sum_j \sigma_{ij} a_i a_j\\ &= \sigma_{11}a_1^2 + \sigma_{12}a_1 a_2 + \cdots + \sigma_{1p}a_1 a_p + \sigma_{21}a_2 a_1 + \cdots + \sigma_{p1}a_p a_1 + \cdots\\ &= \sigma_{11}a_1^2 + \sigma_{22}a_2^2 + \cdots + \sigma_{pp}a_p^2 + 2(\sigma_{12}a_1 a_2+ \sigma_{13}a_1 a_3 + \cdots + \sigma_{p-1,p}a_{p-1} a_p) \end{aligned}\]
If \(\mathbf X\sim (\boldsymbol \mu, \boldsymbol \Sigma)\), then \(Y=\mathbf a^T\mathbf X\sim (\mathbf a^T\boldsymbol \mu, \mathbf a^T\boldsymbol \Sigma \mathbf a)\).
\(\mathbf a^T \boldsymbol \mu\) is the inner product between \(\mathbf a\) and \(\boldsymbol \mu\), \[\sum_{i=1}^p a_i \mu_i.\]
\(\mathbf a^T \boldsymbol \Sigma \mathbf a\) is a quadratic form: \[\mathbf a^T \boldsymbol \Sigma \mathbf a=\sum_{i=1}^p\sum_{j=1}^p a_i a_j \sigma_{ij},\] where \(\sigma_{ij}\) is the \((i,j)\)th element of \(\boldsymbol \Sigma\).
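As a quick numerical check, the matrix product and the double sum agree. The following R sketch uses a made-up \(\boldsymbol \Sigma\) and \(\mathbf a\), chosen only for illustration:

```r
# Hypothetical 3x3 covariance matrix and weight vector (illustration only)
Sigma <- matrix(c(4,   1,   0.5,
                  1,   3,   0.8,
                  0.5, 0.8, 2), nrow = 3, byrow = TRUE)
a <- c(1/3, 1/3, 1/3)

# Quadratic form via matrix multiplication: a^T Sigma a
qf_matrix <- as.numeric(t(a) %*% Sigma %*% a)

# The same quantity via the double sum  sum_i sum_j a_i a_j sigma_ij
p <- length(a)
qf_sum <- 0
for (i in 1:p) for (j in 1:p) qf_sum <- qf_sum + a[i] * a[j] * Sigma[i, j]

all.equal(qf_matrix, qf_sum)  # TRUE
```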
Recall that if \(\mathbf X_1, \cdots, \mathbf X_n\overset{iid}\sim (\boldsymbol \mu, \boldsymbol \Sigma)\), then
We can estimate its mean \(\mathbf a^T \boldsymbol \mu\) by \(\mathbf a^T \bar{\mathbf X}\), which is a linear combination of the sample mean vector.
We can estimate its variance, which is \(\mathbf a^T \boldsymbol \Sigma \mathbf a\), by \(\mathbf a^T \mathbf S \mathbf a\), which is a quadratic form of the sample covariance matrix.
Let \(\mathbf a=(1/2, 1/2)^T\). Then \[Y=\mathbf a^T\mathbf X=\frac{1}{2}(X_1 + X_2)\]
\(E(Y)=\frac{1}{2}(\mu_1 + \mu_2)\).
\(Var(Y) = \frac{1}{4} (\sigma_1^2 + \sigma_2^2 + 2\sigma_{12})= \sum_{i=1}^2 \sum_{j=1}^2 a_i a_j \sigma_{ij}\). Note that \(\sigma_{12}=\sigma_{21}\).
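A small simulation can confirm this formula. The population values below (\(\sigma_1=2\), \(\sigma_2=3\), correlation \(0.4\)) are made up for illustration:

```r
set.seed(1)
n <- 1e5
s1 <- 2; s2 <- 3; rho <- 0.4   # assumed population values (illustration only)
s12 <- rho * s1 * s2            # covariance sigma_12

# Generate correlated (X1, X2) from independent standard normals
Z1 <- rnorm(n); Z2 <- rnorm(n)
X1 <- s1 * Z1
X2 <- s2 * (rho * Z1 + sqrt(1 - rho^2) * Z2)

Y <- (X1 + X2) / 2
c(theory = (s1^2 + s2^2 + 2 * s12) / 4, simulated = var(Y))
```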
Assume we have a random sample from a distribution with mean \(\mu\) and variance \(\sigma^2\), i.e., \(X_1, \cdots, X_n\overset{iid}\sim (\mu, \sigma^2)\).
We often stack the random variables vertically: \[\mathbf X_{n\times 1}=\begin{pmatrix} X_1 \\ \vdots \\ X_n\end{pmatrix}.\]
An equivalent expression, \(\mathbf X=(X_1, \cdots, X_n)^T\).
\[\begin{align*} E[\mathbf X] &=\mu \mathbf 1_n =\begin{pmatrix}\mu \\ \vdots \\ \mu\end{pmatrix}\\ Cov(\mathbf X) &=\sigma^2 \mathbf I_n = \begin{bmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{bmatrix} \end{align*}\]
\[ \mathbf b=\begin{pmatrix} 1/4 \\ 1/4 \\ 1/4 \\ 1/4 \end{pmatrix} \]
\[\begin{aligned} \mathbf Y=\mathbf X\mathbf b &= \begin{pmatrix} \mathbf X_1^T \\ \vdots \\ \mathbf X_n^T \end{pmatrix} \mathbf b = \begin{pmatrix} \mathbf X_1^T\mathbf b \\ \vdots \\ \mathbf X_n^T\mathbf b \end{pmatrix} =\begin{pmatrix} \mathbf b^T\mathbf X_1 \\ \vdots \\ \mathbf b^T\mathbf X_n \end{pmatrix}\\ & =\begin{pmatrix} \frac{X_{11} +X_{12} + X_{13} + X_{14}}{4} \\ \vdots \\ \frac{X_{n1} +X_{n2} + X_{n3} + X_{n4}}{4} \end{pmatrix} \end{aligned} \]
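As a sketch, the row-averaging above can be checked against R's built-in `rowMeans()` on a toy data matrix:

```r
set.seed(42)
# Toy 5 x 4 data matrix standing in for n subjects and 4 protein sources
X <- matrix(rnorm(5 * 4, mean = 20, sd = 2), nrow = 5)
b <- rep(1/4, 4)
Y <- X %*% b                # n x 1: average of the 4 sources for each row
all.equal(as.numeric(Y), rowMeans(X))  # TRUE
```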
\(\sigma^2\) is unknown. But the sample variance \(s^2\) is an unbiased estimator of \(\sigma^2\), i.e., \(E[s^2]=\sigma^2\). We often use \(\hat\sigma^2=s^2\) to estimate \(\sigma^2\).
The sample variance \(s^2\) is defined as \(s^2=\frac{1}{n-1}\sum_{i=1}^n (X_i-\bar X)^2\).
The standard error of \(\bar X\) is defined as \(se(\bar X)=\sqrt{Var(\bar X)}=\frac{\sigma}{\sqrt{n}}\).
We can estimate it by \(\hat{se}(\bar X)=\frac{s}{\sqrt{n}}\).
In many situations, the parameter of interest is a function of the means.
For example, we may be interested in the mean of a linear combination of the means, i.e., \(\mathbf a^T \boldsymbol \mu = \sum_{i=1}^p a_i \mu_i\), where \(\mathbf a=(a_1, \cdots, a_p)^T\) is a \(p\times 1\) vector.
In the following simulated study, we will show how to construct a large-sample confidence interval for \(\mathbf a^T \boldsymbol \mu\).
It is a random vector with
mean vector \(E[\bar{\mathbf X}]=\boldsymbol \mu\), i.e., the sample mean vector is unbiased for the population mean vector. \(\bar{\mathbf X}\) can be used to estimate \(\boldsymbol \mu\).
covariance matrix \(Cov(\bar{\mathbf X}) = \frac{1}{n} \boldsymbol \Sigma\)
The sample covariance matrix is \[\mathbf S = \frac{1}{n-1}\sum_{i=1}^n(\mathbf X_i-\bar{\mathbf X})(\mathbf X_i-\bar{\mathbf X})^T\]
It is unbiased for \(\boldsymbol \Sigma\), i.e., \(E[\mathbf S]=\boldsymbol \Sigma\).
We showed that \[\mathbf S= \frac{1}{n-1} \mathbf X^T \mathbf C \mathbf X \] where \(\mathbf C_{n\times n}=\mathbf I-\frac{1}{n}\mathbf J=\mathbf I-\frac{1}{n}\mathbf 1 \mathbf 1^T\).
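This identity is easy to verify numerically; the sketch below compares \(\frac{1}{n-1}\mathbf X^T\mathbf C\mathbf X\) with R's built-in `cov()` on a toy data matrix:

```r
set.seed(3)
n <- 6; p <- 3
X <- matrix(rnorm(n * p), nrow = n)        # toy n x p data matrix
C <- diag(n) - matrix(1, n, n) / n          # centering matrix I - (1/n) 1 1^T
S_qf  <- t(X) %*% C %*% X / (n - 1)         # S via the quadratic-form identity
S_cov <- cov(X)                             # R's built-in sample covariance
all.equal(S_qf, S_cov, check.attributes = FALSE)  # TRUE
```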
This expression is helpful when we derive the distribution of \(\mathbf S\).
Two basic questions:
How to estimate it?
What is the standard error of the estimator?
We have shown that \(\bar{\mathbf X}\sim (\boldsymbol \mu, \frac{1}{n}\boldsymbol \Sigma)\), which implies that \[\mathbf a^T \bar{\mathbf X} \sim (\mathbf a^T \boldsymbol \mu, \mathbf a^T \frac{1}{n}\boldsymbol \Sigma \mathbf a)\]
An unbiased estimator of \(\mathbf a^T \boldsymbol \mu\) is \(\mathbf a^T \bar{\mathbf X}\), which is a linear combination of the sample mean vector.
The standard error of \(\mathbf a^T \bar{\mathbf X}\) is defined as \(se(\mathbf a^T \bar{\mathbf X})=\sqrt{Var(\mathbf a^T \bar{\mathbf X})}=\sqrt{\mathbf a^T \frac{1}{n}\boldsymbol \Sigma \mathbf a}\).
We can estimate it by \(\hat{se}(\mathbf a^T \bar{\mathbf X})=\sqrt{\mathbf a^T \frac{1}{n}\mathbf S \mathbf a}\).
Confidence intervals can be constructed as \(\mathbf a^T \bar{\mathbf X} \pm z_{\alpha/2}\, se(\mathbf a^T \bar{\mathbf X})\) for a large-sample C.I., and \(\mathbf a^T \bar{\mathbf X} \pm t_{\alpha/2, n-1}\, se(\mathbf a^T \bar{\mathbf X})\) for a small-sample C.I.
We will explain exact and asymptotic distributions of \(\mathbf a^T \bar{\mathbf X}\) later.
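A minimal R sketch of the large-sample interval, using toy data and an arbitrary contrast vector \(\mathbf a\) (both are assumptions for illustration):

```r
set.seed(11)
n <- 200; p <- 3
X <- matrix(rnorm(n * p, mean = 5), nrow = n)  # toy data; true mu = (5, 5, 5)
a <- c(1, -1, 0)                                # contrast mu_1 - mu_2 (illustration)

xbar <- colMeans(X)
S <- cov(X)
est <- sum(a * xbar)                                    # a^T xbar
se  <- as.numeric(sqrt(t(a) %*% (S / n) %*% a))          # sqrt(a^T (S/n) a)
ci  <- est + c(-1, 1) * qnorm(0.975) * se                # approximate 95% C.I.
ci
```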
This is a simulated data set
For adults, the recommended range of daily protein intake is between 0.8 g/kg and 1.8 g/kg of body weight
60 observations
4 sources of proteins
meat
dairy
vegetables / nuts / tofu
other
eigen() decomposition
$values
[1] 8.8 4.0 4.0 4.0
$vectors
[,1] [,2] [,3] [,4]
[1,] -0.5 0.8660254 0.0000000 0.0000000
[2,] -0.5 -0.2886751 -0.5773503 -0.5773503
[3,] -0.5 -0.2886751 -0.2113249 0.7886751
[4,] -0.5 -0.2886751 0.7886751 -0.2113249
meat dairy veg other
[1,] 29.08891 17.54865 5.814221 7.264953
[2,] 23.65965 13.06336 8.734581 9.452868
[3,] 26.43410 16.83504 9.278807 8.409798
[4,] 21.68232 15.51922 3.379171 5.954558
[5,] 22.22387 15.45446 8.804571 7.562144
[6,] 25.54395 16.46835 8.556332 10.299174
[7,] 20.15075 14.71290 10.660378 7.584075
[8,] 25.44330 14.98680 4.866275 6.323171
[9,] 23.41142 16.34138 6.667006 6.164109
[10,] 28.21604 16.64242 5.874860 7.078538
[11,] 22.58127 13.61817 5.178349 5.652878
[12,] 22.19211 16.04745 8.714666 6.732854
[13,] 25.97926 16.80008 7.189986 9.716474
[14,] 25.66703 20.61869 13.775770 9.078236
[15,] 20.16010 16.09623 5.020107 8.049388
[16,] 24.57145 18.88263 6.894722 5.917792
[17,] 23.25621 14.96338 10.680367 7.196099
[18,] 22.60198 16.38243 5.220357 6.195492
[19,] 22.91070 15.01628 7.664447 5.536308
[20,] 22.09802 15.96519 6.882419 7.530778
[21,] 21.65197 16.70303 8.420131 3.772608
[22,] 22.60577 11.60921 9.084612 8.060032
[23,] 25.92991 15.29974 10.415539 3.912418
[24,] 24.31179 20.74766 11.504271 11.239025
[25,] 24.10939 18.66505 3.604761 5.943392
[26,] 24.65994 13.87394 12.148044 5.651083
[27,] 26.07243 12.43699 7.787358 10.627558
[28,] 25.65462 17.71194 11.203610 10.155746
[29,] 25.35010 17.99635 9.018100 6.472298
[30,] 23.84272 16.53127 8.723904 4.422476
[31,] 21.04508 13.96795 5.848427 7.077553
[32,] 26.24455 14.99075 8.126523 7.248014
[33,] 25.43487 15.58447 6.258134 6.422490
[34,] 25.29261 18.33085 5.907034 6.788726
[35,] 28.79099 17.70326 11.313192 6.362595
[36,] 25.58286 15.77977 11.144694 5.954819
[37,] 22.37370 16.52364 8.412213 11.029752
[38,] 23.09505 18.58350 6.704652 7.968698
[39,] 20.24731 15.93243 8.673596 4.620260
[40,] 22.04807 13.03450 6.280768 10.108768
[41,] 23.16952 18.11268 5.688636 10.005273
[42,] 24.44874 16.62458 8.455707 7.974149
[43,] 21.38847 14.99773 6.050585 9.428152
[44,] 23.44805 14.45326 6.170624 8.625406
[45,] 23.88782 19.45005 7.836918 8.911576
[46,] 28.11042 11.39956 9.940819 10.746740
[47,] 24.70061 15.72228 6.630922 6.783132
[48,] 24.43655 17.83334 4.404197 4.766235
[49,] 24.83206 15.88387 8.316900 7.633714
[50,] 25.60672 13.17837 6.049305 5.938031
[51,] 22.30839 14.24987 5.246096 11.833706
[52,] 24.10819 20.38798 4.572755 10.562201
[53,] 25.97482 14.87898 5.463695 7.658656
[54,] 24.54808 17.48112 10.983295 9.687974
[55,] 21.51529 14.44885 6.041175 5.492619
[56,] 20.38223 13.44285 5.149195 5.276100
[57,] 23.99043 15.17236 9.281141 9.734778
[58,] 25.06526 16.68357 6.961285 13.484693
[59,] 24.01093 12.41687 8.165424 8.026655
[60,] 23.89317 14.91410 7.783749 10.210253
Sample mean vector \(\bar{\mathbf X}\) (printed as a \(1\times 4\) matrix):
[,1] [,2] [,3] [,4]
[1,] 24.03403 15.92836 7.66049 7.738634
Sample covariance matrix \(\mathbf S\):
meat dairy veg other
meat 4.2956426 0.8150757 1.1294478 0.5532420
dairy 0.8150757 4.4052993 0.3497889 0.2337300
veg 1.1294478 0.3497889 5.1705794 0.5897121
other 0.5532420 0.2337300 0.5897121 4.5287293
Suppose we only have a random sample and we would like to make inference about the following:
Q1: Construct a large-sample (approximate) C.I. for protein from meat. In other words, the parameter of interest is \(\mu_1\).
Q2: Construct a large-sample C.I. for the total protein intake.
Q3: Construct a large-sample C.I. for the difference in protein intake between meat and vegetables.
Q1: Construct a large-sample (approximate) C.I. for protein from meat. In other words, the parameter of interest is \(\mu_1\).
Estimate: \(\bar X_{(1)}=24.0\).
We need to compute the standard error (s.e.) of \(\bar X_{(1)}\), which is defined as \(se(\bar X_{(1)})=\sqrt{\hat{Var}(\bar X_{(1)})}\).
There are two ways to compute the s.e.:
\(se(\bar X_{(1)})=\sqrt{4.2956/60}=0.27\)
The calculation can also be done by noticing that \(\bar X_{(1)}\) is a linear combination of \(\bar{\mathbf X}\): \(\bar X_{(1)} =\mathbf a^T \bar{\mathbf X}\), where \(\mathbf a^T=(1, 0, 0, 0)\). Thus,
\[\hat{Var}(\bar X_{(1)})=\mathbf a^T \frac{\mathbf S}{60} \mathbf a\]
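The two computations can be sketched in R as follows; since the original simulated data are not reproduced here, a toy stand-in data matrix `X` is generated instead:

```r
set.seed(2026)
# Toy stand-in for the 60 x 4 protein data (illustration only)
X <- matrix(rnorm(60 * 4, mean = 24, sd = 2), nrow = 60)
S <- cov(X); n <- nrow(X)

# Way 1: directly from the first diagonal element of S
se1 <- sqrt(S[1, 1] / n)

# Way 2: as a quadratic form with a = (1, 0, 0, 0)^T
a <- c(1, 0, 0, 0)
se2 <- as.numeric(sqrt(t(a) %*% (S / n) %*% a))

all.equal(se1, se2)  # TRUE
```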
Q2: Construct a large-sample C.I. for the total protein intake.
The parameter of interest is \(\mu_1+\mu_2+\mu_3+\mu_4=\mathbf a^T \boldsymbol \mu\), where \(\mathbf a=(1,1,1,1)^T\).
Estimate: \(\mathbf a^T \bar{\mathbf X}\)
Standard error: \[\sqrt{\mathbf a^T\frac{\mathbf S}{n} \mathbf a}\]
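A sketch of this computation on a toy stand-in data matrix; note that with \(\mathbf a=(1,1,1,1)^T\), the quadratic form \(\mathbf a^T\mathbf S\mathbf a\) is simply the sum of all entries of \(\mathbf S\):

```r
set.seed(8)
# Toy stand-in for the 60 x 4 protein data (illustration only)
X <- matrix(rnorm(60 * 4, mean = 14, sd = 2), nrow = 60)
a <- rep(1, 4)
n <- nrow(X); S <- cov(X)

est <- sum(a * colMeans(X))                       # a^T xbar: estimated total intake
se  <- as.numeric(sqrt(t(a) %*% (S / n) %*% a))

# With a = (1,1,1,1)^T the quadratic form equals the sum of all entries of S
all.equal(as.numeric(t(a) %*% S %*% a), sum(S))   # TRUE

ci <- est + c(-1, 1) * qnorm(0.975) * se          # approximate 95% C.I.
ci
```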
Method 1 (direct):
[1] 0.2675706
Method 2 (quadratic form):
[,1]
[1,] 0.2675706
Both methods give \(se(\bar X_{(1)})\approx 0.27\).
An approximate 95% C.I. for \(\mu_1\) is \(24.0 \pm 1.96\times 0.27\).
Estimate \(\bar X_{(1)}-\bar X_{(3)}\) of \(\mu_1-\mu_3\):
[,1]
[1,] 16.37354
Its standard error:
[,1]
[1,] 0.3465864
An approximate 95% C.I. for \(\mu_1-\mu_3\):
[1] 15.69423 17.05285
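The Q3 pattern with \(\mathbf a=(1, 0, -1, 0)^T\) can be sketched as follows, again on a toy stand-in for the original simulated data:

```r
set.seed(9)
# Toy stand-in for the 60 x 4 protein data (illustration only)
X <- matrix(rnorm(60 * 4, mean = 16, sd = 2), nrow = 60)
a <- c(1, 0, -1, 0)                     # picks out mu_1 - mu_3 (meat minus veg)
n <- nrow(X); S <- cov(X)

est <- sum(a * colMeans(X))                       # a^T xbar
se  <- as.numeric(sqrt(t(a) %*% (S / n) %*% a))   # sqrt(a^T (S/n) a)
ci  <- est + c(-1, 1) * qnorm(0.975) * se         # approximate 95% C.I.
ci
```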