STAT200C Assignment 2
- Problem 1. The purpose of this problem is to explore the distribution of the trace of the sample covariance matrix. Suppose \(Y\sim \operatorname{Wishart}(n-1, \Sigma)\). What is the distribution of \(\operatorname{tr}(Y)\)?
Set up:
Choose the sample size \(n\) to be 50 or 100.
Set the number of simulations to \(B=1000\).
Hint: Use the function `mvrnorm` from the R `MASS` library to generate multivariate normal data.
Hint: Ensure your covariance matrix \(\Sigma\) is positive definite. You can check this by computing the eigenvalues using the `eigen` function in R and confirming that all eigenvalues are positive.
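The two hints can be combined into a short R sketch. The particular \(\Sigma\), the seed, and the variable names below are illustrative assumptions, not values prescribed by the assignment.

```r
library(MASS)  # provides mvrnorm()

set.seed(1)    # arbitrary seed, used only for reproducibility
n <- 50

# Illustrative covariance matrix (an assumption; choose your own)
Sigma <- matrix(c(2.0, 0.5, 0.3,
                  0.5, 1.0, 0.2,
                  0.3, 0.2, 1.5), nrow = 3, byrow = TRUE)

# A symmetric matrix is positive definite if and only if
# all of its eigenvalues are strictly positive
ev    <- eigen(Sigma, symmetric = TRUE)$values
is_pd <- all(ev > 0)

# n draws from N(0, Sigma); the result is an n x 3 matrix
X <- mvrnorm(n = n, mu = rep(0, 3), Sigma = Sigma)

# Sample covariance matrix and its trace
S    <- cov(X)
tr_S <- sum(diag(S))
```

Setting `symmetric = TRUE` tells `eigen` to treat the input as symmetric, which guarantees real eigenvalues.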
1.1 The case of the identity covariance matrix \(\Sigma = I_3\).
Generate \(B=1000\) datasets, each with \(n\) observations from \(N(\mathbf 0, \mathbf I_3)\).
For each dataset, calculate the sample covariance matrix \(\mathbf S\) and record \((n-1)\operatorname{tr}(\mathbf S)\).
Guess: Based on the properties of the Wishart distribution, what is the exact distribution of \((n-1) \operatorname{tr}(\mathbf S)\) when \(\Sigma=\mathbf I_3\)?
Check: Plot a histogram of your 1,000 recorded values of \((n-1) \operatorname{tr}(\mathbf S)\). Overlay the theoretical density from your guess in the previous step. Does the empirical distribution match your guess?
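One possible way to organize the simulation in 1.1 is sketched below. It assumes \(n=50\) and an arbitrary seed; the name `stat` is a placeholder, and the theoretical overlay is deliberately left to you.

```r
library(MASS)  # provides mvrnorm()

set.seed(200)  # arbitrary seed, used only for reproducibility
n <- 50        # sample size (50 or 100)
B <- 1000      # number of simulated datasets
p <- 3         # dimension

# For each of the B datasets, draw n observations from N(0, I_3)
# and record (n - 1) * tr(S)
stat <- replicate(B, {
  X <- mvrnorm(n = n, mu = rep(0, p), Sigma = diag(p))
  (n - 1) * sum(diag(cov(X)))
})

# freq = FALSE puts the histogram on the density scale,
# so that a density curve can be overlaid on it
hist(stat, breaks = 30, freq = FALSE,
     main = "Simulated (n - 1) tr(S), Sigma = I_3",
     xlab = "(n - 1) tr(S)")
```

Once you have a candidate density, it can be added to the open plot with `curve(..., add = TRUE)`.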
1.2 The general case.
Choose a \(3\times 3\) covariance matrix \(\Sigma\) with non-zero covariances (the off-diagonal elements should not be 0).
Make sure that your \(\Sigma\) is a valid covariance matrix (symmetric and positive definite), and use `pairs()` to draw scatter plots of one simulated dataset to visualize the correlations.
Generate \(B=1000\) datasets using this \(\Sigma\) and record the trace of each sample covariance matrix \(\mathbf S\). Summarize the traces.
In the general case, \((n-1)\operatorname{tr}(\mathbf S)\) does not follow a simple chi-squared distribution. Instead, it is distributed as a weighted sum of independent chi-squared variables, with weights given by the eigenvalues of \(\Sigma\). Calculate the average of your 1,000 simulated traces. Compare this average to \(\operatorname{tr}(\Sigma)\).
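The weighted-sum representation can be sketched as follows. The notation \(\lambda_i\), \(Z_k\), and \(W_i\) is introduced here for illustration only: \(\lambda_1,\lambda_2,\lambda_3\) denote the eigenvalues of \(\Sigma\), and \(Y=(n-1)\mathbf S\) is written as a sum of \(n-1\) independent outer products.

\[
Y=\sum_{k=1}^{n-1} Z_k Z_k^\top,\quad Z_k \overset{\text{iid}}{\sim} N(\mathbf 0,\Sigma)
\quad\Longrightarrow\quad
\operatorname{tr}(Y)=\sum_{k=1}^{n-1} Z_k^\top Z_k \overset{d}{=} \sum_{i=1}^{3}\lambda_i W_i,\quad W_i \overset{\text{iid}}{\sim}\chi^2_{n-1}.
\]

Taking expectations, and using \(E[W_i]=n-1\) and \(\operatorname{Var}(W_i)=2(n-1)\),

\[
E[\operatorname{tr}(\mathbf S)]=\frac{1}{n-1}\sum_{i=1}^{3}\lambda_i (n-1)=\operatorname{tr}(\Sigma),
\qquad
\operatorname{Var}[\operatorname{tr}(\mathbf S)]=\frac{2}{n-1}\sum_{i=1}^{3}\lambda_i^2 .
\]

The average of the simulated traces should therefore be close to \(\operatorname{tr}(\Sigma)\), with the spread shrinking as \(n\) grows.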
- Problem 2. Find a good data example on which to conduct a two-sample Hotelling’s \(T^2\) test. Do not use the data example discussed in this course. Please (1) include visualizations as exploratory methods and (2) state your conclusion in the context of the data example.
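As a rough guide to the mechanics, the two-sample statistic can be computed by hand in base R. The sketch below uses simulated placeholder data (`X1`, `X2`), which must be replaced with the two groups from your own dataset. It assumes multivariate normality and a common covariance matrix in both groups.

```r
set.seed(2)  # arbitrary seed, used only for reproducibility
p  <- 2      # number of variables
n1 <- 30     # size of group 1
n2 <- 35     # size of group 2

# Placeholder data (an assumption): replace with two groups from a real dataset
X1 <- matrix(rnorm(n1 * p), n1, p)
X2 <- matrix(rnorm(n2 * p, mean = 0.5), n2, p)

# Exploratory view: scatter plot matrix colored by group
pairs(rbind(X1, X2), col = rep(c("black", "red"), c(n1, n2)))

# Difference of sample mean vectors and pooled sample covariance matrix
d   <- colMeans(X1) - colMeans(X2)
S_p <- ((n1 - 1) * cov(X1) + (n2 - 1) * cov(X2)) / (n1 + n2 - 2)

# Hotelling's T^2 = (n1 * n2 / (n1 + n2)) * d' S_p^{-1} d
T2 <- (n1 * n2 / (n1 + n2)) * drop(t(d) %*% solve(S_p, d))

# Under H0, the scaled statistic follows F with (p, n1 + n2 - p - 1) df
F_stat <- (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * T2
p_val  <- pf(F_stat, df1 = p, df2 = n1 + n2 - p - 1, lower.tail = FALSE)
```

A small `p_val` is evidence against equal mean vectors. The conclusion should still be stated in terms of the variables and groups in your data.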