The Everitt data contain exam scores on six different subjects for 220 students.
\(X_1\) = French score
\(X_2\) = English score
\(X_3\) = History score
\(X_4\) = Arithmetic score
\(X_5\) = Algebra score
\(X_6\) = Geometry score
First, we read in the correlation matrix of the six scores.
# correlation matrix
everitt <- matrix(c(1, .44, .41, .29, .33, .25,
.44, 1, .35, .35, .32, .33,
.41, .35, 1, .16, .19, .18,
.29, .35, .16, 1, .59, .47,
.33, .32, .19, .59, 1, .46,
.25, .33, .18, .47, .46, 1), nrow=6)
everitt
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1.00 0.44 0.41 0.29 0.33 0.25
## [2,] 0.44 1.00 0.35 0.35 0.32 0.33
## [3,] 0.41 0.35 1.00 0.16 0.19 0.18
## [4,] 0.29 0.35 0.16 1.00 0.59 0.47
## [5,] 0.33 0.32 0.19 0.59 1.00 0.46
## [6,] 0.25 0.33 0.18 0.47 0.46 1.00
We use the princomp() function to perform PCA. This function can find principal components even when the full data matrix is unavailable and only the covariance or correlation matrix is given.
pca.everitt <- princomp(covmat = everitt)
summary(pca.everitt, loadings = TRUE)
## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Standard deviation 1.6518727 1.0624463 0.7844052 0.7764077 0.72285155
## Proportion of Variance 0.4547806 0.1881320 0.1025486 0.1004681 0.08708573
## Cumulative Proportion 0.4547806 0.6429126 0.7454612 0.8459293 0.93301505
## Comp.6
## Standard deviation 0.63396346
## Proportion of Variance 0.06698495
## Cumulative Proportion 1.00000000
##
## Loadings:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## [1,] 0.400 0.418 0.207 0.455 0.629 0.138
## [2,] 0.417 0.273 0.677 -0.347 -0.383 -0.160
## [3,] 0.313 0.602 -0.662 -0.145 -0.283
## [4,] 0.445 -0.393 0.233 -0.336 0.693
## [5,] 0.449 -0.351 -0.169 0.394 -0.130 -0.689
## [6,] 0.411 -0.333 -0.176 -0.664 0.498
# The coefficients (eigenvectors) are displayed under 'Loadings'
# Each column vector is the vector of coefficients for the corresponding principal component (PC)
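Since princomp(covmat = ) amounts to an eigendecomposition of the supplied matrix, the same quantities can be recovered with base R's eigen(). This cross-check is a sketch, not part of the original analysis; eigenvector signs are arbitrary, so absolute values are compared.

```r
# Sketch: princomp(covmat = ) agrees with eigen() on the correlation matrix
everitt <- matrix(c(1, .44, .41, .29, .33, .25,
                    .44, 1, .35, .35, .32, .33,
                    .41, .35, 1, .16, .19, .18,
                    .29, .35, .16, 1, .59, .47,
                    .33, .32, .19, .59, 1, .46,
                    .25, .33, .18, .47, .46, 1), nrow = 6)
pca <- princomp(covmat = everitt)
e   <- eigen(everitt, symmetric = TRUE)

# component standard deviations are the square roots of the eigenvalues
all.equal(unname(pca$sdev), sqrt(e$values))            # TRUE
# loadings match the eigenvectors up to sign
max(abs(abs(unclass(pca$loadings)) - abs(e$vectors)))  # effectively 0
```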
Below are the obtained eigenvectors corresponding to each principal component.
(A <- loadings(pca.everitt))
##
## Loadings:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## [1,] 0.400 0.418 0.207 0.455 0.629 0.138
## [2,] 0.417 0.273 0.677 -0.347 -0.383 -0.160
## [3,] 0.313 0.602 -0.662 -0.145 -0.283
## [4,] 0.445 -0.393 0.233 -0.336 0.693
## [5,] 0.449 -0.351 -0.169 0.394 -0.130 -0.689
## [6,] 0.411 -0.333 -0.176 -0.664 0.498
##
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## SS loadings 1.000 1.000 1.000 1.000 1.000 1.000
## Proportion Var 0.167 0.167 0.167 0.167 0.167 0.167
## Cumulative Var 0.167 0.333 0.500 0.667 0.833 1.000
# SS loadings here are just the sums of squared coefficients for each component, which are always 1 (not informative by themselves)
# Both signs & relative magnitudes of weights are important in interpreting the PCs
# PC1 can be interpreted as the general intelligence/ability
# PC2 contrasts the first three exams with the last three
# Children at the high end of PC2 perform disproportionately better on the first three exams (literacy-related) and worse on the last three (math-related)
So we can express each principal component (PC) as follows (italicized coefficients are small values suppressed in the printed loadings):
\[ PC_1 = 0.400 X_1 + 0.417 X_2 + 0.313 X_3 + 0.445 X_4 + 0.449 X_5 + 0.411 X_6\\ PC_2 = 0.418 X_1 + 0.273 X_2 + 0.602 X_3 - 0.393 X_4 - 0.351 X_5 - 0.333 X_6\\ PC_3 = 0.207 X_1 + 0.677 X_2 - 0.662 X_3 - \textit{0.023} X_4 - 0.169 X_5 - 0.176 X_6\\ PC_4 = 0.455 X_1 - 0.347 X_2 - 0.145 X_3 + 0.233 X_4 + 0.394 X_5 - 0.664 X_6\\ PC_5 = 0.629 X_1 - 0.383 X_2 - 0.283 X_3 - 0.336 X_4 - 0.130 X_5 + 0.498 X_6\\ PC_6 = 0.138 X_1 - 0.160 X_2 + \textit{0.030} X_3 + 0.693 X_4 - 0.689 X_5 + \textit{0.007} X_6\\ \]
The variance of the PC scores can be calculated as \(var(PC_i)=\mathbf{a_i^TSa_i}\), which equals the eigenvalue \(\lambda_i\).
(lambda <- pca.everitt$sdev^2) # eigenvalues (correspond to the variance of each PC)
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## 2.7286835 1.1287922 0.6152914 0.6028089 0.5225144 0.4019097
# The first PC alone accounts for about 2.7 times the variance of a single standardized variable
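The identity \(var(PC_i)=\mathbf{a_i^TSa_i}=\lambda_i\) can be checked numerically. A minimal sketch, recreating the objects from above:

```r
# Sketch: verify that a1' S a1 reproduces the first eigenvalue
everitt <- matrix(c(1, .44, .41, .29, .33, .25,
                    .44, 1, .35, .35, .32, .33,
                    .41, .35, 1, .16, .19, .18,
                    .29, .35, .16, 1, .59, .47,
                    .33, .32, .19, .59, 1, .46,
                    .25, .33, .18, .47, .46, 1), nrow = 6)
pca <- princomp(covmat = everitt)

a1      <- pca$loadings[, 1]  # coefficient (eigen)vector of PC1
lambda1 <- pca$sdev[1]^2      # variance of PC1

quad <- drop(t(a1) %*% everitt %*% a1)  # quadratic form a1' S a1
all.equal(quad, unname(lambda1))        # TRUE
```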
The total variance of the PCs is the sum of the eigenvalues \((\Sigma_{i=1}^p \lambda_i)\), which equals the total variance of the original variables \((trace(\mathbf{S}))\).
sum(lambda) # total variance of PCs = total number of variables (correlation matrix)
## [1] 6
The proportion of the variance explained by the principal components can be calculated by dividing the variance of each component with the total variance. \[\text{Proportion of variance explained by } PC_i =\frac{\lambda_i}{\Sigma_{i=1}^p \lambda_i}=\frac{\lambda_i}{trace(\mathbf{S})}\]
prop <- lambda/sum(lambda) # proportion of variance explained by each PC
prop
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## 0.45478058 0.18813203 0.10254857 0.10046814 0.08708573 0.06698495
summary(pca.everitt)
## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Standard deviation 1.6518727 1.0624463 0.7844052 0.7764077 0.72285155
## Proportion of Variance 0.4547806 0.1881320 0.1025486 0.1004681 0.08708573
## Cumulative Proportion 0.4547806 0.6429126 0.7454612 0.8459293 0.93301505
## Comp.6
## Standard deviation 0.63396346
## Proportion of Variance 0.06698495
## Cumulative Proportion 1.00000000
screeplot(pca.everitt) # suggests extracting 2 components
#or
plot(lambda, xlab="Component number", ylab="Component variance (eigenvalue)", type="b", pch=16, main="Scree plot")
Let’s say we hope to extract PCs until 80% of the total variance is explained.
cum.prop <- unname(cumsum(prop)) # cumulative proportion of variance
cum.prop
## [1] 0.4547806 0.6429126 0.7454612 0.8459293 0.9330151 1.0000000
summary(pca.everitt) # with the 80% criterion, extract 4 components (cumulative proportion 0.846)
## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Standard deviation 1.6518727 1.0624463 0.7844052 0.7764077 0.72285155
## Proportion of Variance 0.4547806 0.1881320 0.1025486 0.1004681 0.08708573
## Cumulative Proportion 0.4547806 0.6429126 0.7454612 0.8459293 0.93301505
## Comp.6
## Standard deviation 0.63396346
## Proportion of Variance 0.06698495
## Cumulative Proportion 1.00000000
average <- sum(lambda)/length(lambda) # equals 1 because we analyzed the correlation matrix
lambda > average # suggests extracting 2 components
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## TRUE TRUE FALSE FALSE FALSE FALSE
screeplot(pca.everitt); abline(h=1, col="red")
## The methods for deciding how many components to retain can often lead to different conclusions
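The retention rules used above (explain at least 80% of the variance; keep components whose eigenvalue exceeds the average) can also be written out directly. This is a sketch, not part of the original code:

```r
# Sketch: apply the 80% rule and the average-eigenvalue rule programmatically
everitt <- matrix(c(1, .44, .41, .29, .33, .25,
                    .44, 1, .35, .35, .32, .33,
                    .41, .35, 1, .16, .19, .18,
                    .29, .35, .16, 1, .59, .47,
                    .33, .32, .19, .59, 1, .46,
                    .25, .33, .18, .47, .46, 1), nrow = 6)
lambda   <- princomp(covmat = everitt)$sdev^2
cum.prop <- cumsum(lambda) / sum(lambda)

unname(which(cum.prop >= 0.80)[1])  # 80% rule: 4 components
sum(lambda > mean(lambda))          # average-eigenvalue rule: 2 components
```

The two rules again disagree (4 vs. 2 components), illustrating the point above.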
Calculating the correlation between the original variables and principal components can be helpful in interpreting the principal components. The correlation of variable \(x_i\) and component \(z_j\) can be computed as \(\frac{a_{ij}\sqrt{\lambda_j}}{s_i}\). As we analyzed the correlation matrix, \(s_i=1\), so it becomes \(a_{ij}\sqrt{\lambda_j}\).
Recall that
\(X_1\) = French score
\(X_2\) = English score
\(X_3\) = History score
\(X_4\) = Arithmetic score
\(X_5\) = Algebra score
\(X_6\) = Geometry score
A[,1]* sqrt(lambda[1]) # correlation of PC1 with the original variables
## [1] 0.6608026 0.6884648 0.5163560 0.7356196 0.7418676 0.6781684
A[,2]* sqrt(lambda[2]) # correlation of PC2 with the original variables
## [1] 0.4444749 0.2897705 0.6395524 -0.4170184 -0.3727588 -0.3540996
loadings <- A %*% diag(sqrt(lambda)) # scale each eigenvector by the square root of its eigenvalue
loadings
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.6608026 0.4444749 0.16264111 0.3535839 0.4545472 0.087721919
## [2,] 0.6884648 0.2897705 0.53133039 -0.2696851 -0.2769629 -0.101482646
## [3,] 0.5163560 0.6395524 -0.51917689 -0.1128110 -0.2042286 0.019206576
## [4,] 0.7356196 -0.4170184 -0.01782593 0.1811840 -0.2429384 0.439084313
## [5,] 0.7418676 -0.3727588 -0.13293760 0.3056990 -0.0939495 -0.436729414
## [6,] 0.6781684 -0.3540996 -0.13781337 -0.5158018 0.3600526 0.004393172
## The sum of squares of each loading vector reproduces the corresponding eigenvalue
apply(loadings^2, 2, sum) # sum of squares of each column of loading matrix
## [1] 2.7286835 1.1287922 0.6152914 0.6028089 0.5225144 0.4019097
lambda # eigenvalues
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## 2.7286835 1.1287922 0.6152914 0.6028089 0.5225144 0.4019097
We can reconstruct the covariance/correlation matrix from the eigenvectors and eigenvalues. According to the spectral decomposition theorem, \(\mathbf{S=A\Lambda A'}\) where \(\mathbf{\Lambda}\) is a diagonal matrix with eigenvalues and \(\mathbf{A}\) is a matrix of eigenvectors \(\mathbf{a_i}\) in its columns.
We can rewrite the above equation as \[\mathbf{S=A\Lambda^{1/2}\Lambda^{1/2} A'=(A\Lambda^{1/2})(\Lambda^{1/2} A')=(A\Lambda^{1/2})(A\Lambda^{1/2})'=A^*A^{*'}}\] where \(\mathbf{A^*}\) is the matrix whose columns are the component loading vectors (i.e., \(\mathbf{a_i^*}=\sqrt{\lambda_i}\mathbf{a_i}\)) that we created above and assigned to the loadings object.
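As a quick numerical check of the spectral decomposition itself, a sketch using base eigen(), independent of the loadings object:

```r
# Sketch: S = A Lambda A' recovered from the eigendecomposition
everitt <- matrix(c(1, .44, .41, .29, .33, .25,
                    .44, 1, .35, .35, .32, .33,
                    .41, .35, 1, .16, .19, .18,
                    .29, .35, .16, 1, .59, .47,
                    .33, .32, .19, .59, 1, .46,
                    .25, .33, .18, .47, .46, 1), nrow = 6)
e <- eigen(everitt, symmetric = TRUE)

S.rebuilt <- e$vectors %*% diag(e$values) %*% t(e$vectors)
max(abs(S.rebuilt - everitt))  # effectively 0
```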
## Reconstruct correlation matrix from all PCs
(pred.all <- loadings %*% t(loadings)) # predicted correlation matrix
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1.00 0.44 0.41 0.29 0.33 0.25
## [2,] 0.44 1.00 0.35 0.35 0.32 0.33
## [3,] 0.41 0.35 1.00 0.16 0.19 0.18
## [4,] 0.29 0.35 0.16 1.00 0.59 0.47
## [5,] 0.33 0.32 0.19 0.59 1.00 0.46
## [6,] 0.25 0.33 0.18 0.47 0.46 1.00
everitt # the original correlation matrix
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1.00 0.44 0.41 0.29 0.33 0.25
## [2,] 0.44 1.00 0.35 0.35 0.32 0.33
## [3,] 0.41 0.35 1.00 0.16 0.19 0.18
## [4,] 0.29 0.35 0.16 1.00 0.59 0.47
## [5,] 0.33 0.32 0.19 0.59 1.00 0.46
## [6,] 0.25 0.33 0.18 0.47 0.46 1.00
everitt - pred.all # residual matrix
## [,1] [,2] [,3] [,4] [,5]
## [1,] -6.661338e-16 -7.771561e-16 1.110223e-16 -3.330669e-16 -1.665335e-16
## [2,] -7.771561e-16 -1.776357e-15 -1.221245e-15 -9.992007e-16 -8.326673e-16
## [3,] 1.110223e-16 -1.221245e-15 -2.220446e-16 -5.551115e-16 -5.551115e-16
## [4,] -3.330669e-16 -9.992007e-16 -5.551115e-16 2.220446e-16 1.110223e-16
## [5,] -1.665335e-16 -8.326673e-16 -5.551115e-16 1.110223e-16 3.330669e-16
## [6,] 2.220446e-16 -3.330669e-16 -1.221245e-15 2.220446e-16 5.551115e-17
## [,6]
## [1,] 2.220446e-16
## [2,] -3.330669e-16
## [3,] -1.221245e-15
## [4,] 2.220446e-16
## [5,] 5.551115e-17
## [6,] 1.110223e-16
## Reconstruct correlation matrix from the first 2 PCs
(pred.two <- loadings[, 1:2] %*% t(loadings[, 1:2])) # predicted correlation matrix
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.6342180 0.5837351 0.6254744 0.3007451 0.3245461 0.2907470
## [2,] 0.5837351 0.5579508 0.5408164 0.3856086 0.4027352 0.3642874
## [3,] 0.6254744 0.5408164 0.6756508 0.1131365 0.1446690 0.1237110
## [4,] 0.3007451 0.3856086 0.1131365 0.7150405 0.7011796 0.6465400
## [5,] 0.3245461 0.4027352 0.1446690 0.7011796 0.6893167 0.6351049
## [6,] 0.2907470 0.3642874 0.1237110 0.6465400 0.6351049 0.5852988
everitt - pred.two # residual matrix
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.365781971 -0.14373510 -0.21547438 -0.01074514 0.005453879 -0.04074705
## [2,] -0.143735097 0.44204920 -0.19081639 -0.03560858 -0.082735239 -0.03428744
## [3,] -0.215474378 -0.19081639 0.32434920 0.04686355 0.045331023 0.05628896
## [4,] -0.010745143 -0.03560858 0.04686355 0.28495948 -0.111179614 -0.17653997
## [5,] 0.005453879 -0.08273524 0.04533102 -0.11117961 0.310683343 -0.17510487
## [6,] -0.040747047 -0.03428744 0.05628896 -0.17653997 -0.175104871 0.41470116
(pred.five <- loadings[, 1:5] %*% t(loadings[, 1:5])) # predicted correlation matrix
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.9923049 0.4489023 0.4083152 0.2514827 0.3683107 0.2496146
## [2,] 0.4489023 0.9897013 0.3519491 0.3945594 0.2756795 0.3304458
## [3,] 0.4083152 0.3519491 0.9996311 0.1515667 0.1983881 0.1799156
## [4,] 0.2514827 0.3945594 0.1515667 0.8072050 0.7817610 0.4680710
## [5,] 0.3683107 0.2756795 0.1983881 0.7817610 0.8092674 0.4619186
## [6,] 0.2496146 0.3304458 0.1799156 0.4680710 0.4619186 0.9999807
everitt - pred.five # residual matrix
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.0076951351 -0.0089022525 1.684838e-03 0.038517319 -0.038310742
## [2,] -0.0089022525 0.0102987275 -1.949134e-03 -0.044559438 0.044320457
## [3,] 0.0016848377 -0.0019491341 3.688926e-04 0.008433306 -0.008388077
## [4,] 0.0385173185 -0.0445594381 8.433306e-03 0.192795034 -0.191761035
## [5,] -0.0383107423 0.0443204567 -8.388077e-03 -0.191761035 0.190732581
## [6,] 0.0003853774 -0.0004458307 8.437778e-05 0.001928973 -0.001918627
## [,6]
## [1,] 3.853774e-04
## [2,] -4.458307e-04
## [3,] 8.437778e-05
## [4,] 1.928973e-03
## [5,] -1.918627e-03
## [6,] 1.929996e-05
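The shrinking residuals above can be summarized in one number per solution: the Frobenius norm of the residual matrix for each number of retained components. A sketch, recreating the loading matrix from above:

```r
# Sketch: reconstruction error as a function of the number of retained PCs
everitt <- matrix(c(1, .44, .41, .29, .33, .25,
                    .44, 1, .35, .35, .32, .33,
                    .41, .35, 1, .16, .19, .18,
                    .29, .35, .16, 1, .59, .47,
                    .33, .32, .19, .59, 1, .46,
                    .25, .33, .18, .47, .46, 1), nrow = 6)
pca   <- princomp(covmat = everitt)
Astar <- unclass(pca$loadings) %*% diag(pca$sdev)  # a_i* = sqrt(lambda_i) a_i

frob <- sapply(1:6, function(k) {
  resid <- everitt - Astar[, 1:k, drop = FALSE] %*% t(Astar[, 1:k, drop = FALSE])
  sqrt(sum(resid^2))  # Frobenius norm of the residual matrix
})
round(frob, 4)  # strictly decreasing, essentially 0 at k = 6
```

Each additional component removes \(\lambda_k^2\) from the squared Frobenius norm, so the error falls fastest for the early components.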