Chapter 6 Factor Analysis
6.1 items verification
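- subsetting the items first makes the following steps easier: `DF2 <- subset(DF1, select = c("VARIABLE", … ))`, with `DF1` being the original data frame, `DF2` the data frame to be created with the subset, and `VARIABLE` the variable names or numbers in the data frame
- get general descriptives of the items with `freq(DF2)`, `boxplot(DF2)` or similar functions
- create a data frame `desc` with the descriptives using `desc <- data.frame(describe(DF2))` (psych package) and:
  - compute the skewness statistics with `desc$skewstat <- desc$skew / sqrt(6 / desc$n)`
  - compute the kurtosis statistics with `desc$kurtosisstat <- desc$kurtosis / sqrt(24 / desc$n)`
  - the `se` column returned by `describe()` is the standard error of the mean, not of skewness or kurtosis, which is why the large-sample approximations \(\sqrt{6/N}\) and \(\sqrt{24/N}\) are used as denominators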
- create a correlation matrix `cor` with `cor <- cor(DF2, use = "pairwise.complete.obs")`
  - check the other `use` options of `cor()` (`"everything"`, `"all.obs"`, `"complete.obs"`, `"na.or.complete"`) for alternative ways of dealing with missing data
- visualize the correlations with `corrplot()` (requires the corrplot package); for example, `corrplot(cor, method = "number", type = "lower", diag = FALSE, number.cex = 0.5)` displays the matrix `cor` as numbers, showing only the lower triangle, omitting the diagonal, and printing the numbers at half the default size
- test for an identity matrix with `cortest.bartlett(DF2)` (the guideline for a good matrix is \(p < .05\), meaning it is not an identity matrix)
- test common variance with `KMO(DF2)` (the guideline for acceptable common variance is an overall MSA \(> .70\))
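The checks above can be sketched in base R to show what the statistics compute. This is a minimal illustration, not the psych implementation: the simulated `DF2` (two latent factors, three items each) and the helpers `skew_z()`, `kurt_z()`, `bartlett_sphericity()` and `kmo_overall()` are hypothetical. The skewness uses the plain moment estimator, so values may differ slightly from `describe()`, which uses another default estimator.

```r
# Hypothetical simulated items standing in for DF2:
# two latent factors, three items each (replace with your own subset)
set.seed(123)
n  <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
noise <- function() rnorm(n, sd = 0.6)
DF2 <- data.frame(item.1 = 0.7 * f1 + noise(), item.2 = 0.7 * f1 + noise(),
                  item.3 = 0.7 * f1 + noise(), item.4 = 0.7 * f2 + noise(),
                  item.5 = 0.7 * f2 + noise(), item.6 = 0.7 * f2 + noise())

# Skewness and excess kurtosis (moment estimators) divided by their
# large-sample standard errors sqrt(6/N) and sqrt(24/N)
skew_z <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  (mean(d^3) / mean(d^2)^1.5) / sqrt(6 / length(x))
}
kurt_z <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  (mean(d^4) / mean(d^2)^2 - 3) / sqrt(24 / length(x))
}

# Bartlett's test of sphericity: H0 is that R is an identity matrix
bartlett_sphericity <- function(R, n) {
  p <- ncol(R)
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
  dof <- p * (p - 1) / 2
  list(chisq = chisq, df = dof,
       p.value = pchisq(chisq, dof, lower.tail = FALSE))
}

# Overall Kaiser-Meyer-Olkin measure of sampling adequacy (MSA)
kmo_overall <- function(R) {
  Rinv <- solve(R)
  # partial (anti-image) correlations from the inverse correlation matrix
  Q <- -Rinv / sqrt(outer(diag(Rinv), diag(Rinv)))
  off <- row(R) != col(R)  # off-diagonal cells only
  r2 <- sum(R[off]^2)
  q2 <- sum(Q[off]^2)
  r2 / (r2 + q2)
}

R <- cor(DF2, use = "pairwise.complete.obs")
item.stats <- data.frame(skewstat = sapply(DF2, skew_z),
                         kurtosisstat = sapply(DF2, kurt_z))
bt  <- bartlett_sphericity(R, n = nrow(DF2))
msa <- kmo_overall(R)
print(round(item.stats, 2))
cat("Bartlett p =", signif(bt$p.value, 3), " overall MSA =", round(msa, 2), "\n")
```

With real data, the psych functions remain the simpler choice; the sketch only makes the formulas behind them visible.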
6.2 exploratory factor analysis
This section requires the psych package (the oblimin rotation also requires the GPArotation package). The example uses ordinary least squares estimation with `fm = "ols"` and oblimin rotation with `rotate = "oblimin"`. Complete step by step guide available soon.
6.2.1 deciding the number of factors
`EFA.m1 <- fa(DF2, nfactors = X, rotate = "none", fm = "ols", missing = TRUE, impute = "mean")` creates the object `EFA.m1` with the factorial model data based on the `DF2` subset produced in the item verification: `nfactors = X` should be set to the number of items, and consequently `rotate = "none"` indicates that no factor rotation is needed (only the eigenvalues are inspected at this stage); `missing = TRUE` indicates the existence of missing data and `impute = "mean"` imputes missing values with the mean when factor scores are computed
`plot(EFA.m1$values, type = "b", xlab = "nº of factors", ylab = "eigenvalue")` plots the eigenvalues by the number of factors; `EFA.m1` prints all the factorial model data, but the output can be more specific: for example, `EFA.m1$loadings` prints only the loadings
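A base R sketch of the same idea, assuming simulated items in place of `DF2`: it takes the eigenvalues of the item correlation matrix, draws the scree plot, and counts eigenvalues above 1. The "eigenvalue > 1" (Kaiser) rule is a common convention added here as an assumption, not a guideline from this chapter. Note that `fa()` returns both `values` (eigenvalues of the common factor solution) and `e.values` (eigenvalues of the correlation matrix); this sketch corresponds to the latter.

```r
# Hypothetical simulated items standing in for DF2 (two factors, three items each)
set.seed(123)
n  <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
noise <- function() rnorm(n, sd = 0.6)
DF2 <- data.frame(item.1 = 0.7 * f1 + noise(), item.2 = 0.7 * f1 + noise(),
                  item.3 = 0.7 * f1 + noise(), item.4 = 0.7 * f2 + noise(),
                  item.5 = 0.7 * f2 + noise(), item.6 = 0.7 * f2 + noise())

# Eigenvalues of the item correlation matrix (they sum to the number of items)
R  <- cor(DF2, use = "pairwise.complete.obs")
ev <- eigen(R, symmetric = TRUE, only.values = TRUE)$values

# Kaiser criterion (assumed rule of thumb): count eigenvalues above 1
n.kaiser <- sum(ev > 1)

# Scree plot with a dashed reference line at eigenvalue = 1
plot(ev, type = "b", xlab = "number of factors", ylab = "eigenvalue")
abline(h = 1, lty = 2)
n.kaiser
```

The scree plot and the count are complementary: the "elbow" of the curve and the number of eigenvalues above the line usually point to the same candidate `X` for the next step.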
6.2.2 evaluating factorial models
`EFA.m2 <- fa(DF2, nfactors = X, rotate = "oblimin", fm = "ols", missing = TRUE, impute = "mean")` creates a new object `EFA.m2` with the factorial model data based on the `DF2` subset produced in the item verification:
- this time, `nfactors = X` should be set to the number of factors determined with `EFA.m1`, and a rotation method should be specified in `rotate = ""` for ease of interpretation
- as before, `EFA.m2` prints all the factorial model data and `EFA.m2$loadings` prints only the loadings
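For readers without psych at hand, the rotated two-factor step can be approximated in base R. Base R's `factanal()` offers only maximum likelihood estimation and varimax/promax rotation, so this sketch swaps in ML with an oblique promax rotation as a stand-in for OLS with oblimin; the simulated items are hypothetical. It prints the loadings with values below .30 hidden and records the factor on which each item loads most strongly.

```r
# Hypothetical simulated items standing in for DF2 (two factors, three items each)
set.seed(123)
n  <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
noise <- function() rnorm(n, sd = 0.6)
DF2 <- data.frame(item.1 = 0.7 * f1 + noise(), item.2 = 0.7 * f1 + noise(),
                  item.3 = 0.7 * f1 + noise(), item.4 = 0.7 * f2 + noise(),
                  item.5 = 0.7 * f2 + noise(), item.6 = 0.7 * f2 + noise())

# Maximum likelihood EFA with an oblique (promax) rotation and regression scores
efa <- factanal(DF2, factors = 2, rotation = "promax", scores = "regression")

# Loadings with small values hidden, for ease of interpretation
print(efa$loadings, cutoff = 0.30)

# Factor on which each item loads most strongly
L <- unclass(efa$loadings)
primary <- apply(abs(L), 1, which.max)
primary
```

The factor order is arbitrary, so only the grouping of items (which items share a primary factor) should be interpreted.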
6.2.3 computing the factors
`DF$FACTOR <- rowMeans(DF2)` computes the row means (coarse scores) of all `DF2` variables and stores them in the original `DF` data frame under the name `FACTOR`; `DF <- cbind(DF, EFA.m2$scores)` adds the factor scores (refined scores) of all `DF2` variables to the original `DF` data frame; `names(DF)[names(DF) == "SCORE"] <- "FACTOR"` can then be used to rename a score column to `FACTOR`, where `SCORE` is the column name listed by `colnames(EFA.m2$scores)`
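A small sketch of the coarse-score step on a hypothetical toy data frame with one missing response. As written, `rowMeans(DF2)` returns `NA` for any row with a missing item; adding `na.rm = TRUE` averages the available items instead, which is a choice to make deliberately rather than a default.

```r
# Hypothetical toy data frame with one missing response
DF <- data.frame(id     = 1:4,
                 item.1 = c(4, 3, NA, 5),
                 item.2 = c(5, 3, 2, 4),
                 item.3 = c(3, 3, 2, 5))
items <- c("item.1", "item.2", "item.3")

# Coarse score: mean of the available items in each row
DF$FACTOR <- rowMeans(DF[, items], na.rm = TRUE)
DF
```

Selecting the items by name keeps the score tied to the intended items even if the column order of `DF` changes.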
6.3 confirmatory factor analysis
This section requires the lavaan and semPlot packages. Complete step by step guide available soon.
6.3.1 model estimation
- example of a model with 2 correlating factors, with 3 items per factor, and a higher order factor:
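model <- '
factor.1 =~ item.1 + item.2 + item.3
factor.2 =~ item.4 + item.5 + item.6
global.factor =~ factor.1 + factor.2
factor.1 ~~ factor.2
'
`model.fit <- cfa(model, data = DF)` estimates the confirmatory model with the `cfa()` function, using the `model` defined above and the `DF` data frame (the item names in `model` must be exactly the same as the variable names in `DF`). Note that a higher-order factor measured by only two first-order factors is not identified as written: one option is to drop the `factor.1 ~~ factor.2` line (which, once `global.factor` is present, refers to the first-order residuals) and fix both higher-order loadings, for example `global.factor =~ 1*factor.1 + 1*factor.2`; another is to keep only the correlated-factors part of the model.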
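A runnable sketch of the estimation step, assuming lavaan's bundled `HolzingerSwineford1939` data in place of `DF`: items `x1` to `x6` stand in for `item.1` to `item.6`, and the higher-order factor is left out to keep the model identified. The explicit `~~` line is redundant because `cfa()` already lets latent factors correlate by default, but it mirrors the chapter's syntax.

```r
library(lavaan)

# Two correlated factors, three items each
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  visual ~~ textual
'
model.fit <- cfa(model, data = HolzingerSwineford1939)

# TRUE if the estimation converged
lavInspect(model.fit, "converged")
```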
6.3.2 overall goodness of fit
`summary(model.fit, fit.measures = TRUE, standardized = TRUE, rsq = TRUE)` provides a detailed summary of `model.fit`; `fitMeasures(model.fit)` provides only the fit measures of `model.fit`
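`fitMeasures()` also accepts a vector of measure names, which keeps the output short. This sketch assumes the lavaan example data and a two-factor model; the selection of measures is illustrative and no cut-offs are implied.

```r
library(lavaan)

model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
'
model.fit <- cfa(model, data = HolzingerSwineford1939)

# Only a selection of the fit measures
fit.sel <- fitMeasures(model.fit,
                       c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr"))
round(fit.sel, 3)
```

With six items and two correlated factors the model has 21 observed moments and 13 free parameters, hence 8 degrees of freedom.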
6.3.3 measures of strain
`resid(model.fit, type = "standardized")` provides the standardized residuals (the guideline for the absence of focal areas of strain is residuals < |2.58|, the critical z value for \(p < .01\))
- for further verification, `model.fit.residuals <- data.frame(resid(model.fit, type = "standardized")$cov)` creates the data frame `model.fit.residuals` with the residual matrix and `write.csv(model.fit.residuals, "model.fit.residuals.csv", row.names = TRUE)` exports it to a spreadsheet

`modificationindices(model.fit)` provides a table with the modification indices (the guideline for the absence of focal areas of strain is modification indices < 3.84, the critical \(\chi^2\) value with 1 degree of freedom at \(p < .05\))
- for further verification, `model.fit.mi <- modificationindices(model.fit)` creates the data frame `model.fit.mi` with the modification indices and `subset(model.fit.mi, mi > 3.84)` shows only the modification indices above the criterion
- the full code chunk:
resid(model.fit, type = "standardized")
model.fit.residuals <- data.frame(resid(model.fit, type = "standardized")$cov)
write.csv(model.fit.residuals, "model.fit.residuals.csv", row.names = TRUE)
modificationindices(model.fit)
model.fit.mi <- modificationindices(model.fit)
subset(model.fit.mi, mi > 3.84)
6.3.4 estimates
`parameterEstimates(model.fit)` provides the non-standardized estimates; `standardizedSolution(model.fit)` provides the standardized estimates (the guideline is that estimates < |.30| indicate weak associations with the factors); `semPaths(model.fit, style = "lisrel", what = "stand", layout = "tree")` provides a graph of the associations with the factors according to the model estimation
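To apply the |.30| guideline without scanning the whole table, the standardized solution can be filtered to the factor loadings (operator `=~`). This sketch assumes the lavaan example data and its classic three-factor model; `std.loadings` and `weak` are illustrative names.

```r
library(lavaan)

model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
model.fit <- cfa(model, data = HolzingerSwineford1939)

std <- standardizedSolution(model.fit)
# Keep only the factor loadings (operator "=~")
std.loadings <- subset(std, op == "=~", select = c(lhs, rhs, est.std, pvalue))
# Loadings below |.30| flag weak associations with the factor
weak <- subset(std.loadings, abs(est.std) < 0.30)
std.loadings
weak
```

An empty `weak` table means every item passes the guideline.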
6.3.5 computing the factors
- for coarse scores computation see 6.2.3
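- for refined scores use `DF.refined <- data.frame(predict(model.fit))` followed by `DF <- cbind(DF, DF.refined)` (the new variables take the names of the latent variables in the model estimation; the rows only line up if no cases were dropped during estimation, e.g., by listwise deletion of missing data)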
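A runnable sketch of the refined-score step, assuming lavaan's example data as `DF` (it has no missing values, so no rows are dropped). The row-count check guards the column-wise binding.

```r
library(lavaan)

DF <- HolzingerSwineford1939
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
'
model.fit <- cfa(model, data = DF)

# Refined (model-based) factor scores, one column per latent variable
DF.refined <- data.frame(predict(model.fit))

# Bind column-wise only if no cases were dropped during estimation
stopifnot(nrow(DF.refined) == nrow(DF))
DF <- cbind(DF, DF.refined)
head(DF[, c("id", "visual", "textual")])
```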