Princeton University
4/2/23
A broad and extremely versatile multivariate framework
Integration of:
Path analysis (this week)
Confirmatory factor analysis (next week)
Test and quantify theories
You already know how to do it!
It is regression on steroids
Model many relationships at once, rather than run single regressions
Model variables that exist (manifest) and those that don’t technically exist (latent factors)
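To see the "regression on steroids" point concretely: a path model with a single endogenous variable reduces to ordinary regression. A minimal sketch with simulated data (the variable names here are hypothetical stand-ins, not the course dataset):

```r
set.seed(1)
# Simulated stand-ins for two manifest variables
d <- data.frame(perceived.value = rnorm(60))
d$intent.to.apply <- 0.5 * d$perceived.value + rnorm(60)

# Ordinary regression of one endogenous variable on one exogenous variable
coef(lm(intent.to.apply ~ perceived.value, data = d))

# The equivalent single-path lavaan model (if the package is installed)
# recovers the same estimate:
# library(lavaan)
# sem('intent.to.apply ~ perceived.value', data = d)
```

Path analysis earns its keep once several such equations are estimated simultaneously.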
TL;DR
Exogenous: no arrows pointed at it
Endogenous: arrows pointed at it
A variable can be both (e.g., a mediator has arrows pointed at it and arrows leading from it)
Manifest or observed variables
Represented by squares ❏
Measured from participants, business data, or other sources
While most measured variables are continuous, you can use categorical and ordered measures as well
Latent (unobserved) variables
Represented by circles ◯
Abstract phenomena you are trying to model
Are not represented by a number in the dataset
Linked to the measured variables
Represented indirectly by those variables
Y~X + Residual
Here that is Endogenous ~ Exogenous + disturbance
Represent the influence of factors not included in the model
Error in your prediction of each endogenous variable
Every endogenous variable has a disturbance
Circles are latent (unobserved) variables
Squares are manifest (observed) variables
Each endogenous variable is regressed on all exogenous variables that are connected in the chain that leads directly to it
You cannot test all models
If not identified, cannot analyze model
How is this determined?
There must be at least as many known values in the model as there are free parameters
Free parameters: the paths, variances, and covariances the model must estimate
Knowns: the observed variances and covariances of the measured variables
\(\frac{K(K+1)}{2}\), where \(K\) is the number of measured variables
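A quick sketch of the knowns formula in R:

```r
# Known values (unique variances + covariances) for K measured variables
knowns <- function(K) K * (K + 1) / 2

knowns(5)  # 5 measured variables -> 15 knowns
```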
Math example:
10=2x+y
2 = x-y
2 knowns (equations) and 2 unknowns
Exactly one set of values solves the system (x = 4, y = 2)
Just-identified: you can solve it, but you can't test it against alternatives
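The just-identified system can be solved directly; in base R:

```r
# The two-equation system in matrix form: A %*% c(x, y) = b
#   2x + y = 10
#    x - y =  2
A <- matrix(c(2,  1,
              1, -1), nrow = 2, byrow = TRUE)
b <- c(10, 2)
solve(A, b)  # x = 4, y = 2, the single solution
```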
10 = 2x + y
2 = x - y
5 = x + 2y
3 knowns and 2 unknowns: over-identified; the extra equation provides information to test the solution
We can tell whether a model is identified by calculating model DF
Additional pathways you can estimate
Model DF = (known values) - (free parameters)
If model DF >= 1, you can analyze the model and test its fit
DF = 0
You can still analyze the model
But:
Fits data perfectly
No fit indices
Multiple regression is a just-identified model
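Putting the pieces together for the model analyzed later in these slides (5 measured variables; the free-parameter count of 13 is the one lavaan reports):

```r
# Model DF = knowns - free parameters
K <- 5                       # measured variables in the graduate-school model
knowns <- K * (K + 1) / 2    # 15 unique variances and covariances
free_parameters <- 13        # as reported in the lavaan output
knowns - free_parameters     # model DF = 2
```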
Let’s play a game
What makes someone apply to graduate school?
Endogenous
Exogenous:
Declare equations for every endogenous variable in your model
Declare indirect and covariances
`~` indicates a regression
`~~` indicates a covariance/correlation
`=~` indicates a latent variable
`a*` before a variable attaches a label (here `a`) to that path; `:=` defines a new parameter from labels (e.g., an indirect effect)
grad_model = '
intent.to.apply ~ a*perceived.value + b*external.pressure + c*perceived.control
application.behaviour ~ d*intent.to.apply + perceived.control

# Indirect effects
value.through.intent := a*d
pressure.through.intent := b*d
control.through.intent := c*d

# Covariance paths
perceived.control ~~ perceived.value
perceived.control ~~ external.pressure
external.pressure ~~ perceived.value
'
fit <- sem(grad_model, se="bootstrap", data=grad)
Absolute fit (how well the data fit the specified model):
\(\chi^2\)
SRMR
Badness of fit (higher values = worse fit):
RMSEA
Relative goodness of fit (vs. a baseline model):
CFI, TLI
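These indices can be pulled individually from a fitted lavaan model; a sketch assuming a fitted object `fit` like the one produced by the `sem()` call in these slides:

```r
library(lavaan)
# Extract specific fit indices from a fitted lavaan model
fitMeasures(fit, c("chisq", "df", "pvalue", "srmr", "rmsea", "cfi", "tli"))
```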
fit <- sem(grad_model, se="bootstrap", data=grad)
summary(fit, ci=TRUE, standardize=TRUE, fit.measures=TRUE)
lavaan 0.6.15 ended normally after 60 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 13
Number of observations 60
Model Test User Model:
Test statistic 0.862
Degrees of freedom 2
P-value (Chi-square) 0.650
Model Test Baseline Model:
Test statistic 136.416
Degrees of freedom 10
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 1.000
Tucker-Lewis Index (TLI) 1.045
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -993.733
Loglikelihood unrestricted model (H1) -993.303
Akaike (AIC) 2013.467
Bayesian (BIC) 2040.693
Sample-size adjusted Bayesian (SABIC) 1999.805
Root Mean Square Error of Approximation:
RMSEA 0.000
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.200
P-value H_0: RMSEA <= 0.050 0.690
P-value H_0: RMSEA >= 0.080 0.257
Standardized Root Mean Square Residual:
SRMR 0.019
Parameter Estimates:
Standard errors Bootstrap
Number of requested bootstrap draws 1000
Number of successful bootstrap draws 1000
Regressions:
                     Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper   Std.lv  Std.all
  intent.to.apply ~
    percevd.vl (a)      0.444    0.058    7.630    0.000    0.315    0.553    0.444    0.807
    extrnl.prs (b)      0.029    0.034    0.860    0.390   -0.035    0.099    0.029    0.095
    prcvd.cntr (c)     -0.064    0.062   -1.020    0.308   -0.199    0.056   -0.064   -0.126
  application.behaviour ~
    intnt.t.pp (d)      1.520    0.542    2.802    0.005    0.480    2.595    1.520    0.350
    prcvd.cntr          0.734    0.307    2.393    0.017    0.235    1.457    0.734    0.336
Covariances:
                     Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper   Std.lv  Std.all
  perceived.value ~~
    perceivd.cntrl     34.696   10.798    3.213    0.001   16.524   58.825   34.696    0.665
  external.pressure ~~
    perceivd.cntrl     46.660   12.947    3.604    0.000   22.482   72.197   46.660    0.505
  perceived.value ~~
    external.prssr     39.758   11.519    3.452    0.001   18.269   63.729   39.758    0.472
Variances:
                     Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper   Std.lv  Std.all
   .intent.to.pply     5.780    0.974    5.932    0.000    3.517    7.254     5.780    0.400
   .applicatn.bhvr   179.514   28.108    6.386    0.000  115.792  224.946   179.514    0.657
    perceived.valu    47.616    9.519    5.002    0.000   31.137   67.064    47.616    1.000
    external.prssr   149.236   22.284    6.697    0.000  104.931  190.640   149.236    1.000
    perceivd.cntrl    57.154   13.773    4.150    0.000   32.634   86.664    57.154    1.000
Defined Parameters:
                     Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper   Std.lv  Std.all
    val.thrgh.ntnt      0.675    0.255    2.648    0.008    0.192    1.205    0.675    0.282
    prssr.thrgh.nt      0.045    0.058    0.774    0.439   -0.057    0.184    0.045    0.033
    cntrl.thrgh.nt     -0.097    0.120   -0.807    0.420   -0.411    0.065   -0.097   -0.044
The model's chi-square test was not significant, \(\chi^2\)(2) = 0.86, p = .650, so the model does not differ significantly from the observed data. Nearly all indices suggest a satisfactory fit: GFI = .99 (> .95), AGFI = .96 (> .90), NFI = .99 (> .90), NNFI/TLI = 1.05 (> .90), CFI = 1.00 (> .90), RFI = .97 (> .90), IFI = 1.01 (> .90), RMSEA = .00 (< .05), and SRMR = .02 (< .08). Only the PNFI (.20 < .50) suggests poor parsimony-adjusted fit.
- Describe significance of the paths
#| fig.align: center
#|
library(semPlot)
# Example of plotting the variables in specific locations
locations = matrix(c(0, 0,
                     .5, 0,
                     -.5, .5,
                     -.5, 0,
                     -.5, -.5), ncol = 2, byrow = TRUE)
labels = c("Intent\nTo Apply", "Application\nBehaviour", "Perceived\nValue",
           "External\nPressure", "Perceived\nControl")
diagram = semPaths(fit, whatLabels = "std", nodeLabels = labels,
                   layout = locations, sizeMan = 12, rotation = 2)
Modification (mod) indices
Tell you how much the model chi-square would drop if you freed the suggested path
Can improve model fit, but added paths should be theoretically justified
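A table like the one that follows can be produced from the fitted model; a sketch assuming the `fit` object from the earlier `sem()` call:

```r
library(lavaan)
# Modification indices for the fitted model, largest chi-square change first
modindices(fit, sort. = TRUE)
```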
lhs | op | rhs | mi | epc | sepc.lv | sepc.all | sepc.nox |
---|---|---|---|---|---|---|---|
intent.to.apply | ~ | application.behaviour | 0.503 | -0.0234 | -0.0234 | -0.102 | -0.102 |
application.behaviour | ~ | perceived.value | 0.35 | 0.277 | 0.277 | 0.116 | 0.116 |
application.behaviour | ~ | external.pressure | 0.561 | 0.126 | 0.126 | 0.0934 | 0.0934 |
perceived.value | ~ | application.behaviour | 0.137 | 0.024 | 0.024 | 0.0575 | 0.0575 |
external.pressure | ~ | application.behaviour | 0.43 | 0.0654 | 0.0654 | 0.0885 | 0.0885 |
perceived.control | ~ | application.behaviour | 0.787 | -0.089 | -0.089 | -0.195 | -0.195 |
Comparing multiple models
Constraining paths
Assess alternative hypotheses/models
Sample size
Assumptions
Constraining paths
In SEM you can explicitly test hypotheses about the size of specific paths
Constrain a path to certain value
Constrain two paths to be equal
grad_model_constrained = '
intent.to.apply ~ a*perceived.value + 0*external.pressure + c*perceived.control
application.behaviour ~ d*intent.to.apply + perceived.control
perceived.control ~~ perceived.value # These are covariance paths
perceived.control ~~ external.pressure # These are covariance paths
external.pressure ~~ perceived.value # These are covariance paths
value.through.intent:=a*d
control.through.intent:=c*d
'
grad_analysis_constrained =
sem(grad_model_constrained, data=grad, se="bootstrap")
If you can create one model from another purely by adding or removing parameters, the models are nested
Model A is said to be nested within Model B if Model B is a more complicated version of Model A (it contains all of Model A's parameters plus extras)
Evaluating models
Nested models: ensure both fit the data well, then compare them with a likelihood ratio test (LRT)
Non-nested models: ensure both models fit well
If so, compare models with AIC or BIC
\(\Delta{BIC}\) (approximate log odds in favor of the model with the lower BIC)
If not, choose the model that fits
Use compare_performance()
from easystats
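Both comparison routes can be sketched in code, assuming the fitted objects `fit` and `grad_analysis_constrained` from the earlier slides (and the lavaan and easystats performance packages):

```r
library(lavaan)
# Nested models: chi-square difference (likelihood ratio) test
lavTestLRT(fit, grad_analysis_constrained)

# Non-nested (or nested) models: fit indices, AIC, and BIC side by side
library(performance)
compare_performance(fit, grad_analysis_constrained)
```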
We wanted to see if the data fit Ajzen’s (1985) Theory of planned behavior (“Unconstrained Model,” Figure 1) better than a constrained model that posits no relationship between external pressure and intention to apply to graduate school (“Constrained Model,” Figure 2). The constrained model fit the data well, SRMR = .03, RMSEA = 0, 90% CI [0, 0.18], CFI = 1, AIC = 2000.42, BIC = 2012.98. A Likelihood Ratio test of the two models suggested that the models fit the data equally well, \(\chi^2\) (1) = 0.95, p = 0.33. Thus, we trimmed this path in the interest of parsimony.
We also compared a non-nested model that considered the strongest pathway of our originally hypothesized model in the context of job opportunities ("Opportunities Model," Figure 3). The Opportunities Model had good absolute fit and relative goodness of fit, but its badness-of-fit index indicated poor fit, SRMR = .05, RMSEA = 0.28, 90% CI [0.13, 0.44], CFI = 0.96, AIC = 1502.77, BIC = 1519.52. Comparing the Opportunities Model to the Hypothesized Model (Figure 1) using BIC (Kass & Raftery, 1995) reveals that the evidence strongly favors the Opportunities Model, \(BIC_{Hypothesized}\) = 2040.69, \(\Delta{BIC}\) = 521.
For non-normal data, request a robust (Satorra-Bentler) test statistic: sem(grad_model, data = grad, test = "satorra.bentler")
PSY 504: Advanced Statistics