PSY 504: Advanced Statistics

class: center, middle, inverse, title-slide

.title[
# PSY 504: Advanced Statistics
]
.subtitle[
## Missing Data in R
]
.author[
### Jason Geller, Ph.D. (he/him/his)
]
.institute[
### Princeton University
]
.date[
### Updated:2023-02-13
]

---

# Today

- Screening data for missingness

- Missingness patterns
   
- Diagnosing missing data mechanisms in R 
  
- Missing data methods in R

- Listwise deletion
    - Casewise deletion
    - Nonconditional and conditional imputation
    - Multiple imputation
    
---
# Packages

Install the **mice** package

```r
install.packages("mice")
```

Load these packages:

```r
library(tidyverse) 
library(broom) #tidy statistics
library(ggmice) #graph missing data
library(mice) # dealing and visualizing missing data
library(naniar) # missing data + visualization
library(finalfit)# missing data visualization
```
---
# `nhanes` Data

.pull-left[
<table class="huxtable" style="border-collapse: collapse; border: 0px; margin-bottom: 2em; margin-top: 2em; ; margin-left: auto; margin-right: auto;  " id="tab:unnamed-chunk-4">
<col><col><col><col><tr>
<th style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0.4pt 0.4pt;    padding: 6pt 6pt 6pt 6pt; font-weight: bold;">age</th><th style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: bold;">bmi</th><th style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: bold;">hyp</th><th style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0.4pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: bold;">chl</th></tr>
<tr>
<td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0.4pt;    padding: 6pt 6pt 6pt 6pt; background-color: rgb(242, 242, 242); font-weight: normal;">1</td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; background-color: rgb(242, 242, 242); font-weight: normal;">  </td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; background-color: rgb(242, 242, 242); font-weight: normal;"></td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0.4pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; background-color: rgb(242, 242, 242); font-weight: normal;"></td></tr>
<tr>
<td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0pt 0.4pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">2</td><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">22.7</td><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">1</td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0.4pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">187</td></tr>
<tr>
<td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0pt 0.4pt;    padding: 6pt 6pt 6pt 6pt; background-color: rgb(242, 242, 242); font-weight: normal;">1</td><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; background-color: rgb(242, 242, 242); font-weight: normal;">  </td><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; background-color: rgb(242, 242, 242); font-weight: normal;">1</td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0.4pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; background-color: rgb(242, 242, 242); font-weight: normal;">187</td></tr>
<tr>
<td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0pt 0.4pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">3</td><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">  </td><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;"></td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0.4pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;"></td></tr>
<tr>
<td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0pt 0.4pt;    padding: 6pt 6pt 6pt 6pt; background-color: rgb(242, 242, 242); font-weight: normal;">1</td><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; background-color: rgb(242, 242, 242); font-weight: normal;">20.4</td><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; background-color: rgb(242, 242, 242); font-weight: normal;">1</td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0.4pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; background-color: rgb(242, 242, 242); font-weight: normal;">113</td></tr>
<tr>
<td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.4pt 0.4pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">3</td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">  </td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;"></td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0.4pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">184</td></tr>
</table>

]

.pull-right[
- `age`: age group (1=20-39, 2=40-59, 3=60+)

- `bmi`: body mass index (kg/m**2)

- `hyp`: hypertensive (1=no,2=yes)

- `chl`: total serum cholesterol (mg/dL)
]
---
# Exploratory Data Analysis (EDA)

- Always look at your data

- Look for errors
  
  - Explore data using descriptive statistics and graphs
  
    - What variables have missing data? How much is missing in total? By variable?

---
# EDA

```r
#library(naniar)
vis_miss(nhanes)
```

<img src="missing_data_in_r_files/figure-html/unnamed-chunk-5-1.png" width="70%" style="display: block; margin: auto;" />
---
# Missing Patterns

```r
#library(mice)

md.pattern(nhanes, plot=FALSE)
```

```
##    age hyp bmi chl   
## 13   1   1   1   1  0
## 3    1   1   1   0  1
## 1    1   1   0   1  1
## 1    1   0   0   1  2
## 7    1   0   0   0  3
##      0   8   9  10 27
```

---
# Missing Patterns

.pull-left[

```r
#md.pattern will give you a table

nhanes %>%
 # create missing data pattern plot
plot_pattern()
```

<img src="missing_data_in_r_files/figure-html/unnamed-chunk-7-1.png" width="100%" style="display: block; margin: auto;" />
]

<br>
<br>
.pull-right[

]

---
# Missing Patterns

.pull-left[

- Univariate: one variable with missing data

- Monotone: patterns in the data can be arranged

- Associated with a longitudinal studies where members drop out and never return

- Non-monotone: missingness of one variable does not affect the missingness of any other variables

- Look for islands of missingness
]

.pull-right[
<img src="new_patterns.webp" width="100%" style="display: block; margin: auto;" />
]
---
background-image: url(cake.jpeg)
background-position: center
background-size: 40%
---
# Is it MCAR or MAR?

– Create a dummy coded variable where 1 = score missing and 0 = score not missing on missing variable

- If these variables are related to other variables in dataset
  - MAR

```r
nhanes_r <- nhanes %>%
#can also use case_when #if missing 1 else 0
mutate(bmi_1 = ifelse(is.na(bmi), 1, 0))

head(nhanes_r)
```

```r
#library(naniar)

head(bind_shadow(nhanes))
```

---
# Is it MCAR or MAR?

.pull-left[

```r
library(finalfit)

explanatory = c("hyp", "age")
dependent = "bmi"

misspairs <- nhanes %>%
  missing_pairs(explanatory, dependent)
   # mice function visualize data
```
]

.pull-right[

<img src="missing_data_in_r_files/figure-html/unnamed-chunk-13-1.png" width="100%" style="display: block; margin: auto;" />
]

---
# Categorical Missingness

- `nhanes2` treats `age` and `hyp` as factors

---
# Testing

- Logistic regression approach

```r
#glm approch
nhanes_r_glm <-glm(bmi_1~age+hyp, family=binomial,data=nhanes_r)

#easystats package to get paramters from glm model in a tidy way
parameters::model_parameters(nhanes_r_glm)
```

---
# Testing

- *t*-test

```r
model <- t.test(age~bmi_1,data=nhanes_r)

tidy(model)
```

---
class: middle

# Methods for dealing with MCAR 
---
# Listwise Deletion

.pull-left[

```r
#create data frame
nhanes
```

]

.pull-right[

```r
nhanes %>%
  drop_na()
```

]
---
# Listwise Deletion: Pros and Cons

- Pros:

- Produces the correct parameter estimates if missingness is MCAR
  
  - If not, biased 
  
- Cons:

- Can result in a lot of data loss

---
# Casewise (Pairwise) Deletion

- In each comparison, delete only observations if the missing data is relevant to this comparison

```r
#create data frame
nhanes
```

---
# Casewise Deletion: Pros and Cons

Pros:

- Avoids data loss

- Non-biased
  - Only for MCAR

Cons:

- But, results not completely consistent or comparable--based on different observations

---
class: middle

# Methods for MAR

---
# Unconditional (Mean) Imputation

- Replace missing values with the mean of the observed values

- Reduces variance 
  
    - Increases Type 1 error rate

.pull-left[

```r
d <- c(5, 8, 3, NA, NA)

#calc mean remove NAs
d_mean <- mean(d, na.rm = TRUE)

mean(d_mean)
```

```
## [1] 5.333333
```

```r
sd(d, na.rm=TRUE)
```

```
## [1] 2.516611
```
]

.pull-right[

```r
d_mean_imp <- ifelse(is.na(d), d_mean, d) # add mean

sd(d_mean_imp)
```

```
## [1] 1.779513
```

```r
#NANIRE Package 
#impute_mean(d) %>% head()
# Do this in Mice
#complete(mice(data, m=1, method="mean"))
```
]
---
class: center middle

# DO NOT DO THIS!

---
# Conditional Imputation (Regression)

.pull-left[
- Use the Y to replace the variable

- All the other related variables in the data set are used to predict the values of the variable with missing data

- Missing scores have the predicted values provided to replace them

```r
library(mice)
library(ggmice)

m=1

imp <- mice(nhanes, m = m, seed = 24415, method="norm.predict", maxit=1)
```

```
## 
##  iter imp variable
##   1   1  bmi  hyp  chl
```

]

.pull-right[
<div class="figure" style="text-align: center">
<img src="reg_imp.png" alt="Baraldi &amp; Enders (2010)" width="100%" />
<p class="caption">Baraldi & Enders (2010)</p>
</div>
]
---
# Stochastic Regression

- Regression imputation with added error variance to predicted values

.pull-left[

```r
m=1
imp <- mice(nhanes, m = m, seed = 24415, method="norm.nob", maxit=1)
```

```
## 
##  iter imp variable
##   1   1  bmi  hyp  chl
```
]

.pull-right[
<div class="figure" style="text-align: center">
<img src="stoch_reg.png" alt="Baraldi &amp; Enders (2010)" width="100%" />
<p class="caption">Baraldi & Enders (2010)</p>
</div>
]
---
# Multiple Imputation

- Basically doing conditional imputation several times

1. We make several multiply imputed data sets with the `mice()` function

2. We fit our model of choice to each version of the data with the `with()` function

3. We then pool (i.e., combine) the results with the `pool()` function.

<img src="MIsteps.png" width="80%" style="display: block; margin: auto;" />
---
# 1. Impute with `Mice`

```r
m=5
# impute several data sets
imp <- mice(nhanes, m = m, seed = 24415, method="pmm", print = FALSE)
```

- What is `imp`?

```r
str(imp, max.level = 1)
```

```
## List of 22
##  $ data           :'data.frame':	25 obs. of  4 variables:
##  $ imp            :List of 4
##  $ m              : num 5
##  $ where          : logi [1:25, 1:4] FALSE FALSE FALSE FALSE FALSE FALSE ...
##   ..- attr(*, "dimnames")=List of 2
##  $ blocks         :List of 4
##   ..- attr(*, "calltype")= Named chr [1:4] "type" "type" "type" "type"
##   .. ..- attr(*, "names")= chr [1:4] "age" "bmi" "hyp" "chl"
##  $ call           : language mice(data = nhanes, m = m, method = "pmm", printFlag = FALSE, seed = 24415)
##  $ nmis           : Named int [1:4] 0 9 8 10
##   ..- attr(*, "names")= chr [1:4] "age" "bmi" "hyp" "chl"
##  $ method         : Named chr [1:4] "" "pmm" "pmm" "pmm"
##   ..- attr(*, "names")= chr [1:4] "age" "bmi" "hyp" "chl"
##  $ predictorMatrix: num [1:4, 1:4] 0 1 1 1 1 0 1 1 1 1 ...
##   ..- attr(*, "dimnames")=List of 2
##  $ visitSequence  : chr [1:4] "age" "bmi" "hyp" "chl"
##  $ formulas       :List of 4
##  $ post           : Named chr [1:4] "" "" "" ""
##   ..- attr(*, "names")= chr [1:4] "age" "bmi" "hyp" "chl"
##  $ blots          :List of 4
##  $ ignore         : logi [1:25] FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ seed           : num 24415
##  $ iteration      : num 5
##  $ lastSeedValue  : int [1:626] 10403 409 -1884349714 -394216038 1011282966 -909482523 1913636358 -1791339674 1393248822 -1412788497 ...
##  $ chainMean      : num [1:4, 1:5, 1:5] NaN 27.31 1.25 178.1 NaN ...
##   ..- attr(*, "dimnames")=List of 3
##  $ chainVar       : num [1:4, 1:5, 1:5] NA 35.271 0.214 1503.211 NA ...
##   ..- attr(*, "dimnames")=List of 3
##  $ loggedEvents   : NULL
##  $ version        :Classes 'package_version', 'numeric_version'  hidden list of 1
##  $ date           : Date[1:1], format: "2023-02-13"
##  - attr(*, "class")= chr "mids"
```

---
# 1. Impute with `Mice`

What is `imp` within `imp`?

```r
str(imp$imp, max.level = 1)
```

```
## List of 4
##  $ age:'data.frame':	0 obs. of  5 variables:
##  $ bmi:'data.frame':	9 obs. of  5 variables:
##  $ hyp:'data.frame':	8 obs. of  5 variables:
##  $ chl:'data.frame':	10 obs. of  5 variables:
```

```r
#get the imputed values for that var
imp$imp$bmi
```

```r
#get the imputed datasets out
#complete(imp, "full")
```
---
# PMM

.pull-left[
- Predictive mean matching

- PMM involves selecting a data point from the original, non-missing data which has a predicted value close to the predicted value of the missing sample.
]

.pull-right[

]
---
# 2. Model with `Mice`

We'll fit a simple statistical model

```r
#fit the model to each set of imputaed data

fit <- with(data = imp, expr = lm(bmi ~ age))

summary(fit)
```

---
# 3. `Mice` Pool Results

```r
#combine the results
result <- pool(fit)
tidy(result)
```

<table class="huxtable" style="border-collapse: collapse; border: 0px; margin-bottom: 2em; margin-top: 2em; ; margin-left: auto; margin-right: auto;  " id="tab:unnamed-chunk-32">
<col><col><col><col><col><col><col><col><col><col><col><col><col><tr>
<th style="vertical-align: top; text-align: left; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0.4pt 0.4pt;    padding: 6pt 6pt 6pt 6pt; font-weight: bold;">term</th><th style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: bold;">estimate</th><th style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: bold;">std.error</th><th style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: bold;">statistic</th><th style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: bold;">p.value</th><th style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: bold;">b</th><th style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: bold;">df</th><th style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: bold;">dfcom</th><th style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: bold;">fmi</th><th style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: bold;">lambda</th><th style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: bold;">m</th><th style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: bold;">riv</th><th style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0.4pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: bold;">ubar</th></tr>
<tr>
<td style="vertical-align: top; text-align: left; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0.4pt;    padding: 6pt 6pt 6pt 6pt; background-color: rgb(242, 242, 242); font-weight: normal;">(Intercept)</td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; background-color: rgb(242, 242, 242); font-weight: normal;">29.6 </td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; background-color: rgb(242, 242, 242); font-weight: normal;">2.7 </td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; background-color: rgb(242, 242, 242); font-weight: normal;">11   </td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; background-color: rgb(242, 242, 242); font-weight: normal;">4.72e-05</td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; background-color: rgb(242, 242, 242); font-weight: normal;">3.27 </td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; background-color: rgb(242, 242, 242); font-weight: normal;">5.7 </td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; background-color: rgb(242, 242, 242); font-weight: normal;">23</td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; background-color: rgb(242, 242, 242); font-weight: normal;">0.646</td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; background-color: rgb(242, 242, 242); font-weight: normal;">0.54 </td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; background-color: rgb(242, 242, 242); font-weight: normal;">5</td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; background-color: rgb(242, 242, 242); font-weight: normal;">1.17</td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0.4pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; background-color: rgb(242, 242, 242); font-weight: normal;">3.34 </td></tr>
<tr>
<td style="vertical-align: top; text-align: left; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.4pt 0.4pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">age</td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">-1.73</td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">1.25</td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">-1.38</td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0.205   </td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0.563</td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">7.72</td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">23</td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0.538</td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0.432</td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">5</td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0.76</td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0.4pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0.889</td></tr>
</table>

---
# Plot Imputations

- Make sure they look similar to real data

```r
# create stripplot with boxplot overlay
ggmice(imp, aes(x = .imp, y = bmi)) + 
  geom_jitter(height = 0) +
  geom_boxplot(fill = "white", alpha = 0.75, outlier.shape = NA) +
  labs(x = "Imputation number")
```

<img src="missing_data_in_r_files/figure-html/unnamed-chunk-33-1.png" width="60%" style="display: block; margin: auto;" />
---
# Distinguish Between NMAR and MAR

- Pray you don't have to 😂

- It's complicated

- Not many good techniques
  - Try and track down the missing data
  - Auxiliary variables
  - Collect more data for explaining missingness
---
# Reporting Missing Data

- Template from [Stepf van Buuren](https://stefvanbuuren.name/fimd/sec-reporting.html)

The *percentage of missing values* across the nine variables varied between *0 and 34%*. In total *1601 out of 3801 records (42%)* were incomplete. Many girls had no score because the nurse felt that the measurement was “unnecessary,” or because the girl did not give permission. Older girls had many more missing data. We used *multiple imputation* to create and analyze *40 multiply imputed datasets*. Methodologists currently regard multiple imputation as a state-of-the-art technique because it improves accuracy and statistical power relative to other missing data techniques. *Incomplete variables were imputed under fully conditional specification, using the default settings of the mice 3.0 package (Van Buuren and Groothuis-Oudshoorn 2011)*. The parameters of substantive interest were estimated in each imputed dataset separately, and combined using Rubin’s rules. For comparison, *we also performed the analysis on the subset of complete cases.*

---
# Report

- *Amount of missing data* 
- *Reasons for missingness*
- *Consequences*
- *Method*
- *Imputation model*
- *Pooling*
- *Software*
- *Complete-case analysis*
---
#Is it MCAR, MAR, NMAR?

The post-experiment manipulation-check
questionnaires for five participants were accidentally
thrown away.

- MCAR

---
# Is it MCAR, MAR, NMAR?

In a 2-day memory experiment, people who know they
would do poorly on the memory test are discouraged
and don’t want to return for the second session.

- NMAR

---
# Is it MCAR, MAR, NMAR?

There was a problem with one of the auditory stimulus
files in an ERP study of speech comprehension during
noise, so we discarded any data from item #43.

- MAR

---

- Make sure you can run through all the code

- Read Chapter 18 in *R in Action*

- Missing lab on Wednesday