Aims and content

The present document includes the analytical steps implemented to characterize and explore the antecedents of sleep patterns in a sample of older adolescents with (N = 47; 31 girls) and without (N = 46; 28 girls) DSM-5 insomnia.

The data pre-processing pipeline is depicted at this page.

Here, we prepare the data for the main analyses, we evaluate the psychometric proprieties of the aggregate self-report scale, and we summarize the main variables collected for the present study. Then, the processed datasets with aggregate measures are exported to be used for fitting a series of linear regression models, depicted at this page (Characterization of sleep and diary ratings) and at this page (Relationships between sleep and diary ratings).


The following R packages are used in this document (see References section):

# required packages
packages <- c("ggplot2","gridExtra","lubridate","reshape2","dplyr","lavaan","MuMIn","lme4","tcltk","Rmisc","Hmisc")

# generate packages references
knitr::write_bib(c(.packages(), packages),"packagesAn.bib")

# # run to install missing packages
# xfun::pkg_attach2(packages, message = FALSE); rm(list=ls())



1. Data description

1.1. Data loading

Here, we load the long-form ema and the wide-form demos dataset resulting from the processing pipeline.

# removing all objets from the workspace
rm(list=ls())

# setting system time zone to GMT (for consistent temporal synchronization)
Sys.setenv(tz="GMT")

# loading pre-processed data
load("DATA/datasets/ema_finalClean.RData")
load("DATA/datasets/demos_clean.RData")

# used R packages
library(ggplot2); library(gridExtra); library(lubridate); library(reshape2)


1.2. Data recoding

Here, we recode some of the variables to be used in the analyses below:

  1. we operationalize time by computing dayNr and partdayNr indexing, respectively, the No. of days since the first day of assessment, and the number of the current day of participation (i.e., without considering missing days)

  2. we compute light.p, deep.p, and rem.p quantifying the percentage of time in each sleep stage over TST

  3. we compute the SO.num and WakeUp.num representing the numeric equivalence of SO and WakeUp (currently in “mm-dd-yyyy hh:mm:ss” format) to indicate the distance (hours) from midnight

  4. we compute the TotalSteps1000 variable, quantifying the daily No. of TotalSteps in thousands (i.e., 1 TotalSteps1000 = 1,000 TotalSteps) to make values more comparable across variables

  5. we create the weekday variable, indicating whether the observation was collected during weekdays or weekends

  6. we create the covid19 variable to mark cases observed after the COVID-19 emergency outbreak


1.2.1. Operationalizing time

Here, we operationalize time by computing the variable dayNr, indexing the No. of days since the first day of assessment, and partdayNr, indexing the number of the current day of participation (i.e., without considering missing days).

# sorting by ID and ActivityDate
ema <- ema[order(ema$ID,ema$ActivityDate,ema$StartTime),]

# computing dayNr and partdayNr
library(dplyr)
ema <- ema %>% group_by(ID) %>% 
  mutate(dayNr=round(as.numeric(difftime(ActivityDate,min(ActivityDate),units="days")))+1)
ema <- as.data.frame(ema)
detach("package:dplyr", unload=TRUE)
ema <- plyr::ddply(ema,c("ID"),transform,partdayNr=seq_along(ID))

# sorting dataset
AD <- which(colnames(ema)=="ActivityDate")
ema <- ema[,c(1:AD,ncol(ema)-1,ncol(ema),(AD+1):(ncol(ema)-2))]


1.2.2. % light, deep & REM

Here, we recode the variables light, deep and REM as percentage of of time in each sleep stage over TST.

# sanity check (no values > TST)
cat("Sanity check:",nrow(ema[!is.na(ema$light) & (ema$light>ema$TST | ema$deep>ema$TST | ema$rem>ema$TST),])==0)
## Sanity check: TRUE
# sanity check (sum: no values > TST)
ema$totalStages <- apply(ema[,c("light","deep","rem")],1,sum)
cat("Sanity check:",nrow(ema[!is.na(ema$light) & ema$TST!=ema$totalStages,])==0)
## Sanity check: TRUE
# computing percentages
ema$light.p <- 100*ema$light/ema$TST
ema$deep.p <- 100*ema$deep/ema$TST
ema$rem.p <- 100*ema$rem/ema$TST

# sorting dataset columns
AD <- which(colnames(ema)=="rem")
ema <- ema[,c(1:AD,(ncol(ema)-2):ncol(ema),(AD+1):(ncol(ema)-3))]
ema$totalStages <- NULL

# plotting
grid.arrange(ggplot(ema,aes(light))+geom_histogram(),ggplot(ema,aes(light.p))+geom_histogram(),
             ggplot(ema,aes(deep))+geom_histogram(),ggplot(ema,aes(deep.p))+geom_histogram(),
             ggplot(ema,aes(rem))+geom_histogram(),ggplot(ema,aes(rem.p))+geom_histogram(),nrow=3)


1.2.3. Numeric timings

Here, we recode the variables SO and WakeUp as numeric, indicating their relative distance in hours from midnight. Positive values will reflect hours after midnight, whereas negative values will reflect hours before midnight.

# isolating hour from date
ema$SO.num <- as.POSIXct(paste(hour(ema$SO),minute(ema$SO)),format="%H %M") 
ema$WakeUp.num <- as.POSIXct(paste(hour(ema$WakeUp),minute(ema$WakeUp)),format="%H %M")

# sorting dataset columns
AD <- which(colnames(ema)=="midSleep")
ema <- ema[,c(1:AD,(ncol(ema)-1):ncol(ema),(AD+1):(ncol(ema)-2))]

# computing distance from midnight
ema$SO.num <- as.numeric(difftime(ema$SO.num,
                                  as.POSIXct(paste(substr(Sys.time(),1,10),"00:00:00"),tz="GMT"),units="hours"))
ema$WakeUp.num <- as.numeric(difftime(ema$WakeUp.num,
                                      as.POSIXct(paste(substr(Sys.time(),1,10),"00:00:00"),tz="GMT"),units="hours"))

# subtracting 24h to cases with values before midnight (i.e., so that positive = after, and negative = before)
ema[!is.na(ema$SO.num) & ema$SO.num>15,"SO.num"] <- 
  ema[!is.na(ema$SO.num) & ema$SO.num>15,"SO.num"] - 24 
ema[!is.na(ema$WakeUp.num) & ema$WakeUp.num>15,"WakeUp.num"] <- 
  ema[!is.na(ema$WakeUp.num) & ema$WakeUp.num>15,"WakeUp.num"] - 24

# plotting
grid.arrange(ggplot(ema,aes(SO.num)) + geom_histogram() + ggtitle("Sleep Onset (hours from 00:00)") + xlim(c(-6,15)),
             ggplot(ema,aes(WakeUp.num)) + geom_histogram() + ggtitle("Wake-up time (hours from 00:00)") + xlim(c(-6,15)))

# sanity check
cat("Sanity check:",
    nrow(ema[!is.na(ema$TIB) & ((ema$SO.num > max(ema$TIB,na.rm=TRUE) & ema$WakeUp.num > max(ema$TIB,na.rm=TRUE)) |
                                  ema$SO.num < (-6) | ema$SO.num > (6)),])==0)
## Sanity check: TRUE


Comments:

  • SO and WakeUp have been effectively converted as their distance from midnight

  • no cases have SO.num or WakeUp.num higher than 15, or SO.num lower than -6 or higher than 6


1.2.4. TotalSteps1000

Here, we recode the TotalSteps variable to be quantified in thousands of steps rather than in single steps.

# computing TotalSteps1000
ema$TotalSteps1000 <- ema$TotalSteps/1000

# sorting dataset columns
AD <- which(colnames(ema)=="TotalSteps")
ema <- ema[,c(1:AD,ncol(ema),(AD+1):(ncol(ema)-1))]

# plotting
grid.arrange(ggplot(ema,aes(TotalSteps))+geom_histogram(),ggplot(ema,aes(TotalSteps1000))+geom_histogram(),nrow=1)


1.2.5. Late responses

Here, we create the variable diary.nextDay to mark those cases in which daily diaries were filled on the following day, that is we we mark the cases with dailyDiary StartedTime after sleepLog EndTime, and those with valid dailyDiary but invalid sleepLog data with a StartedTime later than 6 AM.

# marking cases with StartedTime > EndTime
ema[!is.na(ema$dailyStress) & !is.na(ema$TIB),c("diary.nextDay","diary.nextDay2")] <- 0
ema[!is.na(ema$dailyStress) & !is.na(ema$TIB) & ema$StartedTime>ema$EndTime,"diary.nextDay"] <- 1
cat(nrow(ema[!is.na(ema$diary.nextDay) & ema$diary.nextDay==1,]),"diary responses entered after wake up (",
    round(100*nrow(ema[!is.na(ema$diary.nextDay) & ema$diary.nextDay==1,])/nrow(ema[!is.na(ema$StartedTime),]),1),"% )")
## 790 diary responses entered after wake up ( 16 % )
# marking cases with StartedTime > 6 AM and < 8 PM
ema$StartedHour <- as.POSIXct(paste(hour(ema$StartedTime),minute(ema$StartedTime)),format="%H %M",tz="GMT") # isolating time
sixPM <- as.POSIXct(paste(substr(Sys.time(),1,10),"06:00:00"),tz="GMT") # 6 AM
ema[!is.na(ema$StartedHour) & ema$StartedHour>sixPM & ema$StartedHour<sixPM+14*60*60,"diary.nextDay2"] <- 1
ema[!is.na(ema$StartedHour) & (ema$StartedHour<=sixPM | ema$StartedHour>=sixPM+14*60*60),"diary.nextDay2"] <- 0
cat(nrow(ema[!is.na(ema$diary.nextDay2) & ema$diary.nextDay2==1,]),"diary responses entered after 6 AM (",
    round(100*nrow(ema[!is.na(ema$diary.nextDay2) & ema$diary.nextDay2==1,])/nrow(ema[!is.na(ema$StartedTime),]),1),
    "% ), of which",nrow(ema[!is.na(ema$diary.nextDay2) & ema$diary.nextDay2==1 &
                               !is.na(ema$diary.nextDay) & ema$diary.nextDay==0,]),"marked with diary.nextDay = 0 (showed below)")
## 998 diary responses entered after 6 AM ( 20.2 % ), of which 36 marked with diary.nextDay = 0 (showed below)
# showing cases with StartedTime > 6 AM and diary.nextDay = 0
ema[!is.na(ema$diary.nextDay2) & ema$diary.nextDay2==1 & !is.na(ema$diary.nextDay) & ema$diary.nextDay==0,
    c("ID","ActivityDate","StartTime","EndTime","StartedTime","diary.nextDay")]
# joining diary.nextDay and diary.nextDay2
ema[!is.na(ema$diary.nextDay2) & ema$diary.nextDay2==1,"diary.nextDay"] <- 1
ema[is.na(ema$diary.nextDay) & !is.na(ema$diary.nextDay2) & ema$diary.nextDay2==0,"diary.nextDay"] <- 0
ema$StartedHour <- ema$diary.nextDay2 <- NULL

# sorting columns
AD <- which(colnames(ema)=="surveyDuration")
ema <- ema[,c(1:AD,ncol(ema),(AD+1):(ncol(ema)-1))]

# printing final info
cat(nrow(ema[!is.na(ema$diary.nextDay) & ema$diary.nextDay==1,]),"diary responses entered after wake up or after 6 AM (",
    round(100*nrow(ema[!is.na(ema$diary.nextDay) & ema$diary.nextDay==1,])/nrow(ema[!is.na(ema$StartedTime),]),1),"% )")
## 1010 diary responses entered after wake up or after 6 AM ( 20.5 % )


Comments:

  • in 790 cases (16%), participants filled the diary after sleepLog WakeUp time

  • in 998 cases (20.2%), participants filled the diary after 6 AM

  • overall, in 1,010 cases (25.5%), participants filled the diary on the following day


1.2.6. weekends

Here, the ActivityDate variable is used to discriminate the observations collected in weekdays (Tuesday to Friday) vs weekend days (Saturday and Sunday). Since sleep habits are mainly influenced by the following day than the previous one (i.e., people go to bed late on Friday, and early on Sunday), we create two variables: weekday (marking Saturday and Sunday as “weekend”), and weekday.sleep (marking Friday and Saturday as “weekend”).

# extracting weekday from Activity Date
ema$wd <- weekdays(as.Date(ema$ActivityDate))

# creating weekday
ema$weekday <- "weekday"
ema[ema$wd=="sabato" | ema$wd=="domenica","weekday"] <- "weekend"

# creating weekday.sleep
ema$weekday.sleep <- "weekday"
ema[ema$wd=="venerdì" | ema$wd=="sabato","weekday.sleep"] <- "weekend"
ema$wd <- NULL

# sorting dataset
AD <- which(colnames(ema)=="ActivityDate")
ema <- ema[,c(1:AD,ncol(ema)-1,ncol(ema),(AD+1):(ncol(ema)-2))]

# summarizing
ema[,c("weekday","weekday.sleep")] <- lapply(ema[,c("weekday","weekday.sleep")],as.factor)
summary(ema[,c("weekday","weekday.sleep")])
##     weekday     weekday.sleep 
##  weekday:4463   weekday:4451  
##  weekend:1756   weekend:1768


1.2.7. COVID-19 emergency

Here, the ActivityDate variable is used to create the variable covid19 highlighting the cases recorded during the COVID-19 emergency, whose beginning in the San Francisco Bay Area was identified around March 20th, 2020.

# creating covid19 variable
ema$covid19 <- "pre-COVID19"
ema[ema$ActivityDate>=as.Date("2020-03-20"),"covid19"] <- "post-COVID19"
ema$covid19 <- factor(ema$covid19,levels=c("pre-COVID19","post-COVID19"))

# sorting dataset columns
ema <- ema[,c(1:AD,ncol(ema),(AD+1):(ncol(ema)-1))]

# plotting
ggplot(ema,aes(ActivityDate)) + geom_histogram() + geom_vline(xintercept=as.Date("2020-03-20"),color="red")

# summarizing cases
summary(ema$covid19) # 2,317 post-COVID19 (37%)
##  pre-COVID19 post-COVID19 
##         3902         2317
# No. participants with both pre-COVID19 and post-COVID19 observations
pre <- levels(as.factor(as.character(ema[ema$covid19=="pre-COVID19","ID"])))
post <- levels(as.factor(as.character(ema[ema$covid19=="post-COVID19","ID"])))
cat("-",length(pre[!(pre%in%post)]),"participants with only pre-COVID19 observations",
    "\n-",length(post[!(post%in%pre)]),"participants with only post-COVID19 observations",
    "\n-",length(post[post%in%pre]),"participants with both pre- and post-COVID19 observations")
## - 54 participants with only pre-COVID19 observations 
## - 32 participants with only post-COVID19 observations 
## - 7 participants with both pre- and post-COVID19 observations


Comments:

  • the 37% of the data was collected after the COVID-19 outbreak

  • nevertheless, only seven participants have data collected both before and after the COVID-19 outbreak, with most participants having ended their data collection period before March 20th, 2020, and a smaller group (N = 32) having started four-to-five months later

  • consequently, any comparison will be mainly a comparison between subjects


1.3. Data dictionary

Here, we describe each variable included in the ema dataset.

# removing unuseful variables
toRemove <- c("LogId","SOL","nAwake","fragIndex","midSleep",paste("meanHR",1:3,sep=""),paste("sleepHR",1:3,sep=""),
              paste("HRtimeCoeff",1:3,sep=""),paste("stageHR",c("TST","SOL","WASO"),sep="_"),"majMiss","minMiss")
ema[,toRemove] <- NULL

# changing diary items names
colnames(ema)[which(colnames(ema)=="dailyStress")] <- "stress"
colnames(ema)[which(colnames(ema)=="eveningMood")] <- "mood"
colnames(ema)[which(colnames(ema)=="eveningWorry")] <- "worry"

# changing HR variable names
colnames(ema)[which(colnames(ema)=="stageHR_NREM")] <- "HR_NREM"
colnames(ema)[which(colnames(ema)=="stageHR_REM")] <- "HR_REM"

# showing variables
str(ema)
## 'data.frame':    6219 obs. of  54 variables:
##  $ ID            : Factor w/ 93 levels "s001","s002",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ sex           : Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 1 ...
##  $ age           : num  19.3 19.3 19.3 19.3 19.3 ...
##  $ BMI           : num  20.6 20.6 20.6 20.6 20.6 ...
##  $ insomnia      : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
##  $ insomnia.group: Factor w/ 3 levels "control","DSM.ins",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ActivityDate  : Date, format: "2019-01-07" "2019-01-08" ...
##  $ covid19       : Factor w/ 2 levels "pre-COVID19",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ weekday       : Factor w/ 2 levels "weekday","weekend": 1 1 1 1 1 2 2 1 1 1 ...
##  $ weekday.sleep : Factor w/ 2 levels "weekday","weekend": 1 1 1 1 2 2 1 1 1 1 ...
##  $ dayNr         : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ partdayNr     : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ StartTime     : POSIXct, format: "2019-01-07 22:06:30" "2019-01-08 21:59:30" ...
##  $ EndTime       : POSIXct, format: "2019-01-08 08:07:30" "2019-01-09 07:45:00" ...
##  $ SleepDataType : Factor w/ 2 levels "classic","stages": 2 2 2 NA NA NA NA NA NA NA ...
##  $ EBEDataType   : chr  "stages" "stages" "stages" NA ...
##  $ TIB           : num  601 586 538 NA NA ...
##  $ TST           : num  507 502 470 NA NA ...
##  $ SE            : num  84.4 85.8 87.3 NA NA ...
##  $ SO            : POSIXct, format: "2019-01-07 22:06:30" "2019-01-08 22:00:00" ...
##  $ WakeUp        : POSIXct, format: "2019-01-08 08:07:30" "2019-01-09 07:45:00" ...
##  $ SO.num        : num  -1.9 -2 -1.22 NA NA ...
##  $ WakeUp.num    : num  8.12 7.75 7.72 NA NA ...
##  $ WASO          : num  94 82.5 66 NA NA NA NA NA NA NA ...
##  $ light         : num  353 297 283 NA NA NA NA NA NA NA ...
##  $ deep          : num  66 91 97.5 NA NA NA NA NA NA NA ...
##  $ rem           : num  88 114 89 NA NA ...
##  $ light.p       : num  69.6 59.1 60.3 NA NA ...
##  $ deep.p        : num  13 18.1 20.8 NA NA ...
##  $ rem.p         : num  17.4 22.8 19 NA NA ...
##  $ HR_NREM       : num  63.3 63.4 61.9 NA NA ...
##  $ HR_REM        : num  63.6 63.7 61.5 NA NA ...
##  $ TotalSteps    : int  NA 16343 14904 7898 NA NA NA NA NA 6718 ...
##  $ TotalSteps1000: num  NA 16.3 14.9 7.9 NA ...
##  $ StartedTime   : POSIXct, format: "2019-01-07 21:00:00" "2019-01-08 21:07:00" ...
##  $ SubmittedTime : POSIXct, format: "2019-01-07 21:01:00" "2019-01-08 21:07:00" ...
##  $ surveyDuration: num  1 0 0 0 NA 0 1 0 0 0 ...
##  $ diary.nextDay : num  0 0 0 0 NA 0 1 0 0 0 ...
##  $ stress        : num  3 1 3 2 NA 3 1 2 1 2 ...
##  $ worry         : num  2 2 3 2 NA 1 1 1 2 1 ...
##  $ mood          : num  4 4 4 5 NA 5 5 5 4 5 ...
##  $ stress_school : Factor w/ 2 levels "0","1": 2 1 2 2 NA 1 1 2 1 2 ...
##  $ stress_family : Factor w/ 2 levels "0","1": 2 1 1 1 NA 1 1 1 1 1 ...
##  $ stress_health : Factor w/ 2 levels "0","1": NA NA NA NA NA NA NA NA NA NA ...
##  $ stress_COVID  : Factor w/ 2 levels "0","1": NA NA NA NA NA NA NA NA NA NA ...
##  $ stress_peers  : Factor w/ 2 levels "0","1": 1 1 1 1 NA 1 1 1 1 1 ...
##  $ stress_other  : Factor w/ 2 levels "0","1": 1 1 1 1 NA 2 1 1 1 1 ...
##  $ worry_school  : Factor w/ 2 levels "0","1": 2 1 2 2 NA 1 1 1 2 1 ...
##  $ worry_family  : Factor w/ 2 levels "0","1": 1 1 1 1 NA 1 1 1 1 1 ...
##  $ worry_health  : Factor w/ 2 levels "0","1": NA NA NA NA NA NA NA NA NA NA ...
##  $ worry_peer    : Factor w/ 2 levels "0","1": 1 1 1 1 NA 1 1 1 1 1 ...
##  $ worry_COVID   : Factor w/ 2 levels "0","1": NA NA NA NA NA NA NA NA NA NA ...
##  $ worry_sleep   : Factor w/ 2 levels "0","1": NA NA NA NA NA NA NA NA NA NA ...
##  $ worry_other   : Factor w/ 2 levels "0","1": 1 2 1 2 NA 1 1 1 1 1 ...


INDIVIDUAL-LEVEL VARIABLES

Demographics:

  1. ID = participants’ identification code

  2. sex = participants’ sex (“F” = female, “M” = male)

  3. age = participants’ age (years)

  4. BMI = participants’ BMI (kg*m^(-2))

  5. insomnia = participants’ group (1 = insomnia, 0 = control)

  6. insomnia.group = participants’ insomnia group (“control” = control, “DSM.ins” = DSM insomnia, “sub.ins” = insomnia subthreshold)


DAY-LEVEL VARIABLES

  1. ActivityDate = day of assessment (date in “mm-dd-yyyy” format)

  2. covid19 = factor indicating whether the observation was collected “pre-COVID19” or “post-COVID19”

  3. weekday = factor indicating whther the observation was collected during a “weekday” (Mon-Fry) or “weekend” (Sat-Sun)

  4. weekday.sleep = factor indicating whther the observation was collected during a night preceding a “weekday” (Sun-Thu) or “weekend” (Fry-Sat)

  5. dayNr = No. of days since the first day of assessment, within participant

  6. partdayNr = current ‘valid’ day of assessment


sleepLog

  1. StartTime = sleep period start hour (in “mm-dd-yyyy hh:mm:ss” format)

  2. EndTime = sleep period end hour (in “mm-dd-yyyy hh:mm:ss” format)

  3. SleepDataType: sleep data type originally scored by Fitabase

  4. EBEDataType: type of EBE data used for manually recomputing sleep measures (i.e., updated by excluding the last wake epochs)

  5. TIB = time in bed (min) computed as the number of minutes between StartTime (i.e., considering missing epochs at the beginning as wake epochs) and the last epoch included in EBE data (i.e., excluding the last wake epochs)

  6. TST = total sleep time (min)

  7. SE = sleep efficiency as the percent of TST over TIB (%)

  8. SO = sleep onset hour (in “mm-dd-yyyy hh:mm:ss” format) corresponding to the time of the first epoch classified as sleep

  9. WakeUp = wake-up time (in “mm-dd-yyyy hh:mm:ss” format) corresponding to the time of the last epoch classified as sleep

  10. SO.num = SO expressed as the No. of hours from midnight

  11. WakeUp.num = WakeUp expressed as the No. of hours from midnight

  12. WASO = wake after sleep onset (min)

  13. light = No. of minutes classified as ‘light’ sleep

  14. deep = No. of minutes classified as ‘deep’ sleep

  15. rem = No. of minutes classified as REM sleep

  16. light.p = Percentage of light sleep over TST

  17. deep.p = Percentage of deep sleep over TST

  18. rem.p = Percentage of REM sleep over TST


sleepHR

  1. HR_NREM = mean HR values (bpm) computed from all epochs categorized as NREM sleep

  2. HR_REM = mean HR values (bpm) computed from all epochs categorized as REM sleep


dailyAct

  1. TotalSteps = sum of the No. of steps recoded in each day (recomputed from hourly steps data)

  2. TotalSteps1000 = TotalSteps expressed in thousands of steps


dailyDiary

  1. StartedTime = initiation hour of the diary form (in “mm-dd-yyyy hh:mm:ss” format)

  2. SubmittedTime = submission hour of the diary form (in “mm-dd-yyyy hh:mm:ss” format)

  3. surveyDuration = duration of the survey (min)

  4. diary.nextDay = factor indicating whether diary ratings were entered on the following day (1) or not (0)

  5. stress = score at the daily stress item (1-5)

  6. worry = score at the evening worry item (1-5)

  7. mood = score at the evening Mood item (1-5)

  8. stress_school, ..., stress_other = stressor categories (0 or 1)

  9. worry_school, ..., worry_other = sources of worry (0 or 1)


2. Psychometrics

Here, we evaluate the psychometric proprieties of the psychological distress scale resulting from the three Likert-type items submitted each evening: stress, mood, and worry.


First, we load the required packages, and we negatively recode mood so that higher scores reflect negative mood, coherently with the other items.

# loading packages
library(lavaan); library(MuMIn)

# negatively recoding evening mood (higher values = negative mood)
ema$mood <- 6 - ema$mood


2.1. Descriptives

Then, we use the corr.matrices function to visualize the correlation matrix of item scores at the overall, between- and within-subject level, and the itemsICC function to compute items intraclass correlation coefficients (ICC) for each item. Note that only cases with complete responses to each item ware kept in the dataset.

show corr.matrices

corr.matrices <- function(data=data,IDvar="ID",items=c("stress","worry","mood")){ 
  require(ggplot2); require(dplyr); require(reshape2)
  
  data <- na.omit(data[,c(IDvar,items)])
  data <- plyr::ddply(data,"ID",transform,day=seq_along(ID))
  data <- data %>%
  group_by(ID) %>% # grouping by ID
  mutate(stress.cm = mean(na.omit(stress)), # computing individual mean for any given item
         worry.cm = mean(na.omit(worry)),mood.cm = mean(na.omit(mood)),
         
         stress.dm = stress-stress.cm, # computing within-individual deviations (occasional score - mean score)
         worry.dm = worry-worry.cm, mood.dm = mood-mood.cm)
  
  # Matrix 1 (all scores as independent)
  c1 <- melt(cor(data[,which(colnames(data)=="stress"):which(colnames(data)=="mood")],
                 data[,which(colnames(data)=="stress"):which(colnames(data)=="mood")],
                 use="complete.obs",method="pearson"))
  p1 <- ggplot(c1,aes(x=Var1, y=Var2, fill=value)) + 
    geom_tile() + 
    geom_text(aes(x = Var1, y = Var2, label = round(value,2)),color="black",size=5)+
    ggtitle(paste("Correlation Matrix 1 (all independent); N =",nrow(data)))+labs(x="",y="")+
    scale_fill_gradient2(low="darkblue",high="#f03b20",
                         mid="white",midpoint=0,
                         limit = c(-1,1), space = "Lab",
                         name="Pearson\nCorrelation",
                         guide="legend",
                         breaks=round(seq(1,-1,length.out = 11),2),
                         minor_breaks=round(seq(1,-1,length.out = 11),2)) + theme_minimal() +
    theme(text=element_text(face="bold",size=12))
  
  # Matrix 2 (individual means)
  DATA <- data[data$day==1,]
  c2 <- melt(cor(DATA[,which(colnames(DATA)=="stress.cm"):which(colnames(DATA)=="mood.cm")],
                 DATA[,which(colnames(DATA)=="stress.cm"):which(colnames(DATA)=="mood.cm")],
                 use="complete.obs",method="pearson"))
  p2 <- ggplot(c2,aes(x=Var1, y=Var2, fill=value)) + geom_tile() +
    geom_text(aes(x = Var1, y = Var2, label = round(value,2)),color="black",size=5)+
    ggtitle(paste("Correlation Matrix 2 (individual means); N =",nrow(DATA)))+labs(x="",y="")+
    scale_fill_gradient2(low="darkblue",high="#f03b20",
                         mid="white",midpoint=0,
                         limit = c(-1,1), space = "Lab",
                         name="Pearson\nCorrelation",
                         guide="legend",
                         breaks=round(seq(1,-1,length.out = 11),2),
                         minor_breaks=round(seq(1,-1,length.out = 11),2)) + theme_minimal() +
    theme(text=element_text(face="bold",size=12))
  
  # Matrix 3 (deviations from individual means)
  c3 <- melt(cor(data[,which(colnames(data)=="stress.dm"):which(colnames(data)=="mood.dm")],
                 data[,which(colnames(data)=="stress.dm"):which(colnames(data)=="mood.dm")],
                 use="complete.obs",method="pearson"))
  p3 <- ggplot(c3,aes(x=Var1, y=Var2, fill=value)) + 
    geom_tile() + 
    geom_text(aes(x = Var1, y = Var2, label = round(value,2)),color="black",size=5)+
    ggtitle(paste("Correlation Matrix 3 (mean-centered scores); N =",nrow(data)))+labs(x="",y="")+
    scale_fill_gradient2(low="darkblue",high="#f03b20",
                         mid="white",midpoint=0,
                         limit = c(-1,1), space = "Lab",
                         name="Pearson\nCorrelation",
                         guide="legend",
                         breaks=round(seq(1,-1,length.out = 11),2),
                         minor_breaks=round(seq(1,-1,length.out = 11),2)) + theme_minimal() +
    theme(text=element_text(face="bold",size=12))
  
  return(list(p1,p2,p3))}

show itemsICC

itemsICC <- function(data,items,IDvar,output="text",digits=2){ require(lme4)
  data <- na.omit(data[,c(IDvar,items)])
  res <- data.frame(item=NA,icc=NA)
  for(i in 1:length(items)){
    m <- lmer(formula=gsub("ID",IDvar,gsub("d1",items[i],"d1~(1|ID)")),data=data) # VAR_between / (VAR_between + VAR_within)
    out <- round(as.data.frame(lme4::VarCorr(m))[1,4]/(as.data.frame(lme4::VarCorr(m))[1,4]+as.data.frame(lme4::VarCorr(m))[2,4]),
                 digits)
    
    # textual output or data.frame
    if(output=="text"){cat(items[i],"ICC =",out,"\n")
    }else{ res <- rbind(res,cbind(item=items[i],icc=out)) }} 
  if(output!="text"){return(res) }}

# plotting item scores distributions
grid.arrange(ggplot(ema,aes(stress)) + geom_histogram(),
             ggplot(ema,aes(worry)) + geom_histogram(),
             ggplot(ema,aes(mood)) + geom_histogram(),nrow=1)

# correlation matrix considering all observations as independent
corr.matrices(data=ema,IDvar="ID",items=c("stress","worry","mood"))[[1]]

# correlation matrix considering mean scores
corr.matrices(data=ema,IDvar="ID",items=c("stress","worry","mood"))[[2]]

# correlation matrix considering person mean-centered scores
corr.matrices(data=ema,IDvar="ID",items=c("stress","worry","mood"))[[3]]

# computing item ICCs
itemsICC(data=ema,IDvar="ID",items=c("stress","worry","mood"))
## stress ICC = 0.26 
## worry ICC = 0.33 
## mood ICC = 0.25


Comments

  • item scores are quite negatively skewed, especially in the case of stress and worry, implying some deviation from the assumption of the MCFA model below

  • item scores show a pattern of positive medium-to-strong correlations, indicating an internal coherence of the scale at both levels, with level-2 (between) correlations being higher than level-1 (within) correlations, as expected

  • at level 2 (between), we also observe a strong correlation between stress and worry, in contrast to the moderate correlations shown between mood and both stress and worry

  • items ICCs show similar values, indicating that most of the variance (from 67 to 75%) is at the within-individual level


2.2. Multilevel CFA

Then, we perform a Multilevel Confirmatory Factor Analysis (MCFA) by specifying a measurement model that assumes a configural cluster construct (with the same mono-factor structure at both levels). Following Jak & Jorgensen (2017), we also test weak and strong cross-level isomorphism. The fit.ind function from https://github.com/Luca-Menghini/LuMenPsy.Rfunctions is used to optimize model comparison.

show fit.ind

fit.ind <- function(model=NA,from_summary=FALSE,type="multilevel",models.names=NA,
                    fits=c("npar","chisq","df","pvalue","rmsea","cfi","srmr_within","srmr_between")){ require(lavaan); require(MuMIn)
  # removing level-specific fit indices when model is "monolevel"
  if(type=="monolevel"){
      fits <- gsub("srmr_within","srmr",fits)
      fits <- fits[fits!="srmr_between"] }
  if(from_summary==FALSE){
    # returning dataframe of models fit indices when more than one model is considered
    if(length(model)>1){
      fit.indices <- fitmeasures(model[[1]])[fits]
      for(i in 2:length(model)){
        fit.indices <- rbind(fit.indices,fitmeasures(model[[i]])[fits]) }
      if(!is.na(models.names[1])){
        row.names(fit.indices) <- models.names }
      return(as.data.frame(fit.indices))
      } else { return(fitmeasures(model)[fits]) }
    
    } else { # in some cases the fit indices are available only from the model's summary 
      quiet <- function(fit) { # this was written by Alicia FRANCO MARTÍNEZ on the lavaan Google group
        sink(tempfile())
        on.exit(sink()) 
        invisible(summary(fit, standardized = TRUE, fit.measures=TRUE))
        } 
      sum <- quiet(model)
      fit.indices <- sum$FIT[fits]
      return(fit.indices)}}


2.2.1. Model specification

First, we specify all models of interest.

We start with the preliminary steps suggested by Hox (2010, chapter 14) to evaluate the factor structure at level 2: the following benchmark models imply (1) no specification at level 2 (Null model), (2) only variances but not covariances at level 2 (Independence model), and (3) a full covariance matrix at level 2 (Saturated model).

m.null <- 'level: 1
           Distress_w =~ stress + worry + mood
           level: 2
           stress ~~ 0*stress
           worry ~~ 0*worry
           mood ~~ 0*mood'

m.ind <- 'level: 1
          Distress_w =~ stress + worry + mood
          level: 2
          stress ~~ stress
          worry ~~ worry
          mood ~~ mood'

m.sat <- 'level: 1
          Distress_w =~ stress + worry + mood
          level: 2
          stress ~~ stress + worry + mood
          worry ~~ worry + mood
          mood ~~ mood'


Then, we specify the target models: the configural model (i.e., same structure at level 1 and 2), the weak invariance model (i.e., equivalent factor loading at level 1 and level 2), and the strong invariance model (i.e., equivalent factor loadings, and no residual variance at level 2).

m.conf <- 'level: 1
           Distress_w =~ stress + worry + mood
           level: 2
           Distress_b =~ stress + worry + mood'

m.weak <- 'level: 1
           Distress_w =~ a*stress + b*worry + c*mood
           level: 2
           Distress_b =~ a*stress + b*worry + c*mood'

m.strong <- 'level: 1
             Distress_w =~ a*stress + b*worry + c*mood
             level: 2
             Distress_b =~ a*stress + b*worry + c*mood
             stress ~~ 0*stress
             worry ~~ 0*worry
             mood ~~ 0*mood'


2.2.2. Model fit

Here, we fit the specified models to the observed data.

We start with the benchmark models.

# selecting only observations with complete dailyDiary responses
dailyDiary <- na.omit(ema[,c("ID","stress","worry","mood")])

# fitting models
fit.null <- cfa(model=m.null,data=dailyDiary,cluster="ID",std.lv=TRUE)
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
##     Level-1 variable "stress" has no variance within some clusters.
##     The cluster ids with zero within variance are: s050
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
##     Level-1 variable "worry" has no variance within some clusters. The
##     cluster ids with zero within variance are: s050
fit.ind <- cfa(model=m.ind,data=dailyDiary,cluster="ID",std.lv=TRUE)
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
##     Level-1 variable "stress" has no variance within some clusters.
##     The cluster ids with zero within variance are: s050

## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
##     Level-1 variable "worry" has no variance within some clusters. The
##     cluster ids with zero within variance are: s050
fit.sat <- cfa(model=m.sat,data=dailyDiary,cluster="ID",std.lv=TRUE)
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
##     Level-1 variable "stress" has no variance within some clusters.
##     The cluster ids with zero within variance are: s050

## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
##     Level-1 variable "worry" has no variance within some clusters. The
##     cluster ids with zero within variance are: s050


Then, we fit the target models.

# fitting models
fit.conf <- cfa(model=m.conf,data=dailyDiary,cluster="ID",std.lv=TRUE) # Heywood case
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
##     Level-1 variable "stress" has no variance within some clusters.
##     The cluster ids with zero within variance are: s050
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
##     Level-1 variable "worry" has no variance within some clusters. The
##     cluster ids with zero within variance are: s050
## Warning in lav_object_post_check(object): lavaan WARNING: some estimated ov
## variances are negative
fit.weak <- cfa(model=m.weak,data=dailyDiary,cluster="ID",std.lv=TRUE)
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
##     Level-1 variable "stress" has no variance within some clusters.
##     The cluster ids with zero within variance are: s050
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
##     Level-1 variable "worry" has no variance within some clusters. The
##     cluster ids with zero within variance are: s050
fit.strong <- cfa(model=m.strong,data=dailyDiary,cluster="ID",std.lv=TRUE)
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
##     Level-1 variable "stress" has no variance within some clusters.
##     The cluster ids with zero within variance are: s050

## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
##     Level-1 variable "worry" has no variance within some clusters. The
##     cluster ids with zero within variance are: s050


Comments:

  • participant s050 has no intraindividual varianec for items stress and worry (but we keep it for now)

  • an Heywood case (i.e., estimated negative variance for item worry) is observed for the configural model


2.2.1.1. Heywood cases

Here, we can see that the Heywood case does not seem to be caused by structural misspecification (the upper CI for that parameter is positive), but most probably to sample fluctuations around a parameter close to zero (see Kolenikov & Bollen, 2012).

# structural misspecification? But upper CI is > 0
parameterestimates(fit.conf)[parameterestimates(fit.conf)$op=="~~" & parameterestimates(fit.conf)$ci.lower<0,
                             c("lhs","level","est","se","ci.lower","ci.upper")]


To solve the problem, we first try to fix the residual variance of item worry to the 15% of its estimated variance at level 2 (i.e., var = .15 x rho2_between) (see Joreskog & Sobrom 1996).

# estimating lv-2 variance for worry item
library(lme4)
fit <- lmer(worry ~ 1 + (1|ID),data=dailyDiary) # null LMER model
varlv2 <- as.data.frame(lme4::VarCorr(fit))[1,4] # between-subjects variance of item t2

# re-fitting the configural model with the new variance constraint
fit.conf.fix <- cfa(model=gsub("rho2",varlv2*0.15,
                               "level: 1
                                Distress_w =~ stress + worry + mood
                                level: 2
                                Distress_b =~ stress + worry + mood
                                worry ~~ rho2*worry"),data=dailyDiary,cluster='ID',std.lv=TRUE)
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
##     Level-1 variable "stress" has no variance within some clusters.
##     The cluster ids with zero within variance are: s050
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
##     Level-1 variable "worry" has no variance within some clusters. The
##     cluster ids with zero within variance are: s050
# inspecting covariances
parameterestimates(fit.conf.fix)[parameterestimates(fit.conf.fix)$op=="~~" & parameterestimates(fit.conf)$ci.lower<0,
                                 c("lhs","level","est","se","ci.lower","ci.upper")]


Comments:

  • the re-fitting of the configural model with worry variance fixed at the 15% of the total variance at level 2 does no show further improper solutions

  • now, item stress show a negative lower CI, but the estimate is positive


For better comparison, we also re-fit the weak invariance model with the same constraint.

# re-fitting the weak invariance model with the new variance constraint
fit.weak.fix <- cfa(model=gsub("rho2",varlv2*0.15,
                               "level: 1
                                Distress_w =~ a*stress + b*worry + c*mood
                                level: 2
                                Distress_b =~ a*stress + b*worry + c*mood
                                worry ~~ rho2*worry"),data=dailyDiary,cluster='ID',std.lv=TRUE)
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
##     Level-1 variable "stress" has no variance within some clusters.
##     The cluster ids with zero within variance are: s050
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
##     Level-1 variable "worry" has no variance within some clusters. The
##     cluster ids with zero within variance are: s050


As an alternative solution, we inspect the changes in residual variances following the removal of each participant. This is done with the sample.fluct and plot.infl functions from https://github.com/Luca-Menghini/LuMenPsy.Rfunctions.

show sample.fluct

#' @title Influential analysis
#' @param data = data.frame used by the model.
#' @param cluster = Character. Variable name in the data frame defining the cluster in a two-level dataset.
#' @param parameter = Character. "var" for estimated variances, "load" for loadings.
#' @param st = Character indicating the standardization level of loadings: "st.all" or "st.lv".
#' @param n.items = Integer. Number of observed variables considered by the model.
#' @param item.labels = Character vector indicating the labels of the considered observed variables.
#' @param m = Character string specifying the model using the lavaan synthax.

sample.fluct <- function(data=NA,cluster="ID",parameter="var",st="st.all",
                         n.items=3,item.labels=c("stress","worry","mood"),
                         m = 'level: 1
                              Distress_w =~ stress + worry + mood
                              level: 2
                              Distress_b =~ stress + worry + mood'){ require(lavaan); require(tcltk)
  
  # function to estiamte and store estimated parameters
  estTable <- function(data=data,m=m,cluster=cluster){
    
    # model fitting
    m.res <- cfa(model=m,data=data,cluster=cluster,std.lv=TRUE)
    
    # parameters
    if(parameter=="var"){
      p <- parameterestimates(m.res)
      lv1 <- p[p$op=="~~" & p$est!=1 & p$level==1,"est"]
      lv2 <- p[p$op=="~~" & p$est!=1 & p$level==2,"est"]
    } else if(parameter=="load"){
      if(st=="st.all"){
        p <- standardizedsolution(m.res)
        lv1 <- p[p$op=="=~","est.std"][1:n.items]
        lv2 <- p[p$op=="=~","est.std"][(n.items+1):(n.items+n.items)]
        } else if(st=="st.lv"){
          p <- standardizedsolution(m.res,type="std.lv")
          lv1 <- p[p$op=="=~","est.std"][1:n.items]
          lv2 <- p[p$op=="=~","est.std"][(n.items+1):(n.items+n.items)]
    } else{
      lv1 <- p[p$op=="=~" & p$est!=1 & p$level==1,"est"]
      lv2 <- p[p$op=="=~" & p$est!=1 & p$level==2,"est"] }}
    
    # dataframe storing results
    parameters <- data.frame(ID=rep("all",n.items),X=item.labels,lv1=lv1,lv2=lv2,neg.var=rep(1,n.items))
    
    # marking neg.var as "0" when no negative values are highlighted
    if(!any(diag(lavInspect(m.res,"est")[["within"]][["theta"]]) < 0) & 
       !any(diag(lavInspect(m.res,"est")[["ID"]][["theta"]]) < 0)){ 
      parameters[1:n.items,"neg.var"] <- 0 }
    
    return(parameters) }
  
  # extracting baseline parameters
  parameters <- estTable(data=data,m=m,cluster=cluster)

  # replicate parameters estimation by excluding any participant one-by-one
  IDs <- levels(as.factor(as.character(data$ID)))
  pb <- tkProgressBar("Modeling", "Data modeling",0, 100, 50) # progress bar
  for(ID in IDs){ 
    info <- sprintf("%d%% done", round(which(IDs==ID)/length(IDs)*100))
    setTkProgressBar(pb, round(which(IDs==ID)/length(IDs)*100), sprintf("Data modeling", info), info)
    parameters <- rbind(parameters,estTable(data=data[data$ID!=ID,],m=m,cluster=cluster))
    parameters[(nrow(parameters)-n.items+1):nrow(parameters),"ID"] <- as.character(ID)}
  close(pb)
  parameters <- parameters[,c("ID","X","lv1","lv2","neg.var")]
  return(parameters)}

show plot.infl

#' @title Plotting results of influential analysis
#' @param data = data.frame generated by the influential.analysis function.
#' @param parameter = Character. "var" for estimated variances, "load" for loadings.
#' @param variable = Character indicating the name of the observed variable to be plotted.
#' @param level = Integer indicating if focusing on level 1 or 2 (default).
#' @param threshold_lower = Numeric. Threeshold below which the participants'IDs are showed.
#' @param threshold_upper = Numeric. Threeshold above which the participants'IDs are showed.

plot.infl <- function(data,parameter="var",variable,level=2,
                             threshold_lower=NA,threshold_upper=NA){
  
  require(ggplot2)
  
  if(parameter=="var"){ par <- "variance"} else { par = "loading" }
  if(level==2){ colnames(data)[colnames(data)=="lv2"] <- "par" 
  } else { colnames(data)[colnames(data)=="lv1"] <- "par" }
  
  p <- ggplot(data[data$X==variable,],aes(ID,par))+
  geom_point()+ggtitle(paste(variable,par,"on level",level))+
  geom_point(data=data[data$X==variable & data$ID=="all",],colour="blue",size=5)+
  geom_point(data=data[data$X==variable & data$neg.var==1,],colour="red")+
  theme(axis.text.x = element_blank()) + xlab("")
    
    if(!is.na(threshold_lower) & is.na(threshold_upper)){
      p <- p + geom_text(data=data[data$X==variable & 
                                     data$par < threshold_lower,],
                         aes(label=ID),nudge_x=-5,size=3)
      } else if(is.na(threshold_lower) & !is.na(threshold_upper)){
        p <- p + geom_text(data=data[data$X==variable &
                                       data$par > threshold_upper,],
                           aes(label=ID),nudge_x=-5,size=3)
        } else if(!is.na(threshold_lower) & !is.na(threshold_upper)){
          p <- p + geom_text(data=data[data$X==variable & 
                                         (data$par > threshold_upper |
                                            data$par < threshold_lower),],
                             aes(label=ID),nudge_x=-5,size=3) }
  
  return(p)} 

# estimating parameters
infl <- sample.fluct(data=dailyDiary,parameter="var")

# plotting
plot.infl(data=infl,par="var",variable="worry",level=2,threshold_upper=0)


Comments:

  • the exclusion of either one among three participants (s093, s021, s123) seems to solve the Heywood case highlighted at level 2 for item worry


Here, we re-specify all target models by excluding participant s093 (i.e., associated with the lowest variance for worry).

# fitting models
fit.conf_noInfl1 <- cfa(model=m.conf,data=dailyDiary[dailyDiary$ID!="s093",],cluster="ID",std.lv=TRUE) # Heywood case
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
##     Level-1 variable "stress" has no variance within some clusters.
##     The cluster ids with zero within variance are: s050
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
##     Level-1 variable "worry" has no variance within some clusters. The
##     cluster ids with zero within variance are: s050
fit.weak_noInfl1 <- cfa(model=m.weak,data=dailyDiary[dailyDiary$ID!="s093",],cluster="ID",std.lv=TRUE)
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
##     Level-1 variable "stress" has no variance within some clusters.
##     The cluster ids with zero within variance are: s050

## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
##     Level-1 variable "worry" has no variance within some clusters. The
##     cluster ids with zero within variance are: s050
fit.strong_noInfl1 <- cfa(model=m.strong,data=dailyDiary[dailyDiary$ID!="s093",],cluster="ID",std.lv=TRUE)
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
##     Level-1 variable "stress" has no variance within some clusters.
##     The cluster ids with zero within variance are: s050

## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
##     Level-1 variable "worry" has no variance within some clusters. The
##     cluster ids with zero within variance are: s050


Comments:

  • no more Heywood cases are highlighted from the models specified without participant s095


2.2.3. Model comparison

Here, we compare the models specified above by inspecting their fit indices and information criteria. The fitInd function is used to optimize the process.

show fitInd

fitInd <- function(model=NA,from_summary=FALSE,type="multilevel",models.names=NA,
                    fits=c("npar","chisq","df","rmsea","cfi","srmr_within","srmr_between")){ 
  require(lavaan); require(MuMIn)
  # removing level-specific fit indices when model is "monolevel"
  if(type=="monolevel"){
      fits <- gsub("srmr_within","srmr",fits)
      fits <- fits[fits!="srmr_between"] }
  if(from_summary==FALSE){
    # returning dataframe of models fit indices when more than one model is considered
    if(length(model)>1){
      fit.indices <- fitmeasures(model[[1]])[fits]
      for(i in 2:length(model)){
        m <- model[[i]]
        fit.indices <- rbind(fit.indices,fitmeasures(m)[fits]) }
      if(!is.na(models.names[1])){
        row.names(fit.indices) <- models.names }
      return(as.data.frame(fit.indices))
      } else { return(fitmeasures(model)[fits]) }
    
    } else { # in some cases the fit indices are available only from the model's summary 
      quiet <- function(fit) { # this was written by Alicia FRANCO MARTÍNEZ on the lavaan Google group
        sink(tempfile())
        on.exit(sink()) 
        invisible(summary(fit, standardized = TRUE, fit.measures=TRUE))
        } 
      sum <- quiet(model)
      fit.indices <- sum$FIT[fits]
      return(fit.indices)}}


2.2.3.1. Benchmark models

First, we evaluate the fit of the benchmark models specified for examining the factor structure at level 2.

# fit indices
round(fitInd(model=list(fit.null,fit.ind,fit.sat),
              models.names=c("Lv2 Null","Lv2 Independence","Lv2 Saturated")),3)
# AIC also considering the configural model
Weights(AIC(fit.null,fit.ind,fit.sat,fit.conf))
##  model weights 
## [1] 0.0 0.0 0.5 0.5
# BIC also considering the configural model
Weights(BIC(fit.null,fit.ind,fit.sat,fit.conf))
##  model weights 
## [1] 0.0 0.0 0.5 0.5


Comments:

  • all benchmark models converged with no problems

  • lv2 Null and Independence models show unsatisfactory goodness of fit, and are rejected, suggesting that there is some interesting structure also at level 2

  • lv2 Saturated model is saturated, so we cannot use chi-squared-derived fit indices, and we cannot compare it with the configural model (which is also saturated)


2.2.3.2. Target models

Second, we evaluate the target models by keeping in mind that the configural model is saturated and show an Heywood case for worry at level 2.

cbind(round(fitInd(model=c(fit.conf,fit.weak,fit.strong),
                   models.names=c("Configural","Weak Invariance","Strong Invariance")),3),
      AICw=round(Weights(AIC(fit.conf,fit.weak,fit.strong)),3),
      BICw=round(Weights(BIC(fit.conf,fit.weak,fit.strong)),3))


Comments:

  • the configural model is saturated, so we cannot evaluate its fit indices, but it shows the highest AICw and BIC

  • the weak invariance model shows acceptable RMSEA, CFI, and SRMR within, but excessively high SRMR between, suggesting a poor fit at level 2

  • the strong invariance model is rejected


2.2.3.3. Fixed variance

Then, we replicate the model comparison by considering the configural and weak invariance models specified by constraining the level-2 residual variance for item worry to the 15% of its total variance at level 2.

cbind(round(fitInd(model=c(fit.conf.fix,fit.weak.fix,fit.strong),
                   models.names=c("Configural","Weak Invariance","Strong Invariance")),3),
      AICw=round(Weights(AIC(fit.conf.fix,fit.weak.fix,fit.strong)),3),
      BICw=round(Weights(BIC(fit.conf.fix,fit.weak.fix,fit.strong)),3))


Comments:

  • the configural model shows the best fit indices and the highest AICw and BIC

  • the weak invariance model shows acceptable RMSEA, CFI, and SRMR within, but excessively high SRMR between, suggesting a poor fit at level 2

  • the strong invariance model is rejected


2.2.3.4. Excluding s095

Finally, we replicate the model comparison on the models specified without participant s095, associated with the lowest (negative) estimate for the level-2 residual variance of item worry.

cbind(round(fitInd(model=c(fit.conf_noInfl1,fit.weak_noInfl1,fit.strong_noInfl1),
                   models.names=c("Configural","Weak Invariance","Strong Invariance")),3),
      AICw=round(Weights(AIC(fit.conf_noInfl1,fit.weak_noInfl1,fit.strong_noInfl1)),3),
      BICw=round(Weights(BIC(fit.conf_noInfl1,fit.weak_noInfl1,fit.strong_noInfl1)),3))


Comments:

  • the configural model is saturated, so we cannot evaluate its fit indices, but it shows the highest AICw and BIC

  • the weak invariance model shows acceptable RMSEA, CFI, and SRMR within, but excessively high SRMR between, suggesting a poor fit at level 2

  • the strong invariance model is rejected


2.2.3.5. Conclusions

  • In all model comparisons, the configural invariance model consistently showed the strongest evidence in terms of AIC and BIC, and the best fit indices when level-2 residual variance for item worry is constrained to the 15% of its total variance at level 2 (otherwise the model is saturated, and shows 1 Heywood case).

  • In contrast, the weak invariance model is consistently associated with satisfactory fit at level 1 but not at level 2, as indicated by the unsatisfactory SRMR coefficient at the between level.

  • The strong invariance model is consistently rejected due to the unsatisfactory fit, suggesting that different factors might influence item responses at level 2.


Thus, the configural invariance model is selected as the best measurement model describing the factor structure of the Psychological distress scale.


2.2.4. Loadings

Here, we inspect and compare the standardized factor loadings estimated by the models showing the best fit indices: the configural model (i.e., we consider both its basic fit, and the conf.fix and conf.noInfl1 versions).

# fit.conf
(p <- cbind(standardizedsolution(fit.conf)[standardizedsolution(fit.conf)$op=="=~",1:5],
      standardizedsolution(fit.conf.fix)[standardizedsolution(fit.conf)$op=="=~",4:5],
      standardizedsolution(fit.conf_noInfl1)[standardizedsolution(fit.conf)$op=="=~",4:5]))


Comments:

  • standardized parameters range from 0.54 to 0.99 in all models, suggesting adequate factorial validity

  • the model with level-2 worry residual variance constrained to the 15% of its total variance shows fit indices from 0.54 to 0.96, with estimates very close to those estimated by the model without participant s095


2.2.4. Factor scores

Finally, we inspect the factor scores predicted at the within and between level by the selected model (i.e., configural model), and we compare the factor scores predicted by fit.conf.fix (in blue) and fit.conf.noInfl1 (in red). Factor scores are then included in the ema dataset, to be used in the following analyses.

# level 1
hist(lavPredict(fit.conf.fix,level=1),col=rgb(0,0,1,alpha=0.5),breaks=100,main="predicted factor scores lv1")
hist(lavPredict(fit.conf_noInfl1,level=1),add=TRUE,col=rgb(1,0,0,alpha=0.5),breaks=100) 

# level 2
hist(lavPredict(fit.conf.fix,level=2),col=rgb(0,0,1,alpha=0.5),breaks=10,main="predicted factor scores lv2")
hist(lavPredict(fit.conf_noInfl1,level=2),add=TRUE,col=rgb(1,0,0,alpha=0.5),breaks=10) 

# including fit.conf.fix factor scores in the ema dataset (fs)
ema[!is.na(ema$StartedTime),"fs.w"] <- as.numeric(lavPredict(fit.conf.fix,level=1))
demos$fs.b <- as.numeric(lavPredict(fit.conf.fix,level=2))
ema <- plyr::join(ema,demos[,c("ID","fs.b")],type="left",by="ID")

# computing sum of factor scores (cluster mean + within-cluster deviations)
ema$fs <- ema$fs.b + ema$fs.w


Comments:

  • factor scores are very similar between the two models at level 1, but more differences are highlighted at level 2

  • only the factor scores predicted by the fit.conf.fix model (i.e., with level-2 residual variance for worry being constrained to the 15% of its total variance at level 2) are included in the ema dataset


2.3. Reliability

Here, we compute reliability coefficients at both the within and the between level, both based on the MCFA results (following Geldhof, 2014) and based on the generalizability theory (following Cranford et al., 2006). The MCFArel and GTHEORYrel functions from https://github.com/Luca-Menghini/LuMenPsy.Rfunctions are used to optimize the analyses:

show MCFArel

#' @title Computing composite reliability index from a MCFA model
#' @param fit = multilevel CFA model.
#' @param level = level of interest (either 1 or 2)
#' @param items = numeric string indicating the items of interest (e.g., 1:3 for items 1, 2 and 3)
#' @param item.labels = character string indicating the names of the items of interest
MCFArel <- function(fit,level,items,item.labels){ require(lavaan)
  if(level==1){ 
    sl <- standardizedsolution(fit)[1:(nrow(standardizedSolution(fit))/2),] # pars within
  } else if(level==2){ 
    sl <- standardizedsolution(fit)[(nrow(standardizedSolution(fit))/2):nrow(standardizedsolution(fit)),] # pars between
  } else { stop("Error: level can be either 1 or 2") }
  sl <- sl$est.std[sl$op == "=~"][items] # standardized loadings of the selected items
  names(sl) <- item.labels # item names
  re <- 1 - sl^2 # residual variances of items
  
  omega <- sum(sl)^2 / (sum(sl)^2 + sum(re)) # composite reliability index
  return(round(omega,2))}

show GTHEORYrel

#' @title Computing variance components and reliability indices of a given scale 
#' @param data = dataset
#' @param items = numeric string indicating the items of interest (e.g., 1:3 for items 1, 2 and 3)
#' @param item.labels = character string indicating the names of the items of interest
GTHEORYrel <- function(data,items,latent.lab,what=c("varComp","rel")){ require(lme4)
  
  # creating variable TIME
  data <- plyr::ddply(data,"ID",transform,TIME=seq_along(ID))

  # preparing long dataset with one row per item
  psymetr <- stack(data[,items])
  psymetr$ID <- data$ID
  psymetr$time <- as.factor(data$TIME)
  psymetr <- psymetr[,c(3,4,2,1)]
  colnames(psymetr) <- c("person","time","item","y")
  psymetr <- psymetr[order(psymetr$person,psymetr$time,psymetr$item),]
  
  # random-only GLMM specificiation
  mod1 <- lmer(y ~  1 + (1|person) + (1|time) + (1|item) + 
                 (1|person:time) + (1|person:item) + (1|time:item),data=psymetr)
  
  # variance decomposition
  SIGMAp <- lme4::VarCorr(mod1)[["person"]][1,1]
  SIGMAt <- lme4::VarCorr(mod1)[["time"]][1,1]
  SIGMAi <- lme4::VarCorr(mod1)[["item"]][1,1]
  SIGMAtp <- lme4::VarCorr(mod1)[["person:time"]][1,1]
  SIGMApi <- lme4::VarCorr(mod1)[["person:item"]][1,1]
  SIGMAti <- lme4::VarCorr(mod1)[["time:item"]][1,1]
  SIGMAres <- sigma(mod1)^2
  
  # printing variance components
  vars <- data.frame(Component=c("SIGMAp","SIGMAt","SIGMAi","SIGMAtp","SIGMApi","SIGMAti","SIGMAres","Total"),
                     VAR=c(SIGMAp,SIGMAt,SIGMAi,SIGMAtp,SIGMApi,SIGMAti,SIGMAres,
                           sum(SIGMAp,SIGMAt,SIGMAi,SIGMAtp,SIGMApi,SIGMAti,SIGMAres)))
  vars$VAR <- round(vars$VAR,2)
  vars$perc <- round(100*vars$VAR/vars[nrow(vars),2],2)
  colnames(vars)[2:3] <- c(latent.lab,paste(latent.lab,"%"))
  
  if(what=="varComp"){ return(vars)
    
  }else if(what=="rel"){ # reliability coeff. based on Cranfort et al. (2006)
    rel <- data.frame(measure=latent.lab,
                      # R1F = (varPERSON + varPERSON*ITEM/n.item) / (varPERSON + varPERSON*ITEM/n.item + varERROR/n.item)
                      R1F = (vars[1,2] + vars[5,2]/length(items)) / (vars[1,2] + vars[5,2]/length(items) + vars[7,2]/length(items)),
                      # RkF = (varPERSON + varPERSON*ITEM/n.item) / (varPERSON + varPERSON*ITEM/n.item + varERROR/(n.item*n.occasions))
                      RkF = (vars[1,2] + vars[5,2]/length(items))/ (vars[1,2] + vars[5,2]/length(items) +
                                                                      vars[7,2]/(length(items)*max(data$TIME))),
                      # Rc = varPERSON*TIME / (varPERSON*TIME + varERROR/n.items)
                      Rc = vars[4,2] / (vars[4,2] + vars[7,2]/length(items)))
    rel[,2:ncol(rel)] <- round(rel[,2:ncol(rel)],2)
    return(rel)
    }else { stop("Error: what argument can be either 'varComp' or 'rel'") }}


Level-specific reliability indices are computed from the configural invariance model fitted on the whole dataset, but with the level-2 residual variance of item worry being constrained to the 15% of its total variance at level 2.

# MCFA-based reliability
(omega <- data.frame(measure="Psychological distress",
                     omega_w=MCFArel(fit=fit.conf.fix,level=1,items=1:3,
                                     item.labels=c("stress","worry","mood")),
                     omega_b=MCFArel(fit=fit.conf.fix,level=2,items=1:3,
                                     item.labels=c("stress","worry","mood"))))
# G-Theory-based reliability
GTHEORYrel(data=dailyDiary,items=c(which(colnames(dailyDiary)=="stress"),
                                   which(colnames(dailyDiary)=="worry"),
                                   which(colnames(dailyDiary)=="mood")),
           latent.lab="Psychological distress",what="rel")


Comments:

  • overall, MCFA-based reliability indices are quite satisfactory (> .65) although not optimal

  • G-Theory-based reliability indices show similar results, with relatively low although acceptable between-participants reliability (R1F) and sensitivity to change (Rc)


2.4. Overall conclusion

Although the Psychological Distress scale presents some problems (i.e., Heywood case in the configural model, high SRMR between in the weak invariance model suggesting different factor loadings across levels, acceptable but not optimal reliability within-subject), it might work sufficiently well as a continuous measure of psychological distress.


2.5. Aggregate score

Here, we compute the aggregate PsyDist score by averaging the three item scores.

# computing aggregated score
ema$PsyDist <- apply(ema[,c("stress","worry","mood")],1,mean)

# sorting columns
AD <- which(colnames(ema)=="mood")
ema <- ema[,c(1:AD,ncol(ema),ncol(ema)-2,ncol(ema)-1,(AD+1):(ncol(ema)-3))]

# plotting single items vs. aggregate score
grid.arrange(ggplot(ema,aes(stress))+geom_histogram()+ggtitle("stress"),
             ggplot(ema,aes(worry))+geom_histogram()+ggtitle("worry"),
             ggplot(ema,aes(mood))+geom_histogram()+ggtitle("mood"),
             ggplot(ema,aes(PsyDist))+geom_histogram(fill="red")+ggtitle("PsyDist"),nrow=2)

# plotting aggregate score vs. aggregate factor score
ggplot(ema,aes(PsyDist)) + geom_histogram(binwidth=0.1) + 
  geom_histogram(aes(fs,binwidth=0.1),fill=rgb(1,0,0,alpha=0.5)) +
  ggtitle("PsyDist: Observed aggregated scores (black) vs. Aggregated factor scores (red)")


3. Descriptives

Here, we summarize and visual inspect each considered variable.


3.1. Sample description

First, we summarize the categorical and continuous demographic variables in the demos dataset: sex, insomnia, and insomnia.group.

# summarizing demographics
summary(demos[,c("sex","insomnia","insomnia.group")])
##  sex    insomnia insomnia.group
##  F:59   0:46     control:46    
##  M:34   1:47     DSM.ins:26    
##                  sub.ins:21
# sex by insomnia
table(demos[,c("sex","insomnia")])
##    insomnia
## sex  0  1
##   F 28 31
##   M 18 16
# sex by insomnia.group
table(demos[,c("sex","insomnia.group")])
##    insomnia.group
## sex control DSM.ins sub.ins
##   F      28      20      11
##   M      18       6      10
# age & BMI
cat("age mean =",mean(demos$age)," sd =",sd(demos$age)," min =",min(demos$age)," max =",max(demos$age),
    "\nBMI mean =",mean(demos$BMI)," sd =",sd(demos$BMI)," min =",min(demos$BMI)," max =",max(demos$BMI))
## age mean = 17.91859  sd = 0.9496819  min = 15.81404  max = 19.69924 
## BMI mean = 21.81638  sd = 3.209905  min = 16.44353  max = 34.66943


Comments:

  • the sample includes 93 adolescents aged from 15.81 to 19.70 years (mean age = 17.91, SD = 0.95; 59 girls, 63%; mean BMI = 21.82, SD = 3.21)

  • 47 adolescents (31 girls; 67%) are in the insomnia group, whereas 46 (28 girls; 61%) are in the control group

  • 26 adolescents (20 girls; 77%) met full DSM-5 criteria for insomnia disorder, whereas 21 adolescents (10 girls; 48%) met all the DSM-5 criteria for insomnia disorder except for weekly frequency


Then, we report the data collection period as indexed by the date of the first and last data point.

# data collection beginning and end
cat("Start:",as.character(min(ema$ActivityDate))," Stop:",as.character(max(ema$ActivityDate)))
## Start: 2019-01-07  Stop: 2021-04-30


3.2. Compliance & response rate

Here, we describe the participants’ compliance and response rate for each class of variables.

# computing response rate and compliance
compliance <- demos[,c("ID","insomnia")]
compVars <- c("TIB","rem.p","HR_REM","TotalSteps","stress")
for(i in 1:nrow(compliance)){ IDdata <- ema[ema$ID==as.character(compliance[i,"ID"]),]
  for(Var in compVars){ colnames(IDdata)[which(colnames(IDdata)==Var)] <- "Var"
    IDdata.TIB <- IDdata[!is.na(IDdata$TIB),]
    IDdata.DD <- IDdata[!is.na(IDdata$stress),]
    compliance[i,paste(Var,"No",sep=".")] <- nrow(IDdata[!is.na(IDdata$Var),]) # No. of valid data entries per participant
    compliance[i,paste(Var,"RR",sep=".")] <- 100*nrow(IDdata[!is.na(IDdata$Var),])/60 # response rate over 60 days
    compliance[i,paste(Var,"RR.sleep",sep=".")] <- 100*nrow(IDdata.TIB[!is.na(IDdata.TIB$Var),])/nrow(IDdata.TIB) # RR over sleep data
    compliance[i,paste(Var,"RR.diary",sep=".")] <- 100*nrow(IDdata.DD[!is.na(IDdata.DD$Var),])/nrow(IDdata.DD) # RR over diary ratings
    colnames(IDdata)[which(colnames(IDdata)=="Var")] <- Var }}

# adjusting compliance (100% when = or > 60)
compliance$TIB.RR.sleep <- compliance$stress.RR.diary <- 100
for(Var in paste(compVars,"No",sep=".")){ colnames(compliance)[which(colnames(compliance)==Var)] <- "Var"
  if(any(compliance$Var>60)){
    cat("\n\n",nrow(compliance[compliance$Var>60,]),"participants with more than 60 days of",gsub("No","",Var),"(from",
        min(compliance[compliance$Var>60,"Var"]),"to",max(compliance[compliance$Var>60,"Var"]),")")
    compliance[compliance$Var>60,gsub("No","RR",Var)] <- 100 } # fixing response rate to 100
   colnames(compliance)[which(colnames(compliance)=="Var")] <- Var }
## 
## 
##  44 participants with more than 60 days of TIB. (from 61 to 85 )
## 
##  13 participants with more than 60 days of rem.p. (from 61 to 78 )
## 
##  12 participants with more than 60 days of HR_REM. (from 61 to 63 )
## 
##  28 participants with more than 60 days of TotalSteps. (from 61 to 78 )
## 
##  22 participants with more than 60 days of stress. (from 61 to 82 )
# computing compliance info relative to sleep.archit
TIBok <- ema[!is.na(ema$TIB),]
cat(nrow(TIBok),"valid sleep periods:\n",
    " - mean No. per participants =",round(mean(compliance$TIB.No),1),"SD =",round(sd(compliance$TIB.No),1),
    "mean response rate =",round(mean(compliance$TIB.RR),1),"% SD =",round(sd(compliance$TIB.RR),1),"%")
## 5121 valid sleep periods:
##   - mean No. per participants = 55.1 SD = 14 mean response rate = 88.5 % SD = 20.8 %
for(Var in compVars){ colnames(TIBok)[which(colnames(TIBok)==Var)] <- "Var"
  colnames(compliance)[which(colnames(compliance)==paste(Var,"RR.sleep",sep="."))] <- "Var"
  cat("\n - ",nrow(TIBok[!is.na(TIBok$Var),]),"also including",Var,"(",
      round(100*nrow(TIBok[!is.na(TIBok$Var),])/nrow(TIBok),1),"%, mean =",
      round(mean(compliance$Var),1),"% SD =",round(sd(compliance$Var),1),"% )")
  colnames(TIBok)[which(colnames(TIBok)=="Var")] <- Var
  colnames(compliance)[which(colnames(compliance)=="Var")] <- paste(Var,"RR.sleep",sep=".") }
## 
##  -  5121 also including TIB ( 100 %, mean = 100 % SD = 0 % )
##  -  4401 also including rem.p ( 85.9 %, mean = 86 % SD = 15 % )
##  -  4352 also including HR_REM ( 85 %, mean = 85.1 % SD = 15 % )
##  -  4310 also including TotalSteps ( 84.2 %, mean = 82.8 % SD = 20 % )
##  -  4333 also including stress ( 84.6 %, mean = 83.7 % SD = 14.3 % )
# computing compliance info relative to diary ratings
DIARYok <- ema[!is.na(ema$stress),]
cat(nrow(DIARYok),"valid diary entries:\n",
    " - mean No. per participants =",round(mean(compliance$stress.No),1),"SD =",round(sd(compliance$stress.No),1),
    "mean response rate =",round(mean(compliance$stress.RR),1),"% SD =",round(sd(compliance$stress.RR),1),"%")
## 4932 valid diary entries:
##   - mean No. per participants = 53 SD = 10.2 mean response rate = 87.2 % SD = 15.5 %
for(Var in compVars){ colnames(DIARYok)[which(colnames(DIARYok)==Var)] <- "Var"
  colnames(compliance)[which(colnames(compliance)==paste(Var,"RR.diary",sep="."))] <- "Var"
  cat("\n - ",nrow(DIARYok[!is.na(DIARYok$Var),]),"also including",Var,"(",
      round(100*nrow(DIARYok[!is.na(DIARYok$Var),])/nrow(DIARYok),1),"%, mean =",
      round(mean(compliance$Var),1),"% SD =",round(sd(compliance$Var),1),"% )")
  colnames(DIARYok)[which(colnames(DIARYok)=="Var")] <- Var
  colnames(compliance)[which(colnames(compliance)=="Var")] <- paste(Var,"RR.diary",sep=".") }
## 
##  -  4333 also including TIB ( 87.9 %, mean = 86.4 % SD = 18.8 % )
##  -  3741 also including rem.p ( 75.9 %, mean = 74.9 % SD = 21.3 % )
##  -  3713 also including HR_REM ( 75.3 %, mean = 74.3 % SD = 21.3 % )
##  -  3949 also including TotalSteps ( 80.1 %, mean = 79.3 % SD = 23.6 % )
##  -  4932 also including stress ( 100 %, mean = 100 % SD = 0 % )
# No. valid diary but not sleep data
cat(nrow(ema[!is.na(ema$stress) & is.na(ema$TIB),]),"cases with valid diary entry but missing sleep data (",
    round(100*nrow(ema[!is.na(ema$stress) & is.na(ema$TIB),])/nrow(ema[!is.na(ema$stress),]),1),"% )")
## 599 cases with valid diary entry but missing sleep data ( 12.1 % )


Comments:

  • a total of 5,121 valid sleep periods were obtained from the EMA protocol, corresponding to an average compliance of 88.5 ± 20.8% (note that compliance was rounded at 100% for those participant with more than 60 days in any study variable)

  • among the valid sleep periods, 85.9% were associated with valid sleep staging information, 84.5% with valid NREM/REM HR data, 84.2% with valid daily steps data, and 84.6% with valid daily diary data

  • compliance with the diary protocol was equal to 87.2 ± 15.5% (note that compliance was rounded at 100% for those participant with more than 60 days in any study variable), with 599 cases of valid diary data without valid sleep data).


Then, we mark seven participants with extremely low No. of responses, that is with less than 14 days of valid observations for any of the data types.

# marking cases with less than 14 observations in any focal variable
compliance$majMiss <- 0
compliance[compliance$TIB.No<14 | compliance$rem.p.No<14 | compliance$HR_REM.No<14 | 
             compliance$TotalSteps.No<14 | compliance$stress.No < 14,"majMiss"] <- 1
ema <- plyr::join(ema,compliance[,c("ID","majMiss")],by="ID",type="left")

# showing compliance: majMiss
compliance[compliance$majMiss==1,c(1,2,seq(3,20,by=4))]


3.3. Descriptive statistics

Here, we use the multidesc function for computing the descriptive statistics (mean, SD and ICC) of level-1 (daily observations) and level-2 variables (demographics), as well as the mean and SD for each group.

show multidesc

multidesc <- function(long,wide,cluster="S.ID",lv1,lv2,group,group.labels=NA){ require(Rmisc); require(lme4)
  
  out <- data.frame(Measure=lv1[1],
                    N=summarySE(long,lv1[1],na.rm=TRUE)[,2],
                    Mean=paste(round(summarySE(long,lv1[1],na.rm=TRUE)[,3],2)," (",
                               round(summarySE(long,lv1[1],na.rm=TRUE)[,4],2),")",sep=""))
  if(!is.na(group)){
    colnames(long)[which(colnames(long)==group)] <- colnames(wide)[which(colnames(wide)==group)] <- "group"
    long$group <- as.factor(as.character(long$group))
    wide$group <- as.factor(as.character(wide$group))
    groups <- levels(long$group)
    out$Mean1 = paste(round(summarySE(long[long$group==groups[1],],lv1[1],na.rm=TRUE)[,3],2)," (",
                      round(summarySE(long[long$group==groups[1],],lv1[1],na.rm=TRUE)[,4],2),")",sep="")
    out$Mean2 = paste(round(summarySE(long[long$group==groups[2],],lv1[1],na.rm=TRUE)[,3],2)," (",
                      round(summarySE(long[long$group==groups[2],],lv1[1],na.rm=TRUE)[,4],2),")",sep="") }
  
  for(i in 2:length(lv1)){
    if(!is.na(group)){
      out <- rbind(out,
                 data.frame(Measure=lv1[i],
                            N=summarySE(long,lv1[i],na.rm=TRUE)[,2],
                            Mean=paste(round(summarySE(long,lv1[i],na.rm=TRUE)[,3],2)," (",
                                       round(summarySE(long,lv1[i],na.rm=TRUE)[,4],2),")",sep=""),
                            Mean1=paste(round(summarySE(long[long$group==groups[1],],lv1[i],na.rm=TRUE)[,3],2)," (",
                                        round(summarySE(long[long$group==groups[1],],lv1[i],na.rm=TRUE)[,4],2),")",sep=""),
                            Mean2=paste(round(summarySE(long[long$group==groups[2],],lv1[i],na.rm=TRUE)[,3],2)," (",
                                        round(summarySE(long[long$group==groups[2],],lv1[i],na.rm=TRUE)[,4],2),")",sep="")))
    } else {
      out <- rbind(out,
                 data.frame(Measure=lv1[i],
                            N=summarySE(long,lv1[i],na.rm=TRUE)[,2],
                            Mean=paste(round(summarySE(long,lv1[i],na.rm=TRUE)[,3],2)," (",
                                       round(summarySE(long,lv1[i],na.rm=TRUE)[,4],2),")",sep=""))) }}
  
  # demos data
  for(i in 1:length(lv2)){
    if(!is.na(group)){
      out <- rbind(out,
                   data.frame(Measure=lv2[i],
                              N=summarySE(wide,lv2[i],na.rm=TRUE)[,2],
                              Mean=paste(round(summarySE(wide,lv2[i],na.rm=TRUE)[,3],2)," (",
                                         round(summarySE(wide,lv2[i],na.rm=TRUE)[,4],2),")",sep=""),
                              Mean1=paste(round(summarySE(wide[wide$group==groups[1],],lv2[i],na.rm=TRUE)[,3],2)," (",
                                          round(summarySE(wide[wide$group==groups[1],],lv2[i],na.rm=TRUE)[,4],2),")",sep=""),
                              Mean2=paste(round(summarySE(wide[wide$group==groups[2],],lv2[i],na.rm=TRUE)[,3],2)," (",
                                          round(summarySE(wide[wide$group==groups[2],],lv2[i],na.rm=TRUE)[,4],2),")",sep="")))
    } else {
      out <- rbind(out,
                 data.frame(Measure=lv2[i],
                            N=summarySE(wide,lv2[i],na.rm=TRUE)[,2],
                            Mean=paste(round(summarySE(wide,lv2[i],na.rm=TRUE)[,3],2)," (",
                                       round(summarySE(wide,lv2[i],na.rm=TRUE)[,4],2),")",sep=""))) }}
  
  # ICC
  out$ICC <- NA
  for(i in 1:length(lv1)){
    m <- lmer(formula=gsub("var",lv1[i],gsub("ID",cluster,"var~(1|ID)")),data=long) # VAR_between / (VAR_between + VAR_within)
    out[out$Measure==lv1[i],"ICC"] <- round(as.data.frame(lme4::VarCorr(m))[1,4]/
                                              (as.data.frame(lme4::VarCorr(m))[1,4]+as.data.frame(lme4::VarCorr(m))[2,4]),2)}
  rownames(out) <- gsub(".cm","",rownames(out))
  if(!is.na(group)){ if(!is.na(group.labels)){
    colnames(out)[c(which(colnames(out)=="Mean1"),which(colnames(out)=="Mean2"))] <- group.labels 
    } else{ colnames(out)[c(which(colnames(out)=="Mean1"),which(colnames(out)=="Mean2"))] <-  levels(long$group)  }}
  return(out)}

# selecting lv1 continuous variables
lv1 <- c("TIB","TST","WASO","SE","light.p","deep.p","rem.p", # sleep architecture
         "SO.num","WakeUp.num", # sleep timing
         "HR_NREM","HR_REM", # sleep autonomic functioning
         "TotalSteps1000", # steps
         "stress","worry","mood") # self-report

# selecting lv2 continuous variables
lv2 <- c("age","BMI")

# computing descriptive statistics
desc <- multidesc(long=ema,wide=demos,cluster="ID",lv1=lv1,lv2=lv2,
                  group="insomnia",group.labels=c(paste("Control (N =",nrow(demos[demos$insomnia=="0",]),")",sep=""),
                                                  paste("Insomnia (N = ",nrow(demos[demos$insomnia=="1",]),")",sep="")))
knitr::kable(desc)
Measure N Mean Control (N =46) Insomnia (N = 47) ICC
TIB 5121 468.52 (92.19) 466.09 (96.68) 470.82 (87.7) 0.18
TST 5121 414.27 (80.87) 413.04 (84.71) 415.43 (77.07) 0.17
WASO 5121 48.19 (23.97) 46.98 (22.66) 49.33 (25.09) 0.20
SE 5121 88.55 (4.05) 88.74 (3.77) 88.38 (4.29) 0.22
light.p 4401 58.77 (8.24) 57.7 (8.01) 59.9 (8.33) 0.31
deep.p 4401 19.8 (5.25) 20.08 (5.15) 19.51 (5.33) 0.24
rem.p 4401 21.43 (6.17) 22.22 (6.32) 20.6 (5.9) 0.41
SO.num 5121 0.52 (1.63) 0.51 (1.68) 0.54 (1.58) 0.47
WakeUp.num 5121 8.23 (1.62) 8.17 (1.63) 8.28 (1.61) 0.31
HR_NREM 4320 60.04 (7.85) 61.44 (8.58) 58.58 (6.7) 0.76
HR_REM 4352 62.06 (7.71) 63.48 (8.5) 60.58 (6.45) 0.74
TotalSteps1000 4664 8.14 (5.16) 8.89 (5.61) 7.45 (4.6) 0.42
stress 4932 2.21 (1.08) 2.16 (1.02) 2.27 (1.13) 0.26
worry 4932 2.25 (1.09) 2.21 (1.04) 2.27 (1.13) 0.33
mood 4932 2.39 (1.03) 2.49 (0.98) 2.3 (1.07) 0.25
age 93 17.92 (0.95) 18.14 (0.94) 17.7 (0.92) NA
BMI 93 21.82 (3.21) 22.34 (3.7) 21.3 (2.58) NA


Comments:

  • the sample is characterized by an average TIB of about 7.81 +- 1.54 hours, of which about seven hours of TST (mean SE around 88%)

  • similar summary statistics are shown by the two groups

  • ICCs vary between .17 and .75, with sleep-related HR showing the highest values, whereas none of the remaining variables shows ICC > .50, suggesting that most of the variance is at the within-individual level


3.4. Correlations

Here, we use the multicorr function from https://github.com/Luca-Menghini/LuMenPsy.Rfunctions to compute the inter-individual (shown below the main diagonal) and the intra-individual correlations (shown above the main diagonal) between the focal continuous variables.

show multicorr

multicorr <- function(long,wide,lv1,lv2,cluster="ID"){ require(Hmisc); require(Rmisc)
  
  colnames(long)[which(colnames(long)==cluster)] <- "ID"
  
  # individual means (lv2) of lv1 variables
  for(VAR in lv1){
    wide <- cbind(wide,newVar=summarySE(long,VAR,"ID",na.rm=TRUE)[,3])
    colnames(wide)[which(colnames(wide)=="newVar")] <- paste(VAR,".cm",sep="") }
  
  # joining individual means (lv2) to the long dataset
  long <- plyr::join(long,
                     wide[,c(which(colnames(wide)==cluster),
                             which(colnames(wide)==paste(lv1[1],
                                                         ".cm",sep="")):which(colnames(wide)==paste(lv1[length(lv1)],
                                                                                                    ".cm",sep="")))],
                     type="left",by="ID")
  
  # mean-centered (lv1) values
  for(VAR in lv1){
    long$newVar <- long[,VAR] - long[,paste(VAR,".cm",sep="")]
    colnames(long)[which(colnames(long)=="newVar")] <- paste(VAR,".dm",sep="") }
  
  # between-subjects correlations (all variables)
  out.b <- rcorr(as.matrix(wide[,c(paste(lv1,".cm",sep=""),lv2)]), type = "pearson")
  rb <- round(out.b$r,2)
  rb[lower.tri(rb)] <- NA
  
  # within-participant correlations (HRV and ESM deviations from individual mean)
  out.w <- rcorr(as.matrix(long[,paste(lv1,".dm",sep="")]), type = "pearson")
  rw <- round(out.w$r,2)
  rw[upper.tri(rw)] <- NA
  
  # filling rb empty cells
  rb[1:length(lv1),1:length(lv1)][lower.tri(rb[1:length(lv1),1:length(lv1)])] <- rw[lower.tri(rw)]
  
  rownames(rb) <- gsub(".cm","",rownames(rb))
  
  return(t(rb))}

desc <- cbind(desc,
              multicorr(long=ema,wide=demos,cluster="ID",lv1=lv1,lv2=lv2))
rownames(desc) <- desc$Measure
desc$Measure <- NULL
write.csv(desc,"RESULTS/descriptives.csv") # saving result
desc


Correlations are also plotted below (blue = negative, white = uncorrelated, red = positive):

ggplot(melt(as.matrix(desc[,6:ncol(desc)])),aes(x=Var1, y=Var2, fill=value)) + 
    geom_tile() + 
    geom_text(aes(x = Var1, y = Var2, label = round(value,2)),color="black",size=3.5)+
    ggtitle("Correlation Matrix (above = within-individual, below = between-individuals)")+labs(x="",y="")+
    scale_fill_gradient2(low="darkblue",high="#f03b20",mid="white",
                         midpoint=0,limit = c(-1,1), space = "Lab",
                         name="Pearson\nCorrelation",guide="legend",
                         breaks=round(seq(1,-1,length.out = 11),2),
                         minor_breaks=round(seq(1,-1,length.out = 11),2))+
    theme(text=element_text(face="bold",size=10),legend.position="none",
          axis.text.x=element_text(angle=30))


3.5. Univariate distributions

Here, we visualize the univariate distributions of the focal variables, for each class of variables.


3.5.1. Sleep architecture

Variables describing sleep architecture include TIB (min), TST (min), WASO (min), SE (%), light (%), deep (%), and rem (%).

# selecting variables
s.archit <- c("TIB","TST","WASO","SE","light.p","deep.p","rem.p")

# plotting
par(mfrow=c(3,3))
for(Var in s.archit){ hist(ema[,Var],main=Var,xlab="",breaks=30) }

# outliers with WASO > 150 min (23 cases)
nrow(ema[!is.na(ema$TIB) & ema$WASO>150,])
## [1] 23
# cases with WASO = 0 (SE = 100%) (8 cases)
nrow(ema[!is.na(ema$TIB) & ema$WASO==0,])
## [1] 8


Comments:

  • s.archit variables are quite normally distributed

  • some univariate outliers can be highlighted in both WASO (23 cases (0.4%) with WASO > 150 min) and SE (14 cases (0.3%) with SE < 70%)

  • in a few cases (8), WASO is equal to zero, with a SE of 100%


3.5.2. Sleep timing

Sleep timing variables include SO (hours since 00:00) and WakeUp (hours since 00:00).

# selecting variables
s.timing <- c("SO.num","WakeUp.num")

# plotting
par(mfrow=c(1,2))
for(Var in s.timing){ hist(ema[,Var],main=Var,xlab="",breaks=30) }


Comments:

  • s.timing variables are quite normally distributed, with only a few outliers for WakeUp.num


3.5.3. Daily variability

Night-to-night variability in sleep measures is operationalized as the squared successive differences (SSD) between consecutive nights in the main s.archit (i.e., TIB, TST, and WASO) and s.timing variables (i.e., SO.num and WakeUp.num).

# selecting s.archit and s.timing variables
sleepVars <- c(s.archit[1:3],s.timing)

# computing squared successive differences (SSD)
for(Var in sleepVars){
  for(i in 2:nrow(ema)){
    if(!is.na(ema[i,Var]) & !is.na(ema[i-1,Var])){ # non-missing current and previous day
      if(ema[i,"ID"] == ema[i-1,"ID"] & ema[i,"dayNr"] == ema[i-1,"dayNr"] + 1){ # same ID and consecutive day
        ema[i,paste(Var,"SSD",sep=".")] <- (ema[i,Var] - ema[i-1,Var])^2 }}}}

# plotting
par(mfrow=c(2,3))
s.variab <- paste(sleepVars,"SSD",sep=".")
for(Var in s.variab){ hist(ema[,Var],main=Var,xlab="",breaks=30) }


Comments:

  • s.variab variables are highly skewed and negatively bounded to zero


3.5.4. Sleep autonomic func.

Among the variables describing sleep autonomic functioning we focus on stageHR_NREM and stageHR_REM.

# selecting variables
s.auton <- c("HR_NREM","HR_REM")

# plotting
par(mfrow=c(1,2))
for(Var in s.auton){ hist(ema[,Var],main=Var,xlab="",breaks=30) }


Comments:

  • s.auton variables are quite normally distributed, with a few outliers with HR > 90 bmp


3.5.5. TotalSteps

TotalSteps1000 quantifies the daily thousands of steps.

# plotting
hist(ema[,"TotalSteps1000"],main="TotalSteps1000",xlab="",breaks=30)


Comments:

  • TotalSteps1000 is bounded at zero, and positively skewed, with outliers showing > 30 thousants of steps per day


3.5.6. Diary ratings

Pre-sleep diary ratings include the stress, the worry and the mood item ratings (from 1 to 5).

# selecting variables
d.ratings <- c("stress","mood","worry")

# plotting
par(mfrow=c(1,3))
for(Var in d.ratings){ hist(ema[,Var],main=Var,xlab="",breaks=30) }


Comments:

  • all diary ratings are ordinal variables with a negatively skewed distribution


Here, we visualize the distributions of the aggregated scores and that of the factor scores predicted by the configural invariance MCFA model selected above.

# mean centering PsyDist
demos$PsyDist.cm <- summarySE(ema,measurevar="PsyDist",groupvars="ID",na.rm=TRUE)[,3]
ema <- plyr::join(ema,demos[,c("ID","PsyDist.cm")],by="ID",type="left")
ema$PsyDist.mc <- ema$PsyDist - ema$PsyDist.cm

# plotting
grid.arrange(ggplot(ema,aes(PsyDist))+geom_histogram(),ggplot(ema,aes(fs))+geom_histogram(),
             ggplot(ema[!duplicated(ema$ID),],aes(PsyDist.cm))+geom_histogram(),ggplot(demos,aes(fs.b))+geom_histogram(),
             ggplot(ema,aes(PsyDist.mc))+geom_histogram(),ggplot(ema,aes(fs.w))+geom_histogram(),nrow=3)


3.6. Bivariate distributions

Then, we visualize the distribution of each focal variable against categorical predictors.


3.6.1. Insomnia

Here, we explore the distributions of each continuous variable between insomnia and controls, and between the two insomnia sub-groups.

INSOMNIA VS CONTROLS

Demos
for(Var in c("age","BMI")){ 
  print(ggplot(demos,aes_string(x="insomnia",y=Var,fill="insomnia")) + ggtitle(paste(Var,"in controls vs. insomnia")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }


s.archit
for(Var in s.archit){ 
  print(ggplot(ema,aes_string(x="insomnia",y=Var,fill="insomnia")) + ggtitle(paste(Var,"in controls vs. insomnia")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }


Comments:

  • the insomnia group is characterized by a slightly longer TIB and TST, a slightly lower SE, a slightly higher light.p and lower REM.p than the control group


s.timing
for(Var in s.timing){ 
  print(ggplot(ema,aes_string(x="insomnia",y=Var,fill="insomnia")) + ggtitle(paste(Var,"in controls vs. insomnia")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }


Comments:

  • the insomnia group is characterized by a slightly later SO, and a slightly later WakeUp times compared to the control group


s.auton
for(Var in s.auton){ 
  print(ggplot(ema,aes_string(x="insomnia",y=Var,fill="insomnia")) + ggtitle(paste(Var,"in controls vs. insomnia")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }


Comments:

  • the insomnia group is characterized by a slightly lower HR both in HR_NREM and especially in HR_REM compared to the control group


s.variab
# day-level
for(Var in s.variab){ 
  print(ggplot(ema,aes_string(x="insomnia",y=Var,fill="insomnia")) + ggtitle(paste(Var,"in controls vs. insomnia")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4) +
          ylim(0,max(ema[,Var],na.rm=TRUE)/4))} # higher limit fixed to 1/4 of the variable length for better visualization


Comments:

  • the two groups show quite similar night-to-night variability values


TotalSteps1000
for(Var in "TotalSteps1000"){ 
  print(ggplot(ema,aes_string(x="insomnia",y=Var,fill="insomnia")) + ggtitle(paste(Var,"in controls vs. insomnia")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }


Comments:

  • the insomnia group is characterized by a slightly lower TotalSteps1000 count compared to the control group


Diary ratings
# item ratings
for(Var in d.ratings){ 
  print(ggplot(ema,aes_string(x="insomnia",y=Var,fill="insomnia")) + ggtitle(paste(Var,"in controls vs. insomnia")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }

# aggregated scores
for(Var in c("PsyDist.cm","fs.b")){ 
  print(ggplot(demos,aes_string(x="insomnia",y=Var,fill="insomnia")) + ggtitle(paste(Var,"in controls vs. insomnia")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }


Comments:

  • the two groups are quite similar in terms of item ratings and aggregated scores


INSOMNIA GROUPS

Demos
for(Var in c("age","BMI")){ 
  print(ggplot(demos,aes_string(x="insomnia.group",y=Var,fill="insomnia.group")) + 
          ggtitle(paste(Var,"in controls vs. insomnia groups")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }


s.archit
for(Var in s.archit){ 
  print(ggplot(ema,aes_string(x="insomnia.group",y=Var,fill="insomnia.group")) + 
          ggtitle(paste(Var,"in controls vs. insomnia groups")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }


Comments:

  • no substantial differences are highlighted between insomnia groups, with both showing differences from the control group similar to those highlighted above

  • rem.p`` looks slighlty **higher in theDSM.ins`** group compared to controls


s.timing
for(Var in s.timing){ 
  print(ggplot(ema,aes_string(x="insomnia.group",y=Var,fill="insomnia.group")) + 
          ggtitle(paste(Var,"in controls vs. insomnia groups")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }


Comments:

  • no substantial SO differences are highlighted between insomnia groups, whereas the DSM.ins group shows higher WakeUp.num than the control group


s.auton
for(Var in s.auton){ 
  print(ggplot(ema,aes_string(x="insomnia.group",y=Var,fill="insomnia.group")) + 
          ggtitle(paste(Var,"in controls vs. insomnia groups")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }


Comments:

  • both meanHR and sleepHR values computed from the first part of the night show lower values in the insomnia groups (especially the DSM.ins) compared to the control group (look substantial)

  • this is more evident for the DSM-insomnia group, whereas a bimodal distribution is shown by the sub-insomnia group


s.variab
for(Var in s.variab){ 
  print(ggplot(ema,aes_string(x="insomnia.group",y=Var,fill="insomnia.group")) + 
          ggtitle(paste(Var,"in controls vs. insomnia groups")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4) +
          ylim(0,max(ema[,Var],na.rm=TRUE)/4))} # higher limit fixed to 1/4 of the variable length for better visualization


Comments:

  • the three groups show quite similar night-to-night variability values


TotalSteps1000
for(Var in "TotalSteps1000"){ 
  print(ggplot(ema,aes_string(x="insomnia.group",y=Var,fill="insomnia.group")) + 
          ggtitle(paste(Var,"in controls vs. insomnia groups")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }


Comments:

  • the DSM-insomnia group is characterized by a slightly lower TotalSteps count compared to the other two groups


Diary ratings
# item ratings
for(Var in d.ratings){ 
  print(ggplot(ema,aes_string(x="insomnia.group",y=Var,fill="insomnia.group")) + 
          ggtitle(paste(Var,"in controls vs. insomnia groups")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }

# aggregated scores
for(Var in c("PsyDist.cm","fs.b")){ 
  print(ggplot(demos,aes_string(x="insomnia.group",y=Var,fill="insomnia.group")) + 
          ggtitle(paste(Var,"in controls vs. insomnia groups")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }


Comments:

  • the two groups are quite similar in terms of PsyDist, although some variables showing slightly lower values in the sub-insomnia, and a slighlty higher values in the DSM-insomnia compared to the other groups


3.6.2. Sex

Here, we explore the distributions of each continuous variable between girls and boys.

Demos

for(Var in c("age","BMI")){ 
  print(ggplot(demos,aes_string(x="sex",y=Var,fill="sex")) + ggtitle(paste(Var,"in females vs. males")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }


Comments:

  • girls are slighly older than boys


s.archit

for(Var in s.archit){ 
  print(ggplot(ema,aes_string(x="sex",y=Var,fill="sex")) + ggtitle(paste(Var,"in females vs. males")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }


Comments:

  • the two groups are quite similar, with the exception of a slighly lower TST and TIB, and a slightly higher REM.p in boys compared to girls


s.timing

for(Var in s.timing){ 
  print(ggplot(ema,aes_string(x="sex",y=Var,fill="sex")) + ggtitle(paste(Var,"in females vs. males")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }


Comments:

  • SO is slightly later in boys compared to girls, whereas WakeUp times are similar


s.auton

for(Var in s.auton){ 
  print(ggplot(ema,aes_string(x="sex",y=Var,fill="sex")) + ggtitle(paste(Var,"in females vs. males")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }


Comments:

  • boys show lower nocturnal HR compared to girls, across all considered variables (looks substantial)


s.variab

for(Var in s.variab){ 
  print(ggplot(ema,aes_string(x="sex",y=Var,fill="sex")) + ggtitle(paste(Var,"in females vs. males")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4) +
          ylim(0,max(ema[,Var],na.rm=TRUE)/4))} # higher limit fixed to 1/4 of the variable length for better visualization


Comments:

  • the two groups show similar distributions


TotalSteps1000

for(Var in "TotalSteps1000"){ 
  print(ggplot(ema,aes_string(x="sex",y=Var,fill="sex")) + ggtitle(paste(Var,"in females vs. males")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }


Comments:

  • no substantial differences can be highlighted between the two groups


Diary ratings

# item ratings
for(Var in d.ratings){ 
  print(ggplot(ema,aes_string(x="sex",y=Var,fill="sex")) + ggtitle(paste(Var,"in females vs. males")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }

# aggregated scores
for(Var in c("PsyDist.cm","fs.b")){ 
  print(ggplot(demos,aes_string(x="sex",y=Var,fill="sex")) + ggtitle(paste(Var,"in females vs. males")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }


Comments:

  • the two groups show quite similar distributions, although fs is slightly higher in girls compared to boys


3.6.3. Weekend

Here, we explore the explore the differences between cases observed during weekend vs weekdays. Both the weekday (weekend = Saturday and Sunday) and the weekday.sleep variable (weekend = Friday and Saturday) are explored.

s.archit

# weekday
for(Var in s.archit){ 
  print(ggplot(ema,aes_string(x="weekday",y=Var,fill="weekday")) + ggtitle(paste(Var,"in weekdays vs. weekend (SAT-SUN)")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)+
          theme(legend.position = "none")) }

# weekday.sleep
for(Var in s.archit){ 
  print(ggplot(ema,aes_string(x="weekday.sleep",y=Var,fill="weekday.sleep")) + 
          ggtitle(paste(Var,"in weekdays vs. weekend (FRY-SAT)")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4) +
          theme(legend.position = "none")) }


Comments:

  • weekend days show slightly longer TIB and TST than weekdays, with slightly larger differences when considering Friday and Saturday as weekend, rather than Saturday and Sunday


s.timing

# weekday
for(Var in s.timing){ 
  print(ggplot(ema,aes_string(x="weekday",y=Var,fill="weekday")) + ggtitle(paste(Var,"in weekdays vs. weekend (SAT-SUN)")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)+
          theme(legend.position = "none")) }

# weekday.sleep
for(Var in s.timing){ 
  print(ggplot(ema,aes_string(x="weekday.sleep",y=Var,fill="weekday.sleep")) + 
          ggtitle(paste(Var,"in weekdays vs. weekend (FRY-SAT)")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)+
          theme(legend.position = "none")) }


Comments:

  • weekend days show later SO and WakeUp times compared to weekdays, with substantially larger differences when considering Friday and Saturday as weekend, rather than Saturday and Sunday


s.auton

# weekday
for(Var in s.auton){ 
  print(ggplot(ema,aes_string(x="weekday",y=Var,fill="weekday")) + ggtitle(paste(Var,"in weekdays vs. weekend (SAT-SUN)")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)+
          theme(legend.position = "none")) }

# weekday.sleep
for(Var in s.auton){ 
  print(ggplot(ema,aes_string(x="weekday.sleep",y=Var,fill="weekday.sleep")) +
          ggtitle(paste(Var,"in weekdays vs. weekend (FRY-SAT)")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)+
          theme(legend.position = "none")) }


Comments:

  • weekdays and weekend days are quite similar in terms of sleep-related HR


s.variab

for(Var in s.variab){ 
  print(ggplot(ema,aes_string(x="weekday",y=Var,fill="weekday")) + ggtitle(paste(Var,"in weekdays vs. weekend (SAT-SUN)")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4) +
          ylim(0,max(ema[,Var],na.rm=TRUE)/4))} # higher limit fixed to 1/4 of the variable length for better visualization


Comments:

  • no substantial differences can be highlighted between weekdays and weekend days


TotalSteps1000

# weekday
for(Var in "TotalSteps1000"){ 
  print(ggplot(ema,aes_string(x="weekday",y=Var,fill="weekday")) + ggtitle(paste(Var,"in weekdays vs. weekend (SAT-SUN)")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)+
          theme(legend.position = "none")) }

# weekday.sleep
for(Var in "TotalSteps1000"){ 
  print(ggplot(ema,aes_string(x="weekday.sleep",y=Var,fill="weekday.sleep")) +
          ggtitle(paste(Var,"in weekdays vs. weekend (FRY-SAT)")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)+
          theme(legend.position = "none")) }


Comments:

  • weekend days show slightly lower TotalSteps compared with weekdays, but only when considering Saturday and Sunday as weekend


Diary ratings

# weekday - item ratings
for(Var in d.ratings){ 
  print(ggplot(ema,aes_string(x="weekday",y=Var,fill="weekday")) + ggtitle(paste(Var,"in weekdays vs. weekend (SAT-SUN)")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }

# weekday.sleep - item ratings
for(Var in d.ratings){ 
  print(ggplot(ema,aes_string(x="weekday.sleep",y=Var,fill="weekday.sleep")) + 
          ggtitle(paste(Var,"in weekdays vs. weekend (FRY-SAT)")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }

# weekday - aggregated scores
for(Var in c("PsyDist","fs.w")){ 
  print(ggplot(ema,aes_string(x="weekday",y=Var,fill="weekday")) + ggtitle(paste(Var,"in weekdays vs. weekend (SAT-SUN)")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }

# weekday.sleep - aggregated scores
for(Var in c("PsyDist","fs.w")){ 
  print(ggplot(ema,aes_string(x="weekday.sleep",y=Var,fill="weekday.sleep")) + 
          ggtitle(paste(Var,"in weekdays vs. weekend (FRY-SAT)")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }


Comments:

  • weekend days show slightly lower PsyDist than weekdays, especially when considering the factor scores at the within level with Friday and Saturday as weekend days


3.6.4. COVID-19

Here, we explore the differences between cases observed before and during the COVID-19 emergency (i.e., after March 20th, 2020).

s.archit

for(Var in s.archit){ 
  print(ggplot(ema,aes_string(x="covid19",y=Var,fill="covid19")) + ggtitle(paste(Var,"pre- and post-covid19 lockdown")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }


Comments:

  • the post-COVID-19 period is associated with longer TIB and TST compared to the pre-COVID-19 period, whereas difference in other s.archit variables do not seem substantial


s.timing

for(Var in s.timing){ 
  print(ggplot(ema,aes_string(x="covid19",y=Var,fill="covid19")) + ggtitle(paste(Var,"pre- and post-covid19 lockdown")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }


Comments:

  • SO and especially WakeUp are substantially delayed in the post-COVID-19 period compared to the pre-COVID-19 period.


s.auton

for(Var in s.auton){ 
  print(ggplot(ema,aes_string(x="covid19",y=Var,fill="covid19")) + ggtitle(paste(Var,"pre- and post-covid19 lockdown")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }


Comments:

  • the post-COVID-19 period shows slightly lower sleep-related HR than the pre-COVID-19 period, especially in terms of stageHR_REM


s.variab

for(Var in s.variab){ 
  print(ggplot(ema,aes_string(x="covid19",y=Var,fill="covid19")) + ggtitle(paste(Var,"pre- and post-covid19 lockdown")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4) +
          ylim(0,max(ema[,Var],na.rm=TRUE)/4))} # higher limit fixed to 1/4 of the variable length for better visualization


Comments:

  • the two periods show quite similar night-to-night variability values


TotalSteps1000

for(Var in "TotalSteps1000"){ 
  print(ggplot(ema,aes_string(x="covid19",y=Var,fill="covid19")) + ggtitle(paste(Var,"pre- and post-covid19 lockdown")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }


Comments:

  • the post-COVID-19 period shows substantially lower No. of steps compared to the pre-COVID-19 period


Diary ratings

# item ratings
for(Var in d.ratings){ 
  print(ggplot(ema,aes_string(x="covid19",y=Var,fill="covid19")) + ggtitle(paste(Var,"pre- and post-covid19 lockdown")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }

# aggregated scores
for(Var in c("PsyDist","fs.w")){ 
  print(ggplot(ema,aes_string(x="covid19",y=Var,fill="covid19")) + ggtitle(paste(Var,"pre- and post-covid19 lockdown")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }


Comments:

  • the post-COVID-19 period shows slightly higher PsyDist compared to the pre-COVID-19 period, especially in terms of factor scores


3.6.5. Late responses

Finally, we explore the differences between diary ratings obtained before bedtime and those obtained on the following day.

Diary ratings

# item ratings
ema$diary.nextDay <- as.factor(ema$diary.nextDay)
for(Var in d.ratings){ 
  print(ggplot(ema[!is.na(ema$diary.nextDay),],aes_string(x="diary.nextDay",y=Var,fill="diary.nextDay")) + 
          ggtitle(paste(Var,"bedtime vs. next day diary entries")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }

# aggregated scores
for(Var in c("PsyDist","fs.w")){ 
  print(ggplot(ema[!is.na(ema$diary.nextDay),],aes_string(x="diary.nextDay",y=Var,fill="diary.nextDay")) + 
          ggtitle(paste(Var,"bedtime vs. next day diary entries")) +
          geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }


Comments:

  • the distributions of diary ratings are quite similar when entered before bedtime or on the following day


3.7. Stress, Mood, & Worry

Here, we descriptively characterize the sample and the subgroups of participants (based on sex, insomnia, and insomnia.group) in terms of pre-sleep stress, worry, and mood as well as in terms of sources of stress and worry.


3.7.1. Stress

Here, we describe the stress variable in the sample, and across the subgroups of participants.

# plotting
grid.arrange(ggplot(ema[!is.na(ema$stress),],aes(x=as.factor(stress))) + geom_histogram(stat="count") + ggtitle("stress in the sample") + 
               geom_text(aes(label = scales::percent((..count..)/sum(..count..))), stat = "count", vjust = -0.25) + ylim(0,1700),
             ggplot(ema,aes(stress,fill=sex)) + geom_histogram(position=position_dodge(0.5)),
             ggplot(ema,aes(stress,fill=insomnia)) + geom_histogram(position=position_dodge(0.5)),
             ggplot(ema,aes(stress,fill=insomnia.group)) +  geom_histogram(position=position_dodge(0.5)),nrow=2)

# computing mean and SD in the sample and by group
summarySE(ema,measurevar="stress",na.rm=TRUE)[,1:4] # stress in the sample
summarySE(ema,measurevar="stress",groupvars="sex",na.rm=TRUE)[,1:4] # stress by sex
summarySE(ema,measurevar="stress",groupvars="insomnia",na.rm=TRUE)[,1:4] # stress by insomnia
summarySE(ema,measurevar="stress",groupvars="insomnia.group",na.rm=TRUE)[,1:4] # stress by insomnia.group
# very or extremely stressful
ema[!is.na(ema$stress) & ema$stress<4,"stress.VE"] <- 0
ema[!is.na(ema$stress) & ema$stress>=4,"stress.VE"] <- 1
ema$stress.VE <- as.factor(ema$stress.VE)
df <- ema[!is.na(ema$stress),]
grid.arrange(ggplot(df,aes(x=stress.VE)) + geom_histogram(stat="count") + 
               ggtitle("very/estremely stressed in the sample") + ylim(0,4500) +
               geom_text(aes(label = scales::percent((..count..)/sum(..count..))), stat="count", vjust = -0.25),
             ggplot(df,aes(stress.VE,fill=sex)) + geom_histogram(stat="count",position=position_dodge(0.5)),
             ggplot(df,aes(stress.VE,fill=insomnia)) + geom_histogram(stat="count",position=position_dodge(0.5)),
             ggplot(df,aes(stress.VE,fill=insomnia.group)) +  geom_histogram(stat="count",position=position_dodge(0.5)),nrow=2)

# summarizing stress.VE
cat(nrow(df[df$stress.VE==1,]),"cases with very or extreme stress,",
    round(100*nrow(df[df$stress.VE==1,])/nrow(df),2),"%\n",
    nrow(df[df$insomnia==0 & df$stress.VE==1,]),"cases with very or extreme stress controls ratings,",
    round(100*nrow(df[df$insomnia==0 & df$stress.VE==1,])/nrow(df[df$insomnia==1,]),2),"% of controls ratings\n",
    nrow(df[df$insomnia==1 & df$stress.VE==1,]),"cases with very or extreme stress insomnia ratings,",
    round(100*nrow(df[df$insomnia==1 & df$stress.VE==1,])/nrow(df[df$insomnia==1,]),2),"% of insomnia ratings")
## 607 cases with very or extreme stress, 12.31 %
##  225 cases with very or extreme stress controls ratings, 8.87 % of controls ratings
##  382 cases with very or extreme stress insomnia ratings, 15.06 % of insomnia ratings


Comments:

  • in the 12.3% of the days adolescents reported being ‘very’ or ‘extremely stressed’, with slightly higher percentages in insomnia (15%) than in controls (9%)


3.7.2. stress sources

Here, we describe the sources reported by participants for the stress variable in the sample, and across the subgroups of participants. Note that many responses are missing, with unbalanced numbers of missing responses for each category. Here, we recode all categories (with the exception of stress_COVID, that was added later than the others, and thus not considered in the present analyses) by giving value zero when the score to one or more of the remaining categories was not missing.

# showing summary (original)
ema$stress_COVID <- NULL # removing stress_COVID
stressSources <- colnames(ema)[which(substr(colnames(ema),1,7)=="stress_")]
summary(ema[!is.na(ema$stress),stressSources])
##  stress_school stress_family stress_health stress_peers stress_other
##  0   :2617     0   :3440     0   :2628     0   :2832    0   :3556   
##  1   :2141     1   : 404     1   : 309     1   : 537    1   :1277   
##  NA's: 174     NA's:1088     NA's:1995     NA's:1563    NA's:  99
# assigning zero when 1+ of the other scores are not missing
ema[,stressSources] <- lapply(ema[,stressSources],as.character) # from factor to numeric
ema[,stressSources] <- lapply(ema[,stressSources],as.numeric)
ema$stress_total <- rowSums(ema[,stressSources],na.rm=TRUE) # computing total score
ema[is.na(ema$stress_school) & !is.na(ema$stress_total),"stress_school"] <- 0
ema[is.na(ema$stress_family) & !is.na(ema$stress_total),"stress_family"] <- 0
ema[is.na(ema$stress_health) & !is.na(ema$stress_total),"stress_health"] <- 0
ema[is.na(ema$stress_peers) & !is.na(ema$stress_total),"stress_peers"] <- 0
ema[is.na(ema$stress_other) & !is.na(ema$stress_total),"stress_other"] <- 0
ema[,stressSources] <- lapply(ema[,stressSources],as.factor) # back to factor

# showing summary (updated)
summary(ema[!is.na(ema$stress),stressSources])
##  stress_school stress_family stress_health stress_peers stress_other
##  0:2791        0:4528        0:4623        0:4395       0:3655      
##  1:2141        1: 404        1: 309        1: 537       1:1277
# plotting variables in the sample
for(Var in stressSources){
  print(ggplot(ema[!is.na(ema$stress),],aes_string(x=Var)) + geom_histogram(stat="count",fill="gray") + 
               ggtitle(paste(Var,"in the sample")) +
          geom_text(aes(label = scales::percent((..count..)/sum(..count..))), stat="count", vjust = 1.2))}

# plotting variables only in cases with stress > 1
for(Var in stressSources){
  print(ggplot(ema[!is.na(ema$stress) & ema$stress>1,],aes_string(x=Var)) + geom_histogram(stat="count",fill="yellow") + 
               ggtitle(paste(Var,"in the sample when stress > 1")) +
          geom_text(aes(label = scales::percent((..count..)/sum(..count..))), stat="count", vjust = 1.2))}

# plotting variables only in cases with stress = Very or Extremely stressed
for(Var in stressSources){
  print(ggplot(ema[!is.na(ema$stress) & ema$stress>3,],aes_string(x=Var)) + geom_histogram(stat="count",fill="red") + 
               ggtitle(paste(Var,"in the sample when stress > 3")) +
          geom_text(aes(label = scales::percent((..count..)/sum(..count..))), stat="count", vjust = 1.2))}


Comments:

  • the most frequently reported sources of stress are school (63% of times with stress > 1, 70% of times with stress > 3) and other (38% of times with stress > 1, 44% of times with stress > 3).


Finally, we visualize the frequency of each source of stress by sex, insomnia, and insomnia.group.

# plotting variables by sex
for(Var in stressSources){
  print(ggplot(ema[!is.na(ema$stress) & ema$stress > 1,],aes_string(x="sex",fill=Var)) + 
          geom_histogram(stat="count",position="dodge") + ggtitle(paste(Var,"by sex")) +
          geom_text(aes(label = scales::percent((..count..)/sum(..count..))),stat="count", vjust = 1.2,position="identity"))}

# plotting variables by insomnia
for(Var in stressSources){
  print(ggplot(ema[!is.na(ema$stress) & ema$stress > 1,],aes_string(x="insomnia",fill=Var)) + 
          geom_histogram(stat="count",position="dodge") + ggtitle(paste(Var,"by insomnia")) +
          geom_text(aes(label = scales::percent((..count..)/sum(..count..))),stat="count", vjust = 1.2,position="identity"))}

# plotting variables by insomnia.group
for(Var in stressSources){
  print(ggplot(ema[!is.na(ema$stress) & ema$stress > 1,],aes_string(x="insomnia.group",fill=Var)) + 
          geom_histogram(stat="count",position="dodge") + ggtitle(paste(Var,"by insomnia.group")) +
          geom_text(aes(label = scales::percent((..count..)/sum(..count..))),stat="count", vjust = 1.2,position="identity"))}


3.7.3. worry

Here, we describe the worry variable in the sample, and across the subgroups of participants.

# plotting
grid.arrange(ggplot(ema[!is.na(ema$worry),],aes(x=as.factor(worry))) + geom_histogram(stat="count") + 
               ggtitle("worry in the sample") + 
               geom_text(aes(label = scales::percent((..count..)/sum(..count..))), stat = "count", vjust = -0.25) + ylim(0,1700),
             ggplot(ema,aes(worry,fill=sex)) + geom_histogram(position=position_dodge(0.5)),
             ggplot(ema,aes(worry,fill=insomnia)) + geom_histogram(position=position_dodge(0.5)),
             ggplot(ema,aes(worry,fill=insomnia.group)) +  geom_histogram(position=position_dodge(0.5)),nrow=2)

# computing mean and SD in the sample and by group
summarySE(ema,measurevar="worry",na.rm=TRUE)[,1:4] # worry in the sample
summarySE(ema,measurevar="worry",groupvars="sex",na.rm=TRUE)[,1:4] # worry by sex
summarySE(ema,measurevar="worry",groupvars="insomnia",na.rm=TRUE)[,1:4] # worry by insomnia
summarySE(ema,measurevar="worry",groupvars="insomnia.group",na.rm=TRUE)[,1:4] # worry by insomnia.group
# very or extremely stressful
ema[!is.na(ema$worry) & ema$worry<4,"worry.VE"] <- 0
ema[!is.na(ema$worry) & ema$worry>=4,"worry.VE"] <- 1
ema$worry.VE <- as.factor(ema$worry.VE)
df <- ema[!is.na(ema$worry),]
grid.arrange(ggplot(df,aes(x=worry.VE)) + geom_histogram(stat="count") + 
               ggtitle("worry in the sample") + ylim(0,4500) +
               geom_text(aes(label = scales::percent((..count..)/sum(..count..))), stat="count", vjust = -0.25),
             ggplot(df,aes(worry.VE,fill=sex)) + geom_histogram(stat="count",position=position_dodge(0.5)),
             ggplot(df,aes(worry.VE,fill=insomnia)) + geom_histogram(stat="count",position=position_dodge(0.5)),
             ggplot(df,aes(worry.VE,fill=insomnia.group)) +  geom_histogram(stat="count",position=position_dodge(0.5)),nrow=2)

# summarizing worry.VE
cat(nrow(df[df$worry.VE==1,]),"cases with very or extreme worry,",
    round(100*nrow(df[df$worry.VE==1,])/nrow(df),2),"%\n",
    nrow(df[df$insomnia==0 & df$worry.VE==1,]),"cases with very or extreme worry controls ratings,",
    round(100*nrow(df[df$insomnia==0 & df$worry.VE==1,])/nrow(df[df$insomnia==1,]),2),"% of controls ratings\n",
    nrow(df[df$insomnia==1 & df$worry.VE==1,]),"cases with very or extreme worry insomnia ratings,",
    round(100*nrow(df[df$insomnia==1 & df$worry.VE==1,])/nrow(df[df$insomnia==1,]),2),"% of insomnia ratings")
## 621 cases with very or extreme worry, 12.59 %
##  256 cases with very or extreme worry controls ratings, 10.09 % of controls ratings
##  365 cases with very or extreme worry insomnia ratings, 14.39 % of insomnia ratings


Comments:

  • in the 12.6% of the days adolescents reported being ‘very’ or ‘extremely worried’, with slightly higher percentages in insomnia (14%) than in controls (10%)


3.7.4. worry sources

Here, we describe the sources reported by participants for the worry variable in the sample, and across the subgroups of participants. Note that many responses are missing, with unbalanced numbers of missing responses for each category. Here, we recode all categories (with the exception of stress_COVID, that was added later than the others) by giving value zero when the score to one or more of the remaining categories was not missing.

# showing summary (original)
ema$worry_COVID <- NULL # removing worry_COVID
worrySources <- colnames(ema)[which(substr(colnames(ema),1,6)=="worry_")]
summary(ema[!is.na(ema$worry),worrySources])
##  worry_school worry_family worry_health worry_peer  worry_sleep worry_other
##  0   :2500    0   :3281    0   :2670    0   :2890   0   :2928   0   :3313  
##  1   :2241    1   : 294    1   : 351    1   : 554   1   : 732   1   :1289  
##  NA's: 191    NA's:1357    NA's:1911    NA's:1488   NA's:1272   NA's: 330
# assigning zero when other scores are not missing
ema[,worrySources] <- lapply(ema[,worrySources],as.character) # from factor to numeric
ema[,worrySources] <- lapply(ema[,worrySources],as.numeric)
ema$worry_total <- rowSums(ema[,worrySources],na.rm=TRUE) # computing total score
ema[is.na(ema$worry_school) & !is.na(ema$worry_total),"worry_school"] <- 0
ema[is.na(ema$worry_family) & !is.na(ema$worry_total),"worry_family"] <- 0
ema[is.na(ema$worry_health) & !is.na(ema$worry_total),"worry_health"] <- 0
ema[is.na(ema$worry_peer) & !is.na(ema$worry_total),"worry_peer"] <- 0
ema[is.na(ema$worry_sleep) & !is.na(ema$worry_total),"worry_sleep"] <- 0
ema[is.na(ema$worry_other) & !is.na(ema$worry_total),"worry_other"] <- 0
ema[,worrySources] <- lapply(ema[,worrySources],as.factor) # back to factor

# showing summary (updated)
summary(ema[!is.na(ema$worry),worrySources])
##  worry_school worry_family worry_health worry_peer worry_sleep worry_other
##  0:2691       0:4638       0:4581       0:4378     0:4200      0:3643     
##  1:2241       1: 294       1: 351       1: 554     1: 732      1:1289
# plotting variables in the sample
for(Var in worrySources){
  print(ggplot(ema[!is.na(ema$worry),],aes_string(x=Var)) + geom_histogram(stat="count",fill="gray") + 
               ggtitle(paste(Var,"in the sample")) +
          geom_text(aes(label = scales::percent((..count..)/sum(..count..))), stat="count", vjust = 1.2))}

# plotting variables only in cases with worry > 1
for(Var in worrySources){
  print(ggplot(ema[!is.na(ema$worry) & ema$worry>1,],aes_string(x=Var)) + geom_histogram(stat="count",fill="yellow") + 
               ggtitle(paste(Var,"in the sample when worry > 1")) +
          geom_text(aes(label = scales::percent((..count..)/sum(..count..))), stat="count", vjust = 1.2))}

# plotting variables only in cases with worry = Very or Extremely stressed
for(Var in worrySources){
  print(ggplot(ema[!is.na(ema$worry) & ema$worry>3,],aes_string(x=Var)) + geom_histogram(stat="count",fill="red") + 
               ggtitle(paste(Var,"in the sample when worry > 3")) +
          geom_text(aes(label = scales::percent((..count..)/sum(..count..))), stat="count", vjust = 1.2))}


Comments:

  • the most frequently reported sources of stress are school (66% of times with worry > 1, 76% of times with worry > 3) and other (38% of times with worry > 1, 43% of times with worry > 3).


Here, we plot the frequency of each source of stress by sex, insomnia, and insomnia.group.

# plotting variables by sex
for(Var in worrySources){
  print(ggplot(ema[!is.na(ema$worry) & ema$worry > 1,],aes_string(x="sex",fill=Var)) + 
          geom_histogram(stat="count",position="dodge") + ggtitle(paste(Var,"by sex")) +
          geom_text(aes(label = scales::percent((..count..)/sum(..count..))),stat="count", vjust = 1.2,position="identity"))}

# plotting variables by insomnia
for(Var in worrySources){
  print(ggplot(ema[!is.na(ema$worry) & ema$worry > 1,],aes_string(x="insomnia",fill=Var)) + 
          geom_histogram(stat="count",position="dodge") + ggtitle(paste(Var,"by insomnia")) +
          geom_text(aes(label = scales::percent((..count..)/sum(..count..))),stat="count", vjust = 1.2,position="identity"))}

# plotting variables by insomnia.group
for(Var in worrySources){
  print(ggplot(ema[!is.na(ema$worry) & ema$worry > 1,],aes_string(x="insomnia.group",fill=Var)) + 
          geom_histogram(stat="count",position="dodge") + ggtitle(paste(Var,"by insomnia.group")) +
          geom_text(aes(label = scales::percent((..count..)/sum(..count..))),stat="count", vjust = 1.2,position="identity"))}


3.7.5. Sources of stress & worry

Here, we generate an aggregate plot integrating stress source, stress level, and group comparisons.

# stress
ss.df <- ema[!is.na(ema$stress) & ema$stress>1,] # selecting data
ss.df$stressSource <- "Multiple" # Multiple sources when stress_total > 1
ss.df[ss.df$stress_school==1 & ss.df$stress_total==1,"stressSource"] <- "School" # single source only when stress_total = 1
ss.df[ss.df$stress_family==1 & ss.df$stress_total==1,"stressSource"] <- "Family"
ss.df[ss.df$stress_peers==1 & ss.df$stress_total==1,"stressSource"] <- "Peers"
ss.df[ss.df$stress_health==1 & ss.df$stress_total==1,"stressSource"] <- "Health"
ss.df[ss.df$stress_other==1 & ss.df$stress_total==1,"stressSource"] <- "Other"
ss.df[ss.df$stress_total==0,"stressSource"] <- "Other" # other when unspecified source of stress (N = 25)
ss.df$stressSource <- factor(ss.df$stressSource,levels=c("School","Family","Peers","Health","Other","Multiple"))
ss.df$Stress <- "Not so/somwhat stressful" # recoding stress in two categories
ss.df[ss.df$stress>3,"Stress"] <- "Very/Extremely stressful"
ss.df$Stress <- factor(ss.df$Stress,levels=c("Very/Extremely stressful","Not so/somwhat stressful"))
ss.df$insomnia <- as.factor(gsub("0","Controls",gsub("1","Insomnia",ss.df$insomnia))) # recoding insomnia for plotting
p1 <- ggplot(ss.df, aes(stressSource, fill = Stress)) + geom_histogram(stat="count") + # plotting
  facet_wrap("insomnia") + labs(y="No. of observations") + guides(fill=guide_legend(title="Daily stress")) + 
  theme(axis.text.x = element_text(angle=45),axis.title.x=element_blank())

# worry
ws.df <- ema[!is.na(ema$worry) & ema$worry>1,] # selecting data
ws.df$worrySource <- "Multiple" # Multiple sources when worry_total > 1
ws.df[ws.df$worry_school==1 & ws.df$worry_total==1,"worrySource"] <- "School"
ws.df[ws.df$worry_family==1 & ws.df$worry_total==1,"worrySource"] <- "Family"
ws.df[ws.df$worry_peer==1 & ws.df$worry_total==1,"worrySource"] <- "Peers"
ws.df[ws.df$worry_health==1 & ws.df$worry_total==1,"worrySource"] <- "Health"
ws.df[ws.df$worry_sleep==1 & ws.df$worry_total==1,"worrySource"] <- "Sleep"
ws.df[ws.df$worry_other==1 & ws.df$worry_total==1,"worrySource"] <- "Other"
ws.df[ws.df$worry_total==0,"worrySource"] <- "Other" # other when unspecified source of worry (N = 25)
ws.df$worrySource <- factor(ws.df$worrySource,levels=c("School","Family","Peers","Health","Sleep","Other","Multiple"))
ws.df$Worry <- "Not so/somwhat worried" # recoding worry in two categories
ws.df[ws.df$worry>3,"Worry"] <- "Very/Extremely worried"
ws.df$Worry <- factor(ws.df$Worry,levels=c("Very/Extremely worried","Not so/somwhat worried"))
ws.df$insomnia <- as.factor(gsub("0","Controls",gsub("1","Insomnia",ws.df$insomnia))) # recoding insomnia for plotting
p2 <- ggplot(ws.df, aes(worrySource, fill = Worry)) + geom_histogram(stat="count") + # plotting
  facet_wrap("insomnia") + labs(y="No. of observations") + guides(fill=guide_legend(title="Pre-sleep worry")) + 
  theme(axis.text.x = element_text(angle=45),axis.title.x=element_blank())

# final plot
p <- grid.arrange(p1,p2,nrow=2)

ggsave(filename="RESULTS/Figure2.tiff",plot=p,dpi=500)


3.7.6. mood

Here, we describe the mood variable in the sample, and across the subgroups of participants.

# plotting
grid.arrange(ggplot(ema[!is.na(ema$mood),],aes(x=as.factor(mood))) + geom_histogram(stat="count") + 
               ggtitle("mood in the sample") + 
               geom_text(aes(label = scales::percent((..count..)/sum(..count..))), stat = "count", vjust = -0.25),
             ggplot(ema,aes(mood,fill=sex)) + geom_histogram(position=position_dodge(0.5)),
             ggplot(ema,aes(mood,fill=insomnia)) + geom_histogram(position=position_dodge(0.5)),
             ggplot(ema,aes(mood,fill=insomnia.group)) +  geom_histogram(position=position_dodge(0.5)),nrow=2)

# computing mean and SD in the sample and by group
summarySE(ema,measurevar="mood",na.rm=TRUE)[,1:4] # mood in the sample
summarySE(ema,measurevar="mood",groupvars="sex",na.rm=TRUE)[,1:4] # mood by sex
summarySE(ema,measurevar="mood",groupvars="insomnia",na.rm=TRUE)[,1:4] # mood by insomnia
summarySE(ema,measurevar="mood",groupvars="insomnia.group",na.rm=TRUE)[,1:4] # mood by insomnia.group
# somehow bad/bad mood
ema[!is.na(ema$mood) & ema$mood<4,"mood.VE"] <- 0
ema[!is.na(ema$mood) & ema$mood>=4,"mood.VE"] <- 1
ema$mood.VE <- as.factor(ema$mood.VE)
df <- ema[!is.na(ema$mood),]
grid.arrange(ggplot(df,aes(x=mood.VE)) + geom_histogram(stat="count") + 
               ggtitle("mood in the sample") + ylim(0,4500) +
               geom_text(aes(label = scales::percent((..count..)/sum(..count..))), stat="count", vjust = -0.25),
             ggplot(df,aes(mood.VE,fill=sex)) + geom_histogram(stat="count",position=position_dodge(0.5)),
             ggplot(df,aes(mood.VE,fill=insomnia)) + geom_histogram(stat="count",position=position_dodge(0.5)),
             ggplot(df,aes(mood.VE,fill=insomnia.group)) +  geom_histogram(stat="count",position=position_dodge(0.5)),nrow=2)

# summarizing mood.VE
cat(nrow(df[df$mood.VE==1,]),"cases with very or extreme mood,",
    round(100*nrow(df[df$mood.VE==1,])/nrow(df),2),"%\n",
    nrow(df[df$insomnia==0 & df$mood.VE==1,]),"cases with very or extreme mood controls ratings,",
    round(100*nrow(df[df$insomnia==0 & df$mood.VE==1,])/nrow(df[df$insomnia==1,]),2),"% of controls ratings\n",
    nrow(df[df$insomnia==1 & df$mood.VE==1,]),"cases with very or extreme mood insomnia ratings,",
    round(100*nrow(df[df$insomnia==1 & df$mood.VE==1,])/nrow(df[df$insomnia==1,]),2),"% of insomnia ratings")
## 713 cases with very or extreme mood, 14.46 %
##  342 cases with very or extreme mood controls ratings, 13.49 % of controls ratings
##  371 cases with very or extreme mood insomnia ratings, 14.63 % of insomnia ratings


Comments:

  • in the 14.46% of the days adolescents reported being ‘somehow bad’ or ‘bad’ mood, with slightly higher percentages in insomnia (14.63%) than in controls (13.49%)


4. Data export

Finally, we export the recoded ema and demos datasets to be used in the main analyses. Click here for Part 2 (Characterization of sleep and diary ratings) and here for Part 3 (Relationships between sleep and diary ratings).

save(ema,file="emaFINAL.RData")
save(demos,file="demosFINAL.RData")


References

  • Cranford, J. A., Shrout, P. E., Iida, M., Rafaeli, E., Yip, T., & Bolger, N. (2006). A Procedure for Evaluating Sensitivity to Within-Person Change: Can Mood Measures in Diary Studies Detect Change Reliably? Personality and Social Psychology Bulletin, 32(7), 917–929. https://doi.org/10.1177/0146167206287721

  • Geldhof, G. J., Preacher, K. J., & Zyphur, M. J. (2014). Reliability estimation in a multilevel confirmatory factor analysis framework. Psychological Methods, 19(1), 72–91. https://doi.org/10.1037/a0032138

  • Green, J. A. (2021). Too many zeros and/or highly skewed? A tutorial on modelling health behaviour as count data with Poisson and negative binomial regression. Health Psychology and Behavioral Medicine, 9(1), 436-455. https://doi.org/10.1080/21642850.2021.1920416

  • Hox, J. J. (2010). Multilevel Analysis: Techniques and Applications (2nd ed.). Routledge.

  • Jak, S., & Jorgensen, T. D. (2017). Relating Measurement Invariance, Cross-Level Invariance, and Multilevel Reliability. Frontiers in Psychology, 8(OCT), 1–9. https://doi.org/10.3389/fpsyg.2017.01640

  • Menghini, L., Yuksel, D., Goldstone, A., Baker, F. C., & de Zambotti, M. (2021). Performance of Fitbit Charge 3 against polysomnography in measuring sleep in adolescent boys and girls. Chronobiology International, 38(7):1010-1022. https://doi.org/10.1080/07420528.2021.1903481

  • Ng, V. K., & Cribbie, R. A. (2017). Using the gamma generalized linear model for modeling continuous, skewed and heteroscedastic outcomes in psychology. Current Psychology, 36(2), 225-235. https://doi.org/10.1007/s12144-015-9404-0


R packages

Auguie, Baptiste. 2017. gridExtra: Miscellaneous Functions for "Grid" Graphics. https://CRAN.R-project.org/package=gridExtra.
Barton, Kamil. 2020. MuMIn: Multi-Model Inference. https://CRAN.R-project.org/package=MuMIn.
Bates, Douglas, Martin Maechler, Ben Bolker, and Steven Walker. 2020. Lme4: Linear Mixed-Effects Models Using Eigen and S4. https://github.com/lme4/lme4/.
Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical Software 67 (1): 1–48. https://doi.org/10.18637/jss.v067.i01.
Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. https://www.jstatsoft.org/v40/i03/.
Harrell, Frank E, Jr. 2020. Hmisc: Harrell Miscellaneous. https://CRAN.R-project.org/package=Hmisc.
Hope, Ryan M. 2013. Rmisc: Ryan Miscellaneous. https://CRAN.R-project.org/package=Rmisc.
R Core Team. 2020. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Rosseel, Yves. 2012. lavaan: An R Package for Structural Equation Modeling.” Journal of Statistical Software 48 (2): 1–36. http://www.jstatsoft.org/v48/i02/.
Rosseel, Yves, Terrence D. Jorgensen, and Nicholas Rockwood. 2020. Lavaan: Latent Variable Analysis. http://lavaan.org.
Spinu, Vitalie, Garrett Grolemund, and Hadley Wickham. 2021. Lubridate: Make Dealing with Dates a Little Easier. https://CRAN.R-project.org/package=lubridate.
Wickham, Hadley. 2007. “Reshaping Data with the reshape Package.” Journal of Statistical Software 21 (12): 1–20. http://www.jstatsoft.org/v21/i12/.
———. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
———. 2020. Reshape2: Flexibly Reshape Data: A Reboot of the Reshape Package. https://github.com/hadley/reshape.
Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, Hiroaki Yutani, and Dewey Dunnington. 2021. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://CRAN.R-project.org/package=ggplot2.
Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2021. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.