The present document includes the analytical steps implemented to characterize and explore the antecedents of sleep patterns in a sample of older adolescents with (N = 47; 31 girls) and without (N = 46; 28 girls) DSM-5 insomnia.
The data pre-processing pipeline is depicted at this page.
Here, we prepare the data for the main analyses, we evaluate the psychometric proprieties of the aggregate self-report scale, and we summarize the main variables collected for the present study. Then, the processed datasets with aggregate measures are exported to be used for fitting a series of linear regression models, depicted at this page (Characterization of sleep and diary ratings) and at this page (Relationships between sleep and diary ratings).
The following R packages are used in this document (see References section):
# required packages
packages <- c("ggplot2","gridExtra","lubridate","reshape2","dplyr","lavaan","MuMIn","lme4","tcltk","Rmisc","Hmisc")
# generate packages references
knitr::write_bib(c(.packages(), packages),"packagesAn.bib")
# # run to install missing packages
# xfun::pkg_attach2(packages, message = FALSE); rm(list=ls())
Here, we load the long-form ema
and the wide-form demos
dataset resulting from the processing pipeline.
# removing all objets from the workspace
rm(list=ls())
# setting system time zone to GMT (for consistent temporal synchronization)
Sys.setenv(tz="GMT")
# loading pre-processed data
load("DATA/datasets/ema_finalClean.RData")
load("DATA/datasets/demos_clean.RData")
# used R packages
library(ggplot2); library(gridExtra); library(lubridate); library(reshape2)
Here, we recode some of the variables to be used in the analyses below:
we operationalize time by computing dayNr
and partdayNr
indexing, respectively, the No. of days since the first day of assessment, and the number of the current day of participation (i.e., without considering missing days)
we compute light.p
, deep.p
, and rem.p
quantifying the percentage of time in each sleep stage over TST
we compute the SO.num
and WakeUp.num
representing the numeric equivalence of SO
and WakeUp
(currently in “mm-dd-yyyy hh:mm:ss” format) to indicate the distance (hours) from midnight
we compute the TotalSteps1000
variable, quantifying the daily No. of TotalSteps
in thousands (i.e., 1 TotalSteps1000
= 1,000 TotalSteps
) to make values more comparable across variables
we create the weekday
variable, indicating whether the observation was collected during weekdays or weekends
we create the covid19
variable to mark cases observed after the COVID-19 emergency outbreak
Here, we operationalize time by computing the variable dayNr
, indexing the No. of days since the first day of assessment, and partdayNr
, indexing the number of the current day of participation (i.e., without considering missing days).
# sorting by ID and ActivityDate
ema <- ema[order(ema$ID,ema$ActivityDate,ema$StartTime),]
# computing dayNr and partdayNr
library(dplyr)
ema <- ema %>% group_by(ID) %>%
mutate(dayNr=round(as.numeric(difftime(ActivityDate,min(ActivityDate),units="days")))+1)
ema <- as.data.frame(ema)
detach("package:dplyr", unload=TRUE)
ema <- plyr::ddply(ema,c("ID"),transform,partdayNr=seq_along(ID))
# sorting dataset
AD <- which(colnames(ema)=="ActivityDate")
ema <- ema[,c(1:AD,ncol(ema)-1,ncol(ema),(AD+1):(ncol(ema)-2))]
Here, we recode the variables light
, deep
and REM
as percentage of of time in each sleep stage over TST
.
# sanity check (no values > TST)
cat("Sanity check:",nrow(ema[!is.na(ema$light) & (ema$light>ema$TST | ema$deep>ema$TST | ema$rem>ema$TST),])==0)
## Sanity check: TRUE
# sanity check (sum: no values > TST)
ema$totalStages <- apply(ema[,c("light","deep","rem")],1,sum)
cat("Sanity check:",nrow(ema[!is.na(ema$light) & ema$TST!=ema$totalStages,])==0)
## Sanity check: TRUE
# computing percentages
ema$light.p <- 100*ema$light/ema$TST
ema$deep.p <- 100*ema$deep/ema$TST
ema$rem.p <- 100*ema$rem/ema$TST
# sorting dataset columns
AD <- which(colnames(ema)=="rem")
ema <- ema[,c(1:AD,(ncol(ema)-2):ncol(ema),(AD+1):(ncol(ema)-3))]
ema$totalStages <- NULL
# plotting
grid.arrange(ggplot(ema,aes(light))+geom_histogram(),ggplot(ema,aes(light.p))+geom_histogram(),
ggplot(ema,aes(deep))+geom_histogram(),ggplot(ema,aes(deep.p))+geom_histogram(),
ggplot(ema,aes(rem))+geom_histogram(),ggplot(ema,aes(rem.p))+geom_histogram(),nrow=3)
Here, we recode the variables SO
and WakeUp
as numeric, indicating their relative distance in hours from midnight. Positive values will reflect hours after midnight, whereas negative values will reflect hours before midnight.
# isolating hour from date
ema$SO.num <- as.POSIXct(paste(hour(ema$SO),minute(ema$SO)),format="%H %M")
ema$WakeUp.num <- as.POSIXct(paste(hour(ema$WakeUp),minute(ema$WakeUp)),format="%H %M")
# sorting dataset columns
AD <- which(colnames(ema)=="midSleep")
ema <- ema[,c(1:AD,(ncol(ema)-1):ncol(ema),(AD+1):(ncol(ema)-2))]
# computing distance from midnight
ema$SO.num <- as.numeric(difftime(ema$SO.num,
as.POSIXct(paste(substr(Sys.time(),1,10),"00:00:00"),tz="GMT"),units="hours"))
ema$WakeUp.num <- as.numeric(difftime(ema$WakeUp.num,
as.POSIXct(paste(substr(Sys.time(),1,10),"00:00:00"),tz="GMT"),units="hours"))
# subtracting 24h to cases with values before midnight (i.e., so that positive = after, and negative = before)
ema[!is.na(ema$SO.num) & ema$SO.num>15,"SO.num"] <-
ema[!is.na(ema$SO.num) & ema$SO.num>15,"SO.num"] - 24
ema[!is.na(ema$WakeUp.num) & ema$WakeUp.num>15,"WakeUp.num"] <-
ema[!is.na(ema$WakeUp.num) & ema$WakeUp.num>15,"WakeUp.num"] - 24
# plotting
grid.arrange(ggplot(ema,aes(SO.num)) + geom_histogram() + ggtitle("Sleep Onset (hours from 00:00)") + xlim(c(-6,15)),
ggplot(ema,aes(WakeUp.num)) + geom_histogram() + ggtitle("Wake-up time (hours from 00:00)") + xlim(c(-6,15)))
# sanity check
cat("Sanity check:",
nrow(ema[!is.na(ema$TIB) & ((ema$SO.num > max(ema$TIB,na.rm=TRUE) & ema$WakeUp.num > max(ema$TIB,na.rm=TRUE)) |
ema$SO.num < (-6) | ema$SO.num > (6)),])==0)
## Sanity check: TRUE
Comments:
SO
and WakeUp
have been effectively converted as their distance from midnight
no cases have SO.num
or WakeUp.num
higher than 15, or SO.num
lower than -6 or higher than 6
Here, we recode the TotalSteps
variable to be quantified in thousands of steps rather than in single steps.
# computing TotalSteps1000
ema$TotalSteps1000 <- ema$TotalSteps/1000
# sorting dataset columns
AD <- which(colnames(ema)=="TotalSteps")
ema <- ema[,c(1:AD,ncol(ema),(AD+1):(ncol(ema)-1))]
# plotting
grid.arrange(ggplot(ema,aes(TotalSteps))+geom_histogram(),ggplot(ema,aes(TotalSteps1000))+geom_histogram(),nrow=1)
Here, we create the variable diary.nextDay
to mark those cases in which daily diaries were filled on the following day, that is we we mark the cases with dailyDiary
StartedTime
after sleepLog
EndTime
, and those with valid dailyDiary
but invalid sleepLog
data with a StartedTime
later than 6 AM.
# marking cases with StartedTime > EndTime
ema[!is.na(ema$dailyStress) & !is.na(ema$TIB),c("diary.nextDay","diary.nextDay2")] <- 0
ema[!is.na(ema$dailyStress) & !is.na(ema$TIB) & ema$StartedTime>ema$EndTime,"diary.nextDay"] <- 1
cat(nrow(ema[!is.na(ema$diary.nextDay) & ema$diary.nextDay==1,]),"diary responses entered after wake up (",
round(100*nrow(ema[!is.na(ema$diary.nextDay) & ema$diary.nextDay==1,])/nrow(ema[!is.na(ema$StartedTime),]),1),"% )")
## 790 diary responses entered after wake up ( 16 % )
# marking cases with StartedTime > 6 AM and < 8 PM
ema$StartedHour <- as.POSIXct(paste(hour(ema$StartedTime),minute(ema$StartedTime)),format="%H %M",tz="GMT") # isolating time
sixPM <- as.POSIXct(paste(substr(Sys.time(),1,10),"06:00:00"),tz="GMT") # 6 AM
ema[!is.na(ema$StartedHour) & ema$StartedHour>sixPM & ema$StartedHour<sixPM+14*60*60,"diary.nextDay2"] <- 1
ema[!is.na(ema$StartedHour) & (ema$StartedHour<=sixPM | ema$StartedHour>=sixPM+14*60*60),"diary.nextDay2"] <- 0
cat(nrow(ema[!is.na(ema$diary.nextDay2) & ema$diary.nextDay2==1,]),"diary responses entered after 6 AM (",
round(100*nrow(ema[!is.na(ema$diary.nextDay2) & ema$diary.nextDay2==1,])/nrow(ema[!is.na(ema$StartedTime),]),1),
"% ), of which",nrow(ema[!is.na(ema$diary.nextDay2) & ema$diary.nextDay2==1 &
!is.na(ema$diary.nextDay) & ema$diary.nextDay==0,]),"marked with diary.nextDay = 0 (showed below)")
## 998 diary responses entered after 6 AM ( 20.2 % ), of which 36 marked with diary.nextDay = 0 (showed below)
# showing cases with StartedTime > 6 AM and diary.nextDay = 0
ema[!is.na(ema$diary.nextDay2) & ema$diary.nextDay2==1 & !is.na(ema$diary.nextDay) & ema$diary.nextDay==0,
c("ID","ActivityDate","StartTime","EndTime","StartedTime","diary.nextDay")]
# joining diary.nextDay and diary.nextDay2
ema[!is.na(ema$diary.nextDay2) & ema$diary.nextDay2==1,"diary.nextDay"] <- 1
ema[is.na(ema$diary.nextDay) & !is.na(ema$diary.nextDay2) & ema$diary.nextDay2==0,"diary.nextDay"] <- 0
ema$StartedHour <- ema$diary.nextDay2 <- NULL
# sorting columns
AD <- which(colnames(ema)=="surveyDuration")
ema <- ema[,c(1:AD,ncol(ema),(AD+1):(ncol(ema)-1))]
# printing final info
cat(nrow(ema[!is.na(ema$diary.nextDay) & ema$diary.nextDay==1,]),"diary responses entered after wake up or after 6 AM (",
round(100*nrow(ema[!is.na(ema$diary.nextDay) & ema$diary.nextDay==1,])/nrow(ema[!is.na(ema$StartedTime),]),1),"% )")
## 1010 diary responses entered after wake up or after 6 AM ( 20.5 % )
Comments:
in 790 cases (16%), participants filled the diary after sleepLog WakeUp
time
in 998 cases (20.2%), participants filled the diary after 6 AM
overall, in 1,010 cases (25.5%), participants filled the diary on the following day
Here, the ActivityDate
variable is used to discriminate the observations collected in weekdays (Tuesday to Friday) vs weekend days (Saturday and Sunday). Since sleep habits are mainly influenced by the following day than the previous one (i.e., people go to bed late on Friday, and early on Sunday), we create two variables: weekday
(marking Saturday and Sunday as “weekend”), and weekday.sleep
(marking Friday and Saturday as “weekend”).
# extracting weekday from Activity Date
ema$wd <- weekdays(as.Date(ema$ActivityDate))
# creating weekday
ema$weekday <- "weekday"
ema[ema$wd=="sabato" | ema$wd=="domenica","weekday"] <- "weekend"
# creating weekday.sleep
ema$weekday.sleep <- "weekday"
ema[ema$wd=="venerdì" | ema$wd=="sabato","weekday.sleep"] <- "weekend"
ema$wd <- NULL
# sorting dataset
AD <- which(colnames(ema)=="ActivityDate")
ema <- ema[,c(1:AD,ncol(ema)-1,ncol(ema),(AD+1):(ncol(ema)-2))]
# summarizing
ema[,c("weekday","weekday.sleep")] <- lapply(ema[,c("weekday","weekday.sleep")],as.factor)
summary(ema[,c("weekday","weekday.sleep")])
## weekday weekday.sleep
## weekday:4463 weekday:4451
## weekend:1756 weekend:1768
Here, the ActivityDate
variable is used to create the variable covid19
highlighting the cases recorded during the COVID-19 emergency, whose beginning in the San Francisco Bay Area was identified around March 20th, 2020.
# creating covid19 variable
ema$covid19 <- "pre-COVID19"
ema[ema$ActivityDate>=as.Date("2020-03-20"),"covid19"] <- "post-COVID19"
ema$covid19 <- factor(ema$covid19,levels=c("pre-COVID19","post-COVID19"))
# sorting dataset columns
ema <- ema[,c(1:AD,ncol(ema),(AD+1):(ncol(ema)-1))]
# plotting
ggplot(ema,aes(ActivityDate)) + geom_histogram() + geom_vline(xintercept=as.Date("2020-03-20"),color="red")
# summarizing cases
summary(ema$covid19) # 2,317 post-COVID19 (37%)
## pre-COVID19 post-COVID19
## 3902 2317
# No. participants with both pre-COVID19 and post-COVID19 observations
pre <- levels(as.factor(as.character(ema[ema$covid19=="pre-COVID19","ID"])))
post <- levels(as.factor(as.character(ema[ema$covid19=="post-COVID19","ID"])))
cat("-",length(pre[!(pre%in%post)]),"participants with only pre-COVID19 observations",
"\n-",length(post[!(post%in%pre)]),"participants with only post-COVID19 observations",
"\n-",length(post[post%in%pre]),"participants with both pre- and post-COVID19 observations")
## - 54 participants with only pre-COVID19 observations
## - 32 participants with only post-COVID19 observations
## - 7 participants with both pre- and post-COVID19 observations
Comments:
the 37% of the data was collected after the COVID-19 outbreak
nevertheless, only seven participants have data collected both before and after the COVID-19 outbreak, with most participants having ended their data collection period before March 20th, 2020, and a smaller group (N = 32) having started four-to-five months later
consequently, any comparison will be mainly a comparison between subjects
Here, we describe each variable included in the ema
dataset.
# removing unuseful variables
toRemove <- c("LogId","SOL","nAwake","fragIndex","midSleep",paste("meanHR",1:3,sep=""),paste("sleepHR",1:3,sep=""),
paste("HRtimeCoeff",1:3,sep=""),paste("stageHR",c("TST","SOL","WASO"),sep="_"),"majMiss","minMiss")
ema[,toRemove] <- NULL
# changing diary items names
colnames(ema)[which(colnames(ema)=="dailyStress")] <- "stress"
colnames(ema)[which(colnames(ema)=="eveningMood")] <- "mood"
colnames(ema)[which(colnames(ema)=="eveningWorry")] <- "worry"
# changing HR variable names
colnames(ema)[which(colnames(ema)=="stageHR_NREM")] <- "HR_NREM"
colnames(ema)[which(colnames(ema)=="stageHR_REM")] <- "HR_REM"
# showing variables
str(ema)
## 'data.frame': 6219 obs. of 54 variables:
## $ ID : Factor w/ 93 levels "s001","s002",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ sex : Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 1 ...
## $ age : num 19.3 19.3 19.3 19.3 19.3 ...
## $ BMI : num 20.6 20.6 20.6 20.6 20.6 ...
## $ insomnia : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ insomnia.group: Factor w/ 3 levels "control","DSM.ins",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ActivityDate : Date, format: "2019-01-07" "2019-01-08" ...
## $ covid19 : Factor w/ 2 levels "pre-COVID19",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ weekday : Factor w/ 2 levels "weekday","weekend": 1 1 1 1 1 2 2 1 1 1 ...
## $ weekday.sleep : Factor w/ 2 levels "weekday","weekend": 1 1 1 1 2 2 1 1 1 1 ...
## $ dayNr : num 1 2 3 4 5 6 7 8 9 10 ...
## $ partdayNr : int 1 2 3 4 5 6 7 8 9 10 ...
## $ StartTime : POSIXct, format: "2019-01-07 22:06:30" "2019-01-08 21:59:30" ...
## $ EndTime : POSIXct, format: "2019-01-08 08:07:30" "2019-01-09 07:45:00" ...
## $ SleepDataType : Factor w/ 2 levels "classic","stages": 2 2 2 NA NA NA NA NA NA NA ...
## $ EBEDataType : chr "stages" "stages" "stages" NA ...
## $ TIB : num 601 586 538 NA NA ...
## $ TST : num 507 502 470 NA NA ...
## $ SE : num 84.4 85.8 87.3 NA NA ...
## $ SO : POSIXct, format: "2019-01-07 22:06:30" "2019-01-08 22:00:00" ...
## $ WakeUp : POSIXct, format: "2019-01-08 08:07:30" "2019-01-09 07:45:00" ...
## $ SO.num : num -1.9 -2 -1.22 NA NA ...
## $ WakeUp.num : num 8.12 7.75 7.72 NA NA ...
## $ WASO : num 94 82.5 66 NA NA NA NA NA NA NA ...
## $ light : num 353 297 283 NA NA NA NA NA NA NA ...
## $ deep : num 66 91 97.5 NA NA NA NA NA NA NA ...
## $ rem : num 88 114 89 NA NA ...
## $ light.p : num 69.6 59.1 60.3 NA NA ...
## $ deep.p : num 13 18.1 20.8 NA NA ...
## $ rem.p : num 17.4 22.8 19 NA NA ...
## $ HR_NREM : num 63.3 63.4 61.9 NA NA ...
## $ HR_REM : num 63.6 63.7 61.5 NA NA ...
## $ TotalSteps : int NA 16343 14904 7898 NA NA NA NA NA 6718 ...
## $ TotalSteps1000: num NA 16.3 14.9 7.9 NA ...
## $ StartedTime : POSIXct, format: "2019-01-07 21:00:00" "2019-01-08 21:07:00" ...
## $ SubmittedTime : POSIXct, format: "2019-01-07 21:01:00" "2019-01-08 21:07:00" ...
## $ surveyDuration: num 1 0 0 0 NA 0 1 0 0 0 ...
## $ diary.nextDay : num 0 0 0 0 NA 0 1 0 0 0 ...
## $ stress : num 3 1 3 2 NA 3 1 2 1 2 ...
## $ worry : num 2 2 3 2 NA 1 1 1 2 1 ...
## $ mood : num 4 4 4 5 NA 5 5 5 4 5 ...
## $ stress_school : Factor w/ 2 levels "0","1": 2 1 2 2 NA 1 1 2 1 2 ...
## $ stress_family : Factor w/ 2 levels "0","1": 2 1 1 1 NA 1 1 1 1 1 ...
## $ stress_health : Factor w/ 2 levels "0","1": NA NA NA NA NA NA NA NA NA NA ...
## $ stress_COVID : Factor w/ 2 levels "0","1": NA NA NA NA NA NA NA NA NA NA ...
## $ stress_peers : Factor w/ 2 levels "0","1": 1 1 1 1 NA 1 1 1 1 1 ...
## $ stress_other : Factor w/ 2 levels "0","1": 1 1 1 1 NA 2 1 1 1 1 ...
## $ worry_school : Factor w/ 2 levels "0","1": 2 1 2 2 NA 1 1 1 2 1 ...
## $ worry_family : Factor w/ 2 levels "0","1": 1 1 1 1 NA 1 1 1 1 1 ...
## $ worry_health : Factor w/ 2 levels "0","1": NA NA NA NA NA NA NA NA NA NA ...
## $ worry_peer : Factor w/ 2 levels "0","1": 1 1 1 1 NA 1 1 1 1 1 ...
## $ worry_COVID : Factor w/ 2 levels "0","1": NA NA NA NA NA NA NA NA NA NA ...
## $ worry_sleep : Factor w/ 2 levels "0","1": NA NA NA NA NA NA NA NA NA NA ...
## $ worry_other : Factor w/ 2 levels "0","1": 1 2 1 2 NA 1 1 1 1 1 ...
INDIVIDUAL-LEVEL VARIABLES
Demographics:
ID
= participants’ identification code
sex
= participants’ sex (“F” = female, “M” = male)
age
= participants’ age (years)
BMI
= participants’ BMI (kg*m^(-2))
insomnia
= participants’ group (1 = insomnia, 0 = control)
insomnia.group
= participants’ insomnia group (“control” = control, “DSM.ins” = DSM insomnia, “sub.ins” = insomnia subthreshold)
DAY-LEVEL VARIABLES
ActivityDate
= day of assessment (date in “mm-dd-yyyy” format)
covid19
= factor indicating whether the observation was collected “pre-COVID19” or “post-COVID19”
weekday
= factor indicating whther the observation was collected during a “weekday” (Mon-Fry) or “weekend” (Sat-Sun)
weekday.sleep
= factor indicating whther the observation was collected during a night preceding a “weekday” (Sun-Thu) or “weekend” (Fry-Sat)
dayNr
= No. of days since the first day of assessment, within participant
partdayNr
= current ‘valid’ day of assessment
sleepLog
StartTime
= sleep period start hour (in “mm-dd-yyyy hh:mm:ss” format)
EndTime
= sleep period end hour (in “mm-dd-yyyy hh:mm:ss” format)
SleepDataType
: sleep data type originally scored by Fitabase
EBEDataType
: type of EBE data used for manually recomputing sleep measures (i.e., updated by excluding the last wake epochs)
TIB
= time in bed (min) computed as the number of minutes between StartTime
(i.e., considering missing epochs at the beginning as wake epochs) and the last epoch included in EBE data (i.e., excluding the last wake epochs)
TST
= total sleep time (min)
SE
= sleep efficiency as the percent of TST
over TIB
(%)
SO
= sleep onset hour (in “mm-dd-yyyy hh:mm:ss” format) corresponding to the time of the first epoch classified as sleep
WakeUp
= wake-up time (in “mm-dd-yyyy hh:mm:ss” format) corresponding to the time of the last epoch classified as sleep
SO.num
= SO
expressed as the No. of hours from midnight
WakeUp.num
= WakeUp
expressed as the No. of hours from midnight
WASO
= wake after sleep onset (min)
light
= No. of minutes classified as ‘light’ sleep
deep
= No. of minutes classified as ‘deep’ sleep
rem
= No. of minutes classified as REM sleep
light.p
= Percentage of light sleep over TST
deep.p
= Percentage of deep sleep over TST
rem.p
= Percentage of REM sleep over TST
sleepHR
HR_NREM
= mean HR values (bpm) computed from all epochs categorized as NREM sleep
HR_REM
= mean HR values (bpm) computed from all epochs categorized as REM sleep
dailyAct
TotalSteps
= sum of the No. of steps recoded in each day (recomputed from hourly steps data)
TotalSteps1000
= TotalSteps
expressed in thousands of steps
dailyDiary
StartedTime
= initiation hour of the diary form (in “mm-dd-yyyy hh:mm:ss” format)
SubmittedTime
= submission hour of the diary form (in “mm-dd-yyyy hh:mm:ss” format)
surveyDuration
= duration of the survey (min)
diary.nextDay
= factor indicating whether diary ratings were entered on the following day (1) or not (0)
stress
= score at the daily stress item (1-5)
worry
= score at the evening worry item (1-5)
mood
= score at the evening Mood item (1-5)
stress_school, ..., stress_other
= stressor categories (0 or 1)
worry_school, ..., worry_other
= sources of worry (0 or 1)
Here, we evaluate the psychometric proprieties of the psychological distress scale resulting from the three Likert-type items submitted each evening: stress
, mood
, and worry
.
First, we load the required packages, and we negatively recode mood
so that higher scores reflect negative mood, coherently with the other items.
# loading packages
library(lavaan); library(MuMIn)
# negatively recoding evening mood (higher values = negative mood)
ema$mood <- 6 - ema$mood
Then, we use the corr.matrices
function to visualize the correlation matrix of item scores at the overall, between- and within-subject level, and the itemsICC
function to compute items intraclass correlation coefficients (ICC) for each item. Note that only cases with complete responses to each item ware kept in the dataset.
corr.matrices <- function(data=data,IDvar="ID",items=c("stress","worry","mood")){
require(ggplot2); require(dplyr); require(reshape2)
data <- na.omit(data[,c(IDvar,items)])
data <- plyr::ddply(data,"ID",transform,day=seq_along(ID))
data <- data %>%
group_by(ID) %>% # grouping by ID
mutate(stress.cm = mean(na.omit(stress)), # computing individual mean for any given item
worry.cm = mean(na.omit(worry)),mood.cm = mean(na.omit(mood)),
stress.dm = stress-stress.cm, # computing within-individual deviations (occasional score - mean score)
worry.dm = worry-worry.cm, mood.dm = mood-mood.cm)
# Matrix 1 (all scores as independent)
c1 <- melt(cor(data[,which(colnames(data)=="stress"):which(colnames(data)=="mood")],
data[,which(colnames(data)=="stress"):which(colnames(data)=="mood")],
use="complete.obs",method="pearson"))
p1 <- ggplot(c1,aes(x=Var1, y=Var2, fill=value)) +
geom_tile() +
geom_text(aes(x = Var1, y = Var2, label = round(value,2)),color="black",size=5)+
ggtitle(paste("Correlation Matrix 1 (all independent); N =",nrow(data)))+labs(x="",y="")+
scale_fill_gradient2(low="darkblue",high="#f03b20",
mid="white",midpoint=0,
limit = c(-1,1), space = "Lab",
name="Pearson\nCorrelation",
guide="legend",
breaks=round(seq(1,-1,length.out = 11),2),
minor_breaks=round(seq(1,-1,length.out = 11),2)) + theme_minimal() +
theme(text=element_text(face="bold",size=12))
# Matrix 2 (individual means)
DATA <- data[data$day==1,]
c2 <- melt(cor(DATA[,which(colnames(DATA)=="stress.cm"):which(colnames(DATA)=="mood.cm")],
DATA[,which(colnames(DATA)=="stress.cm"):which(colnames(DATA)=="mood.cm")],
use="complete.obs",method="pearson"))
p2 <- ggplot(c2,aes(x=Var1, y=Var2, fill=value)) + geom_tile() +
geom_text(aes(x = Var1, y = Var2, label = round(value,2)),color="black",size=5)+
ggtitle(paste("Correlation Matrix 2 (individual means); N =",nrow(DATA)))+labs(x="",y="")+
scale_fill_gradient2(low="darkblue",high="#f03b20",
mid="white",midpoint=0,
limit = c(-1,1), space = "Lab",
name="Pearson\nCorrelation",
guide="legend",
breaks=round(seq(1,-1,length.out = 11),2),
minor_breaks=round(seq(1,-1,length.out = 11),2)) + theme_minimal() +
theme(text=element_text(face="bold",size=12))
# Matrix 3 (deviations from individual means)
c3 <- melt(cor(data[,which(colnames(data)=="stress.dm"):which(colnames(data)=="mood.dm")],
data[,which(colnames(data)=="stress.dm"):which(colnames(data)=="mood.dm")],
use="complete.obs",method="pearson"))
p3 <- ggplot(c3,aes(x=Var1, y=Var2, fill=value)) +
geom_tile() +
geom_text(aes(x = Var1, y = Var2, label = round(value,2)),color="black",size=5)+
ggtitle(paste("Correlation Matrix 3 (mean-centered scores); N =",nrow(data)))+labs(x="",y="")+
scale_fill_gradient2(low="darkblue",high="#f03b20",
mid="white",midpoint=0,
limit = c(-1,1), space = "Lab",
name="Pearson\nCorrelation",
guide="legend",
breaks=round(seq(1,-1,length.out = 11),2),
minor_breaks=round(seq(1,-1,length.out = 11),2)) + theme_minimal() +
theme(text=element_text(face="bold",size=12))
return(list(p1,p2,p3))}
itemsICC <- function(data,items,IDvar,output="text",digits=2){ require(lme4)
data <- na.omit(data[,c(IDvar,items)])
res <- data.frame(item=NA,icc=NA)
for(i in 1:length(items)){
m <- lmer(formula=gsub("ID",IDvar,gsub("d1",items[i],"d1~(1|ID)")),data=data) # VAR_between / (VAR_between + VAR_within)
out <- round(as.data.frame(lme4::VarCorr(m))[1,4]/(as.data.frame(lme4::VarCorr(m))[1,4]+as.data.frame(lme4::VarCorr(m))[2,4]),
digits)
# textual output or data.frame
if(output=="text"){cat(items[i],"ICC =",out,"\n")
}else{ res <- rbind(res,cbind(item=items[i],icc=out)) }}
if(output!="text"){return(res) }}
# plotting item scores distributions
grid.arrange(ggplot(ema,aes(stress)) + geom_histogram(),
ggplot(ema,aes(worry)) + geom_histogram(),
ggplot(ema,aes(mood)) + geom_histogram(),nrow=1)
# correlation matrix considering all observations as independent
corr.matrices(data=ema,IDvar="ID",items=c("stress","worry","mood"))[[1]]
# correlation matrix considering mean scores
corr.matrices(data=ema,IDvar="ID",items=c("stress","worry","mood"))[[2]]
# correlation matrix considering person mean-centered scores
corr.matrices(data=ema,IDvar="ID",items=c("stress","worry","mood"))[[3]]
# computing item ICCs
itemsICC(data=ema,IDvar="ID",items=c("stress","worry","mood"))
## stress ICC = 0.26
## worry ICC = 0.33
## mood ICC = 0.25
Comments
item scores are quite negatively skewed, especially in the case of stress
and worry
, implying some deviation from the assumption of the MCFA model below
item scores show a pattern of positive medium-to-strong correlations, indicating an internal coherence of the scale at both levels, with level-2 (between) correlations being higher than level-1 (within) correlations, as expected
at level 2 (between), we also observe a strong correlation between stress
and worry
, in contrast to the moderate correlations shown between mood
and both stress
and worry
items ICCs show similar values, indicating that most of the variance (from 67 to 75%) is at the within-individual level
Then, we perform a Multilevel Confirmatory Factor Analysis (MCFA) by specifying a measurement model that assumes a configural cluster construct (with the same mono-factor structure at both levels). Following Jak & Jorgensen (2017), we also test weak and strong cross-level isomorphism. The fit.ind
function from https://github.com/Luca-Menghini/LuMenPsy.Rfunctions is used to optimize model comparison.
fit.ind <- function(model=NA,from_summary=FALSE,type="multilevel",models.names=NA,
fits=c("npar","chisq","df","pvalue","rmsea","cfi","srmr_within","srmr_between")){ require(lavaan); require(MuMIn)
# removing level-specific fit indices when model is "monolevel"
if(type=="monolevel"){
fits <- gsub("srmr_within","srmr",fits)
fits <- fits[fits!="srmr_between"] }
if(from_summary==FALSE){
# returning dataframe of models fit indices when more than one model is considered
if(length(model)>1){
fit.indices <- fitmeasures(model[[1]])[fits]
for(i in 2:length(model)){
fit.indices <- rbind(fit.indices,fitmeasures(model[[i]])[fits]) }
if(!is.na(models.names[1])){
row.names(fit.indices) <- models.names }
return(as.data.frame(fit.indices))
} else { return(fitmeasures(model)[fits]) }
} else { # in some cases the fit indices are available only from the model's summary
quiet <- function(fit) { # this was written by Alicia FRANCO MARTÍNEZ on the lavaan Google group
sink(tempfile())
on.exit(sink())
invisible(summary(fit, standardized = TRUE, fit.measures=TRUE))
}
sum <- quiet(model)
fit.indices <- sum$FIT[fits]
return(fit.indices)}}
First, we specify all models of interest.
We start with the preliminary steps suggested by Hox (2010, chapter 14) to evaluate the factor structure at level 2: the following benchmark models imply (1) no specification at level 2 (Null model), (2) only variances but not covariances at level 2 (Independence model), and (3) a full covariance matrix at level 2 (Saturated model).
m.null <- 'level: 1
Distress_w =~ stress + worry + mood
level: 2
stress ~~ 0*stress
worry ~~ 0*worry
mood ~~ 0*mood'
m.ind <- 'level: 1
Distress_w =~ stress + worry + mood
level: 2
stress ~~ stress
worry ~~ worry
mood ~~ mood'
m.sat <- 'level: 1
Distress_w =~ stress + worry + mood
level: 2
stress ~~ stress + worry + mood
worry ~~ worry + mood
mood ~~ mood'
Then, we specify the target models: the configural model (i.e., same structure at level 1 and 2), the weak invariance model (i.e., equivalent factor loading at level 1 and level 2), and the strong invariance model (i.e., equivalent factor loadings, and no residual variance at level 2).
m.conf <- 'level: 1
Distress_w =~ stress + worry + mood
level: 2
Distress_b =~ stress + worry + mood'
m.weak <- 'level: 1
Distress_w =~ a*stress + b*worry + c*mood
level: 2
Distress_b =~ a*stress + b*worry + c*mood'
m.strong <- 'level: 1
Distress_w =~ a*stress + b*worry + c*mood
level: 2
Distress_b =~ a*stress + b*worry + c*mood
stress ~~ 0*stress
worry ~~ 0*worry
mood ~~ 0*mood'
Here, we fit the specified models to the observed data.
We start with the benchmark models.
# selecting only observations with complete dailyDiary responses
dailyDiary <- na.omit(ema[,c("ID","stress","worry","mood")])
# fitting models
fit.null <- cfa(model=m.null,data=dailyDiary,cluster="ID",std.lv=TRUE)
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
## Level-1 variable "stress" has no variance within some clusters.
## The cluster ids with zero within variance are: s050
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
## Level-1 variable "worry" has no variance within some clusters. The
## cluster ids with zero within variance are: s050
fit.ind <- cfa(model=m.ind,data=dailyDiary,cluster="ID",std.lv=TRUE)
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
## Level-1 variable "stress" has no variance within some clusters.
## The cluster ids with zero within variance are: s050
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
## Level-1 variable "worry" has no variance within some clusters. The
## cluster ids with zero within variance are: s050
fit.sat <- cfa(model=m.sat,data=dailyDiary,cluster="ID",std.lv=TRUE)
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
## Level-1 variable "stress" has no variance within some clusters.
## The cluster ids with zero within variance are: s050
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
## Level-1 variable "worry" has no variance within some clusters. The
## cluster ids with zero within variance are: s050
Then, we fit the target models.
# fitting models
fit.conf <- cfa(model=m.conf,data=dailyDiary,cluster="ID",std.lv=TRUE) # Heywood case
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
## Level-1 variable "stress" has no variance within some clusters.
## The cluster ids with zero within variance are: s050
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
## Level-1 variable "worry" has no variance within some clusters. The
## cluster ids with zero within variance are: s050
## Warning in lav_object_post_check(object): lavaan WARNING: some estimated ov
## variances are negative
fit.weak <- cfa(model=m.weak,data=dailyDiary,cluster="ID",std.lv=TRUE)
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
## Level-1 variable "stress" has no variance within some clusters.
## The cluster ids with zero within variance are: s050
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
## Level-1 variable "worry" has no variance within some clusters. The
## cluster ids with zero within variance are: s050
fit.strong <- cfa(model=m.strong,data=dailyDiary,cluster="ID",std.lv=TRUE)
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
## Level-1 variable "stress" has no variance within some clusters.
## The cluster ids with zero within variance are: s050
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
## Level-1 variable "worry" has no variance within some clusters. The
## cluster ids with zero within variance are: s050
Comments:
participant s050 has no intraindividual varianec for items stress and worry (but we keep it for now)
an Heywood case (i.e., estimated negative variance for item worry) is observed for the configural model
Here, we can see that the Heywood case does not seem to be caused by structural misspecification (the upper CI for that parameter is positive), but most probably to sample fluctuations around a parameter close to zero (see Kolenikov & Bollen, 2012).
# structural misspecification? But upper CI is > 0
parameterestimates(fit.conf)[parameterestimates(fit.conf)$op=="~~" & parameterestimates(fit.conf)$ci.lower<0,
c("lhs","level","est","se","ci.lower","ci.upper")]
To solve the problem, we first try to fix the residual variance of item worry
to the 15% of its estimated variance at level 2 (i.e., var = .15 x rho2_between) (see Joreskog & Sobrom 1996).
# estimating lv-2 variance for worry item
library(lme4)
fit <- lmer(worry ~ 1 + (1|ID),data=dailyDiary) # null LMER model
varlv2 <- as.data.frame(lme4::VarCorr(fit))[1,4] # between-subjects variance of item t2
# re-fitting the configural model with the new variance constraint
fit.conf.fix <- cfa(model=gsub("rho2",varlv2*0.15,
"level: 1
Distress_w =~ stress + worry + mood
level: 2
Distress_b =~ stress + worry + mood
worry ~~ rho2*worry"),data=dailyDiary,cluster='ID',std.lv=TRUE)
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
## Level-1 variable "stress" has no variance within some clusters.
## The cluster ids with zero within variance are: s050
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
## Level-1 variable "worry" has no variance within some clusters. The
## cluster ids with zero within variance are: s050
# inspecting covariances
parameterestimates(fit.conf.fix)[parameterestimates(fit.conf.fix)$op=="~~" & parameterestimates(fit.conf)$ci.lower<0,
c("lhs","level","est","se","ci.lower","ci.upper")]
Comments:
the re-fitting of the configural model with worry
variance fixed at the 15% of the total variance at level 2 does no show further improper solutions
now, item stress
show a negative lower CI, but the estimate is positive
For better comparison, we also re-fit the weak invariance model with the same constraint.
# re-fitting the weak invariance model with the new variance constraint
fit.weak.fix <- cfa(model=gsub("rho2",varlv2*0.15,
"level: 1
Distress_w =~ a*stress + b*worry + c*mood
level: 2
Distress_b =~ a*stress + b*worry + c*mood
worry ~~ rho2*worry"),data=dailyDiary,cluster='ID',std.lv=TRUE)
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
## Level-1 variable "stress" has no variance within some clusters.
## The cluster ids with zero within variance are: s050
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
## Level-1 variable "worry" has no variance within some clusters. The
## cluster ids with zero within variance are: s050
As an alternative solution, we inspect the changes in residual variances following the removal of each participant. This is done with the sample.fluct
and plot.infl
functions from https://github.com/Luca-Menghini/LuMenPsy.Rfunctions.
#' @title Influential analysis
#' @param data = data.frame used by the model.
#' @param cluster = Character. Variable name in the data frame defining the cluster in a two-level dataset.
#' @param parameter = Character. "var" for estimated variances, "load" for loadings.
#' @param st = Character indicating the standardization level of loadings: "st.all" or "st.lv".
#' @param n.items = Integer. Number of observed variables considered by the model.
#' @param item.labels = Character vector indicating the labels of the considered observed variables.
#' @param m = Character string specifying the model using the lavaan synthax.
sample.fluct <- function(data=NA,cluster="ID",parameter="var",st="st.all",
n.items=3,item.labels=c("stress","worry","mood"),
m = 'level: 1
Distress_w =~ stress + worry + mood
level: 2
Distress_b =~ stress + worry + mood'){ require(lavaan); require(tcltk)
# function to estiamte and store estimated parameters
estTable <- function(data=data,m=m,cluster=cluster){
# model fitting
m.res <- cfa(model=m,data=data,cluster=cluster,std.lv=TRUE)
# parameters
if(parameter=="var"){
p <- parameterestimates(m.res)
lv1 <- p[p$op=="~~" & p$est!=1 & p$level==1,"est"]
lv2 <- p[p$op=="~~" & p$est!=1 & p$level==2,"est"]
} else if(parameter=="load"){
if(st=="st.all"){
p <- standardizedsolution(m.res)
lv1 <- p[p$op=="=~","est.std"][1:n.items]
lv2 <- p[p$op=="=~","est.std"][(n.items+1):(n.items+n.items)]
} else if(st=="st.lv"){
p <- standardizedsolution(m.res,type="std.lv")
lv1 <- p[p$op=="=~","est.std"][1:n.items]
lv2 <- p[p$op=="=~","est.std"][(n.items+1):(n.items+n.items)]
} else{
lv1 <- p[p$op=="=~" & p$est!=1 & p$level==1,"est"]
lv2 <- p[p$op=="=~" & p$est!=1 & p$level==2,"est"] }}
# dataframe storing results
parameters <- data.frame(ID=rep("all",n.items),X=item.labels,lv1=lv1,lv2=lv2,neg.var=rep(1,n.items))
# marking neg.var as "0" when no negative values are highlighted
if(!any(diag(lavInspect(m.res,"est")[["within"]][["theta"]]) < 0) &
!any(diag(lavInspect(m.res,"est")[["ID"]][["theta"]]) < 0)){
parameters[1:n.items,"neg.var"] <- 0 }
return(parameters) }
# extracting baseline parameters
parameters <- estTable(data=data,m=m,cluster=cluster)
# replicate parameters estimation by excluding any participant one-by-one
IDs <- levels(as.factor(as.character(data$ID)))
pb <- tkProgressBar("Modeling", "Data modeling",0, 100, 50) # progress bar
for(ID in IDs){
info <- sprintf("%d%% done", round(which(IDs==ID)/length(IDs)*100))
setTkProgressBar(pb, round(which(IDs==ID)/length(IDs)*100), sprintf("Data modeling", info), info)
parameters <- rbind(parameters,estTable(data=data[data$ID!=ID,],m=m,cluster=cluster))
parameters[(nrow(parameters)-n.items+1):nrow(parameters),"ID"] <- as.character(ID)}
close(pb)
parameters <- parameters[,c("ID","X","lv1","lv2","neg.var")]
return(parameters)}
#' @title Plotting results of influential analysis
#' @param data = data.frame generated by the influential.analysis function.
#' @param parameter = Character. "var" for estimated variances, "load" for loadings.
#' @param variable = Character indicating the name of the observed variable to be plotted.
#' @param level = Integer indicating if focusing on level 1 or 2 (default).
#' @param threshold_lower = Numeric. Threeshold below which the participants'IDs are showed.
#' @param threshold_upper = Numeric. Threeshold above which the participants'IDs are showed.
plot.infl <- function(data,parameter="var",variable,level=2,
threshold_lower=NA,threshold_upper=NA){
require(ggplot2)
if(parameter=="var"){ par <- "variance"} else { par = "loading" }
if(level==2){ colnames(data)[colnames(data)=="lv2"] <- "par"
} else { colnames(data)[colnames(data)=="lv1"] <- "par" }
p <- ggplot(data[data$X==variable,],aes(ID,par))+
geom_point()+ggtitle(paste(variable,par,"on level",level))+
geom_point(data=data[data$X==variable & data$ID=="all",],colour="blue",size=5)+
geom_point(data=data[data$X==variable & data$neg.var==1,],colour="red")+
theme(axis.text.x = element_blank()) + xlab("")
if(!is.na(threshold_lower) & is.na(threshold_upper)){
p <- p + geom_text(data=data[data$X==variable &
data$par < threshold_lower,],
aes(label=ID),nudge_x=-5,size=3)
} else if(is.na(threshold_lower) & !is.na(threshold_upper)){
p <- p + geom_text(data=data[data$X==variable &
data$par > threshold_upper,],
aes(label=ID),nudge_x=-5,size=3)
} else if(!is.na(threshold_lower) & !is.na(threshold_upper)){
p <- p + geom_text(data=data[data$X==variable &
(data$par > threshold_upper |
data$par < threshold_lower),],
aes(label=ID),nudge_x=-5,size=3) }
return(p)}
# estimating parameters
infl <- sample.fluct(data=dailyDiary,parameter="var")
# plotting
plot.infl(data=infl,par="var",variable="worry",level=2,threshold_upper=0)
Comments:
s093
, s021
, s123
) seems to solve the Heywood case highlighted at level 2 for item worry
Here, we re-specify all target models by excluding participant s093
(i.e., associated with the lowest variance for worry
).
# fitting models
fit.conf_noInfl1 <- cfa(model=m.conf,data=dailyDiary[dailyDiary$ID!="s093",],cluster="ID",std.lv=TRUE) # Heywood case
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
## Level-1 variable "stress" has no variance within some clusters.
## The cluster ids with zero within variance are: s050
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
## Level-1 variable "worry" has no variance within some clusters. The
## cluster ids with zero within variance are: s050
fit.weak_noInfl1 <- cfa(model=m.weak,data=dailyDiary[dailyDiary$ID!="s093",],cluster="ID",std.lv=TRUE)
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
## Level-1 variable "stress" has no variance within some clusters.
## The cluster ids with zero within variance are: s050
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
## Level-1 variable "worry" has no variance within some clusters. The
## cluster ids with zero within variance are: s050
fit.strong_noInfl1 <- cfa(model=m.strong,data=dailyDiary[dailyDiary$ID!="s093",],cluster="ID",std.lv=TRUE)
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
## Level-1 variable "stress" has no variance within some clusters.
## The cluster ids with zero within variance are: s050
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
## Level-1 variable "worry" has no variance within some clusters. The
## cluster ids with zero within variance are: s050
Comments:
s095
Here, we compare the models specified above by inspecting their fit indices and information criteria. The fitInd
function is used to optimize the process.
fitInd <- function(model=NA,from_summary=FALSE,type="multilevel",models.names=NA,
fits=c("npar","chisq","df","rmsea","cfi","srmr_within","srmr_between")){
require(lavaan); require(MuMIn)
# removing level-specific fit indices when model is "monolevel"
if(type=="monolevel"){
fits <- gsub("srmr_within","srmr",fits)
fits <- fits[fits!="srmr_between"] }
if(from_summary==FALSE){
# returning dataframe of models fit indices when more than one model is considered
if(length(model)>1){
fit.indices <- fitmeasures(model[[1]])[fits]
for(i in 2:length(model)){
m <- model[[i]]
fit.indices <- rbind(fit.indices,fitmeasures(m)[fits]) }
if(!is.na(models.names[1])){
row.names(fit.indices) <- models.names }
return(as.data.frame(fit.indices))
} else { return(fitmeasures(model)[fits]) }
} else { # in some cases the fit indices are available only from the model's summary
quiet <- function(fit) { # this was written by Alicia FRANCO MARTÍNEZ on the lavaan Google group
sink(tempfile())
on.exit(sink())
invisible(summary(fit, standardized = TRUE, fit.measures=TRUE))
}
sum <- quiet(model)
fit.indices <- sum$FIT[fits]
return(fit.indices)}}
First, we evaluate the fit of the benchmark models specified for examining the factor structure at level 2.
# fit indices
round(fitInd(model=list(fit.null,fit.ind,fit.sat),
models.names=c("Lv2 Null","Lv2 Independence","Lv2 Saturated")),3)
# AIC also considering the configural model
Weights(AIC(fit.null,fit.ind,fit.sat,fit.conf))
## model weights
## [1] 0.0 0.0 0.5 0.5
# BIC also considering the configural model
Weights(BIC(fit.null,fit.ind,fit.sat,fit.conf))
## model weights
## [1] 0.0 0.0 0.5 0.5
Comments:
all benchmark models converged with no problems
lv2 Null and Independence models show unsatisfactory goodness of fit, and are rejected, suggesting that there is some interesting structure also at level 2
lv2 Saturated model is saturated, so we cannot use chi-squared-derived fit indices, and we cannot compare it with the configural model (which is also saturated)
Second, we evaluate the target models by keeping in mind that the configural model is saturated and show an Heywood case for worry
at level 2.
cbind(round(fitInd(model=c(fit.conf,fit.weak,fit.strong),
models.names=c("Configural","Weak Invariance","Strong Invariance")),3),
AICw=round(Weights(AIC(fit.conf,fit.weak,fit.strong)),3),
BICw=round(Weights(BIC(fit.conf,fit.weak,fit.strong)),3))
Comments:
the configural model is saturated, so we cannot evaluate its fit indices, but it shows the highest AICw and BIC
the weak invariance model shows acceptable RMSEA, CFI, and SRMR within, but excessively high SRMR between, suggesting a poor fit at level 2
the strong invariance model is rejected
Then, we replicate the model comparison by considering the configural and weak invariance models specified by constraining the level-2 residual variance for item worry
to the 15% of its total variance at level 2.
cbind(round(fitInd(model=c(fit.conf.fix,fit.weak.fix,fit.strong),
models.names=c("Configural","Weak Invariance","Strong Invariance")),3),
AICw=round(Weights(AIC(fit.conf.fix,fit.weak.fix,fit.strong)),3),
BICw=round(Weights(BIC(fit.conf.fix,fit.weak.fix,fit.strong)),3))
Comments:
the configural model shows the best fit indices and the highest AICw and BIC
the weak invariance model shows acceptable RMSEA, CFI, and SRMR within, but excessively high SRMR between, suggesting a poor fit at level 2
the strong invariance model is rejected
Finally, we replicate the model comparison on the models specified without participant s095
, associated with the lowest (negative) estimate for the level-2 residual variance of item worry
.
cbind(round(fitInd(model=c(fit.conf_noInfl1,fit.weak_noInfl1,fit.strong_noInfl1),
models.names=c("Configural","Weak Invariance","Strong Invariance")),3),
AICw=round(Weights(AIC(fit.conf_noInfl1,fit.weak_noInfl1,fit.strong_noInfl1)),3),
BICw=round(Weights(BIC(fit.conf_noInfl1,fit.weak_noInfl1,fit.strong_noInfl1)),3))
Comments:
the configural model is saturated, so we cannot evaluate its fit indices, but it shows the highest AICw and BIC
the weak invariance model shows acceptable RMSEA, CFI, and SRMR within, but excessively high SRMR between, suggesting a poor fit at level 2
the strong invariance model is rejected
In all model comparisons, the configural invariance model consistently showed the strongest evidence in terms of AIC and BIC, and the best fit indices when level-2 residual variance for item worry is constrained to the 15% of its total variance at level 2 (otherwise the model is saturated, and shows 1 Heywood case).
In contrast, the weak invariance model is consistently associated with satisfactory fit at level 1 but not at level 2, as indicated by the unsatisfactory SRMR coefficient at the between level.
The strong invariance model is consistently rejected due to the unsatisfactory fit, suggesting that different factors might influence item responses at level 2.
Thus, the configural invariance model is selected as the best measurement model describing the factor structure of the Psychological distress scale.
Here, we inspect and compare the standardized factor loadings estimated by the models showing the best fit indices: the configural model (i.e., we consider both its basic fit, and the conf.fix
and conf.noInfl1
versions).
# fit.conf
(p <- cbind(standardizedsolution(fit.conf)[standardizedsolution(fit.conf)$op=="=~",1:5],
standardizedsolution(fit.conf.fix)[standardizedsolution(fit.conf)$op=="=~",4:5],
standardizedsolution(fit.conf_noInfl1)[standardizedsolution(fit.conf)$op=="=~",4:5]))
Comments:
standardized parameters range from 0.54 to 0.99 in all models, suggesting adequate factorial validity
the model with level-2 worry
residual variance constrained to the 15% of its total variance shows fit indices from 0.54 to 0.96, with estimates very close to those estimated by the model without participant s095
Finally, we inspect the factor scores predicted at the within and between level by the selected model (i.e., configural model), and we compare the factor scores predicted by fit.conf.fix
(in blue) and fit.conf.noInfl1
(in red). Factor scores are then included in the ema
dataset, to be used in the following analyses.
# level 1
hist(lavPredict(fit.conf.fix,level=1),col=rgb(0,0,1,alpha=0.5),breaks=100,main="predicted factor scores lv1")
hist(lavPredict(fit.conf_noInfl1,level=1),add=TRUE,col=rgb(1,0,0,alpha=0.5),breaks=100)
# level 2
hist(lavPredict(fit.conf.fix,level=2),col=rgb(0,0,1,alpha=0.5),breaks=10,main="predicted factor scores lv2")
hist(lavPredict(fit.conf_noInfl1,level=2),add=TRUE,col=rgb(1,0,0,alpha=0.5),breaks=10)
# including fit.conf.fix factor scores in the ema dataset (fs)
ema[!is.na(ema$StartedTime),"fs.w"] <- as.numeric(lavPredict(fit.conf.fix,level=1))
demos$fs.b <- as.numeric(lavPredict(fit.conf.fix,level=2))
ema <- plyr::join(ema,demos[,c("ID","fs.b")],type="left",by="ID")
# computing sum of factor scores (cluster mean + within-cluster deviations)
ema$fs <- ema$fs.b + ema$fs.w
Comments:
factor scores are very similar between the two models at level 1, but more differences are highlighted at level 2
only the factor scores predicted by the fit.conf.fix
model (i.e., with level-2 residual variance for worry
being constrained to the 15% of its total variance at level 2) are included in the ema
dataset
Here, we compute reliability coefficients at both the within and the between level, both based on the MCFA results (following Geldhof, 2014) and based on the generalizability theory (following Cranford et al., 2006). The MCFArel
and GTHEORYrel
functions from https://github.com/Luca-Menghini/LuMenPsy.Rfunctions are used to optimize the analyses:
#' @title Computing composite reliability index from a MCFA model
#' @param fit = multilevel CFA model.
#' @param level = level of interest (either 1 or 2)
#' @param items = numeric string indicating the items of interest (e.g., 1:3 for items 1, 2 and 3)
#' @param item.labels = character string indicating the names of the items of interest
MCFArel <- function(fit,level,items,item.labels){ require(lavaan)
if(level==1){
sl <- standardizedsolution(fit)[1:(nrow(standardizedSolution(fit))/2),] # pars within
} else if(level==2){
sl <- standardizedsolution(fit)[(nrow(standardizedSolution(fit))/2):nrow(standardizedsolution(fit)),] # pars between
} else { stop("Error: level can be either 1 or 2") }
sl <- sl$est.std[sl$op == "=~"][items] # standardized loadings of the selected items
names(sl) <- item.labels # item names
re <- 1 - sl^2 # residual variances of items
omega <- sum(sl)^2 / (sum(sl)^2 + sum(re)) # composite reliability index
return(round(omega,2))}
#' @title Computing variance components and reliability indices of a given scale
#' @param data = dataset
#' @param items = numeric string indicating the items of interest (e.g., 1:3 for items 1, 2 and 3)
#' @param item.labels = character string indicating the names of the items of interest
GTHEORYrel <- function(data,items,latent.lab,what=c("varComp","rel")){ require(lme4)
# creating variable TIME
data <- plyr::ddply(data,"ID",transform,TIME=seq_along(ID))
# preparing long dataset with one row per item
psymetr <- stack(data[,items])
psymetr$ID <- data$ID
psymetr$time <- as.factor(data$TIME)
psymetr <- psymetr[,c(3,4,2,1)]
colnames(psymetr) <- c("person","time","item","y")
psymetr <- psymetr[order(psymetr$person,psymetr$time,psymetr$item),]
# random-only GLMM specificiation
mod1 <- lmer(y ~ 1 + (1|person) + (1|time) + (1|item) +
(1|person:time) + (1|person:item) + (1|time:item),data=psymetr)
# variance decomposition
SIGMAp <- lme4::VarCorr(mod1)[["person"]][1,1]
SIGMAt <- lme4::VarCorr(mod1)[["time"]][1,1]
SIGMAi <- lme4::VarCorr(mod1)[["item"]][1,1]
SIGMAtp <- lme4::VarCorr(mod1)[["person:time"]][1,1]
SIGMApi <- lme4::VarCorr(mod1)[["person:item"]][1,1]
SIGMAti <- lme4::VarCorr(mod1)[["time:item"]][1,1]
SIGMAres <- sigma(mod1)^2
# printing variance components
vars <- data.frame(Component=c("SIGMAp","SIGMAt","SIGMAi","SIGMAtp","SIGMApi","SIGMAti","SIGMAres","Total"),
VAR=c(SIGMAp,SIGMAt,SIGMAi,SIGMAtp,SIGMApi,SIGMAti,SIGMAres,
sum(SIGMAp,SIGMAt,SIGMAi,SIGMAtp,SIGMApi,SIGMAti,SIGMAres)))
vars$VAR <- round(vars$VAR,2)
vars$perc <- round(100*vars$VAR/vars[nrow(vars),2],2)
colnames(vars)[2:3] <- c(latent.lab,paste(latent.lab,"%"))
if(what=="varComp"){ return(vars)
}else if(what=="rel"){ # reliability coeff. based on Cranfort et al. (2006)
rel <- data.frame(measure=latent.lab,
# R1F = (varPERSON + varPERSON*ITEM/n.item) / (varPERSON + varPERSON*ITEM/n.item + varERROR/n.item)
R1F = (vars[1,2] + vars[5,2]/length(items)) / (vars[1,2] + vars[5,2]/length(items) + vars[7,2]/length(items)),
# RkF = (varPERSON + varPERSON*ITEM/n.item) / (varPERSON + varPERSON*ITEM/n.item + varERROR/(n.item*n.occasions))
RkF = (vars[1,2] + vars[5,2]/length(items))/ (vars[1,2] + vars[5,2]/length(items) +
vars[7,2]/(length(items)*max(data$TIME))),
# Rc = varPERSON*TIME / (varPERSON*TIME + varERROR/n.items)
Rc = vars[4,2] / (vars[4,2] + vars[7,2]/length(items)))
rel[,2:ncol(rel)] <- round(rel[,2:ncol(rel)],2)
return(rel)
}else { stop("Error: what argument can be either 'varComp' or 'rel'") }}
Level-specific reliability indices are computed from the configural invariance model fitted on the whole dataset, but with the level-2 residual variance of item worry
being constrained to the 15% of its total variance at level 2.
# MCFA-based reliability
(omega <- data.frame(measure="Psychological distress",
omega_w=MCFArel(fit=fit.conf.fix,level=1,items=1:3,
item.labels=c("stress","worry","mood")),
omega_b=MCFArel(fit=fit.conf.fix,level=2,items=1:3,
item.labels=c("stress","worry","mood"))))
# G-Theory-based reliability
GTHEORYrel(data=dailyDiary,items=c(which(colnames(dailyDiary)=="stress"),
which(colnames(dailyDiary)=="worry"),
which(colnames(dailyDiary)=="mood")),
latent.lab="Psychological distress",what="rel")
Comments:
overall, MCFA-based reliability indices are quite satisfactory (> .65) although not optimal
G-Theory-based reliability indices show similar results, with relatively low although acceptable between-participants reliability (R1F) and sensitivity to change (Rc)
Although the Psychological Distress scale presents some problems (i.e., Heywood case in the configural model, high SRMR between in the weak invariance model suggesting different factor loadings across levels, acceptable but not optimal reliability within-subject), it might work sufficiently well as a continuous measure of psychological distress.
Here, we compute the aggregate PsyDist
score by averaging the three item scores.
# computing aggregated score
ema$PsyDist <- apply(ema[,c("stress","worry","mood")],1,mean)
# sorting columns
AD <- which(colnames(ema)=="mood")
ema <- ema[,c(1:AD,ncol(ema),ncol(ema)-2,ncol(ema)-1,(AD+1):(ncol(ema)-3))]
# plotting single items vs. aggregate score
grid.arrange(ggplot(ema,aes(stress))+geom_histogram()+ggtitle("stress"),
ggplot(ema,aes(worry))+geom_histogram()+ggtitle("worry"),
ggplot(ema,aes(mood))+geom_histogram()+ggtitle("mood"),
ggplot(ema,aes(PsyDist))+geom_histogram(fill="red")+ggtitle("PsyDist"),nrow=2)
# plotting aggregate score vs. aggregate factor score
ggplot(ema,aes(PsyDist)) + geom_histogram(binwidth=0.1) +
geom_histogram(aes(fs,binwidth=0.1),fill=rgb(1,0,0,alpha=0.5)) +
ggtitle("PsyDist: Observed aggregated scores (black) vs. Aggregated factor scores (red)")
Here, we summarize and visual inspect each considered variable.
First, we summarize the categorical and continuous demographic variables in the demos
dataset: sex
, insomnia
, and insomnia.group
.
# summarizing demographics
summary(demos[,c("sex","insomnia","insomnia.group")])
## sex insomnia insomnia.group
## F:59 0:46 control:46
## M:34 1:47 DSM.ins:26
## sub.ins:21
# sex by insomnia
table(demos[,c("sex","insomnia")])
## insomnia
## sex 0 1
## F 28 31
## M 18 16
# sex by insomnia.group
table(demos[,c("sex","insomnia.group")])
## insomnia.group
## sex control DSM.ins sub.ins
## F 28 20 11
## M 18 6 10
# age & BMI
cat("age mean =",mean(demos$age)," sd =",sd(demos$age)," min =",min(demos$age)," max =",max(demos$age),
"\nBMI mean =",mean(demos$BMI)," sd =",sd(demos$BMI)," min =",min(demos$BMI)," max =",max(demos$BMI))
## age mean = 17.91859 sd = 0.9496819 min = 15.81404 max = 19.69924
## BMI mean = 21.81638 sd = 3.209905 min = 16.44353 max = 34.66943
Comments:
the sample includes 93 adolescents aged from 15.81 to 19.70 years (mean age = 17.91, SD = 0.95; 59 girls, 63%; mean BMI = 21.82, SD = 3.21)
47 adolescents (31 girls; 67%) are in the insomnia
group, whereas 46 (28 girls; 61%) are in the control
group
26 adolescents (20 girls; 77%) met full DSM-5 criteria for insomnia disorder, whereas 21 adolescents (10 girls; 48%) met all the DSM-5 criteria for insomnia disorder except for weekly frequency
Then, we report the data collection period as indexed by the date of the first and last data point.
# data collection beginning and end
cat("Start:",as.character(min(ema$ActivityDate))," Stop:",as.character(max(ema$ActivityDate)))
## Start: 2019-01-07 Stop: 2021-04-30
Here, we describe the participants’ compliance and response rate for each class of variables.
# computing response rate and compliance
compliance <- demos[,c("ID","insomnia")]
compVars <- c("TIB","rem.p","HR_REM","TotalSteps","stress")
for(i in 1:nrow(compliance)){ IDdata <- ema[ema$ID==as.character(compliance[i,"ID"]),]
for(Var in compVars){ colnames(IDdata)[which(colnames(IDdata)==Var)] <- "Var"
IDdata.TIB <- IDdata[!is.na(IDdata$TIB),]
IDdata.DD <- IDdata[!is.na(IDdata$stress),]
compliance[i,paste(Var,"No",sep=".")] <- nrow(IDdata[!is.na(IDdata$Var),]) # No. of valid data entries per participant
compliance[i,paste(Var,"RR",sep=".")] <- 100*nrow(IDdata[!is.na(IDdata$Var),])/60 # response rate over 60 days
compliance[i,paste(Var,"RR.sleep",sep=".")] <- 100*nrow(IDdata.TIB[!is.na(IDdata.TIB$Var),])/nrow(IDdata.TIB) # RR over sleep data
compliance[i,paste(Var,"RR.diary",sep=".")] <- 100*nrow(IDdata.DD[!is.na(IDdata.DD$Var),])/nrow(IDdata.DD) # RR over diary ratings
colnames(IDdata)[which(colnames(IDdata)=="Var")] <- Var }}
# adjusting compliance (100% when = or > 60)
compliance$TIB.RR.sleep <- compliance$stress.RR.diary <- 100
for(Var in paste(compVars,"No",sep=".")){ colnames(compliance)[which(colnames(compliance)==Var)] <- "Var"
if(any(compliance$Var>60)){
cat("\n\n",nrow(compliance[compliance$Var>60,]),"participants with more than 60 days of",gsub("No","",Var),"(from",
min(compliance[compliance$Var>60,"Var"]),"to",max(compliance[compliance$Var>60,"Var"]),")")
compliance[compliance$Var>60,gsub("No","RR",Var)] <- 100 } # fixing response rate to 100
colnames(compliance)[which(colnames(compliance)=="Var")] <- Var }
##
##
## 44 participants with more than 60 days of TIB. (from 61 to 85 )
##
## 13 participants with more than 60 days of rem.p. (from 61 to 78 )
##
## 12 participants with more than 60 days of HR_REM. (from 61 to 63 )
##
## 28 participants with more than 60 days of TotalSteps. (from 61 to 78 )
##
## 22 participants with more than 60 days of stress. (from 61 to 82 )
# computing compliance info relative to sleep.archit
TIBok <- ema[!is.na(ema$TIB),]
cat(nrow(TIBok),"valid sleep periods:\n",
" - mean No. per participants =",round(mean(compliance$TIB.No),1),"SD =",round(sd(compliance$TIB.No),1),
"mean response rate =",round(mean(compliance$TIB.RR),1),"% SD =",round(sd(compliance$TIB.RR),1),"%")
## 5121 valid sleep periods:
## - mean No. per participants = 55.1 SD = 14 mean response rate = 88.5 % SD = 20.8 %
for(Var in compVars){ colnames(TIBok)[which(colnames(TIBok)==Var)] <- "Var"
colnames(compliance)[which(colnames(compliance)==paste(Var,"RR.sleep",sep="."))] <- "Var"
cat("\n - ",nrow(TIBok[!is.na(TIBok$Var),]),"also including",Var,"(",
round(100*nrow(TIBok[!is.na(TIBok$Var),])/nrow(TIBok),1),"%, mean =",
round(mean(compliance$Var),1),"% SD =",round(sd(compliance$Var),1),"% )")
colnames(TIBok)[which(colnames(TIBok)=="Var")] <- Var
colnames(compliance)[which(colnames(compliance)=="Var")] <- paste(Var,"RR.sleep",sep=".") }
##
## - 5121 also including TIB ( 100 %, mean = 100 % SD = 0 % )
## - 4401 also including rem.p ( 85.9 %, mean = 86 % SD = 15 % )
## - 4352 also including HR_REM ( 85 %, mean = 85.1 % SD = 15 % )
## - 4310 also including TotalSteps ( 84.2 %, mean = 82.8 % SD = 20 % )
## - 4333 also including stress ( 84.6 %, mean = 83.7 % SD = 14.3 % )
# computing compliance info relative to diary ratings
DIARYok <- ema[!is.na(ema$stress),]
cat(nrow(DIARYok),"valid diary entries:\n",
" - mean No. per participants =",round(mean(compliance$stress.No),1),"SD =",round(sd(compliance$stress.No),1),
"mean response rate =",round(mean(compliance$stress.RR),1),"% SD =",round(sd(compliance$stress.RR),1),"%")
## 4932 valid diary entries:
## - mean No. per participants = 53 SD = 10.2 mean response rate = 87.2 % SD = 15.5 %
for(Var in compVars){ colnames(DIARYok)[which(colnames(DIARYok)==Var)] <- "Var"
colnames(compliance)[which(colnames(compliance)==paste(Var,"RR.diary",sep="."))] <- "Var"
cat("\n - ",nrow(DIARYok[!is.na(DIARYok$Var),]),"also including",Var,"(",
round(100*nrow(DIARYok[!is.na(DIARYok$Var),])/nrow(DIARYok),1),"%, mean =",
round(mean(compliance$Var),1),"% SD =",round(sd(compliance$Var),1),"% )")
colnames(DIARYok)[which(colnames(DIARYok)=="Var")] <- Var
colnames(compliance)[which(colnames(compliance)=="Var")] <- paste(Var,"RR.diary",sep=".") }
##
## - 4333 also including TIB ( 87.9 %, mean = 86.4 % SD = 18.8 % )
## - 3741 also including rem.p ( 75.9 %, mean = 74.9 % SD = 21.3 % )
## - 3713 also including HR_REM ( 75.3 %, mean = 74.3 % SD = 21.3 % )
## - 3949 also including TotalSteps ( 80.1 %, mean = 79.3 % SD = 23.6 % )
## - 4932 also including stress ( 100 %, mean = 100 % SD = 0 % )
# No. valid diary but not sleep data
cat(nrow(ema[!is.na(ema$stress) & is.na(ema$TIB),]),"cases with valid diary entry but missing sleep data (",
round(100*nrow(ema[!is.na(ema$stress) & is.na(ema$TIB),])/nrow(ema[!is.na(ema$stress),]),1),"% )")
## 599 cases with valid diary entry but missing sleep data ( 12.1 % )
Comments:
a total of 5,121 valid sleep periods were obtained from the EMA protocol, corresponding to an average compliance of 88.5 ± 20.8% (note that compliance was rounded at 100% for those participant with more than 60 days in any study variable)
among the valid sleep periods, 85.9% were associated with valid sleep staging information, 84.5% with valid NREM/REM HR data, 84.2% with valid daily steps data, and 84.6% with valid daily diary data
compliance with the diary protocol was equal to 87.2 ± 15.5% (note that compliance was rounded at 100% for those participant with more than 60 days in any study variable), with 599 cases of valid diary data without valid sleep data).
Then, we mark seven participants with extremely low No. of responses, that is with less than 14 days of valid observations for any of the data types.
# marking cases with less than 14 observations in any focal variable
compliance$majMiss <- 0
compliance[compliance$TIB.No<14 | compliance$rem.p.No<14 | compliance$HR_REM.No<14 |
compliance$TotalSteps.No<14 | compliance$stress.No < 14,"majMiss"] <- 1
ema <- plyr::join(ema,compliance[,c("ID","majMiss")],by="ID",type="left")
# showing compliance: majMiss
compliance[compliance$majMiss==1,c(1,2,seq(3,20,by=4))]
Here, we use the multidesc
function for computing the descriptive statistics (mean, SD and ICC) of level-1 (daily observations) and level-2 variables (demographics), as well as the mean and SD for each group.
multidesc <- function(long,wide,cluster="S.ID",lv1,lv2,group,group.labels=NA){ require(Rmisc); require(lme4)
out <- data.frame(Measure=lv1[1],
N=summarySE(long,lv1[1],na.rm=TRUE)[,2],
Mean=paste(round(summarySE(long,lv1[1],na.rm=TRUE)[,3],2)," (",
round(summarySE(long,lv1[1],na.rm=TRUE)[,4],2),")",sep=""))
if(!is.na(group)){
colnames(long)[which(colnames(long)==group)] <- colnames(wide)[which(colnames(wide)==group)] <- "group"
long$group <- as.factor(as.character(long$group))
wide$group <- as.factor(as.character(wide$group))
groups <- levels(long$group)
out$Mean1 = paste(round(summarySE(long[long$group==groups[1],],lv1[1],na.rm=TRUE)[,3],2)," (",
round(summarySE(long[long$group==groups[1],],lv1[1],na.rm=TRUE)[,4],2),")",sep="")
out$Mean2 = paste(round(summarySE(long[long$group==groups[2],],lv1[1],na.rm=TRUE)[,3],2)," (",
round(summarySE(long[long$group==groups[2],],lv1[1],na.rm=TRUE)[,4],2),")",sep="") }
for(i in 2:length(lv1)){
if(!is.na(group)){
out <- rbind(out,
data.frame(Measure=lv1[i],
N=summarySE(long,lv1[i],na.rm=TRUE)[,2],
Mean=paste(round(summarySE(long,lv1[i],na.rm=TRUE)[,3],2)," (",
round(summarySE(long,lv1[i],na.rm=TRUE)[,4],2),")",sep=""),
Mean1=paste(round(summarySE(long[long$group==groups[1],],lv1[i],na.rm=TRUE)[,3],2)," (",
round(summarySE(long[long$group==groups[1],],lv1[i],na.rm=TRUE)[,4],2),")",sep=""),
Mean2=paste(round(summarySE(long[long$group==groups[2],],lv1[i],na.rm=TRUE)[,3],2)," (",
round(summarySE(long[long$group==groups[2],],lv1[i],na.rm=TRUE)[,4],2),")",sep="")))
} else {
out <- rbind(out,
data.frame(Measure=lv1[i],
N=summarySE(long,lv1[i],na.rm=TRUE)[,2],
Mean=paste(round(summarySE(long,lv1[i],na.rm=TRUE)[,3],2)," (",
round(summarySE(long,lv1[i],na.rm=TRUE)[,4],2),")",sep=""))) }}
# demos data
for(i in 1:length(lv2)){
if(!is.na(group)){
out <- rbind(out,
data.frame(Measure=lv2[i],
N=summarySE(wide,lv2[i],na.rm=TRUE)[,2],
Mean=paste(round(summarySE(wide,lv2[i],na.rm=TRUE)[,3],2)," (",
round(summarySE(wide,lv2[i],na.rm=TRUE)[,4],2),")",sep=""),
Mean1=paste(round(summarySE(wide[wide$group==groups[1],],lv2[i],na.rm=TRUE)[,3],2)," (",
round(summarySE(wide[wide$group==groups[1],],lv2[i],na.rm=TRUE)[,4],2),")",sep=""),
Mean2=paste(round(summarySE(wide[wide$group==groups[2],],lv2[i],na.rm=TRUE)[,3],2)," (",
round(summarySE(wide[wide$group==groups[2],],lv2[i],na.rm=TRUE)[,4],2),")",sep="")))
} else {
out <- rbind(out,
data.frame(Measure=lv2[i],
N=summarySE(wide,lv2[i],na.rm=TRUE)[,2],
Mean=paste(round(summarySE(wide,lv2[i],na.rm=TRUE)[,3],2)," (",
round(summarySE(wide,lv2[i],na.rm=TRUE)[,4],2),")",sep=""))) }}
# ICC
out$ICC <- NA
for(i in 1:length(lv1)){
m <- lmer(formula=gsub("var",lv1[i],gsub("ID",cluster,"var~(1|ID)")),data=long) # VAR_between / (VAR_between + VAR_within)
out[out$Measure==lv1[i],"ICC"] <- round(as.data.frame(lme4::VarCorr(m))[1,4]/
(as.data.frame(lme4::VarCorr(m))[1,4]+as.data.frame(lme4::VarCorr(m))[2,4]),2)}
rownames(out) <- gsub(".cm","",rownames(out))
if(!is.na(group)){ if(!is.na(group.labels)){
colnames(out)[c(which(colnames(out)=="Mean1"),which(colnames(out)=="Mean2"))] <- group.labels
} else{ colnames(out)[c(which(colnames(out)=="Mean1"),which(colnames(out)=="Mean2"))] <- levels(long$group) }}
return(out)}
# selecting lv1 continuous variables
lv1 <- c("TIB","TST","WASO","SE","light.p","deep.p","rem.p", # sleep architecture
"SO.num","WakeUp.num", # sleep timing
"HR_NREM","HR_REM", # sleep autonomic functioning
"TotalSteps1000", # steps
"stress","worry","mood") # self-report
# selecting lv2 continuous variables
lv2 <- c("age","BMI")
# computing descriptive statistics
desc <- multidesc(long=ema,wide=demos,cluster="ID",lv1=lv1,lv2=lv2,
group="insomnia",group.labels=c(paste("Control (N =",nrow(demos[demos$insomnia=="0",]),")",sep=""),
paste("Insomnia (N = ",nrow(demos[demos$insomnia=="1",]),")",sep="")))
knitr::kable(desc)
Measure | N | Mean | Control (N =46) | Insomnia (N = 47) | ICC |
---|---|---|---|---|---|
TIB | 5121 | 468.52 (92.19) | 466.09 (96.68) | 470.82 (87.7) | 0.18 |
TST | 5121 | 414.27 (80.87) | 413.04 (84.71) | 415.43 (77.07) | 0.17 |
WASO | 5121 | 48.19 (23.97) | 46.98 (22.66) | 49.33 (25.09) | 0.20 |
SE | 5121 | 88.55 (4.05) | 88.74 (3.77) | 88.38 (4.29) | 0.22 |
light.p | 4401 | 58.77 (8.24) | 57.7 (8.01) | 59.9 (8.33) | 0.31 |
deep.p | 4401 | 19.8 (5.25) | 20.08 (5.15) | 19.51 (5.33) | 0.24 |
rem.p | 4401 | 21.43 (6.17) | 22.22 (6.32) | 20.6 (5.9) | 0.41 |
SO.num | 5121 | 0.52 (1.63) | 0.51 (1.68) | 0.54 (1.58) | 0.47 |
WakeUp.num | 5121 | 8.23 (1.62) | 8.17 (1.63) | 8.28 (1.61) | 0.31 |
HR_NREM | 4320 | 60.04 (7.85) | 61.44 (8.58) | 58.58 (6.7) | 0.76 |
HR_REM | 4352 | 62.06 (7.71) | 63.48 (8.5) | 60.58 (6.45) | 0.74 |
TotalSteps1000 | 4664 | 8.14 (5.16) | 8.89 (5.61) | 7.45 (4.6) | 0.42 |
stress | 4932 | 2.21 (1.08) | 2.16 (1.02) | 2.27 (1.13) | 0.26 |
worry | 4932 | 2.25 (1.09) | 2.21 (1.04) | 2.27 (1.13) | 0.33 |
mood | 4932 | 2.39 (1.03) | 2.49 (0.98) | 2.3 (1.07) | 0.25 |
age | 93 | 17.92 (0.95) | 18.14 (0.94) | 17.7 (0.92) | NA |
BMI | 93 | 21.82 (3.21) | 22.34 (3.7) | 21.3 (2.58) | NA |
Comments:
the sample is characterized by an average TIB of about 7.81 +- 1.54 hours, of which about seven hours of TST
(mean SE
around 88%)
similar summary statistics are shown by the two groups
ICCs vary between .17 and .75, with sleep-related HR showing the highest values, whereas none of the remaining variables shows ICC > .50, suggesting that most of the variance is at the within-individual level
Here, we use the multicorr
function from https://github.com/Luca-Menghini/LuMenPsy.Rfunctions to compute the inter-individual (shown below the main diagonal) and the intra-individual correlations (shown above the main diagonal) between the focal continuous variables.
multicorr <- function(long,wide,lv1,lv2,cluster="ID"){ require(Hmisc); require(Rmisc)
colnames(long)[which(colnames(long)==cluster)] <- "ID"
# individual means (lv2) of lv1 variables
for(VAR in lv1){
wide <- cbind(wide,newVar=summarySE(long,VAR,"ID",na.rm=TRUE)[,3])
colnames(wide)[which(colnames(wide)=="newVar")] <- paste(VAR,".cm",sep="") }
# joining individual means (lv2) to the long dataset
long <- plyr::join(long,
wide[,c(which(colnames(wide)==cluster),
which(colnames(wide)==paste(lv1[1],
".cm",sep="")):which(colnames(wide)==paste(lv1[length(lv1)],
".cm",sep="")))],
type="left",by="ID")
# mean-centered (lv1) values
for(VAR in lv1){
long$newVar <- long[,VAR] - long[,paste(VAR,".cm",sep="")]
colnames(long)[which(colnames(long)=="newVar")] <- paste(VAR,".dm",sep="") }
# between-subjects correlations (all variables)
out.b <- rcorr(as.matrix(wide[,c(paste(lv1,".cm",sep=""),lv2)]), type = "pearson")
rb <- round(out.b$r,2)
rb[lower.tri(rb)] <- NA
# within-participant correlations (HRV and ESM deviations from individual mean)
out.w <- rcorr(as.matrix(long[,paste(lv1,".dm",sep="")]), type = "pearson")
rw <- round(out.w$r,2)
rw[upper.tri(rw)] <- NA
# filling rb empty cells
rb[1:length(lv1),1:length(lv1)][lower.tri(rb[1:length(lv1),1:length(lv1)])] <- rw[lower.tri(rw)]
rownames(rb) <- gsub(".cm","",rownames(rb))
return(t(rb))}
desc <- cbind(desc,
multicorr(long=ema,wide=demos,cluster="ID",lv1=lv1,lv2=lv2))
rownames(desc) <- desc$Measure
desc$Measure <- NULL
write.csv(desc,"RESULTS/descriptives.csv") # saving result
desc
Correlations are also plotted below (blue = negative, white = uncorrelated, red = positive):
ggplot(melt(as.matrix(desc[,6:ncol(desc)])),aes(x=Var1, y=Var2, fill=value)) +
geom_tile() +
geom_text(aes(x = Var1, y = Var2, label = round(value,2)),color="black",size=3.5)+
ggtitle("Correlation Matrix (above = within-individual, below = between-individuals)")+labs(x="",y="")+
scale_fill_gradient2(low="darkblue",high="#f03b20",mid="white",
midpoint=0,limit = c(-1,1), space = "Lab",
name="Pearson\nCorrelation",guide="legend",
breaks=round(seq(1,-1,length.out = 11),2),
minor_breaks=round(seq(1,-1,length.out = 11),2))+
theme(text=element_text(face="bold",size=10),legend.position="none",
axis.text.x=element_text(angle=30))
Here, we visualize the univariate distributions of the focal variables, for each class of variables.
Variables describing sleep architecture include TIB
(min), TST
(min), WASO
(min), SE
(%), light
(%), deep
(%), and rem
(%).
# selecting variables
s.archit <- c("TIB","TST","WASO","SE","light.p","deep.p","rem.p")
# plotting
par(mfrow=c(3,3))
for(Var in s.archit){ hist(ema[,Var],main=Var,xlab="",breaks=30) }
# outliers with WASO > 150 min (23 cases)
nrow(ema[!is.na(ema$TIB) & ema$WASO>150,])
## [1] 23
# cases with WASO = 0 (SE = 100%) (8 cases)
nrow(ema[!is.na(ema$TIB) & ema$WASO==0,])
## [1] 8
Comments:
s.archit
variables are quite normally distributed
some univariate outliers can be highlighted in both WASO
(23 cases (0.4%) with WASO > 150 min) and SE
(14 cases (0.3%) with SE < 70%)
in a few cases (8), WASO
is equal to zero, with a SE
of 100%
Sleep timing variables include SO
(hours since 00:00) and WakeUp
(hours since 00:00).
# selecting variables
s.timing <- c("SO.num","WakeUp.num")
# plotting
par(mfrow=c(1,2))
for(Var in s.timing){ hist(ema[,Var],main=Var,xlab="",breaks=30) }
Comments:
s.timing
variables are quite normally distributed, with only a few outliers for WakeUp.num
Night-to-night variability in sleep measures is operationalized as the squared successive differences (SSD) between consecutive nights in the main s.archit
(i.e., TIB
, TST
, and WASO
) and s.timing
variables (i.e., SO.num
and WakeUp.num
).
# selecting s.archit and s.timing variables
sleepVars <- c(s.archit[1:3],s.timing)
# computing squared successive differences (SSD)
for(Var in sleepVars){
for(i in 2:nrow(ema)){
if(!is.na(ema[i,Var]) & !is.na(ema[i-1,Var])){ # non-missing current and previous day
if(ema[i,"ID"] == ema[i-1,"ID"] & ema[i,"dayNr"] == ema[i-1,"dayNr"] + 1){ # same ID and consecutive day
ema[i,paste(Var,"SSD",sep=".")] <- (ema[i,Var] - ema[i-1,Var])^2 }}}}
# plotting
par(mfrow=c(2,3))
s.variab <- paste(sleepVars,"SSD",sep=".")
for(Var in s.variab){ hist(ema[,Var],main=Var,xlab="",breaks=30) }
Comments:
s.variab
variables are highly skewed and negatively bounded to zeroAmong the variables describing sleep autonomic functioning we focus on stageHR_NREM
and stageHR_REM
.
# selecting variables
s.auton <- c("HR_NREM","HR_REM")
# plotting
par(mfrow=c(1,2))
for(Var in s.auton){ hist(ema[,Var],main=Var,xlab="",breaks=30) }
Comments:
s.auton
variables are quite normally distributed, with a few outliers with HR > 90 bmpTotalSteps1000
quantifies the daily thousands of steps.
# plotting
hist(ema[,"TotalSteps1000"],main="TotalSteps1000",xlab="",breaks=30)
Comments:
TotalSteps1000
is bounded at zero, and positively skewed, with outliers showing > 30 thousants of steps per dayPre-sleep diary ratings include the stress
, the worry
and the mood
item ratings (from 1 to 5).
# selecting variables
d.ratings <- c("stress","mood","worry")
# plotting
par(mfrow=c(1,3))
for(Var in d.ratings){ hist(ema[,Var],main=Var,xlab="",breaks=30) }
Comments:
Here, we visualize the distributions of the aggregated scores and that of the factor scores predicted by the configural invariance MCFA model selected above.
# mean centering PsyDist
demos$PsyDist.cm <- summarySE(ema,measurevar="PsyDist",groupvars="ID",na.rm=TRUE)[,3]
ema <- plyr::join(ema,demos[,c("ID","PsyDist.cm")],by="ID",type="left")
ema$PsyDist.mc <- ema$PsyDist - ema$PsyDist.cm
# plotting
grid.arrange(ggplot(ema,aes(PsyDist))+geom_histogram(),ggplot(ema,aes(fs))+geom_histogram(),
ggplot(ema[!duplicated(ema$ID),],aes(PsyDist.cm))+geom_histogram(),ggplot(demos,aes(fs.b))+geom_histogram(),
ggplot(ema,aes(PsyDist.mc))+geom_histogram(),ggplot(ema,aes(fs.w))+geom_histogram(),nrow=3)
Then, we visualize the distribution of each focal variable against categorical predictors.
Here, we explore the distributions of each continuous variable between insomnia and controls, and between the two insomnia sub-groups.
for(Var in c("age","BMI")){
print(ggplot(demos,aes_string(x="insomnia",y=Var,fill="insomnia")) + ggtitle(paste(Var,"in controls vs. insomnia")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
for(Var in s.archit){
print(ggplot(ema,aes_string(x="insomnia",y=Var,fill="insomnia")) + ggtitle(paste(Var,"in controls vs. insomnia")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
Comments:
TIB
and TST
, a slightly lower SE
, a slightly higher light.p
and lower REM.p
than the control groupfor(Var in s.timing){
print(ggplot(ema,aes_string(x="insomnia",y=Var,fill="insomnia")) + ggtitle(paste(Var,"in controls vs. insomnia")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
Comments:
SO
, and a slightly later WakeUp
times compared to the control groupfor(Var in s.auton){
print(ggplot(ema,aes_string(x="insomnia",y=Var,fill="insomnia")) + ggtitle(paste(Var,"in controls vs. insomnia")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
Comments:
insomnia
group is characterized by a slightly lower HR both in HR_NREM
and especially in HR_REM
compared to the control group# day-level
for(Var in s.variab){
print(ggplot(ema,aes_string(x="insomnia",y=Var,fill="insomnia")) + ggtitle(paste(Var,"in controls vs. insomnia")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4) +
ylim(0,max(ema[,Var],na.rm=TRUE)/4))} # higher limit fixed to 1/4 of the variable length for better visualization
Comments:
for(Var in "TotalSteps1000"){
print(ggplot(ema,aes_string(x="insomnia",y=Var,fill="insomnia")) + ggtitle(paste(Var,"in controls vs. insomnia")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
Comments:
TotalSteps1000
count compared to the control group# item ratings
for(Var in d.ratings){
print(ggplot(ema,aes_string(x="insomnia",y=Var,fill="insomnia")) + ggtitle(paste(Var,"in controls vs. insomnia")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
# aggregated scores
for(Var in c("PsyDist.cm","fs.b")){
print(ggplot(demos,aes_string(x="insomnia",y=Var,fill="insomnia")) + ggtitle(paste(Var,"in controls vs. insomnia")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
Comments:
for(Var in c("age","BMI")){
print(ggplot(demos,aes_string(x="insomnia.group",y=Var,fill="insomnia.group")) +
ggtitle(paste(Var,"in controls vs. insomnia groups")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
for(Var in s.archit){
print(ggplot(ema,aes_string(x="insomnia.group",y=Var,fill="insomnia.group")) +
ggtitle(paste(Var,"in controls vs. insomnia groups")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
Comments:
no substantial differences are highlighted between insomnia groups, with both showing differences from the control group similar to those highlighted above
rem.p`` looks slighlty **higher in the
DSM.ins`** group compared to controls
for(Var in s.timing){
print(ggplot(ema,aes_string(x="insomnia.group",y=Var,fill="insomnia.group")) +
ggtitle(paste(Var,"in controls vs. insomnia groups")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
Comments:
DSM.ins
group shows higher WakeUp.num
than the control groupfor(Var in s.auton){
print(ggplot(ema,aes_string(x="insomnia.group",y=Var,fill="insomnia.group")) +
ggtitle(paste(Var,"in controls vs. insomnia groups")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
Comments:
both meanHR
and sleepHR
values computed from the first part of the night show lower values in the insomnia groups (especially the DSM.ins) compared to the control group (look substantial)
this is more evident for the DSM-insomnia group, whereas a bimodal distribution is shown by the sub-insomnia group
for(Var in s.variab){
print(ggplot(ema,aes_string(x="insomnia.group",y=Var,fill="insomnia.group")) +
ggtitle(paste(Var,"in controls vs. insomnia groups")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4) +
ylim(0,max(ema[,Var],na.rm=TRUE)/4))} # higher limit fixed to 1/4 of the variable length for better visualization
Comments:
for(Var in "TotalSteps1000"){
print(ggplot(ema,aes_string(x="insomnia.group",y=Var,fill="insomnia.group")) +
ggtitle(paste(Var,"in controls vs. insomnia groups")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
Comments:
TotalSteps
count compared to the other two groups# item ratings
for(Var in d.ratings){
print(ggplot(ema,aes_string(x="insomnia.group",y=Var,fill="insomnia.group")) +
ggtitle(paste(Var,"in controls vs. insomnia groups")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
# aggregated scores
for(Var in c("PsyDist.cm","fs.b")){
print(ggplot(demos,aes_string(x="insomnia.group",y=Var,fill="insomnia.group")) +
ggtitle(paste(Var,"in controls vs. insomnia groups")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
Comments:
PsyDist
, although some variables showing slightly lower values in the sub-insomnia, and a slighlty higher values in the DSM-insomnia compared to the other groupsHere, we explore the distributions of each continuous variable between girls and boys.
for(Var in c("age","BMI")){
print(ggplot(demos,aes_string(x="sex",y=Var,fill="sex")) + ggtitle(paste(Var,"in females vs. males")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
Comments:
for(Var in s.archit){
print(ggplot(ema,aes_string(x="sex",y=Var,fill="sex")) + ggtitle(paste(Var,"in females vs. males")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
Comments:
TST
and TIB
, and a slightly higher REM.p in boys compared to girlsfor(Var in s.timing){
print(ggplot(ema,aes_string(x="sex",y=Var,fill="sex")) + ggtitle(paste(Var,"in females vs. males")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
Comments:
SO
is slightly later in boys compared to girls, whereas WakeUp
times are similarfor(Var in s.auton){
print(ggplot(ema,aes_string(x="sex",y=Var,fill="sex")) + ggtitle(paste(Var,"in females vs. males")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
Comments:
for(Var in s.variab){
print(ggplot(ema,aes_string(x="sex",y=Var,fill="sex")) + ggtitle(paste(Var,"in females vs. males")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4) +
ylim(0,max(ema[,Var],na.rm=TRUE)/4))} # higher limit fixed to 1/4 of the variable length for better visualization
Comments:
for(Var in "TotalSteps1000"){
print(ggplot(ema,aes_string(x="sex",y=Var,fill="sex")) + ggtitle(paste(Var,"in females vs. males")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
Comments:
# item ratings
for(Var in d.ratings){
print(ggplot(ema,aes_string(x="sex",y=Var,fill="sex")) + ggtitle(paste(Var,"in females vs. males")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
# aggregated scores
for(Var in c("PsyDist.cm","fs.b")){
print(ggplot(demos,aes_string(x="sex",y=Var,fill="sex")) + ggtitle(paste(Var,"in females vs. males")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
Comments:
fs
is slightly higher in girls compared to boysHere, we explore the explore the differences between cases observed during weekend vs weekdays. Both the weekday
(weekend = Saturday and Sunday) and the weekday.sleep
variable (weekend = Friday and Saturday) are explored.
# weekday
for(Var in s.archit){
print(ggplot(ema,aes_string(x="weekday",y=Var,fill="weekday")) + ggtitle(paste(Var,"in weekdays vs. weekend (SAT-SUN)")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)+
theme(legend.position = "none")) }
# weekday.sleep
for(Var in s.archit){
print(ggplot(ema,aes_string(x="weekday.sleep",y=Var,fill="weekday.sleep")) +
ggtitle(paste(Var,"in weekdays vs. weekend (FRY-SAT)")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4) +
theme(legend.position = "none")) }
Comments:
TIB
and TST
than weekdays, with slightly larger differences when considering Friday and Saturday as weekend, rather than Saturday and Sunday# weekday
for(Var in s.timing){
print(ggplot(ema,aes_string(x="weekday",y=Var,fill="weekday")) + ggtitle(paste(Var,"in weekdays vs. weekend (SAT-SUN)")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)+
theme(legend.position = "none")) }
# weekday.sleep
for(Var in s.timing){
print(ggplot(ema,aes_string(x="weekday.sleep",y=Var,fill="weekday.sleep")) +
ggtitle(paste(Var,"in weekdays vs. weekend (FRY-SAT)")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)+
theme(legend.position = "none")) }
Comments:
SO
and WakeUp
times compared to weekdays, with substantially larger differences when considering Friday and Saturday as weekend, rather than Saturday and Sunday# weekday
for(Var in s.auton){
print(ggplot(ema,aes_string(x="weekday",y=Var,fill="weekday")) + ggtitle(paste(Var,"in weekdays vs. weekend (SAT-SUN)")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)+
theme(legend.position = "none")) }
# weekday.sleep
for(Var in s.auton){
print(ggplot(ema,aes_string(x="weekday.sleep",y=Var,fill="weekday.sleep")) +
ggtitle(paste(Var,"in weekdays vs. weekend (FRY-SAT)")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)+
theme(legend.position = "none")) }
Comments:
for(Var in s.variab){
print(ggplot(ema,aes_string(x="weekday",y=Var,fill="weekday")) + ggtitle(paste(Var,"in weekdays vs. weekend (SAT-SUN)")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4) +
ylim(0,max(ema[,Var],na.rm=TRUE)/4))} # higher limit fixed to 1/4 of the variable length for better visualization
Comments:
# weekday
for(Var in "TotalSteps1000"){
print(ggplot(ema,aes_string(x="weekday",y=Var,fill="weekday")) + ggtitle(paste(Var,"in weekdays vs. weekend (SAT-SUN)")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)+
theme(legend.position = "none")) }
# weekday.sleep
for(Var in "TotalSteps1000"){
print(ggplot(ema,aes_string(x="weekday.sleep",y=Var,fill="weekday.sleep")) +
ggtitle(paste(Var,"in weekdays vs. weekend (FRY-SAT)")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)+
theme(legend.position = "none")) }
Comments:
# weekday - item ratings
for(Var in d.ratings){
print(ggplot(ema,aes_string(x="weekday",y=Var,fill="weekday")) + ggtitle(paste(Var,"in weekdays vs. weekend (SAT-SUN)")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
# weekday.sleep - item ratings
for(Var in d.ratings){
print(ggplot(ema,aes_string(x="weekday.sleep",y=Var,fill="weekday.sleep")) +
ggtitle(paste(Var,"in weekdays vs. weekend (FRY-SAT)")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
# weekday - aggregated scores
for(Var in c("PsyDist","fs.w")){
print(ggplot(ema,aes_string(x="weekday",y=Var,fill="weekday")) + ggtitle(paste(Var,"in weekdays vs. weekend (SAT-SUN)")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
# weekday.sleep - aggregated scores
for(Var in c("PsyDist","fs.w")){
print(ggplot(ema,aes_string(x="weekday.sleep",y=Var,fill="weekday.sleep")) +
ggtitle(paste(Var,"in weekdays vs. weekend (FRY-SAT)")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
Comments:
PsyDist
than weekdays, especially when considering the factor scores at the within level with Friday and Saturday as weekend daysHere, we explore the differences between cases observed before and during the COVID-19 emergency (i.e., after March 20th, 2020).
for(Var in s.archit){
print(ggplot(ema,aes_string(x="covid19",y=Var,fill="covid19")) + ggtitle(paste(Var,"pre- and post-covid19 lockdown")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
Comments:
TIB
and TST
compared to the pre-COVID-19 period, whereas difference in other s.archit
variables do not seem substantialfor(Var in s.timing){
print(ggplot(ema,aes_string(x="covid19",y=Var,fill="covid19")) + ggtitle(paste(Var,"pre- and post-covid19 lockdown")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
Comments:
SO
and especially WakeUp
are substantially delayed in the post-COVID-19 period compared to the pre-COVID-19 period.for(Var in s.auton){
print(ggplot(ema,aes_string(x="covid19",y=Var,fill="covid19")) + ggtitle(paste(Var,"pre- and post-covid19 lockdown")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
Comments:
stageHR_REM
for(Var in s.variab){
print(ggplot(ema,aes_string(x="covid19",y=Var,fill="covid19")) + ggtitle(paste(Var,"pre- and post-covid19 lockdown")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4) +
ylim(0,max(ema[,Var],na.rm=TRUE)/4))} # higher limit fixed to 1/4 of the variable length for better visualization
Comments:
for(Var in "TotalSteps1000"){
print(ggplot(ema,aes_string(x="covid19",y=Var,fill="covid19")) + ggtitle(paste(Var,"pre- and post-covid19 lockdown")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
Comments:
# item ratings
for(Var in d.ratings){
print(ggplot(ema,aes_string(x="covid19",y=Var,fill="covid19")) + ggtitle(paste(Var,"pre- and post-covid19 lockdown")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
# aggregated scores
for(Var in c("PsyDist","fs.w")){
print(ggplot(ema,aes_string(x="covid19",y=Var,fill="covid19")) + ggtitle(paste(Var,"pre- and post-covid19 lockdown")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
Comments:
Finally, we explore the differences between diary ratings obtained before bedtime and those obtained on the following day.
# item ratings
ema$diary.nextDay <- as.factor(ema$diary.nextDay)
for(Var in d.ratings){
print(ggplot(ema[!is.na(ema$diary.nextDay),],aes_string(x="diary.nextDay",y=Var,fill="diary.nextDay")) +
ggtitle(paste(Var,"bedtime vs. next day diary entries")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
# aggregated scores
for(Var in c("PsyDist","fs.w")){
print(ggplot(ema[!is.na(ema$diary.nextDay),],aes_string(x="diary.nextDay",y=Var,fill="diary.nextDay")) +
ggtitle(paste(Var,"bedtime vs. next day diary entries")) +
geom_point(col="gray",position = position_jitter(width = .15)) + geom_violin(alpha=.4)) }
Comments:
Here, we descriptively characterize the sample and the subgroups of participants (based on sex
, insomnia
, and insomnia.group
) in terms of pre-sleep stress
, worry
, and mood
as well as in terms of sources of stress
and worry
.
Here, we describe the stress
variable in the sample, and across the subgroups of participants.
# plotting
grid.arrange(ggplot(ema[!is.na(ema$stress),],aes(x=as.factor(stress))) + geom_histogram(stat="count") + ggtitle("stress in the sample") +
geom_text(aes(label = scales::percent((..count..)/sum(..count..))), stat = "count", vjust = -0.25) + ylim(0,1700),
ggplot(ema,aes(stress,fill=sex)) + geom_histogram(position=position_dodge(0.5)),
ggplot(ema,aes(stress,fill=insomnia)) + geom_histogram(position=position_dodge(0.5)),
ggplot(ema,aes(stress,fill=insomnia.group)) + geom_histogram(position=position_dodge(0.5)),nrow=2)
# computing mean and SD in the sample and by group
summarySE(ema,measurevar="stress",na.rm=TRUE)[,1:4] # stress in the sample
summarySE(ema,measurevar="stress",groupvars="sex",na.rm=TRUE)[,1:4] # stress by sex
summarySE(ema,measurevar="stress",groupvars="insomnia",na.rm=TRUE)[,1:4] # stress by insomnia
summarySE(ema,measurevar="stress",groupvars="insomnia.group",na.rm=TRUE)[,1:4] # stress by insomnia.group
# very or extremely stressful
ema[!is.na(ema$stress) & ema$stress<4,"stress.VE"] <- 0
ema[!is.na(ema$stress) & ema$stress>=4,"stress.VE"] <- 1
ema$stress.VE <- as.factor(ema$stress.VE)
df <- ema[!is.na(ema$stress),]
grid.arrange(ggplot(df,aes(x=stress.VE)) + geom_histogram(stat="count") +
ggtitle("very/estremely stressed in the sample") + ylim(0,4500) +
geom_text(aes(label = scales::percent((..count..)/sum(..count..))), stat="count", vjust = -0.25),
ggplot(df,aes(stress.VE,fill=sex)) + geom_histogram(stat="count",position=position_dodge(0.5)),
ggplot(df,aes(stress.VE,fill=insomnia)) + geom_histogram(stat="count",position=position_dodge(0.5)),
ggplot(df,aes(stress.VE,fill=insomnia.group)) + geom_histogram(stat="count",position=position_dodge(0.5)),nrow=2)
# summarizing stress.VE
cat(nrow(df[df$stress.VE==1,]),"cases with very or extreme stress,",
round(100*nrow(df[df$stress.VE==1,])/nrow(df),2),"%\n",
nrow(df[df$insomnia==0 & df$stress.VE==1,]),"cases with very or extreme stress controls ratings,",
round(100*nrow(df[df$insomnia==0 & df$stress.VE==1,])/nrow(df[df$insomnia==1,]),2),"% of controls ratings\n",
nrow(df[df$insomnia==1 & df$stress.VE==1,]),"cases with very or extreme stress insomnia ratings,",
round(100*nrow(df[df$insomnia==1 & df$stress.VE==1,])/nrow(df[df$insomnia==1,]),2),"% of insomnia ratings")
## 607 cases with very or extreme stress, 12.31 %
## 225 cases with very or extreme stress controls ratings, 8.87 % of controls ratings
## 382 cases with very or extreme stress insomnia ratings, 15.06 % of insomnia ratings
Comments:
insomnia
(15%) than in controls (9%)Here, we describe the sources reported by participants for the stress
variable in the sample, and across the subgroups of participants. Note that many responses are missing, with unbalanced numbers of missing responses for each category. Here, we recode all categories (with the exception of stress_COVID
, that was added later than the others, and thus not considered in the present analyses) by giving value zero when the score to one or more of the remaining categories was not missing.
# showing summary (original)
ema$stress_COVID <- NULL # removing stress_COVID
stressSources <- colnames(ema)[which(substr(colnames(ema),1,7)=="stress_")]
summary(ema[!is.na(ema$stress),stressSources])
## stress_school stress_family stress_health stress_peers stress_other
## 0 :2617 0 :3440 0 :2628 0 :2832 0 :3556
## 1 :2141 1 : 404 1 : 309 1 : 537 1 :1277
## NA's: 174 NA's:1088 NA's:1995 NA's:1563 NA's: 99
# assigning zero when 1+ of the other scores are not missing
ema[,stressSources] <- lapply(ema[,stressSources],as.character) # from factor to numeric
ema[,stressSources] <- lapply(ema[,stressSources],as.numeric)
ema$stress_total <- rowSums(ema[,stressSources],na.rm=TRUE) # computing total score
ema[is.na(ema$stress_school) & !is.na(ema$stress_total),"stress_school"] <- 0
ema[is.na(ema$stress_family) & !is.na(ema$stress_total),"stress_family"] <- 0
ema[is.na(ema$stress_health) & !is.na(ema$stress_total),"stress_health"] <- 0
ema[is.na(ema$stress_peers) & !is.na(ema$stress_total),"stress_peers"] <- 0
ema[is.na(ema$stress_other) & !is.na(ema$stress_total),"stress_other"] <- 0
ema[,stressSources] <- lapply(ema[,stressSources],as.factor) # back to factor
# showing summary (updated)
summary(ema[!is.na(ema$stress),stressSources])
## stress_school stress_family stress_health stress_peers stress_other
## 0:2791 0:4528 0:4623 0:4395 0:3655
## 1:2141 1: 404 1: 309 1: 537 1:1277
# plotting variables in the sample
for(Var in stressSources){
print(ggplot(ema[!is.na(ema$stress),],aes_string(x=Var)) + geom_histogram(stat="count",fill="gray") +
ggtitle(paste(Var,"in the sample")) +
geom_text(aes(label = scales::percent((..count..)/sum(..count..))), stat="count", vjust = 1.2))}
# plotting variables only in cases with stress > 1
for(Var in stressSources){
print(ggplot(ema[!is.na(ema$stress) & ema$stress>1,],aes_string(x=Var)) + geom_histogram(stat="count",fill="yellow") +
ggtitle(paste(Var,"in the sample when stress > 1")) +
geom_text(aes(label = scales::percent((..count..)/sum(..count..))), stat="count", vjust = 1.2))}
# plotting variables only in cases with stress = Very or Extremely stressed
for(Var in stressSources){
print(ggplot(ema[!is.na(ema$stress) & ema$stress>3,],aes_string(x=Var)) + geom_histogram(stat="count",fill="red") +
ggtitle(paste(Var,"in the sample when stress > 3")) +
geom_text(aes(label = scales::percent((..count..)/sum(..count..))), stat="count", vjust = 1.2))}
Comments:
stress
> 1, 70% of times with stress
> 3) and other (38% of times with stress
> 1, 44% of times with stress
> 3).Finally, we visualize the frequency of each source of stress by sex
, insomnia
, and insomnia.group
.
# plotting variables by sex
for(Var in stressSources){
print(ggplot(ema[!is.na(ema$stress) & ema$stress > 1,],aes_string(x="sex",fill=Var)) +
geom_histogram(stat="count",position="dodge") + ggtitle(paste(Var,"by sex")) +
geom_text(aes(label = scales::percent((..count..)/sum(..count..))),stat="count", vjust = 1.2,position="identity"))}
# plotting variables by insomnia
for(Var in stressSources){
print(ggplot(ema[!is.na(ema$stress) & ema$stress > 1,],aes_string(x="insomnia",fill=Var)) +
geom_histogram(stat="count",position="dodge") + ggtitle(paste(Var,"by insomnia")) +
geom_text(aes(label = scales::percent((..count..)/sum(..count..))),stat="count", vjust = 1.2,position="identity"))}
# plotting variables by insomnia.group
for(Var in stressSources){
print(ggplot(ema[!is.na(ema$stress) & ema$stress > 1,],aes_string(x="insomnia.group",fill=Var)) +
geom_histogram(stat="count",position="dodge") + ggtitle(paste(Var,"by insomnia.group")) +
geom_text(aes(label = scales::percent((..count..)/sum(..count..))),stat="count", vjust = 1.2,position="identity"))}
Here, we describe the worry
variable in the sample, and across the subgroups of participants.
# plotting
grid.arrange(ggplot(ema[!is.na(ema$worry),],aes(x=as.factor(worry))) + geom_histogram(stat="count") +
ggtitle("worry in the sample") +
geom_text(aes(label = scales::percent((..count..)/sum(..count..))), stat = "count", vjust = -0.25) + ylim(0,1700),
ggplot(ema,aes(worry,fill=sex)) + geom_histogram(position=position_dodge(0.5)),
ggplot(ema,aes(worry,fill=insomnia)) + geom_histogram(position=position_dodge(0.5)),
ggplot(ema,aes(worry,fill=insomnia.group)) + geom_histogram(position=position_dodge(0.5)),nrow=2)
# computing mean and SD in the sample and by group
summarySE(ema,measurevar="worry",na.rm=TRUE)[,1:4] # worry in the sample
summarySE(ema,measurevar="worry",groupvars="sex",na.rm=TRUE)[,1:4] # worry by sex
summarySE(ema,measurevar="worry",groupvars="insomnia",na.rm=TRUE)[,1:4] # worry by insomnia
summarySE(ema,measurevar="worry",groupvars="insomnia.group",na.rm=TRUE)[,1:4] # worry by insomnia.group
# very or extremely stressful
ema[!is.na(ema$worry) & ema$worry<4,"worry.VE"] <- 0
ema[!is.na(ema$worry) & ema$worry>=4,"worry.VE"] <- 1
ema$worry.VE <- as.factor(ema$worry.VE)
df <- ema[!is.na(ema$worry),]
grid.arrange(ggplot(df,aes(x=worry.VE)) + geom_histogram(stat="count") +
ggtitle("worry in the sample") + ylim(0,4500) +
geom_text(aes(label = scales::percent((..count..)/sum(..count..))), stat="count", vjust = -0.25),
ggplot(df,aes(worry.VE,fill=sex)) + geom_histogram(stat="count",position=position_dodge(0.5)),
ggplot(df,aes(worry.VE,fill=insomnia)) + geom_histogram(stat="count",position=position_dodge(0.5)),
ggplot(df,aes(worry.VE,fill=insomnia.group)) + geom_histogram(stat="count",position=position_dodge(0.5)),nrow=2)
# summarizing worry.VE
cat(nrow(df[df$worry.VE==1,]),"cases with very or extreme worry,",
round(100*nrow(df[df$worry.VE==1,])/nrow(df),2),"%\n",
nrow(df[df$insomnia==0 & df$worry.VE==1,]),"cases with very or extreme worry controls ratings,",
round(100*nrow(df[df$insomnia==0 & df$worry.VE==1,])/nrow(df[df$insomnia==1,]),2),"% of controls ratings\n",
nrow(df[df$insomnia==1 & df$worry.VE==1,]),"cases with very or extreme worry insomnia ratings,",
round(100*nrow(df[df$insomnia==1 & df$worry.VE==1,])/nrow(df[df$insomnia==1,]),2),"% of insomnia ratings")
## 621 cases with very or extreme worry, 12.59 %
## 256 cases with very or extreme worry controls ratings, 10.09 % of controls ratings
## 365 cases with very or extreme worry insomnia ratings, 14.39 % of insomnia ratings
Comments:
insomnia
(14%) than in controls (10%)Here, we describe the sources reported by participants for the worry
variable in the sample, and across the subgroups of participants. Note that many responses are missing, with unbalanced numbers of missing responses for each category. Here, we recode all categories (with the exception of stress_COVID
, that was added later than the others) by giving value zero when the score to one or more of the remaining categories was not missing.
# showing summary (original)
ema$worry_COVID <- NULL # removing worry_COVID
worrySources <- colnames(ema)[which(substr(colnames(ema),1,6)=="worry_")]
summary(ema[!is.na(ema$worry),worrySources])
## worry_school worry_family worry_health worry_peer worry_sleep worry_other
## 0 :2500 0 :3281 0 :2670 0 :2890 0 :2928 0 :3313
## 1 :2241 1 : 294 1 : 351 1 : 554 1 : 732 1 :1289
## NA's: 191 NA's:1357 NA's:1911 NA's:1488 NA's:1272 NA's: 330
# assigning zero when other scores are not missing
ema[,worrySources] <- lapply(ema[,worrySources],as.character) # from factor to numeric
ema[,worrySources] <- lapply(ema[,worrySources],as.numeric)
ema$worry_total <- rowSums(ema[,worrySources],na.rm=TRUE) # computing total score
ema[is.na(ema$worry_school) & !is.na(ema$worry_total),"worry_school"] <- 0
ema[is.na(ema$worry_family) & !is.na(ema$worry_total),"worry_family"] <- 0
ema[is.na(ema$worry_health) & !is.na(ema$worry_total),"worry_health"] <- 0
ema[is.na(ema$worry_peer) & !is.na(ema$worry_total),"worry_peer"] <- 0
ema[is.na(ema$worry_sleep) & !is.na(ema$worry_total),"worry_sleep"] <- 0
ema[is.na(ema$worry_other) & !is.na(ema$worry_total),"worry_other"] <- 0
ema[,worrySources] <- lapply(ema[,worrySources],as.factor) # back to factor
# showing summary (updated)
summary(ema[!is.na(ema$worry),worrySources])
## worry_school worry_family worry_health worry_peer worry_sleep worry_other
## 0:2691 0:4638 0:4581 0:4378 0:4200 0:3643
## 1:2241 1: 294 1: 351 1: 554 1: 732 1:1289
# plotting variables in the sample
for(Var in worrySources){
print(ggplot(ema[!is.na(ema$worry),],aes_string(x=Var)) + geom_histogram(stat="count",fill="gray") +
ggtitle(paste(Var,"in the sample")) +
geom_text(aes(label = scales::percent((..count..)/sum(..count..))), stat="count", vjust = 1.2))}
# plotting variables only in cases with worry > 1
for(Var in worrySources){
print(ggplot(ema[!is.na(ema$worry) & ema$worry>1,],aes_string(x=Var)) + geom_histogram(stat="count",fill="yellow") +
ggtitle(paste(Var,"in the sample when worry > 1")) +
geom_text(aes(label = scales::percent((..count..)/sum(..count..))), stat="count", vjust = 1.2))}
# plotting variables only in cases with worry = Very or Extremely stressed
for(Var in worrySources){
print(ggplot(ema[!is.na(ema$worry) & ema$worry>3,],aes_string(x=Var)) + geom_histogram(stat="count",fill="red") +
ggtitle(paste(Var,"in the sample when worry > 3")) +
geom_text(aes(label = scales::percent((..count..)/sum(..count..))), stat="count", vjust = 1.2))}
Comments:
worry
> 1, 76% of times with worry
> 3) and other (38% of times with worry
> 1, 43% of times with worry
> 3).Here, we plot the frequency of each source of stress by sex
, insomnia
, and insomnia.group
.
# plotting variables by sex
for(Var in worrySources){
print(ggplot(ema[!is.na(ema$worry) & ema$worry > 1,],aes_string(x="sex",fill=Var)) +
geom_histogram(stat="count",position="dodge") + ggtitle(paste(Var,"by sex")) +
geom_text(aes(label = scales::percent((..count..)/sum(..count..))),stat="count", vjust = 1.2,position="identity"))}
# plotting variables by insomnia
for(Var in worrySources){
print(ggplot(ema[!is.na(ema$worry) & ema$worry > 1,],aes_string(x="insomnia",fill=Var)) +
geom_histogram(stat="count",position="dodge") + ggtitle(paste(Var,"by insomnia")) +
geom_text(aes(label = scales::percent((..count..)/sum(..count..))),stat="count", vjust = 1.2,position="identity"))}
# plotting variables by insomnia.group
for(Var in worrySources){
print(ggplot(ema[!is.na(ema$worry) & ema$worry > 1,],aes_string(x="insomnia.group",fill=Var)) +
geom_histogram(stat="count",position="dodge") + ggtitle(paste(Var,"by insomnia.group")) +
geom_text(aes(label = scales::percent((..count..)/sum(..count..))),stat="count", vjust = 1.2,position="identity"))}
Here, we generate an aggregate plot integrating stress source, stress level, and group comparisons.
# stress
ss.df <- ema[!is.na(ema$stress) & ema$stress>1,] # selecting data
ss.df$stressSource <- "Multiple" # Multiple sources when stress_total > 1
ss.df[ss.df$stress_school==1 & ss.df$stress_total==1,"stressSource"] <- "School" # single source only when stress_total = 1
ss.df[ss.df$stress_family==1 & ss.df$stress_total==1,"stressSource"] <- "Family"
ss.df[ss.df$stress_peers==1 & ss.df$stress_total==1,"stressSource"] <- "Peers"
ss.df[ss.df$stress_health==1 & ss.df$stress_total==1,"stressSource"] <- "Health"
ss.df[ss.df$stress_other==1 & ss.df$stress_total==1,"stressSource"] <- "Other"
ss.df[ss.df$stress_total==0,"stressSource"] <- "Other" # other when unspecified source of stress (N = 25)
ss.df$stressSource <- factor(ss.df$stressSource,levels=c("School","Family","Peers","Health","Other","Multiple"))
ss.df$Stress <- "Not so/somwhat stressful" # recoding stress in two categories
ss.df[ss.df$stress>3,"Stress"] <- "Very/Extremely stressful"
ss.df$Stress <- factor(ss.df$Stress,levels=c("Very/Extremely stressful","Not so/somwhat stressful"))
ss.df$insomnia <- as.factor(gsub("0","Controls",gsub("1","Insomnia",ss.df$insomnia))) # recoding insomnia for plotting
p1 <- ggplot(ss.df, aes(stressSource, fill = Stress)) + geom_histogram(stat="count") + # plotting
facet_wrap("insomnia") + labs(y="No. of observations") + guides(fill=guide_legend(title="Daily stress")) +
theme(axis.text.x = element_text(angle=45),axis.title.x=element_blank())
# worry
ws.df <- ema[!is.na(ema$worry) & ema$worry>1,] # selecting data
ws.df$worrySource <- "Multiple" # Multiple sources when worry_total > 1
ws.df[ws.df$worry_school==1 & ws.df$worry_total==1,"worrySource"] <- "School"
ws.df[ws.df$worry_family==1 & ws.df$worry_total==1,"worrySource"] <- "Family"
ws.df[ws.df$worry_peer==1 & ws.df$worry_total==1,"worrySource"] <- "Peers"
ws.df[ws.df$worry_health==1 & ws.df$worry_total==1,"worrySource"] <- "Health"
ws.df[ws.df$worry_sleep==1 & ws.df$worry_total==1,"worrySource"] <- "Sleep"
ws.df[ws.df$worry_other==1 & ws.df$worry_total==1,"worrySource"] <- "Other"
ws.df[ws.df$worry_total==0,"worrySource"] <- "Other" # other when unspecified source of worry (N = 25)
ws.df$worrySource <- factor(ws.df$worrySource,levels=c("School","Family","Peers","Health","Sleep","Other","Multiple"))
ws.df$Worry <- "Not so/somwhat worried" # recoding worry in two categories
ws.df[ws.df$worry>3,"Worry"] <- "Very/Extremely worried"
ws.df$Worry <- factor(ws.df$Worry,levels=c("Very/Extremely worried","Not so/somwhat worried"))
ws.df$insomnia <- as.factor(gsub("0","Controls",gsub("1","Insomnia",ws.df$insomnia))) # recoding insomnia for plotting
p2 <- ggplot(ws.df, aes(worrySource, fill = Worry)) + geom_histogram(stat="count") + # plotting
facet_wrap("insomnia") + labs(y="No. of observations") + guides(fill=guide_legend(title="Pre-sleep worry")) +
theme(axis.text.x = element_text(angle=45),axis.title.x=element_blank())
# final plot
p <- grid.arrange(p1,p2,nrow=2)
ggsave(filename="RESULTS/Figure2.tiff",plot=p,dpi=500)
Here, we describe the mood
variable in the sample, and across the subgroups of participants.
# plotting
grid.arrange(ggplot(ema[!is.na(ema$mood),],aes(x=as.factor(mood))) + geom_histogram(stat="count") +
ggtitle("mood in the sample") +
geom_text(aes(label = scales::percent((..count..)/sum(..count..))), stat = "count", vjust = -0.25),
ggplot(ema,aes(mood,fill=sex)) + geom_histogram(position=position_dodge(0.5)),
ggplot(ema,aes(mood,fill=insomnia)) + geom_histogram(position=position_dodge(0.5)),
ggplot(ema,aes(mood,fill=insomnia.group)) + geom_histogram(position=position_dodge(0.5)),nrow=2)
# computing mean and SD in the sample and by group
summarySE(ema,measurevar="mood",na.rm=TRUE)[,1:4] # mood in the sample
summarySE(ema,measurevar="mood",groupvars="sex",na.rm=TRUE)[,1:4] # mood by sex
summarySE(ema,measurevar="mood",groupvars="insomnia",na.rm=TRUE)[,1:4] # mood by insomnia
summarySE(ema,measurevar="mood",groupvars="insomnia.group",na.rm=TRUE)[,1:4] # mood by insomnia.group
# somehow bad/bad mood
ema[!is.na(ema$mood) & ema$mood<4,"mood.VE"] <- 0
ema[!is.na(ema$mood) & ema$mood>=4,"mood.VE"] <- 1
ema$mood.VE <- as.factor(ema$mood.VE)
df <- ema[!is.na(ema$mood),]
grid.arrange(ggplot(df,aes(x=mood.VE)) + geom_histogram(stat="count") +
ggtitle("mood in the sample") + ylim(0,4500) +
geom_text(aes(label = scales::percent((..count..)/sum(..count..))), stat="count", vjust = -0.25),
ggplot(df,aes(mood.VE,fill=sex)) + geom_histogram(stat="count",position=position_dodge(0.5)),
ggplot(df,aes(mood.VE,fill=insomnia)) + geom_histogram(stat="count",position=position_dodge(0.5)),
ggplot(df,aes(mood.VE,fill=insomnia.group)) + geom_histogram(stat="count",position=position_dodge(0.5)),nrow=2)
# summarizing mood.VE
cat(nrow(df[df$mood.VE==1,]),"cases with very or extreme mood,",
round(100*nrow(df[df$mood.VE==1,])/nrow(df),2),"%\n",
nrow(df[df$insomnia==0 & df$mood.VE==1,]),"cases with very or extreme mood controls ratings,",
round(100*nrow(df[df$insomnia==0 & df$mood.VE==1,])/nrow(df[df$insomnia==1,]),2),"% of controls ratings\n",
nrow(df[df$insomnia==1 & df$mood.VE==1,]),"cases with very or extreme mood insomnia ratings,",
round(100*nrow(df[df$insomnia==1 & df$mood.VE==1,])/nrow(df[df$insomnia==1,]),2),"% of insomnia ratings")
## 713 cases with very or extreme mood, 14.46 %
## 342 cases with very or extreme mood controls ratings, 13.49 % of controls ratings
## 371 cases with very or extreme mood insomnia ratings, 14.63 % of insomnia ratings
Comments:
insomnia
(14.63%) than in controls (13.49%)Finally, we export the recoded ema
and demos
datasets to be used in the main analyses. Click here for Part 2 (Characterization of sleep and diary ratings) and here for Part 3 (Relationships between sleep and diary ratings).
save(ema,file="emaFINAL.RData")
save(demos,file="demosFINAL.RData")
Cranford, J. A., Shrout, P. E., Iida, M., Rafaeli, E., Yip, T., & Bolger, N. (2006). A Procedure for Evaluating Sensitivity to Within-Person Change: Can Mood Measures in Diary Studies Detect Change Reliably? Personality and Social Psychology Bulletin, 32(7), 917–929. https://doi.org/10.1177/0146167206287721
Geldhof, G. J., Preacher, K. J., & Zyphur, M. J. (2014). Reliability estimation in a multilevel confirmatory factor analysis framework. Psychological Methods, 19(1), 72–91. https://doi.org/10.1037/a0032138
Green, J. A. (2021). Too many zeros and/or highly skewed? A tutorial on modelling health behaviour as count data with Poisson and negative binomial regression. Health Psychology and Behavioral Medicine, 9(1), 436-455. https://doi.org/10.1080/21642850.2021.1920416
Hox, J. J. (2010). Multilevel Analysis: Techniques and Applications (2nd ed.). Routledge.
Jak, S., & Jorgensen, T. D. (2017). Relating Measurement Invariance, Cross-Level Invariance, and Multilevel Reliability. Frontiers in Psychology, 8(OCT), 1–9. https://doi.org/10.3389/fpsyg.2017.01640
Menghini, L., Yuksel, D., Goldstone, A., Baker, F. C., & de Zambotti, M. (2021). Performance of Fitbit Charge 3 against polysomnography in measuring sleep in adolescent boys and girls. Chronobiology International, 38(7):1010-1022. https://doi.org/10.1080/07420528.2021.1903481
Ng, V. K., & Cribbie, R. A. (2017). Using the gamma generalized linear model for modeling continuous, skewed and heteroscedastic outcomes in psychology. Current Psychology, 36(2), 225-235. https://doi.org/10.1007/s12144-015-9404-0