Analytical pipeline for testing the performance of sleep-tracking technology v1.0.0

Luca Menghini,\(^{1,2}\) \and Nicola Cellini,\(^{2,3,4,5}\) \and Aimee Goldstone,\(^1\) \and Fiona C. Baker,\(^{1,6}\) \and Massimiliano de Zambotti\(^1\)

\(^1\)Center for Health Sciences, SRI International, Menlo Park, CA, USA; \(^2\)Department of General Psychology, University of Padova, Padua, Italy; \(^3\)Padova Neuroscience Center, University of Padova, Padua, Italy; \(^4\)Department of Biomedical Sciences, University of Padova, Padua, Italy; \(^5\)Human Inspired Technology Center, University of Padova, Padua, Italy; \(^6\)Brain Function Research Group, School of Psychology, University of the Witwatersrand, Johannesburg, South Africa

Aims and content

The present document integrates the article “A standardized framework for testing the performance of sleep-tracking technology: Step-by-step guidelines and open-source code”, and it includes the code for running the essential steps to test the performance of a device under assessment (e.g., a consumer wearable sleep-tracking device) in measuring sleep as compared with the gold-standard reference method (polysomnography, PSG) or alternative reference method (e.g., actigraphy, sleep diary).

The functions depicted below are available also from the following public repository: https://github.com/SRI-human-sleep/sleep-trackers-performance )

The document includes the following sections:

Data structure: a sample dataset with the essential epoch-by-epoch data structure is loaded and considered the starting point for analyses.
Discrepancy analysis: individual- and group-level sleep measures are generated for both the reference method and the device under assessment, along with their bias, the limits of agreement (LoAs), and their 95% confidence intervals. Bland-Altman plots are provided.
Epoch-by-epoch analysis: error matrices (also referred to as confusion matrices), performance metrics at both the individual- and group-level, and secondary statistics are generated based on epoch-by-epoch data.

A function is provided for each step of the pipeline, which can be applied to any dataset that meets the assumptions below, to generate the respective output.

Glossary

PSG = Polysomnography
TIB = Time in bed
TST = Total Sleep Time
SOL = Sleep Onset Latency
SE = Sleep Efficiency
WASO = Wake After Sleep Onset
REM = Rapid Eye Movement
EBE = Epoch-by-Epoch
PPV = Positive Predictive Value
NPV = Negative Predictive Value
PABAK = Prevalence-Adjusted Bias-Adjusted Kappa
ROC = Receiver Operating Characteristic

Note also that in the new generation of multi-sensor sleep-tracking devices providing sleep staging information, ‘Light Sleep’ is usually considered equivalent to PSG-derived N1 + N2 sleep, while ‘Deep Sleep’ is usually considered equivalent to PSG-derived N3 sleep. When testing the performance of a device, we recommend checking with the device manufacturer for sleep stage specification.

Assumptions

The starting point of each of the following steps is a data structure based on the following assumptions:

- Measurement systems: sleep has been measured with both a device under assessment (e.g., a consumer sleep tracker) and a reference method (e.g., PSG)

- Epoch length: both device and reference recordings have the same epoch length (e.g. 30-seconds or 1-min level).

- Recording bounds: both device and reference recordings are confined to the period between lights-off and lights-on (i.e., both recordings share the same time in bed).

- Synchronization: the device and reference recordings have been synchronized on an epoch level and encoded using the same coding system (e.g. 0 = wake in both device and reference data).

- Staging: the device provides information on PSG-equivalent sleep staging, either as sleep/wake (typical of standard actigraphy) or as wake/light/deep/REM (typical of more modern consumer sleep trackers).

- Number of nights: only one night per subject has been recorded (the same procedures might apply to multiple nights, by aggregating night-by-night outcomes).

- Missing data: the dataset should not contain any missing data (i.e., both device and reference information must exist for each epoch).

Please refer to de Zambotti et al. (2019) and Depner et al. (2019) for guidelines and details about implementation and use of consumer sleep technology.

1. Data structure

As a first step, we load a sample dataset organized with the data structure described in the main article.

SLEEP STAGING

Here, we show the dataset with sleep stage information, encoded as: 0 = wake, 1 = light sleep (N1 or N2), 2 = deep sleep (N3), and 3 = REM sleep. The dataset is in a long format that includes one column with the subject identifier, one column for the epoch identifier, and two columns reporting the device and the reference data, respectively.

(raw.data <- read.csv("sample_data.csv"))

SLEEP/WAKE

To account for cases where only sleep/wake dichotomous data are allowed by the device, we also recode our sample dataset with 10 = wake and 5 = sleep.

# dichotomic recoding of raw dataset
dic.data <- raw.data
dic.data[dic.data$reference!=0,"reference"] = 5 # 5 when sleep
dic.data[dic.data$reference==0,"reference"] = 10 # 10 when wake
dic.data[dic.data$device!=0,"device"] = 5
dic.data[dic.data$device==0,"device"] = 10
dic.data

2. Discrepancy analysis

As a second step, we perform the essential procedures to analyze the discrepancy between device- and reference-derived sleep measures (e.g., total sleep time, wake after sleep onset, sleep onset latency). For each measure, the discrepancies will be computed at both individual-level and group-level, and Bland-Altman plots will be generated.

2.1. Sleep measures computation

First, the raw data are used to generate a dataset of sleep measures based on the definitions provided in the main article. This is done with the ebe2sleep.R function, which allows users to specify the following arguments:

data: data.frame including the raw data to be used, organized with the data structure shown above.
idCol, RefCol and deviceCol: character strings indicating the names of the columns for subjects, reference and device epochs, respectively.
epochLength: numeric value indicating the epoch duration in seconds (default: 30).
staging: logical value indicates whether the raw data includes sleep stages (wake, light, deep and REM) or not, FALSE assumes data consists of only wake/sleep distinction (default: TRUE).
stages: numeric vector indicating the coding system used for wake, light, deep and REM sleep (in this order). The default coding system is c(wake = 0, light = 1, deep = 2, REM = 3). Only wake and sleep should be indicated when staging = FALSE, e.g. c(wake = 0, sleep = 1).
digits: numeric value specifying the decimal precision for the output (default: 2).

Show function

ebe2sleep <- function(data=NA,idCol="subject",RefCol="reference",deviceCol="device",epochLenght=30,
                      staging=TRUE,stages=c(wake=0,light=1,deep=2,REM=3),digits=2){
  
  # renaming variables
  colnames(data) <- gsub(idCol,"ID",colnames(data))
  data$ID <- as.factor(data$ID)
  colnames(data) <- gsub(RefCol,"ref",colnames(data))
  colnames(data) <- gsub(deviceCol,"device",colnames(data))
  
  # setting stages as 0 = wake, 1 = light, 2 = deep, 3 = REM
  if(staging==TRUE){ 
    data$ref <- as.integer(as.character(factor(data$ref,levels=as.numeric(stages),labels=c(0,1,2,3))))
    data$device <- as.integer(as.character(factor(data$device,levels=as.numeric(stages),labels=c(0,1,2,3))))
    
    } else { # setting stages as 0 = wake, 1 = sleep when staging = FALSE
      if(length(stages)>2){ stop("only two elements should be used in the stages argument when staging = FALSE,
                                 e.g., stages = c(wake = 0, sleep = 1)") }
      data$ref <- as.integer(as.character(factor(data$ref,levels=as.numeric(stages),labels=c(0,1))))
      data$device <- as.integer(as.character(factor(data$device,levels=as.numeric(stages),labels=c(0,1)))) }
  
  # empty dataframe of sleep measures (stages length is not provided when staging = FALSE)
  sleep.metrics <- as.data.frame(matrix(nrow=0,ncol=ifelse(staging==TRUE,22,10)))

  for(ID in levels(data$ID)){

    sleepID <- data[data$ID==ID,]
  
    # TIB = number of minutes between lights on and lights off
    TIB <- nrow(sleepID)*epochLenght/60
  
    # TST = number of minutes scored as sleep
    TST_ref <- nrow(sleepID[sleepID$ref!=0,])*epochLenght/60
    TST_device <- nrow(sleepID[sleepID$device!=0,])*epochLenght/60
  
    # SE = percentage of sleep time over TIB
    SE_ref <- 100*TST_ref/TIB
    SE_device <- 100*TST_device/TIB
  
    # SOL = number of minutes scored as wake before the first epoch scored as sleep
    SOL_ref = SOL_device = 0
    for(i in 1:nrow(sleepID)){ if(sleepID[i,"ref"]==0){ SOL_ref = SOL_ref + 1 } else { break } }
    for(i in 1:nrow(sleepID)){ if(sleepID[i,"device"]==0){ SOL_device = SOL_device + 1 } else { break } }
    SOL_ref <- SOL_ref*epochLenght/60
    SOL_device <- SOL_device*epochLenght/60

    # WASO = number of minutes scored as wake after the first epoch scored as sleep
    WASO_ref <- sleepID[(SOL_ref*60/epochLenght+1):nrow(sleepID),]
    WASO_ref <- nrow(WASO_ref[WASO_ref$ref==0,])*epochLenght/60
    WASO_device <- sleepID[(SOL_device*60/epochLenght+1):nrow(sleepID),]
    WASO_device <- nrow(WASO_device[WASO_device$device==0,])*epochLenght/60
    
    if(staging==TRUE){
      
      # Light = number of minutes scored as Light sleep (N1 + N2)
      Light_ref <- nrow(sleepID[sleepID$ref==1,])*epochLenght/60
      Light_device <- nrow(sleepID[sleepID$device==1,])*epochLenght/60
    
      # Deep = number of minutes scored as Light sleep (N3)
      Deep_ref <- nrow(sleepID[sleepID$ref==2,])*epochLenght/60
      Deep_device <- nrow(sleepID[sleepID$device==2,])*epochLenght/60
    
      # REM = number of minutes scored as REM sleep
      REM_ref <- nrow(sleepID[sleepID$ref==3,])*epochLenght/60
      REM_device <- nrow(sleepID[sleepID$device==3,])*epochLenght/60
  
      # LightPerc = percentage of Light sleep over TST
      LightPerc_ref <- 100*Light_ref/TST_ref
      LightPerc_device <- 100*Light_device/TST_device
  
      # DeepPerc  = percentage of Deep sleep over TST
      DeepPerc_ref <- 100*Deep_ref/TST_ref
      DeepPerc_device <- 100*Deep_device/TST_device

      # REMPerc = percentage of REM sleep over TST
      REMPerc_ref <- 100*REM_ref/TST_ref
      REMPerc_device <- 100*REM_device/TST_device
  
      # filling dataframe of sleep metrics
      sleep.metrics <- rbind(sleep.metrics,
                             data.frame(subject=ID,TIB=TIB,
                                        # sleep/wake metrics
                                        TST_ref,TST_device,SE_ref,SE_device,SOL_ref,SOL_device,WASO_ref,WASO_device,
                                        # sleep stages duration metrics
                                        Light_ref,Light_device,Deep_ref,Deep_device,REM_ref,REM_device,
                                        # sleep stages percentages metrics
                                        LightPerc_ref,LightPerc_device,DeepPerc_ref,DeepPerc_device,REMPerc_ref,REMPerc_device))
      
    } else {
      
      sleep.metrics <- rbind(sleep.metrics,
                             data.frame(subject=ID,TIB=TIB,
                                        # sleep/wake metrics
                                        TST_ref,TST_device,SE_ref,SE_device,SOL_ref,SOL_device,WASO_ref,WASO_device))
      
    }
    
  }
  
  # rounding values and returning dataset
  nums <- vapply(sleep.metrics, is.numeric, FUN.VALUE = logical(1))
  sleep.metrics[,nums] <- round(sleep.metrics[,nums], digits = digits)
  return(sleep.metrics) 
}

SLEEP STAGING

Here, we use the function to generate the dataset of sleep measures by assuming that sleep staging is allowed by the device.

(sleep.data <- ebe2sleep(data = raw.data, idCol = "subject", RefCol = "reference", deviceCol = "device",
                         epochLenght = 30, staging = TRUE, stages = c(wake = 0, light = 1, deep = 2, REM = 3), digits = 2))

SLEEP/WAKE

Here, we use the function to generate the dataset of sleep measures by assuming that only sleep/wake patterns (and not sleep staging) are allowed by the device.

(sleep.data.dic <- ebe2sleep(data = dic.data, idCol = "subject", RefCol = "reference", deviceCol = "device",
                             epochLenght = 30, staging = FALSE, stages = c(wake = 10, sleep = 5), digits = 2))

2.2. Individual-level discrepancies

Here, we compute the differences between device- and reference -derived sleep measures for each subject. This is done with the indDiscr.R function, which allows the user to specify the following arguments:

data: data.frame of sleep measures (such that generated with the ebe2sleep.R function).
staging: logical value indicates whether the raw data includes sleep stages (wake, light, deep and REM) or not, FALSE assumes data consists of only wake/sleep distinction (default: TRUE).
digits: numeric value specifying the decimal precision for the output (default: 2).
doPlot: logical value indicating if the data should be plotted (default: TRUE, requires the ggplot2 and the reshape2 package).

Show function

indDiscr <- function(data=NA,staging=TRUE,digits=2,doPlot=TRUE){
  
  indivDiscr <- data.frame(subject=data$subject,
                           TST_diff = data$TST_device - data$TST_ref, # TST
                           SE_diff = data$SE_device - data$SE_ref, # SE
                           SOL_diff = data$SOL_device - data$SOL_ref, # SOL
                           WASO_diff = data$WASO_device - data$WASO_ref) # WASO
  
  if(staging==TRUE){
    
    indivDiscr <- cbind(indivDiscr,
                        data.frame(
                          
                          # adding sleep stages duration
                          Light_diff = data$Light_device - data$Light_ref,
                          Deep_diff = data$Deep_device - data$Deep_ref,
                          REM_diff = data$REM_device - data$REM_ref,
                          
                          # adding sleep stages percentages
                          LightPerc_diff = data$LightPerc_device - data$LightPerc_ref,
                          DeepPerc_diff = data$DeepPerc_device - data$DeepPerc_ref,
                          REMPerc_diff = data$REMPerc_device - data$REMPerc_ref
                          
                        ))
    
  }
  
  # rounding values
  nums <- vapply(indivDiscr, is.numeric, FUN.VALUE = logical(1))
  indivDiscr[,nums] <- round(indivDiscr[,nums], digits = digits)
  
  if(doPlot==TRUE){
    
    require(ggplot2) 
    require(reshape2)
    
    # normalizing data by column to have color code
    melted.perc <- absDiscr <- indivDiscr
    absDiscr[,2:ncol(absDiscr)] <- abs(absDiscr[,2:ncol(absDiscr)])
    for(i in 1:nrow(melted.perc)){ for(j in 2:ncol(melted.perc)){ 
      melted.perc[i,j] <- 100*(absDiscr[i,j] - min(absDiscr[,j]))/(max(absDiscr[,j])-min(absDiscr[,j]))
      } }
    melted.perc <- melt(melted.perc[,!grepl("diffPerc",colnames(melted.perc))])
    melted.perc$variable <- gsub("Perc","%",gsub("_diff","",melted.perc$variable))
    melted.perc[melted.perc$value>100,"value"] <- 100 # bounding at 100
    
    melted.labs <- melt(indivDiscr[,!grepl("diffPerc",colnames(indivDiscr))])
    melted.labs$variable <- gsub("Perc","%",gsub("_diff","",melted.labs$variable))
    
    # sorting axis labels
    if(staging == TRUE){ sleep.levels <- c("TST","SE","WASO","SOL","Light","Deep","REM","Light%","Deep%","REM%")
    } else { sleep.levels <- c("TST","SE","WASO","SOL") }
    
    # generating plot
    p <- ggplot(data=melted.perc,aes(x=ordered(variable,levels=sleep.levels),
                                     y=ordered(subject,levels=rev(levels(subject))),fill=value)) +
      geom_tile() + xlab("variable") + ylab("subject") +
      scale_fill_gradient2(low="white",high="#f03b20",
                           limit = c(0,100),
                           space = "Lab",
                           name="Min-max normalized \ndiscrepancy (%)",
                           guide="legend",
                           breaks=round(seq(100,0,length.out = 11),2),
                           minor_breaks=round(seq(100,0,length.out = 11),2)) +
      geom_text(data=melted.labs,aes(x=ordered(variable,levels=sleep.levels),
                                     y=ordered(subject,levels=rev(levels(subject))),label=value),color="black",size=3) +
      ggtitle("Discrepancies between device and ref by subject") + theme(panel.background = element_blank())
    
    return(list(indivDiscr,p))
    
  } else { return(indivDiscr) }
  
}

SLEEP STAGING

Here, the function is applied to generate individual discrepancies by assuming that sleep staging is allowed by the device.

indDiscr(data = sleep.data, staging=TRUE, digits=2, doPlot = FALSE)

SLEEP/WAKE

Here, the function is applied to generate individual discrepancies by assuming that only sleep/wake patterns (and not sleep staging) are allowed by the device.

indDiscr(data = sleep.data.dic, staging=FALSE, digits=2, doPlot = FALSE)

2.2.1. Data visualization

Here, we set the doPlot argument as TRUE in the DiscrAnalysis_ind function to plot the results. The generated plot (table format) includes the same values shown in the table above, with colors indicating the minimum-maximum, normalized values. This allows a visual check for outliers for each of the main sleep outcomes.

SLEEP STAGING

Here, the function is applied by assuming that sleep staging is allowed by the device.

indDiscr(data = sleep.data, staging=TRUE, digits=2, doPlot = TRUE)[2]

## [[1]]

SLEEP/WAKE

Here, the function is applied by assuming that only sleep/wake patterns (and not sleep staging) are allowed by the device.

indDiscr(data = sleep.data.dic, staging=FALSE, digits=2, doPlot = TRUE)[2]

## [[1]]

2.3. Group-level discrepancies

Here, we compute the standard metrics to evaluate the discrepancy/agreement between device- and reference-derived sleep measures, based on the definitions provided in the main article, in compliance with Bland and Altman (1999). As recommended by Euser, Dekker and le Cessie (2008), when a significant correlation is observed between differences and PSG-derived measures (proportional bias), the LOAs are indicated as a function of the PSG-derived value. This is done with the groupDiscr.R function, which allows specification of the following arguments:

data: data.frame of sleep measures with a structure matching that generated with the ebe2sleep.R function).
measures: character vector indicating the name of the two columns to be considered (i.e., fist column: device, second column: reference). Default is c(“TST_device”,“TST_ref”).
size: character indicating what should be used as the size of measurement: “reference” (default) for using the reference method as we suggest in our article, “mean” for using the averages of ref- and device-derived measures, as originally suggested by Bland & Altman (1999).
logTransf: logical value indicating if data should be log transformed to deal with heteroscedasticity or non-normal distribution of differences (default: FALSE). If TRUE, data bias and LOAs are computed on the log-transformed data and back-transformed to express LOAs as a function of the size of measurement, following the method proposed by Euser, Dekker and le Cessie (2008).
CI.type: character string indicating the type of confidence intervals to be used: “classic” (default) returns intervals based on the t-distribution, “boot” returns intervals computed by bootstrap with 10000 replicates (requires the boot package).
CI.level: numeric value indicating the confidence intervals to be computed on bias and LOAs (default: .95).
boot.type: character string indicating the type of bootstrapped confidence intervals. The value should be one of “basic” (default), “norm”, “stud”, “perc”, or “bca” (see ?boot.ci for details). This argument is not considered when CI.type = “classic”.
boot.R: numeric value indicating the number of boostrap replicates (default: 10,000). Higher values require more time for computing CI (see also ?boot). This argument is not considered when CI.type = “classic”.
digits: numeric value specifying the decimal precision for the output (default: 2).
warnings: logical value indicating if warnings regarding assumptions and analytical details should be displayed (default: TRUE).

The function is partially based on the BlandAltmanLeh package (Lehnert, 2015).

Show function

groupDiscr <- function(data=NA,measures=c("TST_device","TST_ref"),size="reference",logTransf=FALSE,
                       CI.type="classic",CI.level=.95,boot.type="basic",boot.R=10000,digits=2,warnings=TRUE){
  
  require(BlandAltmanLeh)
  
  # setting measure name and unit of measurement
  measure <- gsub("_ref","",gsub("_device","",measures[1]))
  if(any(grepl("efficiency|EFFICIENCY|se|SE|eff|EFF|perc|Perc|PERC|%",measure))){
    meas.unit <- "%" } else { meas.unit <- "min" }
  if(warnings==TRUE){cat("\n\n----------------\n Measure:",measure,"\n----------------")}
  
  # packages and functions to be used with boostrap CI
  if(CI.type=="boot"){ require(boot)
    # functions to generate bootstrap CI for model parameters
    boot.reg <- function(data,formula,indices){ return(coef(lm(formula,data=data[indices,]))[2]) } # slope
    boot.int <- function(data,formula,indices){ return(coef(lm(formula,data=data[indices,]))[1]) } # intercept
    boot.res <- function(data,formula,indices){return(1.96*sd(resid(lm(formula,data=data[indices,]))))} # 1.96 SD of residuals
    if(warnings==TRUE){cat("\n\nComputing boostrap CI with method '",boot.type,"' ...",sep="")}
    } else if(CI.type!="classic") { stop("Error: CI.type can be either 'classic' or 'boot'") }
  
  # data to be used
  ba.stat <- bland.altman.stats(data[,measures[1]],data[,measures[2]],conf.int=CI.level)
  if(size=="reference"){
    ba <- data.frame(size=ba.stat$groups$group2,diffs=ba.stat$diffs)
    xlab <- paste("Reference-derived",measure)
    } else if(size=="mean"){
      ba <- data.frame(size=ba.stat$means,diffs=ba.stat$diffs)
      xlab <- paste("Mean",measure,"by device and reference")
    } else { stop("Error: size argument can be either 'reference' or 'mean'") }
  
  # basic output table
  out <- data.frame(measure=paste(measure," (",meas.unit,")",sep=""),
                    
                    # mean and standard deviation for ref and device
                    device = paste(round(mean(ba.stat$groups$group1,na.rm=TRUE),digits)," (",
                                   round(sd(ba.stat$groups$group1,na.rm=TRUE),digits),")",sep=""),
                    reference = paste(round(mean(ba.stat$groups$group2,na.rm=TRUE),digits)," (",
                                      round(sd(ba.stat$groups$group2,na.rm=TRUE),digits),")",sep=""),
                    
                    # CI type
                    CI.type=CI.type,
                    CI.level=CI.level)
  
  # ..........................................
  # 1. TESTING PROPORTIONAL BIAS
  # ..........................................
  m <- lm(diffs~size,ba)
  if(CI.type=="classic"){ CI <- confint(m,level=CI.level)[2,] 
  } else { CI <- boot.ci(boot(data=ba,statistic=boot.reg,formula=diffs~size,R=boot.R),
                         type=boot.type,conf=CI.level)[[4]][4:5] }
  prop.bias <- ifelse(CI[1] > 0 | CI[2] < 0, TRUE, FALSE)
  
  # updating output table
  out <- cbind(out,prop.bias=prop.bias)
  
  # ...........................................
  # 1.1. DIFFERENCES INDEPENDENT FROM SIZE
  # ...........................................
  if(prop.bias == FALSE){
    
    if(CI.type=="boot"){ # changing bias CI when CI.type="boot"
      ba.stat$CI.lines[3] <- boot.ci(boot(ba$diffs,function(dat,idx)mean(dat[idx],na.rm=TRUE),R=boot.R),
                                     type=boot.type,conf=CI.level)[[4]][4]
      ba.stat$CI.lines[4] <- boot.ci(boot(ba$diffs,function(dat,idx)mean(dat[idx],na.rm=TRUE),R=boot.R),
                                     type=boot.type,conf=CI.level)[[4]][5] }
    
    out <- cbind(out, # updating output table
                 # bias (SD) [CI]
                 bias=paste(round(ba.stat$mean.diffs,digits)," (",round(sd(ba.stat$diffs),digits),")",sep=""),
                 bias_CI=paste("[",round(ba.stat$CI.lines[3],digits),", ",round(ba.stat$CI.lines[4],digits),"]",sep=""))
    
    # ..........................................
    # 1.2. DIFFERENCES PROPORTIONAL TO SIZE
    # ..........................................
    } else { 
        
      b0 <- coef(m)[1]
      b1 <- coef(m)[2]
      
      # warning message
      if(warnings==TRUE){cat("\n\nWARNING: differences in ",measure," might be proportional to the size of measurement (coeff. = ",
          round(b1,2)," [",round(CI[1],2),", ",round(CI[2],2),"]",").",
          "\nBias and LOAs are represented as a function of the size of measurement.",sep="")}
      
      # intercept CI
      if(CI.type=="classic"){ CInt <- confint(m,level=CI.level)[1,]
      } else { CInt <- boot.ci(boot(data=ba,statistic=boot.int,formula=diffs~size,R=boot.R),
                               type=boot.type,conf=CI.level)[[4]][4:5] }
      
      out <- cbind(out, # updating output table
                   
                   # bias and CI following Bland & Altman (1999): D = b0 + b1 * size
                   bias=paste(round(b0,digits)," + ",round(b1,digits)," x ",ifelse(size=="mean","mean","ref"),sep=""),
                   bias_CI=paste("b0 = [",round(CInt[1],digits),", ",round(CInt[2],digits),
                                 "], b1 = [",round(CI[1],digits),", ",round(CI[2],digits),"]",sep="")) }
  
  # ..............................................
  # 2. LOAs ESTIMATION FROM ORIGINAL DATA
  # ..............................................
  if(logTransf == FALSE){
    
    # testing heteroscedasticity
    mRes <- lm(abs(resid(m))~size,ba)
    if(CI.type=="classic"){ CIRes <- confint(mRes,level=CI.level)[2,]
    } else { CIRes <- boot.ci(boot(data=ba,statistic=boot.reg,formula=abs(resid(m))~size,R=boot.R),
                              type=boot.type,conf=CI.level)[[4]][4:5] }
    heterosced <- ifelse(CIRes[1] > 0 | CIRes[2] < 0,TRUE,FALSE)
    
    # testing normality of differences
    shapiro <- shapiro.test(ba.stat$diffs)
    if(shapiro$p.value <= .05){ normality = FALSE
      if(warnings==TRUE){cat("\n\nWARNING: differences in ",measure,
          " might be not normally distributed (Shapiro-Wilk W = ",round(shapiro$statistic,3),", p = ",round(shapiro$p.value,3),
          ").","\nBootstrap CI (CI.type='boot') and log transformation (logTransf=TRUE) are recommended.",sep="")} 
    } else { normality = TRUE }
    
    # updating output table
    out <- cbind(out[,1:6],logTransf=FALSE,normality=normality,heterosced=heterosced,out[,7:ncol(out)])
    
    # ...............................................
    # 2.1. CONSTANT BIAS AND HOMOSCEDASTICITY
    # ............................................
      if(prop.bias==FALSE & heterosced==FALSE){
        
        if(CI.type=="boot"){ # changing LOAs CI when CI.type="boot"
          ba.stat$CI.lines[1] <- boot.ci(boot(ba$diffs-1.96*sd(ba.stat$diffs),
                                              function(dat,idx)mean(dat[idx],na.rm=TRUE),R=boot.R),
                                         type=boot.type,conf=CI.level)[[4]][4]
          ba.stat$CI.lines[2] <- boot.ci(boot(ba$diffs-1.96*sd(ba.stat$diffs),
                                              function(dat,idx)mean(dat[idx],na.rm=TRUE),R=boot.R),
                                         type=boot.type,conf=CI.level)[[4]][5]
          ba.stat$CI.lines[5] <- boot.ci(boot(ba$diffs+1.96*sd(ba.stat$diffs),
                                              function(dat,idx)mean(dat[idx],na.rm=TRUE),R=boot.R),
                                         type=boot.type,conf=CI.level)[[4]][4]
          ba.stat$CI.lines[6] <- boot.ci(boot(ba$diffs+1.96*sd(ba.stat$diffs),
                                              function(dat,idx)mean(dat[idx],na.rm=TRUE),R=boot.R),
                                         type=boot.type,conf=CI.level)[[4]][5] }
        
        out <- cbind(out, # updating output table
                     
                     # lower LOA and CI
                     LOA.lower = ba.stat$lower.limit,
                     LOA.lower_CI = paste("[",round(ba.stat$CI.lines[1],2),", ",round(ba.stat$CI.lines[2],digits),"]",sep=""),
                     
                     # upper LOA and CI
                     LOA.upper = ba.stat$upper.limit,
                     LOA.upper_CI = paste("[",round(ba.stat$CI.lines[5],2),", ",round(ba.stat$CI.lines[6],digits),"]",sep=""))
      
    # ...............................................
    # 2.2. PROPORTIONAL BIAS AND HOMOOSCEDASTICITY
    # ...............................................
    } else if(prop.bias==TRUE & heterosced==FALSE) { 
        
      # boostrapped CI in any case
      if(warnings==TRUE){cat("\n\nLOAs CI are computed using boostrap with method '",boot.type,"'.",sep="")}
      
      require(boot)
      
      boot.res <- function(data,formula,indices){return(1.96*sd(resid(lm(formula,data=data[indices,]))))} # 1.96 SD of residuals
      CI.sdRes <- boot.ci(boot(data=ba,statistic=boot.res,formula=diffs~size,R=boot.R),
                          type=boot.type,conf=CI.level)[[4]][4:5]
      
      out <- cbind(out, # updating output table
                   
                   # lower LOA and CI following Bland & Altman (1999): LOAs = bias - 1.96sd of the residuals
                   LOA.lower = paste("bias -",round(1.96*sd(resid(m)),digits)),
                   LOA.lower_CI = paste("bias - [",round(CI.sdRes[1],digits),", ",round(CI.sdRes[2],digits),"]",sep=""),
                   
                   # upper LOA and CI following Bland & Altman (1999): LOAs = bias + 1.96sd of the residuals
                   LOA.upper = paste("bias +",round(1.96*sd(resid(m)),digits)),
                   LOA.upper_CI =paste("bias + [",round(CI.sdRes[1],digits),", ",round(CI.sdRes[2],digits),"]",sep="")) 
      
      # ............................................
      # 2.3. HETEROSCEDASTICITY
      # ............................................
      } else if(heterosced==TRUE){
          
          c0 <- coef(mRes)[1]
          c1 <- coef(mRes)[2]
          
          # warning message
          if(warnings==TRUE){cat("\n\nWARNING: standard deviation of differences in ",measure,
              " might be proportional to the size of measurement (coeff. = ",
              round(c1,digits)," [",round(CIRes[1],digits),", ",round(CIRes[2],digits),"]",").",
              "\nLOAs range is represented as a function of the size of measurement.",sep="")}
          
          # intercept CI
          if(CI.type=="classic"){ CInt <- confint(mRes,level=CI.level)[1,]
          } else { CInt <- boot.ci(boot(data=ba,statistic=boot.int,formula=abs(resid(m))~size,R=boot.R),
                                   type=boot.type,conf=CI.level)[[4]][4:5] }
          
          out <- cbind(out, # updating output table
                       
                       # lower LOA and CI following Bland & Altman (1999): LOAs = meanDiff - 2.46(c0 + c1A)
                       LOA.lower = paste("bias - 2.46(",round(c0,digits)," + ",round(c1,digits)," x ",
                                         ifelse(size=="mean","mean","ref"),")",sep=""),
                       LOA.lower_CI = paste("c0 = [",round(CInt[1],digits),", ",round(CInt[2],digits),
                                            "], c1 = [",round(CIRes[1],digits),", ",round(CIRes[2],digits),"]",sep=""),
                       
                       # upper LOA and CI following Bland & Altman (1999): LOAs = meanDiff - 2.46(c0 + c1A)
                       LOA.upper = paste("bias + 2.46(",round(c0,digits)," + ",round(c1,digits)," x ",
                                         ifelse(size=="mean","mean","ref"),")",sep=""),
                       LOA.upper_CI = paste("c0 = [",round(CInt[1],digits),", ",round(CInt[2],digits),
                                            "], c1 = [",round(CIRes[1],digits),", ",round(CIRes[2],digits),"]",sep="")) }
    
  # ..............................................
  # 3. LOAs ESTIMATION FROM LOG-TRANSFORMED DATA
  # ..............................................
  } else {
    
    # log transformation of data (add little constant to avoid Inf values)
    if(warnings==TRUE){cat("\n\nLog transforming the data before computing LOAs...")}
    ba.stat$groups$LOGgroup1 <- log(ba.stat$groups$group1 + .0001)
    ba.stat$groups$LOGgroup2 <- log(ba.stat$groups$group2 + .0001)
    ba.stat$groups$LOGdiff <- ba.stat$groups$LOGgroup1 - ba.stat$groups$LOGgroup2
    if(size=="reference"){ baLog <- data.frame(size=ba.stat$groups$LOGgroup2,diffs=ba.stat$groups$LOGdiff)
    } else { baLog <- data.frame(size=(ba.stat$groups$LOGgroup1 + ba.stat$groups$LOGgroup2)/2,diffs=ba.stat$groups$LOGdiff) }
    
    # testing heteroscedasticity
    mRes <- lm(abs(resid(m))~size,baLog)
    if(CI.type=="classic"){ CIRes <- confint(mRes,level=CI.level)[2,]
    } else { CIRes <- boot.ci(boot(data=baLog,statistic=boot.reg,formula=abs(resid(m))~size,R=boot.R),
                              type=boot.type,conf=CI.level)[[4]][4:5] }
    heterosced <- ifelse(CIRes[1] > 0 | CIRes[2] < 0,TRUE,FALSE)
    
    # testing normality of differences
    shapiro <- shapiro.test(baLog$diffs)
    if(shapiro$p.value <= .05){ normality = FALSE
      if(warnings==TRUE){cat("\n\nWARNING: differences in log transformed ",measure,
          " might be not normally distributed (Shapiro-Wilk W = ",round(shapiro$statistic,3),", p = ",round(shapiro$p.value,3),
          ").","\nBootstrap CI (CI.type='boot') are recommended.",sep="")} } else { normality = TRUE }
    
    # updating output table
    out <- cbind(out[,1:6],logTransf=FALSE,normality=normality,heterosced=heterosced,out[,7:ncol(out)])
    
    # LOAs slope following Euser et al (2008) for antilog transformation: slope = 2 * (e^(1.96 SD) - 1)/(e^(1.96 SD) + 1)
    ANTILOGslope <- function(x){ 2 * (exp(1.96 * sd(x)) - 1) / (exp(1.96*sd(x)) + 1) }
    ba.stat$LOA.slope <- ANTILOGslope(baLog$diffs)
    
    # LOAs CI slopes 
    if(CI.type=="classic"){ # classic CI
      t1 <- qt((1 - CI.level)/2, df = ba.stat$based.on - 1) # t-value right
      t2 <- qt((CI.level + 1)/2, df = ba.stat$based.on - 1) # t-value left
      ba.stat$LOA.slope.CI.upper <- 2 * (exp(1.96 * sd(baLog$diffs) + t2 * sqrt(sd(baLog$diffs)^2 * 3/ba.stat$based.on)) - 1) /
        (exp(1.96*sd(baLog$diffs) + t2 * sqrt(sd(baLog$diffs)^2 * 3/ba.stat$based.on)) + 1)
      ba.stat$LOA.slope.CI.lower <- 2 * (exp(1.96 * sd(baLog$diffs) + t1 * sqrt(sd(baLog$diffs)^2 * 3/ba.stat$based.on)) - 1) /
        (exp(1.96*sd(baLog$diffs) + t1 * sqrt(sd(baLog$diffs)^2 * 3/ba.stat$based.on)) + 1)
      } else { # boostrap CI
        ba.stat$LOA.slope.CI.lower <- boot.ci(boot(baLog$diffs,function(dat,idx) ANTILOGslope(dat[idx]),R=boot.R),
                                              type=boot.type,conf=CI.level)[[4]][4]
        ba.stat$LOA.slope.CI.upper <- boot.ci(boot(baLog$diffs,function(dat,idx) ANTILOGslope(dat[idx]),R=boot.R),
                                              type=boot.type,conf=CI.level)[[4]][5] }
    
    # LOAs depending on mean and size (regardless if bias is proportional or not)
    out <- cbind(out, # updating output table
                 
                 # lower LOA and CI
                 LOA.lower = paste("bias - ",size," x ",round(ba.stat$LOA.slope,digits)),
                 LOA.lower_CI = paste("bias - ",size," x [",round(ba.stat$LOA.slope.CI.lower,digits),
                                      ", ",round(ba.stat$LOA.slope.CI.upper,digits),"]",sep=""),
                 
                 # upper LOA and CI
                 LOA.upper = paste("bias + ",size," x ",round(ba.stat$LOA.slope,digits)),
                 LOA.upper_CI = paste("bias + ",size," x [",round(ba.stat$LOA.slope.CI.lower,digits),
                                      ", ",round(ba.stat$LOA.slope.CI.upper,digits),"]",sep="")) }
  
  
  # rounding values
  nums <- vapply(out, is.numeric, FUN.VALUE = logical(1))
  out[,nums] <- round(out[,nums], digits = digits)
  out[,nums] <- as.character(out[,nums]) # to prevent NA values when combined with other cases
  row.names(out) <- NULL
  
  return(out)
  
}

Here, the function is applied to the dataset generated with the EBE2sleep function. The rbind function is used to concatenate the information on each stage in a single output. The CI.type and the logTransf arguments are manipulated among sleep measures to show differences between the function settings. Note that a warning is returned in cases where a proportional bias and/or heteroscedasticity is detected (in these cases, the discrepancy metrics are computed according to the procedures described in the main article) or when the distribution of differences deviates from normality (i.e., based on the Shapiro-Wilk test).

SLEEP STAGING

Here, the function is applied by assuming that sleep staging is allowed by the device.

      # boostrapped CI on original data
rbind(groupDiscr(data = sleep.data, measures=c("TST_device","TST_ref"), size="reference", 
                 CI.type="boot", CI.level=.95, digits=2),
      groupDiscr(data = sleep.data, measures=c("SE_device","SE_ref"), size="reference", 
                 CI.type="boot", CI.level=.95, digits=2),
      groupDiscr(data = sleep.data, measures=c("SOL_device","SOL_ref"), size="reference", 
                 CI.type="boot", CI.level=.95, digits=2),
      groupDiscr(data = sleep.data, measures=c("WASO_device","WASO_ref"), size="reference", 
                 CI.type="boot", CI.level=.95, digits=2),
      groupDiscr(data = sleep.data, measures=c("Light_device","Light_ref"), size="reference", 
                 CI.type="boot", CI.level=.95, digits=2),
      groupDiscr(data = sleep.data, measures=c("Deep_device","Deep_ref"), size="reference", 
                 CI.type="boot", CI.level=.95, digits=2),
      
      # bootstrapped CI on log-transformed data
      groupDiscr(data = sleep.data, measures=c("REM_device","REM_ref"), size = "reference", logTransf = TRUE, 
                 CI.type="boot", CI.level=.95, digits=2),
      groupDiscr(data = sleep.data, measures=c("LightPerc_device","LightPerc_ref"), size = "reference", logTransf = TRUE, 
                 CI.type="boot",boot.type="basic",CI.level=.95, digits=2),
      
      # classic CI on log-transformed data
      groupDiscr(data = sleep.data, measures=c("DeepPerc_device","DeepPerc_ref"), size  ="reference", logTransf = TRUE, 
                 CI.type="classic", boot.type="basic", CI.level=.95, digits=2),
      groupDiscr(data = sleep.data, measures=c("REMPerc_device","REMPerc_ref"), size="reference", logTransf = TRUE, 
                 CI.type="classic",boot.type="basic",CI.level=.95, digits=2))

## 
## 
## ----------------
##  Measure: TST 
## ----------------
## 
## Computing boostrap CI with method 'basic' ...
## 
## WARNING: differences in TST might be proportional to the size of measurement (coeff. = -0.15 [-0.28, -0.03]).
## Bias and LOAs are represented as a function of the size of measurement.
## 
## LOAs CI are computed using boostrap with method 'basic'.
## 
## ----------------
##  Measure: SE 
## ----------------
## 
## Computing boostrap CI with method 'basic' ...
## 
## WARNING: differences in SE might be proportional to the size of measurement (coeff. = -0.65 [-1.11, -0.35]).
## Bias and LOAs are represented as a function of the size of measurement.
## 
## WARNING: differences in SE might be not normally distributed (Shapiro-Wilk W = 0.855, p = 0.026).
## Bootstrap CI (CI.type='boot') and log transformation (logTransf=TRUE) are recommended.
## 
## WARNING: standard deviation of differences in SE might be proportional to the size of measurement (coeff. = -0.26 [-0.72, -0.3]).
## LOAs range is represented as a function of the size of measurement.
## 
## ----------------
##  Measure: SOL 
## ----------------
## 
## Computing boostrap CI with method 'basic' ...
## 
## WARNING: differences in SOL might be proportional to the size of measurement (coeff. = -0.69 [-2.41, -0.3]).
## Bias and LOAs are represented as a function of the size of measurement.
## 
## WARNING: differences in SOL might be not normally distributed (Shapiro-Wilk W = 0.862, p = 0.032).
## Bootstrap CI (CI.type='boot') and log transformation (logTransf=TRUE) are recommended.
## 
## WARNING: standard deviation of differences in SOL might be proportional to the size of measurement (coeff. = 0.53 [0.31, 1.53]).
## LOAs range is represented as a function of the size of measurement.
## 
## ----------------
##  Measure: WASO 
## ----------------
## 
## Computing boostrap CI with method 'basic' ...
## 
## WARNING: differences in WASO might be proportional to the size of measurement (coeff. = -0.8 [-1.56, -0.23]).
## Bias and LOAs are represented as a function of the size of measurement.
## 
## WARNING: differences in WASO might be not normally distributed (Shapiro-Wilk W = 0.854, p = 0.025).
## Bootstrap CI (CI.type='boot') and log transformation (logTransf=TRUE) are recommended.
## 
## LOAs CI are computed using boostrap with method 'basic'.
## 
## ----------------
##  Measure: Light 
## ----------------
## 
## Computing boostrap CI with method 'basic' ...
## 
## ----------------
##  Measure: Deep 
## ----------------
## 
## Computing boostrap CI with method 'basic' ...
## 
## WARNING: differences in Deep might be proportional to the size of measurement (coeff. = -0.65 [-1.14, -0.23]).
## Bias and LOAs are represented as a function of the size of measurement.
## 
## WARNING: standard deviation of differences in Deep might be proportional to the size of measurement (coeff. = 0.18 [0.08, 0.69]).
## LOAs range is represented as a function of the size of measurement.
## 
## ----------------
##  Measure: REM 
## ----------------
## 
## Computing boostrap CI with method 'basic' ...
## 
## Log transforming the data before computing LOAs...
## 
## WARNING: differences in log transformed REM might be not normally distributed (Shapiro-Wilk W = 0.872, p = 0.045).
## Bootstrap CI (CI.type='boot') are recommended.
## 
## ----------------
##  Measure: LightPerc 
## ----------------
## 
## Computing boostrap CI with method 'basic' ...
## 
## WARNING: differences in LightPerc might be proportional to the size of measurement (coeff. = -1.56 [-2.44, -0.67]).
## Bias and LOAs are represented as a function of the size of measurement.
## 
## Log transforming the data before computing LOAs...
## 
## ----------------
##  Measure: DeepPerc 
## ----------------
## 
## WARNING: differences in DeepPerc might be proportional to the size of measurement (coeff. = -0.92 [-1.32, -0.53]).
## Bias and LOAs are represented as a function of the size of measurement.
## 
## Log transforming the data before computing LOAs...
## 
## ----------------
##  Measure: REMPerc 
## ----------------
## 
## WARNING: differences in REMPerc might be proportional to the size of measurement (coeff. = -1.14 [-1.93, -0.36]).
## Bias and LOAs are represented as a function of the size of measurement.
## 
## Log transforming the data before computing LOAs...

SLEEP/WAKE

Here, the function is applied by assuming that only sleep/wake patterns (and not sleep staging) are allowed by the device.

rbind(groupDiscr(data = sleep.data.dic, measures=c("TST_device","TST_ref"), 
                 CI.type="boot", CI.level=.95, digits=2),
      groupDiscr(data = sleep.data.dic, measures=c("SE_device","SE_ref"), logTransf = TRUE, 
                 CI.type="boot", boot.type="basic", CI.level=.95, digits=2),
      groupDiscr(data = sleep.data.dic, measures=c("SOL_device","SOL_ref"), 
                 CI.type="classic", CI.level=.95, digits=2),
      groupDiscr(data = sleep.data.dic, measures=c("WASO_device","WASO_ref"), logTransf = TRUE,
                 CI.type="classic",boot.type="basic", CI.level=.95, digits=2))

## 
## 
## ----------------
##  Measure: TST 
## ----------------
## 
## Computing boostrap CI with method 'basic' ...
## 
## WARNING: differences in TST might be proportional to the size of measurement (coeff. = -0.15 [-0.28, -0.03]).
## Bias and LOAs are represented as a function of the size of measurement.
## 
## LOAs CI are computed using boostrap with method 'basic'.
## 
## ----------------
##  Measure: SE 
## ----------------
## 
## Computing boostrap CI with method 'basic' ...
## 
## WARNING: differences in SE might be proportional to the size of measurement (coeff. = -0.65 [-1.1, -0.35]).
## Bias and LOAs are represented as a function of the size of measurement.
## 
## Log transforming the data before computing LOAs...
## 
## WARNING: differences in log transformed SE might be not normally distributed (Shapiro-Wilk W = 0.846, p = 0.019).
## Bootstrap CI (CI.type='boot') are recommended.
## 
## ----------------
##  Measure: SOL 
## ----------------
## 
## WARNING: differences in SOL might be not normally distributed (Shapiro-Wilk W = 0.862, p = 0.032).
## Bootstrap CI (CI.type='boot') and log transformation (logTransf=TRUE) are recommended.
## 
## ----------------
##  Measure: WASO 
## ----------------
## 
## WARNING: differences in WASO might be proportional to the size of measurement (coeff. = -0.8 [-1.18, -0.41]).
## Bias and LOAs are represented as a function of the size of measurement.
## 
## Log transforming the data before computing LOAs...

2.4. Bland-Altman plots

The final step for discrepancy analysis uses the information reported above to generate the Bland-Altman plots. This is done with the BAplot.R function, which requires the BlandAltmanLeh, the ggplot2 and ggExtra packages, and allows the user to specify the following arguments:

data: data.frame of sleep measures (such as that generated with the ebe2sleep.R function).
measures: character vector indicating the name of the two columns to be considered (i.e., fist column: device indicated by “measure_device”, second column: reference indicated by “measure_ref”). Default is c(“TST_device”,“TST_ref”).
xaxis: character indicating what should be used as the x axis: “reference” (default) for using the reference method as we suggest in our article, “mean” for using the averages of ref- and device-derived measures, as originally suggested by Bland & Altman (1999).
logTransf: logical value indicating if data should be log transformed to deal with heteroscedasticity or non-normal distribution of differences (default: FALSE). If TRUE, data bias and LOAs are computed on the log-transformed data and back-transformed to express LOAs as a function of the size of measurement, following the method proposed by Euser, Dekker and le Cessie (2008).
CI.type: character string indicating the type of confidence intervals to be used: “classic” (default) returns intervals based on the t-distribution, “boot” returns intervals for means by bootstrap with 10000 replicates (requires boot package).
CI.level: numeric value indicating the confidence intervals to be computed on bias and LOAs (default: .95).
boot.type: character string indicating the type of bootstrapped confidence intervals. The value should be one of “basic” (default), “norm”, “stud”, “perc”, or “bca” (see ?boot.ci for details). Note that when a proportional bias is detected, only “basic” bootstrapped CI are shown. This argument is not considered when CI.type = “classic”.
boot.R: numeric value indicating the number of boostrap replicates (default: 10,000). Higher values require more time for computing CI (see also ?boot). This argument is not considered when CI.type = “classic”.
xlim and ylim: numeric vector of two elements indicating the limits of the x and the y axis, respectively. If not specified (default), limits are selected based on the range of differences.
warnings: logical value indicating if warnings regarding assumptions and analytical details should be displayed (default: TRUE).

Show function

BAplot <- function(data=NA,measures=c("TST_device","TST_ref"),logTransf=FALSE,
                   xaxis="reference",CI.type="classic",CI.level=.95,boot.type="basic",boot.R=10000,
                   xlim=NA,ylim=NA,warnings=TRUE){
  
  require(BlandAltmanLeh); require(ggplot2); require(ggExtra)
  
  # setting labels
  Measure <- gsub("_ref","",gsub("_device","",measures[1]))
  measure <- gsub("TST","total sleep time (min)",Measure)
  measure <- gsub("SE","sleep efficiency (%)",measure)
  measure <- gsub("SOL","sleep onset latency (min)",measure)
  measure <- gsub("WASO","wake after sleep onset (min)",measure)
  measure <- gsub("LightPerc","light sleep percentage (%)",measure)
  measure <- gsub("DeepPerc","deep sleep percentage (%)",measure)
  measure <- gsub("REMPerc","REM sleep percentage (%)",measure)
  measure <- gsub("Light","light sleep duration (min)",measure)
  measure <- gsub("Deep","deep sleep duration (min)",measure)
  if(grepl("REM",measure) & !grepl("%",measure)) { measure <- gsub("REM","REM sleep duration (min)",measure) }
  if(warnings==TRUE){cat("\n\n----------------\n Measure:",Measure,"\n----------------")}
  
  # packages and functions to be used with boostrap CI
  if(CI.type=="boot"){ require(boot)
    # function to generate bootstrap CI for model parameters
    boot.reg <- function(data,formula,indices){ return(coef(lm(formula,data=data[indices,]))[2]) }
    # function for sampling and predicting Y values based on model
    boot.pred <- function(data,formula,tofit) { indices <- sample(1:nrow(data),replace = TRUE)
      return(predict(lm(formula,data=data[indices,]), newdata=data.frame(tofit))) }
    if(warnings==TRUE){cat("\n\nComputing boostrap CI with method '",boot.type,"' ...",sep="")}
    } else if(CI.type!="classic") { stop("Error: CI.type can be either 'classic' or 'boot'") }
  
  # data to be used
  ba.stat <- bland.altman.stats(data[,measures[1]],data[,measures[2]],conf.int=CI.level)
  if(xaxis=="reference"){
    ba <- data.frame(size=ba.stat$groups$group2,diffs=ba.stat$diffs)
    xlab <- paste("Reference",measure)
    } else if(xaxis=="mean"){
      ba <- data.frame(size=ba.stat$means,diffs=ba.stat$diffs)
      xlab <- paste("Mean",measure,"by device and reference")
    } else { stop("Error: xaxis argument can be either 'reference' or 'mean'") }
  
  # range of values to be fitted for drawing the lines (i.e., from min to max of x-axis values, by .1)
  size <- seq(min(ba$size),max(ba$size),(max(ba$size)-min(ba$size))/((max(ba$size)-min(ba$size))*10))
  
  # basic plot
  p <- ggplot(data=ba,aes(size,diffs))
  
  # ..........................................
  # 1. TESTING PROPORTIONAL BIAS
  # ..........................................
  m <- lm(diffs~size,ba)
  if(CI.type=="classic"){ CI <- confint(m,level=CI.level)[2,] 
  } else { CI <- boot.ci(boot(data=ba,statistic=boot.reg,formula=diffs~size,R=boot.R),
                         type=boot.type,conf=CI.level)[[4]][4:5] }
  prop.bias <- ifelse(CI[1] > 0 | CI[2] < 0, TRUE, FALSE)
  
  # ...........................................
  # 1.1. DIFFERENCES INDEPENDENT FROM SIZE
  # ...........................................
  if(prop.bias == FALSE){ 
      
      if(CI.type=="boot"){ # changing bias CI when CI.type="boot"
        ba.stat$CI.lines[3] <- boot.ci(boot(ba$diffs,function(dat,idx)mean(dat[idx],na.rm=TRUE),R=boot.R),
                                       type=boot.type,conf=CI.level)[[4]][4]
        ba.stat$CI.lines[4] <- boot.ci(boot(ba$diffs,function(dat,idx)mean(dat[idx],na.rm=TRUE),R=boot.R),
                                       type=boot.type,conf=CI.level)[[4]][5] }
      
      p <- p + # adding lines to plot
        
        # bias and CI (i.e., mean diff)
        geom_line(aes(y=ba.stat$mean.diffs),colour="red",size=1.5) +
        geom_line(aes(y=ba.stat$CI.lines[3]),colour="red",linetype=2,size=1) +
        geom_line(aes(y=ba.stat$CI.lines[4]),colour="red",linetype=2,size=1)
      
      # ..........................................
      # 1.2. DIFFERENCES PROPORTIONAL TO SIZE
      # ..........................................
      } else {
        
        b0 <- coef(m)[1]
        b1 <- coef(m)[2]
        
        # warning message
        if(warnings==TRUE){cat("\n\nWARNING: differences in ",Measure," might be proportional to the size of measurement (coeff. = ",
            round(b1,2)," [",round(CI[1],2),", ",round(CI[2],2),"]",").",
            "\nBias and LOAs are plotted as a function of the size of measurement.",sep="")}
        
        # modeling bias following Bland & Altman (1999): D = b0 + b1 * size
        y.fit <- data.frame(size,y.bias=b0+b1*size) 
        
        # bias CI
        if(CI.type=="classic"){ # classic ci
          y.fit$y.biasCI.upr <- predict(m,newdata=data.frame(y.fit$size),interval="confidence",level=CI.level)[,3]
          y.fit$y.biasCI.lwr <- predict(m,newdata=data.frame(y.fit$size),interval="confidence",level=CI.level)[,2]
          } else { # boostrap CI 
            fitted <- t(replicate(boot.R,boot.pred(ba,"diffs~size",y.fit))) # sampling CIs
            y.fit$y.biasCI.upr <- apply(fitted,2,quantile,probs=c((1-CI.level)/2))
            y.fit$y.biasCI.lwr <- apply(fitted,2,quantile,probs=c(CI.level+(1-CI.level)/2)) }
        
        p <- p + # adding lines to plot
          
          # bias and CI (i.e., D = b0 + b1 * size)
          geom_line(data=y.fit,aes(y=y.bias),colour="red",size=1.5) +
          geom_line(data=y.fit,aes(y=y.biasCI.upr),colour="red",linetype=2,size=1) +
          geom_line(data=y.fit,aes(y=y.biasCI.lwr),colour="red",linetype=2,size=1) }
  
  # ..............................................
  # 2. LOAs ESTIMATION FROM ORIGINAL DATA
  # ..............................................
  if(logTransf == FALSE){
    
    # testing heteroscedasticity
    mRes <- lm(abs(resid(m))~size,ba)
    if(CI.type=="classic"){ CIRes <- confint(mRes,level=CI.level)[2,]
    } else { CIRes <- boot.ci(boot(data=ba,statistic=boot.reg,formula=abs(resid(m))~size,R=boot.R),
                              type=boot.type,conf=CI.level)[[4]][4:5] }
    heterosced <- ifelse(CIRes[1] > 0 | CIRes[2] < 0,TRUE,FALSE)
    
    # testing normality of differences
    shapiro <- shapiro.test(ba$diffs)
    if(shapiro$p.value <= .05){
       if(warnings==TRUE){cat("\n\nWARNING: differences in ",Measure,
           " might be not normally distributed (Shapiro-Wilk W = ",round(shapiro$statistic,3),", p = ",round(shapiro$p.value,3),
           ").","\nBootstrap CI (CI.type='boot') and log transformation (logTransf=TRUE) are recommended.",sep="")} }
     
    # ............................................
    # 2.1. CONSTANT BIAS AND HOMOSCEDASTICITY
    # ............................................
    if(prop.bias==FALSE & heterosced==FALSE){
        
        if(CI.type=="boot"){ # changing LOAs CI when CI.type="boot"
          ba.stat$CI.lines[1] <- boot.ci(boot(ba$diffs-1.96*sd(ba.stat$diffs),
                                              function(dat,idx)mean(dat[idx],na.rm=TRUE),R=boot.R),
                                         type=boot.type,conf=CI.level)[[4]][4]
          ba.stat$CI.lines[2] <- boot.ci(boot(ba$diffs-1.96*sd(ba.stat$diffs),
                                              function(dat,idx)mean(dat[idx],na.rm=TRUE),R=boot.R),
                                         type=boot.type,conf=CI.level)[[4]][5]
          ba.stat$CI.lines[5] <- boot.ci(boot(ba$diffs+1.96*sd(ba.stat$diffs),
                                              function(dat,idx)mean(dat[idx],na.rm=TRUE),R=boot.R),
                                         type=boot.type,conf=CI.level)[[4]][4]
          ba.stat$CI.lines[6] <- boot.ci(boot(ba$diffs+1.96*sd(ba.stat$diffs),
                                              function(dat,idx)mean(dat[idx],na.rm=TRUE),R=boot.R),
                                         type=boot.type,conf=CI.level)[[4]][5] }
        
        p <- p + # adding lines to plot
      
        # Upper LOA and CI (i.e., mean diff + 1.96 SD)
        geom_line(aes(y=ba.stat$upper.limit),colour="darkgray",size=1.3) +
        geom_line(aes(y=ba.stat$CI.lines[5]),colour="darkgray",linetype=2,size=1) +
        geom_line(aes(y=ba.stat$CI.lines[6]),colour="darkgray",linetype=2,size=1) +
        
        # Lower LOA and CI (i.e., mean diff - 1.96 SD)
        geom_line(aes(y=ba.stat$lower.limit),colour="darkgray",size=1.3) +
        geom_line(aes(y=ba.stat$CI.lines[1]),colour="darkgray",linetype=2,size=1) +
        geom_line(aes(y=ba.stat$CI.lines[2]),colour="darkgray",linetype=2,size=1)
        
        # ............................................
        # 2.2. PROPORTIONAL BIAS AND HOMOSCEDASTICITY
        # ............................................
        } else if(prop.bias==TRUE & heterosced==FALSE) { 
          
          # modeling LOAs following Bland & Altman (1999): LOAs = bias +- 1.96sd of the residuals
          y.fit$y.LOAu = b0+b1*size + 1.96*sd(resid(m))
          y.fit$y.LOAl = b0+b1*size - 1.96*sd(resid(m))
          
          # LOAs CI based on bias CI +- 1.96sd of the residuals
          if(warnings==TRUE){cat(" Note that LOAs CI are represented based on bias CI.")}
          y.fit$y.LOAu.upr = y.fit$y.biasCI.upr + 1.96*sd(resid(m))
          y.fit$y.LOAu.lwr = y.fit$y.biasCI.lwr + 1.96*sd(resid(m))
          y.fit$y.LOAl.upr = y.fit$y.biasCI.upr - 1.96*sd(resid(m))
          y.fit$y.LOAl.lwr = y.fit$y.biasCI.lwr - 1.96*sd(resid(m))
          
          # ............................................
          # 2.3. CONSTANT BIAS AND HOMOSCEDASTICITY
          # ............................................
          } else if(prop.bias==FALSE & heterosced==TRUE) {
            
            c0 <- coef(mRes)[1]
            c1 <- coef(mRes)[2]
  
            # warning message
            if(warnings==TRUE){cat("WARNING: SD of differences in ",Measure,
                " might be proportional to the size of measurement (coeff. = ",
                round(c1,2)," [",round(CIRes[1],2),", ",round(CIRes[2],2),"]",").",
                "\nLOAs range is plotted as a function of the size of measurement.",sep="")}
  
            # modeling LOAs following Bland & Altman (1999): LOAs = meanDiff +- 2.46(c0 + c1A)
            y.fit <- data.frame(size=size,
                                y.LOAu = ba.stat$mean.diffs + 2.46*(c0+c1*size),
                                y.LOAl = ba.stat$mean.diffs - 2.46*(c0+c1*size))
            
            # LOAs CI
            if(CI.type=="classic"){ # classic ci
              fitted <- predict(mRes,newdata=data.frame(y.fit$size),interval="confidence",level=CI.level) # based on mRes
              y.fit$y.LOAu.upr <- ba.stat$mean.diffs + 2.46*fitted[,3]
              y.fit$y.LOAu.lwr <- ba.stat$mean.diffs + 2.46*fitted[,2]
              y.fit$y.LOAl.upr <- ba.stat$mean.diffs - 2.46*fitted[,3]
              y.fit$y.LOAl.lwr <- ba.stat$mean.diffs - 2.46*fitted[,2]
              } else { # boostrap CI
                fitted <- t(replicate(boot.R,boot.pred(ba,"abs(resid(lm(diffs ~ size))) ~ size",y.fit)))
                y.fit$y.LOAu.upr <- ba.stat$mean.diffs + 2.46*apply(fitted,2,quantile,probs=c(CI.level+(1-CI.level)/2))
                y.fit$y.LOAu.lwr <- ba.stat$mean.diffs + 2.46*apply(fitted,2,quantile,probs=c((1-CI.level)/2))
                y.fit$y.LOAl.upr <- ba.stat$mean.diffs - 2.46*apply(fitted,2,quantile,probs=c(CI.level+(1-CI.level)/2))
                y.fit$y.LOAl.lwr <- ba.stat$mean.diffs - 2.46*apply(fitted,2,quantile,probs=c((1-CI.level)/2)) }
            
          # ............................................
          # 2.4. PROPORTIONAL BIAS AND HETEROSCEDASTICITY
          # ............................................ 
          } else if(prop.bias==TRUE & heterosced==TRUE) {
            
          c0 <- coef(mRes)[1]
          c1 <- coef(mRes)[2]
          
          # warning message
          if(warnings==TRUE){cat("\n\nWARNING: SD of differences in ",Measure,
              " might be proportional to the size of measurement (coeff. = ",
              round(c1,2)," [",round(CIRes[1],2),", ",round(CIRes[2],2),"]",").",
              "\nLOAs range is plotted as a function of the size of measurement.",sep="")}
          
          # modeling LOAs following Bland & Altman (1999): LOAs = b0 + b1 * size +- 2.46(c0 + c1A) 
          y.fit$y.LOAu = b0+b1*size + 2.46*(c0+c1*size)
          y.fit$y.LOAl = b0+b1*size - 2.46*(c0+c1*size)
          
          # LOAs CI
          if(CI.type=="classic"){ # classic ci
            fitted <- predict(mRes,newdata=data.frame(y.fit$size),interval="confidence",level=CI.level) # based on mRes
            y.fit$y.LOAu.upr <- b0+b1*size + 2.46*fitted[,3]
            y.fit$y.LOAu.lwr <- b0+b1*size + 2.46*fitted[,2]
            y.fit$y.LOAl.upr <- b0+b1*size - 2.46*fitted[,3]
            y.fit$y.LOAl.lwr <- b0+b1*size - 2.46*fitted[,2] 
            } else { # boostrap CI
              fitted <- t(replicate(boot.R,boot.pred(ba,"abs(resid(lm(diffs ~ size))) ~ size",y.fit)))
              y.fit$y.LOAu.upr <- b0+b1*size + 2.46*apply(fitted,2,quantile,probs=c(CI.level+(1-CI.level)/2))
              y.fit$y.LOAu.lwr <- b0+b1*size + 2.46*apply(fitted,2,quantile,probs=c((1-CI.level)/2))
              y.fit$y.LOAl.upr <- b0+b1*size - 2.46*apply(fitted,2,quantile,probs=c(CI.level+(1-CI.level)/2))
              y.fit$y.LOAl.lwr <- b0+b1*size - 2.46*apply(fitted,2,quantile,probs=c((1-CI.level)/2)) }}
    
    if(prop.bias==TRUE | heterosced==TRUE){
      
      p <- p + # adding lines to plot
          
          # Upper LOA and CI
          geom_line(data=y.fit,aes(y=y.LOAu),colour="darkgray",size=1.3) +
          geom_line(data=y.fit,aes(y=y.LOAu.upr),colour="darkgray",linetype=2,size=1) +
          geom_line(data=y.fit,aes(y=y.LOAu.lwr),colour="darkgray",linetype=2,size=1) +
          
          # Lower LOA and CI
          geom_line(data=y.fit,aes(y=y.LOAl),colour="darkgray",size=1.3) +
          geom_line(data=y.fit,aes(y=y.LOAl.upr),colour="darkgray",linetype=2,size=1) +
          geom_line(data=y.fit,aes(y=y.LOAl.lwr),colour="darkgray",linetype=2,size=1)
      
    }
    
  # ..............................................
  # 3. LOAs ESTIMATION FROM LOG-TRANSFORMED DATA
  # ..............................................
  } else {
      
    # log transformation of data (add little constant to avoid Inf values)
    if(warnings==TRUE){cat("\n\nLog transforming data ...")}
    ba.stat$groups$LOGgroup1 <- log(ba.stat$groups$group1 + .0001)
    ba.stat$groups$LOGgroup2 <- log(ba.stat$groups$group2 + .0001)
    ba.stat$groups$LOGdiff <- ba.stat$groups$LOGgroup1 - ba.stat$groups$LOGgroup2
    if(xaxis=="reference"){ baLog <- data.frame(size=ba.stat$groups$LOGgroup2,diffs=ba.stat$groups$LOGdiff)
    } else { baLog <- data.frame(size=(ba.stat$groups$LOGgroup1 + ba.stat$groups$LOGgroup2)/2,diffs=ba.stat$groups$LOGdiff) }
    
    # testing heteroscedasticity
    mRes <- lm(abs(resid(m))~size,baLog)
    if(CI.type=="classic"){ CIRes <- confint(mRes,level=CI.level)[2,]
    } else { CIRes <- boot.ci(boot(data=baLog,statistic=boot.reg,formula=abs(resid(m))~size,R=boot.R),
                              type=boot.type,conf=CI.level)[[4]][4:5] }
    heterosced <- ifelse(CIRes[1] > 0 | CIRes[2] < 0,TRUE,FALSE)
      
    # testing normality of differences
    shapiro <- shapiro.test(baLog$diffs)
    if(shapiro$p.value <= .05){
      if(warnings==TRUE){cat("\n\nWARNING: differences in log transformed ",Measure,
          " might be not normally distributed (Shapiro-Wilk W = ",round(shapiro$statistic,3),", p = ",round(shapiro$p.value,3),
          ").","\nBootstrap CI (CI.type='boot') are recommended.",sep="")} }
    
    # LOAs slope following Euser et al (2008) for antilog transformation: slope = 2 * (e^(1.96 SD) - 1)/(e^(1.96 SD) + 1)
    ANTILOGslope <- function(x){ 2 * (exp(1.96 * sd(x)) - 1) / (exp(1.96*sd(x)) + 1) }
    ba.stat$LOA.slope <- ANTILOGslope(baLog$diffs)
    
    # LOAs CI slopes 
    if(CI.type=="classic"){ # classic CI
      t1 <- qt((1 - CI.level)/2, df = ba.stat$based.on - 1) # t-value right
      t2 <- qt((CI.level + 1)/2, df = ba.stat$based.on - 1) # t-value left
      ba.stat$LOA.slope.CI.upper <- 2 * (exp(1.96 * sd(baLog$diffs) + t2 * sqrt(sd(baLog$diffs)^2 * 3/ba.stat$based.on)) - 1) /
        (exp(1.96*sd(baLog$diffs) + t2 * sqrt(sd(baLog$diffs)^2 * 3/ba.stat$based.on)) + 1)
      ba.stat$LOA.slope.CI.lower <- 2 * (exp(1.96 * sd(baLog$diffs) + t1 * sqrt(sd(baLog$diffs)^2 * 3/ba.stat$based.on)) - 1) /
        (exp(1.96*sd(baLog$diffs) + t1 * sqrt(sd(baLog$diffs)^2 * 3/ba.stat$based.on)) + 1)
      } else { # boostrap CI
        ba.stat$LOA.slope.CI.upper <- boot.ci(boot(baLog$diffs,
                                                   function(dat,idx) ANTILOGslope(dat[idx]),R=boot.R),
                                              type=boot.type,conf=CI.level)[[4]][4]
        ba.stat$LOA.slope.CI.lower <- boot.ci(boot(baLog$diffs,
                                                   function(dat,idx) ANTILOGslope(dat[idx]),R=boot.R),
                                              type=boot.type,conf=CI.level)[[4]][5] }
    
    # Recomputing LOAs and their CIs as a function of size multiplied by the computed slopes
    y.fit <- data.frame(size,
                        ANTLOGdiffs.upper = size * ba.stat$LOA.slope, # upper LOA
                        ANTLOGdiffs.upper.lower = size * ba.stat$LOA.slope.CI.lower,
                        ANTLOGdiffs.upper.upper = size * ba.stat$LOA.slope.CI.upper,
                        ANTLOGdiffs.lower = size * ((-1)*ba.stat$LOA.slope), # lower LOA
                        ANTLOGdiffs.lower.lower = size * ((-1)*ba.stat$LOA.slope.CI.lower),
                        ANTLOGdiffs.lower.upper = size * ((-1)*ba.stat$LOA.slope.CI.upper))
    
    # adding bias values based on prop.bias
    if(prop.bias==FALSE){ y.fit$y.bias <- rep(ba.stat$mean.diffs,nrow(y.fit)) } else { y.fit$y.bias <- b0+b1*y.fit$size }
    
    p <- p + # adding lines to plot
      
      # UPPER LIMIT (i.e., bias + 2 * (e^(1.96 SD) - 1)/(e^(1.96 SD) + 1))
      geom_line(data=y.fit,aes(y = y.bias + ANTLOGdiffs.upper),colour="darkgray",size=1.3) +
      geom_line(data=y.fit,aes(y = y.bias + ANTLOGdiffs.upper.upper),colour="darkgray",linetype=2,size=1) +
      geom_line(data=y.fit,aes(y = y.bias + ANTLOGdiffs.upper.lower),colour="darkgray",linetype=2,size=1) +
      
      # LOWER LIMIT (i.e., bias - 2 * (e^(1.96 SD) - 1)/(e^(1.96 SD) + 1))
      geom_line(data=y.fit,aes(y = y.bias + ANTLOGdiffs.lower),colour="darkgray",size=1.3) +
      geom_line(data=y.fit,aes(y = y.bias + ANTLOGdiffs.lower.upper),colour="darkgray",linetype=2,size=1) +
      geom_line(data=y.fit,aes(y = y.bias + ANTLOGdiffs.lower.lower),colour="darkgray",linetype=2,size=1)
    
    # ..........................................
    # 3.3. HETEROSCHEDASTICITY (only a warning)
    # ..........................................
    if(heterosced==TRUE){
      
      # warning message
      if(warnings==TRUE){cat("\n\nWARNING: standard deviation of differences in in log transformed ",Measure,
          " might be proportional to the size of measurement (coeff. = ",
          round(coef(mRes)[2],2)," [",round(CIRes[1],2),", ",round(CIRes[2],2),"]",").",sep="")} }}
  
  
  p <- p + # adding last graphical elements and plotting with marginal density distribution
    
    geom_point(size=4.5,shape=20) +
    xlab(xlab) + ylab(paste("Device - reference differences in\n",measure,sep="")) +
    ggtitle(paste("Bland-Altman plot of",measure))+
    theme(axis.text = element_text(size=12,face="bold"),
          axis.title = element_text(size=15,face="bold",colour="black"),
          plot.title = element_text(hjust = 0.5,size=12,face="bold"))
  if(!is.na(ylim[1])){ p <- p + ylim(ylim) }
  if(!is.na(xlim[1])){ p <- p + xlim(xlim) }
  return(ggMarginal(p,fill="lightgray",colour="lightgray",size=4,margins="y"))
  
}

Here, the function is applied to generate Bland-Altman plots for each sleep measure. Outputs are generated both by using the original data (logTransf = FALSE) and by log transforming the data before computing bias and LOAs (logTransf = TRUE). In each case, plots are generated with both t-derived (CI.type = “classic”) and bootstrapped confidence intervals (CI.type = “boot”). Any plot is generated by considering the reference method as the size of measurement (xaxis = “reference”), as suggested by de Zambotti et al. (2019). Note that a warning message is returned when a proportional bias and/or heteroscedasticity is detected, and bias and LOAs are represented accordingly.

TST

ORIGINAL DATA

CLASSIC CI

BAplot(data=sleep.data,measures=c("TST_device","TST_ref"), logTransf = FALSE,
       xaxis="reference", CI.type="classic", CI.level=.95)

## 
## 
## ----------------
##  Measure: TST 
## ----------------

BOOTSTRAP CI

BAplot(data=sleep.data,measures=c("TST_device","TST_ref"), logTransf = FALSE,
       xaxis="reference", CI.type="boot", boot.type = "basic", CI.level=.95)

## 
## 
## ----------------
##  Measure: TST 
## ----------------
## 
## Computing boostrap CI with method 'basic' ...
## 
## WARNING: differences in TST might be proportional to the size of measurement (coeff. = -0.15 [-0.28, -0.03]).
## Bias and LOAs are plotted as a function of the size of measurement. Note that LOAs CI are represented based on bias CI.

LOG TRANSFORMED

CLASSIC CI

BAplot(data=sleep.data,measures=c("TST_device","TST_ref"), logTransf = TRUE,
       xaxis="reference", CI.type="classic", CI.level=.95)

## 
## 
## ----------------
##  Measure: TST 
## ----------------
## 
## Log transforming data ...
## 
## WARNING: differences in log transformed TST might be not normally distributed (Shapiro-Wilk W = 0.845, p = 0.019).
## Bootstrap CI (CI.type='boot') are recommended.

BOOTSTRAP CI

BAplot(data=sleep.data,measures=c("TST_device","TST_ref"), logTransf = TRUE,
       xaxis="reference", CI.type="boot", boot.type = "basic", CI.level=.95)

## 
## 
## ----------------
##  Measure: TST 
## ----------------
## 
## Computing boostrap CI with method 'basic' ...
## 
## WARNING: differences in TST might be proportional to the size of measurement (coeff. = -0.15 [-0.28, -0.03]).
## Bias and LOAs are plotted as a function of the size of measurement.
## 
## Log transforming data ...
## 
## WARNING: differences in log transformed TST might be not normally distributed (Shapiro-Wilk W = 0.845, p = 0.019).
## Bootstrap CI (CI.type='boot') are recommended.

SE

ORIGINAL DATA

CLASSIC CI

BAplot(data=sleep.data,measures=c("SE_device","SE_ref"), logTransf = FALSE,
       xaxis="reference", CI.type="classic", CI.level=.95)

## 
## 
## ----------------
##  Measure: SE 
## ----------------
## 
## WARNING: differences in SE might be proportional to the size of measurement (coeff. = -0.65 [-1, -0.3]).
## Bias and LOAs are plotted as a function of the size of measurement.
## 
## WARNING: differences in SE might be not normally distributed (Shapiro-Wilk W = 0.855, p = 0.026).
## Bootstrap CI (CI.type='boot') and log transformation (logTransf=TRUE) are recommended.
## 
## WARNING: SD of differences in SE might be proportional to the size of measurement (coeff. = -0.26 [-0.41, -0.1]).
## LOAs range is plotted as a function of the size of measurement.

BOOTSRAP CI

BAplot(data=sleep.data,measures=c("SE_device","SE_ref"), logTransf = FALSE,
       xaxis="reference", CI.type="boot", boot.type = "basic", CI.level=.95)

## 
## 
## ----------------
##  Measure: SE 
## ----------------
## 
## Computing boostrap CI with method 'basic' ...
## 
## WARNING: differences in SE might be proportional to the size of measurement (coeff. = -0.65 [-1.09, -0.35]).
## Bias and LOAs are plotted as a function of the size of measurement.
## 
## WARNING: differences in SE might be not normally distributed (Shapiro-Wilk W = 0.855, p = 0.026).
## Bootstrap CI (CI.type='boot') and log transformation (logTransf=TRUE) are recommended.
## 
## WARNING: SD of differences in SE might be proportional to the size of measurement (coeff. = -0.26 [-0.71, -0.3]).
## LOAs range is plotted as a function of the size of measurement.

LOG TRANSFORMED

CLASSIC CI

BAplot(data=sleep.data,measures=c("SE_device","SE_ref"), logTransf = TRUE,
       xaxis="reference", CI.type="classic", CI.level=.95)

## 
## 
## ----------------
##  Measure: SE 
## ----------------
## 
## WARNING: differences in SE might be proportional to the size of measurement (coeff. = -0.65 [-1, -0.3]).
## Bias and LOAs are plotted as a function of the size of measurement.
## 
## Log transforming data ...
## 
## WARNING: differences in log transformed SE might be not normally distributed (Shapiro-Wilk W = 0.846, p = 0.019).
## Bootstrap CI (CI.type='boot') are recommended.
## 
## WARNING: standard deviation of differences in in log transformed SE might be proportional to the size of measurement (coeff. = -21.7 [-34.9, -8.49]).

BOOTSRAP CI

BAplot(data=sleep.data,measures=c("SE_device","SE_ref"), logTransf = TRUE,
       xaxis="reference", CI.type="boot", boot.type = "basic", CI.level=.95)

## 
## 
## ----------------
##  Measure: SE 
## ----------------
## 
## Computing boostrap CI with method 'basic' ...
## 
## WARNING: differences in SE might be proportional to the size of measurement (coeff. = -0.65 [-1.1, -0.35]).
## Bias and LOAs are plotted as a function of the size of measurement.
## 
## Log transforming data ...
## 
## WARNING: differences in log transformed SE might be not normally distributed (Shapiro-Wilk W = 0.846, p = 0.019).
## Bootstrap CI (CI.type='boot') are recommended.
## 
## WARNING: standard deviation of differences in in log transformed SE might be proportional to the size of measurement (coeff. = -21.7 [-60.21, -25.32]).

SOL

ORIGINAL DATA

CLASSIC CI

BAplot(data=sleep.data,measures=c("SOL_device","SOL_ref"), logTransf = FALSE,
       xaxis="reference", CI.type="classic", CI.level=.95)

## 
## 
## ----------------
##  Measure: SOL 
## ----------------
## 
## WARNING: differences in SOL might be not normally distributed (Shapiro-Wilk W = 0.862, p = 0.032).
## Bootstrap CI (CI.type='boot') and log transformation (logTransf=TRUE) are recommended.

BOOTSTRAP CI

BAplot(data=sleep.data,measures=c("SOL_device","SOL_ref"), logTransf = FALSE,
       xaxis="reference", CI.type="boot", boot.type = "basic", CI.level=.95)

## 
## 
## ----------------
##  Measure: SOL 
## ----------------
## 
## Computing boostrap CI with method 'basic' ...
## 
## WARNING: differences in SOL might be proportional to the size of measurement (coeff. = -0.69 [-2.45, -0.3]).
## Bias and LOAs are plotted as a function of the size of measurement.
## 
## WARNING: differences in SOL might be not normally distributed (Shapiro-Wilk W = 0.862, p = 0.032).
## Bootstrap CI (CI.type='boot') and log transformation (logTransf=TRUE) are recommended.
## 
## WARNING: SD of differences in SOL might be proportional to the size of measurement (coeff. = 0.53 [0.29, 1.53]).
## LOAs range is plotted as a function of the size of measurement.

LOG TRANSFORMED

CLASSIC CI

BAplot(data=sleep.data,measures=c("SOL_device","SOL_ref"), logTransf = TRUE,
       xaxis="reference", CI.type="classic", CI.level=.95)

## 
## 
## ----------------
##  Measure: SOL 
## ----------------
## 
## Log transforming data ...
## 
## WARNING: differences in log transformed SOL might be not normally distributed (Shapiro-Wilk W = 0.538, p = 0).
## Bootstrap CI (CI.type='boot') are recommended.
## 
## WARNING: standard deviation of differences in in log transformed SOL might be proportional to the size of measurement (coeff. = 7.32 [0.32, 14.32]).

BOOTSTRAP CI

BAplot(data=sleep.data,measures=c("SOL_device","SOL_ref"), logTransf = TRUE,
       xaxis="reference", CI.type="boot", boot.type = "basic", CI.level=.95)

## 
## 
## ----------------
##  Measure: SOL 
## ----------------
## 
## Computing boostrap CI with method 'basic' ...
## 
## WARNING: differences in SOL might be proportional to the size of measurement (coeff. = -0.69 [-2.38, -0.29]).
## Bias and LOAs are plotted as a function of the size of measurement.
## 
## Log transforming data ...
## 
## WARNING: differences in log transformed SOL might be not normally distributed (Shapiro-Wilk W = 0.538, p = 0).
## Bootstrap CI (CI.type='boot') are recommended.
## 
## WARNING: standard deviation of differences in in log transformed SOL might be proportional to the size of measurement (coeff. = 7.32 [7.18, 21.88]).

WASO

ORIGINAL DATA

CLASSIC CI

BAplot(data=sleep.data,measures=c("WASO_device","WASO_ref"), logTransf = FALSE,
       xaxis="reference", CI.type="classic", CI.level=.95)

## 
## 
## ----------------
##  Measure: WASO 
## ----------------
## 
## WARNING: differences in WASO might be proportional to the size of measurement (coeff. = -0.8 [-1.18, -0.41]).
## Bias and LOAs are plotted as a function of the size of measurement.
## 
## WARNING: differences in WASO might be not normally distributed (Shapiro-Wilk W = 0.854, p = 0.025).
## Bootstrap CI (CI.type='boot') and log transformation (logTransf=TRUE) are recommended. Note that LOAs CI are represented based on bias CI.

BOOTSTRAP CI

BAplot(data=sleep.data,measures=c("WASO_device","WASO_ref"), logTransf = FALSE,
       xaxis="reference", CI.type="boot", boot.type = "basic", CI.level=.95)

## 
## 
## ----------------
##  Measure: WASO 
## ----------------
## 
## Computing boostrap CI with method 'basic' ...
## 
## WARNING: differences in WASO might be proportional to the size of measurement (coeff. = -0.8 [-1.56, -0.25]).
## Bias and LOAs are plotted as a function of the size of measurement.
## 
## WARNING: differences in WASO might be not normally distributed (Shapiro-Wilk W = 0.854, p = 0.025).
## Bootstrap CI (CI.type='boot') and log transformation (logTransf=TRUE) are recommended. Note that LOAs CI are represented based on bias CI.

LOG TRANSFORMED

CLASSIC CI

BAplot(data=sleep.data,measures=c("WASO_device","WASO_ref"), logTransf = TRUE,
       xaxis="reference", CI.type="classic", CI.level=.95)

## 
## 
## ----------------
##  Measure: WASO 
## ----------------
## 
## WARNING: differences in WASO might be proportional to the size of measurement (coeff. = -0.8 [-1.18, -0.41]).
## Bias and LOAs are plotted as a function of the size of measurement.
## 
## Log transforming data ...

BOOTSTRAP CI

BAplot(data=sleep.data,measures=c("WASO_device","WASO_ref"), logTransf = TRUE,
       xaxis="reference", CI.type="boot", boot.type = "basic", CI.level=.95)

## 
## 
## ----------------
##  Measure: WASO 
## ----------------
## 
## Computing boostrap CI with method 'basic' ...
## 
## WARNING: differences in WASO might be proportional to the size of measurement (coeff. = -0.8 [-1.56, -0.24]).
## Bias and LOAs are plotted as a function of the size of measurement.
## 
## Log transforming data ...

LIGHT

ORIGINAL DATA

CLASSIC CI

BAplot(data=sleep.data,measures=c("Light_device","Light_ref"), logTransf = FALSE,
       xaxis="reference", CI.type="classic", CI.level=.95)

## 
## 
## ----------------
##  Measure: Light 
## ----------------

BOOTSTRAP CI

BAplot(data=sleep.data,measures=c("Light_device","Light_ref"), logTransf = FALSE,
       xaxis="reference", CI.type="boot", boot.type = "basic", CI.level=.95)

## 
## 
## ----------------
##  Measure: Light 
## ----------------
## 
## Computing boostrap CI with method 'basic' ...

LOG TRANSFORMED

CLASSIC CI

BAplot(data=sleep.data,measures=c("Light_device","Light_ref"), logTransf = TRUE,
       xaxis="reference", CI.type="classic", CI.level=.95)

## 
## 
## ----------------
##  Measure: Light 
## ----------------
## 
## Log transforming data ...

BOOTSTRAP CI

BAplot(data=sleep.data,measures=c("Light_device","Light_ref"), logTransf = TRUE,
       xaxis="reference", CI.type="boot", boot.type = "basic", CI.level=.95)

## 
## 
## ----------------
##  Measure: Light 
## ----------------
## 
## Computing boostrap CI with method 'basic' ...
## 
## Log transforming data ...

DEEP

ORIGINAL DATA

CLASSIC CI

BAplot(data=sleep.data,measures=c("Deep_device","Deep_ref"), logTransf = FALSE,
       xaxis="reference", CI.type="classic", CI.level=.95)

## 
## 
## ----------------
##  Measure: Deep 
## ----------------
## 
## WARNING: differences in Deep might be proportional to the size of measurement (coeff. = -0.65 [-1.21, -0.09]).
## Bias and LOAs are plotted as a function of the size of measurement. Note that LOAs CI are represented based on bias CI.

BOOTSTRAP CI

BAplot(data=sleep.data,measures=c("Deep_device","Deep_ref"), logTransf = FALSE,
       xaxis="reference", CI.type="boot", boot.type = "basic", CI.level=.95)

## 
## 
## ----------------
##  Measure: Deep 
## ----------------
## 
## Computing boostrap CI with method 'basic' ...
## 
## WARNING: differences in Deep might be proportional to the size of measurement (coeff. = -0.65 [-1.15, -0.22]).
## Bias and LOAs are plotted as a function of the size of measurement.
## 
## WARNING: SD of differences in Deep might be proportional to the size of measurement (coeff. = 0.18 [0.07, 0.7]).
## LOAs range is plotted as a function of the size of measurement.

LOG TRANSFORMED

CLASSIC CI

BAplot(data=sleep.data,measures=c("Deep_device","Deep_ref"), logTransf = TRUE,
       xaxis="reference", CI.type="classic", CI.level=.95)

## 
## 
## ----------------
##  Measure: Deep 
## ----------------
## 
## WARNING: differences in Deep might be proportional to the size of measurement (coeff. = -0.65 [-1.21, -0.09]).
## Bias and LOAs are plotted as a function of the size of measurement.
## 
## Log transforming data ...

BOOTSTRAP CI

BAplot(data=sleep.data,measures=c("Deep_device","Deep_ref"), logTransf = TRUE,
       xaxis="reference", CI.type="boot", boot.type = "basic", CI.level=.95)

## 
## 
## ----------------
##  Measure: Deep 
## ----------------
## 
## Computing boostrap CI with method 'basic' ...
## 
## WARNING: differences in Deep might be proportional to the size of measurement (coeff. = -0.65 [-1.15, -0.25]).
## Bias and LOAs are plotted as a function of the size of measurement.
## 
## Log transforming data ...
## 
## WARNING: standard deviation of differences in in log transformed Deep might be proportional to the size of measurement (coeff. = 10.7 [4.03, 43.23]).

REM

ORIGINAL DATA

CLASSIC CI

BAplot(data=sleep.data,measures=c("REM_device","REM_ref"), logTransf = FALSE,
       xaxis="reference", CI.type="classic", CI.level=.95)

## 
## 
## ----------------
##  Measure: REM 
## ----------------
## 
## WARNING: differences in REM might be proportional to the size of measurement (coeff. = -1.03 [-1.77, -0.28]).
## Bias and LOAs are plotted as a function of the size of measurement. Note that LOAs CI are represented based on bias CI.

BOOTSTRAP CI

BAplot(data=sleep.data,measures=c("REM_device","REM_ref"), logTransf = FALSE,
       xaxis="reference", CI.type="boot", boot.type = "basic", CI.level=.95)

## 
## 
## ----------------
##  Measure: REM 
## ----------------
## 
## Computing boostrap CI with method 'basic' ...

LOG TRANSFORMED

CLASSIC CI

BAplot(data=sleep.data,measures=c("REM_device","REM_ref"), logTransf = TRUE,
       xaxis="reference", CI.type="classic", CI.level=.95)

## 
## 
## ----------------
##  Measure: REM 
## ----------------
## 
## WARNING: differences in REM might be proportional to the size of measurement (coeff. = -1.03 [-1.77, -0.28]).
## Bias and LOAs are plotted as a function of the size of measurement.
## 
## Log transforming data ...
## 
## WARNING: differences in log transformed REM might be not normally distributed (Shapiro-Wilk W = 0.872, p = 0.045).
## Bootstrap CI (CI.type='boot') are recommended.

BOOTSTRAP CI

BAplot(data=sleep.data,measures=c("REM_device","REM_ref"), logTransf = TRUE,
       xaxis="reference", CI.type="boot", boot.type = "basic", CI.level=.95)

## 
## 
## ----------------
##  Measure: REM 
## ----------------
## 
## Computing boostrap CI with method 'basic' ...
## 
## Log transforming data ...
## 
## WARNING: differences in log transformed REM might be not normally distributed (Shapiro-Wilk W = 0.872, p = 0.045).
## Bootstrap CI (CI.type='boot') are recommended.

LIGHT %

ORIGINAL DATA

CLASSIC CI

BAplot(data=sleep.data,measures=c("LightPerc_device","LightPerc_ref"), logTransf = FALSE,
       xaxis="reference", CI.type="classic", CI.level=.95)

## 
## 
## ----------------
##  Measure: LightPerc 
## ----------------
## 
## WARNING: differences in LightPerc might be proportional to the size of measurement (coeff. = -1.56 [-2.54, -0.58]).
## Bias and LOAs are plotted as a function of the size of measurement. Note that LOAs CI are represented based on bias CI.

BOOTSTRAP CI

BAplot(data=sleep.data,measures=c("LightPerc_device","LightPerc_ref"), logTransf = FALSE,
       xaxis="reference", CI.type="boot", boot.type = "basic", CI.level=.95)

## 
## 
## ----------------
##  Measure: LightPerc 
## ----------------
## 
## Computing boostrap CI with method 'basic' ...
## 
## WARNING: differences in LightPerc might be proportional to the size of measurement (coeff. = -1.56 [-2.41, -0.65]).
## Bias and LOAs are plotted as a function of the size of measurement. Note that LOAs CI are represented based on bias CI.

LOG TRANSFORMED

CLASSIC CI

BAplot(data=sleep.data,measures=c("LightPerc_device","LightPerc_ref"), logTransf = TRUE,
       xaxis="reference", CI.type="classic", CI.level=.95)

## 
## 
## ----------------
##  Measure: LightPerc 
## ----------------
## 
## WARNING: differences in LightPerc might be proportional to the size of measurement (coeff. = -1.56 [-2.54, -0.58]).
## Bias and LOAs are plotted as a function of the size of measurement.
## 
## Log transforming data ...

BOOTSTRAP CI

BAplot(data=sleep.data,measures=c("LightPerc_device","LightPerc_ref"), logTransf = TRUE,
       xaxis="reference", CI.type="boot", boot.type = "basic", CI.level=.95)

## 
## 
## ----------------
##  Measure: LightPerc 
## ----------------
## 
## Computing boostrap CI with method 'basic' ...
## 
## WARNING: differences in LightPerc might be proportional to the size of measurement (coeff. = -1.56 [-2.44, -0.64]).
## Bias and LOAs are plotted as a function of the size of measurement.
## 
## Log transforming data ...

DEEP %

ORIGINAL DATA

CLASSIC CI

BAplot(data=sleep.data,measures=c("DeepPerc_device","DeepPerc_ref"), logTransf = FALSE,
       xaxis="reference", CI.type="classic", CI.level=.95)

## 
## 
## ----------------
##  Measure: DeepPerc 
## ----------------
## 
## WARNING: differences in DeepPerc might be proportional to the size of measurement (coeff. = -0.92 [-1.32, -0.53]).
## Bias and LOAs are plotted as a function of the size of measurement. Note that LOAs CI are represented based on bias CI.

BOOTSTRAP CI

BAplot(data=sleep.data,measures=c("DeepPerc_device","DeepPerc_ref"), logTransf = FALSE,
       xaxis="reference", CI.type="boot", boot.type = "basic", CI.level=.95)

## 
## 
## ----------------
##  Measure: DeepPerc 
## ----------------
## 
## Computing boostrap CI with method 'basic' ...
## 
## WARNING: differences in DeepPerc might be proportional to the size of measurement (coeff. = -0.92 [-1.42, -0.63]).
## Bias and LOAs are plotted as a function of the size of measurement. Note that LOAs CI are represented based on bias CI.

LOG TRANSFORMED

CLASSIC CI

BAplot(data=sleep.data,measures=c("DeepPerc_device","DeepPerc_ref"), logTransf = TRUE,
       xaxis="reference", CI.type="classic", CI.level=.95)

## 
## 
## ----------------
##  Measure: DeepPerc 
## ----------------
## 
## WARNING: differences in DeepPerc might be proportional to the size of measurement (coeff. = -0.92 [-1.32, -0.53]).
## Bias and LOAs are plotted as a function of the size of measurement.
## 
## Log transforming data ...

BOOTSTRAP CI

BAplot(data=sleep.data,measures=c("DeepPerc_device","DeepPerc_ref"), logTransf = TRUE,
       xaxis="reference", CI.type="boot", boot.type = "basic", CI.level=.95)

## 
## 
## ----------------
##  Measure: DeepPerc 
## ----------------
## 
## Computing boostrap CI with method 'basic' ...
## 
## WARNING: differences in DeepPerc might be proportional to the size of measurement (coeff. = -0.92 [-1.42, -0.63]).
## Bias and LOAs are plotted as a function of the size of measurement.
## 
## Log transforming data ...

REM %

ORIGINAL DATA

CLASSIC CI

BAplot(data=sleep.data,measures=c("REMPerc_device","REMPerc_ref"), logTransf = FALSE,
       xaxis="reference", CI.type="classic", CI.level=.95)

## 
## 
## ----------------
##  Measure: REMPerc 
## ----------------
## 
## WARNING: differences in REMPerc might be proportional to the size of measurement (coeff. = -1.14 [-1.93, -0.36]).
## Bias and LOAs are plotted as a function of the size of measurement. Note that LOAs CI are represented based on bias CI.

BOOTSTRAP CI

BAplot(data=sleep.data,measures=c("REMPerc_device","REMPerc_ref"), logTransf = FALSE,
       xaxis="reference", CI.type="boot", boot.type = "basic", CI.level=.95)

## 
## 
## ----------------
##  Measure: REMPerc 
## ----------------
## 
## Computing boostrap CI with method 'basic' ...
## 
## WARNING: differences in REMPerc might be proportional to the size of measurement (coeff. = -1.14 [-1.87, -0.04]).
## Bias and LOAs are plotted as a function of the size of measurement. Note that LOAs CI are represented based on bias CI.

LOG TRANSFORMED

CLASSIC CI

BAplot(data=sleep.data,measures=c("REMPerc_device","REMPerc_ref"), logTransf = TRUE,
       xaxis="reference", CI.type="classic", CI.level=.95)

## 
## 
## ----------------
##  Measure: REMPerc 
## ----------------
## 
## WARNING: differences in REMPerc might be proportional to the size of measurement (coeff. = -1.14 [-1.93, -0.36]).
## Bias and LOAs are plotted as a function of the size of measurement.
## 
## Log transforming data ...

BOOTSTRAP CI

BAplot(data=sleep.data,measures=c("REMPerc_device","REMPerc_ref"), logTransf = TRUE,
       xaxis="reference", CI.type="boot", boot.type = "basic", CI.level=.95)

## 
## 
## ----------------
##  Measure: REMPerc 
## ----------------
## 
## Computing boostrap CI with method 'basic' ...
## 
## WARNING: differences in REMPerc might be proportional to the size of measurement (coeff. = -1.14 [-1.83, -0.03]).
## Bias and LOAs are plotted as a function of the size of measurement.
## 
## Log transforming data ...

3. Epoch-by-epoch analysis

The third step implies the use of the epoch-by-epoch raw data to evaluate the device performance on an epoch-by-epoch basis as compared to the reference method.

3.1. Error matrices

Here, we compute the error matrices (or confusion matrices) to describe the device performance in classifying reference-derived epoch-by-epoch stages (or sleep/wake), as described in the main article. The procedure generalizes to sleep stages by dichotomizing the scores for each stage (e.g., 1 = stage of interest, 0 = all other stages). This is done with the errorMatrix.R function, which allows to specify the following arguments:

data: data.frame including the raw data to be used, organized with the data structure shown in Section 1.
idCol, RefCol and deviceCol: character strings indicating the names of the columns for subjects, reference and device epochs, respectively.
staging: logical value indicating if the raw data includes sleep stages (wake, light, deep and REM) or not, FALSE assumes data consists of only wake/sleep distinction (default: TRUE).
stages: : numeric vector indicating the coding system used for wake, light, deep and REM sleep (in this order).. The default coding system is c(wake = 0, light = 1, deep = 2, REM = 3). Only wake and sleep should be indicated when staging = FALSE, e.g. c(wake = 0, sleep = 1).
matrixType: character string indicating the type of matrix to be generated: “prop” (default) for generating a group-level ‘proportional error matrix’, which reports the proportion of device-derived epochs in each condition over the number of reference-derived epochs in that condition, averaged between subjects, and presented as ’mean (sd) [CI]’ (requires the DescTools package for computing bootstrapped CI); “sum” for reporting the ‘absolute error matrix’, which reports the total number of epochs in each condition, without distinguishing between subjects.
CI.type: character string indicating the type of confidence intervals to be used: “classic” (default) returns intervals based on the t-distribution, “boot” returns intervals for means by bootstrap with 10000 replicates (see ?boot.ci). This argument is not considered when matrixType = “sum”.
CI.level: confidence intervals to be computed on bias and LOAs (default: .95). This argument is not considered when matrixType = “sum”.
boot.type: character string indicating the type of bootstrapped confidence intervals. The value should be one of “basic” (default), “norm”, “stud”, “perc”, or “bca” (see ?boot.ci for details). This argument is not considered when matrixType = “sum” or when CI.type = “classic”.
boot.R: numeric value indicating the number of boostrap replicates (default: 10,000). Higher values require more time for computing CI (see also ?boot). This argument is not considered when metricsType = “sum” or when CI.type = “classic”.
digits: numeric value specifying the decimal precision for the output (default: 2).

Show function

errorMatrix <- function(data=NA,idCol="subject",RefCol="reference",deviceCol="device",
                        staging=TRUE,stages=c(wake=0,light=1,deep=2,REM=3),matrixType="prop",
                        CI.type="classic",CI.level=.95,boot.type="basic",boot.R=10000,digits=2){
  
  # renaming variables
  colnames(data) <- gsub(idCol,"ID",colnames(data))
  colnames(data) <- gsub(RefCol,"ref",colnames(data))
  colnames(data) <- gsub(deviceCol,"device",colnames(data))
  
  # function to print error info as "mean (sd) [CI]"
  printInfo <- function(condition){
    m <- round(mean(errMatrix[,condition],na.rm=TRUE),digits)
    s <- round(sd(errMatrix[,condition],na.rm=TRUE),digits)
    if(CI.type=="classic"){ require(DescTools)
      CI.up <- round(MeanCI(errMatrix[,condition],method="classic")[3],digits)
      CI.lo <- round(MeanCI(errMatrix[,condition],method="classic")[2],digits)
    } else if(CI.type=="boot"){ require(boot)
      CI.lo <- round(boot.ci(boot(errMatrix[,condition],
                                function(dat,idx)mean(dat[idx],na.rm=TRUE),R=boot.R),
                             type=boot.type,conf=CI.level)[[4]][4],digits)
      CI.up <- round(boot.ci(boot(errMatrix[,condition],
                                function(dat,idx)mean(dat[idx],na.rm=TRUE),R=boot.R),
                             type=boot.type,conf=CI.level)[[4]][5],digits)
    } else { stop("Error: CI.type can be either 'classic' or 'boot'") }
    if(CI.lo < 0) CI.lo = 0 # bounding at zero
    paste(m," (",s,") [",CI.lo,", ",CI.up,"]",sep="") }
  
  # .................................
  # sleep staging is available
  # .................................
  
  if(staging==TRUE){
    
    # setting stages as 0 = wake, 1 = light, 2 = deep, 3 = REM
    data$ref <- as.integer(as.character(factor(data$ref,levels=as.numeric(stages),labels=c(0,1,2,3))))
    data$device <- as.integer(as.character(factor(data$device,levels=as.numeric(stages),labels=c(0,1,2,3))))
    
    # function to generate error matrix
    generateMatrix <- function(data,matrixType){
      # empty dataframe for error matrix
      errMatrix <- as.data.frame(matrix(nrow=0,ncol=5))
      for(i in 0:3){
        # dividend based on matrix type
        h <- ifelse(matrixType=="prop",nrow(data[data$ref==i,]),1)
        # filling matrix with number (sum) or proportion (prop) of epochs in each condition
        errMatrix <- rbind(errMatrix,
                           data.frame(reference=names(stages)[i+1],
                                      device_wake=nrow(data[data$device==0 & data$ref==i,])/h,
                                      device_light=nrow(data[data$device==1 & data$ref==i,])/h,
                                      device_deep=nrow(data[data$device==2 & data$ref==i,])/h,
                                      device_REM=nrow(data[data$device==3 & data$ref==i,])/h,
                                      reference_tot=nrow(data[data$ref==i,]))) }
      return(rbind(errMatrix,
                   # adding marginal sums
                   data.frame(reference="device_tot",device_wake=nrow(data[data$device==0,]),
                              device_light=nrow(data[data$device==1,]),device_deep=nrow(data[data$device==2,]),
                              device_REM=nrow(data[data$device==3,]),reference_tot=nrow(data))))
    }
      
    # generating absolute error matrix
    if(matrixType=="sum"){ 
      errMatrix <- generateMatrix(data,matrixType)
      
    # generating averaged percent error matrix 
      } else if(matrixType=="prop"){
        # empty dataframe for error matrix
        errMatrix <- as.data.frame(matrix(nrow=0,ncol=17))
        # on matrix per subject (in wide form)
        for(ID in levels(as.factor(data$ID))){
          errMatrix <- rbind(errMatrix,
                             reshape(data=cbind(ID=rep(ID,length(stages)),
                                                generateMatrix(data[data$ID==ID,],matrixType)[1:4,1:5]),
                                     idvar="ID",v.names=paste("device",c("wake","light","deep","REM"),sep="_"),
                                     timevar="reference",direction="wide")) }
        
        # using printInfo for reporting mean, sd, and CI
        errMatrix <- data.frame(reference=c("wake","light","deep","REM"),
                                device_wake=unlist(lapply(colnames(errMatrix)[substr(colnames(errMatrix),
                                                                                     8,10)=="wak"],printInfo)),
                                device_light=unlist(lapply(colnames(errMatrix)[substr(colnames(errMatrix),
                                                                                      8,10)=="lig"],printInfo)),
                                device_deep=unlist(lapply(colnames(errMatrix)[substr(colnames(errMatrix),
                                                                                     8,10)=="dee"],printInfo)),
                                device_REM=unlist(lapply(colnames(errMatrix)[substr(colnames(errMatrix),
                                                                                    8,10)=="REM"],printInfo)))
        
        } else { stop("Error: matrixType argument can be either 'prop' or 'sum'") }
    
  # .................................
  # sleep staging is *not* available
  # .................................
    
  } else {
    
    # setting stages
    data$ref <- as.integer(as.character(factor(data$ref,levels=as.numeric(stages),labels=c(0,1))))
    data$device <- as.integer(as.character(factor(data$device,levels=as.numeric(stages),labels=c(0,1))))
    
    # function to generate error matrix
    generateMatrix <- function(data,matrixType){
      # empty dataframe for error matrix
      errMatrix <- as.data.frame(matrix(nrow=0,ncol=3))
      for(i in 0:1){
        # dividend based on matrix type
        h <- ifelse(matrixType=="prop",nrow(data[data$ref==i,]),1)
        # filling matrix with number (sum) or percentage (perc) of epochs in each condition
        errMatrix <- rbind(errMatrix,
                           data.frame(reference=names(stages)[i+1],
                                      device_wake=nrow(data[data$device==0 & data$ref==i,])/h,
                                      device_sleep=nrow(data[data$device==1 & data$ref==i,])/h,
                                      reference_tot=nrow(data[data$ref==i,]))) }
      return(rbind(errMatrix,
                   # adding marginal sums
                   data.frame(reference="device_tot",device_wake=nrow(data[data$device==0,]),
                              device_sleep=nrow(data[data$device==1,]),reference_tot=nrow(data))))
    }
    
    # generating absolute error matrix
    if(matrixType=="sum"){
      errMatrix <- generateMatrix(data,matrixType)
      
    # generating averaged error matrix 
      } else if(matrixType=="prop"){
        errMatrix <- as.data.frame(matrix(nrow=0,ncol=5))
        for(ID in levels(as.factor(data$ID))){
          errMatrix <- rbind(errMatrix,
                             reshape(data=cbind(ID=rep(ID,length(stages)),
                                                generateMatrix(data[data$ID==ID,],matrixType)[1:2,1:3]),
                                     idvar="ID",v.names=paste("device",c("wake","sleep"),sep="_"),
                                     timevar="reference",direction="wide"))
        }
        
        errMatrix <- data.frame(reference=c("wake","sleep"),
                                device_wake=unlist(lapply(colnames(errMatrix)[substr(colnames(errMatrix),
                                                                                      8,10)=="wak"],printInfo)),
                                device_sleep=unlist(lapply(colnames(errMatrix)[substr(colnames(errMatrix),
                                                                                     8,10)=="sle"],printInfo)))
        
        } else { cat('Error: matrixType argument can be either "sum" or "prop"')}
    
  }
  
  return(errMatrix)
  
}

SLEEP STAGING

Here, we use the function to generate the error matrix by assuming that sleep staging is allowed by the device. The matrixType argument is set as “prop” to generate the ‘proportional error matrix’, reporting the group average proportion of epochs in each condition.

errorMatrix(data = raw.data, idCol = "subject", RefCol = "reference", deviceCol = "device",
            staging = TRUE, stages = c(wake = 0, light = 1, deep = 2, REM = 3), matrixType="prop",
            CI.type = "boot", boot.type = "basic", boot.R = 10000, CI.level = .95, digits = 2)

Alternatively, the matrixType argument is set as “sum” to generate the ‘absolute error matrix’, where the number of epochs in each condition is reported.

errorMatrix(data = raw.data, idCol = "subject", RefCol = "reference", deviceCol = "device",
            staging = TRUE, stages = c(wake = 0, light = 1, deep = 2, REM = 3), matrixType="sum",
            CI.type = "classic", CI.level = .95, digits = 2)

SLEEP/WAKE

Here, we use the function to generate the error matrix by assuming that only sleep/wake patterns (and not sleep staging) are allowed by the device. This is done by using the dichotomic dataset we created in Section 1. The matrixType argument is set as “prop” to report the group average proportion of epochs in each condition.

errorMatrix(data = dic.data, idCol = "subject", RefCol = "reference", deviceCol = "device",
            staging = FALSE, stages = c(wake = 10, sleep = 5), matrixType = "prop",
            CI.type = "classic", boot.type = "basic", CI.level = .95, digits = 2)

Alternatively, the matrixType argument is set as “sum” to generate the absolute error matrix, where the sum of epochs in each condition is reported.

errorMatrix(data = dic.data, idCol = "subject", RefCol = "reference", deviceCol = "device",
            staging = FALSE, stages = c(wake = 10, sleep = 5), matrixType = "sum",
            CI.type = "classic", boot.type = "basic", CI.level = .95, digits = 2)

3.2. Individual-level EBE metrics

Here, we compute the basic EBE metrics (i.e., accuracy, sensitivity, specificity) for each subject and for each stage of interest, as described in the main article. Outcomes can be also plotted to visually check for distribution and extreme values. This is done with the indEBE.R function:

data: data.frame including the raw data to be used, organized with the data structure shown in Section 1.
idCol, RefCol and deviceCol: character strings indicating the names of the columns for subjects, reference and device epochs, respectively.
stage: numeric value corresponding to the stage of interest (default: 0).
stageLabel: character string indicating the label corresponding to the stage of interest (default: “wake”).
doPlot: logical value indicating if the data should be plotted (default: TRUE, requires the ggplot2 and the reshape2 package).
digits: numeric value specifying the decimal precision for the output (default: 2).

Show function

indEBE <- function(data=NA,idCol="subject",RefCol="reference",deviceCol="device",doPlot=TRUE,
                            stage=0,stageLabel="wake",digits=2){
  
  # renaming variables
  colnames(data) <- gsub(idCol,"ID",colnames(data))
  colnames(data) <- gsub(RefCol,"ref",colnames(data))
  colnames(data) <- gsub(deviceCol,"device",colnames(data))
  
  # empty dataframe to be filled
  results <- as.data.frame(matrix(nrow=0,ncol=4))
  
  for(ID in levels(as.factor(data$ID))){
    
    d <- data[data$ID==ID,]
    
    # .....................
    # DYCHOTOMIC ENCODING
    # .....................
    
    # reference dychotomous recoding
    d$reference_EBE <- NA
    d[d$ref!=stage,"reference_EBE"] = 0 # other stages = 0
    d[d$ref==stage,"reference_EBE"] = 1 # target stage = 1
    
    # device dychotomous recoding
    d$device_EBE = NA
    d[d$device==stage,"device_EBE"] = 1 # target stage = 1
    d[d$device!=stage,"device_EBE"] = 0 # other stages = 0
    
    # ...................
    # CONFUSION MATRIX
    # ...................
    
    errMatrix <- data.frame(device=c("positive","negative"),
                            ref_positive=c(nrow(d[d$device_EBE==1 & d$reference_EBE==1,]), # true wake
                                           nrow(d[d$device_EBE==0 & d$reference_EBE==1,])), # false sleep
                            ref_negative=c(nrow(d[d$device_EBE==1 & d$reference_EBE==0,]), # false wake
                                           nrow(d[d$device_EBE==0 & d$reference_EBE==0,]))) # true sleep
    
    # ..............
    # EBE MEASURES
    # ..............
    
    results <- rbind(results,
                     data.frame(subject=ID,
                                
                                # accuracy
                                acc=round(100*(errMatrix[1,2]+errMatrix[2,3])/
                                            (errMatrix[1,2]+errMatrix[2,2]+errMatrix[1,3]+errMatrix[2,3]),digits),
                                
                                # sensitivity
                                sens=round(100*errMatrix[1,2]/(errMatrix[1,2]+errMatrix[2,2]),digits),
                                
                                # specificity
                                spec=round(100*errMatrix[2,3]/(errMatrix[2,3]+errMatrix[1,3]),digits))) }
  
  # ..............
  # PLOTTING RESULTS
  # ..............
  
  if(doPlot==TRUE){
    
    require(ggplot2)
    require(reshape2)
    
    melted.results <- melt(results)
    p <- ggplot(data=melted.results,aes(x=variable,y=subject,fill=value)) +
      geom_tile() +
      scale_fill_gradient2(low="#f03b20",high="lightgreen",mid="yellow",
                           limit = c(0,100),midpoint = 50,
                           space = "Lab",
                           name="% discrepancy",
                           guide="legend",
                           breaks=round(seq(100,0,length.out = 11),2),
                           minor_breaks=round(seq(100,0,length.out = 11),2)) +
      geom_text(data=melted.results,aes(x=variable,y=subject,label=value),color="black",size=3) +
      ggtitle(paste("Epoch-by-epoch statistics by subject for",stageLabel)) +
      theme(panel.background = element_blank())
    
    colnames(results)[2:4] <- paste(stageLabel,c("acc","sens","spec")) 
    
    return(list(results,p))
    
  } else { 
    
    colnames(results)[2:4] <- paste(stageLabel,c("acc","sens","spec")) 
    return(results) 
    
    }
  
}

SLEEP STAGING

Here, we use the function to generate individual EBE metrics for each considered stage, by assuming that sleep staging is allowed by the device. The cbind function is used to concatenate the information on each stage in a single output.

(ebe <- cbind(indEBE(data = raw.data, stage = 0, stageLabel = "wake", doPlot = FALSE, digits = 2),
              indEBE(data = raw.data, stage = 1, stageLabel = "light", doPlot = FALSE, digits = 2)[,2:4],
              indEBE(data = raw.data, stage = 2, stageLabel = "deep", doPlot = FALSE, digits = 2)[,2:4],
              indEBE(data = raw.data, stage = 3, stageLabel = "REM", doPlot = FALSE, digits = 2)[,2:4]))

SLEEP/WAKE

Here, we use the function to generate individual accuracy metrics for sleep and wake detection, by assuming that only sleep/wake patterns (and not sleep staging) are allowed by the device. Note that only sleep-related metrics (and not wake-related metrics) are reported, since with binary data the metrics for a given category mirror those of the other category (i.e., wake sensitivity is sleep specificity and vice versa, whereas accuracy is the same for both categories).

(ebe.dic <- indEBE(data = dic.data, stage = 5, stageLabel = "sleep", doPlot = FALSE, digits = 2))

3.2.1. Data visualization

Here, we show how the output can be used to generate boxplots of sensitivity levels in each stage (the same applies to specificity or accuracy by substituting “sens” with “spec” or “acc”, respectively). Then, we set doPlot = TRUE to plot the results for each accuracy metric (using a table format).

SLEEP STAGING

Here, boxplots are generated by assuming that sleep staging is allowed by the device.

library(reshape2); library(ggplot2)

indEBE_melt <- melt(ebe[, c("subject", colnames(ebe)[grepl("sens", colnames(ebe)) == TRUE])])
indEBE_melt$variable <- as.factor(gsub(" sens","",indEBE_melt$variable))
ggplot(indEBE_melt,aes(x = variable, y = value, fill = variable)) + xlab("stage") + ylab("sensitivity (%)") + 
  geom_boxplot(size=1.5, outlier.shape = NA, alpha = .5, width = .5, colour = "black") +
  geom_point(aes(col=variable),position = position_jitter(width = .15), size = 2) + 
  theme(legend.position = "none")

WAKE

indEBE(data = raw.data, stage = 0, stageLabel = "wake", doPlot = TRUE, digits = 2)[2]

## [[1]]

LIGHT SLEEP

indEBE(data = raw.data, stage = 1, stageLabel = "light", doPlot = TRUE, digits = 2)[2]

## [[1]]

DEEP SLEEP

indEBE(data = raw.data, stage = 2, stageLabel = "deep", doPlot = TRUE, digits = 2)[2]

## [[1]]

REM SLEEP

indEBE(data = raw.data, stage = 3, stageLabel = "REM", doPlot = TRUE, digits = 2)[2]

## [[1]]

SLEEP/WAKE

Here, boxplots are generated by assuming that only sleep/wake patterns (and not sleep staging) are allowed by the device. Note that with binary data the sensitivity in detecting one category (e.g., wake) corresponds to the specificity in detecting the other category (e.g., sleep).

library(reshape2); library(ggplot2)

ebe.dic <- cbind(indEBE(data = dic.data, stage = 5, stageLabel = "sleep", doPlot = FALSE, digits = 2),
                 indEBE(data = dic.data, stage = 10, stageLabel = "wake", doPlot = FALSE, digits = 2))

indEBE_melt <- melt(ebe.dic[, c("subject",colnames(ebe.dic)[grepl("sens", colnames(ebe.dic)) == TRUE])])
indEBE_melt$variable <- as.factor(gsub(" sens","",indEBE_melt$variable))
ggplot(indEBE_melt,aes(x = variable, y = value, fill = variable)) + ylab("sensitivity (%)") + 
  geom_boxplot(size=1.5, outlier.shape = NA, alpha = .5, width = .5, colour = "black") +
  geom_point(aes(col=variable),position = position_jitter(width = .15), size = 2) + 
  theme(legend.position = "none")

Here, we use the function by setting doPlot = TRUE to plot the results for each EBE metric (using a table format). Green colors indicate higher individual accuracy, whereas red colors indicate lower individual accuracy. Note that only sleep-related metrics (and not wake-related metrics) are reported, since with binary data the metrics for a given category mirror those of the other category (i.e., wake sensitivity is sleep specificity and vice versa, whereas accuracy is the same for both categories).

indEBE(data = dic.data, stage = 5, stageLabel = "sleep", doPlot = TRUE, digits = 2)[2]

## [[1]]

3.3. Group-level EBE metrics

Here, we compute the group-level basic (i.e., accuracy, sensitivity, specificity) and advanced EBE metrics (e.g., positive predictive value, Cohen’s kappa) for each stage of interest, as described in the main article . Outcomes can be also plotted to visually check for distribution and extreme values. This is done with the groupEBE.R function, which allows to specify the following arguments:

data: data.frame including the raw data to be used, organized with the data structure shown in Section 1.
idCol, RefCol and deviceCol: characters indicating the names of the columns for subjects, reference and device epochs, respectively.
stage: numeric value corresponding to the stage of interest (default: 0).
stageLabel: label indicating the stage of interest (default: “wake”).
metricsType: character string indicating the type of metrics to be generated: “avg” (defaut) for reporting individual EBE metrics averaged between subjects, and presented as “mean (sd) [CI]” (requires the DescTools package for computing CI), “sum” for reporting EBE metrics based on the absolute error matrix (i.e., using the total sum of epochs in each condition, without distinguishing between subjects).
CI.type: character string indicating the type of confidence intervals to be used: “classic” (default) returns intervals based on the t-distribution, “boot” returns intervals for means by bootstrap with 10000 replicates (see ?boot.ci). This argument is not considered when metricsType = “sum”.
CI.level: confidence intervals to be computed on bias and LOAs (default: .95). This argument is not considered when metricsType = “sum”.
boot.type: character string indicating the type of bootstrapped confidence intervals. The value should be one of “basic” (default), “norm”, “stud”, “perc”, or “bca” (see ?boot.ci for details). This argument is not considered when metricsType = “sum” or when CI.type = “classic”.
boot.R: numeric value indicating the number of boostrap replicates (default: 10,000). Higher values require more time for computing CI (see also ?boot). This argument is not considered when metricsType = “sum” or when CI.type = “classic”.
advancedMetrics: logical value indicating wether additional epoch-by-epoch metrics (i.e., PPV, NPV, kappa, PABAK, bias, and prevalence index) should be reported (default=FALSE, requires to install the epiR package).
doROC: logical value indicating wether the receiver operating characteristics (ROC) curves should be plotted (default: FALSE, if TRUE requires to install the ROCR package).
digits: numeric value specifying the decimal precision for the output (default: 2).

Show function

groupEBE <- function(data=NA,idCol="subject",RefCol="reference",deviceCol="device",
                     stage=5,stageLabel="sleep",metricsType="avg",CI.type="classic",CI.level=.95,boot.type="basic",boot.R=10000,
                     advancedMetrics=FALSE,doROC=FALSE,digits=2){
  
  # renaming variables
  colnames(data) <- gsub(idCol,"ID",colnames(data))
  colnames(data) <- gsub(RefCol,"ref",colnames(data))
  colnames(data) <- gsub(deviceCol,"device",colnames(data))
  
  # ............................
  # DYCHOTOMIC RECODING
  # ............................
    
  # reference dychotomous recoding
  data$ref_EBE <- NA
  data[data$ref!=stage,"ref_EBE"] = 0 # other stages = 0
  data[data$ref==stage,"ref_EBE"] = 1 # target stage = 1
    
  # device dychotomous recoding
  data$device_EBE = NA
  data[data$device==stage,"device_EBE"] = 1 # target stage = 1
  data[data$device!=stage,"device_EBE"] = 0 # other stages = 0
  
  generateAccMetrics <- function(data,advancedMetrics){
    
    # ............................
    # CONFUSION MATRIX
    # ............................
    
    errMatrix <- data.frame(device=c("positive","negative"),
                            ref_positive=c(nrow(data[data$device_EBE==1 & data$ref_EBE==1,]), # true wake
                                           nrow(data[data$device_EBE==0 & data$ref_EBE==1,])), # false sleep
                            ref_negative=c(nrow(data[data$device_EBE==1 & data$ref_EBE==0,]), # false wake
                                           nrow(data[data$device_EBE==0 & data$ref_EBE==0,]))) # true sleep
    
    # ............................
    # BASIC EBE METRICS
    # ............................
    
    # accuracy
    accuracy <- round(100*(errMatrix[1,2]+errMatrix[2,3])/
                        (errMatrix[1,2]+errMatrix[2,2]+errMatrix[1,3]+errMatrix[2,3]),digits)
    # sensitivity
    sensitivity <- round(100*errMatrix[1,2]/(errMatrix[1,2]+errMatrix[2,2]),digits)
    # specificity
    specificity <- round(100*errMatrix[2,3]/(errMatrix[2,3]+errMatrix[1,3]),digits)
    
    # ............................
    # ADDITIONAL EBE METRICS
    # ............................
    
    if(advancedMetrics==TRUE){
      
      require(epiR)
    
      # prevalence
      prevalence <- round(100*(errMatrix[1,2]+errMatrix[2,2])/
                            (errMatrix[1,2]+errMatrix[2,2]+errMatrix[1,3]+errMatrix[2,3]),2)
      # positive predictive value
      ppv <- round(100*(sensitivity*prevalence)/
                     (sensitivity*prevalence + (100-specificity)*(100-prevalence)),digits)
      # negative predictive value
      npv <- round(100*(specificity*(100-prevalence))/
                     ((100-prevalence)* specificity + prevalence*(100-sensitivity)),digits)
      # Cohen's kappa
      kappa <- round(epi.kappa(as.table(as.matrix(errMatrix[1:2,2:3])))$kappa$est,digits)
      # PABAK
      pabak <- round(epi.kappa(as.table(as.matrix(errMatrix[1:2,2:3])))$pabak$est,digits)
      # bias index
      bindex <- round(epi.kappa(as.table(as.matrix(errMatrix[1:2,2:3])))$bindex$est,digits)
      # prevalence index
      pindex <- round(epi.kappa(as.table(as.matrix(errMatrix[1:2,2:3])))$pindex$est,digits)
      
      # creating dataframe with basic and advanced metrics
      results <- data.frame(stage=stageLabel,
                            accuracy=accuracy,sensitivity=sensitivity,specificity=specificity,
                            ppv=ppv,npv=npv,kappa=kappa,pabak=pabak,pindex=pindex)
      
      } else { # only basic metrics
        results <- data.frame(stage=stageLabel,
                            accuracy=accuracy,sensitivity=sensitivity,specificity=specificity) } 
    
    return(results)
  }
  
  # generating metrics from absolute error matrix
  if(metricsType=="sum"){ 
    results <- generateAccMetrics(data,advancedMetrics)
      
  # generating metrics from averaged error matrix    
  } else if(metricsType=="avg"){
    # empty dataframe to be filled with individual EBE metrics
    results <- as.data.frame(matrix(nrow=0,ncol=ncol(generateAccMetrics(data,advancedMetrics))))
    for(ID in levels(as.factor(data$ID))){
      results <- rbind(results,
                       data.frame(ID,generateAccMetrics(data[data$ID==ID,],advancedMetrics))) 
      }
    
    # function to print EBE metrics info as "mean (sd) [CI]"
    printInfo <- function(results,metric){
      require(DescTools)
      m <- round(mean(results[,metric],na.rm=TRUE),digits)
      s <- round(sd(results[,metric],na.rm=TRUE),digits)
      if(CI.type=="classic"){ require(DescTools)
        CI.lo <- round(MeanCI(results[,metric],method=CI.type,btype=boot.type,R=boot.R)[2],digits)
        CI.up <- round(MeanCI(results[,metric],method=CI.type,btype=boot.type,R=boot.R)[3],digits)
      } else if (CI.type=="boot"){ library(boot)
      CI.lo <- round(boot.ci(boot(results[,metric],
                                  function(dat,idx)mean(dat[idx],na.rm=TRUE),R=boot.R),
                             type=boot.type,conf=CI.level)[[4]][4],digits)
      CI.up <- round(boot.ci(boot(results[,metric],
                                function(dat,idx)mean(dat[idx],na.rm=TRUE),R=boot.R),
                             type=boot.type,conf=CI.level)[[4]][5],digits)
      } else { stop("Error: CI.type can be either 'classic' or 'boot'") }
    paste(m," (",s,") [",CI.lo,", ",CI.up,"]",sep="") }
    
    # creating and filling data.frame of EBE metrics information
    out <- as.data.frame(matrix(nrow=0,ncol=ncol(results)-2))
    for(i in 1:ncol(out)){ out[1,i] <- printInfo(results,colnames(results)[i+2]) } 
    colnames(out) <- colnames(results)[3:ncol(results)]
    results <- cbind(stage=stageLabel,out)
    
  } else { stop("Error: metricsType argument can be either 'avg' or 'sum'") }
  
  # ROC curves
  if(doROC==TRUE){
    
    require(ROCR)
    p <- plot(performance(prediction(data$device_EBE,data$ref_EBE),"tpr","fpr"),lwd=2,
              main=paste(stageLabel,"- ROC curves"),xlab="1-specificity",ylab="sensitivity")
    abline(0,1,col="darkgray")
    res <- list(results,p)
  }
  
  return(results)
  
}

SLEEP STAGING

BASIC

Here, we use the function to generate the basic EBE metrics by assuming that sleep staging is allowed by the device. The metricsType argument is set as “avg” to report individual EBE metrics averaged between subjects. The rbind function is used to concatenate the information on each stage in a single output. Some arguments are omitted since default settings are used for each stage.

rbind(groupEBE(data=raw.data,stage=0,stageLabel="wake",metricsType="avg",CI.type="boot",advancedMetrics=FALSE),
      groupEBE(data=raw.data,stage=1,stageLabel="light",metricsType="avg",CI.type="boot",advancedMetrics=FALSE),
      groupEBE(data=raw.data,stage=2,stageLabel="deep",metricsType="avg",CI.type="boot",advancedMetrics=FALSE),
      groupEBE(data=raw.data,stage=3,stageLabel="REM",metricsType="avg",CI.type="boot",advancedMetrics=FALSE))

Alternatively, the metricsType argument can be set as “sum” to report EBE metrics based on the absolute error matrix (i.e., using the total sum of epochs in each condition, without distinguishing between subjects).

rbind(groupEBE(data=raw.data,stage=0,stageLabel="wake",metricsType="sum",advancedMetrics=FALSE),
      groupEBE(data=raw.data,stage=1,stageLabel="light",metricsType="sum",advancedMetrics=FALSE),
      groupEBE(data=raw.data,stage=2,stageLabel="deep",metricsType="sum",advancedMetrics=FALSE),
      groupEBE(data=raw.data,stage=3,stageLabel="REM",metricsType="sum",advancedMetrics=FALSE))

ADVANCED

Here, we use the function to generate both the basic and the additional accuracy metrics (see the main article) by assuming that sleep staging is allowed by the device. The metricsType argument is set as “avg” to report individual EBE metrics averaged between subjects. The rbind function is used to concatenate the information on each stage in a single output. Some arguments are omitted since default settings are used for each stage.

rbind(groupEBE(data=raw.data,stage=0,stageLabel="wake",metricsType="avg",CI.type="boot",advancedMetrics=TRUE),
      groupEBE(data=raw.data,stage=1,stageLabel="light",metricsType="avg",CI.type="boot",advancedMetrics=TRUE),
      groupEBE(data=raw.data,stage=2,stageLabel="deep",metricsType="avg",CI.type="boot",advancedMetrics=TRUE),
      groupEBE(data=raw.data,stage=3,stageLabel="REM",metricsType="avg",CI.type="boot",advancedMetrics=TRUE))

rbind(groupEBE(data=raw.data,stage=0,stageLabel="wake",metricsType="sum",advancedMetrics=TRUE),
      groupEBE(data=raw.data,stage=1,stageLabel="light",metricsType="sum",advancedMetrics=TRUE),
      groupEBE(data=raw.data,stage=2,stageLabel="deep",metricsType="sum",advancedMetrics=TRUE),
      groupEBE(data=raw.data,stage=3,stageLabel="REM",metricsType="sum",advancedMetrics=TRUE))

SLEEP/WAKE

BASIC

Here, we use the function to generate the basic EBE metrics by assuming that only sleep/wake patterns (and not sleep staging) are allowed by the device (note that in this case wake and sleep information are symmetrical). The metricsType argument is set as “avg” to report individual EBE metrics averaged between subjects. The rbind function is used to concatenate the information on each stage in a single output. Some arguments are omitted since default settings are used for each stage. Note that only sleep-related metrics (and not wake-related metrics) are reported, since with binary data the metrics for a given category mirror those of the other category (i.e., wake sensitivity is sleep specificity and vice versa, whereas accuracy is the same for both categories).

groupEBE(data=dic.data,stage=5,stageLabel="sleep",metricsType="avg",CI.type="boot",advancedMetrics=FALSE)

groupEBE(data=dic.data,stage=5,stageLabel="sleep",metricsType="sum",advancedMetrics=FALSE)

ADVANCED

Here, we use the function to generate both the basic and the “advanced” EBE metrics by assuming that only sleep/wake patterns (and not sleep staging) are allowed by the device (note that in this case wake and sleep information are symmetrical). The metricsType argument is set as “avg” to report individual EBE metrics averaged between subjects. The rbind function is used to concatenate the information on each stage in a single output. Some arguments are omitted since default settings are used for each stage. Note that only sleep-related metrics (and not wake-related metrics) are reported, since with binary data the metrics for a given category mirror those of the other category (i.e., wake sensitivity is sleep specificity and vice versa, whereas accuracy is the same for both categories).

groupEBE(data=dic.data,stage=5,stageLabel="sleep",metricsType="avg",CI.type="boot",advancedMetrics=TRUE)

groupEBE(data=dic.data,stage=5,stageLabel="sleep",metricsType="sum",advancedMetrics=TRUE)

3.3.1. Data visualization

Here, we use the function by setting doROC = TRUE to plot the ROC curves for each stage. Some arguments are omitted since default settings are used for each stage.

SLEEP STAGING

Here, the function is applied by assuming that sleep staging is allowed by the device.

WAKE

groupEBE(data = raw.data, stage = 0, stageLabel = "wake", metricsType = "sum", doROC = TRUE)[[2]]

## [1] 90.93

LIGHT SLEEP

groupEBE(data = raw.data, stage = 1, stageLabel = "light", metricsType = "sum", doROC = TRUE)[[2]]

## [1] 68.32

DEEP SLEEP

groupEBE(data = raw.data, stage = 2, stageLabel = "deep", metricsType = "sum", doROC = TRUE)[[2]]

## [1] 84.51

REM SLEEP

groupEBE(data = raw.data, stage = 3, stageLabel = "REM", metricsType = "sum", doROC = TRUE)[[2]]

## [1] 88.13

SLEEP/WAKE

Here, the function is applied by assuming that only sleep/wake patterns (and not sleep staging) are allowed by the device (note that in this case wake and sleep information are symmetrical).

WAKE

groupEBE(data = dic.data, stage = 10, stageLabel = "wake", metricsType = "sum", doROC = TRUE)[[2]]

## [1] 90.93

SLEEP

groupEBE(data = dic.data, stage = 5, stageLabel = "sleep", metricsType = "sum", doROC = TRUE)[[2]]

## [1] 90.93

References

Bland, J. M., & Altman, D. G. (1999). Measuring agreement in method comparison studies. Statistical Methods in Medical Research, 8(2), 135–160. https://doi.org/10.1177/096228029900800204

Depner, C. M., Cheng, P. C., Devine, J. K., Khosla, S., de Zambotti, M., Robillard, R., Vakulin, A., & Drummond, S. P. A. (2019). Wearable Technologies for Developing Sleep and Circadian Biomarkers: A Summary of Workshop Discussions. Sleep. https://doi.org/10.1093/sleep/zsz254

de Zambotti, M., Cellini, N., Goldstone, A., Colrain, I. M., & Baker, F. C. (2019). Wearable Sleep Technology in Clinical and Research Settings. Medicine & Science in Sports & Exercise, 51(7), 1538–1557. https://doi.org/10.1249/MSS.0000000000001947

Euser, A. M., Dekker, F. W., & le Cessie, S. (2008). A practical approach to Bland-Altman plots and variation coefficients for log transformed variables. Journal of Clinical Epidemiology, 61(10), 978–982. https://doi.org/10.1016/j.jclinepi.2007.11.003

Required packages

Attali, D., & Baker, C. (2019). ggExtra: Add Marginal Histograms to ‘ggplot2’, and More ‘ggplot2’ Enhancements. R package version 0.9.
https://cran.r-project.org/web/packages/ggExtra

Canty, A., & Ripley, B. (2019). boot: Bootstrap R (S-Plus) Functions. R package version 1.3-23. https://cran.r-project.org/web/packages/boot

Wickham, H. (2007). Reshaping Data with the reshape Package. Journal of Statistical Software, 21(12), 1-20.
https://cran.r-project.org/web/packages/reshape2

Lehnert, B. (2015). BlandAltmanLeh: Plots (Slightly Extended) Bland-Altman Plots. R package version 0.3.1.
https://cran.r-project.org/web/packages/BlandAltmanLeh

Signorell, A. et al. (2020). DescTools: Tools for descriptive statistics. R package version 0.99.32.
https://cran.r-project.org/web/packages/DescTools

Sing T., Sander O, Beerenwinkel, N., & Lengauer, T. (2005). ROCR: visualizing classifier performance in R. Bioinformatics, 21(20), 3940-3941.
https://cran.r-project.org/web/packages/ROCR

Stevenson, M., Nunes, T., Heuer, C., …, & Reynard, C. (2019). epiR: Tools for the Analysis of Epidemiological Data. R package version 1.0-10.
https://cran.r-project.org/web/packages/epiR

Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. New York: Springer-Verlag. R package version 3.2.1.
https://cran.r-project.org/web/packages/ggplot2/index.html