Aims and content

The present document supplements the article “A standardized framework for testing the performance of sleep-tracking technology: Step-by-step guidelines and open-source code”. It includes the code for running the essential steps to test how well a device under assessment (e.g., a consumer wearable sleep-tracking device) measures sleep compared with the gold-standard reference method (polysomnography, PSG) or an alternative reference method (e.g., actigraphy, sleep diary).

The functions described below are also available from the following public repository:

The document includes the following sections:

  1. Data structure: a sample dataset with the essential epoch-by-epoch data structure is loaded and considered the starting point for analyses.

  2. Discrepancy analysis: individual- and group-level sleep measures are generated for both the reference method and the device under assessment, along with the bias of the device relative to the reference, the limits of agreement (LoAs), and their 95% confidence intervals. Bland-Altman plots are also provided.

  3. Epoch-by-epoch analysis: error matrices (also referred to as confusion matrices), performance metrics at both the individual- and group-level, and secondary statistics are generated based on epoch-by-epoch data.
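The core computations behind steps 2 and 3 can be sketched in base R as follows. This is a minimal sketch, not the article's own functions: the names bland_altman() and ebe_sleep_wake() are hypothetical, the Bland-Altman part assumes the classical approach (differences roughly normal and unrelated to the size of the measure, confidence intervals from the large-sample approximation sqrt(3 s^2 / n) for each limit), and the epoch-by-epoch part only covers the binary sleep/wake case, treating sleep as the positive class.

```r
# Minimal sketch, not the article's own functions (names are hypothetical).

# Classical Bland-Altman statistics for one sleep measure (e.g., TST in
# minutes), given paired per-subject values from device and reference.
# Assumes differences are roughly normal and unrelated to measurement size.
bland_altman <- function(device, reference) {
  d <- device - reference              # positive = device overestimates
  n <- length(d)
  bias <- mean(d)
  s <- sd(d)
  est <- c(bias, bias - 1.96 * s, bias + 1.96 * s)
  # Standard errors: s / sqrt(n) for the bias, and the large-sample
  # approximation sqrt(3 * s^2 / n) for each limit of agreement
  se <- c(s / sqrt(n), rep(sqrt(3 * s^2 / n), 2))
  t_crit <- qt(0.975, df = n - 1)
  data.frame(estimate = est,
             ci_lower = est - t_crit * se,
             ci_upper = est + t_crit * se,
             row.names = c("bias", "LoA_lower", "LoA_upper"))
}

# Binary sleep/wake epoch-by-epoch metrics: all non-zero stage codes are
# collapsed into "sleep", which is treated as the positive class.
ebe_sleep_wake <- function(device, reference) {
  dev_sleep <- device != 0
  ref_sleep <- reference != 0
  tp <- sum(dev_sleep & ref_sleep)     # sleep scored as sleep
  tn <- sum(!dev_sleep & !ref_sleep)   # wake scored as wake
  fp <- sum(dev_sleep & !ref_sleep)    # wake scored as sleep
  fn <- sum(!dev_sleep & ref_sleep)    # sleep scored as wake
  acc <- (tp + tn) / (tp + tn + fp + fn)
  c(sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    accuracy    = acc,
    PPV         = tp / (tp + fp),
    NPV         = tn / (tn + fn),
    PABAK       = 2 * acc - 1)         # (k * acc - 1) / (k - 1) with k = 2
}

# Usage with made-up values (not the sample dataset)
bland_altman(device = c(400, 420, 380, 410), reference = c(390, 400, 390, 400))
ebe_sleep_wake(device = c(0, 1, 1, 2, 3, 0, 1, 0), reference = c(0, 1, 2, 2, 0, 0, 1, 1))
# sensitivity = 0.8, specificity = 2/3, accuracy = 0.75, PABAK = 0.5
```

Here sensitivity is the proportion of reference sleep epochs that the device scores as sleep, and specificity the proportion of reference wake epochs scored as wake. For multi-stage data, the same counts can be taken stage by stage in a one-vs-rest fashion; whether the article's functions use exactly this scheme should be checked against its repository.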

For each step of the pipeline, a function is provided that generates the respective output and can be applied to any dataset meeting the assumptions listed below.


  • PSG = Polysomnography

  • TIB = Time in bed

  • TST = Total Sleep Time

  • SOL = Sleep Onset Latency

  • SE = Sleep Efficiency

  • WASO = Wake After Sleep Onset

  • REM = Rapid Eye Movement

  • EBE = Epoch-by-Epoch

  • PPV = Positive Predictive Value

  • NPV = Negative Predictive Value

  • PABAK = Prevalence-Adjusted Bias-Adjusted Kappa

  • ROC = Receiver Operating Characteristic
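To make the sleep measures abbreviated above concrete, here is a minimal sketch of how they can be derived from one night of epoch-by-epoch stages coded 0 = wake, 1 = light, 2 = deep, 3 = REM. The function sleep_measures() is hypothetical, and the definitions are assumptions: SOL is taken as the time from lights-off to the first sleep epoch (some protocols require a run of consecutive sleep epochs instead), and WASO counts all wake epochs from sleep onset to lights-on, so that TIB = SOL + TST + WASO.

```r
# Hypothetical helper: sleep measures (minutes; SE in %) for one night,
# assuming the recording spans exactly lights-off to lights-on.
sleep_measures <- function(stages, epoch_sec = 30) {
  epoch_min <- epoch_sec / 60
  tib <- length(stages) * epoch_min
  sleep_idx <- which(stages != 0)        # any non-wake code counts as sleep
  if (length(sleep_idx) == 0) {
    # No sleep scored: SOL set to TIB and WASO to 0 (an assumption)
    return(c(TIB = tib, TST = 0, SOL = tib, SE = 0, WASO = 0,
             Light = 0, Deep = 0, REM = 0))
  }
  onset <- sleep_idx[1]                  # first sleep epoch = sleep onset
  tst <- length(sleep_idx) * epoch_min
  sol <- (onset - 1) * epoch_min
  # Wake epochs from sleep onset to the end of the recording (lights-on)
  waso <- sum(stages[onset:length(stages)] == 0) * epoch_min
  c(TIB   = tib,
    TST   = tst,
    SOL   = sol,
    SE    = 100 * tst / tib,
    WASO  = waso,
    Light = sum(stages == 1) * epoch_min,
    Deep  = sum(stages == 2) * epoch_min,
    REM   = sum(stages == 3) * epoch_min)
}

# Made-up 10-epoch night with 1-min epochs
sleep_measures(c(0, 0, 1, 1, 2, 0, 3, 3, 0, 1), epoch_sec = 60)
# TIB = 10, TST = 6, SOL = 2, SE = 60, WASO = 2, Light = 3, Deep = 1, REM = 2
```

Applying the same function to the device and the reference columns of each subject yields the paired values that a discrepancy analysis compares.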

Note also that, in the newer generation of multi-sensor sleep-tracking devices that provide sleep staging information, ‘Light Sleep’ is usually considered equivalent to PSG-derived N1 + N2 sleep, while ‘Deep Sleep’ is usually considered equivalent to PSG-derived N3 sleep. When testing the performance of a device, we recommend checking the sleep stage specifications with the device manufacturer.
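The stage equivalence above can be applied to PSG data before comparison. The sketch below is a hypothetical recoding helper, assuming PSG stages are stored as the text labels "W", "N1", "N2", "N3" and "R"; actual labels depend on the scoring software, and the mapping should be adjusted to the manufacturer's specification.

```r
# Hypothetical recoding of PSG stage labels into the device-style scheme
# 0 = wake, 1 = light (N1 + N2), 2 = deep (N3), 3 = REM.
psg_to_device_scheme <- function(psg_labels) {
  map <- c(W = 0, N1 = 1, N2 = 1, N3 = 2, R = 3)   # assumed label set
  codes <- unname(map[as.character(psg_labels)])
  # Labels not in the map become NA: stop rather than silently keep them
  if (anyNA(codes)) {
    stop("Unrecognized PSG label(s): ",
         paste(unique(psg_labels[is.na(codes)]), collapse = ", "))
  }
  codes
}

psg_to_device_scheme(c("W", "N1", "N2", "N3", "R"))   # 0 1 1 2 3
```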


Each of the following steps starts from a data structure that meets these assumptions:

- Measurement systems: sleep has been measured with both a device under assessment (e.g., a consumer sleep tracker) and a reference method (e.g., PSG)

- Epoch length: device and reference recordings have the same epoch length (e.g., 30 s or 1 min).

- Recording bounds: both device and reference recordings are confined to the period between lights-off and lights-on (i.e., both recordings share the same time in bed).

- Synchronization: the device and reference recordings have been synchronized at the epoch level and encoded using the same coding system (e.g., 0 = wake in both device and reference data).

- Staging: the device provides information on PSG-equivalent sleep staging, either as sleep/wake (typical of standard actigraphy) or as wake/light/deep/REM (typical of more modern consumer sleep trackers).

- Number of nights: only one night per subject has been recorded (the same procedures can be extended to multiple nights by aggregating night-by-night outcomes).

- Missing data: the dataset should not contain any missing data (i.e., both device and reference information must exist for each epoch).
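Some of these assumptions can be checked programmatically before running the pipeline. The sketch below is hypothetical: it assumes the long-format columns are named subject, epoch, device and reference and that stages use the codes 0 to 3. It only covers what is checkable from the data themselves (missing values, shared coding, one night per subject); epoch length, recording bounds and synchronization have to be verified at the data preparation stage.

```r
# Hypothetical pre-flight check of a long-format epoch-by-epoch dataset.
check_ebe_data <- function(data, codes = 0:3) {
  required <- c("subject", "epoch", "device", "reference")  # assumed names
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0)
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  # Missing data: every epoch needs both device and reference values
  if (any(is.na(data[required])))
    stop("Missing values found")
  # Same coding system: both columns use only the expected stage codes
  if (!all(data$device %in% codes) || !all(data$reference %in% codes))
    stop("Unexpected stage codes")
  # One night per subject: epoch identifiers must not repeat within a subject
  if (anyDuplicated(data[c("subject", "epoch")]) > 0)
    stop("Duplicated subject/epoch pairs")
  invisible(TRUE)
}
```

The function returns TRUE invisibly when all checks pass and stops with an informative message otherwise, so it can be called once at the top of an analysis script.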

Please refer to de Zambotti et al. (2019) and Depner et al. (2019) for guidelines and details about implementation and use of consumer sleep technology.

1. Data structure

As a first step, we load a sample dataset organized according to the data structure described in the main article.


Here, we show the dataset with sleep stage information, encoded as: 0 = wake, 1 = light sleep (N1 or N2), 2 = deep sleep (N3), and 3 = REM sleep. The dataset is in long format, with one column for the subject identifier, one column for the epoch identifier, and two columns for the device and the reference data, respectively.

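# Read the sample dataset and print it ("ebe_data" is a placeholder name: the original object name is missing)
(ebe_data <- read.csv("sample_data.csv"))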
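To show how this long format feeds the epoch-by-epoch analysis, the sketch below builds made-up toy data with the same layout (the column names subject, epoch, device and reference are assumptions, and the values are not the sample dataset) and derives an error matrix per subject with base R's table(). Summing the individual matrices gives one possible group-level matrix; the article's functions may aggregate differently, for example by averaging individual metrics.

```r
# Made-up toy data in the long format described above (not the sample dataset)
toy <- data.frame(
  subject   = rep(c("s01", "s02"), each = 6),
  epoch     = rep(1:6, times = 2),
  device    = c(0, 1, 1, 2, 3, 0,  0, 0, 1, 1, 3, 3),
  reference = c(0, 1, 2, 2, 3, 0,  0, 1, 1, 1, 3, 0)
)

stage_labels <- c("wake", "light", "deep", "REM")

# Error (confusion) matrix: rows = reference, columns = device.
# Fixed factor levels keep all four stages even if a stage never occurs.
error_matrix <- function(d) {
  table(reference = factor(d$reference, levels = 0:3, labels = stage_labels),
        device    = factor(d$device,    levels = 0:3, labels = stage_labels))
}

# One matrix per subject, then a pooled group-level matrix as their sum
per_subject  <- lapply(split(toy, toy$subject), error_matrix)
group_matrix <- Reduce(`+`, per_subject)
print(group_matrix)

# Overall epoch-by-epoch agreement across the four stages
sum(diag(group_matrix)) / sum(group_matrix)   # 9 / 12 = 0.75
```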