“A performance validation of six commercial wrist-worn wearable sleep-tracking devices for sleep stage scoring compared to polysomnography”

Authors: An‑Marie Schyvens et al.
Journal: Sleep Advances (2025, vol. 6)

This study evaluated how accurately six popular consumer wrist-worn sleep trackers (Fitbit Charge 5, Fitbit Sense, Withings Scanwatch, Garmin Vivosmart 4, Whoop 4.0, and Apple Watch Series 8) estimate sleep parameters and sleep stages compared with polysomnography (PSG), the clinical gold standard for sleep assessment.

Methods
  • Participants: 62 adults (52 males, 10 females; mean age 46 ± 12.6 years).
  • Each participant underwent one night of PSG while simultaneously wearing two to four of the devices.
  • Devices were analyzed for total sleep time (TST), sleep efficiency (SE), wake after sleep onset (WASO), sleep onset latency (SOL), and sleep stage differentiation (wake, light, deep, and REM).
  • Agreement with PSG was assessed with Bland–Altman plots, epoch-by-epoch sensitivity and specificity, and Cohen’s kappa.
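
To make the sleep parameters and the Bland–Altman comparison concrete, here is a minimal Python sketch. It assumes standard 30-second scoring epochs, a hypothetical one-letter stage code (W, L, D, R), and a hypnogram that runs from lights-off to lights-on. WASO is counted up to lights-on, although some scoring conventions stop at the final awakening. This is not the authors' analysis pipeline, and all numbers are invented for illustration.

```python
from statistics import mean, stdev

EPOCH_MIN = 0.5  # assumption: standard 30-second scoring epochs


def sleep_parameters(hypnogram, epoch_min=EPOCH_MIN):
    """TST, SE, WASO and SOL for one night.

    hypnogram: list of hypothetical stage codes "W", "L", "D", "R",
    one per epoch, from lights-off to lights-on (time in bed).
    """
    asleep = [stage != "W" for stage in hypnogram]
    if not any(asleep):
        raise ValueError("no sleep epochs in hypnogram")
    onset = asleep.index(True)             # index of the first sleep epoch
    tst = sum(asleep) * epoch_min          # total minutes asleep
    time_in_bed = len(hypnogram) * epoch_min
    sol = onset * epoch_min                # minutes taken to fall asleep
    # Wake from sleep onset to lights-on (one common convention).
    waso = sum(not s for s in asleep[onset:]) * epoch_min
    return {"TST": tst, "SE": 100 * tst / time_in_bed, "WASO": waso, "SOL": sol}


def bland_altman(device, psg):
    """Mean bias and 95% limits of agreement for device minus PSG."""
    if len(device) != len(psg) or len(device) < 2:
        raise ValueError("need at least two paired nights")
    diffs = [d - p for d, p in zip(device, psg)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


# Invented example data, not values from the study.
night = ["W"] * 20 + ["L"] * 60 + ["W"] * 10 + ["D"] * 40 + ["R"] * 30 + ["L"] * 40 + ["W"] * 4
params = sleep_parameters(night)
print(params)  # TST 85.0 min, SOL 10.0 min, WASO 7.0 min, SE about 83.3%

device_tst = [400, 420, 390, 450]  # minutes, one night per participant
psg_tst = [380, 400, 385, 420]
bias, lower, upper = bland_altman(device_tst, psg_tst)
print(f"bias {bias:.1f} min, limits of agreement {lower:.1f} to {upper:.1f} min")
```

A positive bias means the device reports more sleep than PSG, which is the direction the study found for most devices; the limits of agreement show how far a single night can stray from PSG even when the average bias is small.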

Key Findings
  • Overall performance:
    • All wearables differed significantly from PSG on at least some parameters.
    • Most devices overestimated total sleep time and sleep efficiency and underestimated wake time and WASO.
  • Accuracy across devices:
    • Fitbit Sense (κ = 0.42), Fitbit Charge 5 (κ = 0.41), and Apple Watch Series 8 (κ = 0.53) showed moderate agreement with PSG.
    • Whoop 4.0 (κ = 0.37), Withings Scanwatch (κ = 0.22), and Garmin Vivosmart 4 (κ = 0.21) showed only fair agreement.
  • Sensitivity and specificity:
    • All devices detected more than 90% of PSG-scored sleep epochs (high sensitivity).
    • Specificity (wake detection) was markedly lower, ranging from 29% to 52%.
  • Sleep stage detection:
    • Deep sleep (N3) and REM were more accurately identified than wake or light sleep.
    • The Apple Watch Series 8 achieved the highest REM accuracy (≈69%).
    • The Garmin Vivosmart 4 had the poorest stage differentiation.
  • Data reliability:
    • The Garmin and Apple Watch recordings showed some partial data loss, although no device malfunctions were observed.
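
The gap between high sensitivity and low specificity is what drives the overestimated sleep time. The sketch below, again assuming hypothetical stage codes, epoch-aligned 30-second recordings, and invented data, computes epoch-by-epoch sensitivity (PSG sleep epochs the device also scores as sleep), specificity (PSG wake epochs the device also scores as wake), and Cohen's kappa across the four stages. The verbal labels assume the Landis and Koch bands (0.21 to 0.40 fair, 0.41 to 0.60 moderate), which are consistent with the labels used in this summary; the paper itself may compute these figures differently.

```python
from collections import Counter

EPOCH_MIN = 0.5  # assumption: 30-second epochs


def sleep_wake_sens_spec(psg, device):
    """Epoch-by-epoch sensitivity (sleep) and specificity (wake) vs PSG."""
    if len(psg) != len(device):
        raise ValueError("hypnograms must be epoch-aligned")
    pairs = list(zip(psg, device))
    tp = sum(p != "W" and d != "W" for p, d in pairs)  # sleep scored as sleep
    fn = sum(p != "W" and d == "W" for p, d in pairs)  # sleep scored as wake
    tn = sum(p == "W" and d == "W" for p, d in pairs)  # wake scored as wake
    fp = sum(p == "W" and d != "W" for p, d in pairs)  # wake scored as sleep
    return tp / (tp + fn), tn / (tn + fp)


def cohens_kappa(psg, device):
    """Chance-corrected agreement across all stage labels."""
    n = len(psg)
    observed = sum(p == d for p, d in zip(psg, device)) / n
    psg_counts, dev_counts = Counter(psg), Counter(device)
    expected = sum(psg_counts[k] * dev_counts[k] for k in psg_counts) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when both use a single label")
    return (observed - expected) / (1 - expected)


def landis_koch(kappa):
    """Verbal label using the Landis and Koch bands (an assumption here)."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Invented night: the device misses 6 of 10 PSG wake epochs and
# scores 2 deep-sleep epochs as light sleep.
psg_night = ["W"] * 10 + ["L"] * 20 + ["D"] * 10 + ["R"] * 10
device_night = ["W"] * 4 + ["L"] * 26 + ["D"] * 8 + ["L"] * 2 + ["R"] * 10

sens, spec = sleep_wake_sens_spec(psg_night, device_night)
kappa = cohens_kappa(psg_night, device_night)
# Every missed wake epoch becomes extra "sleep" in the device's TST.
extra_sleep = (sum(s != "W" for s in device_night)
               - sum(s != "W" for s in psg_night)) * EPOCH_MIN
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")  # sensitivity 100%, specificity 40%
print(f"kappa {kappa:.2f} ({landis_koch(kappa)})")         # kappa 0.77 (substantial)
print(f"device overestimates TST by {extra_sleep} min")    # 3.0 min
```

Because each wake epoch the device misses is counted as sleep, low specificity translates directly into longer total sleep time and shorter WASO, which is the overestimation pattern the study reports for most devices.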

Interpretation

Although none of the devices matched PSG, several (especially Fitbit Sense, Fitbit Charge 5, and Apple Watch Series 8) showed clinically acceptable accuracy for monitoring total sleep time and sleep efficiency. These devices could be useful for tracking general sleep trends and long-term changes in sleep architecture, but they are not suitable replacements for PSG in clinical diagnosis.

Conclusions
  • Current consumer wearables cannot replace PSG but can complement it for non-clinical sleep tracking and large-scale sleep research.
  • Devices perform better in identifying sleep than in detecting wakefulness.
  • Future improvements in sensor algorithms and data transparency are needed to enhance reliability.
  • These findings provide one of the most comprehensive, up-to-date validations of modern consumer sleep trackers.

Note: This summary was generated with the assistance of Claude Opus 4.1 based on the original paper, with the aim of translating the research into practical insights for coaches and practitioners.

Niels de Vries