“A performance validation of six commercial wrist-worn wearable sleep-tracking devices for sleep stage scoring compared to polysomnography”
Authors: An‑Marie Schyvens et al.
Journal: Sleep Advances (2025, volume 6)
This study evaluated the accuracy of six popular consumer wrist-worn sleep-tracking devices (Fitbit Charge 5, Fitbit Sense, Withings Scanwatch, Garmin Vivosmart 4, Whoop 4.0, and Apple Watch Series 8) in estimating sleep parameters and sleep stages, compared to polysomnography (PSG), the clinical gold standard for sleep assessment.
Methods
- Participants: 62 adults (52 males, 10 females; mean age 46 ± 12.6 years).
- Each participant underwent one night of simultaneous PSG and 2–4 wearable recordings.
- Devices were analyzed for total sleep time (TST), sleep efficiency (SE), wake after sleep onset (WASO), sleep onset latency (SOL), and sleep stage differentiation (wake, light, deep, and REM).
- Statistical comparisons included Bland–Altman plots, sensitivity/specificity, and Cohen’s kappa to measure agreement with PSG.
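The agreement statistics named above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code. The function names, the stage labels, and all example values are hypothetical and chosen only to show the calculations. It assumes epoch-by-epoch labels from PSG and a device are already time-aligned, and that Bland–Altman limits of agreement are taken as bias ± 1.96 standard deviations of the differences.

```python
from collections import Counter
from statistics import mean, stdev


def cohens_kappa(reference, device):
    """Cohen's kappa for two equal-length sequences of stage labels."""
    if len(reference) != len(device) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(reference)
    # Observed agreement: share of epochs where both labels match.
    observed = sum(r == d for r, d in zip(reference, device)) / n
    # Chance agreement: product of the marginal label frequencies.
    ref_counts = Counter(reference)
    dev_counts = Counter(device)
    expected = sum(ref_counts[k] * dev_counts[k] for k in ref_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences contain one identical label
    return (observed - expected) / (1 - expected)


def sleep_sensitivity_specificity(reference, device, wake_label="wake"):
    """Sensitivity: PSG sleep epochs scored as sleep by the device.
    Specificity: PSG wake epochs scored as wake by the device."""
    tp = fn = tn = fp = 0
    for r, d in zip(reference, device):
        ref_sleep = r != wake_label
        dev_sleep = d != wake_label
        if ref_sleep and dev_sleep:
            tp += 1
        elif ref_sleep:
            fn += 1
        elif dev_sleep:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


def bland_altman(device_values, reference_values):
    """Return (bias, lower limit, upper limit) for device minus reference."""
    diffs = [d - r for d, r in zip(device_values, reference_values)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Made-up example data (not from the study): ten epochs, four nights.
psg = ["wake", "wake", "light", "light", "deep",
       "deep", "rem", "rem", "wake", "light"]
dev = ["light", "wake", "light", "light", "deep",
       "light", "rem", "rem", "light", "light"]
kappa = cohens_kappa(psg, dev)
sens, spec = sleep_sensitivity_specificity(psg, dev)

tst_device = [420, 400, 450, 390]  # total sleep time in minutes
tst_psg = [400, 390, 420, 380]
bias, lower, upper = bland_altman(tst_device, tst_psg)
```

In this toy data the device scores two of the three PSG wake epochs as light sleep. That leaves sensitivity high and specificity low, and the Bland–Altman bias for TST is positive, which is what an overestimate of sleep time looks like.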
Key Findings
- Overall performance:
  - All wearables displayed significant differences from PSG in at least some parameters.
  - Most devices overestimated total sleep time and sleep efficiency, and underestimated wake and WASO.
- Accuracy across devices:
  - Fitbit Sense (κ = 0.42), Fitbit Charge 5 (κ = 0.41), and Apple Watch Series 8 (κ = 0.53) showed moderate agreement with PSG.
  - Whoop 4.0 (κ = 0.37), Withings Scanwatch (κ = 0.22), and Garmin Vivosmart 4 (κ = 0.21) showed only fair agreement.
- Sensitivity & Specificity:
  - All devices detected >90% of sleep epochs (high sensitivity).
  - Specificity (wake detection) was markedly lower, ranging 29–52%.
- Sleep stage detection:
  - Deep sleep (N3) and REM were more accurately identified than wake or light sleep.
  - The Apple Watch Series 8 achieved the highest REM accuracy (≈69%).
  - The Garmin Vivosmart 4 had the poorest stage differentiation.
- Data reliability:
  - Some devices (Garmin and Apple Watch) had partial data loss, though no malfunctions were observed.
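The "moderate" and "fair" labels attached to the κ values above match the widely used Landis and Koch bands. The summary does not say which scale the authors applied, so treating it as Landis and Koch is an assumption. The sketch below maps the κ values reported above to those bands; the function name `kappa_band` is hypothetical.

```python
def kappa_band(kappa):
    """Map a Cohen's kappa value to its Landis and Koch descriptive band."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Kappa values as reported in the summary above.
reported = {
    "Apple Watch Series 8": 0.53,
    "Fitbit Sense": 0.42,
    "Fitbit Charge 5": 0.41,
    "Whoop 4.0": 0.37,
    "Withings Scanwatch": 0.22,
    "Garmin Vivosmart 4": 0.21,
}
bands = {device: kappa_band(k) for device, k in reported.items()}
```

Two devices sit right at a band edge: Fitbit Charge 5 (0.41) is just inside "moderate" and Garmin Vivosmart 4 (0.21) is just inside "fair". A small change in κ would move either one to the band below.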
Interpretation
Although none of the devices matched the precision of PSG, several (especially Fitbit Sense, Fitbit Charge 5, and Apple Watch Series 8) show clinically acceptable accuracy for monitoring total sleep time and sleep efficiency. These devices could be useful for tracking general sleep trends and long-term changes in sleep architecture, though they remain unsuitable as replacements for PSG in clinical diagnosis.
Conclusions
- Current consumer wearables cannot replace PSG but can complement it for non-clinical sleep tracking and large-scale sleep research.
- Devices perform better in identifying sleep than in detecting wakefulness.
- Future improvements in sensor algorithms and data transparency are needed to enhance reliability.
- These findings provide one of the most comprehensive, up-to-date validations of modern consumer sleep trackers.
Note: This summary was generated with the assistance of Claude Opus 4.1 based on the original paper, with the aim of translating the research into practical insights for coaches and practitioners.