Google Research has unveiled SensorLM, a family of foundation models designed to take raw data streaming from wearable sensors and turn it into natural language descriptions.
The paper, titled “SensorLM: Learning the Language of Wearable Sensors,” appeared on arXiv on July 28, 2025. It describes a system trained on 59.7 million hours of multimodal sensor data collected from 103,643 consenting users across 127 countries, using Fitbit and Pixel Watch devices.
What SensorLM actually does
SensorLM is a sensor-language model that maps raw physiological signals directly to natural language descriptions. The system uses a hierarchical captioning pipeline to auto-generate detailed descriptions from various sensor statistics, producing what Google describes as one of the largest known sensor-language datasets to date.
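To make "captions generated from sensor statistics" concrete, here is a minimal sketch of the idea. It is not Google's pipeline. The two levels (a statistical summary and a trend description), the function names, the 5% tolerance, and the sample heart-rate values are all assumptions chosen for illustration.

```python
from statistics import mean

def statistical_caption(name, unit, values):
    """Lowest level (hypothetical): summarize a channel with simple statistics."""
    return (f"{name} averaged {mean(values):.0f} {unit} "
            f"(min {min(values):.0f}, max {max(values):.0f}).")

def trend_caption(name, values, tolerance=0.05):
    """Next level (hypothetical): compare the first and second half of the window.

    Needs at least two samples so that both halves are non-empty.
    """
    half = len(values) // 2
    first, second = mean(values[:half]), mean(values[half:])
    change = (second - first) / first if first else 0.0
    if change > tolerance:
        direction = "rose"
    elif change < -tolerance:
        direction = "fell"
    else:
        direction = "stayed roughly flat"
    return f"{name} {direction} over the window."

def hierarchical_caption(name, unit, values):
    """Join the levels, from raw statistics up to a coarser description."""
    return " ".join([
        statistical_caption(name, unit, values),
        trend_caption(name, values),
    ])

# Made-up heart-rate samples, not data from the paper.
heart_rate = [62, 64, 70, 88, 104, 110]
caption = hierarchical_caption("Heart rate", "bpm", heart_rate)
print(caption)
```

Because the captions are computed rather than written by hand, a pipeline along these lines can label as many sensor windows as there is data, which is presumably what makes a dataset of this size feasible.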
Among its headline capabilities is zero-shot activity recognition, meaning the model can identify what a user is doing without ever being explicitly trained on labeled examples of those specific activities. It also handles cross-modal retrieval, matching sensor readings to text descriptions and vice versa, and generates captions summarizing physiological states.
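The article does not spell out how the zero-shot step works. One common approach for models that pair signals with text is to embed both in a shared vector space and pick the label whose text embedding sits closest to the sensor embedding. The sketch below assumes that approach. The three-dimensional vectors are made up and stand in for the outputs of a sensor encoder and a text encoder, and none of the names come from the paper.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def zero_shot_classify(sensor_embedding, label_embeddings):
    """Return the best-matching label and the similarity score for every label."""
    scores = {
        label: cosine(sensor_embedding, emb)
        for label, emb in label_embeddings.items()
    }
    return max(scores, key=scores.get), scores

# Toy text embeddings for candidate activity labels (illustrative values only).
label_embeddings = {
    "running": [0.9, 0.1, 0.0],
    "sleeping": [0.0, 0.2, 0.9],
    "cycling": [0.6, 0.7, 0.1],
}
# Toy embedding of one window of sensor data (illustrative values only).
sensor_embedding = [0.8, 0.2, 0.1]

best, scores = zero_shot_classify(sensor_embedding, label_embeddings)
print(best)
```

Adding a new activity under this scheme means adding a new text label, not collecting labeled sensor recordings. Running the same similarity search in the other direction, from a text query over many sensor embeddings, gives a simple form of cross-modal retrieval.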
Google says SensorLM surpasses both previous specialized methodologies and general-purpose large language models on sensor understanding tasks. The model also demonstrated strong scaling behavior: feeding it more data and more compute consistently improved performance.
The data behind it
The training data spans 59.7 million hours of de-identified wearable sensor readings, equivalent to roughly 6,800 years of recording when summed across all users. The data was collected between March 1 and May 1, 2024, from users in 127 countries who provided explicit consent for research use.
This isn’t Google’s first pass at wearable foundation models. Previous research published on November 20, 2024, detailed efforts using over 40 million hours of data from 165,000 users. SensorLM trains on more data, 59.7 million hours against the earlier 40 million-plus, while drawing on a smaller user pool: 103,643 users, down from 165,000.
The original tweet announcing the paper referenced “over one trillion minutes” of wearable data from “five million people.” The paper on arXiv, a preprint server that does not peer-review submissions, cites 59.7 million hours (roughly 3.58 billion minutes) from 103,643 users. The paper’s figures are the ones worth anchoring to.
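The size of that gap is easy to check. The short calculation below uses only the figures quoted in this article. The variable names are ours, and the tweet's "over one trillion" and "five million" are treated as exact lower bounds for the sake of the comparison.

```python
PAPER_HOURS = 59_700_000            # hours of sensor data cited in the paper
PAPER_USERS = 103_643               # users cited in the paper
TWEET_MINUTES = 1_000_000_000_000   # "over one trillion minutes" (lower bound)
TWEET_USERS = 5_000_000             # "five million people"

# Convert the paper's hours to minutes so both figures share a unit.
minutes = PAPER_HOURS * 60

# How many times larger the tweet's figures are than the paper's.
hours_ratio = TWEET_MINUTES / minutes
user_ratio = TWEET_USERS / PAPER_USERS

print(f"Paper: {minutes / 1e9:.2f} billion minutes")
print(f"Tweet minutes vs. paper: about {hours_ratio:.0f}x")
print(f"Tweet users vs. paper: about {user_ratio:.0f}x")
```

On these numbers the tweet's data volume is a few hundred times the paper's and its user count a few dozen times larger. That is too wide a gap to be rounding, which suggests the tweet was describing a different or broader pool of data than the one used for training.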
Why this matters beyond the lab
Traditional activity recognition models need labeled training data for every activity they want to detect. Zero-shot approaches sidestep that bottleneck by learning general patterns that transfer across activities, which expands the range of behaviors and health states a wearable can interpret without manual curation.
Any clinical application of SensorLM, such as flagging atrial fibrillation patterns or predicting metabolic changes, would require regulatory review before deployment. The model currently lives in research territory, not clinical use.
Disclosure: This article was edited by Editorial Team. For more information on how we create and review content, see our Editorial Policy.
