RAG-HAR: Retrieval Augmented Generation-based Human Activity Recognition

1University of Sydney, 2University of Sri Jayewardenepura, 3Curtin University, 4Colorado State University
*Equal Contribution
RAG-HAR Framework Overview

RAG-HAR framework: A training-free approach that leverages retrieval-augmented generation with large language models for human activity recognition from sensor data.

Abstract

Human Activity Recognition (HAR) underpins applications in healthcare, rehabilitation, fitness tracking, and smart environments, yet existing deep learning approaches demand dataset-specific training, large labeled corpora, and significant computational resources. We introduce RAG-HAR, a training-free retrieval-augmented framework that leverages large language models (LLMs) for HAR. RAG-HAR computes lightweight statistical descriptors, retrieves semantically similar samples from a vector database, and grounds LLM-based activity identification in this contextual evidence. We further enhance RAG-HAR in two ways: first, by applying prompt optimization and incorporating domain knowledge to guide LLMs effectively; and second, by introducing an LLM-based activity descriptor that generates context-enriched vector databases, delivering accurate and highly relevant contextual information. With these mechanisms, RAG-HAR achieves state-of-the-art performance across six diverse HAR benchmarks. Most importantly, RAG-HAR attains these improvements without requiring model training or fine-tuning, emphasizing its robustness and practical applicability.

RAG-HAR Framework

RAG-HAR introduces a two-stage framework for training-free human activity recognition: the Baseline Stage establishes retrieval-augmented classification, while the Optimization Stage enhances performance through prompt optimization and LLM-based activity descriptors.

Stage 1: Baseline RAG-HAR

  1. Temporal Partitioning: Each sensor window is divided into four temporal segments—Full, Start, Mid, and End—to capture both global patterns and localized temporal dynamics such as activity transitions.
  2. Statistical Descriptor Extraction: Raw sensor signals are transformed into eight statistical descriptors per channel across each partition: mean, max, min, first quartile (Q1), third quartile (Q3), standard deviation, median, and number of peaks. These lightweight features capture essential motion characteristics while remaining interpretable.
  3. Text Encoding & Embedding: Statistical descriptors are formatted into structured natural language templates and encoded using a text embedding model, generating dense vector representations for semantic similarity search.
  4. Multi-Vector Hybrid Search: RAG-HAR retrieves similar samples from a vector database using weighted re-ranking across all temporal partitions, combining global context with fine-grained temporal cues.
  5. LLM-based Classification: Retrieved exemplars are provided as in-context examples to an LLM, which leverages semantic reasoning to classify the query activity based on pattern matching with retrieved evidence.

Stage 2: Optimization

The Optimization Stage introduces two key enhancements:

  • Prompt Optimization: An evolutionary algorithm with roulette-wheel selection iteratively refines prompts using three strategies—Exploration (generate diverse variants), Combination (merge successful prompts), and Refinement (targeted improvements)—to maximize classification accuracy.
  • LLM-based Activity Descriptors: Rather than using simple programmatic templates that capture only shallow statistical features, we leverage an LLM to generate detailed natural language descriptors. The LLM receives raw time-series data alongside sensor configuration details (sampling rate, sensor placement, orientation, and axis mappings) and produces structured analyses covering dominant axis, motion smoothness, intensity, and transitions. These rich descriptors are then vectorized and indexed, providing semantically meaningful representations that significantly enhance retrieval relevance.
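The descriptor-generation prompt might be assembled along these lines. The field names and instruction wording here are hypothetical, not the paper's exact prompt.

```python
# Hypothetical prompt builder for the LLM-based activity descriptor.
# Field names and instruction wording are illustrative assumptions.

def build_descriptor_prompt(samples, sampling_rate_hz, placement, axis_map):
    """Pack raw samples and sensor configuration into one analysis prompt."""
    signal = "\n".join(",".join(f"{v:.3f}" for v in row) for row in samples)
    return (
        "You are analysing a wearable-sensor window.\n"
        f"Sampling rate: {sampling_rate_hz} Hz\n"
        f"Sensor placement: {placement}\n"
        f"Axis mapping: {axis_map}\n"
        "Raw signal (one sample per line, comma-separated):\n"
        f"{signal}\n\n"
        "Describe the dominant axis, motion smoothness, intensity, and any "
        "transitions, as a short structured analysis."
    )

prompt = build_descriptor_prompt(
    samples=[(0.01, 0.02, 9.81), (0.02, 0.01, 9.79)],
    sampling_rate_hz=50,
    placement="right wrist",
    axis_map={"x": "forward", "y": "left", "z": "up"},
)
```

The LLM's response, rather than the raw template, is what gets embedded and indexed in the context-enriched vector database.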

Demo Paper

Large Language Models Meet Wearable Sensing: A Demo of Training-Free HAR with RAG-HAR

A mobile application demonstrating RAG-HAR with real-time activity recognition from smartphone sensor data.

Results

We evaluate RAG-HAR on six diverse HAR benchmark datasets spanning locomotion activities, daily living tasks, exercises, and assembly-line operations. RAG-HAR achieves state-of-the-art performance across multiple benchmarks without any model training, outperforming supervised deep learning methods that require extensive labeled data and compute resources.

Performance Comparison on HAR Benchmarks (F1-score%)

These results are achieved with the baseline RAG-HAR; the optimization stage improves them further.

Dataset    RAG-HAR F1 (%)   Best Baseline F1 (%)
PAMAP2     91.12            90.40
MHEALTH    96.74            96.47
GOTOV      79.92            79.40
SKODA      95.21            94.80
USC-HAD    58.63            62.80
HHAR       59.86            59.25

RAG-HAR attains the best F1-score on five of the six datasets; the strongest supervised baseline leads only on USC-HAD.

Key Findings: RAG-HAR achieves state-of-the-art performance on most datasets without any training. It excels on datasets with diverse activity types and complex motion patterns, where semantic reasoning provides significant advantages. Notably, RAG-HAR adapts instantly to new datasets by simply updating the vector database.

Prompt Optimization

LLM performance is highly sensitive to prompt design. We introduce an iterative prompt optimizer that automatically discovers effective instructions through roulette-wheel selection and three strategies: Exploration (diverse generation), Combination (crossover of best prompts), and Refinement (fine-tuning). The LLM classifier remains fixed with no weight updates.
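Roulette-wheel (fitness-proportional) selection over candidate prompts can be sketched as below; the fitness values and uniform draws are toy inputs, not results from the paper.

```python
import bisect
import itertools

# Fitness-proportional ("roulette-wheel") selection over candidate prompts.
# Fitness is validation accuracy; the values here are illustrative only.

def roulette_pick(candidates, fitness, u):
    """Pick one candidate given a uniform draw u in [0, 1).

    Each candidate occupies a slice of [0, 1) proportional to its fitness,
    so better prompts are chosen more often but weaker ones still survive.
    """
    total = sum(fitness)
    cumulative = list(itertools.accumulate(f / total for f in fitness))
    return candidates[bisect.bisect_right(cumulative, u)]

prompts = ["prompt-A", "prompt-B", "prompt-C"]
accuracy = [0.96, 0.90, 0.81]   # toy validation accuracies
parent = roulette_pick(prompts, accuracy, 0.5)
```

The selected parents would then feed the Exploration, Combination, and Refinement strategies to produce the next generation of candidate prompts.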

Prompt Optimization Framework

Overview of the prompt optimization framework showing descriptor generation, candidate instruction creation, and iterative refinement.

Impact of Prompt Optimization (MHEALTH Dataset)

LLM Model      Before Optimization   After Optimization   Improvement
gpt-5-mini     96.91%                98.70%               +1.79 pp
gemma-27B-IT   90.00%                94.00%               +4.00 pp
gpt-oss-20b    81.00%                91.00%               +10.00 pp

Prompt optimization yields significant gains across model sizes. Lower-performing models show larger relative improvements, narrowing the gap with proprietary models.

Open-set Classification

Real-world HAR systems frequently encounter activities not present in the knowledge base. Traditional deep learning classifiers handle such novelty in a binary way, routing every out-of-distribution input to a single "unknown" class and collapsing diverse forms of novelty into one undifferentiated bucket. RAG-HAR instead performs open-set classification, leveraging LLM reasoning to detect, reject, and even generate meaningful labels for previously unseen activities.

Open-Set Detection

We evaluate open-set classification on the PAMAP2 dataset under three openness conditions (30%, 50%, and 70% of activities held out as unknown). RAG-HAR consistently outperforms baseline methods, demonstrating strong performance in both lower and higher openness settings.

Impact of Label Availability

Using a leave-one-class-out protocol, we investigate how label availability affects unseen activity recognition. When the true label is available in the prompt (as a semantic anchor), accuracy reaches 78.0%. When replaced with a generic "unseen activity" placeholder, accuracy drops to 63.2%, a decline of nearly 15 percentage points. This reveals that LLMs leverage the semantic knowledge embedded in class labels to improve unknown-activity detection.

Labeling Unseen Activities

Beyond detection, RAG-HAR can generate meaningful labels for novel activities—a capability unique to LLM-based approaches. The LLM generates a label, which is then mapped to the closest ground-truth class via cosine similarity between their text embeddings.
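The label-mapping step can be sketched as a nearest-neighbour lookup in embedding space. A real system would use a text-embedding model; the bag-of-words vectors below are a self-contained stand-in for illustration.

```python
import numpy as np
from collections import Counter

# Map an LLM-generated label to the closest ground-truth class via cosine
# similarity. The bag-of-words "embedding" is a toy stand-in for a real
# text-embedding model.

def embed(text, vocab):
    counts = Counter(text.lower().split())
    return np.array([counts[w] for w in vocab], dtype=float)

def closest_class(predicted, classes):
    vocab = sorted({w for c in classes + [predicted] for w in c.lower().split()})
    p = embed(predicted, vocab)
    def cos(a, b):
        d = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b) / d if d else 0.0
    return max(classes, key=lambda c: cos(p, embed(c, vocab)))

closest_class("slow walking upstairs",
              ["walking upstairs", "cycling", "rope jumping"])  # -> "walking upstairs"
```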

Semantic Proximity of LLM-predicted Labels
Semantic proximity of LLM-predicted labels to ground-truth activity classes.
Unknown Activity Labeling Accuracy
Labeling accuracy for unknown activities across different settings.

Unlike conventional DL models that collapse all unknowns into a single undifferentiated class, RAG-HAR leverages pretrained semantic knowledge to assign fine-grained, human-interpretable labels, effectively partitioning the unknown space into meaningful subcategories. This capability is critical for deploying HAR in healthcare monitoring, elderly care, and safety-critical applications, where encountering new activities is inevitable.

BibTeX

@inproceedings{sivaroopan2026raghar,
  title={RAG-HAR: Retrieval Augmented Generation-based Human Activity Recognition},
  author={Sivaroopan, Nirhoshan and Karunarathna, Hansi and Madarasingha, Chamara and Jayasumana, Anura and Thilakarathna, Kanchana},
  booktitle={IEEE International Conference on Pervasive Computing and Communications (PerCom)},
  year={2026}
}