Simulation-Based Own Voice Detection for Hearing Aids Achieves High Accuracy with Single Microphone
A novel machine learning approach enables accurate own voice detection (OVD) in hearing aids using only a single microphone, a significant advancement that could reduce device complexity and cost. The method, detailed in a new paper (arXiv:2603.02724v1), employs a data augmentation strategy based on simulated acoustic transfer functions (ATFs) to train a transformer-based classifier, bypassing the need for expensive real-world acoustic measurements. This simulation-to-reality pipeline achieved up to 95.52% accuracy on simulated test data, demonstrating strong potential for practical application in next-generation hearing devices.
Own voice detection is a critical feature that improves user comfort and speech intelligibility by allowing the hearing aid to process the wearer's voice differently from external sounds. However, many existing solutions depend on multiple microphones or additional sensors, which increase power consumption, design complexity, and manufacturing costs. This research addresses a core challenge in hearing aid design: enabling robust, ML-based OVD with minimal hardware requirements.
Hierarchical Training with Simulated Acoustic Environments
The core innovation is a hierarchical training framework that trains the model on progressively more realistic simulated data. Researchers first trained a transformer-based classifier on ATFs generated from an analytical rigid-sphere model, then progressively fine-tuned it on ATFs from detailed, numerically simulated head-and-torso representations.
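The augmentation idea behind this pipeline can be sketched as follows. The paper's actual ATF simulation is far more sophisticated; here the "ATFs" are placeholder FIR impulse responses, chosen only to illustrate that own voice and external speech reach an ear-level microphone through different acoustic paths:

```python
def convolve(signal, impulse_response):
    """Direct-form FIR convolution: y[n] = sum_k h[k] * x[n - k]."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out

# Placeholder impulse responses (illustrative only, not rigid-sphere solutions):
# own voice arrives earlier and stronger than a distant external talker.
own_voice_ir = [0.9, 0.3, 0.1]            # short, direct path
external_ir = [0.0, 0.0, 0.2, 0.4, 0.2]   # delayed, more diffuse path

clean_speech = [0.0, 1.0, 0.5, -0.5, -1.0, 0.0]

# Each clean utterance yields labeled training examples for the classifier.
own_voice_example = convolve(clean_speech, own_voice_ir)   # label: own voice
external_example = convolve(clean_speech, external_ir)     # label: external
```

Because the filtering is cheap, one clean corpus can be expanded into many spatial variants by drawing different simulated ATFs, which is what makes the augmentation scalable.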
This method of hierarchical adaptation allowed the model to refine its understanding of how sound propagates spatially around a human head without ever needing real-user data during initial training. By exposing the model to a vast range of simulated spatial conditions, the approach ensures strong generalization capabilities before encountering real-world signals.
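A minimal sketch of the two-stage adaptation, with a toy logistic classifier and synthetic 2-D features standing in for the paper's transformer and ATF-derived inputs (all data and hyperparameters here are illustrative assumptions):

```python
import math
import random

def train(examples, weights, lr, epochs):
    """Plain gradient descent on a logistic-regression model."""
    for _ in range(epochs):
        for x, y in examples:
            z = sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y  # derivative of log loss w.r.t. z
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
    return weights

random.seed(0)
# Stage 1: coarse "rigid-sphere" data (first feature is a bias term).
stage1 = [([1.0, random.gauss(1.0, 0.3)], 1) for _ in range(50)] + \
         [([1.0, random.gauss(-1.0, 0.3)], 0) for _ in range(50)]
# Stage 2: more realistic "head-and-torso" data with a shifted distribution.
stage2 = [([1.0, random.gauss(0.8, 0.4)], 1) for _ in range(50)] + \
         [([1.0, random.gauss(-0.8, 0.4)], 0) for _ in range(50)]

w = train(stage1, [0.0, 0.0], lr=0.1, epochs=5)   # pretrain on coarse ATFs
w = train(stage2, w, lr=0.02, epochs=5)           # fine-tune on detailed ATFs
```

The second stage starts from the first stage's weights and uses a smaller learning rate, the usual fine-tuning recipe for adapting to a nearby distribution without discarding what was already learned.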
Strong Performance in Simulated and Real-World Tests
The experimental results validate the effectiveness of the simulation-based strategy. On a test set of simulated head-and-torso data, the model achieved a high accuracy of 95.52%. Performance remained robust under challenging, short-duration conditions, with the model maintaining 90.02% accuracy when processing one-second speech utterances.
Most notably, when evaluated on real hearing aid recordings, a scenario for which it was not directly fine-tuned, the model achieved 80% accuracy. This generalization was aided by a lightweight test-time feature compensation technique, which helps align real-world input with the characteristics of the simulated training data. This transfer from simulation to real device data underscores the model's practical viability.
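The paper's exact compensation technique is not spelled out here; a common lightweight approach it may resemble is moment matching, where real-world features are shifted and scaled toward the statistics of the simulated training features. A sketch under that assumption:

```python
def mean_std(values):
    """Mean and (population) standard deviation of a feature sequence."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    return m, var ** 0.5

def compensate(real_features, train_mean, train_std):
    """Standardize real features, then map them onto the training statistics."""
    m, s = mean_std(real_features)
    return [train_mean + train_std * (v - m) / s for v in real_features]

# Hypothetical 1-D features: the real recording follows the same pattern as the
# simulated training data but lives in a shifted, rescaled domain.
simulated_train = [0.2, 0.4, 0.6, 0.8]
real_recording = [2.1, 2.5, 2.9, 3.3]

train_m, train_s = mean_std(simulated_train)
aligned = compensate(real_recording, train_m, train_s)
```

Because it only needs per-utterance statistics at inference time, this kind of compensation adds negligible compute, which matters on hearing aid hardware.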
Why This Hearing Aid Research Matters
- Reduces Hardware Dependency: Achieving high-accuracy own voice detection with a single microphone simplifies hearing aid design, potentially lowering costs and improving device form factors.
- Enables Scalable ML Development: The simulation-based data augmentation strategy eliminates the need for costly and logistically difficult transfer-function measurements from human subjects, accelerating machine learning research for acoustics.
- Demonstrates Simulation-to-Reality Success: The strong performance on real recordings shows that models trained on sophisticated simulations can generalize to complex, real-world auditory environments.
- Points to Future Innovation: This work establishes a promising direction for integrating advanced, data-efficient AI models into wearable audio technology, improving user experience through software intelligence.
This research highlights a significant step toward more intelligent and accessible hearing assistance technology. By leveraging simulated acoustic environments and a hierarchical training regimen, it provides a viable path to deploying robust own voice detection in future hearing aid designs without compromising on performance or affordability.