The Vienna 4G/5G Drive-Test Dataset

The Vienna 4G/5G Drive-Test Dataset is a city-scale open dataset providing georeferenced LTE and 5G NR measurements collected across Vienna, Austria. It combines passive network-side scans with active user-side handset logs to offer a holistic foundation for environment-aware AI, propagation modeling, and network optimization. The dataset is enriched with inferred cell data and high-resolution 3D city models to support geometry-conditioned learning and ray-tracing calibration.

Vienna 4G/5G Drive-Test Dataset: A City-Scale Open Dataset for Mobile Network AI

Researchers have released the Vienna 4G/5G Drive-Test Dataset, a resource intended to accelerate AI development in telecommunications. This city-scale open dataset provides georeferenced Long Term Evolution (LTE) and 5G New Radio (NR) measurements collected across Vienna, Austria. It addresses a critical bottleneck in mobile network machine learning: the scarcity of large, real-world data. By combining passive network-side scans with active user-side handset logs, the dataset offers a reproducible foundation for environment-aware AI, propagation modeling, and network optimization.

Bridging the Data Gap with Dual-Perspective Measurements

The dataset's core innovation lies in its fusion of complementary data collection methodologies. It integrates passive wideband scanner observations, which capture the radio environment from a network perspective, with active handset logs that reflect the actual user experience. This dual-perspective approach provides a more complete picture of deployed radio access networks (RAN) than either method alone. The measurements were systematically collected across diverse urban and suburban settings in Vienna, with all data meticulously aligned by time and precise GPS location to ensure consistency for evaluation and model training.
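As a rough illustration of what time alignment between the two measurement streams can look like, the sketch below pairs each handset record with the scanner record nearest in time and drops pairs that are too far apart. The field names (t, rsrp_dbm, throughput_mbps), the sample values, and the tolerance are hypothetical and are not taken from the dataset's documentation; the release's own join instructions should take precedence.

```python
from bisect import bisect_left


def align_nearest(scanner, handset, tolerance_s=1.0):
    """Pair each handset record with the scanner record nearest in time.

    Both inputs are lists of dicts with a 't' key (seconds). The scanner
    list must be sorted by 't'. Pairs whose time gap exceeds tolerance_s
    are dropped.
    """
    times = [row["t"] for row in scanner]
    pairs = []
    for h in handset:
        i = bisect_left(times, h["t"])
        # Candidates: the scanner rows just before and just after h["t"].
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        if not candidates:
            continue
        best = min(candidates, key=lambda j: abs(times[j] - h["t"]))
        if abs(times[best] - h["t"]) <= tolerance_s:
            pairs.append((h, scanner[best]))
    return pairs


# Hypothetical records; field names are illustrative, not the dataset's schema.
scanner = [
    {"t": 0.0, "rsrp_dbm": -95.0},
    {"t": 1.0, "rsrp_dbm": -97.5},
    {"t": 5.0, "rsrp_dbm": -101.0},
]
handset = [
    {"t": 0.9, "throughput_mbps": 42.0},
    {"t": 3.0, "throughput_mbps": 17.0},
]
pairs = align_nearest(scanner, handset, tolerance_s=0.5)
```

With these made-up records, only the handset sample at t = 0.9 s finds a scanner sample within the 0.5 s tolerance. A real pipeline would likely also check that the GPS positions of the paired records agree.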

Enriched with Inferred Cell Data and High-Resolution City Models

To maximize its utility for advanced research, the dataset is enriched with critical contextual information. For a representative subset of base stations (BSs), the release includes inferred deployment descriptors such as estimated BS locations, sector azimuths, and antenna heights. Furthermore, it incorporates high-resolution 3D building and terrain models of Vienna. This geometric data is essential for enabling geometry-conditioned learning and for the calibration of deterministic approaches like ray tracing, which are crucial for accurate radio wave propagation prediction and network planning.
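To make the idea of geometry-conditioned features concrete, here is a minimal sketch that derives two simple inputs from an estimated BS location and sector azimuth: the great-circle distance to a measurement point and the angle between the BS-to-receiver bearing and the sector boresight. The coordinates and azimuth are hypothetical values chosen for illustration, and the dataset's actual field names and units may differ.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def distance_and_bearing(lat1, lon1, lat2, lon2):
    """Great-circle distance (m) and initial bearing (degrees clockwise
    from north) from point 1 to point 2, using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    dist = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    y = math.sin(dlmb) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb))
    bearing = math.degrees(math.atan2(y, x)) % 360.0
    return dist, bearing


def off_boresight_deg(bearing_deg, azimuth_deg):
    """Signed angle in [-180, 180) between the BS-to-receiver bearing
    and the sector azimuth."""
    return (bearing_deg - azimuth_deg + 180.0) % 360.0 - 180.0


# Hypothetical BS and measurement point (receiver due north of the BS).
bs_lat, bs_lon, sector_azimuth = 48.2082, 16.3738, 350.0
rx_lat, rx_lon = 48.2172, 16.3738

dist_m, bearing = distance_and_bearing(bs_lat, bs_lon, rx_lat, rx_lon)
off_axis = off_boresight_deg(bearing, sector_azimuth)
```

Features like these ignore buildings and terrain entirely; the 3D city models are what would let a model or ray tracer account for blockage along the path.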

Structured for Practical Reuse and Reproducible Benchmarking

The dataset is pragmatically organized into four main components—scanner data, handset data, estimated cell information, and city models—with accompanying documentation detailing the available fields and how to join them. This structure is designed to facilitate immediate practical reuse. By providing a standardized, real-world benchmark, the dataset empowers researchers to conduct reproducible evaluations across a wide range of workflows, including coverage analysis, propagation model development, and the calibration of simulation tools against ground-truth measurements.
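One of the simplest forms of calibrating a propagation model against measurements is fitting a log-distance path-loss model by least squares. The sketch below is illustrative only: the model form, the function name fit_log_distance, and the synthetic samples are assumptions made for this example, not the dataset's method or results.

```python
import math


def fit_log_distance(distances_m, rsrp_dbm, d0_m=1.0):
    """Least-squares fit of rsrp = p0 - 10 * n * log10(d / d0).

    Returns (p0, n): the received power at the reference distance d0
    and the path-loss exponent.
    """
    x = [10.0 * math.log10(d / d0_m) for d in distances_m]
    n_pts = len(x)
    mean_x = sum(x) / n_pts
    mean_y = sum(rsrp_dbm) / n_pts
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, rsrp_dbm))
    slope = sxy / sxx          # slope of the regression line equals -n
    exponent = -slope
    p0 = mean_y - slope * mean_x
    return p0, exponent


# Synthetic, noise-free samples generated with p0 = -40 dBm and n = 3.
distances = [10.0, 100.0, 1000.0]
rsrp = [-70.0, -100.0, -130.0]
p0, exponent = fit_log_distance(distances, rsrp)
```

On this noise-free synthetic input the fit recovers the generating parameters. Real drive-test data would show scatter from shadowing and antenna patterns, which is where the scanner data, estimated cell information, and city models become useful.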

Why This Matters: Key Takeaways for AI and Network Research

  • Addresses a Critical Bottleneck: The dataset directly tackles the lack of large, comprehensive real-world datasets that has long hindered machine learning progress in mobile network analysis and optimization.
  • Enables Holistic Analysis: The combination of network-side (scanner) and user-side (handset) data provides a unique, dual-perspective view essential for developing robust AI models that understand both infrastructure performance and end-user experience.
  • Unlocks Advanced Modeling: The inclusion of inferred base station parameters and high-resolution 3D city models supports research in data-driven propagation prediction and the AI-assisted calibration of physics-based models like ray tracing.
  • Promotes Reproducibility and Standardization: As an open, well-documented resource, the dataset sets a new standard for reproducible benchmarking, allowing the global research community to compare algorithms and methodologies on a common, realistic playing field.

Frequently Asked Questions