The most comprehensive and richly annotated Qur'anic recitation corpus to date
Despite growing interest in Quranic data research, existing Quran datasets remain limited in both scale and diversity. To address this gap, we present Tadabur, a large-scale Quran audio dataset comprising more than 1,400 hours of recitation audio from over 600 distinct reciters, with substantial variation in recitation style, vocal characteristics, and recording conditions.
Tadabur covers 113 of the Qur'an's 114 surahs (every surah except Al-Fatiha) and spans recitation styles such as murattal and mujawwad. Each file is accompanied by automatically derived word-level temporal alignments and structured metadata in a consistent JSON schema.
This diversity makes Tadabur a broad and representative resource for Quranic speech research, supporting work on ASR, tajwīd-aware modeling, reciter identification, and prosodic analysis. By substantially expanding both the total duration and the variability of available Quran data, Tadabur aims to support future research and facilitate the development of standardized Quranic speech benchmarks.
A fully automated, multi-stage process transforms raw long-form recitations into clean, verse-level annotated audio files.
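The individual pipeline stages are not detailed here. As a hypothetical illustration of what a final verse-level cutting step can involve, the sketch below assumes that word timestamps (in seconds) for one verse are already available from an alignment of the long-form recording, and that the audio is a plain sequence of samples. `Word` and `cut_verse` are illustrative names, not part of the Tadabur codebase, and this is not necessarily how the Tadabur pipeline performs the step.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical record for one aligned word; times are in seconds.
    text: str
    start: float
    end: float


def cut_verse(samples, sample_rate, words):
    """Slice one verse out of a long-form recording (illustrative sketch).

    `words` holds the aligned words of a single verse, with timestamps
    measured from the start of the long-form file. Returns the verse clip
    and a copy of the words whose timestamps are re-based so that 0.0 is
    the start of the clip.
    """
    if not words:
        raise ValueError("a verse needs at least one aligned word")
    verse_start = words[0].start
    verse_end = words[-1].end
    if verse_end <= verse_start:
        raise ValueError("verse end must come after verse start")

    # Convert the verse boundaries from seconds to sample indices.
    first = int(round(verse_start * sample_rate))
    last = int(round(verse_end * sample_rate))
    clip = samples[first:last]

    # Shift every word so its timestamps are relative to the clip.
    local_words = [
        Word(w.text, w.start - verse_start, w.end - verse_start) for w in words
    ]
    return clip, local_words
```

Re-basing the timestamps keeps each verse-level annotation consistent with its own clip rather than with the original long-form file.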
Tadabur exceeds prior publicly available Quranic datasets by a wide margin in scale, reciter diversity, and annotation richness; among the datasets compared below, it is the only one that provides word-level alignments.
| Dataset | Samples | Reciters | Transcription | Word-Level Alignment |
|---|---|---|---|---|
| Quran Recitations (Kaggle) | 6,689 | 12 | ✗ | ✗ |
| Quran Speech-to-Text (SLR132) | 226,129 | 30 | ✓ | ✗ |
| Buraaq Quran Audio–Text | 187,080 | 30 | ✓ | ✗ |
| **Tadabur (ours)** | 365,000+ | 600+ | ✓ | ✓ |
*Figure: Number of reciters across Quranic datasets.*
Each verse-level file is paired with a structured JSON annotation containing word-level timestamps, metadata, and speaker information. Two representative samples are shown below.
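For readers who want to work with word-level timestamps programmatically, here is a minimal sketch of reading one annotation and querying it. The field names (`surah`, `ayah`, `reciter_id`, `words`, `text`, `start`, `end`) are assumptions for illustration only, not the published Tadabur schema. The Arabic text is Qur'an 112:1, but the timestamps are made up. Adapt the keys to the actual JSON files.

```python
import json

# Hypothetical annotation: field names are assumed, timestamps are made up.
SAMPLE = """
{
  "surah": 112,
  "ayah": 1,
  "reciter_id": "reciter_0001",
  "words": [
    {"text": "قُلْ", "start": 0.00, "end": 0.40},
    {"text": "هُوَ", "start": 0.40, "end": 0.75},
    {"text": "ٱللَّهُ", "start": 0.75, "end": 1.50},
    {"text": "أَحَدٌ", "start": 1.50, "end": 2.50}
  ]
}
"""

annotation = json.loads(SAMPLE)


def word_at(annotation, t):
    """Return the text of the word being recited at time t (seconds), or None."""
    for w in annotation["words"]:
        if w["start"] <= t < w["end"]:
            return w["text"]
    return None


def is_well_ordered(annotation):
    """Check that word timestamps are non-negative, increasing and non-overlapping."""
    prev_end = 0.0
    for w in annotation["words"]:
        if w["start"] < prev_end or w["end"] <= w["start"]:
            return False
        prev_end = w["end"]
    return True
```

A check like `is_well_ordered` is a cheap sanity test to run over automatically derived alignments before using them for training or analysis.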
Alongside the dataset, we release Whisper models fine-tuned on Tadabur for Quranic ASR. These models are domain-adapted to handle prolonged phoneme durations, tajwīd rules, melodic articulation, and the wide acoustic diversity unique to Qur'anic recitation.
If you use Tadabur in your research, please cite:
@misc{alherran2026tadabur,
  author = {Alherran, Faisal},
  title  = {Tadabur: A Large-Scale Quran Audio Dataset},
  year   = {2026},
  url    = {https://github.com/fherran/tadabur},
  note   = {HuggingFace: huggingface.co/datasets/FaisaI/tadabur}
}