A deep learning project for classifying Indian classical music swaras and predicting pitch using synthetic audio data and convolutional neural networks.
Synthetic Swara is a project focused on Indian classical music: it classifies the seven shuddha swaras (Sa, Re, Ga, Ma, Pa, Dha, Ni) and predicts pitch frequencies using synthetic audio data. It uses convolutional neural networks (CNNs) trained on mel-spectrogram features derived from synthetically generated audio samples. It has two main components: a swara classifier that identifies musical notes and a pitch predictor that estimates frequencies, both designed to handle realistic audio variations such as harmonics, vibrato, and noise.
Synthetic Dataset Generation
Creates realistic audio samples for seven swaras with harmonics, vibrato, ADSR envelope, and noise for robust training.
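As a rough illustration of how such a sample could be generated, the sketch below builds one swara tone from a fundamental plus harmonics, adds vibrato, shapes it with an ADSR envelope, and mixes in noise. The just-intonation ratios, the 240 Hz Sa, the 16 kHz sample rate, and every parameter default are assumptions chosen for the example, not values taken from the project; `synth_swara` and `adsr_envelope` are hypothetical names.

```python
import numpy as np

# Just-intonation ratios relative to Sa (a common convention; assumed here,
# the project's actual tuning may differ).
SWARA_RATIOS = {"Sa": 1.0, "Re": 9 / 8, "Ga": 5 / 4, "Ma": 4 / 3,
                "Pa": 3 / 2, "Dha": 5 / 3, "Ni": 15 / 8}


def adsr_envelope(n, sr, attack=0.05, decay=0.1, sustain=0.7, release=0.2):
    """Piecewise-linear attack/decay/sustain/release envelope of n samples."""
    a, d, r = int(attack * sr), int(decay * sr), int(release * sr)
    s = max(n - a - d - r, 0)  # samples left over for the sustain stage
    env = np.concatenate([
        np.linspace(0.0, 1.0, a, endpoint=False),
        np.linspace(1.0, sustain, d, endpoint=False),
        np.full(s, sustain),
        np.linspace(sustain, 0.0, r),
    ])
    return env[:n]


def synth_swara(swara, sa_hz=240.0, duration=1.0, sr=16000, n_harmonics=4,
                vibrato_hz=5.0, vibrato_depth=0.01, noise_std=0.01, seed=0):
    """Return (waveform, fundamental_hz) for one synthetic swara sample."""
    rng = np.random.default_rng(seed)
    n = int(duration * sr)
    t = np.arange(n) / sr
    f0 = sa_hz * SWARA_RATIOS[swara]
    # Vibrato: modulate the instantaneous frequency, then integrate to a phase.
    inst_f = f0 * (1.0 + vibrato_depth * np.sin(2 * np.pi * vibrato_hz * t))
    phase = 2 * np.pi * np.cumsum(inst_f) / sr
    # Harmonics: the k-th partial is weighted by 1/k.
    wave = sum(np.sin(k * phase) / k for k in range(1, n_harmonics + 1))
    wave = wave * adsr_envelope(n, sr)
    wave = wave + rng.normal(0.0, noise_std, n)  # additive Gaussian noise
    wave = wave / np.max(np.abs(wave))           # peak-normalize to [-1, 1]
    return wave.astype(np.float32), f0
```

Calling `synth_swara("Pa")` with these defaults yields a one-second tone with a 360 Hz fundamental; varying the seed, vibrato, and noise parameters per sample is one way to obtain a varied training set.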
Swara Classification
A CNN model classifies swaras from mel-spectrograms, achieving 71.43% test accuracy on a synthetic dataset of 1,400 samples.
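To make the input representation concrete, here is a minimal NumPy-only sketch of a log-mel spectrogram. It is not the project's feature pipeline, which may well rely on an audio library; the FFT size, hop length, number of mel bands, and the function names are all assumptions for illustration.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, n_fft//2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    mel_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = mel_pts[i:i + 3]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(y, sr=16000, n_fft=1024, hop=256, n_mels=64):
    """Return a (n_mels, n_frames) log-power mel spectrogram in dB."""
    n_frames = 1 + (len(y) - n_fft) // hop
    # Index matrix that slices the signal into overlapping frames.
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = y[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return 10.0 * np.log10(mel + 1e-10).T
```

The resulting 2-D array can be treated like a single-channel image, which is what makes a CNN a natural fit for the classifier.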
Pitch Prediction
A simplified CNN predicts pitch frequencies (100-600 Hz), with a test MAE of 51.28 Hz, using normalized mel-spectrogram inputs.
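The sketch below shows one plausible way to normalize a spectrogram and to compute the MAE metric quoted above. Per-sample min-max scaling is an assumption (the project may standardize differently), and the helper names are hypothetical. The last helper simply expresses an MAE relative to the 100-600 Hz target range.

```python
import numpy as np


def minmax_normalize(spec, eps=1e-8):
    """Scale one spectrogram to roughly [0, 1] (assumed normalization scheme)."""
    lo, hi = spec.min(), spec.max()
    return (spec - lo) / (hi - lo + eps)


def mae_hz(y_true, y_pred):
    """Mean absolute error between true and predicted pitches, in Hz."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def mae_fraction_of_range(mae, f_min=100.0, f_max=600.0):
    """Express an MAE as a fraction of the pitch range being predicted."""
    return mae / (f_max - f_min)
```

With the reported figure, `mae_fraction_of_range(51.28)` is about 0.103, so the average error is roughly a tenth of the 500 Hz range.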
Sliding Window Prediction
Both models run over the audio with a sliding window, producing a swara label and a pitch estimate for each segment, which supports real-time analysis.
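A sliding-window pass can be sketched as follows. The window and hop lengths are assumed values, and `predict_fn` is a hypothetical stand-in for either trained model (including whatever feature extraction it needs); only the windowing logic is shown.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def sliding_windows(samples: Sequence[float], sr: int,
                    win_s: float = 0.5, hop_s: float = 0.25
                    ) -> Iterator[Tuple[float, Sequence[float]]]:
    """Yield (start_time_seconds, segment) for each full window."""
    win, hop = int(win_s * sr), int(hop_s * sr)
    if win <= 0 or hop <= 0:
        raise ValueError("window and hop must each span at least one sample")
    # Trailing audio shorter than one window is skipped.
    for start in range(0, len(samples) - win + 1, hop):
        yield start / sr, samples[start:start + win]


def predict_over_time(samples: Sequence[float], sr: int,
                      predict_fn: Callable[[Sequence[float]], object],
                      win_s: float = 0.5, hop_s: float = 0.25
                      ) -> List[Tuple[float, object]]:
    """Apply predict_fn to every window and keep the window start times."""
    return [(t, predict_fn(seg))
            for t, seg in sliding_windows(samples, sr, win_s, hop_s)]
```

For a two-second clip at 16 kHz, these defaults give seven overlapping half-second windows starting every 0.25 s. A shorter hop gives finer time resolution at the cost of more model calls.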
Efficient Design
Lightweight CNN architectures use dropout and batch normalization for regularization and stable training while keeping computational requirements low.
Built by Rishabh Kothari. Project details are available in the Jupyter notebook. For inquiries, contact rishabhkothari103@gmail.com.