Description
An end-to-end example and architecture for audio deep learning's foundational application scenario, sound classification, in plain English.
Summary
- Audio Deep Learning Made Simple: The augmented audio is converted into a Mel Spectrogram with shape (num_channels, Mel freq_bands, time_steps) = (2, 64, 344). SpecAugment data augmentation then randomly applies Time and Frequency Masks to the Mel Spectrograms.
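The masking step above can be sketched in NumPy. This is a minimal illustration of SpecAugment-style Time and Frequency Masks on a dummy spectrogram of the stated shape, not the article's actual implementation; the function name `spec_augment` and the mask-width parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dummy Mel Spectrogram: (num_channels, Mel freq_bands, time_steps) = (2, 64, 344)
spec = rng.standard_normal((2, 64, 344))

def spec_augment(spec, max_freq_mask=8, max_time_mask=32,
                 n_freq_masks=1, n_time_masks=1, rng=rng):
    """Randomly blank out a band of Mel frequencies and a span of time steps.

    Hypothetical sketch: mask widths and counts are arbitrary defaults.
    """
    aug = spec.copy()
    _, n_mels, n_steps = aug.shape
    fill = aug.mean()  # masked regions are filled with the spectrogram's mean level
    for _ in range(n_freq_masks):
        width = rng.integers(1, max_freq_mask + 1)
        start = rng.integers(0, n_mels - width + 1)
        aug[:, start:start + width, :] = fill   # Frequency Mask: zero out a band of rows
    for _ in range(n_time_masks):
        width = rng.integers(1, max_time_mask + 1)
        start = rng.integers(0, n_steps - width + 1)
        aug[:, :, start:start + width] = fill   # Time Mask: zero out a span of columns
    return aug

augmented = spec_augment(spec)
print(augmented.shape)  # (2, 64, 344) — shape is unchanged, only contents are masked
```

In practice this transform is applied on the fly during training, so each epoch sees differently masked versions of the same clip.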
- Each batch has a shape of (batch_sz, num_channels, Mel freq_bands, time_steps), forming a batch of (X, y) data. We can visualize one item from the batch.
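The batching described above can be sketched as a simple collate step in NumPy, assuming the per-item spectrogram shape (2, 64, 344) from the previous bullet; the batch size of 16 and the 10-class label range are illustrative assumptions, not values from the article.

```python
import numpy as np

rng = np.random.default_rng(1)
batch_sz = 16

# Each dataset item: a (num_channels, Mel freq_bands, time_steps) spectrogram plus a class label
items = [(rng.standard_normal((2, 64, 344)), int(rng.integers(0, 10)))
         for _ in range(batch_sz)]

# Collate into (X, y): stack the spectrograms along a new leading batch dimension
X = np.stack([spec for spec, _ in items])   # (batch_sz, num_channels, Mel freq_bands, time_steps)
y = np.array([label for _, label in items]) # (batch_sz,)

print(X.shape)  # (16, 2, 64, 344)

# Inspect one item from the batch, e.g. to visualize it as an image
one_item = X[0]
print(one_item.shape)  # (2, 64, 344)
```

In a PyTorch pipeline a `DataLoader` performs this stacking automatically; the sketch just makes the resulting shapes explicit.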
- Conclusion: We have now seen an end-to-end example of sound classification, one of the most foundational problems in audio deep learning.