Description
Posted by Neil Zeghidour, Research Scientist, Google Research
Developing machine learning (ML) models for audio understanding has seen tr...
Summary
- The development of machine learning (ML) models for audio understanding has seen tremendous progress over the past several years.
- In practice, standard mel filterbanks are used for most audio classification tasks, even though they are suboptimal.
- Mimicking human perception of sound: the first step in the traditional approach to creating a mel filterbank is to capture the sound’s time-variability by windowing, i.e., cutting the signal into short segments of fixed duration.
- Even when paired with a small classifier such as EfficientNetB0, the LEAF model accounts for only 0.01% of the total parameters.
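
The windowing step mentioned in the summary (cutting the signal into short segments of fixed duration) can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation from the post: the function name `frame_signal`, the 25 ms window, the 10 ms hop, and the Hann taper are all assumptions chosen because they are common defaults in audio processing.

```python
import numpy as np


def frame_signal(signal, sample_rate, window_ms=25.0, hop_ms=10.0):
    """Cut a 1-D signal into overlapping, fixed-duration frames.

    window_ms and hop_ms are assumed values (common defaults),
    not settings taken from the original post.
    """
    window_len = int(round(sample_rate * window_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if len(signal) < window_len:
        raise ValueError("signal is shorter than one window")

    # Number of complete windows that fit in the signal.
    num_frames = 1 + (len(signal) - window_len) // hop_len

    # Build a (num_frames, window_len) index matrix: row i covers
    # samples [i * hop_len, i * hop_len + window_len).
    starts = hop_len * np.arange(num_frames)[:, None]
    offsets = np.arange(window_len)[None, :]
    frames = signal[starts + offsets]

    # Taper each frame with a Hann window (an assumed choice) to
    # reduce discontinuities at the segment edges.
    return frames * np.hanning(window_len)


# One second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440.0 * t)

frames = frame_signal(signal, sample_rate)
print(frames.shape)  # (98, 400)
```

Each row of `frames` is one fixed-duration segment; in a traditional pipeline, each such segment would then be passed on to the filterbank computation.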