Audio-Based Event Detection at Different SNR Settings Using Two-Dimensional Spectrogram Magnitude Representations

Latest revision as of 10:32, 15 February 2021

Abstract

Audio-based event detection poses a number of challenges that are not encountered in other fields, such as image detection. The effects of ambient noise, low Signal-to-Noise Ratio (SNR) and microphone distance are not yet fully understood. If multimodal approaches are to improve across a range of fields of interest, audio analysis will have to play an integral part. Event recognition in autonomous vehicles (AVs) is one such field, still at a nascent stage, that can rely solely on audio or use audio as part of a multimodal approach. This manuscript presents an extensive analysis comparing different magnitude representations of the raw audio. The analysis is carried out on data from the publicly available MIVIA Audio Events dataset. Single-channel Short-Time Fourier Transform (STFT), mel-scale and Mel-Frequency Cepstral Coefficient (MFCC) spectrogram representations are used. Furthermore, two methods of aggregating these representations are examined: concatenating the features, and stacking them as separate channels. The effect of the SNR on recognition accuracy, and the generalization of the proposed methods to datasets both seen and not seen during training, are studied and reported.
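To make two ideas from the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard definition SNR(dB) = 10·log10(P_event / P_noise) and illustrates (a) scaling background noise so that a mixture reaches a chosen SNR, and (b) the difference between concatenating spectrogram features along the frequency axis and stacking them as separate channels. All function names (`mix_at_snr`, `concatenate_features`, `stack_features`, and so on) are hypothetical, and spectrograms are represented as plain nested lists (frequency rows by time frames) purely for illustration.

```python
import math
import random


def power(samples):
    """Mean power (average squared amplitude) of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(event, noise, snr_db):
    """Scale `noise` so the event-to-noise power ratio equals `snr_db`, then add it.

    Assumes both sequences have the same length and non-zero power.
    """
    gain = math.sqrt(power(event) / (power(noise) * 10 ** (snr_db / 10)))
    return [e + gain * n for e, n in zip(event, noise)]


def measured_snr_db(event, mixture):
    """SNR of `mixture` in dB, treating (mixture - event) as the noise."""
    residual = [m - e for e, m in zip(event, mixture)]
    return 10 * math.log10(power(event) / power(residual))


def concatenate_features(specs):
    """Join spectrograms along the frequency axis: one channel, more rows.

    Each spectrogram is a list of frequency rows; all must share the same
    number of time frames, but the number of rows may differ.
    """
    frames = {len(row) for spec in specs for row in spec}
    if len(frames) != 1:
        raise ValueError("all spectrograms need the same number of time frames")
    return [row for spec in specs for row in spec]


def stack_features(specs):
    """Stack spectrograms as separate channels: (channels, rows, frames).

    Unlike concatenation, every spectrogram must have an identical shape.
    """
    shapes = {(len(spec), len(spec[0])) for spec in specs}
    if len(shapes) != 1:
        raise ValueError("all spectrograms need an identical shape to be stacked")
    return [spec for spec in specs]


# Illustrative usage with synthetic data (not taken from the MIVIA dataset).
rng = random.Random(0)
event = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
noise = [rng.gauss(0.0, 1.0) for _ in range(1600)]
mixture = mix_at_snr(event, noise, snr_db=5.0)

spec_a = [[0.0] * 4 for _ in range(3)]  # 3 frequency rows x 4 frames
spec_b = [[1.0] * 4 for _ in range(3)]  # same shape, so it can be stacked
single_channel = concatenate_features([spec_a, spec_b])  # 6 rows x 4 frames
multi_channel = stack_features([spec_a, spec_b])         # 2 x 3 x 4
```

The practical difference is the input shape presented to a classifier: concatenation yields one taller single-channel image, while stacking yields a multi-channel image and therefore requires the representations to be brought to a common size first.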

Document type: Article

Original document

The different versions of the original document can be found in:

https://www.mdpi.com/2079-9292/9/10/1593/pdf under the license https://creativecommons.org/licenses/by

https://www.mdpi.com/2079-9292/9/10/1593

https://doaj.org/toc/2079-9292 under the license cc-by

https://academic.microsoft.com/#/detail/3090169026

http://dx.doi.org/10.3390/electronics9101593 under the license https://creativecommons.org/licenses/by/4.0/
