DroneAudioset: An Audio Dataset for Drone-based Search and Rescue
Gupta, Chitralekha, Ramesh, Soundarya, Sasikumar, Praveen, Yeo, Kian Peen, Nanayakkara, Suranga
Unmanned Aerial Vehicles (UAVs), or drones, are increasingly used in search and rescue missions to detect human presence. Existing systems primarily leverage vision-based methods, which are prone to failure under low visibility or occlusion. Drone-based audio perception offers promise but suffers from extreme ego-noise that masks sounds indicating human presence. Existing datasets are either limited in diversity or synthetic, lacking real acoustic interactions, and there are no standardized setups for drone audition. To this end, we present DroneAudioset (publicly available at https://huggingface.co/datasets/ahlab-drone-project/DroneAudioSet/ under the MIT license), a comprehensive drone audition dataset featuring 23.5 hours of annotated recordings, covering a wide range of signal-to-noise ratios (SNRs) from -57.2 dB to -2.5 dB, across various drone types, throttles, microphone configurations, and environments. The dataset enables the development and systematic evaluation of noise-suppression and classification methods for human-presence detection under challenging conditions, while also informing practical design considerations for drone audition systems, such as microphone-placement trade-offs and drone-noise-aware audio processing. This dataset is an important step towards enabling the design and deployment of drone-audition systems.
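The SNR range quoted above can be made concrete with a short sketch. This is a minimal illustration, not part of the dataset's tooling; the signal and noise arrays below are hypothetical stand-ins for a faint human sound and drone ego-noise.

```python
import numpy as np

def snr_db(signal: np.ndarray, noise: np.ndarray) -> float:
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    return 10.0 * np.log10(p_signal / p_noise)

# Hypothetical 1-second signals at 16 kHz: a faint 440 Hz "human sound"
# against drone ego-noise that is 100x larger in amplitude (10^4x in power).
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
speech = 0.01 * np.sin(2 * np.pi * 440 * t)
ego_noise = 1.0 * np.sin(2 * np.pi * 180 * t)
print(round(snr_db(speech, ego_noise), 1))  # -40.0
```

At the dataset's hardest operating point, around -57 dB SNR, the ego-noise power exceeds the target-sound power by a factor of roughly 10^5.7, which is what makes noise suppression the central challenge here.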
The language of sound search: Examining User Queries in Audio Search Engines
This study examines textual, user-written search queries within the context of sound search engines, encompassing various applications such as foley, sound effects, and general audio retrieval. Current research inadequately addresses real-world user needs and behaviours in designing text-based audio retrieval systems. To bridge this gap, we analysed search queries from two sources: a custom survey and Freesound website query logs. The survey was designed to collect queries for an unrestricted, hypothetical sound search engine, resulting in a dataset that captures user intentions without the constraints of existing systems. This dataset is also made available for sharing with the research community. In contrast, the Freesound query logs encompass approximately 9 million search requests, providing a comprehensive view of real-world usage patterns. Our findings indicate that survey queries are generally longer than Freesound queries, suggesting users prefer detailed queries when not limited by system constraints. Both datasets predominantly feature keyword-based queries, with few survey participants using full sentences. Key factors influencing survey queries include the primary sound source, intended usage, perceived location, and the number of sound sources. These insights are crucial for developing user-centred, effective text-based audio retrieval systems, enhancing our understanding of user behaviour in sound search contexts.
Merz
In order to create algorithmic art using the wealth of documents available on the internet, artists must discover strategies for organizing those documents. In this paper I demonstrate techniques for organizing and collaging sounds from the user-contributed database at freesound.org. Sounds can be organized in a graph structure by exploiting aural similarity relationships provided by freesound.org,
General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline
Fonseca, Eduardo, Plakal, Manoj, Font, Frederic, Ellis, Daniel P. W., Favory, Xavier, Pons, Jordi, Serra, Xavier
Annotation response categories used in the labeling task:
- Present but not Predominant (PNP): The type of sound described is present, but the audio clip also contains other salient types of sound and/or strong background noise.
- Not Present (NP): The type of sound described is not present in the audio clip.
- Unsure (U): I am not sure whether the type of sound described is present or not.

Table 2: Categories composing FSDKaggle2018, along with the number of samples and time (in minutes, rounded) in the train set. Per-category AP@3 achieved by the baseline system is reported using all the test files for every category (i.e., not following the public/private splits of the Kaggle leaderboard).
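The AP@3 metric in the table caption can be sketched for a single clip. This follows the Kaggle convention for single-label clips with up to three ranked predictions (score = 1/rank of the correct label if it appears in the top 3, else 0); the label names below are hypothetical.

```python
def ap_at_3(true_label: str, top3: list[str]) -> float:
    """Average precision at 3 for a single-label clip:
    1/rank if the correct label appears among the top-3 predictions, else 0."""
    for rank, pred in enumerate(top3, start=1):
        if pred == true_label:
            return 1.0 / rank
    return 0.0

print(ap_at_3("Bark", ["Bark", "Meow", "Siren"]))   # 1.0 (correct at rank 1)
print(ap_at_3("Bark", ["Meow", "Bark", "Siren"]))   # 0.5 (correct at rank 2)
print(ap_at_3("Bark", ["Meow", "Siren", "Flute"]))  # 0.0 (missed)
```

The leaderboard figure is then the mean of this per-clip score over the evaluation set.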