Auditory Intelligence: Understanding the World Through Sound

Nam, Hyeonuk

arXiv.org Artificial Intelligence 

Abstract--Recent progress in auditory intelligence has yielded high-performing systems for sound event detection (SED), acoustic scene classification (ASC), automated audio captioning (AAC), and audio question answering (AQA). Yet these tasks remain largely constrained to surface-level recognition--capturing what happened, but not why, what it implies, or how it unfolds in context. I propose a conceptual reframing of auditory intelligence as a layered, situated process that encompasses perception, reasoning, and interaction. To instantiate this view, I introduce four cognitively inspired task paradigms--ASPIRE, SODA, AUX, and AUGMENT--which structure auditory understanding across time-frequency pattern captioning, hierarchical event/scene description, causal explanation, and goal-driven interpretation, respectively. Together, these paradigms provide a roadmap toward more generalizable, explainable, and human-aligned auditory intelligence, and are intended to catalyze a broader discussion of what it means for machines to understand sound.

Large language models (LLMs) have significantly augmented human capabilities by automating repetitive and tedious tasks [1]-[3]. With simple text or voice prompts, they can generate ideas, assist with research, create visual content, and engage in human-like conversation.