Aligning Audio-Visual Joint Representations with an Agentic Workflow
Neural Information Processing Systems
Visual content and its accompanying audio signals naturally form a joint representation that can improve audio-visual (AV) applications.
Feb-15-2026, 15:39:05 GMT
- Genre:
- Research Report > Experimental Study (1.00)
- Workflow (1.00)
- Industry:
- Information Technology > Security & Privacy (0.46)
- Technology:
- Information Technology
- Artificial Intelligence
- Machine Learning > Neural Networks
- Deep Learning (1.00)
- Natural Language
- Chatbot (0.93)
- Large Language Model (1.00)
- Vision (1.00)
- Data Science (1.00)