Pengi: An Audio Language Model for Audio Tasks

Neural Information Processing Systems

In the domain of audio processing, Transfer Learning has facilitated the rise of Self-Supervised Learning and Zero-Shot Learning techniques. These approaches have led to the development of versatile models capable of tackling a wide array of tasks while delivering state-of-the-art performance. However, current models inherently lack the capacity to produce the requisite language for open-ended tasks such as Audio Captioning or Audio Question Answering. We introduce Pengi, a novel Audio Language Model that leverages Transfer Learning by framing all audio tasks as text-generation tasks. It takes an audio recording and text as input, and generates free-form text as output.
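The abstract's input/output contract (audio plus a text prompt in, free-form text out) can be made concrete with a small sketch. Everything below is an illustrative assumption rather than Pengi's actual architecture: a stand-in audio encoder pools mel features into a fixed-length embedding prefix, which is prepended to the embedded prompt and fed to a toy causal Transformer.

```python
# Minimal sketch of an audio-and-text-to-text model. Module names and all
# sizes are toy assumptions; Pengi itself couples pretrained audio/text
# encoders with a pretrained language model.
import torch
import torch.nn as nn

class AudioPrefixLM(nn.Module):
    def __init__(self, vocab_size=1000, n_mels=64, d_model=256, prefix_len=8):
        super().__init__()
        self.prefix_len, self.d_model = prefix_len, d_model
        # Stand-in audio encoder: mean-pool mel frames, then map the pooled
        # vector to `prefix_len` embedding vectors that act as an audio prefix.
        self.audio_mapper = nn.Linear(n_mels, prefix_len * d_model)
        self.token_emb = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, mel, prompt_ids):
        # mel: (batch, frames, n_mels); prompt_ids: (batch, prompt_len)
        pooled = mel.mean(dim=1)
        prefix = self.audio_mapper(pooled).view(-1, self.prefix_len, self.d_model)
        seq = torch.cat([prefix, self.token_emb(prompt_ids)], dim=1)
        # Causal mask so each position attends only to earlier positions.
        mask = nn.Transformer.generate_square_subsequent_mask(seq.size(1))
        return self.lm_head(self.backbone(seq, mask=mask))

model = AudioPrefixLM()
logits = model(torch.randn(1, 100, 64), torch.randint(0, 1000, (1, 5)))
print(logits.shape)  # (1, 8 + 5, 1000): next-token logits over the sequence
```

Training such a model would supervise next-token prediction on the target text; at inference, free-form output is produced by repeatedly sampling a token from the final logits and appending it to the prompt.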



OpenAVS: Training-Free Open-Vocabulary Audio Visual Segmentation with Foundational Models

Chen, Shengkai, Yin, Yifang, Cao, Jinming, Xiang, Shili, Liu, Zhenguang, Zimmermann, Roger

arXiv.org Artificial Intelligence

Audio-visual segmentation aims to separate sounding objects from videos by predicting pixel-level masks based on audio signals. Existing methods primarily concentrate on closed-set scenarios and direct audio-visual alignment and fusion, which limits their capability to generalize to new, unseen situations. In this paper, we propose OpenAVS, a novel training-free, language-based approach that, for the first time, effectively aligns audio and visual modalities using text as a proxy for open-vocabulary Audio-Visual Segmentation (AVS). Equipped with multimedia foundation models, OpenAVS directly infers masks through 1) audio-to-text prompt generation, 2) LLM-guided prompt translation, and 3) text-to-visual sounding object segmentation. The objective of OpenAVS is to establish a simple yet flexible architecture that relies on the most appropriate foundation models by fully leveraging their capabilities to enable more effective knowledge transfer to the downstream AVS task. Moreover, we present a model-agnostic framework, OpenAVS-ST, that enables the integration of OpenAVS with any advanced supervised AVS model via pseudo-label-based self-training. This approach enhances performance by effectively utilizing large-scale unlabeled data when available. Comprehensive experiments on three benchmark datasets demonstrate the superior performance of OpenAVS. It surpasses existing unsupervised, zero-shot, and few-shot AVS methods by a significant margin, achieving absolute performance gains of approximately 9.4% and 10.9% in mIoU and F-score, respectively, in challenging scenarios.
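The three inference stages read naturally as a pipeline. The sketch below wires them together; the helper names, signatures, and stubbed outputs are assumptions for illustration, to be replaced by whichever audio captioner, LLM, and text-prompted segmenter one actually plugs in.

```python
# Sketch of the three-stage, training-free OpenAVS pipeline as the abstract
# describes it. Each helper is a placeholder for a foundation model call.
from typing import List

def audio_to_text(audio_path: str) -> str:
    """Stage 1: an audio captioning model describes the soundtrack."""
    return "a dog barking and a car engine idling"    # stubbed output

def llm_translate_prompts(audio_caption: str) -> List[str]:
    """Stage 2: an LLM translates the audio description into visual-object
    prompts, e.g. nouns a segmenter can ground in the frame."""
    return ["dog", "car"]                             # stubbed output

def segment_by_text(frame, prompts: List[str]) -> dict:
    """Stage 3: a text-prompted segmenter returns one mask per prompt."""
    return {p: None for p in prompts}                 # stubbed masks

def openavs_infer(audio_path: str, frame) -> dict:
    caption = audio_to_text(audio_path)               # 1) audio -> text
    prompts = llm_translate_prompts(caption)          # 2) LLM prompt translation
    return segment_by_text(frame, prompts)            # 3) text -> masks

print(openavs_infer("clip.wav", frame=None))
```

Because the stages communicate only through text, any component can be swapped out independently, which is the flexibility the abstract claims for the architecture.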



Penguins Can Make Cake

AI Magazine

Until quite recently, it was taken for granted in AI, and cognitive science more broadly, that activity resulted from the creation and execution of plans. In 1985, several researchers, including myself, independently realized that plans and planning are not necessary, or necessarily useful, in activity. Since this time, a number of alternatives have been proposed. This analysis is equally applicable to any other computational problem. Thus, you could conclude that vision is impossible because it requires exponential computation in the number of pixels or that, on the average, business data processing takes exponential work in the number of records.


Penguins Can Make Cake

Chapman, David

AI Magazine

Since this time, a number of alternatives have been proposed. Ginsberg's article, "Universal Planning: An (Almost) Universally Bad Idea," analyzes one such alternative, Marcel Schoppers's universal plans. He also extends this analysis to a number of other systems, including Pengi (Agre and Chapman 1987), which was designed by Phil Agre and myself. Ginsberg's criticisms of universal plans rest on a counting argument. Using universal plans, he says, is infeasible because their size is exponential in the number of possible domain states; representing such a plan is infeasible in even quite small realistic domains. Presumably, in realistic cases, the number of sensors is large enough that a universal plan could not fit in your head. I'm sympathetic to such arguments, having made similar ones to the effect that classical planning is infeasible (Agre and Chapman 1988; Chapman 1987b). I don't understand the details of Schoppers's ideas, so I'm not sure whether this critique of universal plans per se is correct. However, I show that these arguments do not extend to Pengi. Ginsberg calls Pengi an approximate universal plan, by which he means it is like a universal plan except that it does not correctly specify what to do in every situation. However, Pengi's operation involves no plans, universal or approximate, and Pengi and universal plans, although they share some motivations, have little to do with each other as technical proposals. Since this is a counting argument, the conclusion is equally applicable to any other computational problem: you could conclude that vision is impossible because it requires exponential computation in the number of pixels or that, on the average, business data processing takes exponential work in the number of records. There are two reasons not to be concerned about this apparent problem; they involve structure and state. Realistic problems have a lot of structure to them, and this structure can be exploited to exponentially reduce the computation's size. I present a Pengi-like system, Blockhead, which efficiently solves the fruitcake problem; the way it solves it elucidates this point. The fruitcake problem is to stack a set of labeled blocks so that they spell the word fruitcake. I show Blockhead solving a problem involving 45 blocks, in which there are 45! (about 10^56) configurations; most configurations are impossible under the rules of the domain, and the remainder can be categorized relatively cheaply to permit abstraction and approximation. Blockhead does the right thing in every configuration, so it is not by approximation that it succeeds. Indeed, Ginsberg makes this point himself: "[planning couldn't work if] there were no rhyme or reason to things."
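The 45-block count in the snippet above is easy to verify; a quick arithmetic check (Python here purely for illustration) confirms the order of magnitude:

```python
# The counting argument's number: 45 distinct labeled blocks admit 45!
# orderings, the state count a table-driven "universal plan" would have
# to cover entry by entry.
import math

n_configs = math.factorial(45)
print(f"{n_configs:.2e}")  # ~1.20e+56, i.e. about 10^56 configurations
```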