meld
A Generalization of Input-Output Linearization via Dynamic Switching Between Melds of Output Functions
Mizzoni, Mirko, van Goor, Pieter, Bazzana, Barbara, Franchi, Antonio
This letter presents a systematic framework for switching between different sets of outputs for the control of nonlinear systems via feedback linearization. We introduce the concept of a meld to formally define a valid, feedback-linearizable subset of outputs that can be selected from a larger deck of possible outputs. The main contribution is a formal proof establishing that, under suitable dwell-time and compatibility conditions, it is possible to switch between different melds while guaranteeing uniform boundedness of the system state. We further show that the error dynamics of the active outputs remain exponentially stable within each switching interval and that outputs common to consecutive melds are tracked seamlessly through transitions. The proposed theory applies to any feedback-linearizable nonlinear system, such as robotic manipulators and aerial or ground vehicles. We demonstrate it in a simple numerical simulation of a robotic manipulator.
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Europe > Switzerland (0.04)
- Europe > Netherlands (0.04)
- Europe > Italy > Lazio > Rome (0.04)
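As background for the abstract above, the standard single-output feedback linearization step can be sketched in textbook form; the symbols f, g, h and the relative degree r below are the usual ones and are not taken from the letter's meld-specific construction.

```latex
% Standard SISO input-output linearization (textbook form).
% System: \dot{x} = f(x) + g(x)\,u, \quad y = h(x),
% with relative degree r, i.e. L_g L_f^{k} h(x) = 0 for k < r-1
% and L_g L_f^{r-1} h(x) \neq 0. Differentiating y r times gives
\[
  y^{(r)} = L_f^{r} h(x) + L_g L_f^{r-1} h(x)\, u ,
\]
% so the feedback
\[
  u = \frac{v - L_f^{r} h(x)}{L_g L_f^{r-1} h(x)}
\]
% yields y^{(r)} = v. Choosing
\[
  v = y_d^{(r)} - \sum_{k=0}^{r-1} a_k \, e^{(k)}, \qquad e = y - y_d,
\]
% with s^r + a_{r-1} s^{r-1} + \dots + a_0 Hurwitz makes the tracking
% error exponentially stable, matching the per-interval stability
% claim in the abstract.
```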
Learning Flexible Forward Trajectories for Masked Molecular Diffusion
Seo, Hyunjin, Kim, Taewon, Yu, Sihyun, Ahn, SungSoo
Masked diffusion models (MDMs) have achieved notable progress in modeling discrete data, while their potential in molecular generation remains underexplored. In this work, we explore that potential and report the surprising result that naively applying standard MDMs severely degrades performance. We identify the critical cause of this issue as a state-clashing problem, where the forward diffusion trajectories of distinct molecules collapse into a common state, resulting in a mixture of reconstruction targets that cannot be learned by a typical reverse diffusion process with unimodal predictions. To mitigate this, we propose Masked Element-wise Learnable Diffusion (MELD), which orchestrates per-element corruption trajectories to avoid collisions between distinct molecular graphs. This is achieved through a parameterized noise scheduling network that assigns distinct corruption rates to individual graph elements, i.e., atoms and bonds. Extensive experiments on diverse molecular benchmarks reveal that MELD markedly enhances overall generation quality compared to element-agnostic noise scheduling, increasing the chemical validity of vanilla MDMs on ZINC250K from 15% to 93%. Furthermore, it achieves state-of-the-art property alignment in conditional generation tasks.
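A minimal sketch of the per-element noise-scheduling idea described in the abstract above, under the assumption that atoms and bonds are tokenized and masked independently; ScheduleNet, MASK_ID, and corrupt are illustrative names, not from the paper.

```python
import torch
import torch.nn as nn

MASK_ID = 0  # reserved token id for the masked state (assumption)

class ScheduleNet(nn.Module):
    """Assigns a per-element corruption rate to each atom/bond token."""
    def __init__(self, vocab_size: int, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, 1)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, n_elements) integer ids for atoms and bonds
        rate = torch.sigmoid(self.head(self.embed(tokens))).squeeze(-1)
        return rate  # (batch, n_elements), each rate in (0, 1)

def corrupt(tokens: torch.Tensor, rate: torch.Tensor, t: float) -> torch.Tensor:
    """Mask each element independently with probability rate * t, so
    distinct graphs follow distinct forward trajectories and are less
    likely to collapse into the same corrupted state."""
    keep = torch.rand_like(rate) >= rate * t
    return torch.where(keep, tokens, torch.full_like(tokens, MASK_ID))
```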
Multi-modal Anchor Gated Transformer with Knowledge Distillation for Emotion Recognition in Conversation
Li, Jie, Ding, Shifei, Guo, Lili, Li, Xuan
Emotion Recognition in Conversation (ERC) aims to detect the emotions of individual utterances within a conversation. Generating efficient and modality-specific representations for each utterance remains a significant challenge. Previous studies have proposed various models to integrate features extracted using different modality-specific encoders. However, they neglect the varying contributions of modalities to this task and introduce high complexity by aligning modalities at the frame level. To address these challenges, we propose the Multi-modal Anchor Gated Transformer with Knowledge Distillation (MAGTKD) for the ERC task. Specifically, prompt learning is employed to enhance textual modality representations, while knowledge distillation is utilized to strengthen representations of weaker modalities. Furthermore, we introduce a multi-modal anchor gated transformer to effectively integrate utterance-level representations across modalities. Extensive experiments on the IEMOCAP and MELD datasets demonstrate the effectiveness of knowledge distillation in enhancing modality representations and show that MAGTKD achieves state-of-the-art performance in emotion recognition. Our code is available at: https://github.com/JieLi-dd/
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
- Information Technology > Artificial Intelligence > Cognitive Science > Emotion (0.86)
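The knowledge-distillation component mentioned in the MAGTKD abstract above can be illustrated with a standard Hinton-style distillation loss, where a stronger modality (e.g., text) acts as teacher for a weaker one (e.g., audio); the temperature and weighting below are generic assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      T: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    # Soft targets: match the student's softened distribution to the
    # teacher's; scaling by T^2 keeps gradient magnitudes comparable.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    # Hard targets: ordinary cross-entropy on the emotion labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```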
MELT: Towards Automated Multimodal Emotion Data Annotation by Leveraging LLM Embedded Knowledge
Jing, Xin, Wang, Jiadong, Tsangko, Iosif, Triantafyllopoulos, Andreas, Schuller, Björn W.
Although speech emotion recognition (SER) has advanced significantly with deep learning, annotation remains a major hurdle. Human annotation is not only costly but also prone to inconsistencies: annotators often have different preferences and may lack the necessary contextual knowledge, which can lead to varied and inaccurate labels. Meanwhile, Large Language Models (LLMs) have emerged as a scalable alternative for annotating text data. However, the potential of LLMs to perform emotional speech data annotation without human supervision has yet to be thoroughly investigated. To address these problems, we apply GPT-4o to annotate a multimodal dataset collected from the sitcom Friends, using only textual cues as inputs. By crafting structured text prompts, our methodology capitalizes on the knowledge GPT-4o has accumulated during its training, showcasing that it can generate accurate and contextually relevant annotations without direct access to multimodal inputs. Therefore, we propose MELT, a multimodal emotion dataset fully annotated by GPT-4o. We demonstrate the effectiveness of MELT by fine-tuning four self-supervised learning (SSL) backbones and assessing speech emotion recognition performance across emotion datasets. Additionally, our subjective experiments' results demonstrate a consistent performance improvement on SER.
- North America > United States (0.14)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
- North America > Canada > Ontario > Toronto (0.04)
- (5 more...)
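A hedged sketch of the text-cue-only annotation pipeline described in the MELT abstract above, using the public OpenAI Python client; the prompt wording and label set are assumptions rather than the paper's exact setup.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LABELS = ["anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"]

def annotate(utterance: str, context: str) -> str:
    """Ask GPT-4o for a single emotion label from textual cues only."""
    prompt = (
        "You are annotating emotions in a sitcom dialogue.\n"
        f"Context: {context}\n"
        f"Utterance: {utterance}\n"
        f"Answer with exactly one label from: {', '.join(LABELS)}."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip().lower()
```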
OmniVox: Zero-Shot Emotion Recognition with Omni-LLMs
The use of omni-LLMs (large language models that accept any modality as input), particularly for multimodal cognitive state tasks involving speech, is understudied. We present OmniVox, the first systematic evaluation of four omni-LLMs on the zero-shot emotion recognition task. We evaluate on two widely used multimodal emotion benchmarks, IEMOCAP and MELD, and find that zero-shot omni-LLMs outperform or are competitive with fine-tuned audio models. Alongside our audio-only evaluation, we also evaluate omni-LLMs on text-only and combined text-and-audio inputs. We present acoustic prompting, an audio-specific prompting strategy for omni-LLMs that focuses on acoustic feature analysis, conversation context analysis, and step-by-step reasoning. We compare our acoustic prompting to minimal prompting and full chain-of-thought prompting techniques. We perform a context window analysis on IEMOCAP and MELD, and find that using context helps, especially on IEMOCAP. We conclude with an error analysis of the generated acoustic reasoning outputs from the omni-LLMs.
- Europe > Portugal > Lisbon > Lisbon (0.04)
- North America > United States > New York > Suffolk County > Stony Brook (0.04)
- North America > Mexico > Mexico City > Mexico City (0.04)
- (3 more...)
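The acoustic prompting strategy named in the OmniVox abstract above combines acoustic feature analysis, conversation-context analysis, and step-by-step reasoning; a toy prompt builder along those lines might look like the following (the exact wording and label set are assumptions).

```python
def acoustic_prompt(context_turns: list[str]) -> str:
    """Build an audio-specific prompt in the spirit of acoustic prompting."""
    context = "\n".join(f"- {turn}" for turn in context_turns)
    return (
        "You will hear an audio clip of one utterance.\n"
        "Step 1: Describe its acoustic features (pitch, energy, tempo, "
        "voice quality).\n"
        "Step 2: Consider the preceding conversation:\n"
        f"{context}\n"
        "Step 3: Reason step by step about the speaker's emotional state.\n"
        "Step 4: Answer with exactly one label: angry, happy, neutral, or sad."
    )

print(acoustic_prompt(["A: You're late again.", "B: Traffic was terrible."]))
```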
Beyond Silent Letters: Amplifying LLMs in Emotion Recognition with Vocal Nuances
Wu, Zehui, Gong, Ziwei, Ai, Lin, Shi, Pengyuan, Donbekci, Kaan, Hirschberg, Julia
This paper introduces a novel approach to emotion detection in speech using Large Language Models (LLMs). We address the limitation of LLMs in processing audio inputs by translating speech characteristics into natural language descriptions. Our method integrates these descriptions into text prompts, enabling LLMs to perform multimodal emotion analysis without architectural modifications. We evaluate our approach on two datasets, IEMOCAP and MELD, demonstrating significant improvements in emotion recognition accuracy, particularly for high-quality audio data. Our experiments show that incorporating speech descriptions yields a 2 percentage point increase in weighted F1 score on IEMOCAP (from 70.111% to 72.596%). We also compare various LLM architectures and explore the effectiveness of different feature representations. Our findings highlight the potential of this approach in enhancing the emotion detection capabilities of LLMs and underscore the importance of audio quality in speech-based emotion recognition tasks. We will release the source code on GitHub.
- North America > United States (0.28)
- Asia > Singapore (0.04)
- North America > Mexico > Mexico City > Mexico City (0.04)
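A rough sketch of the core idea in the abstract above: measure simple acoustic properties and verbalize them so a text-only LLM can read them. The feature choices, thresholds, and phrasing are illustrative assumptions, not the paper's feature set.

```python
import librosa
import numpy as np

def describe_speech(wav_path: str) -> str:
    """Turn basic acoustic measurements into a natural-language sentence."""
    y, sr = librosa.load(wav_path, sr=16000)
    f0 = librosa.yin(y, fmin=65, fmax=400, sr=sr)   # per-frame pitch (Hz)
    rms = librosa.feature.rms(y=y)[0]               # per-frame energy
    pitch = float(np.median(f0))
    loudness = "loud" if float(rms.mean()) > 0.05 else "soft"
    duration = len(y) / sr
    return (f"The speaker talks in a {loudness} voice with a median pitch "
            f"of {pitch:.0f} Hz over {duration:.1f} seconds.")

# The resulting sentence would then be prepended to the transcript in
# the LLM prompt, e.g. f"{describe_speech(path)} Transcript: {text}".
```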
SenteCon: Leveraging Lexicons to Learn Human-Interpretable Language Representations
Lin, Victoria, Morency, Louis-Philippe
Although deep language representations have become the dominant form of language featurization in recent years, in many settings it is important to understand a model's decision-making process. This necessitates not only an interpretable model but also interpretable features. In particular, language must be featurized in a way that is interpretable while still characterizing the original text well. We present SenteCon, a method for introducing human interpretability in deep language representations. Given a passage of text, SenteCon encodes the text as a layer of interpretable categories in which each dimension corresponds to the relevance of a specific category. Our empirical evaluations indicate that encoding language with SenteCon provides high-level interpretability at little to no cost to predictive performance on downstream tasks. Moreover, we find that SenteCon outperforms existing interpretable language representations with respect to both its downstream performance and its agreement with human characterizations of the text.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- (9 more...)
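A toy sketch of a lexicon-style encoding in the spirit of the SenteCon abstract above, where each output dimension scores the relevance of one human-readable category; the tiny lexicon is invented for illustration, and SenteCon's actual construction is richer.

```python
from collections import Counter

# Invented mini-lexicon: each category maps to a small word list.
LEXICON = {
    "positive_emotion": {"happy", "love", "great", "wonderful"},
    "negative_emotion": {"sad", "hate", "awful", "terrible"},
    "social": {"friend", "family", "talk", "we"},
}

def encode(passage: str) -> dict[str, float]:
    """One interpretable dimension per category: the fraction of tokens
    in the passage that match that category's word list."""
    words = passage.lower().split()
    counts = Counter(words)
    total = max(len(words), 1)
    return {cat: sum(counts[w] for w in vocab) / total
            for cat, vocab in LEXICON.items()}

print(encode("we love to talk with family"))
# {'positive_emotion': 0.1666..., 'negative_emotion': 0.0, 'social': 0.5}
```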
A Fast Algorithm for Computing the Deficiency Number of a Mahjong Hand
Yan, Xueqing, Li, Yongming, Li, Sanjiang
The tile-based multiplayer game Mahjong is widely played in Asia and has also become increasingly popular worldwide. Face-to-face or online, each player begins with a hand of 13 tiles and players draw and discard tiles in turn until they complete a winning hand. An important notion in Mahjong is the deficiency number (a.k.a. shanten number in Japanese Mahjong) of a hand, which estimates how many tile changes are necessary to complete the hand into a winning hand. The deficiency number plays an essential role in major decision-making tasks such as selecting a tile to discard. This paper proposes a fast algorithm for computing the deficiency number of a Mahjong hand. Compared with the baseline algorithm, the new algorithm is usually 100 times faster and, more importantly, respects the agent's knowledge about available tiles. The algorithm can be used as a basic procedure in all Mahjong variants by both rule-based and machine learning-based Mahjong AI.
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > Texas (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- (3 more...)
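For context on the abstract above, the deficiency number can be defined directly as the minimum number of single-tile changes needed to complete a winning hand; the brute-force sketch below, for a simplified one-suit, 14-tile variant, illustrates that baseline definition, not the paper's fast algorithm.

```python
from collections import Counter

# Simplified one-suit Mahjong: tiles are integers 1..9, and a winning
# 14-tile hand is four melds (a triplet, or a run of three consecutive
# tiles) plus one pair.

TILES = range(1, 10)

def is_winning(hand):
    """hand: iterable of 14 tiles."""
    return _complete(Counter(hand), melds=4, pair=True)

def _complete(c, melds, pair):
    if melds == 0 and not pair:
        return sum(c.values()) == 0
    t = min((x for x in c if c[x] > 0), default=None)
    if t is None:
        return False
    if pair and c[t] >= 2:                      # use t as the pair
        c2 = c.copy(); c2[t] -= 2
        if _complete(c2, melds, False):
            return True
    if melds > 0:
        if c[t] >= 3:                           # triplet t t t
            c2 = c.copy(); c2[t] -= 3
            if _complete(c2, melds - 1, pair):
                return True
        if c[t + 1] > 0 and c[t + 2] > 0:       # run t, t+1, t+2
            c2 = c.copy(); c2[t] -= 1; c2[t + 1] -= 1; c2[t + 2] -= 1
            if _complete(c2, melds - 1, pair):
                return True
    return False

def deficiency(hand):
    """Minimum number of single-tile replacements to reach a win (BFS).
    Exponential in the worst case -- the kind of slow baseline the
    paper reports its algorithm being ~100x faster than."""
    start = tuple(sorted(hand))
    seen, frontier = {start}, [start]
    for depth in range(7):                      # deficiency is small in practice
        if any(is_winning(h) for h in frontier):
            return depth
        nxt = []
        for h in frontier:
            for i in range(len(h)):
                for t in TILES:
                    h2 = tuple(sorted(h[:i] + (t,) + h[i + 1:]))
                    if h2 not in seen:
                        seen.add(h2); nxt.append(h2)
        frontier = nxt
    return None
```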