AITopics | absolute improvement

Collaborating Authors

absolute improvement

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Unified Language Model Pre-training for Natural Language Understanding and Generation

Neural Information Processing Systems | Dec-25-2025, 22:49:34 GMT

This paper presents a new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks. The model is pre-trained using three types of language modeling tasks: unidirectional, bidirectional, and sequence-to-sequence prediction. The unified modeling is achieved by employing a shared Transformer network and utilizing specific self-attention masks to control what context the prediction conditions on. UniLM compares favorably with BERT on the GLUE benchmark, and the SQuAD 2.0 and CoQA question answering tasks.
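The mask-based control of context described in this abstract can be illustrated with a minimal sketch. The function name `build_mask`, the 0/1 encoding, and the left-to-right choice for the unidirectional case are assumptions made for illustration; this is not the UniLM implementation.

```python
def build_mask(kind, src_len, tgt_len=0):
    """Return an n x n matrix where mask[i][j] == 1 means token i may attend to token j.

    kind: "bidirectional", "unidirectional" (left-to-right), or "seq2seq".
    For "seq2seq", the first src_len tokens are the source segment and the
    remaining tgt_len tokens are the target segment.
    """
    if kind not in ("bidirectional", "unidirectional", "seq2seq"):
        raise ValueError(f"unknown mask kind: {kind}")
    n = src_len + tgt_len
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if kind == "bidirectional":
                allowed = True                 # every token sees every token
            elif kind == "unidirectional":
                allowed = j <= i               # only itself and tokens to its left
            elif i < src_len:
                allowed = j < src_len          # source tokens see the whole source
            else:
                allowed = j <= i               # target tokens see source + earlier targets
            mask[i][j] = int(allowed)
    return mask


# Two source tokens followed by two target tokens.
for row in build_mask("seq2seq", src_len=2, tgt_len=2):
    print(row)
```

In an attention layer, positions with a 0 would typically have their scores set to a large negative value before the softmax, so the same shared network can serve all three objectives by swapping only the mask.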

absolute improvement, name change, unified language model pre-training, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.43)

Coding Agents with Multimodal Browsing are Generalist Problem Solvers

Soni, Aditya Bharat, Li, Boxuan, Wang, Xingyao, Chen, Valerie, Neubig, Graham

arXiv.org Artificial Intelligence | Jun-4-2025

Modern human labor is characterized by specialization; we train for years and develop particular tools that allow us to perform well across a variety of tasks. Similarly, AI agents have been specialized for domains such as software engineering, web navigation, and workflow automation. However, this results in agents that are good for one thing but fail to generalize beyond their intended scope. One reason for this is that agent developers provide a highly specialized set of tools or make architectural decisions optimized for a specific use case or benchmark. In this work, we ask the question: what is the minimal set of general tools that can be used to achieve high performance across a diverse set of tasks? Our answer is OpenHands-Versa, a generalist agent built with a modest number of general tools: code editing and execution, web search, as well as multimodal web browsing and file access. Importantly, OpenHands-Versa demonstrates superior or competitive performance over leading specialized agents across three diverse and challenging benchmarks: SWE-Bench Multimodal, GAIA, and The Agent Company, outperforming the best-performing previously published results with absolute improvements in success rate of 9.1, 1.3, and 9.1 points respectively. Further, we show how existing state-of-the-art multi-agent systems fail to generalize beyond their target domains. These results demonstrate the feasibility of developing a generalist agent to solve diverse tasks and establish OpenHands-Versa as a strong baseline for future research.

agent, artificial intelligence, openhands-versa, (15 more...)

arXiv.org Artificial Intelligence

2506.03011

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > Poland (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (0.46)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Unified Language Model Pre-training for Natural Language Understanding and Generation

Neural Information Processing Systems | Jan-26-2025, 21:51:13 GMT

The code and pre-trained models are available at https://github.com/microsoft/unilm.

absolute improvement, abstractive summarization rouge-l, unified language model pre-training, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Understanding (0.45)

A Multi-modal Approach to Dysarthria Detection and Severity Assessment Using Speech and Text Information

M, Anuprabha, Gurugubelli, Krishna, V, Kesavaraj, Vuppala, Anil Kumar

arXiv.org Artificial Intelligence | Dec-22-2024

Automatic detection and severity assessment of dysarthria are crucial for delivering targeted therapeutic interventions to patients. While most existing research focuses primarily on the speech modality, this study introduces a novel approach that leverages both speech and text modalities. By employing a cross-attention mechanism, our method learns the acoustic and linguistic similarities between speech and text representations. This approach specifically assesses the pronunciation deviations across different severity levels, thereby enhancing the accuracy of dysarthria detection and severity assessment. All experiments were performed using the UA-Speech dysarthric database. Improved accuracies of 99.53% and 93.20% in detection, and 98.12% and 51.97% for severity assessment, have been achieved when speaker-dependent and speaker-independent, unseen and seen words settings are used. These findings suggest that by integrating text information, which provides reference linguistic knowledge, a more robust framework has been developed for dysarthria detection and assessment, thereby potentially leading to more effective diagnoses.
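As a rough illustration of the cross-attention idea mentioned in this abstract, the sketch below computes scaled dot-product attention with queries from one modality and keys and values from the other. Which modality supplies the queries, the absence of learned projection matrices, and the names `softmax` and `cross_attention` are all assumptions for illustration; the paper's actual architecture is not described here.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention.

    queries: list of vectors from one modality (assumed here: speech frames).
    keys, values: lists of vectors from the other modality (assumed: text tokens).
    Returns one output vector per query: a weighted average of `values`.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs


# Hypothetical toy inputs: one speech frame attending over two text tokens.
speech = [[0.0, 0.0]]
text_keys = [[1.0, 0.0], [0.0, 1.0]]
text_values = [[2.0, 0.0], [0.0, 4.0]]
print(cross_attention(speech, text_keys, text_values))
```

A zero query scores every key equally, so the output here is simply the mean of the value vectors; in a trained model the queries, keys, and values would first pass through learned projections.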

artificial intelligence, assessment, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2412.16874

Country:

Asia > Philippines > Luzon > National Capital Region > City of Manila (0.04)
Asia > India > Telangana > Hyderabad (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Genre:

Research Report > New Finding (0.48)
Research Report > Promising Solution (0.34)
Research Report > Experimental Study (0.34)
Overview > Innovation (0.34)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.93)

Technology:

Information Technology > Artificial Intelligence > Speech (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Sales Whisperer: A Human-Inconspicuous Attack on LLM Brand Recommendations

Lin, Weiran, Gerchanovsky, Anna, Akgul, Omer, Bauer, Lujo, Fredrikson, Matt, Wang, Zifan

arXiv.org Artificial Intelligence | Jun-7-2024

Large language model (LLM) users might rely on others (e.g., prompting services) to write prompts. However, the risks of trusting prompts written by others remain unstudied. In this paper, we assess the risk of using such prompts on brand recommendation tasks when shopping. First, we found that paraphrasing prompts can result in LLMs mentioning given brands with drastically different probabilities, including a pair of prompts where the probability changes by 100%. Next, we developed an approach that can be used to perturb an original base prompt to increase the likelihood that an LLM mentions a given brand. We designed a human-inconspicuous algorithm that perturbs prompts, which empirically forces LLMs to mention strings related to a brand more often, with absolute improvements of up to 78.3%. Our results suggest that our perturbed prompts 1) are inconspicuous to humans, 2) force LLMs to recommend a target brand more often, and 3) increase the perceived chances of picking targeted brands.

category, llm, target brand, (17 more...)

arXiv.org Artificial Intelligence

2406.04755

Country:

North America > United States (0.28)
North America > Canada (0.14)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Information Technology > Security & Privacy (1.00)
Government (0.93)
Media (0.93)
Leisure & Entertainment > Games > Computer Games (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

AdaMergeX: Cross-Lingual Transfer with Large Language Models via Adaptive Adapter Merging

Zhao, Yiran, Zhang, Wenxuan, Wang, Huiming, Kawaguchi, Kenji, Bing, Lidong

arXiv.org Artificial Intelligence | Feb-29-2024

As an effective alternative to direct fine-tuning on target tasks in specific languages, cross-lingual transfer addresses the challenges of limited training data by decoupling "task ability" and "language ability" by fine-tuning on the target task in the source language and another selected task in the target language, respectively. However, existing methods fail to fully separate the task ability from the source language or the language ability from the chosen task. In this paper, we acknowledge the mutual reliance between task ability and language ability and direct our attention toward the gap between the target language and the source language on tasks. As the gap removes the impact of tasks, we assume that it remains consistent across tasks. Based on this assumption, we propose a new cross-lingual transfer method called AdaMergeX that utilizes adaptive adapter merging. By introducing a reference task, we can determine that the divergence of adapters fine-tuned on the reference task in both languages follows the same distribution as the divergence of adapters fine-tuned on the target task in both languages. Hence, we can obtain target adapters by combining the other three adapters. Furthermore, we propose a structure-adaptive adapter merging method. Our empirical results demonstrate that our approach yields new and effective cross-lingual transfer, outperforming existing methods across all settings.
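The step of obtaining the target adapter from the other three can be sketched as follows, under the assumption that adapter weights combine additively (as would be natural for additive adapters such as LoRA). The function `merge_adapters`, the flat-list weight representation, and the scaling factor `t` are illustrative assumptions; the abstract notes that the actual merging rule is adapted to the adapter structure, which this sketch does not cover.

```python
def merge_adapters(task_src, ref_src, ref_tgt, t=1.0):
    """Estimate the adapter for (target task, target language).

    task_src: adapter weights fine-tuned on the target task in the source language.
    ref_src:  adapter weights fine-tuned on the reference task in the source language.
    ref_tgt:  adapter weights fine-tuned on the reference task in the target language.
    t:        hypothetical scaling factor applied to the language gap.

    Assumes the language gap (ref_tgt - ref_src) is the same for every task,
    so it can be added to the source-language task adapter.
    """
    if not (len(task_src) == len(ref_src) == len(ref_tgt)):
        raise ValueError("all adapters must have the same number of weights")
    return [a + t * (rt - rs) for a, rs, rt in zip(task_src, ref_src, ref_tgt)]


# Hypothetical toy weights, chosen only to make the arithmetic easy to follow.
print(merge_adapters([1.0, 2.0], [0.5, 0.5], [1.5, 0.0]))  # [2.0, 1.5]
```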

adamergex, adapter, source language, (17 more...)

arXiv.org Artificial Intelligence

2402.18913

Country:

Asia > Singapore (0.04)
North America > Dominican Republic (0.04)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation

Gervet, Theophile, Xian, Zhou, Gkanatsios, Nikolaos, Fragkiadaki, Katerina

arXiv.org Artificial Intelligence | Oct-19-2023

3D perceptual representations are well suited for robot manipulation as they easily encode occlusions and simplify spatial reasoning. Many manipulation tasks require high spatial precision in end-effector pose prediction, which typically demands high-resolution 3D feature grids that are computationally expensive to process. As a result, most manipulation policies operate directly in 2D, forgoing 3D inductive biases. In this paper, we introduce Act3D, a manipulation policy transformer that represents the robot's workspace using a 3D feature field with adaptive resolutions dependent on the task at hand. The model lifts 2D pre-trained features to 3D using sensed depth, and attends to them to compute features for sampled 3D points. It samples 3D point grids in a coarse-to-fine manner, featurizes them using relative-position attention, and selects where to focus the next round of point sampling. In this way, it efficiently computes 3D action maps of high spatial resolution. Act3D sets a new state-of-the-art in RLBench, an established manipulation benchmark, where it achieves 10% absolute improvement over the previous SOTA 2D multi-view policy on 74 RLBench tasks and 22% absolute improvement with 3x less compute over the previous SOTA 3D policy. We quantify the importance of relative spatial attention, large-scale vision-language pre-trained 2D backbones, and weight tying across coarse-to-fine attentions in ablative experiments. Code and videos are available on our project website: https://act3d.github.io/.
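The coarse-to-fine sampling loop described in this abstract can be illustrated generically: sample a grid of 3D points, score them, recentre on the best point, shrink the region, and repeat. In the sketch below the scoring function is a caller-supplied stand-in for the model's learned attention, and the names `grid` and `coarse_to_fine` and all default parameters are hypothetical; this is not the Act3D implementation.

```python
import itertools


def grid(center, half_width, points_per_axis):
    """Evenly spaced 3D grid of points in a cube of side 2 * half_width around `center`."""
    if points_per_axis < 2:
        raise ValueError("points_per_axis must be at least 2")
    steps = [
        -half_width + 2 * half_width * k / (points_per_axis - 1)
        for k in range(points_per_axis)
    ]
    return [
        tuple(c + s for c, s in zip(center, offsets))
        for offsets in itertools.product(steps, repeat=3)
    ]


def coarse_to_fine(score, center=(0.0, 0.0, 0.0), half_width=1.0,
                   points_per_axis=5, rounds=3, shrink=0.5):
    """Repeatedly sample a grid, keep the best-scoring point, and zoom in around it."""
    best = center
    for _ in range(rounds):
        candidates = grid(best, half_width, points_per_axis)
        best = max(candidates, key=score)   # stand-in for learned point selection
        half_width *= shrink                # the next round samples a smaller region
    return best


# Hypothetical score: negative squared distance to a made-up target position.
target = (0.3, -0.2, 0.1)
found = coarse_to_fine(lambda p: -sum((a - b) ** 2 for a, b in zip(p, target)))
print(found)
```

Three rounds of a 5 x 5 x 5 grid evaluate 375 points in total, whereas a single dense grid at the final spacing over the whole cube would need 17 points per axis, or 4913 points, which is the kind of saving the coarse-to-fine strategy is meant to provide.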

act3d, ghost point, manipulation, (12 more...)

arXiv.org Artificial Intelligence

2306.17817

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Wav2vec-based Detection and Severity Level Classification of Dysarthria from Speech

Javanmardi, Farhad, Tirronen, Saska, Kodali, Manila, Kadiri, Sudarsana Reddy, Alku, Paavo

arXiv.org Artificial Intelligence | Oct-17-2023

Automatic detection and severity level classification of dysarthria directly from acoustic speech signals can be used as a tool in medical diagnosis. In this work, the pre-trained wav2vec 2.0 model is studied as a feature extractor to build detection and severity level classification systems for dysarthric speech. The experiments were carried out with the widely used UA-Speech database. In the detection experiments, the best performance was obtained using the embeddings from the first layer of the wav2vec model, which yielded an absolute improvement of 1.23% in accuracy compared to the best-performing baseline feature (spectrogram). In the severity level classification task, the embeddings from the final layer gave an absolute improvement of 10.62% in accuracy compared to the best baseline features (mel-frequency cepstral coefficients).
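Because results like these are stated as absolute improvements in a percentage metric, it may help to spell out how that differs from a relative improvement. The sketch below uses hypothetical accuracy values (80% and 90%) that are not taken from any of the listed papers; the function names are illustrative.

```python
def absolute_improvement(baseline, new):
    """Difference in percentage points between two scores given in percent."""
    return new - baseline


def relative_improvement(baseline, new):
    """Improvement expressed as a percentage of the baseline score."""
    if baseline == 0:
        raise ValueError("relative improvement is undefined for a zero baseline")
    return (new - baseline) / baseline * 100.0


# Hypothetical accuracies in percent, for illustration only.
baseline_acc, new_acc = 80.0, 90.0
print(absolute_improvement(baseline_acc, new_acc))  # 10.0 percentage points
print(relative_improvement(baseline_acc, new_acc))  # 12.5 percent of the baseline
```

The same 10-point absolute gain corresponds to a larger relative gain when the baseline is lower, which is why abstracts usually state which of the two they report.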

classification, dysarthric speech, speech, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICASSP49357.2023.10094857

2309.14107

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > Finland (0.04)
(2 more...)

Genre: Research Report > New Finding (0.47)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Natural Language (0.94)
