AITopics

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.60)

Neural Information Processing SystemsDec-23-2025, 21:01:44 GMT

Association Graph Learning for Multi-Task Classification with Category Shifts

In this paper, we focus on multi-task classification, where related classification tasks share the same label space and are learned simultaneously. In particular, we tackle a new setting, which is more realistic than currently addressed in the literature, where categories shift from training to test data. Hence, individual tasks do not contain complete training data for the categories in the test set. To generalize to such test data, it is crucial for individual tasks to leverage knowledge from related tasks. To this end, we propose learning an association graph to transfer knowledge among tasks for missing classes.

association graph learning, multi-task classification, name change, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial IntelligenceDec-9-2025

Sample from What You See: Visuomotor Policy Learning via Diffusion Bridge with Observation-Embedded Stochastic Differential Equation

Liu, Zhaoyang, Pan, Mokai, Wang, Zhongyi, Zhu, Kaizhen, Lu, Haotao, Wang, Jingya, Shi, Ye

Imitation learning with diffusion models has advanced robotic control by capturing multi-modal action distributions. However, existing approaches typically treat observations as high-level conditioning inputs to the denoising network, rather than integrating them into the stochastic dynamics of the diffusion process itself. As a result, sampling must begin from random Gaussian noise, weakening the coupling between perception and control and often yielding suboptimal performance. W e introduce Bridge-Policy, a generative visuomotor policy that explicitly embeds observations within the stochastic differential equation via a diffusion-bridge formulation. By constructing an observation-informed trajectory, BridgePolicy enables sampling to start from a rich, informative prior rather than random noise, substantially improving precision and reliability in control. A key challenge is that classical diffusion bridges connect distributions with matched dimensionality, whereas robotic observations are heterogeneous and multi-modal and do not naturally align with the action space. T o address this, we design a multi-modal fusion module and a semantic aligner that unify visual and state inputs and align observation and action representations, making the bridge applicable to heterogeneous robot data. Extensive experiments across 52 simulation tasks on three benchmarks and five real-world tasks demonstrate that BridgePolicy consistently outperforms state-of-the-art generative policies.

artificial intelligence, arxiv preprint arxiv, machine learning, (13 more...)

2512.07212

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsJan-18-2025, 17:38:09 GMT

Variational Multi-Task Learning with Gumbel-Softmax Priors

individual task, task relatedness, variational multi-task learning, (2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.64)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.42)

arXiv.org Artificial IntelligenceOct-22-2024

Captions Speak Louder than Images (CASLIE): Generalizing Foundation Models for E-commerce from High-quality Multimodal Instruction Data

Ling, Xinyi, Peng, Bo, Du, Hanwen, Zhu, Zhihui, Ning, Xia

Leveraging multimodal data to drive breakthroughs in e-commerce applications through Multimodal Foundation Models (MFMs) is gaining increasing attention from the research community. However, there are significant challenges that hinder the optimal use of multimodal e-commerce data by foundation models: (1) the scarcity of large-scale, high-quality multimodal benchmark datasets; and (2) the lack of effective multimodal information integration methods. To address these challenges, in this paper, we introduce MMECInstruct, the first-ever, large-scale, and high-quality multimodal instruction dataset for e-commerce. We also develop CASLIE, a simple, lightweight, yet effective framework for integrating multimodal information for e-commerce. Leveraging MMECInstruct, we fine-tune a series of e-commerce MFMs within CASLIE, denoted as CASLIE models. Our comprehensive evaluation demonstrates that CASLIE models substantially outperform 5 categories of advanced baseline models in the in-domain evaluation. Moreover, CASLIE models show strong generalizability to out-of-domain settings. MMECInstruct and CASLIE models are publicly accessible through https://ninglab.github.io/CASLIE/.

large language model, machine learning, natural language, (23 more...)

2410.17337

Country:

North America > United States > Ohio (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Newfoundland and Labrador > Labrador (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Services > e-Commerce Services (1.00)

Technology:

Information Technology > e-Commerce (1.00)
Information Technology > Data Science (1.00)
Information Technology > Information Management (0.93)
(3 more...)

Neural Information Processing SystemsOct-10-2024, 04:16:11 GMT

Association Graph Learning for Multi-Task Classification with Category Shifts

association graph learning, category shift, multi-task classification, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial IntelligenceJul-13-2024

LiPost: Improved Content Understanding With Effective Use of Multi-task Contrastive Learning

Bindal, Akanksha, Ramanujam, Sudarshan, Golland, Dave, Hazen, TJ, Jiang, Tina, Zhang, Fengyu, Yan, Peng

In enhancing LinkedIn core content recommendation models, a significant challenge lies in improving their semantic understanding capabilities. This paper addresses the problem by leveraging multi-task learning, a method that has shown promise in various domains. We fine-tune a pre-trained, transformer-based LLM using multi-task contrastive learning with data from a diverse set of semantic labeling tasks. We observe positive transfer, leading to superior performance across all tasks when compared to training independently on each. Our model outperforms the baseline on zero shot learning and offers improved multilingual support, highlighting its potential for broader application. The specialized content embeddings produced by our model outperform generalized embeddings offered by OpenAI on Linkedin dataset and tasks. This work provides a robust foundation for vertical teams across LinkedIn to customize and fine-tune the LLM to their specific applications. Our work offers insights and best practices for the field to build on.

dataset, linkedin, lipost, (13 more...)

2405.11344

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Services (0.90)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

arXiv.org Artificial IntelligenceMay-31-2024

WebSuite: Systematically Evaluating Why Web Agents Fail

Li, Eric, Waldo, Jim

We describe WebSuite, the first diagnostic benchmark for generalist web agents, designed to systematically evaluate why agents fail. Advances in AI have led to the rise of numerous web agents that autonomously operate a browser to complete tasks. However, most existing benchmarks focus on strictly measuring whether an agent can or cannot complete a task, without giving insight on why. In this paper, we 1) develop a taxonomy of web actions to facilitate identifying common failure patterns, and 2) create an extensible benchmark suite to assess agents' performance on our taxonomized actions. This benchmark suite consists of both individual tasks, such as clicking a button, and end-to-end tasks, such as adding an item to a cart, and is designed such that any failure of a task can be attributed directly to a failure of a specific web action. We evaluate two popular generalist web agents, one text-based and one multimodal, and identify unique weaknesses for each agent. Because WebSuite can disaggregate task failures into specific action failures, this enables granular identification of which UX flows an individual agent has trouble with and immediately highlights promising avenues for improvement. These findings highlight the need for more focused benchmarking on where web agents go wrong to effectively improve agents beyond their weaker performance today.

agent, benchmark, interaction, (17 more...)

2406.01623

Country:

North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > Ontario > Toronto (0.04)
Africa > Eswatini > Manzini > Manzini (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

arXiv.org Artificial IntelligenceDec-8-2023

Procedural generation of meta-reinforcement learning tasks

Miconi, Thomas

Open-endedness stands to benefit from the ability to generate an infinite variety of diverse, challenging environments. One particularly interesting type of challenge is meta-learning ("learning-to-learn"), a hallmark of intelligent behavior. However, the number of meta-learning environments in the literature is limited. Here we describe a parametrized space for simple meta-reinforcement learning (meta-RL) tasks with arbitrary stimuli. The parametrization allows us to randomly generate an arbitrary number of novel simple meta-learning tasks. The parametrization is expressive enough to include many well-known meta-RL tasks, such as bandit problems, the Harlow task, T-mazes, the Daw two-step task and others. Simple extensions allow it to capture tasks based on two-dimensional topological spaces, such as full mazes or find-the-spot domains. We describe a number of randomly generated meta-RL domains of varying complexity and discuss potential issues arising from random generation.

agent, probability, stimulus, (17 more...)

2302.05583

Country: North America > United States > Massachusetts > Plymouth County > Norwell (0.04)

Genre: Research Report (0.50)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Dahlquist, Niklas, Saradagi, Akshit, Nikolakopoulos, George

Reactive Task Allocation for Balanced Servicing of Multiple Task Queues

arXiv.org Artificial IntelligenceApr-5-2023

In this article, we propose a reactive task allocation architecture for a multi-agent system for scenarios where the tasks arrive at random times and are grouped into multiple queues. Two stage tasks are considered where every task has a beginning, an intermediate and a final part, typical in pick-and-drop and inspect-and-report scenarios. A centralized auction-based task allocation system is proposed, where an auction system takes into consideration bids submitted by the agents for individual tasks, current length of the queues and the waiting times of the tasks in the queues to decide on a task allocation strategy. The costs associated with these considerations, along with the constraints of having unique mappings between tasks and agents and constraints on the maximum number of agents that can be assigned to a queue, results in a Linear Integer Program (LIP) that is solved using the SCIP solver. For the scenario where the queue lengths are penalized but not the waiting times, we demonstrate that the auction system allocates tasks in a manner that all the queue lengths become constant, which is termed balancing. For the scenarios where both the costs are considered, we qualitatively analyse the effect of the choice of the relative weights on the resulting task allocation and provide guidelines for the choice of the weights. We present simulation results that illustrate the balanced allocation of tasks and validate the analysis for the trade-off between the costs related to queue lengths and task waiting times.

agent, artificial intelligence, scenario, (16 more...)

2304.02333

Country: Europe > Sweden > Norrbotten County > Luleå (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)