

Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective

Zhang, Yang, Li, Xinran, Ye, Jianing, Qiu, Shuang, Qu, Delin, Li, Xiu, Zhang, Chongjie, Bai, Chenjia

arXiv.org Artificial Intelligence

World models have recently attracted growing interest in Multi-Agent Reinforcement Learning (MARL) due to their ability to improve sample efficiency for policy learning. However, accurately modeling environments in MARL is challenging because of the exponentially large joint action space and the highly uncertain dynamics inherent in multi-agent systems. To address this, we reduce modeling complexity by shifting from jointly modeling the entire state-action transition dynamics to focusing on the state space alone at each timestep through sequential agent modeling. Specifically, our approach enables the model to progressively resolve uncertainty while capturing the structured dependencies among agents, providing a more accurate representation of how agents influence the state. Interestingly, this sequential revelation of agents' actions in a multi-agent system aligns with the reverse process in diffusion models, a class of powerful generative models known for their expressiveness and training stability compared to autoregressive or latent variable models. Leveraging this insight, we develop a flexible and robust world model for MARL using diffusion models. Our method, the Diffusion-Inspired Multi-Agent world model (DIMA), achieves state-of-the-art performance across multiple multi-agent control benchmarks, including MAMuJoCo and Bi-DexHands, significantly outperforming prior world models in final return and sample efficiency. DIMA establishes a new paradigm for constructing multi-agent world models, advancing the frontier of MARL research. Code is open-sourced at https://github.com/breez3young/DIMA.
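
To make the core idea concrete, the following is a minimal sketch, under our own assumptions rather than the authors' released code, of a diffusion-style reverse process in which the next state is denoised step by step while one agent's action is revealed at each step. All class and function names here (AgentConditionedDenoiser, rollout_next_state) are hypothetical placeholders.

import torch
import torch.nn as nn

class AgentConditionedDenoiser(nn.Module):
    # Predicts the noise in a noisy next-state estimate, conditioned on the
    # current state, the actions revealed so far, and the denoising step index.
    def __init__(self, state_dim, action_dim, n_agents, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim * 2 + action_dim * n_agents + 1, hidden),
            nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, noisy_next_state, state, revealed_actions, step):
        step_feat = torch.full((state.shape[0], 1), float(step))
        x = torch.cat([noisy_next_state, state, revealed_actions, step_feat], dim=-1)
        return self.net(x)

@torch.no_grad()
def rollout_next_state(model, state, actions, n_agents, action_dim):
    # Reverse process: start from Gaussian noise and apply one denoising update
    # per agent, revealing that agent's action at its step (a crude DDPM-like loop).
    x = torch.randn_like(state)
    revealed = torch.zeros(state.shape[0], n_agents * action_dim)
    for i in range(n_agents):
        revealed[:, i * action_dim:(i + 1) * action_dim] = actions[:, i]
        eps = model(x, state, revealed, step=n_agents - i)
        x = x - eps / n_agents
    return x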


DiMA: An LLM-Powered Ride-Hailing Assistant at DiDi

Ning, Yansong, Cai, Shuowei, Li, Wei, Fang, Jun, Tan, Naiqiang, Chai, Hua, Liu, Hao

arXiv.org Artificial Intelligence

On-demand ride-hailing services like DiDi, Uber, and Lyft have transformed urban transportation, offering unmatched convenience and flexibility. In this paper, we introduce DiMA, an LLM-powered ride-hailing assistant deployed at DiDi Chuxing. Its goal is to provide seamless ride-hailing services and beyond through a natural and efficient conversational interface under dynamic and complex spatiotemporal urban contexts. To achieve this, we propose a spatiotemporal-aware order planning module that leverages external tools for precise spatiotemporal reasoning and progressive order planning. Additionally, we develop a cost-effective dialogue system that integrates multi-type dialog repliers with cost-aware LLM configurations to handle diverse conversation goals and trade off response quality against latency. Furthermore, we introduce a continual fine-tuning scheme that utilizes real-world interactions and simulated dialogues to align the assistant's behavior with human-preferred decision-making processes. Since its deployment in the DiDi application, DiMA has demonstrated exceptional performance, achieving 93% accuracy in order planning and 92% in response generation during real-world interactions. Offline experiments further validate DiMA's capabilities, showing improvements of up to 70.23% in order planning and 321.27% in response generation compared to three state-of-the-art agent frameworks, while reducing latency by $0.72\times$ to $5.47\times$. These results establish DiMA as an effective, efficient, and intelligent mobile assistant for ride-hailing services.
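
The cost-aware replier design can be illustrated with a small sketch: route each conversation turn to a replier with a different quality/latency/cost profile depending on the conversation goal. This is only an illustrative example under assumed names and thresholds, not DiDi's deployed configuration.

from dataclasses import dataclass

@dataclass
class Replier:
    name: str
    est_latency_ms: int
    est_cost_usd: float

REPLIERS = {
    "template": Replier("template", est_latency_ms=20, est_cost_usd=0.0),
    "small_llm": Replier("small_llm", est_latency_ms=300, est_cost_usd=0.001),
    "large_llm": Replier("large_llm", est_latency_ms=1500, est_cost_usd=0.01),
}

def route(goal: str) -> Replier:
    # Cheap templates for routine confirmations, a small model for slot filling,
    # and a large model only for open-ended planning or clarification turns.
    if goal in {"confirm_order", "cancel_order"}:
        return REPLIERS["template"]
    if goal in {"fill_pickup", "fill_destination"}:
        return REPLIERS["small_llm"]
    return REPLIERS["large_llm"]

print(route("confirm_order").name)  # -> template

In practice the routing signal would come from an intent classifier or the order-planning module, and the latency and cost figures would be measured rather than hard-coded.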


Distilling Multi-modal Large Language Models for Autonomous Driving

Hegde, Deepti, Yasarla, Rajeev, Cai, Hong, Han, Shizhong, Bhattacharyya, Apratim, Mahajan, Shweta, Liu, Litian, Garrepalli, Risheek, Patel, Vishal M., Porikli, Fatih

arXiv.org Artificial Intelligence

Autonomous driving demands safe motion planning, especially in critical "long-tail" scenarios. Recent end-to-end autonomous driving systems leverage large language models (LLMs) as planners to improve generalizability to rare events. However, using LLMs at test time introduces high computational costs. To address this, we propose DiMA, an end-to-end autonomous driving system that maintains the efficiency of an LLM-free (or vision-based) planner while leveraging the world knowledge of an LLM. DiMA distills the information from a multi-modal LLM to a vision-based end-to-end planner through a set of specially designed surrogate tasks. Under a joint training strategy, a scene encoder common to both networks produces structured representations that are semantically grounded as well as aligned to the final planning objective. Notably, the LLM is optional at inference, enabling robust planning without compromising on efficiency. Training with DiMA results in a 37% reduction in the L2 trajectory error and an 80% reduction in the collision rate of the vision-based planner, as well as a 44% trajectory error reduction in long-tail scenarios. DiMA also achieves state-of-the-art performance on the nuScenes planning benchmark.
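
The distillation recipe can be sketched as a joint training step that combines the planner's trajectory loss with a feature-alignment loss toward the multi-modal LLM branch; at inference only the vision branch is kept. The module names and the simple MSE alignment below are assumptions for illustration, not the paper's exact surrogate tasks.

import torch
import torch.nn.functional as F

def joint_training_step(scene_encoder, planner, llm_branch, batch, optimizer, alpha=0.5):
    # Shared scene encoder feeds the vision-based planner.
    feats = scene_encoder(batch["camera_images"])
    pred_traj = planner(feats)
    plan_loss = F.mse_loss(pred_traj, batch["expert_traj"])      # L2 trajectory loss

    # Distillation target from the multi-modal LLM branch (training-time only).
    with torch.no_grad():
        llm_feats = llm_branch(batch["camera_images"], batch["text_prompt"])
    distill_loss = F.mse_loss(feats, llm_feats)                  # align scene features

    loss = plan_loss + alpha * distill_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Because the LLM branch appears only inside the training step, the deployed planner runs with the scene encoder and planner heads alone, which is what keeps inference cost at the level of an LLM-free system.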


When Generative AI Meets Workplace Learning: Creating A Realistic & Motivating Learning Experience With A Generative PCA

Bucher, Andreas, Schenk, Birgit, Dolata, Mateusz, Schwabe, Gerhard

arXiv.org Artificial Intelligence

Workplace learning is used to train employees systematically, e.g., via e-learning or in 1:1 training. However, this is often deemed ineffective and costly. Whereas pure e-learning lacks the possibility of conversational exercise and personal contact, 1:1 training with human instructors involves high personnel and organizational costs. Hence, pedagogical conversational agents (PCAs) based on generative AI seem able to compensate for the disadvantages of both forms. Following Action Design Research, this paper describes an organizational communication training with a Generative PCA (GenPCA). The evaluation shows promising results: the agent was perceived positively by employees and contributed to an improvement in self-determined learning. However, the integration of such an agent is not without limitations. We conclude with suggestions concerning the didactic methods that a GenPCA can support, and possible improvements of such an agent for workplace learning.


Diffusion on language model embeddings for protein sequence generation

Meshchaninov, Viacheslav, Strashnov, Pavel, Shevtsov, Andrey, Nikolaev, Fedor, Ivanisenko, Nikita, Kardymon, Olga, Vetrov, Dmitry

arXiv.org Artificial Intelligence

Protein design requires a deep understanding of the inherent complexities of the protein universe. While many efforts lean towards conditional generation or focus on specific families of proteins, the foundational task of unconditional generation remains underexplored and undervalued. Here, we explore this pivotal domain, introducing DiMA, a model that leverages continuous diffusion on embeddings derived from the protein language model, ESM-2, to generate amino acid sequences. DiMA surpasses leading solutions, including autoregressive transformer-based and discrete diffusion models, and we quantitatively illustrate the impact of the design choices that lead to its superior performance. We extensively evaluate the quality, diversity, distribution similarity, and biological relevance of the generated sequences using multiple metrics across various modalities. Our approach consistently produces novel, diverse protein sequences that accurately reflect the inherent structural and functional diversity of the protein space. This work advances the field of protein design and sets the stage for conditional models by providing a robust framework for scalable and high-quality protein sequence generation.
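
A rough sketch of the general recipe, under assumed shapes and a simplified noise schedule rather than the authors' released code: train a denoiser on Gaussian-noised per-residue embeddings (e.g., produced by ESM-2), with embedding extraction and decoding back to amino acids left as placeholders.

import torch
import torch.nn as nn

class EmbeddingDenoiser(nn.Module):
    def __init__(self, emb_dim=320, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(emb_dim + 1, hidden), nn.GELU(), nn.Linear(hidden, emb_dim)
        )

    def forward(self, noisy_emb, t):
        # t in [0, 1], one noise level per sequence, appended as an extra feature.
        t_feat = t.view(-1, 1, 1).expand(noisy_emb.shape[0], noisy_emb.shape[1], 1)
        return self.net(torch.cat([noisy_emb, t_feat], dim=-1))

def diffusion_training_step(denoiser, clean_emb, optimizer):
    # clean_emb: (batch, seq_len, emb_dim) protein-LM embeddings of real sequences.
    t = torch.rand(clean_emb.shape[0])
    alpha = (1.0 - t).view(-1, 1, 1)                   # toy linear noise schedule
    noise = torch.randn_like(clean_emb)
    noisy = alpha.sqrt() * clean_emb + (1 - alpha).sqrt() * noise
    pred = denoiser(noisy, t)                          # x0-prediction objective
    loss = nn.functional.mse_loss(pred, clean_emb)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Generation would then run the reverse process from pure noise to a clean embedding and map it back to an amino acid sequence with a learned decoder.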


How AI can make the metaverse a more interactive space

#artificialintelligence

The potential behind the metaverse is becoming greater as virtual and physical worlds converge. Market intelligence firm Contrive Datum Insights recently found that the global metaverse market is estimated to surpass $1.3 trillion by 2030. According to the study, this growth will be driven by newly adopted virtual economy trends, combined with the rise of both crypto and online games. Additionally, a recent survey conducted by CoinWire highlighted that the metaverse would likely reshape social lifestyles. CoinWire found that 69% of respondents believe that the metaverse will eventually modify social lifestyles due to new approaches taken for entertainment and activities. Hackl elaborated that technologies such as volumetric video -- a technique that offers a more immersive experience by capturing three-dimensional spaces -- will likely change how individuals communicate.


He Said, She Said: Style Transfer for Shifting the Perspective of Dialogues

Bertsch, Amanda, Neubig, Graham, Gormley, Matthew R.

arXiv.org Artificial Intelligence

In this work, we define a new style transfer task: perspective shift, which rewrites an informal first-person dialogue as a formal third-person account of the conversation. This task requires challenging coreference resolution, emotion attribution, and interpretation of informal text. We explore several baseline approaches and discuss further directions for this task when applied to short dialogues. As a sample application, we demonstrate that applying perspective shifting to a dialogue summarization dataset (SAMSum) substantially improves the zero-shot performance of extractive news summarization models on this data. Additionally, supervised extractive models perform better when trained on perspective-shifted data than on the original dialogues. We release our code publicly.


Can Smart Earbuds Instantly Translate Foreign Speech?

WSJ.com: WSJD - Technology

STEPPING OFF THE PLANE in Russia for the first time in 2013, I collided with a wall of blunt language and was intrigued beyond repair. Five years, countless classes and ten visits to Moscow later, I still claim a distinctly below-average capacity for the Russian tongue and its dense, foreboding components. To fill these gaps ahead of my next adventure abroad, I turned to technology. Late last year, Brooklyn's Waverly Labs released the Pilot ($299, waverlylabs.com). These eavesdropping devices use cloud-based machine-learning technology to pipe dozens of different languages into your brain in your mother tongue.