AITopics | moma

moma

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

MOMA: Multi-Object Multi-Actor Activity Parsing

Neural Information Processing Systems, Dec-24-2025, 12:46:17 GMT

Complex activities often involve multiple humans utilizing different objects to complete actions (e.g., in healthcare settings, physicians, nurses, and patients interact with each other and various medical devices). Recognizing such activities is challenging because it requires a detailed understanding of actors' roles, objects' affordances, and their associated relationships. Furthermore, these purposeful activities are composed of multiple achievable steps, including sub-activities and atomic actions, which jointly define a hierarchy of action parts. This paper introduces Activity Parsing as the overarching task of temporal segmentation and classification of activities, sub-activities, and atomic actions, along with an instance-level understanding of actors, objects, and their relationships in videos. Because these activities involve multiple entities (actors and objects), we argue that traditional pair-wise relationships, often used in scene or action graphs, do not appropriately represent the dynamics between them. Hence, we introduce Action Hypergraph, a spatial-temporal graph containing hyperedges (i.e., edges with higher-order relationships), as a new representation. In addition, we introduce Multi-Object Multi-Actor (MOMA), the first benchmark and dataset dedicated to activity parsing. Lastly, to parse a video, we propose the HyperGraph Activity Parsing (HGAP) network, which outperforms several baselines, including those based on regular graphs and raw video data.
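
As a rough illustration of how a hyperedge differs from a pair-wise edge, the sketch below stores each relationship as a set of any number of entity IDs. The class names (`Hyperedge`, `FrameHypergraph`), the `relate` method, and the example entities are hypothetical; none of them come from the MOMA dataset or the HGAP network. A full action hypergraph would also link such per-frame structures over time, which this sketch does not attempt.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Hyperedge:
    """One relationship joining any number of entities (hypothetical type)."""
    label: str
    members: frozenset  # entity IDs; size is not limited to two


@dataclass
class FrameHypergraph:
    """Entities and hyperedges observed in a single video frame (hypothetical type)."""
    entities: dict = field(default_factory=dict)  # entity ID -> "actor" or "object"
    edges: list = field(default_factory=list)

    def add_entity(self, entity_id, kind):
        self.entities[entity_id] = kind

    def relate(self, label, *entity_ids):
        # Reject relationships that mention entities not registered in this frame.
        missing = [e for e in entity_ids if e not in self.entities]
        if missing:
            raise KeyError(f"unknown entities: {missing}")
        edge = Hyperedge(label, frozenset(entity_ids))
        self.edges.append(edge)
        return edge

    def edges_of(self, entity_id):
        return [e for e in self.edges if entity_id in e.members]


# Illustrative frame: one relationship that involves three entities at once.
graph = FrameHypergraph()
graph.add_entity("nurse_1", "actor")
graph.add_entity("patient_1", "actor")
graph.add_entity("monitor_1", "object")
edge = graph.relate("checks vitals using", "nurse_1", "patient_1", "monitor_1")
```

A pair-wise graph would have to split this single interaction into three separate two-entity edges, losing the fact that they belong to one joint action.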

activity parsing, multi-object multi-actor activity parsing, name change, (6 more...)

Neural Information Processing Systems

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence (1.00)

MoMA: A Mixture-of-Multimodal-Agents Architecture for Enhancing Clinical Prediction Modelling

Gao, Jifan, Rahman, Mahmudur, Caskey, John, Oguss, Madeline, O'Rourke, Ann, Brown, Randy, Stey, Anne, Mayampurath, Anoop, Churpek, Matthew M., Chen, Guanhua, Afshar, Majid

arXiv.org Artificial Intelligence, Aug-8-2025

Multimodal electronic health record (EHR) data provide richer, complementary insights into patient health compared to single-modality data. However, effectively integrating diverse data modalities for clinical prediction modeling remains challenging due to the substantial data requirements. We introduce a novel architecture, Mixture-of-Multimodal-Agents (MoMA), designed to leverage multiple large language model (LLM) agents for clinical prediction tasks using multimodal EHR data. MoMA employs specialized LLM agents ("specialist agents") to convert non-textual modalities, such as medical images and laboratory results, into structured textual summaries. These summaries, together with clinical notes, are combined by another LLM ("aggregator agent") to generate a unified multimodal summary, which is then used by a third LLM ("predictor agent") to produce clinical predictions. Evaluated on three prediction tasks using real-world datasets with different modality combinations and prediction settings, MoMA outperforms current state-of-the-art methods, highlighting its enhanced accuracy and flexibility across various tasks.
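
The three-stage flow described in this abstract (specialists, then aggregator, then predictor) can be sketched as plain function composition. This is only an outline of the data flow under stated assumptions: the function name `moma_style_predict`, the record layout, and the lambda stand-ins for the agents are all hypothetical, and no LLM is called.

```python
from typing import Callable, Dict

Agent = Callable[[str], str]  # hypothetical stand-in for an LLM call: text in, text out


def moma_style_predict(record: dict,
                       specialists: Dict[str, Agent],
                       aggregator: Agent,
                       predictor: Agent) -> str:
    """Specialists summarise each non-text modality, the aggregator merges the
    summaries with the clinical notes, and the predictor reads the merged text."""
    summaries = [specialists[modality](raw)
                 for modality, raw in record["non_text"].items()]
    unified = aggregator("\n".join(summaries + [record["notes"]]))
    return predictor(unified)


# Toy stand-ins so the flow can be exercised without any model.
specialists = {"labs": lambda raw: f"labs summary: {raw}"}
aggregator = lambda text: "unified: " + text.replace("\n", " | ")
predictor = lambda summary: "high risk" if "elevated" in summary else "low risk"

record = {"non_text": {"labs": "lactate elevated"}, "notes": "patient febrile"}
prediction = moma_style_predict(record, specialists, aggregator, predictor)
```

Keeping each agent behind the same text-to-text interface is what would let a new modality be added by registering one more specialist, without touching the aggregator or predictor.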

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2508.05492

Country: North America > United States > Wisconsin (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Consumer Health (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Monocular One-Shot Metric-Depth Alignment for RGB-Based Robot Grasping

Guo, Teng, Huang, Baichuan, Yu, Jingjin

arXiv.org Artificial Intelligence, Jun-23-2025

Accurate 6D object pose estimation is a prerequisite for successfully completing robotic prehensile and non-prehensile manipulation tasks. At present, 6D pose estimation for robotic manipulation generally relies on depth sensors based on, e.g., structured light, time-of-flight, and stereo-vision, which can be expensive, produce noisy output (as compared with RGB cameras), and fail to handle transparent objects. On the other hand, state-of-the-art monocular depth estimation models (MDEMs) provide only affine-invariant depths up to an unknown scale and shift. Metric MDEMs achieve some successful zero-shot results on public datasets, but fail to generalize. We propose a novel framework, Monocular One-shot Metric-depth Alignment (MOMA), to recover metric depth from a single RGB image, through a one-shot adaptation building on MDEM techniques. MOMA performs scale-rotation-shift alignments during camera calibration, guided by sparse ground-truth depth points, enabling accurate depth estimation without additional data collection or model retraining on the testing setup. MOMA supports fine-tuning the MDEM on transparent objects, demonstrating strong generalization capabilities. Real-world experiments on tabletop 2-finger grasping and suction-based bin-picking applications show MOMA achieves high success rates in diverse tasks, confirming its effectiveness.
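
To make the "unknown scale and shift" concrete, the sketch below fits a scale `s` and shift `t` so that `s * d + t` matches a few known metric depths in the least-squares sense. It covers only the scale and shift terms, not the rotation component the abstract mentions, and ordinary least squares is an assumption here rather than the authors' procedure. The function names and sample values are hypothetical.

```python
def fit_scale_shift(pred, truth):
    """Least-squares scale and shift such that scale * pred[i] + shift ~ truth[i].

    pred:  affine-invariant depths from a monocular model at the sparse points
    truth: ground-truth metric depths at the same points
    """
    n = len(pred)
    if n < 2 or n != len(truth):
        raise ValueError("need at least two matched depth pairs")
    mean_p = sum(pred) / n
    mean_t = sum(truth) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0:
        raise ValueError("predicted depths must not all be equal")
    cov = sum((p - mean_p) * (z - mean_t) for p, z in zip(pred, truth))
    scale = cov / var_p
    shift = mean_t - scale * mean_p
    return scale, shift


def to_metric(pred, scale, shift):
    """Apply the fitted alignment to any predicted depths."""
    return [scale * p + shift for p in pred]


# Illustrative calibration points (made-up numbers, in metres for `truth`).
pred = [0.1, 0.5, 0.9]
truth = [0.5, 1.3, 2.1]
scale, shift = fit_scale_shift(pred, truth)
metric = to_metric([0.3], scale, shift)
```

Because the fit needs only a handful of matched points, it can be redone whenever the camera setup changes, without retraining the depth model.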

artificial intelligence, estimation, image understanding, (16 more...)

arXiv.org Artificial Intelligence

2506.17110

Country:

North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
Europe > Italy > Tuscany > Florence (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision > Image Understanding (0.77)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.50)

MoMa: A Modular Deep Learning Framework for Material Property Prediction

Wang, Botian, Ouyang, Yawen, Li, Yaohui, Wang, Yiqun, Cui, Haorui, Zhang, Jianbing, Wang, Xiaonan, Ma, Wei-Ying, Zhou, Hao

arXiv.org Artificial Intelligence, Mar-17-2025

Deep learning methods for material property prediction have been widely explored to advance materials discovery. However, the prevailing pre-train then fine-tune paradigm often fails to address the inherent diversity and disparity of material tasks. To overcome these challenges, we introduce MoMa, a Modular framework for Materials that first trains specialized modules across a wide range of tasks and then adaptively composes synergistic modules tailored to each downstream scenario. Evaluation across 17 datasets demonstrates the superiority of MoMa, with a substantial 14% average improvement over the strongest baseline. Few-shot and continual learning experiments further highlight MoMa's potential for real-world applications. Pioneering a new paradigm of modular material learning, MoMa will be open-sourced to foster broader community collaboration.

artificial intelligence, machine learning, module, (14 more...)

arXiv.org Artificial Intelligence

2502.15483

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas > Upstream (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

MOMA: Multi-Object Multi-Actor Activity Parsing

Neural Information Processing Systems, Jan-17-2025, 17:14:07 GMT

activity parsing, moma, multi-object multi-actor activity parsing, (3 more...)

Neural Information Processing Systems

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.73)

Mitigating Social Bias in Large Language Models: A Multi-Objective Approach within a Multi-Agent Framework

Xu, Zhenjie, Chen, Wenqing, Tang, Yi, Li, Xuanying, Hu, Cheng, Chu, Zhixuan, Ren, Kui, Zheng, Zibin, Lu, Zhichao

arXiv.org Artificial Intelligence, Dec-19-2024

Natural language processing (NLP) has seen remarkable advancements with the development of large language models (LLMs). Despite these advancements, LLMs often produce socially biased outputs. Recent studies have mainly addressed this problem by prompting LLMs to behave ethically, but this approach results in unacceptable performance degradation. In this paper, we propose a multi-objective approach within a multi-agent framework (MOMA) to mitigate social bias in LLMs without significantly compromising their performance. The key idea of MOMA is to deploy multiple agents that perform causal interventions on bias-related content in the input questions, breaking the shortcut connection between that content and the corresponding answers. Unlike traditional debiasing techniques, which lead to performance degradation, MOMA substantially reduces bias while maintaining accuracy in downstream tasks. Our experiments on two datasets and two models demonstrate that MOMA reduces bias scores by up to 87.7%, with only a marginal performance degradation of up to 6.8% on the BBQ dataset. Additionally, it significantly enhances the multi-objective metric icat on the StereoSet dataset by up to 58.1%. Code will be made available at https://github.com/Cortantse/MOMA.

large language model, natural language, social group, (15 more...)

arXiv.org Artificial Intelligence

2412.15504

Country:

South America > Colombia > Meta Department > Villavicencio (0.04)
Europe > Middle East > Malta > Eastern Region > Northern Harbour District > St. Julian's (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Education (0.69)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

TechScape: Elon Musk is stumping hard for Donald Trump

The Guardian, Oct-15-2024, 13:19:14 GMT

Thank you for joining me. Elon Musk is stumping hard for Donald Trump. The Tesla and SpaceX CEO has funded a pro-Trump political action committee with tens of millions of dollars and planned a packed campaign schedule to boost the former president in Pennsylvania. He speaks to Trump multiple times per week and has urged other billionaires to endorse the Republican candidate en masse in private gatherings, according to the New York Times. Taken together, Musk's actions amount to something unprecedented in modern times – a man who is both the richest in the world and owner of an influential means of mass communication throwing all his weight behind a political candidate.

artificial intelligence, musk, social media, (18 more...)

The Guardian

Country:

North America > United States > Pennsylvania (0.27)
Europe > United Kingdom (0.15)
North America > United States > New York (0.05)
(2 more...)

Genre: Personal > Honors (0.71)

Industry:

Government > Voting & Elections (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Communications > Social Media (0.98)
Information Technology > Artificial Intelligence (0.72)

MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts

Lin, Xi Victoria, Shrivastava, Akshat, Luo, Liang, Iyer, Srinivasan, Lewis, Mike, Ghosh, Gargi, Zettlemoyer, Luke, Aghajanyan, Armen

arXiv.org Artificial Intelligence, Aug-12-2024

We introduce MoMa, a novel modality-aware mixture-of-experts (MoE) architecture designed for pre-training mixed-modal, early-fusion language models. MoMa processes images and text in arbitrary sequences by dividing expert modules into modality-specific groups. These groups exclusively process designated tokens while employing learned routing within each group to maintain semantically informed adaptivity. Our empirical results reveal substantial pre-training efficiency gains through this modality-specific parameter allocation. Under a 1-trillion-token training budget, the MoMa 1.4B model, featuring 4 text experts and 4 image experts, achieves impressive FLOPs savings: 3.7x overall, with 2.6x for text and 5.2x for image processing compared to a compute-equivalent dense baseline, measured by pre-training loss. This outperforms the standard expert-choice MoE with 8 mixed-modal experts, which achieves 3x overall FLOPs savings (3x for text, 2.8x for image). Combining MoMa with mixture-of-depths (MoD) further improves pre-training FLOPs savings to 4.2x overall (text: 3.4x, image: 5.3x), although this combination hurts performance in causal inference due to increased sensitivity to router accuracy. These results demonstrate MoMa's potential to significantly advance the efficiency of mixed-modal, early-fusion language model pre-training, paving the way for more resource-efficient and capable multimodal AI systems.
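
The idea of modality-specific expert groups with learned routing inside each group can be sketched as below. This is a simplified token-choice, top-1 router written for illustration only; the abstract does not spell out the routing rule at this level, and the router scores here are hand-written numbers rather than learned values. The function name, expert names, and token IDs are hypothetical.

```python
def route_tokens(tokens, router_scores, groups):
    """Assign each token to the best-scoring expert within its own modality group.

    tokens:        list of (token_id, modality) pairs in sequence order
    router_scores: token_id -> {expert_name: score}
    groups:        modality -> list of expert names allowed for that modality
    """
    assignment = {}
    for token_id, modality in tokens:
        allowed = groups[modality]  # experts outside this group are never considered
        scores = router_scores[token_id]
        assignment[token_id] = max(allowed, key=lambda expert: scores[expert])
    return assignment


# Illustrative setup: two text experts and two image experts.
groups = {"text": ["text_0", "text_1"], "image": ["img_0", "img_1"]}
tokens = [("t0", "text"), ("t1", "image")]
router_scores = {
    # t0 scores highest on an image expert, but may only use text experts.
    "t0": {"text_0": 0.2, "text_1": 0.7, "img_0": 0.9, "img_1": 0.1},
    "t1": {"text_0": 0.8, "text_1": 0.1, "img_0": 0.3, "img_1": 0.6},
}
assignment = route_tokens(tokens, router_scores, groups)
```

The group constraint is applied before the score comparison, so a high score on an expert from the other modality never changes where a token goes.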

architecture, arxiv, modality, (15 more...)

arXiv.org Artificial Intelligence

2407.21770

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

MoMA: Model-based Mirror Ascent for Offline Reinforcement Learning

Hong, Mao, Zhang, Zhiyue, Wu, Yue, Xu, Yanxun

arXiv.org Artificial Intelligence, Jan-20-2024

Model-based offline reinforcement learning (RL) methods have achieved state-of-the-art performance in many decision-making problems thanks to their sample efficiency and generalizability. Despite these advancements, existing model-based offline RL approaches either focus on theoretical studies without developing practical algorithms or rely on a restricted parametric policy space, thus not fully leveraging the advantages of an unrestricted policy space inherent to model-based methods. To address this limitation, we develop MoMA, a model-based mirror ascent algorithm with general function approximations under partial coverage of offline data. MoMA distinguishes itself from existing literature by employing an unrestricted policy class. In each iteration, MoMA conservatively estimates the value function by a minimization procedure within a confidence set of transition models in the policy evaluation step, then updates the policy with general function approximations instead of commonly-used parametric policy classes in the policy improvement step. Under some mild assumptions, we establish theoretical guarantees of MoMA by proving an upper bound on the suboptimality of the returned policy. We also provide a practically implementable, approximate version of the algorithm. The effectiveness of MoMA is demonstrated via numerical studies.
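
For readers unfamiliar with mirror ascent, the sketch below shows the textbook tabular update with a KL-divergence regulariser, where each action probability is multiplied by `exp(step_size * Q(s, a))` and renormalised. It is not the algorithm from the paper: it assumes a small discrete state and action space, and it takes the conservative value estimate as a given input instead of computing it over a confidence set of transition models. All names and numbers are hypothetical.

```python
import math


def mirror_ascent_step(policy, q_values, step_size):
    """One tabular mirror ascent update under a KL regulariser.

    policy:    state -> {action: probability}
    q_values:  state -> {action: estimated value}, assumed already conservative
    step_size: positive learning rate
    """
    new_policy = {}
    for state, probs in policy.items():
        # Exponentiated-gradient form: reweight each action by exp(step * Q).
        weights = {a: p * math.exp(step_size * q_values[state][a])
                   for a, p in probs.items()}
        total = sum(weights.values())
        new_policy[state] = {a: w / total for a, w in weights.items()}
    return new_policy


# Illustrative single-state problem with a uniform starting policy.
policy = {"s0": {"left": 0.5, "right": 0.5}}
q_values = {"s0": {"left": 1.0, "right": 0.0}}
updated = mirror_ascent_step(policy, q_values, step_size=1.0)
```

Each step shifts probability toward higher-valued actions while keeping every action's probability strictly positive, so the policy is never forced into a fixed parametric shape.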

algorithm, approximation, moma, (11 more...)

arXiv.org Artificial Intelligence

2401.11380

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Portugal > Porto > Porto (0.04)

Genre:

Research Report (0.63)
Workflow (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

The Puzzle of Putting Video Games in a Museum

The New Yorker, Jun-30-2023, 19:19:09 GMT

At some point in my childhood, I persuaded my parents to buy me a computer game at the Metropolitan Museum of Art. Obsessed, like many kids, with ancient Egypt, I'd spent the day marvelling at scarabs, sarcophagi, and ivory game pieces with canine heads. My favorite spot was the Temple of Dendur, where you could actually go inside the narrow chamber etched with hieroglyphs. In the gift shop, I spotted "Nile: An Ancient Egyptian Quest"--a three-disk "edutainment," co-produced by the museum and scored by Brian Eno, which invited me to bring the enchantment home. Soon, in defiance of the twelve-and-up rating, I was wandering the tombs of Giza with a talking jackal, searching for grave goods to nourish the souls of kings.

exhibition, gallery, museum, (15 more...)

The New Yorker

Country:

North America > United States > New York > New York County > New York City (0.24)
Africa > Middle East > Egypt > Giza Governorate > Giza (0.24)
North America > United States > Kentucky (0.04)
(6 more...)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology: Information Technology > Artificial Intelligence > Games (1.00)
