AITopics | mida

Collaborating Authors

mida

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

On the Inductive Bias of Stacking Towards Improving Reasoning

Neural Information Processing SystemsMar-21-2026, 09:43:45 GMT

Given the increasing scale of model sizes, efficient training strategies like gradual stacking have garnered interest. Stacking enables efficient training by gradually growing the depth of a model in stages and using layers from a smaller model in an earlier stage to initialize the next stage. Although efficient for training, the model biases induced by such growing approaches are largely unexplored. In this work, we examine this fundamental aspect of gradual stacking, going beyond its efficiency benefits. We propose a variant of gradual stacking called MIDAS that can speed up language model training by up to 40\%. Furthermore we discover an intriguing phenomenon: MIDAS is not only training-efficient but surprisingly also has an inductive bias towards improving downstream tasks, especially tasks that require reasoning abilities like reading comprehension and math problems, despite having similar or slightly worse perplexity compared to baseline training.

artificial intelligence, inductive bias, proceedings, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.42)

Add feedback

LearningtoMitigateExternalities:theCoaseTheorem withHindsightRationality

Neural Information Processing SystemsFeb-8-2026, 20:53:04 GMT

This is a major hindrance to the practical implementation of many proposed solutions.

artificial intelligence, machine learning, vup, (19 more...)

Neural Information Processing Systems

Country:

Europe > France (0.04)
Europe > Denmark (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.67)

Add feedback

Do Depth-Grown Models Overcome the Curse of Depth? An In-Depth Analysis

Kapl, Ferdinand, Angelis, Emmanouil, Höppe, Tobias, Maile, Kaitlin, von Oswald, Johannes, Scherrer, Nino, Bauer, Stefan

arXiv.org Artificial IntelligenceDec-10-2025

Gradually growing the depth of Transformers during training can not only reduce training cost but also lead to improved reasoning performance, as shown by MIDAS (Saunshi et al., 2024). Thus far, however, a mechanistic understanding of these gains has been missing. In this work, we establish a connection to recent work showing that layers in the second half of non-grown, pre-layernorm Transformers contribute much less to the final output distribution than those in the first half - also known as the Curse of Depth (Sun et al., 2025, Csordás et al., 2025). Using depth-wise analyses, we demonstrate that growth via gradual middle stacking yields more effective utilization of model depth, alters the residual stream structure, and facilitates the formation of permutable computational blocks. In addition, we propose a lightweight modification of MIDAS that yields further improvements in downstream reasoning benchmarks. Overall, this work highlights how the gradual growth of model depth can lead to the formation of distinct computational circuits and overcome the limited depth utilization seen in standard non-grown models.

large language model, lida, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2512.08819

Country:

North America > Mexico (0.28)
Europe > Germany (0.28)
Europe > Austria (0.28)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

MIDAS: Misalignment-based Data Augmentation Strategy for Imbalanced Multimodal Learning

Hwang, Seong-Hyeon, Choi, Soyoung, Whang, Steven Euijong

arXiv.org Artificial IntelligenceOct-1-2025

Multimodal models often over-rely on dominant modalities, failing to achieve optimal performance. While prior work focuses on modifying training objectives or optimization procedures, data-centric solutions remain underexplored. We propose MIDAS, a novel data augmentation strategy that generates misaligned samples with semantically inconsistent cross-modal information, labeled using unimodal confidence scores to compel learning from contradictory signals. However, this confidence-based labeling can still favor the more confident modality. To address this within our misaligned samples, we introduce weak-modality weighting, which dynamically increases the loss weight of the least confident modality, thereby helping the model fully utilize weaker modality. Furthermore, when misaligned features exhibit greater similarity to the aligned features, these misaligned samples pose a greater challenge, thereby enabling the model to better distinguish between classes. To leverage this, we propose hard-sample weighting, which prioritizes such semantically ambiguous misaligned samples. Experiments on multiple multimodal classification benchmarks demonstrate that MIDAS significantly outperforms related baselines in addressing modality imbalance.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2509.25831

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)

Add feedback

Memorization or Reasoning? Exploring the Idiom Understanding of LLMs

Kim, Jisu, Shin, Youngwoo, Hwang, Uiji, Choi, Jihun, Xuan, Richeng, Kim, Taeuk

arXiv.org Artificial IntelligenceSep-24-2025

Idioms have long posed a challenge due to their unique linguistic properties, which set them apart from other common expressions. While recent studies have leveraged large language models (LLMs) to handle idioms across various tasks, e.g., idiom-containing sentence generation and idiomatic machine translation, little is known about the underlying mechanisms of idiom processing in LLMs, particularly in multilingual settings. To this end, we introduce MIDAS, a new large-scale dataset of idioms in six languages, each paired with its corresponding meaning. Leveraging this resource, we conduct a comprehensive evaluation of LLMs' idiom processing ability, identifying key factors that influence their performance. Our findings suggest that LLMs rely not only on memorization, but also adopt a hybrid approach that integrates contextual cues and reasoning, especially when processing compositional idioms. This implies that idiom understanding in LLMs emerges from an interplay between internal knowledge retrieval and reasoning-based inference.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2505.16216

Country:

Europe (1.00)
Asia > Middle East > UAE (0.46)
North America > Mexico (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.64)

Add feedback

On the Inductive Bias of Stacking Towards Improving Reasoning

Neural Information Processing SystemsMay-27-2025, 07:09:35 GMT

artificial intelligence, inductive bias, reasoning, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.44)

Add feedback

Examining Spanish Counseling with MIDAS: a Motivational Interviewing Dataset in Spanish

Gunal, Aylin, Yi, Bowen, Piette, John, Mihalcea, Rada, Pérez-Rosas, Verónica

arXiv.org Artificial IntelligenceFeb-12-2025

Cultural and language factors significantly influence counseling, but Natural Language Processing research has not yet examined whether the findings of conversational analysis for counseling conducted in English apply to other languages. This paper presents a first step towards this direction. We introduce MIDAS (Motivational Interviewing Dataset in Spanish), a counseling dataset created from public video sources that contains expert annotations for counseling reflections and questions. Using this dataset, we explore language-based differences in counselor behavior in English and Spanish and develop classifiers in monolingual and multilingual settings, demonstrating its applications in counselor behavioral coding tasks.

artificial intelligence, dataset, natural language, (15 more...)

arXiv.org Artificial Intelligence

2502.08458

Country:

Europe > Ireland (0.04)
South America (0.04)
North America > United States > Texas (0.04)
(12 more...)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

MIDAS: Multi-level Intent, Domain, And Slot Knowledge Distillation for Multi-turn NLU

Li, Yan, Kim, So-Eon, Park, Seong-Bae, Han, Soyeon Caren

arXiv.org Artificial IntelligenceAug-15-2024

Although Large Language Models(LLMs) can generate coherent and contextually relevant text, they often struggle to recognise the intent behind the human user's query. Natural Language Understanding (NLU) models, however, interpret the purpose and key information of user's input to enable responsive interactions. Existing NLU models generally map individual utterances to a dual-level semantic frame, involving sentence-level intent and word-level slot labels. However, real-life conversations primarily consist of multi-turn conversations, involving the interpretation of complex and extended dialogues. Researchers encounter challenges addressing all facets of multi-turn dialogue conversations using a unified single NLU model. This paper introduces a novel approach, MIDAS, leveraging a multi-level intent, domain, and slot knowledge distillation for multi-turn NLU. To achieve this, we construct distinct teachers for varying levels of conversation knowledge, namely, sentence-level intent detection, word-level slot filling, and conversation-level domain classification. These teachers are then fine-tuned to acquire specific knowledge of their designated levels. A multi-teacher loss is proposed to facilitate the combination of these multi-level teachers, guiding a student model in multi-turn dialogue tasks. The experimental results demonstrate the efficacy of our model in improving the overall multi-turn conversation understanding, showcasing the potential for advancements in NLU models through the incorporation of multi-level dialogue knowledge distillation techniques.

knowledge, mida, teacher model, (12 more...)

arXiv.org Artificial Intelligence

2408.08144

Country:

Europe > Czechia > South Moravian Region > Brno (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)

Genre:

Research Report > New Finding (0.66)
Research Report > Promising Solution (0.48)

Industry: Education (0.91)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

Auto-Platoon : Freight by example

Puthanveettil, Tharun V., Singh, Abhijay, Jain, Yashveer, Bukka, Vinay, S, Sameer Arjun

arXiv.org Artificial IntelligenceMay-19-2024

The work introduces a bio-inspired leader-follower system based on an innovative mechanism proposed as software latching that aims to improve collaboration and coordination between a leader agent and the associated autonomous followers. The system utilizes software latching to establish real-time communication and synchronization between the leader and followers. A layered architecture is proposed, encompassing perception, decision-making, and control modules. Challenges such as uncertainty, dynamic environments, and communication latency are addressed using Deep learning and real-time data processing pipelines. The follower robot is equipped with sensors and communication modules that enable it to track and trace the agent of interest or avoid obstacles. The followers track the leader and dynamically avoid obstacles while maintaining a safe distance from it. The experimental results demonstrate the proposed system's effectiveness, making it a promising solution for achieving success in tasks that demand multi-robot systems capable of navigating complex dynamic environments.

follower, follower robot, robot, (17 more...)

arXiv.org Artificial Intelligence

2405.11659

Country: North America > United States > Maryland (0.05)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Architecture > Real Time Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Causal Intersectionality and Dual Form of Gradient Descent for Multimodal Analysis: a Case Study on Hateful Memes

Miyanishi, Yosuke, Nguyen, Minh Le

arXiv.org Artificial IntelligenceAug-19-2023

In the wake of the explosive growth of machine learning (ML) usage, particularly within the context of emerging Large Language Models (LLMs), comprehending the semantic significance rooted in their internal workings is crucial. While causal analyses focus on defining semantics and its quantification, the gradient-based approach is central to explainable AI (XAI), tackling the interpretation of the black box. By synergizing these approaches, the exploration of how a model's internal mechanisms illuminate its causal effect has become integral for evidence-based decision-making. A parallel line of research has revealed that intersectionality - the combinatory impact of multiple demographics of an individual - can be structured in the form of an Averaged Treatment Effect (ATE). Initially, this study illustrates that the hateful memes detection problem can be formulated as an ATE, assisted by the principles of intersectionality, and that a modality-wise summarization of gradient-based attention attribution scores can delineate the distinct behaviors of three Transformerbased models concerning ATE. Subsequently, we show that the latest LLM LLaMA2 has the ability to disentangle the intersectional nature of memes detection in an in-context learning setting, with their mechanistic properties elucidated via meta-gradient, a secondary form of gradient. In conclusion, this research contributes to the ongoing dialogue surrounding XAI and the multifaceted nature of ML models.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2308.11585

Country:

Asia > Japan (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Maryland > Baltimore (0.04)
(8 more...)

Genre: Research Report (0.64)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback