AITopics | Inductive Learning

Collaborating Authors

Inductive Learning

Inductive learning, or induction, is the process of creating generalizations from individual instances.

News Overviews Instructional Materials AI-Alerts Classics

Narrate2Nav: Real-Time Visual Navigation with Implicit Language Reasoning in Human-Centric Environments

Payandeh, Amirreza, Pokhrel, Anuj, Song, Daeun, Zampieri, Marcos, Xiao, Xuesu

arXiv.org Artificial IntelligenceJun-18-2025

Large Vision-Language Models (VLMs) have demonstrated potential in enhancing mobile robot navigation in human-centric environments by understanding contextual cues, human intentions, and social dynamics while exhibiting reasoning capabilities. However, their computational complexity and limited sensitivity to continuous numerical data impede real-time performance and precise motion control. To this end, we propose Narrate2Nav, a novel real-time vision-action model that leverages a novel self-supervised learning framework based on the Barlow Twins redundancy reduction loss to embed implicit natural language reasoning, social cues, and human intentions within a visual encoder-enabling reasoning in the model's latent space rather than token space. The model combines RGB inputs, motion commands, and textual signals of scene context during training to bridge from robot observations to low-level motion commands for short-horizon point-goal navigation during deployment. Extensive evaluation of Narrate2Nav across various challenging scenarios in both offline unseen dataset and real-world experiments demonstrates an overall improvement of 52.94 percent and 41.67 percent, respectively, over the next best baseline. Additionally, qualitative comparative analysis of Narrate2Nav's visual encoder attention map against four other baselines demonstrates enhanced attention to navigation-critical scene elements, underscoring its effectiveness in human-centric navigation tasks.

large language model, machine learning, real time system, (20 more...)

arXiv.org Artificial Intelligence

2506.14233

Genre: Research Report (0.66)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Architecture > Real Time Systems (1.00)
(2 more...)

Add feedback

SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes

Alex, Tony, Ahmed, Sara, Mustafa, Armin, Awais, Muhammad, Jackson, Philip JB

arXiv.org Artificial IntelligenceJun-17-2025

Self-supervised pre-trained audio networks have seen widespread adoption in real-world systems, particularly in multi-modal large language models. These networks are often employed in a frozen state, under the assumption that the self-supervised pre-training has sufficiently equipped them to handle real-world audio. However, a critical question remains: how well do these models actually perform in real-world conditions, where audio is typically polyphonic and complex, involving multiple overlapping sound sources? Current audio self-supervised learning (SSL) methods are often benchmarked on datasets predominantly featuring monophonic audio, such as environmental sounds, and speech. As a result, the ability of SSL models to generalize to polyphonic audio, a common characteristic in natural scenarios, remains underexplored. This limitation raises concerns about the practical robustness of SSL models in more realistic audio settings. To address this gap, we introduce S elf-S upervised L earning from A udio M ixtures (SSLAM), a novel direction in audio SSL research, designed to improve the model's ability to learn from polyphonic data while maintaining strong performance on monophonic data. We thoroughly evaluate SSLAM on standard audio SSL benchmark datasets which are predominantly monophonic and conduct a comprehensive comparative analysis against state-of-the-art (SOT A) methods using a range of high-quality, publicly available polyphonic datasets. SS-LAM not only improves model performance on polyphonic audio, but also maintains or exceeds performance on standard audio SSL benchmarks. Notably, it achieves up to a 3.9% improvement on the AudioSet-2M(AS-2M), reaching a mean average precision (mAP) of 50.2. These results demonstrate SSLAM's effectiveness in both polyphonic and monophonic soundscapes, significantly enhancing the performance of audio SSL models. Code and pre-trained models are available at https://github.com/ta012/SSLAM .

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2506.12222

Genre: Research Report > New Finding (0.48)

Industry: Education (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.67)
(2 more...)

Add feedback

Private List Learnability vs. Online List Learnability

Hanneke, Steve, Moran, Shay, Schefler, Hilla, Tsubari, Iska

arXiv.org Artificial IntelligenceJun-17-2025

This work explores the connection between differential privacy (DP) and online learning in the context of PAC list learning. In this setting, a $k$-list learner outputs a list of $k$ potential predictions for an instance $x$ and incurs a loss if the true label of $x$ is not included in the list. A basic result in the multiclass PAC framework with a finite number of labels states that private learnability is equivalent to online learnability [Alon, Livni, Malliaris, and Moran (2019); Bun, Livni, and Moran (2020); Jung, Kim, and Tewari (2020)]. Perhaps surprisingly, we show that this equivalence does not hold in the context of list learning. Specifically, we prove that, unlike in the multiclass setting, a finite $k$-Littlestone dimensio--a variant of the classical Littlestone dimension that characterizes online $k$-list learnability--is not a sufficient condition for DP $k$-list learnability. However, similar to the multiclass case, we prove that it remains a necessary condition. To demonstrate where the equivalence breaks down, we provide an example showing that the class of monotone functions with $k+1$ labels over $\mathbb{N}$ is online $k$-list learnable, but not DP $k$-list learnable. This leads us to introduce a new combinatorial dimension, the \emph{$k$-monotone dimension}, which serves as a generalization of the threshold dimension. Unlike the multiclass setting, where the Littlestone and threshold dimensions are finite together, for $k>1$, the $k$-Littlestone and $k$-monotone dimensions do not exhibit this relationship. We prove that a finite $k$-monotone dimension is another necessary condition for DP $k$-list learnability, alongside finite $k$-Littlestone dimension. Whether the finiteness of both dimensions implies private $k$-list learnability remains an open question.

artificial intelligence, dimension, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2506.12856

Genre: Research Report (0.81)

Industry: Education > Educational Setting > Online (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.69)

Add feedback

Self-Anchored Attention Model for Sample-Efficient Classification of Prosocial Text Chat

Li, Zhuofang, Kocielnik, Rafal, Soltani, Fereshteh, Penphob, null, Boonyarungsrit, null, Anandkumar, Animashree, Alvarez, R. Michael

arXiv.org Artificial IntelligenceJun-12-2025

Millions of players engage daily in competitive online games, communicating through in-game chat. Prior research has focused on detecting relatively small volumes of toxic content using various Natural Language Processing (NLP) techniques for the purpose of moderation. However, recent studies emphasize the importance of detecting prosocial communication, which can be as crucial as identifying toxic interactions. Recognizing prosocial behavior allows for its analysis, rewarding, and promotion. Unlike toxicity, there are limited datasets, models, and resources for identifying prosocial behaviors in game-chat text. In this work, we employed unsupervised discovery combined with game domain expert collaboration to identify and categorize prosocial player behaviors from game chat. We further propose a novel Self-Anchored Attention Model (SAAM) which gives 7.9% improvement compared to the best existing technique. The approach utilizes the entire training set as "anchors" to help improve model performance under the scarcity of training data. This approach led to the development of the first automated system for classifying prosocial behaviors in in-game chats, particularly given the low-resource settings where large-scale labeled data is not available. Our methodology was applied to one of the most popular online gaming titles - Call of Duty(R): Modern Warfare(R)II, showcasing its effectiveness. This research is novel in applying NLP techniques to discover and classify prosocial behaviors in player in-game chat communication. It can help shift the focus of moderation from solely penalizing toxicity to actively encouraging positive interactions on online platforms.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2506.09259

Country:

North America > United States (0.68)
Europe (0.67)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
(2 more...)

Add feedback

The Efficacy of Semantics-Preserving Transformations in Self-Supervised Learning for Medical Ultrasound

VanBerlo, Blake, Wong, Alexander, Hoey, Jesse, Arntfield, Robert

arXiv.org Artificial IntelligenceJun-12-2025

Data augmentation is a central component of joint embedding self-supervised learning (SSL). Approaches that work for natural images may not always be effective in medical imaging tasks. This study systematically investigated the impact of data augmentation and preprocessing strategies in SSL for lung ultrasound. Three data augmentation pipelines were assessed: (1) a baseline pipeline commonly used across imaging domains, (2) a novel semantic-preserving pipeline designed for ultrasound, and (3) a distilled set of the most effective transformations from both pipelines. Pretrained models were evaluated on multiple classification tasks: B-line detection, pleural effusion detection, and COVID-19 classification. Experiments revealed that semantics-preserving data augmentation resulted in the greatest performance for COVID-19 classification - a diagnostic task requiring global image context. Cropping-based methods yielded the greatest performance on the B-line and pleural effusion object classification tasks, which require strong local pattern recognition. Lastly, semantics-preserving ultrasound image preprocessing resulted in increased downstream performance for multiple tasks. Guidance regarding data augmentation and preprocessing strategies was synthesized for practitioners working with SSL in ultrasound.

artificial intelligence, machine learning, pipeline, (16 more...)

arXiv.org Artificial Intelligence

2504.07904

Country:

North America > Canada (0.28)
North America > United States (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Employing self-supervised learning models for cross-linguistic child speech maturity classification

Zhang, Theo, Suresh, Madurya, Warlaumont, Anne S., Hitczenko, Kasia, Cristia, Alejandrina, Cychosz, Margaret

arXiv.org Artificial IntelligenceJun-11-2025

Speech technology systems struggle with many downstream tasks for child speech due to small training corpora and the difficulties that child speech pose. We apply a novel dataset, SpeechMaturity, to state-of-the-art transformer models to address a fundamental classification task: identifying child vocalizations. Unlike previous corpora, our dataset captures maximally ecologically-valid child vocalizations across an unprecedented sample, comprising children acquiring 25+ languages in the U.S., Bolivia, Vanuatu, Papua New Guinea, Solomon Islands, and France. The dataset contains 242,004 labeled vocalizations, magnitudes larger than previous work. Models were trained to distinguish between cry, laughter, mature (consonant+vowel), and immature speech (just consonant or vowel). Models trained on the dataset outperform state-of-the-art models trained on previous datasets, achieved classification accuracy comparable to humans, and were robust across rural and urban settings.

artificial intelligence, inductive learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2506.08999

Country:

Europe (1.00)
North America > United States (0.89)
Oceania (0.88)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.51)

Add feedback

Gridding Forced Displacement using Semi-Supervised Learning

Wells, Andrew, Henningsen, Geraldine, Kengne, Brice Bolane Tchinde

arXiv.org Artificial IntelligenceJun-11-2025

We present a semi-supervised approach that dis-aggregates refugee statistics from administrative boundaries to 0.5-degree grid cells across 25 Sub-Saharan African countries. By integrating UN-HCR's ProGres registration data with satellite-derived building footprints from Google Open Buildings and location coordinates from Open-StreetMap Populated Places, our label spreading algorithm creates spatially explicit refugee statistics at high granularity. This methodology achieves 92.9% average accuracy in placing over 10 million refugee observations into appropriate grid cells, enabling the identification of localized displacement patterns previously obscured in broader regional and national statistics. The resulting high-resolution dataset provides a foundation for a deeper understanding of displacement drivers.

artificial intelligence, grid cell, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2506.08019

Country: Africa (1.00)

Genre: Research Report (0.82)

Industry:

Government > Regional Government (1.00)
Government > Immigration & Customs (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.42)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.42)

Add feedback

GELD: A Unified Neural Model for Efficiently Solving Traveling Salesman Problems Across Different Scales

Xiao, Yubin, Wang, Di, Cao, Rui, Wu, Xuan, Li, Boyang, Zhou, You

arXiv.org Artificial IntelligenceJun-10-2025

The Traveling Salesman Problem (TSP) is a well-known combinatorial optimization problem with broad real-world applications. Recent advancements in neural network-based TSP solvers have shown promising results. Nonetheless, these models often struggle to efficiently solve both small- and large-scale TSPs using the same set of pre-trained model parameters, limiting their practical utility. To address this issue, we introduce a novel neural TSP solver named GELD, built upon our proposed broad global assessment and refined local selection framework. Specifically, GELD integrates a lightweight Global-view Encoder (GE) with a heavyweight Local-view Decoder (LD) to enrich embedding representation while accelerating the decision-making process. Moreover, GE incorporates a novel low-complexity attention mechanism, allowing GELD to achieve low inference latency and scalability to larger-scale TSPs. Additionally, we propose a two-stage training strategy that utilizes training instances of different sizes to bolster GELD's generalization ability. Extensive experiments conducted on both synthetic and real-world datasets demonstrate that GELD outperforms seven state-of-the-art models considering both solution quality and inference speed. Furthermore, GELD can be employed as a post-processing method to significantly elevate the quality of the solutions derived by existing neural TSP solvers via spending affordable additional computing time. Notably, GELD is shown as capable of solving TSPs with up to 744,710 nodes, first-of-its-kind to solve this large size TSP without relying on divide-and-conquer strategies to the best of our knowledge.

artificial intelligence, inductive learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2506.06634

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.86)

Add feedback

GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

Wu, Qianhui, Cheng, Kanzhi, Yang, Rui, Zhang, Chaoyun, Yang, Jianwei, Jiang, Huiqiang, Mu, Jian, Peng, Baolin, Qiao, Bo, Tan, Reuben, Qin, Si, Liden, Lars, Lin, Qingwei, Zhang, Huan, Zhang, Tong, Zhang, Jianbing, Zhang, Dongmei, Gao, Jianfeng

arXiv.org Artificial IntelligenceJun-4-2025

One of the principal challenges in building VLM-powered GUI agents is visual grounding, i.e., localizing the appropriate screen region for action execution based on both the visual content and the textual plans. Most existing work formulates this as a text-based coordinate generation task. However, these approaches suffer from several limitations: weak spatial-semantic alignment, inability to handle ambiguous supervision targets, and a mismatch between the dense nature of screen coordinates and the coarse, patch-level granularity of visual features extracted by models like Vision Transformers. In this paper, we propose GUI-Actor, a VLM-based method for coordinate-free GUI grounding. At its core, GUI-Actor introduces an attention-based action head that learns to align a dedicated token with all relevant visual patch tokens, enabling the model to propose one or more action regions in a single forward pass. In line with this, we further design a grounding verifier to evaluate and select the most plausible action region from the candidates proposed for action execution. Extensive experiments show that GUI-Actor outperforms prior state-of-the-art methods on multiple GUI action grounding benchmarks, with improved generalization to unseen screen resolutions and layouts. Notably, GUI-Actor-7B even surpasses UI-TARS-72B (38.1) on ScreenSpot-Pro, achieving scores of 40.7 with Qwen2-VL and 44.6 with Qwen2.5-VL as backbones. Furthermore, by incorporating the verifier, we find that fine-tuning only the newly introduced action head (~100M parameters for 7B model) while keeping the VLM backbone frozen is sufficient to achieve performance comparable to previous state-of-the-art models, highlighting that GUI-Actor can endow the underlying VLM with effective grounding capabilities without compromising its general-purpose strengths.

arxiv preprint arxiv, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2506.03143

Country:

North America > United States > Illinois > Champaign County > Urbana (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Graphics (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(4 more...)

Add feedback

Agnostic Learning under Targeted Poisoning: Optimal Rates and the Role of Randomness

Chornomaz, Bogdan, Koren, Yonatan, Moran, Shay, Waknine, Tom

arXiv.org Artificial IntelligenceJun-4-2025

We study the problem of learning in the presence of an adversary that can corrupt an $η$ fraction of the training examples with the goal of causing failure on a specific test point. In the realizable setting, prior work established that the optimal error under such instance-targeted poisoning attacks scales as $Θ(dη)$, where $d$ is the VC dimension of the hypothesis class arXiv:2210.02713. In this work, we resolve the corresponding question in the agnostic setting. We show that the optimal excess error is $\tildeΘ(\sqrt{dη})$, answering one of the main open problems left by Hanneke et al. To achieve this rate, it is necessary to use randomized learners: Hanneke et al. showed that deterministic learners can be forced to suffer error close to 1, even under small amounts of poisoning. Perhaps surprisingly, our upper bound remains valid even when the learner's random bits are fully visible to the adversary . In the other direction, our lower bound is stronger than standard PAC-style bounds: instead of tailoring a hard distribution separately for each sample size, we exhibit a single fixed distribution under which the adversary can enforce an excess error of $Ω(\sqrt{dη})$ infinitely often.

artificial intelligence, learner, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2506.03075

Country: North America > United States (0.46)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.54)

Add feedback