AITopics | clutter

clutter

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

d3aeec875c479e55d1cdeea161842ec6-AuthorFeedback.pdf

Neural Information Processing Systems, Feb-14-2026, 08:44:02 GMT

body part, dataset, part-level feature, (17 more...)

Technology: Information Technology > Artificial Intelligence > Vision (0.41)

97af07a14cacba681feacf3012730892-AuthorFeedback.pdf

Neural Information Processing Systems, Feb-13-2026, 01:57:46 GMT

detector, imagenet, objectnet, (6 more...)

Genre: Research Report (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.32)

Out-of-Distribution Radar Detection with Complex VAEs: Theory, Whitening, and ANMF Fusion

Rouzoumka, Yadang Alexis, Pinsolle, Jean, Terreaux, Eugénie, Morisseau, Christèle, Ovarlez, Jean-Philippe, Ren, Chengfang

arXiv.org Machine Learning, Jan-27-2026

We investigate the detection of weak complex-valued signals immersed in non-Gaussian, range-varying interference, with emphasis on maritime radar scenarios. The proposed methodology exploits a Complex-valued Variational AutoEncoder (CVAE) trained exclusively on clutter-plus-noise to perform Out-Of-Distribution detection. By operating directly on in-phase / quadrature samples, the CVAE preserves phase and Doppler structure and is assessed in two configurations: (i) using unprocessed range profiles and (ii) after local whitening, where per-range covariance estimates are obtained from neighboring profiles. Using extensive simulations together with real sea-clutter data from the CSIR maritime dataset, we benchmark performance against classical and adaptive detectors (MF, NMF, AMF-SCM, ANMF-SCM, ANMF-Tyler). In both configurations, the CVAE yields a higher detection probability Pd at matched false-alarm rate Pfa, with the most notable improvements observed under whitening. We further integrate the CVAE with the ANMF through a weighted log-p fusion rule at the decision level, attaining enhanced robustness in strongly non-Gaussian clutter and enabling empirically calibrated Pfa control under H0. Overall, the results demonstrate that statistical normalization combined with complex-valued generative modeling substantively improves detection in realistic sea-clutter conditions, and that the fused CVAE-ANMF scheme constitutes a competitive alternative to established model-based detectors.
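For readers unfamiliar with the adaptive baselines, the sketch below shows the standard ANMF statistic with two covariance estimates (the sample covariance matrix, SCM, and Tyler's fixed-point estimator), local whitening by a Cholesky factor of the estimated covariance, and a decision-level fusion of empirical p-values. It is a minimal NumPy illustration, not the authors' implementation: the function names, the toy clutter and steering vector, the weight `w`, and the form of the fusion rule (a weighted sum of negative log p-values calibrated on H0 scores) are assumptions, since the abstract does not state the exact rule.

```python
import numpy as np

def scm(Z):
    """Sample covariance matrix from secondary data Z (m x K); column k is z_k."""
    return (Z @ Z.conj().T) / Z.shape[1]

def tyler(Z, n_iter=100, tol=1e-9):
    """Tyler's fixed-point estimator, normalized so that trace(M) = m."""
    m, K = Z.shape
    M = np.eye(m, dtype=complex)
    for _ in range(n_iter):
        Minv = np.linalg.inv(M)
        # Quadratic forms q_k = z_k^H M^{-1} z_k for every secondary vector
        q = np.real(np.einsum("ik,ij,jk->k", Z.conj(), Minv, Z))
        M_new = (m / K) * (Z / q) @ Z.conj().T
        M_new *= m / np.real(np.trace(M_new))
        done = np.linalg.norm(M_new - M) < tol * np.linalg.norm(M)
        M = M_new
        if done:
            break
    return M

def anmf(x, p, M):
    """ANMF statistic |p^H M^-1 x|^2 / ((p^H M^-1 p)(x^H M^-1 x)), in [0, 1]."""
    Minv_x = np.linalg.solve(M, x)
    Minv_p = np.linalg.solve(M, p)
    num = np.abs(np.vdot(p, Minv_x)) ** 2
    den = np.real(np.vdot(p, Minv_p)) * np.real(np.vdot(x, Minv_x))
    return float(num / den)

def whiten(x, M):
    """Local whitening: with M = L L^H (Cholesky), return L^{-1} x."""
    L = np.linalg.cholesky(M)
    return np.linalg.solve(L, x)

def empirical_pvalue(score, h0_scores):
    """Fraction of H0 (clutter-only) scores at least as large as `score`."""
    h0 = np.asarray(h0_scores)
    return (1 + np.sum(h0 >= score)) / (1 + h0.size)

def fused_statistic(p_cvae, p_anmf, w=0.5):
    """Hypothetical weighted log-p fusion; larger means stronger target evidence."""
    return float(-(w * np.log(p_cvae) + (1 - w) * np.log(p_anmf)))

# Toy usage: Gaussian stand-in for clutter, hypothetical Doppler steering vector p
rng = np.random.default_rng(0)
m, K = 8, 32
Z = (rng.standard_normal((m, K)) + 1j * rng.standard_normal((m, K))) / np.sqrt(2)
p = np.exp(2j * np.pi * 0.2 * np.arange(m)) / np.sqrt(m)
# Cell under test: fresh clutter sample plus a scaled target echo
x = (rng.standard_normal(m) + 1j * rng.standard_normal(m)) / np.sqrt(2) + 2.0 * p
print(anmf(x, p, scm(Z)), anmf(x, p, tyler(Z)))
```

The ANMF statistic is invariant to the scale of `M`, which is why the trace normalization of Tyler's estimator does not affect the detector itself.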

cvae, machine learning, natural language, (19 more...)

2601.18677

Country:

Europe > France (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy > Lazio > Rome (0.04)

Genre: Research Report > New Finding (0.87)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.93)

Invariance Co-training for Robot Visual Generalization

Yang, Jonathan, Finn, Chelsea, Sadigh, Dorsa

arXiv.org Artificial Intelligence, Dec-8-2025

Reasoning from diverse observations is a fundamental capability for generalist robot policies to operate in a wide range of environments. Despite recent advancements, many large-scale robotic policies remain sensitive to key sources of observational variation, such as changes in camera perspective, lighting, and the presence of distractor objects. We posit that the limited generalizability of these models arises from the substantial diversity required to robustly cover these quasistatic axes, coupled with the current scarcity of large-scale robotic datasets that exhibit rich variation across them. In this work, we propose to systematically examine what robots need to generalize across these challenging axes by introducing two key auxiliary tasks, state similarity and invariance to observational perturbations, applied to both demonstration data and static visual data. We then show that, via these auxiliary tasks, leveraging both more expensive robotic demonstration data and less expensive, visually rich synthetic images generated from non-physics-based simulation (e.g., Unreal Engine) can substantially increase generalization to unseen camera viewpoints, lighting configurations, and distractor conditions. Our results demonstrate that co-training on this diverse data improves performance by 18% over existing generative augmentation methods.

Robotic foundation models have shown impressive progress in generalizing to everyday scenarios by leveraging large-scale datasets spanning multiple embodiments, environments, and tasks [1], [2]. However, despite their breadth, the resulting models often remain brittle in real-world settings: they fail to handle unseen spatial configurations of objects or to adapt to drastic visual changes such as lighting and viewpoint shifts. We hypothesize that the brittleness of current robotic policies stems from insufficient coverage of key observational factors during training. For example, many large-scale datasets provide only one or two third-person perspectives per scene, limiting robustness to viewpoint shifts.
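The abstract does not specify how the two auxiliary tasks are implemented. As a hedged illustration only, the sketch below shows one common way to phrase such objectives on policy embeddings: an invariance term that pulls together embeddings of the same scene before and after an observational perturbation (e.g., a camera or lighting change), and an InfoNCE-style contrastive term, used here as a stand-in for state similarity, in which each perturbed view must be matched to its own original state among the batch. The function names, the temperature, and the choice of InfoNCE are assumptions, not the authors' formulation.

```python
import numpy as np

def l2_normalize(z, eps=1e-8):
    """Scale each row of z to (approximately) unit length."""
    return z / (np.linalg.norm(z, axis=-1, keepdims=True) + eps)

def invariance_loss(z_orig, z_pert):
    """Mean (1 - cosine similarity) between paired embeddings of original
    and perturbed observations of the same state; near 0 when they align."""
    a, b = l2_normalize(z_orig), l2_normalize(z_pert)
    return float(np.mean(1.0 - np.sum(a * b, axis=-1)))

def info_nce(z_orig, z_pert, temperature=0.1):
    """Contrastive loss: row i of z_pert should be most similar to row i of
    z_orig (same state) and dissimilar to the other states in the batch."""
    a, b = l2_normalize(z_orig), l2_normalize(z_pert)
    logits = b @ a.T / temperature                 # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))     # positives on the diagonal

# Toy usage with random embeddings and a small perturbation (hypothetical data)
rng = np.random.default_rng(0)
z = rng.standard_normal((16, 32))
z_perturbed = z + 0.05 * rng.standard_normal(z.shape)
total = invariance_loss(z, z_perturbed) + info_nce(z, z_perturbed)
```

In a real training loop these terms would be added, with some weighting, to the policy's behavior-cloning loss and computed on learned encoder outputs rather than random vectors.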

artificial intelligence, dataset, machine learning, (17 more...)

2512.0523

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

CycliST: A Video Language Model Benchmark for Reasoning on Cyclical State Transitions

Kohaut, Simon, Ochs, Daniel, Zhang, Shun, Flade, Benedict, Eggert, Julian, Kersting, Kristian, Dhami, Devendra Singh

arXiv.org Artificial Intelligence, Dec-2-2025

We present CycliST, a novel benchmark dataset designed to evaluate Video Language Models (VLMs) on their ability to reason textually about cyclical state transitions. CycliST captures fundamental aspects of real-world processes by generating synthetic, richly structured video sequences featuring periodic patterns in object motion and visual attributes. CycliST employs a tiered evaluation system that progressively increases difficulty through variations in the number of cyclic objects, scene clutter, and lighting conditions, challenging the spatio-temporal cognition of state-of-the-art models. We conduct extensive experiments with current state-of-the-art VLMs, both open-source and proprietary, and reveal their limitations in generalizing to cyclical dynamics such as linear and orbital motion, as well as time-dependent changes in visual attributes like color and scale. Our results demonstrate that present-day VLMs struggle to reliably detect and exploit cyclic patterns, lack a notion of temporal understanding, and are unable to extract quantitative insights from scenes, such as the number of objects in motion, highlighting a significant technical gap that needs to be addressed. More specifically, we find that no single model consistently leads in performance: neither size nor architecture correlates strongly with outcomes, and no model succeeds equally well across all tasks. By providing a targeted challenge and a comprehensive evaluation framework, CycliST paves the way for visual reasoning models that surpass the state of the art in understanding periodic patterns.
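To make "cyclical state transitions" concrete, the sketch below generates per-frame states for the kinds of periodic behavior the abstract lists: orbital motion, linear (back-and-forth) motion, and a periodically varying visual attribute (here, scale), together with the ground-truth answer to a quantitative question such as how many objects are in motion. It is a hypothetical illustration: the `CyclicObject` fields, the motion equations, and the question are assumptions, not CycliST's actual generator or task definitions.

```python
import math
from dataclasses import dataclass

@dataclass
class CyclicObject:
    kind: str                # "orbital", "linear", or "static"
    period: int              # frames per full cycle
    center: tuple            # (x, y) anchor position
    amplitude: float = 1.0   # orbit radius or half-range of linear motion
    base_scale: float = 1.0  # nominal object size

def state_at(obj, t):
    """Return (x, y, scale) of the object at frame t."""
    phase = 2 * math.pi * (t % obj.period) / obj.period
    cx, cy = obj.center
    if obj.kind == "orbital":
        x, y = cx + obj.amplitude * math.cos(phase), cy + obj.amplitude * math.sin(phase)
    elif obj.kind == "linear":
        x, y = cx + obj.amplitude * math.sin(phase), cy
    else:
        x, y = cx, cy
    # Time-dependent attribute applied to every object: scale oscillates
    # between 0.5x and 1.5x of base_scale over one period
    scale = obj.base_scale * (1 + 0.5 * math.sin(phase))
    return x, y, scale

def count_moving(objects):
    """Ground truth for a question like 'How many objects are in motion?'."""
    return sum(obj.kind != "static" for obj in objects)

scene = [
    CyclicObject("orbital", period=24, center=(0.0, 0.0), amplitude=2.0),
    CyclicObject("linear", period=12, center=(5.0, 1.0)),
    CyclicObject("static", period=30, center=(-3.0, 4.0)),
]
# frames[t][i] is the state of object i at frame t
frames = [[state_at(o, t) for o in scene] for t in range(48)]
```

Rendering these states to video, and adding clutter or lighting variation per difficulty tier, would be separate steps not shown here.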

artificial intelligence, machine learning, natural language, (15 more...)

2512.01095

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
Europe > Netherlands > North Brabant > Eindhoven (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(5 more...)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Gentle Object Retraction in Dense Clutter Using Multimodal Force Sensing and Imitation Learning

Brouwer, Dane, Citron, Joshua, Nolte, Heather, Bohg, Jeannette, Cutkosky, Mark

arXiv.org Artificial Intelligence, Dec-2-2025

Dense collections of movable objects are common in everyday spaces, from cabinets in a home to shelves in a warehouse. Safely retracting objects from such collections is difficult for robots, yet people do it frequently, leveraging learned experience in tandem with vision and non-prehensile tactile sensing on the sides and backs of their hands and arms. We investigate the role of contact force sensing for training robots to gently reach into constrained clutter and extract objects. The available sensing modalities are (1) "eye-in-hand" vision, (2) proprioception, (3) non-prehensile triaxial tactile sensing, (4) contact wrenches estimated from joint torques, and (5) a measure of object acquisition obtained by monitoring the vacuum line of a suction cup. We use imitation learning to train policies from a set of demonstrations on randomly generated scenes, then conduct an ablation study of wrench and tactile information. We evaluate each policy's performance across 40 unseen environment configurations. Policies employing any force sensing show fewer excessive-force failures, a higher overall success rate, and faster completion times. The best performance is achieved using both tactile and wrench information, producing an 80% improvement over the baseline without force information.

artificial intelligence, information, machine learning, (15 more...)

2508.19476

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.88)

Improving Robotic Manipulation Robustness via NICE Scene Surgery

Pakdamansavoji, Sajjad, Pourkeshavarz, Mozhgan, Sigal, Adam, Li, Zhiyuan, Yang, Rui Heng, Rasouli, Amir

arXiv.org Artificial Intelligence, Dec-1-2025

Learning robust visuomotor policies for robotic manipulation remains a challenge in real-world settings, where visual distractors can significantly degrade performance and safety. In this work, we propose an effective and scalable framework, Naturalistic Inpainting for Context Enhancement (NICE). Our method reduces the out-of-distribution (OOD) gap in imitation learning by increasing visual diversity, constructing new experiences from existing demonstrations. Using image generative frameworks and large language models, NICE performs three editing operations: object replacement, restyling, and removal of distracting (non-target) objects. These changes preserve spatial relationships without obstructing target objects and maintain action-label consistency. Unlike previous approaches, NICE requires no additional robot data collection, simulator access, or custom model training, making it readily applicable to existing robotic datasets. Using real-world scenes, we showcase the capability of our framework in producing photo-realistic scene enhancement. For downstream tasks, we use NICE data to finetune a vision-language model (VLM) for spatial affordance prediction and a vision-language-action (VLA) policy for object manipulation. Our evaluations show that NICE successfully reduces OOD gaps, resulting in over 20% improvement in accuracy for affordance prediction in highly cluttered scenes. For manipulation tasks, the success rate increases on average by 11% when testing in environments populated with varying numbers of distractors. Furthermore, we show that our method improves visual robustness, lowering target confusion by 6%, and enhances safety by reducing the collision rate by 7%.

large language model, machine learning, natural language, (21 more...)

2511.22777

Country: North America > Canada (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Distracted Robot: How Visual Clutter Undermine Robotic Manipulation

Rasouli, Amir, Alban, Montgomery, Pakdamansavoji, Sajjad, Li, Zhiyuan, Zhang, Zhanguang, Wu, Aaron, Zhao, Xuan

arXiv.org Artificial Intelligence, Dec-1-2025

In this work, we propose an evaluation protocol for examining the performance of robotic manipulation policies in cluttered scenes. Unlike prior works, we approach evaluation from a psychophysical perspective and therefore use a unified measure of clutter that accounts for environmental factors as well as the quantity, characteristics, and arrangement of distractors. Using this measure, we systematically construct evaluation scenarios both in hyper-realistic simulation and in the real world, and conduct extensive experiments on manipulation policies, in particular vision-language-action (VLA) models. Our experiments highlight the significant impact of scene clutter, which lowers policy performance by as much as 34%, and show that, despite achieving similar average performance across tasks, different VLA policies have unique vulnerabilities and relatively low agreement on success scenarios. We further show that our clutter measure is an effective indicator of performance degradation and analyze the impact of distractors in terms of their quantity and occluding influence. Finally, we show that finetuning on enhanced data, although effective, does not equally remedy all negative impacts of clutter on performance.

artificial intelligence, distractor, scenario, (15 more...)

2511.2278

Country: North America > Canada (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

Latent-space metrics for Complex-Valued VAE out-of-distribution detection under radar clutter

Rouzoumka, Y. A., Terreaux, E., Morisseau, C., Ovarlez, J.-P., Ren, C.

arXiv.org Machine Learning, Nov-26-2025

In recent years, data-driven approaches have emerged to alleviate the need for precise clutter modeling. Among them, VAEs [4] have demonstrated promising capabilities for anomaly and OOD detection in diverse applications, including radar detection [5], speech enhancement [6], medical imaging [7], industrial monitoring [8], and acoustic signal analysis [9]. These models learn a latent representation of the training data and use reconstruction or probabilistic criteria to detect deviations. Despite their effectiveness, most VAE-based detectors operate in the real domain and often handle complex-valued radar data by splitting the real and imaginary components into separate channels. Recent advances in Complex-Valued Neural Networks (CVNNs) have shown the benefits of directly modeling complex-valued signals [10, 11]. We therefore pursue a data-driven alternative based on complex-valued VAEs and latent-space OOD scores.
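A latent-space OOD score of the kind named here can be sketched with a Mahalanobis distance: fit a complex Gaussian (mean and Hermitian covariance) to the latent means the encoder produces for clutter-only training data, score each test latent by its distance to that fit, and set the detection threshold at the empirical (1 - Pfa) quantile of clutter-only scores. This is a minimal NumPy sketch, assuming the encoder outputs complex latent mean vectors; the function names, the diagonal loading `reg`, the synthetic latents, and the use of the latent mean alone are assumptions, and the paper's exact metrics may differ.

```python
import numpy as np

def fit_latent_gaussian(Z, reg=1e-6):
    """Z: (N, d) complex latent means from clutter-only (H0) training data.
    Returns the mean and the inverse of the regularized Hermitian covariance."""
    mu = Z.mean(axis=0)
    C = Z - mu
    # Entry (i, j) averages c_i * conj(c_j), i.e. an estimate of E[c c^H]
    Sigma = (C.T @ C.conj()) / (Z.shape[0] - 1)
    Sigma += reg * np.eye(Z.shape[1])
    return mu, np.linalg.inv(Sigma)

def mahalanobis_score(z, mu, Sigma_inv):
    """Squared Mahalanobis distance (z - mu)^H Sigma^{-1} (z - mu)."""
    d = z - mu
    return float(np.real(d.conj() @ Sigma_inv @ d))

def threshold_for_pfa(h0_scores, pfa):
    """Threshold exceeded by about a fraction `pfa` of clutter-only scores."""
    return float(np.quantile(h0_scores, 1.0 - pfa))

# Toy usage with synthetic complex latents (hypothetical data)
rng = np.random.default_rng(0)
dim = 4
train = (rng.standard_normal((2000, dim)) + 1j * rng.standard_normal((2000, dim))) / np.sqrt(2)
mu, Sigma_inv = fit_latent_gaussian(train)
# For simplicity the threshold is calibrated on the training latents;
# in practice a held-out clutter-only set would be used
h0_scores = [mahalanobis_score(z, mu, Sigma_inv) for z in train]
eta = threshold_for_pfa(h0_scores, pfa=0.01)
test_latent = np.full(dim, 3.0 + 3.0j)
is_ood = mahalanobis_score(test_latent, mu, Sigma_inv) > eta
```

Treating the latent as complex keeps the covariance Hermitian rather than splitting real and imaginary parts into separate real dimensions, in line with the complex-valued modeling motivation above.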

detection, detector, mahalanobis, (15 more...)

2511.19805

Country: