AITopics

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.15)
North America > United States > Arkansas > Washington County > Fayetteville (0.15)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.15)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)

Neural Information Processing Systems, Feb-17-2026, 04:44:06 GMT

Supplementary: Fairness Continual Learning Approach to Semantic Scene Understanding in Open-World Environments

Thanh-Dat Truong

… Contrastive Clustering loss and update the prototypical vectors. Algorithm 1: Prototypical Contrastive Clustering Loss. Compute the Prototypical Contrastive Clustering Loss based on Eqn. … Compute the Prototypical Contrastive Clustering Loss based on Eqn. … Two segmentation network architectures have been used in our experiments, i.e., (1) DeepLab-V3 … The learning rate is set individually for each step and dataset. Similarly, to illustrate the effectiveness and robustness of our method in the non-incremental setting, … We also perform an additional ablation study on the ADE20K (100-50) benchmark to investigate the impact of the delta.
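The excerpt names a Prototypical Contrastive Clustering loss and a prototype-update step but omits the equations they refer to. As a rough illustration only, the sketch below assumes a common InfoNCE-style form: each pixel embedding is scored against every class prototype by temperature-scaled cosine similarity, the loss is the negative log-probability of its own class, and prototypes are refreshed by an exponential moving average. The temperature `tau`, the `momentum` value, and all function names are assumptions, not the paper's Eqn. or Algorithm 1.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def prototypical_contrastive_loss(features, labels, prototypes, tau=0.1):
    """InfoNCE-style loss over class prototypes (assumed form, not the paper's Eqn.).

    features:   (N, D) pixel embeddings
    labels:     (N,)   integer class ids in [0, C)
    prototypes: (C, D) one prototype vector per class
    """
    f = l2_normalize(features)
    p = l2_normalize(prototypes)
    logits = f @ p.T / tau                       # (N, C) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability before exp
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Negative log-likelihood of each pixel's own class prototype, averaged.
    return float(-log_prob[np.arange(len(labels)), labels].mean())


def update_prototypes(prototypes, features, labels, momentum=0.9):
    """EMA update of each prototype toward the mean feature of its class in the batch."""
    new = prototypes.copy()
    for c in np.unique(labels):
        batch_mean = features[labels == c].mean(axis=0)
        new[c] = momentum * prototypes[c] + (1.0 - momentum) * batch_mean
    return new


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    protos = rng.normal(size=(3, 8))
    labels = rng.integers(0, 3, size=32)
    feats = protos[labels] + 0.1 * rng.normal(size=(32, 8))  # features near their prototypes
    print("loss:", prototypical_contrastive_loss(feats, labels, protos))
    protos = update_prototypes(protos, feats, labels)
```

Classes absent from a batch keep their previous prototype, which is one simple way to avoid overwriting old-class statistics in a continual setting.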

artificial intelligence, machine learning, segmentation

Country:

North America > United States > Arkansas > Washington County > Fayetteville (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre:

Workflow (0.49)
Research Report > New Finding (0.35)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Neural Information Processing Systems, Feb-17-2026, 04:44:05 GMT

Fairness Continual Learning Approach to Semantic Scene Understanding in Open-World Environments

Thanh-Dat Truong

Continual semantic segmentation aims to learn new classes while maintaining the information from the previous classes.
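Continual segmentation benchmarks such as the ADE20K (100-50) setting are usually named by their class split: an initial step that learns the first 100 classes, followed by steps that each add 50 new ones. The sketch below assumes ADE20K's 150 semantic classes and contiguous class ids (assumptions about the setup, not stated here) and enumerates which classes each step introduces; at every step after the first, the model must learn the new classes while retaining the earlier ones.

```python
def class_schedule(num_classes, initial, increment):
    """Split class ids 0..num_classes-1 into incremental learning steps.

    Step 0 introduces `initial` classes; each later step adds up to `increment` new ones.
    """
    if not (0 < initial <= num_classes) or increment <= 0:
        raise ValueError("invalid split")
    steps = [list(range(initial))]
    start = initial
    while start < num_classes:
        end = min(start + increment, num_classes)
        steps.append(list(range(start, end)))
        start = end
    return steps


if __name__ == "__main__":
    for name, (init, inc) in {"100-50": (100, 50), "100-10": (100, 10), "50-50": (50, 50)}.items():
        steps = class_schedule(150, init, inc)
        print(name, "->", len(steps), "steps,", [len(s) for s in steps])
```

Running it prints `100-50 -> 2 steps, [100, 50]`, `100-10 -> 6 steps, [100, 10, 10, 10, 10, 10]`, and `50-50 -> 3 steps, [50, 50, 50]`.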

artificial intelligence, machine learning, segmentation

Country:

North America > United States > Arkansas > Washington County > Fayetteville (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > Spain > Andalusia > Granada Province > Granada (0.04)

Genre:

Workflow (0.47)
Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Neural Information Processing Systems, Feb-16-2026, 23:49:49 GMT

9d66f74820f11ce037fb5f711ab9acd4-Paper-Conference.pdf

large language model, machine learning, natural language

Country:

North America > United States > Arkansas > Washington County > Fayetteville (0.04)
Europe > Switzerland (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial Intelligence, Nov-17-2025

DualVision ArthroNav: Investigating Opportunities to Enhance Localization and Reconstruction in Image-based Arthroscopy Navigation via External Cameras

Shu, Hongchao, Seenivasan, Lalithkumar, Liu, Mingxu, Hwang, Yunseo, Ku, Yu-Chun, Knopf, Jonathan, Martin-Gomez, Alejandro, Armand, Mehran, Unberath, Mathias

Arthroscopic procedures can greatly benefit from navigation systems that enhance spatial awareness, depth perception, and field of view. However, existing optical tracking solutions impose strict workspace constraints and disrupt surgical workflow. Vision-based alternatives, though less invasive, often rely solely on the monocular arthroscope camera, making them prone to drift, scale ambiguity, and sensitivity to rapid motion or occlusion. We propose DualVision ArthroNav, a multi-camera arthroscopy navigation system that integrates an external camera rigidly mounted on the arthroscope. The external camera provides stable visual odometry and absolute localization, while the monocular arthroscope video enables dense scene reconstruction. By combining these complementary views, our system resolves the scale ambiguity and long-term drift inherent in monocular SLAM and ensures robust relocalization. Experiments demonstrate that our system effectively compensates for calibration errors, achieving an average absolute trajectory error of 1.09 mm. The reconstructed scenes reach an average target registration error of 2.16 mm, with high visual fidelity (SSIM = 0.69, PSNR = 22.19). These results indicate that our system provides a practical and cost-efficient solution for arthroscopic navigation, bridging the gap between optical tracking and purely vision-based systems, and paving the way toward clinically deployable, fully vision-based arthroscopic guidance.
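The 1.09 mm figure is an absolute trajectory error (ATE). A standard way to compute ATE, and to see concretely why monocular trajectories need a scale correction, is to align the estimated positions to ground truth with a least-squares similarity transform (Umeyama's method) and then measure per-frame residuals. The sketch below assumes time-synchronized N×3 position arrays and is not the authors' evaluation code; the abstract does not say whether its average is a mean or an RMSE, so the function returns per-frame errors and leaves the summary to the caller.

```python
import numpy as np


def umeyama_alignment(est, gt, with_scale=True):
    """Least-squares similarity transform (s, R, t) mapping est -> gt (Umeyama, 1991).

    est, gt: (N, 3) time-synchronized positions.
    """
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    e_c, g_c = est - mu_e, gt - mu_g
    cov = g_c.T @ e_c / len(est)                   # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:   # forbid reflections
        S[2, 2] = -1.0
    R = U @ S @ Vt
    var_e = (e_c ** 2).sum() / len(est)
    s = float(np.trace(np.diag(D) @ S) / var_e) if with_scale else 1.0
    t = mu_g - s * R @ mu_e
    return s, R, t


def absolute_trajectory_error(est, gt, with_scale=True):
    """Per-frame position error after alignment; summarize with .mean() or an RMSE."""
    s, R, t = umeyama_alignment(est, gt, with_scale)
    aligned = s * est @ R.T + t
    return np.linalg.norm(aligned - gt, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = np.cumsum(rng.normal(scale=0.5, size=(200, 3)), axis=0)  # synthetic path
    est = 0.25 * gt + np.array([1.0, -2.0, 0.5])                   # unknown scale + offset
    print("scaled ATE (mean):", absolute_trajectory_error(est, gt).mean())
    print("unscaled ATE (mean):", absolute_trajectory_error(est, gt, with_scale=False).mean())
```

With an estimate that is a 0.25-scaled, rotated, and shifted copy of the ground truth, the scaled alignment recovers a scale of 4 and near-zero error, while `with_scale=False` leaves a large residual, mirroring the scale ambiguity that the external camera is meant to resolve.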

artificial intelligence, dualvision arthronav system, reconstruction

2511.10699

Country: North America > United States > Arkansas > Washington County > Fayetteville (0.15)

Genre: Research Report (0.65)

Industry: Health & Medicine > Surgery (0.65)

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

arXiv.org Artificial Intelligence, Nov-11-2025

Fine-Tuning Vision-Language Models for Multimodal Polymer Property Prediction

Vuong, An, Van, Minh-Hao, Verma, Prateek, Zhao, Chen, Wu, Xintao

Vision-Language Models (VLMs) have shown strong performance in tasks like visual question answering and multimodal text generation, but their effectiveness in scientific domains such as materials science remains limited. While some machine learning methods have addressed specific challenges in this field, there is still a lack of foundation models designed for broad tasks like polymer property prediction using multimodal data. In this work, we present a multimodal polymer dataset to fine-tune VLMs through instruction-tuning pairs and assess the impact of multimodality on prediction performance. Our fine-tuned models, using LoRA, outperform unimodal and baseline approaches, demonstrating the benefits of multimodal learning. Additionally, this approach reduces the need to train separate models for different properties, lowering deployment and maintenance costs.
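The fine-tuned models use LoRA. For readers unfamiliar with it, the sketch below shows the standard low-rank adaptation of a single linear layer: the pretrained weight W stays frozen and a trainable update (alpha / r)·B·A is added, with B initialized to zero so training starts from the pretrained behavior. The rank `r`, `alpha`, initialization scale, and class name are illustrative assumptions; the paper's LoRA configuration is not given here, and a real VLM would apply this inside a deep learning framework rather than NumPy.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, r=8, alpha=16.0, seed=0):
        d_out, d_in = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight                              # frozen pretrained weight, (d_out, d_in)
        self.A = rng.normal(scale=0.01, size=(r, d_in))   # trainable down-projection
        self.B = np.zeros((d_out, r))                     # trainable up-projection, zero at start
        self.scaling = alpha / r

    def __call__(self, x):
        """x: (batch, d_in) -> (batch, d_out); the low-rank path never forms B @ A."""
        return x @ self.weight.T + self.scaling * (x @ self.A.T) @ self.B.T

    def merged_weight(self):
        """Fold the update into W, so inference costs the same as the base layer."""
        return self.weight + self.scaling * self.B @ self.A

    def trainable_params(self):
        return self.A.size + self.B.size


if __name__ == "__main__":
    W = np.random.default_rng(1).normal(size=(16, 32))
    layer = LoRALinear(W, r=4)
    x = np.ones((2, 32))
    print(np.allclose(layer(x), x @ W.T))  # True: B = 0, so the adapter starts as a no-op
    print(layer.trainable_params(), "trainable vs", W.size, "frozen")
```

For a 4096×4096 projection with r = 8, the adapter has 8·4096 + 4096·8 = 65,536 trainable parameters versus 16,777,216 in W (about 0.39%).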

large language model, machine learning, natural language

2511.05577

Country: North America > United States > Arkansas > Washington County > Fayetteville (0.15)

Genre: Research Report > New Finding (0.46)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Oct-29-2025

Predicting Barge Tow Size on Inland Waterways Using Vessel Trajectory Derived Features: Proof of Concept

Agorku, Geoffery, Hernandez, Sarah, Hames, Hayley, Wagner, Cade

Accurate, real-time estimation of barge quantity on inland waterways remains a critical challenge due to the non-self-propelled nature of barges and the limitations of existing monitoring systems. This study introduces a novel method that uses Automatic Identification System (AIS) vessel tracking data to predict the number of barges in tow using Machine Learning (ML). To train and test the model, barge instances were manually annotated from satellite scenes across the Lower Mississippi River. Labeled images were matched to AIS vessel tracks using a spatiotemporal matching procedure. A comprehensive set of 30 AIS-derived features capturing vessel geometry, dynamic movement, and trajectory patterns was created and evaluated using Recursive Feature Elimination (RFE) to identify the most predictive variables. Six regression models, including ensemble, kernel-based, and generalized linear approaches, were trained and evaluated. The Poisson Regressor model yielded the best performance, achieving a Mean Absolute Error (MAE) of 1.92 barges using 12 of the 30 features. The feature importance analysis revealed that metrics capturing vessel maneuverability, such as course entropy, speed variability, and trip length, were most predictive of barge count. The proposed approach provides a scalable, readily implementable method for enhancing Maritime Domain Awareness (MDA), with strong potential applications in lock scheduling, port management, and freight planning. Future work will expand the proof of concept presented here to explore model transferability to other inland rivers with differing operational and environmental conditions.
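The best-performing model is a Poisson regressor: a generalized linear model with a log link, so the expected barge count is exp(b0 + x·b). The sketch below fits such a model by Newton's method (equivalently, iteratively reweighted least squares) with a small L2 penalty on the coefficients and reports MAE, the study's metric. It is a from-scratch NumPy illustration on synthetic data, not the study's pipeline; in practice scikit-learn's `PoissonRegressor` and `RFE` cover the model and the feature-selection step.

```python
import numpy as np


def fit_poisson_glm(X, y, l2=1e-3, iters=50, tol=1e-10):
    """Fit log-link Poisson regression by Newton's method; returns (intercept, coef)."""
    Xd = np.column_stack([np.ones(len(X)), X])      # prepend an intercept column
    beta = np.zeros(Xd.shape[1])
    beta[0] = np.log(y.mean() + 1e-12)              # start at the mean count
    penalty = l2 * np.eye(Xd.shape[1])
    penalty[0, 0] = 0.0                             # leave the intercept unpenalized
    for _ in range(iters):
        mu = np.exp(Xd @ beta)                      # expected counts
        grad = Xd.T @ (y - mu) - penalty @ beta     # gradient of penalized log-likelihood
        hess = Xd.T @ (Xd * mu[:, None]) + penalty  # negative Hessian (Fisher information)
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta[0], beta[1:]


def predict_counts(intercept, coef, X):
    """Expected count exp(b0 + X @ coef); always non-negative."""
    return np.exp(intercept + X @ coef)


def mean_absolute_error(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for standardized features (hypothetical course_entropy, speed_std, trip_length).
    X = rng.normal(size=(500, 3))
    y = rng.poisson(np.exp(1.0 + X @ np.array([0.4, -0.3, 0.2]))).astype(float)
    b0, coef = fit_poisson_glm(X[:400], y[:400])
    print("held-out MAE:", mean_absolute_error(y[400:], predict_counts(b0, coef, X[400:])))
```

An MAE is only comparable to the reported 1.92 barges when it is computed on a held-out split of real annotated tows.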

data mining, hernandez, machine learning

2510.23994

Country: North America > United States > Arkansas > Washington County > Fayetteville (0.14)

Genre: Research Report (1.00)

Industry:

Transportation (1.00)
Government > Regional Government > North America Government > United States Government (0.93)
Government > Military (0.68)
Education (0.68)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Data Science > Data Mining (0.93)

Neural Information Processing Systems, Oct-10-2025, 11:21:24 GMT

HENASY: Learning to Assemble Scene-Entities for Interpretable Egocentric Video-Language Model

In this paper, we take inspiration from human perception and explore a compositional approach to egocentric video representation.

computer vision, encoder, representation

Country:

North America > United States > Arkansas > Washington County > Fayetteville (0.04)
Europe > Switzerland (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial Intelligence, May-15-2025

ChicGrasp: Imitation-Learning based Customized Dual-Jaw Gripper Control for Delicate, Irregular Bio-products Manipulation

Davar, Amirreza, Xu, Zhengtong, Mahmoudi, Siavash, Sohrabipour, Pouya, Pallerla, Chaitanya, She, Yu, Shou, Wan, Crandall, Philip, Wang, Dongyi

Automated poultry processing lines still rely on humans to lift slippery, easily bruised carcasses onto a shackle conveyor. Deformability, anatomical variance, and strict hygiene rules make conventional suction and scripted motions unreliable. An independently actuated dual-jaw pneumatic gripper clamps both chicken legs, while a conditional diffusion-policy controller, trained from only 50 multi-view teleoperation demonstrations (RGB + proprioception), plans 5-DoF end-effector motion, including jaw commands, in one shot. On individually presented raw broiler carcasses, our system achieves a 40.6% grasp-and-lift success rate and completes the pick-to-shackle cycle in 38 s, whereas state-of-the-art implicit behaviour cloning (IBC) and LSTM-GMM baselines fail entirely. All CAD files, code, and datasets will be open-sourced. ChicGrasp shows that imitation learning can bridge the gap between rigid hardware and variable bio-products, offering a reproducible benchmark and a public dataset for researchers in agricultural engineering and robot learning.

Robots and intelligent agents are increasingly deployed in unstructured, dynamic environments where manual programming struggles to capture the intricacies of real-world tasks [1].
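ChicGrasp's controller is a conditional diffusion policy. To make the idea concrete, the sketch below shows generic DDPM ancestral sampling for a single action vector conditioned on an observation: start from Gaussian noise and repeatedly remove the noise predicted by a network. The linear beta schedule, T = 100 steps, and the `predict_noise` callable standing in for the trained observation-conditioned network are assumptions; the paper's architecture, action parameterization, and sampler are not described in this abstract.

```python
import numpy as np


def linear_beta_schedule(T=100, beta_start=1e-4, beta_end=0.02):
    """Return betas, alphas = 1 - betas, and the cumulative products alpha_bar_t."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)


def sample_action(predict_noise, obs, action_dim, T=100, seed=0):
    """DDPM ancestral sampling of one action vector conditioned on `obs`.

    predict_noise(x_t, t, obs) -> estimated noise; stands in for the trained network.
    """
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = linear_beta_schedule(T)
    x = rng.normal(size=action_dim)                       # x_T ~ N(0, I)
    for t in reversed(range(T)):
        eps = predict_noise(x, t, obs)
        # Mean of p(x_{t-1} | x_t) under the epsilon (noise-prediction) parameterization.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.normal(size=action_dim)  # sigma_t^2 = beta_t
        else:
            x = mean                                      # no noise on the final step
    return x
```

As a sanity check, an oracle `predict_noise` that knows the target action (for a data distribution concentrated on one point) makes the loop return exactly that action at the final step, which verifies the update formula without a trained model.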

artificial intelligence, machine learning, survey article

2505.08986

Country:

North America > United States > Arkansas > Washington County > Fayetteville (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)

Genre:

Research Report (0.64)
Overview (0.46)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Food & Agriculture > Agriculture (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.93)

arXiv.org Artificial Intelligence, May-9-2025

An Efficient GPU-based Implementation for Noise Robust Sound Source Localization

Lin, Zirui, Takigahira, Masayuki, Terakado, Naoya, Gulzar, Haris, Busto, Monikka Roslianna, Eda, Takeharu, Itoyama, Katsutoshi, Nakadai, Kazuhiro, Amano, Hideharu

Dept. of Information and Computer Science, Keio University, Kanagawa, Japan. Email: hunga@am.ics.keio.ac.jp

Abstract: Robot audition, encompassing Sound Source Localization (SSL), Sound Source Separation (SSS), and Automatic Speech Recognition (ASR), enables robots and smart devices to acquire auditory capabilities similar to human hearing. Despite the wide applicability of these capabilities, processing multi-channel audio signals from microphone arrays in SSL involves computationally intensive matrix operations, which can hinder efficient deployment on Central Processing Units (CPUs), particularly in embedded systems with limited CPU resources. This paper introduces a GPU-based implementation of SSL for robot audition, utilizing Generalized Singular Value Decomposition-based Multiple Signal Classification (GSVD-MUSIC), a noise-robust algorithm, within the HARK platform, an open-source software suite. For a 60-channel microphone array, the proposed implementation achieves significant performance improvements. On the Jetson AGX Orin, an embedded device powered by an NVIDIA GPU and ARM Cortex-A78AE v8.2 64-bit CPUs, we observe speedups of 5648.7× for GSVD calculations and 10.7× for the SSL module. On a server configured with an NVIDIA A100 GPU and AMD EPYC 7352 CPUs, the speedups are 4245.1× for GSVD calculation and 17.3× for the entire SSL module. These gains make real-time processing feasible for large-scale microphone arrays and leave ample capacity for real-time processing of potential subsequent machine learning or deep learning tasks.

I. INTRODUCTION

Audition is a critical aspect of human inter-individual communication [1].
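GSVD-MUSIC is a noise-robust variant of the MUSIC subspace method. The sketch below computes a MUSIC pseudo-spectrum over candidate directions for a uniform linear array at one frequency bin. Without a noise correlation matrix K it uses the eigenvectors of the observed correlation R (standard MUSIC); with K it takes the SVD of K⁻¹R and treats the left singular vectors beyond the L largest as the noise subspace. That second branch is our reading of the GSVD-MUSIC formulation and should be checked against the HARK documentation; the array geometry, speed of sound, and function names are assumptions, and HARK itself is not used here.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def ula_steering(angles_deg, num_mics, spacing, freq):
    """Far-field steering vectors, shape (num_mics, num_angles), for a uniform linear array."""
    theta = np.deg2rad(np.atleast_1d(np.asarray(angles_deg, dtype=float)))
    mic_idx = np.arange(num_mics)[:, None]
    delays = mic_idx * spacing * np.sin(theta)[None, :] / SPEED_OF_SOUND  # seconds
    return np.exp(-2j * np.pi * freq * delays)


def spatial_correlation(frames):
    """Sample correlation matrix R = X X^H / F for an (M, F) block of STFT frames."""
    return frames @ frames.conj().T / frames.shape[1]


def music_spectrum(R, steering, num_sources, K=None):
    """MUSIC pseudo-spectrum; if K is given, the noise subspace comes from the SVD of K^{-1} R."""
    M = R.shape[0]
    if K is None:
        _, vecs = np.linalg.eigh(R)                       # eigenvalues in ascending order
        noise_sub = vecs[:, : M - num_sources]
    else:
        U, _, _ = np.linalg.svd(np.linalg.solve(K, R))    # singular values in descending order
        noise_sub = U[:, num_sources:]
    num = np.sum(np.abs(steering) ** 2, axis=0)           # a^H a for each direction
    den = np.sum(np.abs(noise_sub.conj().T @ steering) ** 2, axis=0)
    return num / den


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M, d, f, F = 8, 0.04, 2000.0, 2000                    # mics, spacing (m), bin (Hz), frames

    def cnoise(*shape):
        return (rng.normal(size=shape) + 1j * rng.normal(size=shape)) / np.sqrt(2)

    a_target = ula_steering(20.0, M, d, f)
    a_noise = ula_steering(-50.0, M, d, f)

    def noise_block():                                    # strong directional noise + weak sensor noise
        return 2.0 * a_noise @ cnoise(1, F) + 0.3 * cnoise(M, F)

    K = spatial_correlation(noise_block())                # estimated from a noise-only segment
    R = spatial_correlation(a_target @ cnoise(1, F) + noise_block())
    grid = np.arange(-90.0, 90.5, 1.0)
    A = ula_steering(grid, M, d, f)
    print("MUSIC peak:", grid[np.argmax(music_spectrum(R, A, 1))])
    print("K-whitened peak:", grid[np.argmax(music_spectrum(R, A, 1, K=K))])
```

In this simulation, with a target at +20° and a stronger directional noise source at -50°, plain MUSIC with L = 1 locks onto the noise source while the K-whitened version peaks near the target. A real system repeats this decomposition for every frequency bin and time block, which is the kind of repeated matrix work the paper moves to the GPU.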

artificial intelligence, implementation, speech recognition

2504.03373

Country:

Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.24)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.40)

Industry: Information Technology (0.87)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.68)