Inverse Language Modeling towards Robust and Grounded LLMs
Gabrielli, Davide, Sestito, Simone, Masi, Iacopo
The current landscape of defensive mechanisms for LLMs is fragmented and underdeveloped, unlike prior work on classifiers. To further promote adversarial robustness in LLMs, we propose Inverse Language Modeling (ILM), a unified framework that simultaneously 1) improves the robustness of LLMs to input perturbations and 2) enables native grounding by inverting model outputs to identify potentially toxic or unsafe input triggers. ILM transforms LLMs from static generators into analyzable and robust systems, potentially helping red teaming. ILM can lay the foundation for next-generation LLMs that are not only robust and grounded but also fundamentally more controllable and trustworthy. The code is publicly available at github.com/davegabe/pag-llm.
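The abstract describes inversion only at a high level; as a loose, toy illustration of the idea of inverting outputs back to input triggers, one can run gradient ascent on a soft input distribution through a miniature stand-in model. The linear "model", shapes, and step sizes below are all assumptions for illustration, not the authors' method:

```python
import numpy as np

# Toy stand-in for an LM: a single linear map from an input embedding
# to next-token logits. The real ILM framework inverts a full LLM.
rng = np.random.default_rng(0)
vocab, dim = 8, 4
E = rng.normal(size=(vocab, dim))        # input embedding table
W = rng.normal(size=(dim, vocab))        # "model": embedding -> logits

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def invert_output(target_token, steps=500, lr=0.5):
    """Gradient ascent on a soft input distribution q so the model's
    output assigns high probability to target_token."""
    q_logits = np.zeros(vocab)           # parameters of the soft input
    for _ in range(steps):
        q = softmax(q_logits)
        emb = q @ E                      # expected input embedding
        p = softmax(emb @ W)             # model output distribution
        g_emb = W[:, target_token] - W @ p   # d log p[target] / d emb
        g_q = E @ g_emb                      # chain through emb = q @ E
        g_logits = q * (g_q - q @ g_q)       # softmax Jacobian
        q_logits += lr * g_logits
    return int(np.argmax(softmax(q_logits)))

# The recovered "trigger" is the input token that best explains the output.
trigger = invert_output(target_token=3)
```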
CLAP: Clustering to Localize Across n Possibilities, A Simple, Robust Geometric Approach in the Presence of Symmetries
Fernandez, Gabriel I., Hou, Ruochen, Xu, Alex, Togashi, Colin, Hong, Dennis W.
Abstract-- In this paper, we present our localization method called CLAP, Clustering to Localize Across n Possibilities, which helped us win the RoboCup 2024 adult-sized autonomous humanoid soccer competition. In addition, our robot had to deal with varying lighting conditions, dynamic feature occlusions, noise from high-impact stepping, and mistaken features from bystanders and neighboring fields. Therefore, we needed an accurate and, most importantly, robust localization algorithm that would be the foundation for our path-planning and game-strategy algorithms. CLAP achieves these requirements by clustering estimated states of our robot from pairs of field features to localize its global position and orientation. Correct state estimates naturally cluster together, while incorrect estimates spread apart, making CLAP resilient to noise and incorrect inputs. CLAP is paired with a particle filter and an extended Kalman filter to improve consistency and smoothness. Tests of CLAP against other landmark-based localization methods showed similar accuracy. However, tests with increased false positive feature detection showed that CLAP outperformed other methods in terms of robustness, with very little divergence and few velocity jumps. Our localization performed well in competition, allowing our robot to shoot faraway goals and narrowly defend our own. Every year, the RoboCup Federation hosts a humanoid soccer competition in hopes of one day playing a live match of robots versus humans. To ensure a fair match, rules are put in place such that robots must be able to play autonomously, be of similar physiological proportions to a human, and only be equipped with sensors that have biological equivalents.
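The clustering idea can be sketched in a few lines: each pair of matched field features pins down a full (x, y, theta) pose hypothesis, correct pairs agree, and the densest cluster wins. The toy map, noise levels, and cluster radius below are illustrative assumptions, not the competition parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def pose_from_pair(A, B, a, b):
    """Pose implied by map landmarks A, B observed at robot-frame points a, b."""
    theta = np.arctan2(*(B - A)[::-1]) - np.arctan2(*(b - a)[::-1])
    theta = (theta + np.pi) % (2 * np.pi) - np.pi   # wrap to (-pi, pi]
    pos = A - rot(theta) @ a
    return np.array([pos[0], pos[1], theta])

true_pose = np.array([2.0, 1.0, 0.3])                # ground truth (x, y, theta)
landmarks = rng.uniform(-4, 4, size=(6, 2))          # known field features
obs = np.array([rot(-true_pose[2]) @ (L - true_pose[:2]) for L in landmarks])
obs += rng.normal(scale=0.02, size=obs.shape)        # measurement noise

hyps = [pose_from_pair(landmarks[i], landmarks[j], obs[i], obs[j])
        for i in range(6) for j in range(6) if i != j]
hyps += [rng.uniform(-4, 4, size=3) for _ in range(10)]  # bogus associations
hyps = np.array(hyps)

# Count neighbors within a radius in (x, y); average the densest cluster.
d = np.linalg.norm(hyps[:, None, :2] - hyps[None, :, :2], axis=-1)
counts = (d < 0.2).sum(axis=1)
best = hyps[d[np.argmax(counts)] < 0.2].mean(axis=0)
```

Correct pairs all imply nearly the same pose, so the densest cluster sits at the true state even though a quarter of the hypotheses here are random junk.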
Insertion Language Models: Sequence Generation with Arbitrary-Position Insertions
Patel, Dhruvesh, Sahoo, Aishwarya, Amballa, Avinash, Naseem, Tahira, Rudner, Tim G. J., McCallum, Andrew
Autoregressive models (ARMs), which predict subsequent tokens one-by-one "from left to right," have achieved significant success across a wide range of sequence generation tasks. However, they struggle to accurately represent sequences that require satisfying sophisticated constraints or whose sequential dependencies are better addressed by out-of-order generation. Masked Diffusion Models (MDMs) address some of these limitations, but the process of unmasking multiple tokens simultaneously in MDMs can introduce incoherences, and MDMs cannot handle arbitrary infilling constraints when the number of tokens to be filled in is not known in advance. In this work, we introduce Insertion Language Models (ILMs), which learn to insert tokens at arbitrary positions in a sequence -- that is, they select jointly both the position and the vocabulary element to be inserted. By inserting tokens one at a time, ILMs can represent strong dependencies between tokens, and their ability to generate sequences in arbitrary order allows them to accurately model sequences where token dependencies do not follow a left-to-right sequential structure. To train ILMs, we propose a tailored network parameterization and use a simple denoising objective. Our empirical evaluation demonstrates that ILMs outperform both ARMs and MDMs on common planning tasks. Furthermore, we show that ILMs outperform MDMs and perform on par with ARMs in an unconditional text generation task while offering greater flexibility than MDMs in arbitrary-length text infilling. The code is available at: https://dhruveshp.com/projects/ilm .
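The decoding procedure, selecting a (position, token) pair jointly at each step, can be sketched with a toy scorer standing in for the trained network. The scorer, stop rule, and vocabulary below are illustrative assumptions, not the paper's parameterization:

```python
import random

# Decoding loop for an insertion language model (sketch). A real ILM
# jointly scores (position, token) with a neural network; here `score`
# is a toy stand-in that rewards keeping a digit sequence sorted.
VOCAB = list(range(10))
STOP = "<stop>"

def score(seq, pos, tok):
    """Toy joint score for inserting `tok` at `pos` (higher is better)."""
    if tok == STOP:
        return 0.0 if len(seq) >= 5 else -1e9   # forbid stopping too early
    new = seq[:pos] + [tok] + seq[pos:]
    return -sum(1 for a, b in zip(new, new[1:]) if a > b)  # penalize disorder

def generate(max_steps=10, seed=0):
    rng = random.Random(seed)
    seq = []
    for _ in range(max_steps):
        candidates = [(pos, tok) for pos in range(len(seq) + 1)
                      for tok in VOCAB + [STOP]]
        # Greedy joint choice of position and token; random tiebreak.
        pos, tok = max(candidates, key=lambda c: (score(seq, *c), rng.random()))
        if tok == STOP:
            break
        seq.insert(pos, tok)
    return seq

out = generate()
```

Because position and token are chosen together, the sequence can grow in any order while every intermediate state stays consistent with the scorer's constraints.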
Label-Context-Dependent Internal Language Model Estimation for CTC
Yang, Zijian, Phan, Minh-Nghia, Schlüter, Ralf, Ney, Hermann
Although connectionist temporal classification (CTC) has the label context independence assumption, it can still implicitly learn a context-dependent internal language model (ILM) due to modern powerful encoders. In this work, we investigate the implicit context dependency modeled in the ILM of CTC. To this end, we propose novel context-dependent ILM estimation methods for CTC based on knowledge distillation (KD) with theoretical justifications. Furthermore, we introduce two regularization methods for KD. We conduct experiments on Librispeech and TED-LIUM Release 2 datasets for in-domain and cross-domain evaluation, respectively. Experimental results show that context-dependent ILMs outperform the context-independent priors in cross-domain evaluation, indicating that CTC learns a context-dependent ILM. The proposed label-level KD with smoothing method surpasses other ILM estimation approaches, with more than 13% relative improvement in word error rate compared to shallow fusion.
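For context, ILM-corrected shallow fusion typically scores hypotheses as log p_CTC(y|x) - λ_ILM · log p_ILM(y) + λ_LM · log p_LM(y). A minimal sketch with toy per-token probabilities follows; the words, probabilities, and weights are illustrative, not numbers from the paper:

```python
import math

# Shallow fusion with internal-LM (ILM) correction (sketch). The toy
# dictionaries stand in for a CTC decoder, an estimated context-dependent
# ILM, and an external LM; the weights are illustrative.
def fused_score(log_p_ctc, log_p_ilm, log_p_ext, lam_ilm=0.3, lam_ext=0.5):
    """log p ~= log p_CTC(y|x) - lam_ilm*log p_ILM(y) + lam_ext*log p_LM(y)."""
    return log_p_ctc - lam_ilm * log_p_ilm + lam_ext * log_p_ext

# Two candidate tokens with per-model log-probabilities.
candidates = {
    "their": {"ctc": math.log(0.4),  "ilm": math.log(0.5), "ext": math.log(0.2)},
    "there": {"ctc": math.log(0.35), "ilm": math.log(0.1), "ext": math.log(0.6)},
}
best = max(candidates, key=lambda w: fused_score(
    candidates[w]["ctc"], candidates[w]["ilm"], candidates[w]["ext"]))
```

Subtracting the ILM term here flips the decision toward the word the external LM prefers, which is the mechanism behind the cross-domain gains the abstract reports.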
Fast and Robust Localization for Humanoid Soccer Robot via Iterative Landmark Matching
Hou, Ruochen, Zhu, Mingzhang, Nam, Hyunwoo, Fernandez, Gabriel I., Hong, Dennis W.
Accurate robot localization is essential for effective operation. Monte Carlo Localization (MCL) is commonly used with known maps but is computationally expensive due to landmark matching for each particle. Humanoid robots face additional challenges, including sensor noise from locomotion vibrations and a limited field of view (FOV) due to camera placement. This paper proposes a fast and robust localization method via iterative landmark matching (ILM) for humanoid robots. The iterative matching process improves the accuracy of landmark association without requiring MCL to match landmarks to particles. Pose estimation with an outlier removal process enhances robustness to measurement noise and faulty detections. Furthermore, an additional filter can be used to fuse inertial data from the inertial measurement unit (IMU) with pose data from localization. A comparison of ILM with Iterative Closest Point (ICP) shows that ILM is more robust to error in the initial guess and more reliably finds a correct matching. A comparison with Augmented Monte Carlo Localization (aMCL) shows that ILM is both much faster and more accurate. The proposed method's effectiveness is thoroughly evaluated through experiments and validated on the humanoid robot ARTEMIS during the RoboCup 2024 adult-sized soccer competition.
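The match-fit-prune loop described above can be sketched with a closed-form 2D rigid fit in place of the paper's full pose estimation; the toy map, thresholds, and noise-free data are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_pose(src, dst):
    """Closed-form (Kabsch) 2D rigid transform mapping src onto dst."""
    cs, cd = src.mean(0), dst.mean(0)
    U, _, Vt = np.linalg.svd((src - cs).T @ (dst - cd))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:             # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def iterative_match(obs, landmarks, R, t, iters=10, outlier_thresh=0.5):
    for _ in range(iters):
        world = obs @ R.T + t            # detections under current pose
        idx = np.argmin(np.linalg.norm(world[:, None] - landmarks[None],
                                       axis=-1), axis=1)
        resid = np.linalg.norm(world - landmarks[idx], axis=1)
        # Drop faulty detections, but always keep at least half the points.
        keep = resid < max(outlier_thresh, 2 * np.median(resid))
        R, t = fit_pose(obs[keep], landmarks[idx[keep]])
    return R, t

landmarks = rng.uniform(-3, 3, size=(8, 2))          # known map
th, t_true = 0.1, np.array([0.2, -0.1])
R_true = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
obs = (landmarks - t_true) @ R_true                  # robot-frame detections
obs = np.vstack([obs, [10.0, 10.0]])                 # one faulty detection
R_est, t_est = iterative_match(obs, landmarks, np.eye(2), np.zeros(2))
```

The outlier pruning step is what keeps the fit from being dragged off by the bogus detection, mirroring the robustness-to-faulty-detections claim in the abstract.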
FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces
Xu, Zhenran, Wang, Longyue, Wang, Jifang, Li, Zhouyi, Shi, Senbao, Yang, Xue, Wang, Yiyu, Hu, Baotian, Yu, Jun, Zhang, Min
Virtual film production requires intricate decision-making processes, including scriptwriting, virtual cinematography, and precise actor positioning and actions. Motivated by recent advances in automated decision-making with language agent-based societies, this paper introduces FilmAgent, a novel LLM-based multi-agent collaborative framework for end-to-end film automation in our constructed 3D virtual spaces. FilmAgent simulates various crew roles, including directors, screenwriters, actors, and cinematographers, and covers key stages of a film production workflow: (1) idea development transforms brainstormed ideas into structured story outlines; (2) scriptwriting elaborates on dialogue and character actions for each scene; (3) cinematography determines the camera setups for each shot. A team of agents collaborates through iterative feedback and revisions, thereby verifying intermediate scripts and reducing hallucinations. We evaluate the generated videos on 15 ideas and 4 key aspects. Human evaluation shows that FilmAgent outperforms all baselines across all aspects and scores 3.98 out of 5 on average, showing the feasibility of multi-agent collaboration in filmmaking. Further analysis reveals that FilmAgent, despite using the less advanced GPT-4o model, surpasses the single-agent o1, showing the advantage of a well-coordinated multi-agent system. Lastly, we discuss the complementary strengths and weaknesses of OpenAI's text-to-video model Sora and our FilmAgent in filmmaking.
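The iterative feedback-and-revision loop between agents can be sketched with rule-based stand-ins for the LLM calls; the roles, critique strings, and revision rules below are illustrative assumptions, not FilmAgent's actual prompts:

```python
# Sketch of a multi-agent critique-revise loop: one agent drafts a scene,
# a second critiques it, and the draft is revised until approved.
def screenwriter(draft, feedback):
    """Revise the draft to address each outstanding note."""
    for note in feedback:
        if note == "missing dialogue":
            draft["dialogue"] = ["ALICE: We need to leave. Now."]
        elif note == "no camera setup":
            draft["shots"] = ["medium close-up", "over-the-shoulder"]
    return draft

def director(draft):
    """Return a list of critiques; an empty list means approval."""
    notes = []
    if not draft.get("dialogue"):
        notes.append("missing dialogue")
    if not draft.get("shots"):
        notes.append("no camera setup")
    return notes

draft = {"outline": "Two friends argue before a storm."}
for _ in range(3):                        # bounded revision rounds
    feedback = director(draft)
    if not feedback:
        break
    draft = screenwriter(draft, feedback)
approved = not director(draft)
```

Bounding the number of rounds is one simple way such a loop can terminate even when the critic and writer never fully agree.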
An iterated learning model of language change that mixes supervised and unsupervised learning
Bunyan, Jack, Bullock, Seth, Houghton, Conor
The iterated learning model is an agent-based model of language change in which language is transmitted from a tutor to a pupil which itself becomes a tutor to a new pupil, and so on. Languages that are stable, expressive, and compositional arise spontaneously as a consequence of a language transmission bottleneck. Previous models have implemented an agent's mapping from signals to meanings using an artificial neural network decoder, but have relied on an unrealistic and computationally expensive process of obversion to implement the associated encoder, mapping from meanings to signals. Here, a new model is presented in which both decoder and encoder are neural networks, trained separately through supervised learning, and trained together through unsupervised learning in the form of an autoencoder. This avoids the substantial computational burden entailed in obversion and introduces a mixture of supervised and unsupervised learning as observed during human development.
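One tutor-to-pupil generation of the mixed regime can be sketched with single linear layers in place of the paper's neural networks; the dimensions, learning rate, and identity tutor are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
d_meaning = d_signal = 4
W_tutor = np.eye(d_signal, d_meaning)     # tutor's fixed meaning->signal map

W_enc = rng.normal(scale=0.1, size=(d_signal, d_meaning))   # pupil encoder
W_dec = rng.normal(scale=0.1, size=(d_meaning, d_signal))   # pupil decoder
lr = 0.1

for step in range(2000):
    m = rng.normal(size=d_meaning)        # a random meaning
    if step % 2 == 0:
        # Supervised: imitate the tutor's signal, and decode it back.
        s_t = W_tutor @ m
        W_enc -= lr * np.outer(W_enc @ m - s_t, m)
        W_dec -= lr * np.outer(W_dec @ s_t - m, s_t)
    else:
        # Unsupervised: autoencode the meaning through the pupil's own
        # signal channel, training encoder and decoder together.
        s = W_enc @ m
        err = W_dec @ s - m
        W_dec -= lr * np.outer(err, s)
        W_enc -= lr * W_dec.T @ np.outer(err, m)
```

The supervised steps play the role of the transmission bottleneck (learning from the tutor's examples), while the autoencoder steps replace the expensive obversion procedure by training encoder and decoder jointly.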
Modeling language contact with the Iterated Learning Model
Bullock, Seth, Houghton, Conor
Contact between languages has the potential to transmit vocabulary and other language features; however, this does not always happen. Here, an iterated learning model is used to examine, in a simple way, the resistance of languages to change during language contact. Iterated learning models are agent-based models of language change; they demonstrate that languages that are expressive and compositional arise spontaneously as a consequence of a language transmission bottleneck. A recently introduced type of iterated learning model, the Semi-Supervised ILM, is used to simulate language contact. These simulations do not include many of the complex factors involved in language contact and do not model a population of speakers; nonetheless, the model demonstrates that the dynamics which lead languages in the model to spontaneously become expressive and compositional also cause a language to maintain its core traits even after mixing with another language.
Integrating Language Models into Direct Speech Translation: An Inference-Time Solution to Control Gender Inflection
Fucci, Dennis, Gaido, Marco, Papi, Sara, Cettolo, Mauro, Negri, Matteo, Bentivogli, Luisa
When translating words referring to the speaker, speech translation (ST) systems should not resort to default masculine generics nor rely on potentially misleading vocal traits. Rather, they should assign gender according to the speakers' preference. The existing solutions to do so, though effective, are hardly feasible in practice as they involve dedicated model re-training on gender-labeled ST data. To overcome these limitations, we propose the first inference-time solution to control speaker-related gender inflections in ST. Our approach partially replaces the (biased) internal language model (LM) implicitly learned by the ST decoder with gender-specific external LMs. Experiments on en->es/fr/it show that our solution outperforms the base models and the best training-time mitigation strategy by up to 31.0 and 1.6 points in gender accuracy, respectively, for feminine forms. The gains are even larger (up to 32.0 and 3.4) in the challenging condition where speakers' vocal traits conflict with their gender.
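The inference-time correction amounts to partially swapping the decoder's implicit LM for a gender-specific external one when scoring candidate tokens. A toy sketch of one such decision follows; the Italian inflections, probabilities, and weight are illustrative, not the paper's numbers:

```python
import math

# Partial internal-LM replacement (sketch): the ST decoder's score for a
# candidate token is corrected by subtracting a weighted estimate of its
# (biased) internal LM and adding a gender-specific external LM.
def corrected(log_p_st, log_p_ilm, log_p_gender_lm, lam=0.4):
    return log_p_st + lam * (log_p_gender_lm - log_p_ilm)

# en->it toy step: choosing the inflection of a speaker-referred adjective.
cands = {
    "stanco": {"st": math.log(0.6), "ilm": math.log(0.7), "fem_lm": math.log(0.1)},
    "stanca": {"st": math.log(0.4), "ilm": math.log(0.3), "fem_lm": math.log(0.9)},
}
pick = max(cands, key=lambda w: corrected(
    cands[w]["st"], cands[w]["ilm"], cands[w]["fem_lm"]))
```

The correction overturns the decoder's masculine-generic default in favor of the feminine form preferred by the external LM, without any retraining of the ST model.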
Towards Deep Learning Guided Autonomous Eye Surgery Using Microscope and iOCT Images
Kim, Ji Woong, Wei, Shuwen, Zhang, Peiyao, Gehlbach, Peter, Kang, Jin U., Iordachita, Iulian, Kobilarov, Marin
Recent advancements in retinal surgery have paved the way for a modern operating room equipped with a surgical robot, a microscope, and intraoperative optical coherence tomography (iOCT), a depth sensor widely used in retinal surgery. Integrating these tools raises the fundamental question of how to effectively combine them to enable surgical autonomy. In this work, we tackle this question by developing a unified framework that facilitates real-time autonomous surgical workflows leveraging these devices. The system features: (1) a novel imaging system that integrates the microscope and iOCT in real-time by dynamically tracking the surgical instrument via a small iOCT scanning region, providing real-time depth feedback; (2) implementation of convolutional neural networks (CNN) that automatically detect and segment task-relevant information for surgical autonomy; (3) intuitive selection of goal waypoints within both the microscope and iOCT views through simple mouse-click interactions; and (4) integration of model predictive control (MPC) for trajectory generation, ensuring patient safety by implementing safety-related kinematic constraints. The system's utility is demonstrated by automating subretinal injection (SI), a challenging procedure with high accuracy and depth perception requirements. We validate our system by conducting 30 successful SI trials on pig eyes, achieving mean needle insertion accuracy of 26 micrometers to various subretinal goals and mean duration of 55 seconds. Preliminary comparisons to a human operator performing SI in robot-assisted mode highlight the enhanced safety of our system. Project website is here: https://sites.google.com/view/eyesurgerymicroscopeoct/home
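As a greatly simplified stand-in for the MPC trajectory generation with safety constraints, the sketch below steps a needle tip toward a goal waypoint under a per-step speed limit and a maximum-depth clamp. The units, limits, and greedy controller are illustrative assumptions, not surgical parameters or the authors' MPC formulation:

```python
import numpy as np

def plan_step(tip, goal, max_step=0.05, max_depth=1.0):
    """One constrained step toward the goal waypoint."""
    d = goal - tip
    dist = np.linalg.norm(d)
    if dist > max_step:                   # speed (step-length) constraint
        d = d * (max_step / dist)
    new = tip + d
    new[2] = min(new[2], max_depth)       # depth safety constraint
    return new

tip = np.array([0.0, 0.0, 0.0])
goal = np.array([0.3, 0.1, 1.2])          # clicked goal deeper than allowed
traj = [tip]
for _ in range(200):
    tip = plan_step(tip, goal)
    traj.append(tip)
    if np.linalg.norm(tip - goal) < 1e-6:
        break
```

Even when the requested waypoint violates the depth limit, the constraint keeps every pose in the trajectory on the safe side, which is the role the kinematic constraints play in the full MPC formulation.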