AITopics | Industry

Industry

RL Zero: Direct Policy Inference from Language Without In-Domain Supervision

Harshit Sikchi, Siddhant Agarwal, Pranaya Jajoo, Samyak Parajuli, Caleb Chuck, Max Rudolph, Peter Stone, Amy Zhang, Scott Niekum

Neural Information Processing Systems | Jun-18-2026, 17:01:03 GMT

The reward hypothesis states that all goals and purposes can be understood as the maximization of a received scalar reward signal. However, in practice, defining such a reward signal is notoriously difficult, as humans are often unable to predict the optimal behavior corresponding to a reward function. Natural language offers an intuitive alternative for instructing reinforcement learning (RL) agents, yet previous language-conditioned approaches either require costly supervision or test-time training given a language instruction. In this work, we present a new approach that uses a pretrained RL agent trained using only unlabeled, offline interactions--without task-specific supervision or labeled trajectories--to get zero-shot test-time policy inference from arbitrary natural language instructions. We introduce a framework comprising three steps: imagine, project, and imitate.

large language model, machine learning, reinforcement learning, (17 more...)

Country: North America > United States (0.93)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Consumer Health (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

6 in 10 identity crimes now begin with a new account

FOX News | Jun-18-2026, 16:52:26 GMT

Emily Vranic and Heather Marquis pleaded guilty to bank fraud after using stolen mail to open credit cards in victims' names, stealing nearly $229,000 from banks and customers.

artificial intelligence, credit card, social media, (10 more...)

Country: North America > United States > Texas (0.14)

Industry:

Media > News (1.00)
Government > Regional Government > North America Government > United States Government (0.96)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence (1.00)

Israel kills at least three Palestinians in Gaza City drone strike

Al Jazeera | Jun-18-2026, 16:50:58 GMT

At least three Palestinians have been killed and several others wounded after an Israeli drone struck a vehicle near Abu Khadra Mosque in the Rimal neighbourhood of western Gaza City, according to medical sources. Al Jazeera's Hind Khoudary, reporting from Gaza City, said the attack on Thursday was the first explosion in the area after a few "calm and quiet" days. "Only one of the three victims has been identified: Abdul Jawad Abu Lebn [who] was set to get married next week. Wedding invitations were found inside the car."

artificial intelligence, live navigation menu news show, news section africa asia us, (7 more...)

Country: Asia > Middle East > Palestine > Gaza Strip > Gaza Governorate > Gaza (1.00)

Industry:

Law > Civil Rights & Constitutional Law (0.71)
Government > Regional Government > North America Government > United States Government (0.49)
Government > Regional Government > Asia Government > Middle East Government > Israel Government (0.30)

Technology: Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.41)

DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents

Neural Information Processing Systems | Jun-18-2026, 16:49:33 GMT

Large Language Models (LLMs) are increasingly central to agentic systems due to their strong reasoning and planning capabilities. By interacting with external environments through predefined tools, these agents can carry out complex user tasks. Nonetheless, this interaction also introduces the risk of prompt injection attacks, where malicious inputs from external sources can mislead the agent's behavior, potentially resulting in economic loss, privacy leakage, or system compromise. System-level defenses have recently shown promise by enforcing static or predefined policies, but they still face two key challenges: the ability to dynamically update security rules and the need for memory stream isolation. To address these challenges, we propose DRIFT, a Dynamic Rule-based Isolation Framework for Trustworthy agentic systems, which enforces both control- and data-level constraints. A Secure Planner first constructs a minimal function trajectory and a JSON-schema-style parameter checklist for each function node based on the user query. A Dynamic Validator then monitors deviations from the original plan, assessing whether changes comply with privilege limitations and the user's intent. Finally, an Injection Isolator detects and masks any instructions that may conflict with the user query from the memory stream to mitigate long-term risks.
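
The planner-then-validator flow described above can be pictured with a small sketch. This is not DRIFT's implementation or API: the names PlannedCall, Verdict, and validate_call, the example tools, and the rule that only read-only tools may deviate from the plan are all assumptions made for illustration. In particular, the abstract says the validator also weighs the user's intent, which this sketch replaces with a fixed allowlist.

```python
# Hypothetical sketch; names and policy shapes are assumptions, not DRIFT's API.
from dataclasses import dataclass, field


@dataclass
class PlannedCall:
    name: str
    # Parameter checklist: parameter name -> expected Python type.
    required: dict = field(default_factory=dict)


@dataclass
class Verdict:
    allowed: bool
    reason: str


def validate_call(plan, step, name, args, read_only):
    """Check tool call number `step` against the plan built from the user query."""
    if step < len(plan) and plan[step].name == name:
        # The call follows the planned trajectory: verify its parameter checklist.
        for key, expected_type in plan[step].required.items():
            if key not in args:
                return Verdict(False, f"missing parameter: {key}")
            if not isinstance(args[key], expected_type):
                return Verdict(False, f"wrong type for parameter: {key}")
        return Verdict(True, "matches plan")
    # Deviation from the plan: in this sketch only read-only tools are allowed.
    if name in read_only:
        return Verdict(True, "deviation allowed: read-only tool")
    return Verdict(False, "deviation blocked: tool not in plan")


plan = [
    PlannedCall("search_flights", {"destination": str}),
    PlannedCall("book_flight", {"flight_id": str}),
]
read_only = {"search_flights", "get_weather"}

ok = validate_call(plan, 0, "search_flights", {"destination": "Oslo"}, read_only)
extra = validate_call(plan, 1, "get_weather", {"city": "Oslo"}, read_only)
blocked = validate_call(plan, 1, "send_money", {"amount": 500}, read_only)
```

Here the planned call and the read-only deviation are accepted, while the unplanned send_money call, of the kind an injected instruction might trigger, is rejected.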

large language model, machine learning, natural language, (19 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning

Neural Information Processing Systems | Jun-18-2026, 16:49:14 GMT

Parameter-efficient fine-tuning (PEFT) methods have shown promise in adapting large language models, yet existing approaches exhibit counter-intuitive phenomena: integrating either matrix decomposition or mixture-of-experts (MoE) individually decreases performance across tasks, though decomposition improves results on specific domains despite reducing parameters, while MoE increases parameter count without a corresponding decrease in training efficiency. Motivated by these observations and the modular nature of prompt tuning (PT), we propose PT-MoE, a novel framework that integrates matrix decomposition with MoE routing for efficient PT. Evaluation results across 17 datasets demonstrate that PT-MoE achieves state-of-the-art performance in both question answering (QA) and mathematical problem solving tasks, improving F1 score by 1.49 points over PT and 2.13 points over LoRA in QA tasks, while improving mathematical accuracy by 10.75 points over PT and 0.44 points over LoRA, all while using 25% fewer parameters than LoRA. Our analysis reveals that while PT methods generally excel in QA tasks and LoRA-based methods in math datasets, the integration of matrix decomposition and MoE in PT-MoE yields complementary benefits: decomposition enables efficient parameter sharing across experts while MoE provides dynamic adaptation, collectively enabling PT-MoE to demonstrate cross-task consistency and generalization abilities. These findings, along with ablation studies on routing mechanisms and architectural components, provide insights for future PEFT methods.
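
One way to picture how decomposition and routing could combine is the sketch below. It rests on assumptions that the abstract does not confirm: each expert owns a low-rank factor, all experts share a second factor, and a router mixes the expert factors with softmax weights. The function names, shapes, and numbers are hypothetical, and router parameters are left out of the count.

```python
# Hypothetical sketch of a routed, low-rank soft prompt; not the paper's code.
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mix_prompt(expert_factors, shared_factor, router_scores):
    """Soft prompt = (sum_i w_i * A_i) @ B, with B shared by all experts."""
    weights = softmax(router_scores)
    rows, rank = len(expert_factors[0]), len(shared_factor)
    mixed = [[sum(w * a[i][k] for w, a in zip(weights, expert_factors))
              for k in range(rank)] for i in range(rows)]
    return matmul(mixed, shared_factor)


def param_counts(num_experts, prompt_len, rank, dim):
    """Parameters of one full prompt vs. the factored experts (router excluded)."""
    full = prompt_len * dim
    factored = num_experts * prompt_len * rank + rank * dim
    return full, factored


# Two experts, prompt length 2, rank 1, embedding dimension 3.
experts = [[[1.0], [0.0]], [[0.0], [1.0]]]
shared = [[1.0, 2.0, 3.0]]
prompt = mix_prompt(experts, shared, [0.0, 0.0])  # equal router weights
full, factored = param_counts(num_experts=4, prompt_len=40, rank=16, dim=1024)
```

With equal router scores each expert contributes half of its factor, so both prompt rows equal half of the shared row. For the illustrative sizes in the last line, the factored form stores 18,944 values against 40,960 for a single full prompt.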

computational linguistic, information retrieval, large language model, (17 more...)

Country:

Europe (1.00)
Asia (0.93)
North America > United States > Minnesota (0.28)

Genre:

Research Report > Experimental Study (1.00)
Overview (0.93)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.48)

Volvo XC60 crashes into a 793-pound moose dummy

Crash testing with these massive mammals has come a long way from using real cadavers. The moose crash test dummy is helping Volvo engineers in Sweden build cars.

artificial intelligence, moose, physics popular science video space, (12 more...)

Popular Science

Country: North America > United States (0.69)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks > Manufacturer (0.95)

Technology: Information Technology > Artificial Intelligence > Robots (0.47)

URB - Urban Routing Benchmark for RL-equipped Connected Autonomous Vehicles

Neural Information Processing Systems | Jun-18-2026, 16:29:16 GMT

Connected Autonomous Vehicles (CAVs) promise to reduce congestion in future urban networks, potentially by optimizing their routing decisions. Unlike for human drivers, these decisions can be made with collective, data-driven policies, developed using machine learning algorithms. Reinforcement learning (RL) can facilitate the development of such collective routing strategies, yet standardized and realistic benchmarks are missing.

machine learning, reinforcement learning, scenario 1, (18 more...)

Country:

Europe (1.00)
North America > United States (0.67)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (0.67)

Industry:

Transportation > Infrastructure & Services (0.68)
Transportation > Ground > Road (0.68)
Consumer Products & Services > Travel (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Attention Sinks: A 'Catch, Tag, Release' Mechanism for Embeddings

Neural Information Processing Systems | Jun-18-2026, 16:28:16 GMT

Large language models (LLMs) often concentrate their attention on a few specific tokens referred to as attention sinks. Common examples include the first token, a prompt-independent sink, and punctuation tokens, which are prompt-dependent. While the tokens causing the sinks often lack direct semantic meaning, the presence of the sinks is critical for model performance, particularly under model compression and KV-caching. Despite their ubiquity, the function, semantic role, and origin of attention sinks--especially those beyond the first token--remain poorly understood. In this work, we conduct a comprehensive investigation demonstrating that attention sinks: catch a sequence of tokens, tag them using a common direction in embedding space, and release them back into the residual stream, where tokens are later retrieved based on the tags they have acquired. Probing experiments reveal these tags carry semantically meaningful information, such as the truth of a statement. These findings extend to reasoning models, where the mechanism spans more heads and explains greater variance in embeddings, or recent models with query-key normalization, where sinks remain just as prevalent. To encourage future theoretical analysis, we introduce a minimal problem which can be solved through the 'catch, tag, release' mechanism, and where it emerges through training.

large language model, machine learning, natural language, (20 more...)

Country:

Asia (0.93)
North America > United States (0.93)
North America > Canada > Ontario (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry: Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Structured Sparse Transition Matrices to Enable State Tracking in State-Space Models

Neural Information Processing Systems | Jun-18-2026, 16:27:27 GMT

Modern state-space models (SSMs) often utilize structured transition matrices which enable efficient computation but pose restrictions on the model's expressivity, as measured in terms of the ability to emulate finite-state automata (FSA). While unstructured transition matrices are optimal in terms of expressivity, they come at a prohibitively high compute and memory cost, even for moderate state sizes. We propose a structured sparse parametrization of transition matrices in SSMs that enables FSA state tracking with provably optimal state size and depth, while keeping the computational cost of the recurrence comparable to that of diagonal SSMs.
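
To see why sparse transition matrices can track automaton states cheaply, consider the sketch below. It is a generic illustration, not the parametrization proposed in the paper: each input symbol selects a matrix with a single 1 per column, so the recurrence h_t = A[x_t] h_{t-1} on a one-hot state can be computed from an index map in O(n) per step instead of the O(n^2) of a dense product. The mod-3 counter automaton and all function names are assumptions chosen for the example.

```python
# Hypothetical illustration: tracking a finite-state automaton with a linear
# recurrence h_t = A[x_t] @ h_{t-1}, where each A[x] has one nonzero per column.


def dense_step(matrix, state):
    """Dense matrix-vector product: O(n^2) work per step."""
    n = len(state)
    return [sum(matrix[i][j] * state[j] for j in range(n)) for i in range(n)]


def sparse_step(next_state, state):
    """Same product using only the column -> row index map: O(n) work per step."""
    out = [0] * len(state)
    for j, value in enumerate(state):
        out[next_state[j]] += value
    return out


def to_dense(next_state):
    """Expand an index map into the equivalent 0/1 transition matrix."""
    n = len(next_state)
    return [[1 if next_state[j] == i else 0 for j in range(n)] for i in range(n)]


# Example automaton (an assumption for this sketch): a counter modulo 3.
# Symbol "a" advances the counter, symbol "b" leaves it unchanged.
TRANSITIONS = {"a": [1, 2, 0], "b": [0, 1, 2]}
DENSE = {symbol: to_dense(table) for symbol, table in TRANSITIONS.items()}


def run(word, step, tables):
    state = [1, 0, 0]  # one-hot encoding of start state 0
    for symbol in word:
        state = step(tables[symbol], state)
    return state.index(1)


word = "aabab"
final_sparse = run(word, sparse_step, TRANSITIONS)
final_dense = run(word, dense_step, DENSE)
```

Both runs end in the same state, since the index map and the dense matrix encode the same transition. The word contains three "a" symbols, so the mod-3 counter returns to state 0.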

artificial intelligence, machine learning, natural language, (20 more...)

Country:

North America > United States (0.45)
Europe (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

From Information to Generative Exponent: Learning Rate Induces Phase Transitions in SGD

Neural Information Processing Systems | Jun-18-2026, 16:22:24 GMT

To understand feature learning dynamics in neural networks, recent theoretical works have focused on gradient-based learning of Gaussian single-index models, where the label is a nonlinear function of a latent one-dimensional projection of the input. While the sample complexity of online SGD is determined by the information exponent of the link function, recent works improved this by performing multiple gradient steps on the same sample with different learning rates -- yielding a non-correlational update rule -- and instead are limited by the (potentially much smaller) generative exponent. However, this picture is only valid when these learning rates are sufficiently large. In this paper, we characterize the relationship between learning rate(s) and sample complexity for a broad class of gradient-based algorithms that encapsulates both correlational and non-correlational updates. We demonstrate that, in certain cases, there is a phase transition from an "information exponent regime" with small learning rate to a "generative exponent regime" with large learning rate. Our framework covers prior analyses of one-pass SGD and SGD with batch reuse, while also introducing a new layer-wise training algorithm that leverages a two-timescales approach (via different learning rates for each layer) to go beyond correlational queries without reusing samples or modifying the loss from squared error. Our theoretical study demonstrates that the choice of learning rate is as important as the design of the algorithm in achieving statistical and computational efficiency.

artificial intelligence, machine learning, sample complexity, (14 more...)

Country: North America > Canada > Ontario (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Government (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
