AITopics

2507.05507

Country: North America > Canada > Ontario > Toronto (0.26)

Genre: Research Report > New Finding (0.66)

Industry:

Transportation (1.00)
Consumer Products & Services > Travel (0.57)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

What to Do Next? Memorizing skills from Egocentric Instructional Video

Bi, Jing, Xu, Chenliang

Learning to perform activities through demonstration requires extracting meaningful information about the environment from observations. In this research, we investigate the challenge of planning high-level goal-oriented actions in a simulation setting from an egocentric perspective. W e present a novel task, interactive action planning, and propose an approach that combines topological affordance memory with transformer architecture. The process of memorizing the environment's structure through extracting af-fordances facilitates selecting appropriate actions based on the context. Moreover, the memory model allows us to detect action deviations while accomplishing specific objectives. T o assess the method's versatility, we evaluate it in a realistic interactive simulation environment. Our experimental results demonstrate that the proposed approach learns meaningful representations, resulting in improved performance and robust when action deviations occur .

information, machine learning, natural language, (14 more...)

2507.02997

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision (0.96)
(5 more...)

López-Montero, Daniel, Álvarez-Aldana, José L., Morales-Martínez, Alicia, Gil-López, Marta, García, Juan M. Auñón

Reinforcement Learning for Automated Cybersecurity Penetration Testing

This paper aims to provide an innovative machine learning-based solution to automate security testing tasks for web applications, ensuring the correct functioning of all components while reducing project maintenance costs. Reinforcement Learning is proposed to select and prioritize tools and optimize the testing path. The presented approach utilizes a simulated webpage along with its network topology to train the agent. Additionally, the model leverages Geometric Deep Learning to create priors that reduce the search space and improve learning convergence. The validation and testing process was conducted on real-world vulnerable web pages commonly used by human hackers for learning. As a result of this study, a reinforcement learning algorithm was developed that maximizes the number of vulnerabilities found while minimizing the number of steps required

machine learning, reinforcement learning, vulnerability, (16 more...)

2507.02969

Country:

Europe > Spain > Galicia > Madrid (0.05)
Asia > Middle East > UAE > Dubai Emirate > Dubai (0.04)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)

Genre:

Research Report (1.00)
Overview (0.68)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Gelpí, Rebekah A., Xue, Eric, Cunningham, William A.

Towards Machine Theory of Mind with Large Language Model-Augmented Inverse Planning

We propose a hybrid approach to machine Theory of Mind (ToM) that uses large language models (LLMs) as a mechanism for generating hypotheses and likelihood functions with a Bayesian inverse planning model that computes posterior probabilities for an agent's likely mental states given its actions. Bayesian inverse planning models can accurately predict human reasoning on a variety of ToM tasks, but these models are constrained in their ability to scale these predictions to scenarios with a large number of possible hypotheses and actions. Conversely, LLM-based approaches have recently demonstrated promise in solving ToM benchmarks, but can exhibit brittleness and failures on reasoning tasks even when they pass otherwise structurally identical versions. By combining these two methods, this approach leverages the strengths of each component, closely matching optimal results on a task inspired by prior inverse planning models and improving performance relative to models that utilize LLMs alone or with chain-of-thought prompting, even with smaller LLMs that typically perform poorly on ToM tasks. We also exhibit the model's potential to predict mental states on open-ended tasks, offering a promising direction for future development of ToM models and the creation of socially intelligent generative agents.

large language model, machine learning, natural language, (18 more...)

2507.03682

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Greece (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Consumer Products & Services > Restaurants (1.00)
Health & Medicine > Consumer Health (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Control Synthesis in Partially Observable Environments for Complex Perception-Related Objectives

Xuan, Zetong, Wang, Yu

Perception-related tasks often arise in autonomous systems operating under partial observability. This work studies the problem of synthesizing optimal policies for complex perception-related objectives in environments modeled by partially observable Markov decision processes. To formally specify such objectives, we introduce \emph{co-safe linear inequality temporal logic} (sc-iLTL), which can define complex tasks that are formed by the logical concatenation of atomic propositions as linear inequalities on the belief space of the POMDPs. Our solution to the control synthesis problem is to transform the \mbox{sc-iLTL} objectives into reachability objectives by constructing the product of the belief MDP and a deterministic finite automaton built from the sc-iLTL objective. To overcome the scalability challenge due to the product, we introduce a Monte Carlo Tree Search (MCTS) method that converges in probability to the optimal policy. Finally, a drone-probing case study demonstrates the applicability of our method.

artificial intelligence, machine learning, objective, (17 more...)

2507.02942

Country:

North America > United States > Florida > Alachua County > Gainesville (0.14)
North America > United States > Iowa (0.04)
North America > United States > California (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Beyond cognacy

Jäger, Gerhard

Computational phylogenetics has become an established tool in historical linguistics, with many language families now analyzed using likelihood-based inference. However, standard approaches rely on expert-annotated cognate sets, which are sparse, labor-intensive to produce, and limited to individual language families. This paper explores alternatives by comparing the established method to two fully automated methods that extract phylogenetic signal directly from lexical data. One uses automatic cognate clustering with unigram/concept features; the other applies multiple sequence alignment (MSA) derived from a pair-hidden Markov model. Both are evaluated against expert classifications from Glottolog and typological data from Grambank. Also, the intrinsic strengths of the phylogenetic signal in the characters are compared. Results show that MSA-based inference yields trees more consistent with linguistic classifications, better predicts typological variation, and provides a clearer phylogenetic signal, suggesting it as a promising, scalable alternative to traditional cognate-based methods. This opens new avenues for global-scale language phylogenies beyond expert annotation bottlenecks.

artificial intelligence, language family, machine learning, (17 more...)

2507.03005

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.40)
North America > United States (0.14)
North America > Mexico > Mexico City > Mexico City (0.04)
(3 more...)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.89)

Inverse Reinforcement Learning using Revealed Preferences and Passive Stochastic Optimization

Krishnamurthy, Vikram

This monograph, spanning three chapters, explores Inverse Reinforcement Learning (IRL). The first two chapters view inverse reinforcement learning (IRL) through the lens of revealed preferences from microeconomics while the third chapter studies adaptive IRL via Langevin dynamics stochastic gradient algorithms. Chapter uses classical revealed preference theory (Afriat's theorem and extensions) to identify constrained utility maximizers based on observed agent actions. This allows for the reconstruction of set-valued estimates of an agent's utility. We illustrate this procedure by identifying the presence of a cognitive radar and reconstructing its utility function. The chapter also addresses the construction of a statistical detector for utility maximization behavior when agent actions are corrupted by noise. Chapter 2 studies Bayesian IRL. It investigates how an analyst can determine if an observed agent is a rationally inattentive Bayesian utility maximizer (i.e., simultaneously optimizing its utility and observation likelihood). The chapter discusses inverse stopping-time problems, focusing on reconstructing the continuation and stopping costs of a Bayesian agent operating over a random horizon. We then apply this IRL methodology to identify the presence of a Bayes-optimal sequential detector. Additionally, Chapter 2 provides a concise overview of discrete choice models, inverse Bayesian filtering, and inverse stochastic gradient algorithms for adaptive IRL. Finally, Chapter 3 introduces an adaptive IRL approach utilizing passive Langevin dynamics. This method aims to track time-varying utility functions given noisy and misspecified gradients. In essence, the adaptive IRL algorithms presented in Chapter 3 can be conceptualized as inverse stochastic gradient algorithms, as they learn the utility function in real-time while a stochastic gradient algorithm is in operation.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

2507.04396

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > Tompkins County > Ithaca (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(2 more...)

Genre: Summary/Review (1.00)

Industry:

Information Technology > Security & Privacy (0.67)
Telecommunications (0.67)
Information Technology > Services (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
(2 more...)

Keinan, Gur, Ben-Porat, Omer

Churn-Aware Recommendation Planning under Aggregated Preference Feedback

We study a sequential decision-making problem motivated by recent regulatory and technological shifts that limit access to individual user data in recommender systems (RSs), leaving only population-level preference information. This privacy-aware setting poses fundamental challenges in planning under uncertainty: Effective personalization requires exploration to infer user preferences, yet unsatisfactory recommendations risk immediate user churn. To address this, we introduce the Rec-APC model, in which an anonymous user is drawn from a known prior over latent user types (e.g., personas or clusters), and the decision-maker sequentially selects items to recommend. Feedback is binary -- positive responses refine the posterior via Bayesian updates, while negative responses result in the termination of the session. We prove that optimal policies converge to pure exploitation in finite time and propose a branch-and-bound algorithm to efficiently compute them. Experiments on synthetic and MovieLens data confirm rapid convergence and demonstrate that our method outperforms the POMDP solver SARSOP, particularly when the number of user types is large or comparable to the number of content categories. Our results highlight the applicability of this approach and inspire new ways to improve decision-making under the constraints imposed by aggregated preference data.

artificial intelligence, machine learning, user type, (19 more...)

2507.04513

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Israel (0.04)
North America > United States > California (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

WebSynthesis: World-Model-Guided MCTS for Efficient WebUI-Trajectory Synthesis

Gao, Yifei, Ye, Junhong, Wang, Jiaqi, Sang, Jitao

Recent advancements in large language models (LLMs) have significantly improved the capabilities of web agents. However, effectively navigating complex and dynamic web environments still requires more advanced trajectory-level planning and execution. Prior studies have addressed self-improving agents by collecting extensive GUI trajectories from real-environment interactions. Despite their effectiveness, these approaches encounter two critical challenges: (1) Uncontrollable environment states, where real or sandboxed web environments often yield unstable and non-deterministic feedback, complicating the reproduction and debugging of agent behaviors; and (2) High API costs, as generating even a single interaction trajectory can involve hundreds of queries, leading to considerable API usage and computational expenses. To address these limitations and enable scalable self-improvement for agents, we propose WebSynthesis, a novel framework for trajectory synthesis and training. WebSynthesis leverages a learned world model to simulate virtual web environments, allowing a policy agent to perform efficient and reversible tree-based planning. This approach supports the large-scale generation of diverse and high-quality trajectories, which are subsequently utilized to refine the agent's policy. Experimental results demonstrate that an agent trained using WebSynthesis on a small-scale synthetic dataset achieves performance comparable to or even surpassing that of models trained on large-scale real-world data.

large language model, machine learning, trajectory, (20 more...)

2507.0437

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
(3 more...)

Zhang, Wenbo, Cai, Hengrui

Where to Intervene: Action Selection in Deep Reinforcement Learning

arXiv.org Machine LearningJul-8-2025

Deep reinforcement learning (RL) has gained widespread adoption in recent years but faces significant challenges, particularly in unknown and complex environments. Among these, high-dimensional action selection stands out as a critical problem. Existing works often require a sophisticated prior design to eliminate redundancy in the action space, relying heavily on domain expert experience or involving high computational complexity, which limits their generalizability across different RL tasks. In this paper, we address these challenges by proposing a general data-driven action selection approach with model-free and computationally friendly properties. Our method not only selects minimal sufficient actions but also controls the false discovery rate via knockoff sampling. More importantly, we seamlessly integrate the action selection into deep RL methods during online training. Empirical experiments validate the established theoretical guarantees, demonstrating that our method surpasses various alternative techniques in terms of both performance in variable selection and overall achieved rewards.

machine learning, reinforcement learning, selection, (14 more...)

arXiv.org Machine Learning

2507.04187

Country:

North America > United States > California > Orange County > Irvine (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Education > Educational Setting > Online (0.68)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)