AITopics | valid action

valid action

1e747ddbea997a1b933aaf58a7953c3c-Supplemental.pdf

Neural Information Processing Systems, Apr-25-2026, 00:47:42 GMT

artificial intelligence, dead end, machine learning, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.48)

1e747ddbea997a1b933aaf58a7953c3c-Paper.pdf

Neural Information Processing Systems, Apr-25-2026, 00:47:39 GMT

machine learning, natural language, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Overview (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
(2 more...)

7d4c0094ae32530494c71468558ab5b1-Paper-Conference.pdf

Neural Information Processing Systems, Feb-15-2026, 11:07:22 GMT

artificial intelligence, constraint, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Asia > Singapore (0.04)
Europe > Austria (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(4 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

FlowPG: Action-constrained Policy Gradient with Normalizing Flows

Neural Information Processing Systems, Feb-10-2026, 21:37:40 GMT

Second, learning the flow model requires sampling from the feasible action space, which is also challenging.

constraint, machine learning, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country: Asia > Singapore (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Robots (0.94)
(2 more...)

1e747ddbea997a1b933aaf58a7953c3c-Paper.pdf

Neural Information Processing Systems, Feb-7-2026, 18:34:13 GMT

machine learning, natural language, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)

Genre: Overview (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
(2 more...)

FlowPG: Action-constrained Policy Gradient with Normalizing Flows

Neural Information Processing Systems, Dec-24-2025, 20:34:25 GMT

Action-constrained reinforcement learning (ACRL) is a popular approach for solving safety-critical and resource-allocation decision-making problems. A major challenge in ACRL is ensuring that the agent takes a valid action, one that satisfies the constraints, at every RL step. The commonly used approach of adding a projection layer on top of the policy network requires solving an optimization program, which can result in longer training time, slow convergence, and the zero-gradient problem. To address this, we first use a normalizing flow model to learn an invertible, differentiable mapping between the feasible action space and the support of a simple distribution on a latent variable, such as a Gaussian. Second, learning the flow model requires sampling from the feasible action space, which is also challenging. We develop multiple methods, based on Hamiltonian Monte Carlo and probabilistic sentential decision diagrams, for such action sampling under convex and non-convex constraints. Third, we integrate the learned normalizing flow with the DDPG algorithm. By design, a well-trained normalizing flow transforms the policy output into a valid action without requiring an optimization solver. Empirically, our approach results in significantly fewer constraint violations (up to an order of magnitude for several instances) and is multiple times faster on a variety of continuous control tasks.
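
The contrast the abstract draws between a projection layer and an invertible, differentiable mapping can be illustrated with a minimal one-dimensional sketch. Everything here is an assumption made for illustration: the feasible set is a simple box `[lo, hi]`, and a fixed scaled sigmoid stands in for the learned normalizing flow. The function names (`project_to_box`, `flow_forward`, and so on) are hypothetical and do not come from the paper, which targets general convex and non-convex constraints with a trained flow.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def project_to_box(x, lo, hi):
    """Projection baseline: clip the raw policy output into [lo, hi]."""
    return min(max(x, lo), hi)


def projection_grad(x, lo, hi):
    """Derivative of the clip with respect to x; zero outside the box."""
    return 1.0 if lo < x < hi else 0.0


def flow_forward(z, lo, hi):
    """Stand-in for a learned flow: map a latent z to an action inside the box."""
    return lo + (hi - lo) * _sigmoid(z)


def flow_inverse(a, lo, hi):
    """Inverse map: recover the latent z from an action strictly inside the box."""
    u = (a - lo) / (hi - lo)
    return math.log(u / (1.0 - u))


def flow_grad(z, lo, hi):
    """Derivative of flow_forward with respect to z; positive for moderate z."""
    s = _sigmoid(z)
    return (hi - lo) * s * (1.0 - s)


# Illustrative values: a raw policy output that violates the box constraint.
LO, HI, RAW = -1.0, 1.0, 3.0
clipped = project_to_box(RAW, LO, HI)   # valid, but the gradient is zero here
mapped = flow_forward(RAW, LO, HI)      # valid, and the gradient is nonzero
```

In this toy setting both routes return a valid action, but the clipped output carries no gradient back to the policy once the raw output leaves the box, whereas the invertible map keeps a nonzero derivative and needs no solver. A learned flow plays the role of `flow_forward` for feasible sets that have no closed-form bijection.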

action-constrained policy gradient, name change, normalizing flow, (7 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.77)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.60)

Supplementary Material: Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games

Neural Information Processing Systems, Nov-15-2025, 05:53:22 GMT

Figure 1 shows an example of the raw interface of the game "ztuu", where raw textual observations ... In this section, we show the first 15 interaction steps of two games: "zork1" and "ztuu".

Chosen action and reward
Action: west
Reward: 0 | Score: 0
===== Step 2 =====
===== 1.
Chosen action and reward
Action: south
Reward: 0 | Score: 0
===== Step 3 =====
===== 1.
Chosen action and reward
Action: south
Reward: 0 | Score: 0
===== Step 4 =====
===== 1.
Chosen action and reward
Action: west
Reward: 0 | Score: 0
===== Step 5 =====
===== 1.

baby rune, deep reinforcement learning, stacked hierarchical attention, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)

Competition is the key: A Game Theoretic Causal Discovery Approach

Roy, Amartya, Chakraborty, Souvik

arXiv.org Artificial Intelligence, Oct-24-2025

Causal discovery remains a central challenge in machine learning, yet existing methods face a fundamental gap: algorithms like GES and GraN-DAG achieve strong empirical performance but lack finite-sample guarantees, while theoretically principled approaches fail to scale. We close this gap by introducing a game-theoretic reinforcement learning framework for causal discovery, where a DDQN agent directly competes against a strong baseline (GES or GraN-DAG), always warm-starting from the opponent's solution. This design yields three provable guarantees: the learned graph is never worse than the opponent, warm-starting strictly accelerates convergence, and most importantly, with high probability the algorithm selects the true best candidate graph. To the best of our knowledge, our result makes a first-of-its-kind progress in explaining such finite-sample guarantees in causal discovery: on synthetic SEMs (30 nodes), the observed error probability decays with n, tightly matching theory. On real-world benchmarks including Sachs, Asia, Alarm, Child, Hepar2, Dream, and Andes, our method consistently improves upon GES and GraN-DAG while remaining theoretically safe. Remarkably, it scales to large graphs such as Hepar2 (70 nodes), Dream (100 nodes), and Andes (220 nodes). Together, these results establish a new class of RL-based causal discovery algorithms that are simultaneously provably consistent, sample-efficient, and practically scalable, marking a decisive step toward unifying empirical performance with rigorous finite-sample theory.
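
The first guarantee in the abstract above, that the learned graph is never worse than the opponent's, follows from warm-starting at the opponent's solution and accepting only improvements. The sketch below illustrates that property alone, under loud assumptions: a greedy single-edge local search is swapped in for the DDQN agent, and the score is a toy one (negative edge disagreement with a known ground-truth graph), which is not available in real causal discovery. All names (`refine_from_baseline`, `toy_score`, and so on) are hypothetical.

```python
def is_acyclic(edges, nodes):
    """Return True if the directed graph has no cycle (strip source nodes repeatedly)."""
    remaining = set(nodes)
    live = set(edges)
    while remaining:
        sources = [n for n in remaining if not any(v == n for (_, v) in live)]
        if not sources:
            return False  # every remaining node has an incoming edge: a cycle exists
        remaining.difference_update(sources)
        live = {(u, v) for (u, v) in live if u not in sources}
    return True


def single_edge_moves(edges, nodes):
    """Yield acyclic graphs that differ from `edges` by one added or removed edge."""
    for u in nodes:
        for v in nodes:
            if u == v:
                continue
            e = (u, v)
            cand = edges - {e} if e in edges else edges | {e}
            if is_acyclic(cand, nodes):
                yield frozenset(cand)


def refine_from_baseline(baseline, nodes, score, max_steps=50):
    """Warm-start at the baseline graph and accept only strict score improvements."""
    best = frozenset(baseline)
    best_score = score(best)
    for _ in range(max_steps):
        improved = False
        # Steepest ascent over the neighbours of the graph held at the start of this step.
        for cand in single_edge_moves(best, nodes):
            s = score(cand)
            if s > best_score:
                best, best_score, improved = cand, s, True
        if not improved:
            break
    return best, best_score


# Toy setup (assumed for illustration): three variables and a known true graph.
NODES = ["A", "B", "C"]
TRUE_GRAPH = frozenset({("A", "B"), ("B", "C")})
BASELINE = frozenset({("A", "B"), ("C", "B")})  # the "opponent" has one edge reversed


def toy_score(edges):
    """Higher is better: minus the number of edges that disagree with TRUE_GRAPH."""
    return -len(frozenset(edges) ^ TRUE_GRAPH)


refined, refined_score = refine_from_baseline(BASELINE, NODES, toy_score)
```

Because the search starts at the baseline and never accepts a lower score, `refined_score >= toy_score(BASELINE)` holds by construction, whatever the search procedure is. The abstract's other two guarantees (faster convergence and selection of the true best candidate with high probability) are not illustrated by this sketch.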

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2510.20106

Country: Asia > India (0.28)

Genre: