AITopics

2410.09924

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > Oregon (0.04)
Asia > South Korea > Daegu > Daegu (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)

arXiv.org Artificial IntelligenceOct-12-2024

SAPIENT: Mastering Multi-turn Conversational Recommendation with Strategic Planning and Monte Carlo Tree Search

Du, Hanwen, Peng, Bo, Ning, Xia

Conversational Recommender Systems (CRS) proactively engage users in interactive dialogues to elicit user preferences and provide personalized recommendations. Existing methods train Reinforcement Learning (RL)-based agent with greedy action selection or sampling strategy, and may suffer from suboptimal conversational planning. To address this, we present a novel Monte Carlo Tree Search (MCTS)-based CRS framework SAPIENT. SAPIENT consists of a conversational agent (S-agent) and a conversational planner (S-planner). S-planner builds a conversational search tree with MCTS based on the initial actions proposed by S-agent to find conversation plans. The best conversation plans from S-planner are used to guide the training of S-agent, creating a self-training loop where S-agent can iteratively improve its capability for conversational planning. Furthermore, we propose an efficient variant SAPIENT-e for trade-off between training efficiency and performance. Extensive experiments on four benchmark datasets validate the effectiveness of our approach, showing that SAPIENT outperforms the state-of-the-art baselines.

artificial intelligence, planning & scheduling, s-agent, (19 more...)

2410.0958

Country:

North America > United States > Ohio (0.04)
North America > United States > New York (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)

Neural Information Processing SystemsOct-11-2024, 13:35:09 GMT

Efficient Planning in Large MDPs with Weak Linear Function Approximation

Large-scale Markov decision processes (MDPs) require planning algorithms with runtime independent of the number of states of the MDP. We consider the planning problem in MDPs using linear value function approximation with only weak requirements: low approximation error for the optimal value function, and a small set of "core" states whose features span those of other states. In particular, we make no assumptions about the representability of policies or value functions of non-optimal policies. Our algorithm produces almost-optimal actions for any state using a generative oracle (simulator) for the MDP, while its computation time scales polynomially with the number of features, core states, and actions and the effective horizon.

efficient planning, mdp, weak linear function approximation, (1 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.68)

Neural Information Processing SystemsOct-11-2024, 01:42:39 GMT

Monte Carlo Tree Descent for Black-Box Optimization

The key to Black-Box Optimization is to efficiently search through input regions with potentially widely-varying numerical properties, to achieve low-regret descent and fast progress toward the optima. Monte Carlo Tree Search (MCTS) methods have recently been introduced to improve Bayesian optimization by computing better partitioning of the search space that balances exploration and exploitation. Extending this promising framework, we study how to further integrate sample-based descent for faster optimization. We design novel ways of expanding Monte Carlo search trees, with new descent methods at vertices that incorporate stochastic search and Gaussian Processes. We propose the corresponding rules for balancing progress and uncertainty, branch selection, tree expansion, and backpropagation.

black-box optimization, gaussian process, monte carlo tree descent

Industry: Transportation > Air (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.65)

Enhanced Robot Planning and Perception through Environment Prediction

Sharma, Vishnu Dutt

Mobile robots rely on maps to navigate through an environment. In the absence of any map, the robots must build the map online from partial observations as they move in the environment. Traditional methods build a map using only direct observations. In contrast, humans identify patterns in the observed environment and make informed guesses about what to expect ahead. Modeling these patterns explicitly is difficult due to the complexity of the environments. However, these complex models can be approximated well using learning-based methods in conjunction with large training data. By extracting patterns, robots can use direct observations and predictions of what lies ahead to better navigate an unknown environment. In this dissertation, we present several learning-based methods to equip mobile robots with prediction capabilities for efficient and safer operation. In the first part of the dissertation, we learn to predict using geometrical and structural patterns in the environment. Partially observed maps provide invaluable cues for accurately predicting the unobserved areas. We first demonstrate the capability of general learning-based approaches to model these patterns for a variety of overhead map modalities. Then we employ task-specific learning for faster navigation in indoor environments by predicting 2D occupancy in the nearby regions. This idea is further extended to 3D point cloud representation for object reconstruction. Predicting the shape of the full object from only partial views, our approach paves the way for efficient next-best-view planning. In the second part of the dissertation, we learn to predict using spatiotemporal patterns in the environment. We focus on dynamic tasks such as target tracking and coverage where we seek decentralized coordination between robots. We first show how graph neural networks can be used for more scalable and faster inference.

large language model, machine learning, reinforcement learning, (22 more...)

2410.0856

Country:

Europe (1.00)
Asia (0.67)
North America > United States > Minnesota (0.27)

Genre:

Workflow (1.00)
Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Leisure & Entertainment (1.00)
Information Technology (1.00)
Health & Medicine (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(6 more...)

Wang, Duo, Araldo, Andrea, Yacoubi, Mounim El

Online design of dynamic networks

Designing a network (e.g., a telecommunication or transport network) is mainly done offline, in a planning phase, prior to the operation of the network. On the other hand, a massive effort has been devoted to characterizing dynamic networks, i.e., those that evolve over time. The novelty of this paper is that we introduce a method for the online design of dynamic networks. The need to do so emerges when a network needs to operate in a dynamic and stochastic environment. In this case, one may wish to build a network over time, on the fly, in order to react to the changes of the environment and to keep certain performance targets. We tackle this online design problem with a rolling horizon optimization based on Monte Carlo Tree Search. The potential of online network design is showcased for the design of a futuristic dynamic public transport network, where bus lines are constructed on the fly to better adapt to a stochastic user demand. In such a scenario, we compare our results with state-of-the-art dynamic vehicle routing problem (VRP) resolution methods, simulating requests from a New York City taxi dataset. Differently from classic VRP methods, that extend vehicle trajectories in isolation, our method enables us to build a structured network of line buses, where complex user journeys are possible, thus increasing system performance.

data mining, machine learning, simulation, (20 more...)

2410.08875

Country:

North America > United States > New York (0.24)
Europe > Slovenia > Upper Carniola > Municipality of Bled > Bled (0.04)

Genre: Research Report (1.00)

Industry:

Transportation > Passenger (1.00)
Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.66)

Refinements on the Complementary PDB Construction Mechanism

Zou, Yufeng

Pattern database (PDB) is one of the most popular automated heuristic generation techniques. A PDB maps states in a planning task to abstract states by considering a subset of variables and stores their optimal costs to the abstract goal in a look up table. As the result of the progress made on symbolic search over recent years, symbolic-PDB-based planners achieved impressive results in the International Planning Competition (IPC) 2018. Among them, Complementary 1 (CPC1) tied as the second best planners and the best non-portfolio planners in the cost optimal track, only 2 tasks behind the winner. It uses a combination of different pattern generation algorithms to construct PDBs that are complementary to existing ones. As shown in the post contest experiments, there is room for improvement. In this paper, we would like to present our work on refining the PDB construction mechanism of CPC1. By testing on IPC 2018 benchmarks, the results show that a significant improvement is made on our modified planner over the original version.

algorithm, artificial intelligence, planning & scheduling, (16 more...)

2410.09297

Country: Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.69)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.69)

iFANnpp: Nuclear Power Plant Digital Twin for Robots and Autonomous Intelligence

Do, Youndo, Zebrowitz, Marc, Stahl, Jackson, Zhang, Fan

Robotics has gained significant attention due to its autonomy and ability to automate in the nuclear industry. However, the increasing complexity of robots has led to a growing demand for advanced simulation and control methods to predict robot behavior and optimize plant performance. Most existing digital twins only address parts of systems and do not offer an overall design of nuclear power plants. Furthermore, they are often designed for specific algorithms or tasks, making them unsuitable for broader research applications or other potential projects. In response, we propose a comprehensive nuclear power plant designed to enhance real-time monitoring, operational efficiency, and predictive maintenance. We selected to model a full-scope nuclear power plant in Unreal Engine 5 to incorporate the complexities and various phenomena. The high-resolution simulation environment is integrated with a General Pressurized Water Reactor Simulator, a high-fidelity physics-driven software, to create a realistic flow of nuclear power plant and a real-time updating virtual environment. Furthermore, the virtual environment provides various features and a Python bridge for researchers to test custom algorithms and frameworks easily. The digital twin's performance is presented, and several research ideas - such as multi-robot task scheduling and robot navigation in the radiation area - using implemented features are presented.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2410.09213

Country: North America > United States > Georgia > Fulton County > Atlanta (0.04)

Genre:

Research Report (0.64)
Overview (0.46)

Industry: Energy > Power Industry > Utilities > Nuclear (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Locomotion (0.68)
Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.48)
(4 more...)

Neural Information Processing SystemsOct-10-2024, 15:26:51 GMT

Making the most of your day: online learning for optimal allocation of time

We study online learning for optimal allocation when the resource to be allocated is time. An agent receives task proposals sequentially according to a Poisson process and can either accept or reject a proposed task. If she accepts the proposal, she is busy for the duration of the task and obtains a reward that depends on the task duration. If she rejects it, she remains on hold until a new task proposal arrives. We study the regret incurred by the agent first when she knows her reward function but does not know the distribution of the task duration, and then when she does not know her reward function, either.

online, optimal allocation, reward function, (1 more...)

Industry: Education > Educational Setting > Online (0.66)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.66)

Neural Information Processing SystemsOct-10-2024, 09:15:38 GMT

Maximum Entropy Monte-Carlo Planning

We develop a new algorithm for online planning in large scale sequential decision problems that improves upon the worst case efficiency of UCT. The idea is to augment Monte-Carlo Tree Search (MCTS) with maximum entropy policy optimization, evaluating each search node by softmax values back-propagated from simulation. To establish the effectiveness of this approach, we first investigate the single-step decision problem, stochastic softmax bandits, and show that softmax values can be estimated at an optimal convergence rate in terms of mean squared error. We then extend this approach to general sequential decision making by developing a general MCTS algorithm, Maximum Entropy for Tree Search (MENTS). We prove that the probability of MENTS failing to identify the best decision at the root decays exponentially, which fundamentally improves the polynomial convergence rate of UCT.

convergence rate, decision problem, maximum entropy monte-carlo planning, (5 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.91)