Planning & Scheduling
Council committees to be bypassed in bid to build more homes
A Conservative spokesman said the government's plans are "nothing more than a list of empty promises which will do nothing to ensure that Britain has the housing it needs where it needs it". Earlier this week, Prime Minister Sir Keir Starmer restated his pledge to build 1.5 million new homes by 2029 despite accepting it could be "a little too ambitious". The fast-track planning process would apply to housing proposals and associated infrastructure such as schools, if they had already been broadly agreed as part of local development plans where councils set out a strategy for land use in their areas. If the proposals "comply" with these plans, the government said, they could "bypass planning committees entirely to tackle chronic uncertainty, unacceptable delays and unnecessary waste of time and resources". Rayner said building more homes and infrastructure meant "unblocking the clogged-up planning system that serves as a chokehold on growth".
Britain's green energy pledge 'credible' if planning fixed, says system operator
A plan to create a clean electricity system by 2030 promised by Labour before the election is "immensely challenging" but still "credible" if ministers take urgent action to fix Britain's sluggish planning system, the energy system operator has said. Britain could become a net exporter of green electricity by the end of the decade at no extra costs to the energy system under the plans and bills may even fall if ministers make the right policy changes, according to the operator. The newly formed National Energy System Operator (Neso) put forward the conclusions as part of its official advice to new ministers on how to reach Labour election pledge to decarbonise the power system by 2030. Fintan Slye, the chief executive of Neso, said: "There's no doubt that the challenges ahead on the journey to delivering clean power are great. However, if the scale of those challenges is matched with the bold, sustained actions that are outlined in this report, the benefits delivered could be even greater."
Congratulations to the #ECAI2024 outstanding paper award winners
The 27th European Conference on Artificial Intelligence (ECAI-2024) took place from 19-24 October in Santiago de Compostela, Spain. The venue also played host to the 13th Conference on Prestigious Applications of Intelligent Systems (PAIS-2024). During the week, both conferences announced their outstanding paper award winners. The winning articles were chosen based on the reviews written during the paper selection process, nominations submitted by individual members of the programme committee, additional input solicited from outside experts, and the judgement of the programme committee chairs. Abstract: Proper losses such as cross-entropy incentivize classifiers to produce class probabilities that are well-calibrated on the training data.
Efficient Planning in Large MDPs with Weak Linear Function Approximation
Large-scale Markov decision processes (MDPs) require planning algorithms with runtime independent of the number of states of the MDP. We consider the planning problem in MDPs using linear value function approximation with only weak requirements: low approximation error for the optimal value function, and a small set of "core" states whose features span those of other states. In particular, we make no assumptions about the representability of policies or value functions of non-optimal policies. Our algorithm produces almost-optimal actions for any state using a generative oracle (simulator) for the MDP, while its computation time scales polynomially with the number of features, core states, and actions and the effective horizon.
Learning NP-Hard Multi-Agent Assignment Planning using GNN: Inference on a Random Graph and Provable Auction-Fitted Q-learning
This paper explores the possibility of near-optimally solving multi-agent, multi-task NP-hard planning problems with time-dependent rewards using a learning-based algorithm. In particular, we consider a class of robot/machine scheduling problems called the multi-robot reward collection problem (MRRC). Such MRRC problems well model ride-sharing, pickup-and-delivery, and a variety of related problems. In representing the MRRC problem as a sequential decision-making problem, we observe that each state can be represented as an extension of probabilistic graphical models (PGMs), which we refer to as random PGMs. We then develop a mean-field inference method for random PGMs.
Monte Carlo Tree Descent for Black-Box Optimization
The key to Black-Box Optimization is to efficiently search through input regions with potentially widely-varying numerical properties, to achieve low-regret descent and fast progress toward the optima. Monte Carlo Tree Search (MCTS) methods have recently been introduced to improve Bayesian optimization by computing better partitioning of the search space that balances exploration and exploitation. Extending this promising framework, we study how to further integrate sample-based descent for faster optimization. We design novel ways of expanding Monte Carlo search trees, with new descent methods at vertices that incorporate stochastic search and Gaussian Processes. We propose the corresponding rules for balancing progress and uncertainty, branch selection, tree expansion, and backpropagation.
POLY-HOOT: Monte-Carlo Planning in Continuous Space MDPs with Non-Asymptotic Analysis
Monte-Carlo planning, as exemplified by Monte-Carlo Tree Search (MCTS), has demonstrated remarkable performance in applications with finite spaces. In this paper, we consider Monte-Carlo planning in an environment with continuous state-action spaces, a much less understood problem with important applications in control and robotics. We introduce POLY-HOOT, an algorithm that augments MCTS with a continuous armed bandit strategy named Hierarchical Optimistic Optimization (HOO) (Bubeck et al., 2011). Specifically, we enhance HOO by using an appropriate polynomial, rather than logarithmic, bonus term in the upper confidence bounds. Such a polynomial bonus is motivated by its empirical successes in AlphaGo Zero (Silver et al., 2017b), as well as its significant role in achieving theoretical guarantees of finite space MCTS (Shah et al., 2019). We investigate, for the first time, the regret of the enhanced HOO algorithm in non-stationary bandit problems.
Learning Space Partitions for Path Planning
Path planning, the problem of efficiently discovering high-reward trajectories, often requires optimizing a high-dimensional and multimodal reward function. Popular approaches like CEM and CMA-ES greedily focus on promising regions of the search space and may get trapped in local maxima. DOO and VOOT balance exploration and exploitation, but use space partitioning strategies independent of the reward function to be optimized. Recently, LaMCTS empirically learns to partition the search space in a reward-sensitive manner for black-box optimization. In this paper, we develop a novel formal regret analysis for when and why such an adaptive region partitioning scheme works.
Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning
We study the sampling-based planning problem in Markov decision processes (MDPs) that we can access only through a generative model, usually referred to as Monte-Carlo planning. Our objective is to return a good estimate of the optimal value function at any state while minimizing the number of calls to the generative model, i.e. the sample complexity. We propose a new algorithm, TrailBlazer, able to handle MDPs with a finite or an infinite number of transitions from state-action to next states. TrailBlazer is an adaptive algorithm that exploits possible structures of the MDP by exploring only a subset of states reachable by following near-optimal policies. We provide bounds on its sample complexity that depend on a measure of the quantity of near-optimal states.
Monte-Carlo Tree Search for Constrained POMDPs
Monte-Carlo Tree Search (MCTS) has been successfully applied to very large POMDPs, a standard model for stochastic sequential decision-making problems. However, many real-world problems inherently have multiple goals, where multi-objective formulations are more natural. The constrained POMDP (CPOMDP) is such a model that maximizes the reward while constraining the cost, extending the standard POMDP model. To date, solution methods for CPOMDPs assume an explicit model of the environment, and thus are hardly applicable to large-scale real-world problems. In this paper, we present CC-POMCP (Cost-Constrained POMCP), an online MCTS algorithm for large CPOMDPs that leverages the optimization of LP-induced parameters and only requires a black-box simulator of the environment. In the experiments, we demonstrate that CC-POMCP converges to the optimal stochastic action selection in CPOMDP and pushes the state-of-the-art by being able to scale to very large problems.