Gerevini, Alfonso E.
Competitions in AI -- Robustly Ranking Solvers Using Statistical Resampling
Fawcett, Chris, Vallati, Mauro, Hoos, Holger H., Gerevini, Alfonso E.
Solver competitions play a prominent role in assessing and advancing the state of the art for solving many problems in AI and beyond. Notably, in many areas of AI, competitions have had substantial impact in guiding research and applications for many years, and for a solver to be ranked highly in a competition carries considerable weight. But to what extent can we expect competition results to generalise to sets of problem instances different from those used in a particular competition? This is the question we investigate here, using statistical resampling techniques. We show that the rankings resulting from the standard interpretation of competition results can be very sensitive to even minor changes in the benchmark instance set used as the basis for assessment and therefore cannot be expected to carry over to other samples from the same underlying instance distribution. To address this problem, we introduce a novel approach to statistically meaningful analysis of competition results based on resampling performance data. Our approach produces confidence intervals of competition scores as well as statistically robust solver rankings with bounded error. Applied to recent SAT, AI planning and computer vision competitions, our analysis reveals frequent statistical ties in solver performance as well as some inversions of ranks compared to the official results based on simple scoring.
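As a rough illustration of the resampling idea in this abstract, the sketch below bootstraps the benchmark instance set and derives percentile confidence intervals for each solver's total score. It is a generic percentile bootstrap, not necessarily the procedure used in the paper; the function name `bootstrap_scores`, the per-instance score representation, and the parameters `n_resamples` and `alpha` are assumptions made for this example.

```python
import random


def bootstrap_scores(results, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap intervals for total competition scores.

    results: dict mapping solver name -> list of per-instance scores,
             with all lists in the same instance order (assumed format).
    Returns a dict mapping solver name -> (lower, upper) interval bounds.
    """
    rng = random.Random(seed)
    solvers = list(results)
    n = len(next(iter(results.values())))
    totals = {s: [] for s in solvers}

    for _ in range(n_resamples):
        # Resample instances with replacement; every solver is scored
        # on the same resampled instance set.
        idx = [rng.randrange(n) for _ in range(n)]
        for s in solvers:
            totals[s].append(sum(results[s][i] for i in idx))

    intervals = {}
    for s in solvers:
        ordered = sorted(totals[s])
        lo = ordered[int((alpha / 2) * n_resamples)]
        hi = ordered[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
        intervals[s] = (lo, hi)
    return intervals
```

Under this sketch, two solvers whose intervals overlap could be treated as statistically tied, which is one simple way to read the "frequent statistical ties" mentioned above; the paper's own ranking criterion may differ.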
Novelty Messages Filtering for Multi Agent Privacy-preserving Planning
Gerevini, Alfonso E., Lipovetzky, Nir, Peli, Nico, Percassi, Francesco, Saetti, Alessandro, Serina, Ivan
In multi-agent planning, agents jointly compute a plan that achieves mutual goals, keeping certain information private to the individual agents. Agents' coordination is achieved through the transmission of messages. These messages can be a source of privacy leakage as they can permit a malicious agent to collect information about other agents' actions and search states. In this paper, we investigate the usage of novelty techniques in the context of (decentralised) multi-agent privacy-preserving planning, addressing the challenges related to the agents' privacy and performance. In particular, we show that the use of novelty-based techniques can significantly reduce the number of messages transmitted among agents, better preserving their privacy and improving their performance. An experimental study analyses the effectiveness of our techniques and compares them with the state of the art. Finally, we evaluate the robustness of our approach, considering different delays in the transmission of messages as they would occur in overloaded networks, due, for example, to massive attacks or critical situations.
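To make the idea of novelty-based message filtering concrete, here is a minimal sketch that forwards a search state only if it contains at least one fact not seen in any earlier state (width-1 novelty). This is a simplification for illustration and should not be read as the paper's actual filtering scheme; the class `NoveltyFilter`, the function `filter_messages`, and the representation of states as sets of fact strings are all hypothetical.

```python
class NoveltyFilter:
    """Tracks facts seen so far and flags states that add a new one."""

    def __init__(self):
        self.seen = set()

    def is_novel(self, state):
        # A state is novel (width 1) if it contains an unseen fact.
        new_facts = set(state) - self.seen
        self.seen |= new_facts
        return bool(new_facts)


def filter_messages(states):
    """Return only the states that would be sent under the novelty test."""
    novelty_filter = NoveltyFilter()
    return [s for s in states if novelty_filter.is_novel(s)]
```

For example, given the states `{a, b}`, `{a}`, `{b, c}`, `{c, a}` in that order, only the first and third would be sent, since the other two contain no fact that has not already appeared.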
Best-First Width Search for Multi Agent Privacy-preserving Planning
Gerevini, Alfonso E., Lipovetzky, Nir, Percassi, Francesco, Saetti, Alessandro, Serina, Ivan
In multi-agent planning, preserving the agents' privacy has become an increasingly popular research topic. To preserve their privacy, agents jointly compute a plan that achieves mutual goals while keeping certain information private to the individual agents. Unfortunately, this can severely restrict the accuracy of the heuristic functions used while searching for solutions. It has been recently shown that, for centralized planning, the performance of goal-oriented search can be improved by combining it with width-based search. The combination of these techniques has been called best-first width search. In this paper, we investigate the usage of best-first width search in the context of (decentralised) multi-agent privacy-preserving planning, addressing the challenges related to the agents' privacy and performance. In particular, we show that best-first width search is a very effective approach over several benchmark domains, even when the search is driven by heuristics that roughly estimate the distance from goal states, computed without using the private information of other agents. An experimental study analyses the effectiveness of our techniques and compares them with the state of the art.
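The combination of goal-oriented and width-based search described in this abstract can be sketched, for a single agent, as a best-first search whose priority is the pair (novelty, heuristic value). The code below is a simplified, centralized illustration under stated assumptions: novelty is restricted to 1 or 2 and is computed among states sharing the same heuristic value, states are frozensets of facts, and the names `bfws`, `is_goal`, `successors`, and `h` are chosen for this example. It does not model privacy or the multi-agent setting studied in the paper.

```python
import heapq
from itertools import count


def bfws(start, is_goal, successors, h):
    """Best-first search ordered by (novelty, heuristic value).

    start: frozenset of facts; successors(state) yields frozensets;
    h(state) returns a number. Returns a list of states or None.
    """
    seen_facts = {}  # heuristic value -> facts seen in states with that value

    def novelty(state, hv):
        facts = seen_facts.setdefault(hv, set())
        new = set(state) - facts
        facts |= new
        return 1 if new else 2  # 1: brings a new fact for this h-value

    tie = count()  # tie-breaker so states themselves are never compared
    hv = h(start)
    frontier = [(novelty(start, hv), hv, next(tie), start, [start])]
    closed = {start}

    while frontier:
        _, _, _, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        for nxt in successors(state):
            if nxt in closed:
                continue
            closed.add(nxt)
            hv = h(nxt)
            heapq.heappush(
                frontier, (novelty(nxt, hv), hv, next(tie), nxt, path + [nxt])
            )
    return None
```

Because novelty comes first in the priority tuple, states that bring new facts are expanded before non-novel ones even when the heuristic is only a rough estimate, which is the intuition behind using this scheme with weakly informed heuristics.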