microPhantom: Playing microRTS under uncertainty and chaos
This competition paper presents microPhantom, a bot playing microRTS and participating in the 2020 microRTS AI competition. microPhantom is based on our previous bot POAdaptive, which won the partially observable track of the 2018 and 2019 microRTS AI competitions. In this paper, we focus on decision-making under uncertainty by tackling the Unit Production Problem with a method based on a combination of Constraint Programming and decision theory. We show that using our method to decide which units to train significantly improves the win rate against the second-best microRTS bot from the partially observable track. We also show that our method is resilient in chaotic environments, with only a very small loss of efficiency. To allow replicability and to facilitate further research, the source code of microPhantom is available, as well as the Constraint Programming toolkit it uses.
Federated Survival Analysis with Discrete-Time Cox Models
Andreux, Mathieu, Manoel, Andre, Menuet, Romuald, Saillard, Charlie, Simpson, Chloé
Building machine learning models from decentralized datasets located in different centers with federated learning (FL) is a promising approach to circumvent local data scarcity while preserving privacy. However, the prominent Cox proportional hazards (PH) model, used for survival analysis, does not fit the FL framework, as its loss function is non-separable with respect to the samples. The naïve method to bypass this non-separability consists of calculating the losses per center and minimizing their sum as an approximation of the true loss. We show that the resulting model may suffer from a substantial performance loss in some adverse settings. Instead, we leverage the discrete-time extension of the Cox PH model to formulate survival analysis as a classification problem with a separable loss function. Using this approach, we train survival models using standard FL techniques on synthetic data, as well as real-world datasets from The Cancer Genome Atlas (TCGA), showing similar performance to a Cox PH model trained on aggregated data. Compared to previous works, the proposed method is more communication-efficient, more generic, and more amenable to using privacy-preserving techniques.
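The separability argument can be illustrated with a small numerical sketch. It is not the authors' implementation: the logistic hazard used for the discrete-time model is one common formulation and is assumed here, and the function names, toy data, and baseline values are hypothetical. The sketch compares a pooled loss with the sum of per-center losses for both models.

```python
import math

def cox_partial_nll(times, events, scores):
    """Negative log partial likelihood of a Cox PH model (no tie handling)."""
    nll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i:
            # The risk set couples sample i with every sample still at risk,
            # which is why the loss is not a sum of per-sample terms.
            risk = sum(math.exp(s) for t, s in zip(times, scores) if t >= t_i)
            nll -= scores[i] - math.log(risk)
    return nll

def discrete_nll(bins, events, scores, baseline):
    """Discrete-time NLL with an (assumed) logistic hazard per time bin."""
    nll = 0.0
    for k_i, d_i, s in zip(bins, events, scores):
        # One binary classification term per bin the sample survives into.
        for k in range(k_i + 1):
            h = 1.0 / (1.0 + math.exp(-(baseline[k] + s)))
            event_here = bool(d_i) and k == k_i
            nll -= math.log(h if event_here else 1.0 - h)
    return nll

# Hypothetical toy data: (time or bin index, event indicator, linear score).
center_a = ([0, 2], [1, 1], [0.2, -0.1])
center_b = ([1, 3], [1, 0], [0.5, 0.0])
pooled = tuple(a + b for a, b in zip(center_a, center_b))
baseline = [-1.0, -0.5, 0.0, 0.5]  # hypothetical per-bin baseline logits

cox_gap = cox_partial_nll(*pooled) - (
    cox_partial_nll(*center_a) + cox_partial_nll(*center_b))
disc_gap = discrete_nll(*pooled, baseline) - (
    discrete_nll(*center_a, baseline) + discrete_nll(*center_b, baseline))
```

On this toy data `cox_gap` is nonzero, because pooling enlarges the risk sets, whereas `disc_gap` is zero up to floating-point rounding, since the discrete-time loss is a plain sum over samples.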
A Survey of Constrained Gaussian Process Regression: Approaches and Implementation Challenges
Swiler, Laura, Gulian, Mamikon, Frankel, Ari, Safta, Cosmin, Jakeman, John
Gaussian process regression is a popular Bayesian framework for surrogate modeling of expensive data sources. As part of a broader effort in scientific machine learning, many recent works have incorporated physical constraints or other a priori information within Gaussian process regression to supplement limited data and regularize the behavior of the model. We provide an overview and survey of several classes of Gaussian process constraints, including positivity or bound constraints, monotonicity and convexity constraints, differential equation constraints provided by linear PDEs, and boundary condition constraints. We compare the strategies behind each approach as well as the differences in implementation, concluding with a discussion of the computational challenges introduced by constraints.
Efficient Path Algorithms for Clustered Lasso and OSCAR
Takahashi, Atsumori, Nomura, Shunichi
In high-dimensional regression, clustering features by their effects on outcomes is often as important as feature selection. For that purpose, clustered Lasso and the octagonal shrinkage and clustering algorithm for regression (OSCAR) are used to form feature groups automatically by pairwise $L_1$ norm and pairwise $L_\infty$ norm, respectively. This paper proposes efficient path algorithms for clustered Lasso and OSCAR to construct solution paths with respect to their regularization parameters. Although exhaustive pairwise regularization involves a large number of terms, the computational costs are reduced by exploiting the symmetry of those terms. Simple equivalent conditions for checking the subgradient equations in each feature group are derived using graph theory. The proposed algorithms are shown to be more efficient than existing algorithms in numerical experiments.
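As a rough illustration of the two pairwise penalties, the sketch below evaluates them first term by term and then through a sorting identity that avoids enumerating all pairs. It assumes the usual forms of the penalties (absolute differences for clustered Lasso, pairwise maxima of absolute values for OSCAR), which the abstract does not spell out. It is not the proposed path algorithm, and all function names are hypothetical.

```python
from itertools import combinations

def pairwise_l1(beta):
    """Sum over pairs j < k of |beta_j - beta_k| (clustered Lasso term)."""
    return sum(abs(a - b) for a, b in combinations(beta, 2))

def pairwise_linf(beta):
    """Sum over pairs j < k of max(|beta_j|, |beta_k|) (OSCAR term)."""
    return sum(max(abs(a), abs(b)) for a, b in combinations(beta, 2))

def pairwise_l1_sorted(beta):
    """Same value as pairwise_l1, computed in O(p log p) after sorting.

    In ascending order, the i-th value (0-indexed) is the larger element
    of i pairs and the smaller element of p - 1 - i pairs.
    """
    p = len(beta)
    return sum((2 * i - p + 1) * b for i, b in enumerate(sorted(beta)))

def pairwise_linf_sorted(beta):
    """Same value as pairwise_linf: the i-th smallest |beta| is the max of i pairs."""
    return sum(i * a for i, a in enumerate(sorted(abs(b) for b in beta)))

beta = [1.0, -2.0, 0.5]
print(pairwise_l1(beta), pairwise_l1_sorted(beta))      # 6.0 6.0
print(pairwise_linf(beta), pairwise_linf_sorted(beta))  # 5.0 5.0
```

The sorted versions show one way the symmetry of the pairwise terms can collapse a quadratic number of terms into a weighted sum over ordered coefficients.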
APQ: Joint Search for Network Architecture, Pruning and Quantization Policy
Wang, Tianzhe, Wang, Kuan, Cai, Han, Lin, Ji, Liu, Zhijian, Han, Song
We present APQ for efficient deep learning inference on resource-constrained hardware. Unlike previous methods that separately search the neural architecture, pruning policy, and quantization policy, we optimize them in a joint manner. To deal with the larger design space it brings, a promising approach is to train a quantization-aware accuracy predictor to quickly get the accuracy of the quantized model and feed it to the search engine to select the best fit. However, training this quantization-aware accuracy predictor requires collecting a large number of quantized
The Limit of the Batch Size
You, Yang, Wang, Yuhui, Zhang, Huan, Zhang, Zhao, Demmel, James, Hsieh, Cho-Jui
Large-batch training is an efficient approach for current distributed deep learning systems. It has enabled researchers to reduce the ImageNet/ResNet-50 training time from 29 hours to around 1 minute. In this paper, we focus on studying the limit of the batch size. We believe this may provide guidance to AI supercomputer and algorithm designers. We provide detailed numerical optimization instructions for step-by-step comparison. Moreover, it is important to understand the generalization and optimization performance of huge-batch training. Hoffer et al. introduced the "ultra-slow diffusion" theory for large-batch training. However, our experiments show results that contradict the conclusion of Hoffer et al. We provide comprehensive experimental results and detailed analysis to study the limitations of batch size scaling and the "ultra-slow diffusion" theory. For the first time, we scale the batch size on ImageNet to at least an order of magnitude larger than all previous work, and provide detailed studies on the performance of many state-of-the-art optimization schemes under this setting. We propose an optimization recipe that is able to improve the top-1 test accuracy by 18% compared to the baseline.
A VIKOR and TOPSIS focused reanalysis of the MADM methods based on logarithmic normalization
Zolfani, Sarfaraz, Yazdani, Morteza, Pamucar, Dragan, Zaraté, Pascale
Decision-makers and policy-makers in multi-criteria decision-making analysis adopt various strategies to analyze outcomes and ultimately make effective and more precise decisions. Among those strategies, how to modify the normalization process in a multiple-criteria decision-making algorithm remains an open question because many competing normalization tools exist. Normalization is a basic operation in defining and solving a MADM problem and a MADM model, and it is the first, and a necessary, step in applying a MADM method. The choice of normalization method has a direct effect on the results. One of the latest normalization methods introduced is the Logarithmic Normalization (LN) method. This new method has a distinctive advantage: the sum of the normalized values of the criteria always equals 1. This normalization method has not previously been applied in any MADM method. This research study focuses on the analysis of the classical MADM methods based on logarithmic normalization. VIKOR and TOPSIS, two well-known MADM methods, were selected for this reanalysis. Two numerical examples were worked through with both methods, using both the classical normalization and the novel LN-based one. The results indicate that there are differences between the two approaches. Finally, a sensitivity analysis is also designed to illustrate the reliability of the final results.
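The sum-to-one property can be checked with a short sketch. The abstract does not give the LN formula, so the code assumes a commonly cited form: for a benefit criterion, each value's logarithm is divided by the logarithm of the product of all values of that criterion; for a cost criterion, the complement is divided by m - 1, where m is the number of alternatives. The function name and the example numbers are hypothetical.

```python
import math

def log_normalize(column, benefit=True):
    """Logarithmic normalization of one criterion column (assumed form).

    Requires at least two alternatives, all values positive, and a
    product of values different from 1 so the denominator is nonzero.
    """
    m = len(column)
    log_prod = sum(math.log(x) for x in column)  # ln of the column product
    shares = [math.log(x) / log_prod for x in column]
    if benefit:
        return shares
    # Cost criterion: smaller raw values receive larger normalized values.
    return [(1.0 - s) / (m - 1) for s in shares]

prices = [250.0, 400.0, 320.0]   # hypothetical cost criterion
ratings = [7.0, 9.0, 8.0]        # hypothetical benefit criterion

cost_norm = log_normalize(prices, benefit=False)
benefit_norm = log_normalize(ratings, benefit=True)
print(sum(cost_norm), sum(benefit_norm))
```

Under this assumed form, both printed sums equal 1 up to floating-point rounding, whichever criterion type is used.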
Explaining reputation assessments
Nunes, Ingrid, Taylor, Phillip, Barakat, Lina, Griffiths, Nathan, Miles, Simon
Reputation is crucial to enabling human or software agents to select among alternative providers. Although several effective reputation assessment methods exist, they typically distil reputation into a numerical representation, with no accompanying explanation of the rationale behind the assessment. Such explanations would allow users or clients to make a richer assessment of providers, and tailor selection according to their preferences and current context. In this paper, we propose an approach to explain the rationale behind assessments from quantitative reputation models, by generating arguments that are combined to form explanations. Our approach adapts, extends and combines existing approaches for explaining decisions made using multi-attribute decision models in the context of reputation. We present example argument templates, and describe how to select their parameters using explanation algorithms. Our proposal was evaluated by means of a user study, which followed an existing protocol. Our results give evidence that although explanations present a subset of the information of trust scores, they are sufficient to equally evaluate providers recommended based on their trust score. Moreover, when explanation arguments reveal implicit model information, they are less persuasive than scores.
A systematic review and taxonomy of explanations in decision support and recommender systems
Nunes, Ingrid, Jannach, Dietmar
With the recent advances in the field of artificial intelligence, an increasing number of decision-making tasks are delegated to software systems. A key requirement for the success and adoption of such systems is that users must trust system choices or even fully automated decisions. To achieve this, explanation facilities have been widely investigated as a means of establishing trust in these systems since the early years of expert systems. With today's increasingly sophisticated machine learning algorithms, new challenges in the context of explanations, accountability, and trust towards such systems constantly arise. In this work, we systematically review the literature on explanations in advice-giving systems. This is a family of systems that includes recommender systems, which is one of the most successful classes of advice-giving software in practice. We investigate the purposes of explanations as well as how they are generated, presented to users, and evaluated. As a result, we derive a novel comprehensive taxonomy of aspects to be considered when designing explanation facilities for current and future decision support systems. The taxonomy includes a variety of different facets, such as explanation objective, responsiveness, content and presentation. Moreover, we identified several challenges that remain unaddressed so far, for example related to fine-grained issues associated with the presentation of explanations and how explanation facilities are evaluated.