Osogami, Takayuki
Mechanism design with multi-armed bandit
Osogami, Takayuki, Kinoshita, Hirota, Wasserkrug, Segev
A popular approach to automated mechanism design is to formulate a linear program (LP) whose solution gives a mechanism with desired properties. We analytically derive a class of optimal solutions for such an LP that gives mechanisms achieving the standard properties of efficiency, incentive compatibility, strong budget balance (SBB), and individual rationality (IR), where SBB and IR are satisfied in expectation. Notably, our solutions are represented by an exponentially smaller number of essential variables than the original variables of the LP. Our solutions, however, involve a term whose exact evaluation requires solving a certain optimization problem exponentially many times as the number of players, $N$, grows. We therefore evaluate this term by modeling it as the problem of estimating the mean reward of the best arm in a multi-armed bandit (MAB), propose a Probably and Approximately Correct estimator, and prove its asymptotic optimality by establishing a lower bound on its sample complexity. This MAB approach reduces the number of times the optimization problem is solved from exponential in $N$ to $O(N\,\log N)$. Numerical experiments show that the proposed approach finds mechanisms that are guaranteed to achieve the desired properties with high probability for environments with up to 128 players, substantially improving upon prior work.
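The abstract does not spell out the estimator, so the sketch below only illustrates the generic sub-problem it refers to: estimating the mean reward of the best arm to a given accuracy with a given confidence. It is a textbook successive-elimination routine with Hoeffding-style confidence radii, not the estimator proposed in the paper; the function name and the Bernoulli toy arms are illustrative assumptions.

```python
import math
import random

def estimate_best_arm_mean(pull, n_arms, eps=0.05, delta=0.05):
    """Generic PAC-style sketch: estimate the highest mean reward among
    `n_arms` arms, each sampled via pull(i) returning rewards in [0, 1]."""
    active = list(range(n_arms))
    means = [0.0] * n_arms
    counts = [0] * n_arms
    round_ = 1
    while True:
        # Sample every surviving arm once per round and update running means.
        for i in active:
            r = pull(i)
            counts[i] += 1
            means[i] += (r - means[i]) / counts[i]
        # Hoeffding-style confidence radius shared by all surviving arms.
        rad = math.sqrt(math.log(4 * n_arms * round_ ** 2 / delta) / (2 * round_))
        best = max(means[i] for i in active)
        # Eliminate arms that are provably not the best.
        active = [i for i in active if means[i] + rad >= best - rad]
        if 2 * rad <= eps or len(active) == 1:
            return max(means[i] for i in active)
        round_ += 1

# Toy usage: arm i yields Bernoulli rewards with mean (i + 1) / 10.
print(estimate_best_arm_mean(lambda i: float(random.random() < (i + 1) / 10), 8))
```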
Regression with Sensor Data Containing Incomplete Observations
Katsuki, Takayuki, Osogami, Takayuki
This paper addresses a regression problem in which the output labels are the results of sensing the magnitude of a phenomenon. A low label value can mean either that the actual magnitude of the phenomenon was low or that the sensor made an incomplete observation. Because labels may be low due to incomplete observations even when the actual magnitude was high, both the labels and the resulting learning are biased toward lower values. Moreover, because an incomplete observation carries no tag indicating its incompleteness, we can neither eliminate such observations nor impute their values. To address this issue, we propose a learning algorithm that explicitly models incomplete observations as corruption by an asymmetric noise that is always negative. We show that our algorithm is unbiased, as if it had been trained on uncorrupted data containing no incomplete observations. We demonstrate the advantages of our algorithm through numerical experiments.
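The following toy simulation only illustrates the data-generating setting described above, not the proposed unbiased algorithm: a fraction of sensor readings are corrupted by always-negative noise, and a naive least-squares fit on the observed labels is shifted downward relative to a fit on the clean labels (here mostly in the intercept, since the corruption in this toy is independent of the inputs). The corruption rate and noise scale are arbitrary assumptions.

```python
import numpy as np
from numpy.linalg import lstsq

rng = np.random.default_rng(0)

n, d = 2000, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y_clean = X @ w_true + rng.normal(scale=0.1, size=n)

# 30% of readings are "incomplete": an always-negative noise is added.
incomplete = rng.random(n) < 0.3
y_obs = y_clean - incomplete * rng.exponential(scale=2.0, size=n)

w_naive, *_ = lstsq(np.c_[X, np.ones(n)], y_obs, rcond=None)
w_clean, *_ = lstsq(np.c_[X, np.ones(n)], y_clean, rcond=None)
print("fit on observed (biased) labels:", np.round(w_naive, 2))
print("fit on clean labels:            ", np.round(w_clean, 2))
```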
Biases in In Silico Evaluation of Molecular Optimization Methods and Bias-Reduced Evaluation Methodology
Kajino, Hiroshi, Miyaguchi, Kohei, Osogami, Takayuki
Molecular optimization aims to discover novel molecules with improved properties and is often formulated as a reinforcement learning problem by modeling the construction of a molecule as a Markov decision process. The performance of such agents is measured by the quality of the generated molecules. In the machine learning community, most molecular optimization methods have been validated in silico, i.e., in computer simulation. Since most of the generated molecules are novel, their properties are unknown, and we must resort to a predictor to estimate them. However, little attention has been paid to how reliable such estimates are, which makes the existing performance estimates themselves less reliable. In this paper, we study the statistical performance of these performance estimators to improve our understanding of the evaluation protocol, and we discuss several directions for improving it.
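As a hedged illustration of the common in silico practice the abstract refers to (generate molecules, then average a property predictor's scores), here is a minimal plug-in estimator. The callables `generate` and `predict` are placeholders, and the toy stand-ins below are purely for demonstration; the point is simply that the reported performance inherits whatever error the predictor makes on novel molecules.

```python
import numpy as np

def in_silico_performance(generate, predict, n_molecules=100):
    """Plug-in estimate: average *predicted* property over generated molecules.
    `generate` and `predict` are placeholder callables."""
    scores = np.array([predict(generate()) for _ in range(n_molecules)])
    return scores.mean()

# Toy stand-ins for a molecule generator and a noisy property predictor.
rng = np.random.default_rng(1)
true_property = lambda mol: mol.sum()
noisy_predictor = lambda mol: true_property(mol) + rng.normal(scale=0.5)
toy_generator = lambda: rng.normal(size=8)
print(in_silico_performance(toy_generator, noisy_predictor))
```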
Supplementary material for Uncorrected least-squares temporal difference with lambda-return
Osogami, Takayuki
November 15, 2019

Abstract. Here, we provide supplementary material for Takayuki Osogami, "Uncorrected least-squares temporal difference with lambda-return," which appears in Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI-20) [Osogami, 2019].

A Proofs

In this section, we prove Theorem 1, Lemma 1, Theorem 2, Lemma 2, and Proposition 1. Note that equations (1)-(19) refer to those in Osogami [2019].

A.1 Proof of Theorem 1

From (7)-(8), we have the following equalities:

$A^{\rm Unc}_{T+1} = \sum_{t=0}^{T} \phi_t \Big( \phi_t - (1-\lambda)\,\gamma \sum_{m=1}^{T-t} (\lambda\gamma)^{m-1} \phi_{t+m} \Big)^\top$  (20)

$= \sum_{t=0}^{T-1} \phi_t \Big( \phi_t - (1-\lambda)\,\gamma \sum_{m=1}^{T-t} (\lambda\gamma)^{m-1} \phi_{t+m} \Big)^\top + \phi_T \phi_T^\top$  (21)

$= \sum_{t=0}^{T-1} \phi_t \Big( \phi_t - (1-\lambda)\,\gamma \sum_{m=1}^{T-t-1} (\lambda\gamma)^{m-1} \phi_{t+m} - (1-\lambda)\,\gamma\,(\lambda\gamma)^{T-t-1} \phi_T \Big)^\top + \phi_T \phi_T^\top$  (22)

$= A^{\rm Unc}_{T} - \sum_{t=0}^{T-1} \phi_t \,(1-\lambda)\,\gamma\,(\lambda\gamma)^{T-t-1} \phi_T^\top + \phi_T \phi_T^\top$  (23)

$= A^{\rm Unc}_{T} + \Big( \sum_{t=0}^{T} (\lambda\gamma)^{T-t} \phi_t \Big) \phi_T^\top - \gamma \Big( \sum_{t=0}^{T-1} (\lambda\gamma)^{T-t-1} \phi_t \Big) \phi_T^\top$  (24)

$= A^{\rm Unc}_{T} + \big( z_T - \gamma\, z_{T-1} \big)\, \phi_T^\top.$

The recursive computation of the eligibility trace can be verified in a straightforward manner.
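The following sketch is only a numerical sanity check of the chain of equalities as reconstructed above: it compares a direct evaluation of (20) with the incremental update implied by the final line, assuming the standard notation ($\phi_t$ feature vectors, discount $\gamma$, trace parameter $\lambda$, eligibility trace $z_T = \lambda\gamma z_{T-1} + \phi_T$) from Osogami [2019]; the toy features are random.

```python
import numpy as np

rng = np.random.default_rng(0)
d, lam, gamma = 4, 0.7, 0.9
phi = [rng.normal(size=d) for _ in range(30)]   # feature vectors phi_0, ..., phi_29

def A_unc(T):
    # Direct evaluation of (20): A^Unc_T uses features phi_0, ..., phi_{T-1}.
    A = np.zeros((d, d))
    for t in range(T):
        inner = phi[t].copy()
        for m in range(1, T - t):
            inner -= (1 - lam) * gamma * (lam * gamma) ** (m - 1) * phi[t + m]
        A += np.outer(phi[t], inner)
    return A

# Incremental form: A^Unc_{T+1} = A^Unc_T + (z_T - gamma * z_{T-1}) phi_T^T,
# with eligibility trace z_T = lam * gamma * z_{T-1} + phi_T and z_{-1} = 0.
A_rec = np.zeros((d, d))
z_prev = np.zeros(d)
for T in range(29):
    z = lam * gamma * z_prev + phi[T]
    A_rec += np.outer(z - gamma * z_prev, phi[T])
    assert np.allclose(A_rec, A_unc(T + 1))
    z_prev = z
print("incremental update matches the direct definition (20)")
```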
Visual analytics for team-based invasion sports with significant events and Markov reward process
Zhao, Kun, Osogami, Takayuki, Morimura, Tetsuro
In team-based invasion sports such as soccer and basketball, analytics is important for teams to understand their performance and for audiences to better understand matches. The present work focuses on visual analytics for evaluating the value of any kind of event occurring in a sports match over a continuous parameter space, where the parameters include time, location, score, and others. However, because the spatiotemporal data used in such analytics is a low-level representation of very large size, traditional analytics may need to discretize the continuous parameter space (e.g., subdivide the playing area) or use local features to limit the analysis to specific events (e.g., only shots). Such approaches cannot evaluate arbitrary events over a continuous parameter space. To solve this problem, we model a whole match as a Markov chain of significant events, so that event values over the continuous parameter space can be estimated by solving the Markov chain with a machine learning model. The significant events are first extracted, by considering the time-varying distribution of players, to represent the whole match. The extracted events are then redefined as states with the continuous parameter space and assembled into a Markov chain so that a Markov reward process can be applied. Finally, the Markov reward process is solved by a customized fitted-value iteration algorithm so that event values over the continuous parameter space can be predicted by a regression model. As a result, event values can be visually inspected over the whole playing field under arbitrary given conditions. Experimental results with real soccer data show the effectiveness of the proposed system.
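For reference, here is a textbook fitted-value-iteration sketch over continuous event features: a regression model plays the role of the value function of the Markov reward process. This is not the authors' customized algorithm; the transition format and the choice of regressor are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fitted_value_iteration(transitions, gamma=0.95, n_iters=30):
    """Generic sketch: `transitions` is a list of
    (features, reward, next_features, terminal) tuples built from extracted
    significant events; the fitted regressor approximates the value function."""
    X = np.array([s for s, _, _, _ in transitions])
    r = np.array([rw for _, rw, _, _ in transitions])
    X_next = np.array([s2 for _, _, s2, _ in transitions])
    done = np.array([d for _, _, _, d in transitions], dtype=float)

    model = GradientBoostingRegressor().fit(X, r)   # initialize V ~ immediate reward
    for _ in range(n_iters):
        targets = r + gamma * (1.0 - done) * model.predict(X_next)
        model = GradientBoostingRegressor().fit(X, targets)
    return model                                    # V(features) = model.predict(...)
```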
Real-time tree search with pessimistic scenarios
Osogami, Takayuki, Takahashi, Toshihiro
Autonomous agents, such as self-driving cars and drones, need to make decisions in real time, which is particularly important but difficult in critical situations, for example to avoid collisions. Such decisions often need to be made sequentially to achieve an eventual goal (e.g., avoiding collisions and recovering to safe conditions), under a partially observable environment, and by taking into account how other agents behave. Towards this far-reaching goal of realizing such autonomous agents, we propose practical techniques for sequential decision making in real time and demonstrate their effectiveness in Pommerman, a multi-agent environment used in one of the competitions held at the Thirty-second Conference on Neural Information Processing Systems (NeurIPS 2018) on Dec. 8, 2018 [Resnick et al., 2018a]. The techniques proposed in this paper were used in the Pommerman agents (HakozakiJunctions and dypm-final) that won first and third place in the competition. In Pommerman, a team of two agents competes against another team of two agents on an 11 × 11 board (see Figure 1 (a) for an initial configuration of the board). Each agent can observe only a limited area of the board, and the agents cannot communicate with each other. The goal of a team is to knock down all of the opponents. Towards this goal, the agents place bombs to destroy wooden walls and collect power-up items that may appear from those walls, while avoiding flames and attacking opponents. See Figure 1 (b) for an example of the board in the middle of a game.
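The sketch below only conveys the general flavor of depth-limited search against a pessimistic scenario (our agent maximizes while the environment follows a single assumed worst-case policy); the interfaces (`state`, `my_actions`, `scenario_action`, `step`, `value`) are placeholders, and the actual techniques in the paper are richer than this sketch.

```python
def pessimistic_search(state, depth, my_actions, scenario_action, step, value):
    """Depth-limited search sketch: maximize our score while the rest of the
    environment follows a fixed pessimistic scenario. All interfaces are
    hypothetical placeholders, not the paper's implementation."""
    if depth == 0 or state.is_terminal():
        return value(state), None
    best_score, best_action = float("-inf"), None
    for a in my_actions(state):
        nxt = step(state, a, scenario_action(state))   # pessimistic scenario move
        score, _ = pessimistic_search(nxt, depth - 1, my_actions,
                                      scenario_action, step, value)
        if score > best_score:
            best_score, best_action = score, a
    return best_score, best_action
```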
Time-Discounting Convolution for Event Sequences with Ambiguous Timestamps
Katsuki, Takayuki, Osogami, Takayuki, Koseki, Akira, Ono, Masaki, Kudo, Michiharu, Makino, Masaki, Suzuki, Atsushi
This paper proposes a time-discounting convolution, a method for modeling event sequences with ambiguous timestamps. Unlike in ordinary time series, time intervals are not constant, small time-shifts have no significant effect, and inputting timestamps or time durations into a model is not effective. The criteria we require for the modeling are robustness against time-shifts and timestamp uncertainty, as well as maintaining the essential capabilities of time-series models, i.e., forgetting meaningless past information and handling infinite sequences. The proposed method satisfies these criteria with two mechanisms: a convolution across time with specific parameterizations, which efficiently represents event dependencies in a time-shift-invariant manner while discounting the effect of past events, and a dynamic pooling mechanism, which provides robustness against timestamp uncertainty and enhances the time-discounting capability by dynamically changing the pooling window size. In our learning algorithm, the decaying and dynamic pooling mechanisms play critical roles in handling infinite and variable-length sequences. Numerical experiments on real-world event sequences with ambiguous timestamps, as well as on ordinary time series, demonstrate the advantages of our method.
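A minimal sketch of the two ingredients, under simplifying assumptions: a convolution whose weights decay geometrically with the age of past events (time-discounting), followed by max pooling over a window to absorb small time-shifts. The parameterization, decay rate, and the fixed-size pooling (the paper's pooling window is dynamic) are illustrative only.

```python
import numpy as np

def time_discounting_conv(events, kernel_len=8, decay=0.8, pool=3):
    """Toy time-discounting convolution with max pooling over time.
    `events` has shape (time, features); older events are weighted less."""
    events = np.asarray(events, dtype=float)
    weights = decay ** np.arange(kernel_len)        # geometric time discounting
    T = len(events)
    conv = np.array([
        (weights[:min(kernel_len, t + 1), None] *
         events[t::-1][:kernel_len]).sum(axis=0)
        for t in range(T)
    ])
    # Pooling is simplified to a fixed-size max pool (dynamic in the paper).
    pooled = np.array([conv[max(0, t - pool + 1):t + 1].max(axis=0)
                       for t in range(T)])
    return pooled

x = np.zeros((20, 2)); x[3, 0] = 1.0; x[10, 1] = 1.0   # two sparse events
print(time_discounting_conv(x).round(3))
```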
Dynamic Determinantal Point Processes
Osogami, Takayuki (IBM Research AI) | Raymond, Rudy (IBM Research AI) | Goel, Akshay (Graduate School of Mathematics, Kyushu University) | Shirai, Tomoyuki (Institute of Mathematics for Industry, Kyushu University) | Maehara, Takanori (RIKEN Center for Advanced Intelligence Project)
The determinantal point process (DPP) has been receiving increasing attention in machine learning as a generative model of subsets consisting of relevant and diverse items. Recently, there has been significant progress in developing efficient algorithms for learning the kernel matrix that characterizes a DPP. Here, we propose the dynamic DPP, a DPP whose kernel can change over time, and develop efficient learning algorithms for it. In the dynamic DPP, the kernel depends on the subsets selected in the past, but we assume a particular structure in this dependency to allow efficient learning. We also assume that the kernel has low rank and exploit a recently proposed learning algorithm for the DPP with low-rank factorization, while showing that its bottleneck computation can be reduced from $O(M^2 K)$ time to $O(M K^2)$ time, where $M$ is the number of items under consideration and $K$ is the rank of the kernel, which can be set smaller than $M$ by orders of magnitude.
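As one standard illustration of how a low-rank factorization turns an $M$-sized computation into a $K$-sized one (not necessarily the specific bottleneck addressed in the paper), the DPP normalizer $\det(L + I_M)$ for $L = V V^\top$ equals $\det(I_K + V^\top V)$ by Sylvester's determinant identity, and forming $V^\top V$ costs $O(M K^2)$.

```python
import numpy as np

rng = np.random.default_rng(0)
M, K = 1000, 10
V = rng.normal(size=(M, K))          # low-rank factor of the DPP kernel L = V V^T

# Naive normalizer: det(L + I_M), which requires the full M x M kernel.
L = V @ V.T
logdet_naive = np.linalg.slogdet(L + np.eye(M))[1]

# Low-rank shortcut via Sylvester's identity: det(L + I_M) = det(I_K + V^T V).
# Forming V^T V costs O(M K^2), independent of the M x M kernel.
logdet_lowrank = np.linalg.slogdet(np.eye(K) + V.T @ V)[1]

print(np.isclose(logdet_naive, logdet_lowrank))   # True
```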
Dynamic Boltzmann Machines for Second Order Moments and Generalized Gaussian Distributions
Raymond, Rudy, Osogami, Takayuki, Dasgupta, Sakyasingha
The dynamic Boltzmann machine (DyBM) has been shown to be highly efficient at predicting time-series data. The Gaussian DyBM is a DyBM that assumes the predicted data is generated by a Gaussian distribution whose first-order moment (mean) changes dynamically over time while its second-order moment (variance) is fixed. However, in many financial applications this assumption is quite limiting in two respects. First, even when the data follow a Gaussian distribution, the variance may change over time; such variance is also related to important temporal economic indicators such as market volatility. Second, financial time-series data often require learning from datasets generated by a generalized Gaussian distribution, whose additional shape parameter is important for approximating heavy-tailed distributions. Addressing these aspects, we show how to extend the DyBM in ways that result in significant performance improvements in predicting financial time-series data.
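To illustrate the first limitation in spirit only (this is not the DyBM parameterization from the paper), the toy sketch below fits a Gaussian predictor whose mean and log-variance both depend on the previous observation by stochastic gradient descent on the negative log-likelihood, on a series whose volatility changes halfway through; the specific parameterization and learning rate are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 2000
sigma = np.where(np.arange(T) < T // 2, 0.2, 1.0)   # volatility changes over time
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.8 * x[t - 1] + sigma[t] * rng.normal()

# Gaussian predictor with time-varying mean AND log-variance, fit by SGD on NLL.
w_mu, b_mu, w_s, b_s, lr = 0.0, 0.0, 0.0, 0.0, 0.01
for t in range(1, T):
    mu = w_mu * x[t - 1] + b_mu
    log_var = w_s * x[t - 1] + b_s
    var = np.exp(log_var)
    err = x[t] - mu
    # Gradients of 0.5 * (log_var + err^2 / var) w.r.t. the four parameters.
    d_mu = -err / var
    d_log_var = 0.5 * (1.0 - err ** 2 / var)
    w_mu -= lr * d_mu * x[t - 1]; b_mu -= lr * d_mu
    w_s -= lr * d_log_var * x[t - 1]; b_s -= lr * d_log_var
print(round(b_s, 2), "(learned baseline log-variance)")
```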