Goto

Collaborating Authors

 Hatano, Daisuke


New classes of the greedy-applicable arm feature distributions in the sparse linear bandit problem

arXiv.org Machine Learning

We consider the sparse contextual bandit problem where arm feature affects reward through the inner product of sparse parameters. Recent studies have developed sparsity-agnostic algorithms based on the greedy arm selection policy. However, the analysis of these algorithms requires strong assumptions on the arm feature distribution to ensure that the greedily selected samples are sufficiently diverse; One of the most common assumptions, relaxed symmetry, imposes approximate origin-symmetry on the distribution, which cannot allow distributions that has origin-asymmetric support. In this paper, we show that the greedy algorithm is applicable to a wider range of the arm feature distributions from two aspects. Firstly, we show that a mixture distribution that has a greedy-applicable component is also greedy-applicable. Second, we propose new distribution classes, related to Gaussian mixture, discrete, and radial distribution, for which the sample diversity is guaranteed. The proposed classes can describe distributions with origin-asymmetric support and, in conjunction with the first claim, provide theoretical guarantees of the greedy policy for a very wide range of the arm feature distributions.


Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions

arXiv.org Machine Learning

The contextual combinatorial semi-bandit problem with linear payoff functions is a decision-making problem in which a learner chooses a set of arms with the feature vectors in each round under given constraints so as to maximize the sum of rewards of arms. Several existing algorithms have regret bounds that are optimal with respect to the number of rounds $T$. However, there is a gap of $\tilde{O}(\max(\sqrt{d}, \sqrt{k}))$ between the current best upper and lower bounds, where $d$ is the dimension of the feature vectors, $k$ is the number of the chosen arms in a round, and $\tilde{O}(\cdot)$ ignores the logarithmic factors. The dependence of $k$ and $d$ is of practical importance because $k$ may be larger than $T$ in real-world applications such as recommender systems. In this paper, we fill the gap by improving the upper and lower bounds. More precisely, we show that the C${}^2$UCB algorithm proposed by Qin, Chen, and Zhu (2014) has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints. For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C${}^2$UCB algorithm and demonstrate that it enjoys the optimal regret bound for a more general problem that can take into account other objectives simultaneously. We also show that our technique would be applicable to related problems. Numerical experiments support our theoretical results and considerations.


Regret Bounds for Online Portfolio Selection with a Cardinality Constraint

Neural Information Processing Systems

Online portfolio selection is a sequential decision-making problem in which a learner repetitively selects a portfolio over a set of assets, aiming to maximize long-term return. In this paper, we study the problem with the cardinality constraint that the number of assets in a portfolio is restricted to be at most k, and consider two scenarios: (i) in the full-feedback setting, the learner can observe price relatives (rates of return to cost) for all assets, and (ii) in the bandit-feedback setting, the learner can observe price relatives only for invested assets. We propose efficient algorithms for these scenarios, which achieve sublinear regrets. We also provide regret (statistical) lower bounds for both scenarios which nearly match the upper bounds when k is a constant. In addition, we give a computational lower bound, which implies that no algorithm maintains both computational efficiency, as well as a small regret upper bound.


Regret Bounds for Online Portfolio Selection with a Cardinality Constraint

Neural Information Processing Systems

Online portfolio selection is a sequential decision-making problem in which a learner repetitively selects a portfolio over a set of assets, aiming to maximize long-term return. In this paper, we study the problem with the cardinality constraint that the number of assets in a portfolio is restricted to be at most k, and consider two scenarios: (i) in the full-feedback setting, the learner can observe price relatives (rates of return to cost) for all assets, and (ii) in the bandit-feedback setting, the learner can observe price relatives only for invested assets. We propose efficient algorithms for these scenarios that achieve sublinear regrets. We also provide regret (statistical) lower bounds for both scenarios which nearly match the upper bounds when k is a constant. In addition, we give a computational lower bound which implies that no algorithm maintains both computational efficiency, as well as a small regret upper bound.


Causal Bandits with Propagating Inference

arXiv.org Machine Learning

Bandit is a framework for designing sequential experiments. In each experiment, a learner selects an arm $A \in \mathcal{A}$ and obtains an observation corresponding to $A$. Theoretically, the tight regret lower-bound for the general bandit is polynomial with respect to the number of arms $|\mathcal{A}|$. This makes bandit incapable of handling an exponentially large number of arms, hence the bandit problem with side-information is often considered to overcome this lower bound. Recently, a bandit framework over a causal graph was introduced, where the structure of the causal graph is available as side-information. A causal graph is a fundamental model that is frequently used with a variety of real problems. In this setting, the arms are identified with interventions on a given causal graph, and the effect of an intervention propagates throughout all over the causal graph. The task is to find the best intervention that maximizes the expected value on a target node. Existing algorithms for causal bandit overcame the $\Omega(\sqrt{|\mathcal{A}|/T})$ simple-regret lower-bound; however, their algorithms work only when the interventions $\mathcal{A}$ are localized around a single node (i.e., an intervention propagates only to its neighbors). We propose a novel causal bandit algorithm for an arbitrary set of interventions, which can propagate throughout the causal graph. We also show that it achieves $O(\sqrt{ \gamma^*\log(|\mathcal{A}|T) / T})$ regret bound, where $\gamma^*$ is determined by using a causal graph structure. In particular, if the in-degree of the causal graph is bounded, then $\gamma^* = O(N^2)$, where $N$ is the number $N$ of nodes.


Efficient Sublinear-Regret Algorithms for Online Sparse Linear Regression with Limited Observation

Neural Information Processing Systems

Online sparse linear regression is the task of applying linear regression analysis to examples arriving sequentially subject to a resource constraint that a limited number of features of examples can be observed. Despite its importance in many practical applications, it has been recently shown that there is no polynomial-time sublinear-regret algorithm unless NP$\subseteq$BPP, and only an exponential-time sublinear-regret algorithm has been found. In this paper, we introduce mild assumptions to solve the problem. Under these assumptions, we present polynomial-time sublinear-regret algorithms for the online sparse linear regression. In addition, thorough experiments with publicly available data demonstrate that our algorithms outperform other known algorithms.


Scalable Algorithm for Higher-Order Co-Clustering via Random Sampling

AAAI Conferences

We propose a scalable and efficient algorithm for coclustering a higher-order tensor. Viewing tensors with hypergraphs, we propose formulating the co-clustering of a tensor as a problem of partitioning the corresponding hypergraph. Our algorithm is based on the random sampling technique, which has been successfully applied to graph cut problems. We extend a random sampling algorithm for the graph multiwaycut problem to hypergraphs, and design a co-clustering algorithm based on it. Each iteration of our algorithm runs in polynomial on the size of hypergraphs, and thus it performs well even for higher-order tensors, which are difficult to deal with for state-of-the-art algorithm.


Distributed Multiplicative Weights Methods for DCOP

AAAI Conferences

In this game, each player deal with enormous sizes such as a smart grid is rapidly increasing associated with a variable keeps providing probability distributions in AI communities. The distributed constraint optimization over its domain, and tries to minimize the regret, problem (DCOP for short) is arguably the most which is the average additional cost incurred by the probability studied problem in this setting, where the goal is to find an distributions against the strategy of outputting a best assignment that minimizes the total sum of costs incurred single value all the time. We can make the regret of each by (local) cost functions. Since it takes a prohibitively long agent arbitrarily small by utilizing the multiplicative weights time to exactly solve DCOP, we need to resort to incomplete method. Finally, we round the obtained probability distributions algorithms, and a plethora of incomplete algorithms to integer values. We prove that our method converges have been proposed in the literature, such as local search to a certain kind of equilibrium, called a coarse correlated based algorithms (Maheswaran, Pearce, and Tambe 2004; equilibrium. Zhang et al. 2005), inference based algorithms (Farinelli We empirically compare our methods with previous stateof-the-art et al. 2008), graph based algorithms (Bowring et al. 2008; methods. We demonstrate that our methods are Kiekintveld et al. 2010), divide-and-coordinate based algorithms scalable, and that DMW-Game outperforms other methods (Vinyals, Rodriguez-Aguilar, and Cerquides 2010; in terms of solution quality and efficiency. Hatano and Hirayama 2013), and sampling based algorithms (Ottens, Dimitrakakis, and Faltings 2012; Nguyen, Yeoh, and Lau 2013).


Lagrangian Decomposition Algorithm for Allocating Marketing Channels

AAAI Conferences

In this paper, we formulate a new problem related to the well-known influence maximization in the context of computational advertising. Our new problem considers allocating marketing channels (e.g., TV, newspaper, and websites) to advertisers from the view point of a match maker, which was not taken into account in previous studies on the influence maximization. The objective of the problem is to find an allocation such that each advertiser can influence some given number of customers while the slots of marketing channels are limited. We propose an algorithm based on the Lagrangian decomposition. We empirically show that our algorithm computes better quality solutions than existing algorithms, scales up to graphs of 10M vertices, and performs well particularly in a parallel environment.


Dynamic SAT with Decision Change Costs: Formalization and Solutions

AAAI Conferences

We address a dynamic decision problem in which decision makers must pay some costs when they change their decisions along the way. We formalize this problem as Dynamic SAT (DynSAT) with decision change costs, whose goal is to find a sequence of models that minimize the aggregation of the costs for changing variables. We provide two solutions to solve a specific case of this problem. The first uses a Weighted Partial MaxSAT solver after we encode the entire problem as a WeightedPartial MaxSAT problem. The second solution, which we believe is novel, uses the Lagrangian decomposition technique that divides the entire problem into sub-problems, each of which can be separately solved by an exact Weighted Partial MaxSATsolver, and produces both lower and upper bounds on the optimal in an anytime manner. To compare the performance of these solvers, we experimentedon the random problem and the target trackingproblem. The experimental results show that a solver based on Lagrangian decomposition performs better for the random problem and competitively for the target tracking problem.