Sakaue, Shinsaku
Bandit and Delayed Feedback in Online Structured Prediction
Shibukawa, Yuki, Tsuchiya, Taira, Sakaue, Shinsaku, Yamanishi, Kenji
Online structured prediction is the task of sequentially predicting outputs with complex structures based on inputs and past observations, and it encompasses online classification. Recent studies showed that in the full-information setup, we can achieve finite bounds on the surrogate regret, i.e., the extra target loss relative to the best possible surrogate loss. In practice, however, full-information feedback is often unrealistic as it requires immediate access to the whole structure of complex outputs. Motivated by this, we propose algorithms that work with less demanding feedback: bandit feedback and delayed feedback. For the bandit setting, using a standard inverse-weighted gradient estimator, we achieve a surrogate regret bound of $O(\sqrt{KT})$ for the time horizon $T$ and the size $K$ of the output set. However, $K$ can be extremely large when outputs are highly complex, making this result less desirable. To address this, we propose an algorithm that achieves a surrogate regret bound of $O(T^{2/3})$, which is independent of $K$. This is enabled by a carefully designed pseudo-inverse matrix estimator. Furthermore, for the delayed full-information feedback setup, we obtain a surrogate regret bound of $O(D^{2/3} T^{1/3})$ for the delay time $D$. We also provide algorithms for the delayed bandit feedback setup. Finally, we numerically evaluate the performance of the proposed algorithms in online classification with bandit feedback.
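To make the inverse-weighted estimator concrete, below is a minimal sketch in the special case of bandit multiclass classification with $K$ labels and the 0-1 loss; the function names and the sanity check are illustrative, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def ipw_loss_estimate(p, true_label):
    """Play a label drawn from p, observe only that label's 0-1 loss,
    and return an unbiased estimate of the full loss vector."""
    K = len(p)
    k = rng.choice(K, p=p)               # bandit action
    observed = float(k != true_label)    # the only revealed feedback
    ell_hat = np.zeros(K)
    ell_hat[k] = observed / p[k]         # inverse-probability weighting
    return k, ell_hat

# Sanity check: averaging the estimates recovers the true loss vector.
p, y = np.full(5, 0.2), 2
avg = np.mean([ipw_loss_estimate(p, y)[1] for _ in range(100_000)], axis=0)
print(np.round(avg, 2))  # approximately [1, 1, 0, 1, 1]
```

Unbiasedness follows since each coordinate $j$ is revealed with probability $p_j$ and then scaled by $1/p_j$.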
Online Inverse Linear Optimization: Improved Regret Bound, Robustness to Suboptimality, and Toward Tight Regret Analysis
Sakaue, Shinsaku, Tsuchiya, Taira, Bao, Han, Oki, Taihei
We study an online learning problem where, over $T$ rounds, a learner observes both time-varying sets of feasible actions and an agent's optimal actions, selected by solving linear optimization over the feasible actions. The learner sequentially makes predictions of the agent's underlying linear objective function, and their quality is measured by the regret, the cumulative gap between optimal objective values and those achieved by following the learner's predictions. A seminal work by B\"armann et al. (ICML 2017) showed that online learning methods can be applied to this problem to achieve regret bounds of $O(\sqrt{T})$. Recently, Besbes et al. (COLT 2021, Oper. Res. 2023) significantly improved the result by achieving an $O(n^4\ln T)$ regret bound, where $n$ is the dimension of the ambient space of objective vectors. Their method, based on the ellipsoid method, runs in polynomial time but is inefficient for large $n$ and $T$. In this paper, we obtain an $O(n\ln T)$ regret bound, improving upon the previous bound of $O(n^4\ln T)$ by a factor of $n^3$. Our method is simple and efficient: we apply the online Newton step (ONS) to appropriate exp-concave loss functions. Moreover, for the case where the agent's actions are possibly suboptimal, we establish an $O(n\ln T+\sqrt{\Delta_Tn\ln T})$ regret bound, where $\Delta_T$ is the cumulative suboptimality of the agent's actions. This bound is achieved by using MetaGrad, which runs ONS with $\Theta(\ln T)$ different learning rates in parallel. We also provide a simple instance that implies an $\Omega(n)$ lower bound, showing that our $O(n\ln T)$ bound is tight up to an $O(\ln T)$ factor. This gives rise to a natural question: can the $O(\ln T)$ factor in the upper bound be removed? For the special case of $n=2$, we show that an $O(1)$ regret bound is possible, while we delineate challenges in extending this result to higher dimensions.
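As a rough illustration of the core update, the sketch below runs a generic online Newton step on a stream of gradients; the demo losses, the parameter $\gamma$, and the omission of the projection step are simplifying assumptions, not the paper's exact construction.

```python
import numpy as np

def ons(grad_fn, theta0, T, gamma=1.0, eps=1.0):
    """Online Newton step: accumulate outer products of gradients and
    precondition each step by the inverse of the running matrix."""
    A = eps * np.eye(theta0.size)
    theta = theta0.astype(float).copy()
    for t in range(T):
        g = grad_fn(t, theta)
        A += np.outer(g, g)
        theta -= (1.0 / gamma) * np.linalg.solve(A, g)
        # A full implementation would also project theta onto the feasible
        # set in the norm induced by A; omitted in this sketch.
    return theta

# Demo on squared losses (exp-concave on bounded domains).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
w_star = np.array([1.0, -2.0, 0.5])
y = X @ w_star
theta = ons(lambda t, th: 2.0 * (X[t] @ th - y[t]) * X[t], np.zeros(3), 300)
print(np.round(theta, 2))  # should move toward w_star
```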
Revisiting Online Learning Approach to Inverse Linear Optimization: A Fenchel$-$Young Loss Perspective and Gap-Dependent Regret Analysis
Sakaue, Shinsaku, Bao, Han, Tsuchiya, Taira
Linear optimization is arguably the most widely used model of decision-making. Inverse linear optimization is its inverse problem, where the goal is to infer linear objective functions from observed outcomes. Since the early development in geophysical science (Tarantola, 1988; Burton and Toint, 1992), inverse linear optimization has been an important subject of study (Ahuja and Orlin, 2001; Heuberger, 2004; Chan et al., 2019) and used in various applications, ranging from route recommendation to healthcare (Chan et al., 2023). Inverse linear optimization is particularly interesting when forward linear optimization is a decision-making model of a human agent. Then, the linear objective function represents the agent's internal preference. If the agent repeatedly takes an action upon facing a set of feasible actions, inverse linear optimization can be seen as online learning of the agent's internal preference from pairs of the feasible sets and the agent's choices. B\"armann et al. (2017) studied this setting and proposed an elegant approach based on online learning, which is the focus of this paper and is described below. Consider an agent who addresses decision problems of the following linear-optimization form for $t = 1, \dots, T$: maximize $\langle c^*, x \rangle$ subject to $x \in X_t$, where $c^*$ is the agent's linear objective vector and $X_t$ is the round-$t$ set of feasible actions.
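Below is a minimal sketch of this online-learning approach: the learner predicts an objective vector, solves the forward problem with it, observes the agent's optimal action, and takes an online-gradient step. The finite feasible sets, step size, and normalization are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def argmax_linear(c, X):
    """Forward problem: pick the feasible action maximizing <c, x>.
    Here the feasible set X is a finite list of points, for simplicity."""
    return X[int(np.argmax(X @ c))]

c_star = np.array([0.8, -0.3, 0.5])           # agent's hidden objective
c_hat = np.zeros(3)                           # learner's estimate
for t in range(500):
    X_t = rng.normal(size=(20, 3))            # time-varying feasible set
    x_t = argmax_linear(c_star, X_t)          # agent's optimal action
    x_hat = argmax_linear(c_hat, X_t)         # learner's prediction
    c_hat += 0.1 * (x_t - x_hat)              # OGD step on <c, x_hat - x_t>
    c_hat /= max(1.0, np.linalg.norm(c_hat))  # keep the estimate bounded

print(np.round(c_hat, 2))  # roughly aligned with c_star / ||c_star||
```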
Any-stepsize Gradient Descent for Separable Data under Fenchel--Young Losses
Bao, Han, Sakaue, Shinsaku, Takezawa, Yuki
Gradient descent (GD) is one of the most common optimizers in machine learning. In particular, the loss landscape of a neural network typically sharpens during the initial phase of training, making the training dynamics hover on the edge of stability. This is beyond our standard understanding of GD convergence in the stable regime, where an arbitrarily chosen stepsize is sufficiently smaller than the edge of stability. Recently, Wu et al. (COLT 2024) showed that GD converges with arbitrary stepsize under linearly separable logistic regression. Although their analysis hinges on the self-bounding property of the logistic loss, which seems to be a cornerstone for establishing a modified descent lemma, our pilot study shows that other loss functions without the self-bounding property can also make GD converge with arbitrary stepsize. To further understand what property of a loss function matters in GD, we aim to show arbitrary-stepsize GD convergence for general loss functions based on the framework of \emph{Fenchel--Young losses}. We essentially leverage the classical perceptron argument to derive the convergence rate for achieving an $\epsilon$-optimal loss, which is possible for a majority of Fenchel--Young losses. Among typical loss functions, the Tsallis entropy achieves the GD convergence rate $T=\Omega(\epsilon^{-1/2})$, and the R{\'e}nyi entropy achieves the far better rate $T=\Omega(\epsilon^{-1/3})$. We argue that these better rates are possible because of the \emph{separation margin} of loss functions, instead of the self-bounding property.
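The sketch below numerically illustrates the phenomenon in the logistic-regression special case: GD on linearly separable data with a stepsize far beyond the classical smoothness-based stability condition. The data, stepsize, and iteration count are arbitrary illustrative choices.

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(3)

# Linearly separable data: labels from a hidden unit vector.
w_star = np.array([0.6, -0.8])
X = rng.normal(size=(200, 2))
y = np.sign(X @ w_star)

def loss(w):   # mean logistic loss, computed stably
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

def grad(w):   # d/dz log(1 + e^{-z}) = -expit(-z), with z = y * <x, w>
    s = -y * expit(-y * (X @ w))
    return (s[:, None] * X).mean(axis=0)

w, eta = np.zeros(2), 50.0   # stepsize far beyond the stable regime
for t in range(3000):
    w -= eta * grad(w)
print(loss(w))  # eventually driven toward zero, possibly after oscillation
```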
No-Regret M${}^{\natural}$-Concave Function Maximization: Stochastic Bandit Algorithms and NP-Hardness of Adversarial Full-Information Setting
Oki, Taihei, Sakaue, Shinsaku
M${}^{\natural}$-concave functions, a.k.a. gross substitute valuation functions, play a fundamental role in many fields, including discrete mathematics and economics. In practice, perfect knowledge of M${}^{\natural}$-concave functions is often unavailable a priori, and we can optimize them only interactively based on some feedback. Motivated by such situations, we study online M${}^{\natural}$-concave function maximization problems, which are interactive versions of the problem studied by Murota and Shioura (1999). For the stochastic bandit setting, we present $O(T^{-1/2})$-simple regret and $O(T^{2/3})$-regret algorithms that use $T$ accesses to unbiased noisy value oracles of M${}^{\natural}$-concave functions. A key to proving these results is the robustness of the greedy algorithm to local errors in M${}^{\natural}$-concave function maximization, which is one of our main technical results. While we obtain those positive results for the stochastic setting, another main result of our work is an impossibility result for the adversarial setting. We prove that, even with full-information feedback, no algorithm that runs in polynomial time per round can achieve $O(T^{1-c})$ regret for any constant $c > 0$ unless $\mathsf{P} = \mathsf{NP}$. Our proof is based on a reduction from the matroid intersection problem for three matroids, which would be a novel idea in the context of online learning.
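To convey the greedy principle underlying the bandit algorithms, here is a toy greedy ascent that averages noisy oracle calls to estimate marginal gains, run on a simple separable concave (hence M${}^{\natural}$-concave) function; this is only a schematic sketch, not the paper's algorithm, and all parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

# A simple separable concave (hence M-natural-concave) function:
# f(x) = sum_i w_i * sqrt(x_i) over integer points 0 <= x_i <= cap.
w, cap = np.array([3.0, 1.0, 2.0]), 4

def noisy_oracle(x, sigma=0.1):
    return float(w @ np.sqrt(x)) + rng.normal(0.0, sigma)

def greedy_with_noisy_oracle(budget, n_samples=200):
    """Greedy ascent that averages noisy evaluations to estimate
    marginal gains (a toy sketch, not the paper's algorithm)."""
    x = np.zeros(len(w))
    for _ in range(budget):
        gains = np.full(len(w), -np.inf)
        for i in range(len(w)):
            if x[i] >= cap:
                continue
            step = x.copy()
            step[i] += 1
            gains[i] = np.mean([noisy_oracle(step) - noisy_oracle(x)
                                for _ in range(n_samples)])
        x[int(np.argmax(gains))] += 1
    return x

print(greedy_with_noisy_oracle(budget=6))  # favors coordinates with larger w_i
```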
Online Structured Prediction with Fenchel--Young Losses and Improved Surrogate Regret for Online Multiclass Classification with Logistic Loss
Sakaue, Shinsaku, Bao, Han, Tsuchiya, Taira, Oki, Taihei
This paper studies online structured prediction with full-information feedback. For online multiclass classification, van der Hoeven (2020) has obtained surrogate regret bounds independent of the time horizon, or \emph{finite}, by introducing an elegant \emph{exploit-the-surrogate-gap} framework. However, this framework has been limited to multiclass classification primarily because it relies on a classification-specific procedure for converting estimated scores to outputs. We extend the exploit-the-surrogate-gap framework to online structured prediction with \emph{Fenchel--Young losses}, a large family of surrogate losses including the logistic loss for multiclass classification, obtaining finite surrogate regret bounds in various structured prediction problems. To this end, we propose and analyze \emph{randomized decoding}, which converts estimated scores to general structured outputs. Moreover, by applying our decoding to online multiclass classification with the logistic loss, we obtain a surrogate regret bound of $O(B^2)$, where $B$ is the $\ell_2$-diameter of the domain. This bound is tight up to logarithmic factors and improves the previous bound of $O(dB^2)$ due to van der Hoeven (2020) by a factor of $d$, the number of classes.
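For intuition, in the multiclass special case with the logistic loss, the Fenchel--Young link maps scores to the softmax distribution, and a prediction can be sampled from it. The sketch below shows only this simplified decoding step, not the paper's full randomized decoding for general structured outputs.

```python
import numpy as np

rng = np.random.default_rng(5)

def softmax(u):
    z = np.exp(u - u.max())   # shift for numerical stability
    return z / z.sum()

def decode(scores):
    """Sample a label from the distribution induced by the scores.
    For the logistic loss, the Fenchel-Young link is exactly softmax."""
    p = softmax(scores)
    return int(rng.choice(p.size, p=p)), p

label, p = decode(np.array([2.0, 0.5, -1.0]))
print(label, np.round(p, 2))
```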
Data-Driven Projection for Reducing Dimensionality of Linear Programs: Generalization Bound and Learning Methods
Sakaue, Shinsaku, Oki, Taihei
How to solve high-dimensional linear programs (LPs) efficiently is a fundamental question. Recently, there has been a surge of interest in reducing LP sizes using \textit{random projections}, which can accelerate solving LPs independently of improving LP solvers. In this paper, we explore a new direction of \emph{data-driven projections}, which use projection matrices learned from data instead of random projection matrices. Given data of past $n$-dimensional LPs, we learn an $n\times k$ projection matrix with $n > k$. When addressing a future LP instance, we reduce its dimensionality from $n$ to $k$ via the learned projection matrix, solve the resulting LP to obtain a $k$-dimensional solution, and apply the learned matrix to it to recover an $n$-dimensional solution. On the theoretical side, a natural question is: how much data is sufficient to ensure the quality of recovered solutions? We address this question based on the framework of \textit{data-driven algorithm design}, which connects the amount of data sufficient for establishing generalization bounds to the \textit{pseudo-dimension} of performance metrics. We obtain an $\tilde{\mathrm{O}}(nk^2)$ upper bound on the pseudo-dimension, where $\tilde{\mathrm{O}}$ hides logarithmic factors. We also provide an $\Omega(nk)$ lower bound, implying that our result is tight up to an $\tilde{\mathrm{O}}(k)$ factor. On the practical side, we explore two natural methods for learning projection matrices: PCA- and gradient-based methods. While the former is simple and efficient, the latter can sometimes lead to better solution quality. Our experiments confirm the practical benefit of learning projection matrices from data, achieving significantly higher solution quality than the existing random projection while greatly reducing the time for solving LPs.
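Below is an end-to-end sketch of the pipeline with the PCA-based learning method on synthetic data; the data generation, dimensions, and box bounds (added to keep the reduced LP bounded) are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(6)
n, k, m = 50, 5, 30

# Stand-ins for optimal solutions of past LP instances.
past_solutions = rng.normal(size=(100, n)) @ rng.normal(size=(n, n)) / n

# PCA-based learning: top-k right singular vectors span the projection.
_, _, Vt = np.linalg.svd(past_solutions, full_matrices=False)
P = Vt[:k].T                                   # n x k projection matrix

# A new instance: maximize c^T x subject to A x <= b.
A = rng.normal(size=(m, n))
b = np.ones(m)
c = rng.normal(size=n)

# Reduced LP over y with x = P y; box bounds keep it bounded.
res = linprog(-(P.T @ c), A_ub=A @ P, b_ub=b, bounds=[(-10, 10)] * k)
x_rec = P @ res.x                              # recovered n-dimensional solution
print(c @ x_rec, bool(np.all(A @ x_rec <= b + 1e-9)))  # objective, feasibility
```

Feasibility of the recovered solution is automatic here, since $x = Py$ satisfies $Ax = (AP)y \le b$ by construction.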
Faster Discrete Convex Function Minimization with Predictions: The M-Convex Case
Oki, Taihei, Sakaue, Shinsaku
Recent research on algorithms with predictions [29] has demonstrated that we can improve algorithms' performance beyond the limitations of worst-case analysis by using predictions learned from past data. In particular, a surge of interest has been directed toward using predictions to improve the time complexity of algorithms, which we refer to as warm-starts with predictions for convenience. Since Dinitz et al. [11]'s seminal work on speeding up the Hungarian method for weighted bipartite matching with predictions, researchers have extended this idea to algorithms for various problems [7, 35, 10]. Sakaue and Oki [39] found similarities between this idea and standard warm-starts in continuous convex optimization and extended it to L-convex function minimization, a broad class of discrete optimization problems studied in discrete convex analysis [31]. They have thus shown that warm-starts with predictions can improve the time complexity of algorithms for various discrete optimization problems, including weighted bipartite matching and weighted matroid intersection. In this paper, we extend the idea of warm-starts with predictions in a new direction: M-convex function minimization, another important problem class studied in discrete convex analysis. M-convexity is in conjugate relation to L-convexity; therefore, exploring the applicability of warm-starts with predictions to M-convex function minimization is crucial for further broadening the range of algorithms that can benefit from predictions, as is also mentioned in [39]. Specifically, we focus on an important subclass of M-convex function minimization called laminar convex minimization (Laminar), which forms a large problem class widely studied in the field of operations research.
Rethinking Warm-Starts with Predictions: Learning Predictions Close to Sets of Optimal Solutions for Faster $\text{L}$-/$\text{L}^\natural$-Convex Function Minimization
Sakaue, Shinsaku, Oki, Taihei
An emerging line of work has shown that machine-learned predictions are useful to warm-start algorithms for discrete optimization problems, such as bipartite matching. Previous studies have shown time complexity bounds proportional to some distance between a prediction and an optimal solution, which we can approximately minimize by learning predictions from past optimal solutions. However, such guarantees may not be meaningful when multiple optimal solutions exist. Indeed, the dual problem of bipartite matching and, more generally, $\text{L}$-/$\text{L}^\natural$-convex function minimization can have arbitrarily many optimal solutions, making such prediction-dependent bounds arbitrarily large. To resolve this theoretically critical issue, we present a new warm-start-with-prediction framework for $\text{L}$-/$\text{L}^\natural$-convex function minimization. Our framework offers time complexity bounds proportional to the distance between a prediction and the set of all optimal solutions. The main technical difficulty lies in learning predictions that are provably close to sets of all optimal solutions, for which we present an online-gradient-descent-based method. We thus give the first polynomial-time learnability result for predictions that can provably warm-start algorithms regardless of multiple optimal solutions.
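A schematic version of the online-gradient-descent idea: model each instance's optimal-solution set as a convex set (a box here, purely for illustration) and descend on the Euclidean distance-to-set loss, whose subgradient points from the set's projection toward the prediction. The sets, norm, and step sizes are illustrative assumptions, not the paper's exact setting.

```python
import numpy as np

rng = np.random.default_rng(7)

p = np.zeros(4)                 # the prediction being learned
for t in range(300):
    # Optimal-solution set of instance t, modeled as a random box.
    lo = rng.normal(size=4)
    hi = lo + 1.0
    proj = np.clip(p, lo, hi)   # Euclidean projection onto the box
    dist = np.linalg.norm(p - proj)
    if dist > 0:                # subgradient of p -> dist(p, set)
        p -= 0.5 / np.sqrt(t + 1) * (p - proj) / dist
print(np.round(p, 2))           # settles near points common to many boxes
```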
Beyond Adaptive Submodularity: Approximation Guarantees of Greedy Policy with Adaptive Submodularity Ratio
Fujii, Kaito, Sakaue, Shinsaku
We propose a new concept named adaptive submodularity ratio to study the greedy policy for sequential decision making. While the greedy policy is known to perform well for a wide variety of adaptive stochastic optimization problems in practice, its theoretical properties have been analyzed only for a limited class of problems. We narrow the gap between theory and practice by using the adaptive submodularity ratio, which enables us to prove approximation guarantees of the greedy policy for a substantially wider class of problems. Examples of newly analyzed problems include important applications such as adaptive influence maximization and adaptive feature selection. Our adaptive submodularity ratio also provides bounds on adaptivity gaps. Experiments confirm that, on the applications considered, the greedy policy performs well compared with standard heuristics.
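For concreteness, here is a generic adaptive greedy policy on a toy stochastic-coverage instance: at each step it picks the item with the largest Monte-Carlo estimate of expected marginal gain given the states observed so far; the instance, success probabilities, and sample counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)

# Toy instance: each item succeeds with its own probability and, on
# success, covers a fixed set of elements; f counts covered elements.
covers = {0: {0, 1}, 1: {1, 2}, 2: {3}, 3: {0, 3}}
prob = {0: 0.9, 1: 0.5, 2: 0.8, 3: 0.3}
f = lambda st: len(set().union(*(covers[i] for i, ok in st.items() if ok)))
sample_state = lambda i: bool(rng.random() < prob[i])

def adaptive_greedy(items, budget, n_mc=200):
    """Pick, at each step, the item with the largest Monte-Carlo estimate
    of expected marginal gain given the states observed so far."""
    states = {}
    for _ in range(budget):
        best, best_gain = None, -np.inf
        for i in items:
            if i in states:
                continue
            gain = np.mean([f({**states, i: sample_state(i)}) - f(states)
                            for _ in range(n_mc)])
            if gain > best_gain:
                best, best_gain = i, gain
        states[best] = sample_state(best)   # observe the realized state
    return states

print(adaptive_greedy(list(covers), budget=2))
```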