AITopics | Takeda, Akiko

Collaborating Authors

Takeda, Akiko

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Zeroth-Order Methods for Nonconvex Stochastic Problems with Decision-Dependent Distributions

Hikima, Yuya, Takeda, Akiko

arXiv.org Artificial IntelligenceDec-28-2024

In this study, we consider an optimization problem with uncertainty dependent on decision variables, which has recently attracted attention due to its importance in machine learning and pricing applications. In this problem, the gradient of the objective function cannot be obtained explicitly because the decision-dependent distribution is unknown. Therefore, several zeroth-order methods have been proposed, which obtain noisy objective values by sampling and update the iterates. Although these existing methods have theoretical convergence for optimization problems with decision-dependent uncertainty, they require strong assumptions about the function and distribution or exhibit large variances in their gradient estimators. To overcome these issues, we propose two zeroth-order methods under mild assumptions. First, we develop a zeroth-order method with a new one-point gradient estimator including a variance reduction parameter. The proposed method updates the decision variables while adjusting the variance reduction parameter. Second, we develop a zeroth-order method with a two-point gradient estimator. There are situations where only one-point estimators can be used, but if both one-point and two-point estimators are available, it is more practical to use the two-point estimator. As theoretical results, we show the convergence of our methods to stationary points and provide the worst-case iteration and sample complexity analysis. Our simulation experiments with real data on a retail service application show that our methods output solutions with lower objective values than the conventional zeroth-order methods.

artificial intelligence, gradient estimator, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2412.2033

Country: Asia > Japan (0.14)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.86)

Add feedback

SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining

Han, Andi, Li, Jiaxiang, Huang, Wei, Hong, Mingyi, Takeda, Akiko, Jawanpuria, Pratik, Mishra, Bamdev

arXiv.org Artificial IntelligenceJun-4-2024

Large language models (LLMs) have shown impressive capabilities across various tasks. However, training LLMs from scratch requires significant computational power and extensive memory capacity. Recent studies have explored low-rank structures on weights for efficient fine-tuning in terms of parameters and memory, either through low-rank adaptation or factorization. While effective for fine-tuning, low-rank structures are generally less suitable for pretraining because they restrict parameters to a low-dimensional subspace. In this work, we propose to parameterize the weights as a sum of low-rank and sparse matrices for pretraining, which we call SLTrain. The low-rank component is learned via matrix factorization, while for the sparse component, we employ a simple strategy of uniformly selecting the sparsity support at random and learning only the non-zero entries with the fixed support. While being simple, the random fixed-support sparse learning strategy significantly enhances pretraining when combined with low-rank learning. Our results show that SLTrain adds minimal extra parameters and memory costs compared to pretraining with low-rank parameterization, yet achieves substantially better performance, which is comparable to full-rank training. Remarkably, when combined with quantization and per-layer updates, SLTrain can reduce memory requirements by up to 73% when pretraining the LLaMA 7B model.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2406.02214

Country:

Asia (0.14)
North America > United States (0.14)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

A Framework for Bilevel Optimization on Riemannian Manifolds

Han, Andi, Mishra, Bamdev, Jawanpuria, Pratik, Takeda, Akiko

arXiv.org Artificial IntelligenceFeb-6-2024

Bilevel optimization has seen an increasing presence in various domains of applications. In this work, we propose a framework for solving bilevel optimization problems where variables of both lower and upper level problems are constrained on Riemannian manifolds. We provide several hypergradient estimation strategies on manifolds and study their estimation error. We provide convergence and complexity analysis for the proposed hypergradient descent algorithm on manifolds. We also extend the developments to stochastic bilevel optimization and to the use of general retraction. We showcase the utility of the proposed framework on various applications.

artificial intelligence, machine learning, optimization, (17 more...)

arXiv.org Artificial Intelligence

2402.03883

Country:

Asia (0.14)
Europe (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Add feedback

Theory and Algorithms for Shapelet-based Multiple-Instance Learning

Suehiro, Daiki, Hatano, Kohei, Takimoto, Eiji, Yamamoto, Shuji, Bannai, Kenichi, Takeda, Akiko

arXiv.org Machine LearningOct-13-2020

We propose a new formulation of Multiple-Instance Learning (MIL), in which a unit of data consists of a set of instances called a bag. The goal is to find a good classifier of bags based on the similarity with a "shapelet" (or pattern), where the similarity of a bag with a shapelet is the maximum similarity of instances in the bag. In previous work, some of the training instances are chosen as shapelets with no theoretical justification. In our formulation, we use all possible, and thus infinitely many shapelets, resulting in a richer class of classifiers. We show that the formulation is tractable, that is, it can be reduced through Linear Programming Boosting (LPBoost) to Difference of Convex (DC) programs of finite (actually polynomial) size. Our theoretical result also gives justification to the heuristics of some of the previous work. The time complexity of the proposed algorithm highly depends on the size of the set of all instances in the training sample. To apply to the data containing a large number of instances, we also propose a heuristic option of the algorithm without the loss of the theoretical guarantee. Our empirical study demonstrates that our algorithm uniformly works for Shapelet Learning tasks on time-series classification and various MIL tasks with comparable accuracy to the existing methods. Moreover, we show that the proposed heuristics allow us to achieve the result with reasonable computational time.

artificial intelligence, optimization problem, shapelet, (16 more...)

arXiv.org Machine Learning

doi: 10.1162/neco_a_01297

2006.0113

Country: Asia > Japan (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)

Add feedback

Trimmed Density Ratio Estimation

Liu, Song, Takeda, Akiko, Suzuki, Taiji, Fukumizu, Kenji

Neural Information Processing SystemsFeb-14-2020, 15:41:29 GMT

Density ratio estimation is a vital tool in both machine learning and statistical community. However, due to the unbounded nature of density ratio, the estimation proceudre can be vulnerable to corrupted data points, which often pushes the estimated ratio toward infinity. In this paper, we present a robust estimator which automatically identifies and trims outliers. The proposed estimator has a convex formulation, and the global optimum can be obtained via subgradient descent. We analyze the parameter estimation error of this estimator under high-dimensional settings.

artificial intelligence, density ratio estimation, machine learning, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.71)

Add feedback

Multiple-Instance Learning by Boosting Infinitely Many Shapelet-based Classifiers

Suehiro, Daiki, Hatano, Kohei, Takimoto, Eiji, Yamamoto, Shuji, Bannai, Kenichi, Takeda, Akiko

arXiv.org Machine LearningDec-9-2018

We propose a new formulation of Multiple-Instance Learning (MIL). In typical MIL settings, a unit of data is given as a set of instances called a bag and the goal is to find a good classifier of bags based on similarity from a single or finitely many "shapelets" (or patterns), where the similarity of the bag from a shapelet is the maximum similarity of instances in the bag. Classifiers based on a single shapelet are not sufficiently strong for certain applications. Additionally, previous work with multiple shapelets has heuristically chosen some of the instances as shapelets with no theoretical guarantee of its generalization ability. Our formulation provides a richer class of the final classifiers based on infinitely many shapelets. We provide an efficient algorithm for the new formulation, in addition to generalization bound. Our empirical study demonstrates that our approach is effective not only for MIL tasks but also for Shapelet Learning for time-series classification.

artificial intelligence, machine learning, shapelet, (16 more...)

arXiv.org Machine Learning

1811.08084

Country:

North America > United States (0.46)
Asia > Japan > Kyūshū & Okinawa > Kyūshū (0.14)
Asia > Japan > Honshū > Kantō (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Robust Bayesian Model Selection for Variable Clustering with the Gaussian Graphical Model

Andrade, Daniel, Takeda, Akiko, Fukumizu, Kenji

arXiv.org Machine LearningJun-15-2018

Variable clustering is important for explanatory analysis. However, only few dedicated methods for variable clustering with the Gaussian graphical model have been proposed. Even more severe, small insignificant partial correlations due to noise can dramatically change the clustering result when evaluating for example with the Bayesian Information Criteria (BIC). In this work, we try to address this issue by proposing a Bayesian model that accounts for negligible small, but not necessarily zero, partial correlations. Based on our model, we propose to evaluate a variable clustering result using the marginal likelihood. To address the intractable calculation of the marginal likelihood, we propose two solutions: one based on a variational approximation, and another based on MCMC. Experiments on simulated data shows that the proposed method is similarly accurate as BIC in the no noise setting, but considerably more accurate when there are noisy partial correlations. Furthermore, on real data the proposed method provides clustering results that are intuitively sensible, which is not always the case when using BIC or its extensions.

anmi 0, bayesian inference, health & medicine, (18 more...)

arXiv.org Machine Learning

1806.05924

Country:

Asia (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (0.67)

Industry:

Banking & Finance > Trading (0.47)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Trimmed Density Ratio Estimation

Liu, Song, Takeda, Akiko, Suzuki, Taiji, Fukumizu, Kenji

Neural Information Processing SystemsDec-31-2017

artificial intelligence, estimator, optimization problem, (16 more...)

Neural Information Processing Systems

Country:

Asia > Japan (0.14)
North America > United States (0.14)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

Position-based Multiple-play Bandit Problem with Unknown Position Bias

Komiyama, Junpei, Honda, Junya, Takeda, Akiko

Neural Information Processing SystemsDec-31-2017

Motivated by online advertising, we study a multiple-play multi-armed bandit problem with position bias that involves several slots and the latter slots yield fewer rewards. We characterize the hardness of the problem by deriving an asymptotic regret bound. We propose the Permutation Minimum Empirical Divergence (PMED) algorithm and derive its asymptotically optimal regret bound. Because of the uncertainty of the position bias, the optimal algorithm for such a problem requires non-convex optimizations that are different from usual partial monitoring and semi-bandit problems. We propose a cutting-plane method and related bi-convex relaxation for these optimizations by using auxiliary variables.

algorithm, artificial intelligence, big data, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Industry:

Marketing (0.89)
Information Technology > Services (0.34)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Optimistic Robust Optimization With Applications To Machine Learning

Norton, Matthew, Takeda, Akiko, Mafusalov, Alexander

arXiv.org Machine LearningNov-20-2017

Robust Optimization has traditionally taken a pessimistic, or worst-case viewpoint of uncertainty which is motivated by a desire to find sets of optimal policies that maintain feasibility under a variety of operating conditions. In this paper, we explore an optimistic, or best-case view of uncertainty and show that it can be a fruitful approach. We show that these techniques can be used to address a wide variety of problems. First, we apply our methods in the context of robust linear programming, providing a method for reducing conservatism in intuitive ways that encode economically realistic modeling assumptions. Second, we look at problems in machine learning and find that this approach is strongly connected to the existing literature. Specifically, we provide a new interpretation for popular sparsity inducing non-convex regularization schemes. Additionally, we show that successful approaches for dealing with outliers and noise can be interpreted as optimistic robust optimization problems. Although many of the problems resulting from our approach are non-convex, we find that DCA or DCA-like optimization approaches can be intuitive and efficient.

application, artificial intelligence, optimization problem, (14 more...)

arXiv.org Machine Learning

1711.07511

Country: North America > United States (0.46)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback