Matsushima, Shin
Detection of Unobserved Common Causes based on NML Code in Discrete, Mixed, and Continuous Variables
Kobayashi, Masatoshi, Miyaguchi, Kohei, Matsushima, Shin
Causal discovery in the presence of unobserved common causes from observational data alone is a crucial but challenging problem. We categorize all possible causal relationships between two random variables into four categories and aim to identify one of them from observed data: two cases in which direct causality runs in one direction or the other, a case in which the variables are independent, and a case in which they are confounded by latent confounders. Although existing methods have been proposed to tackle this problem, they require unobserved variables to satisfy assumptions on the form of their equation models. In our previous study (Kobayashi et al., 2022), we proposed CLOUD, the first causal discovery method free of such assumptions, for discrete data. Using the Normalized Maximum Likelihood (NML) code, CLOUD selects the model that yields the minimum codelength of the observed data from a set of model candidates. This paper extends CLOUD to various data types: discrete, mixed, and continuous. We not only provide a theoretical analysis showing the consistency of CLOUD as a model selection method, but also demonstrate through extensive experiments on both synthetic and real-world data that CLOUD infers causal relationships more effectively than existing methods.
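To illustrate the codelength-based selection principle behind CLOUD, the following Python sketch computes NML codelengths for discrete data and picks the shorter description between coding two variables independently and coding them jointly. This is a minimal illustration only: the helpers multinomial_complexity and nml_codelength are written here for this example, and the two candidate models shown do not correspond to CLOUD's four-way categorization.

import numpy as np
from math import comb, log

def multinomial_complexity(k, n):
    # Parametric complexity C(k, n) of a k-category multinomial over n samples,
    # via the linear-time recursion of Kontkanen and Myllymaki (2007).
    if k == 1:
        return 1.0
    c_prev = 1.0  # C(1, n)
    c_curr = sum(comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
                 for h in range(n + 1))  # C(2, n)
    for j in range(3, k + 1):
        c_prev, c_curr = c_curr, c_curr + n * c_prev / (j - 2)
    return c_curr

def nml_codelength(counts):
    # NML codelength (in nats) of a discrete sequence summarized by its counts:
    # -log P(data | ML parameters) + log C(k, n).
    counts = np.asarray(counts, dtype=float)
    n, k = counts.sum(), len(counts)
    nz = counts[counts > 0]
    return -(nz * np.log(nz / n)).sum() + log(multinomial_complexity(k, int(n)))

rng = np.random.default_rng(0)
n = 200
x = rng.integers(0, 3, size=n)
y = (x + rng.integers(0, 2, size=n)) % 3  # y depends on x

len_indep = nml_codelength(np.bincount(x, minlength=3)) + \
            nml_codelength(np.bincount(y, minlength=3))
len_joint = nml_codelength(np.bincount(3 * x + y, minlength=9))
print("independent model:", round(len_indep, 1), "nats")
print("joint model:      ", round(len_joint, 1), "nats")  # the shorter code wins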
Statistical Learnability of Generalized Additive Models based on Total Variation Regularization
Matsushima, Shin
A generalized additive model (GAM; Hastie and Tibshirani, 1987) is a nonparametric model expressed as a sum of univariate functions of the explanatory variables, i.e., $f({\mathbf x}) = \sum_j f_j(x_j)$, where $x_j\in\mathbb{R}$ is the $j$-th component of a sample ${\mathbf x}\in \mathbb{R}^p$. In this paper, we introduce the total variation (TV) of a function as a measure of the complexity of functions in the $L^1_{\rm c}(\mathbb{R})$-space. Our analysis shows that a GAM based on TV regularization exhibits a Rademacher complexity of $O(\sqrt{\frac{\log p}{m}})$, which is tight in both $m$ and $p$ in the agnostic case of the classification problem. As a result, we obtain generalization error bounds for finite samples following the work of Bartlett and Mendelson (2002).
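For concreteness, one standard way to formalize the TV-constrained additive class studied here (the exact normalization used in the paper may differ) is
$$\mathcal{F}_C = \Big\{ f(\mathbf{x}) = \sum_{j=1}^{p} f_j(x_j) \;:\; \sum_{j=1}^{p} \mathrm{TV}(f_j) \le C \Big\}, \qquad \mathrm{TV}(f_j) = \sup_{x^{(0)} < \cdots < x^{(T)}} \sum_{t=1}^{T} \big| f_j(x^{(t)}) - f_j(x^{(t-1)}) \big|,$$
so that a Rademacher complexity of order $\sqrt{\frac{\log p}{m}}$ (up to the constraint level $C$), combined with the standard Rademacher-complexity bounds of Bartlett and Mendelson (2002), yields a generalization error bound that grows only logarithmically in the number of explanatory variables $p$.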
DS-MLR: Exploiting Double Separability for Scaling up Distributed Multinomial Logistic Regression
Raman, Parameswaran, Srinivasan, Sriram, Matsushima, Shin, Zhang, Xinhua, Yun, Hyokun, Vishwanathan, S. V. N.
Scaling multinomial logistic regression to datasets with a very large number of data points and classes is not trivial, primarily because the log-partition function must be computed for every data point, which makes distributing the computation hard. In this paper, we present a distributed stochastic-gradient-descent-based optimization method (DS-MLR) for scaling multinomial logistic regression to massive datasets without hitting any storage constraints on the data and model parameters. Our algorithm exploits double separability, an attractive property we observe in the objective functions of several machine learning models, which allows us to achieve both data and model parallelism simultaneously. In addition to being parallelizable, our algorithm can also easily be made non-blocking and asynchronous. We demonstrate the effectiveness of DS-MLR empirically on several real-world datasets, the largest being a Reddit dataset created from 1.7 billion user comments, where the data and parameter sizes are 228 GB and 358 GB, respectively.
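As an illustration of the kind of reformulation that yields double separability (the paper's exact formulation may differ in details), the log-partition term of multinomial logistic regression admits the variational form
$$\log \sum_{k=1}^{K} \exp(\mathbf{w}_k^{\top} \mathbf{x}_i) \;=\; \min_{b_i \in \mathbb{R}} \Big\{ b_i - 1 + \sum_{k=1}^{K} \exp\big(\mathbf{w}_k^{\top} \mathbf{x}_i - b_i\big) \Big\},$$
so that, after introducing one auxiliary variable $b_i$ per data point, the objective becomes a double sum over pairs $(i, k)$ in which each term touches only one data point and one class-weight vector; disjoint blocks of such pairs can then be updated in parallel, which is what enables data and model parallelism simultaneously.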
Grafting for Combinatorial Boolean Model using Frequent Itemset Mining
Lee, Taito, Matsushima, Shin, Yamanishi, Kenji
This paper introduces the combinatorial Boolean model (CBM), defined as the class of linear combinations of conjunctions of Boolean attributes, and addresses the problem of learning a CBM from labeled data. CBM offers high knowledge interpretability, but na\"{i}ve learning of it requires computation time that is exponentially large with respect to the data dimension and sample size. To overcome this computational difficulty, we propose GRAB (GRAfting for Boolean datasets), an algorithm that efficiently learns CBM within the $L_1$-regularized loss minimization framework. The key idea of GRAB is to reduce the loss minimization problem to weighted frequent itemset mining, in which frequent patterns are efficiently computable. We use benchmark datasets to demonstrate empirically that GRAB is effective in terms of computational efficiency, prediction accuracy, and knowledge discovery.
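The following Python sketch illustrates the CBM prediction rule and a single grafting step under a squared loss. It is illustrative only: the brute-force enumeration of candidate conjunctions stands in for the weighted frequent itemset mining that GRAB actually performs, and the helper names are chosen for this example.

import itertools
import numpy as np

def cbm_predict(X, weights):
    # CBM score: a linear combination of conjunctions of Boolean attributes.
    # `weights` maps a tuple of attribute indices (a conjunction) to its weight.
    scores = np.zeros(X.shape[0])
    for conj, w in weights.items():
        scores += w * X[:, list(conj)].all(axis=1)
    return scores

def best_new_conjunction(X, y, weights, max_order=2):
    # One grafting step (illustrative brute force): among conjunctions not yet
    # in the model, return the one whose weight has the largest absolute
    # gradient of the squared loss; GRAB finds it via weighted frequent
    # itemset mining instead of enumerating all candidates.
    residual = y - cbm_predict(X, weights)
    best, best_grad = None, 0.0
    for r in range(1, max_order + 1):
        for conj in itertools.combinations(range(X.shape[1]), r):
            if conj in weights:
                continue
            g = abs(X[:, list(conj)].all(axis=1) @ residual) / len(y)
            if g > best_grad:
                best, best_grad = conj, g
    return best, best_grad

X = np.random.default_rng(0).integers(0, 2, size=(100, 6)).astype(bool)
y = (X[:, 0] & X[:, 2]).astype(float)  # target generated by one conjunction
y -= y.mean()                          # center so the true conjunction stands out
print(best_new_conjunction(X, y, weights={}))  # expected to recover (0, 2)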
WordRank: Learning Word Embeddings via Robust Ranking
Ji, Shihao, Yun, Hyokun, Yanardag, Pinar, Matsushima, Shin, Vishwanathan, S. V. N.
Embedding words in a vector space has gained much attention in recent years. While state-of-the-art methods provide efficient computation of word similarities via low-dimensional matrix embeddings, their motivation is often left unclear. In this paper, we argue that word embedding can be naturally viewed as a ranking problem due to the ranking nature of the evaluation metrics. Based on this insight, we propose WordRank, a novel framework that efficiently estimates word representations via robust ranking, in which an attention mechanism and robustness to noise are readily achieved via DCG-like ranking losses. The performance of WordRank is measured on word similarity and word analogy benchmarks, and the results are compared to state-of-the-art word embedding techniques. Our algorithm is very competitive with the state of the art on large corpora, and outperforms it by a significant margin when the training set is limited (i.e., sparse and noisy). With 17 million tokens, WordRank performs almost as well as existing methods using 7.2 billion tokens on a popular word similarity benchmark. Our multi-node distributed implementation of WordRank is publicly available for general use.
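Schematically, such a robust ranking objective scores how well each observed word-context pair is ranked against all other contexts and applies a concave, DCG-like transformation; the notation below is illustrative rather than the paper's exact objective:
$$J(\mathbf{U}, \mathbf{V}) = \sum_{(w,c)\in\Omega} r_{w,c}\, \rho\big(\mathrm{rank}(w,c)\big), \qquad \mathrm{rank}(w,c) = \sum_{c' \neq c} \mathbb{I}\big[\langle \mathbf{u}_w, \mathbf{v}_{c'} \rangle \ge \langle \mathbf{u}_w, \mathbf{v}_{c} \rangle\big],$$
where $\Omega$ is the set of observed word-context pairs, $r_{w,c}$ weights a pair by its co-occurrence strength, and a concave $\rho$ such as $\rho(t) = \log_2(1+t)$ caps the influence of pairs that are ranked very low (robustness to noise); for optimization the indicator is replaced by a convex surrogate.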
Distributed Stochastic Optimization of the Regularized Risk
Matsushima, Shin, Yun, Hyokun, Zhang, Xinhua, Vishwanathan, S. V. N.
Many machine learning algorithms minimize a regularized risk, and stochastic optimization is widely used for this task. When working with massive data, it is desirable to perform stochastic optimization in parallel. Unfortunately, many existing stochastic optimization algorithms cannot be parallelized efficiently. In this paper we show that one can rewrite the regularized risk minimization problem as an equivalent saddle-point problem, and propose an efficient distributed stochastic optimization (DSO) algorithm. We prove the algorithm's rate of convergence; remarkably, our analysis shows that the algorithm scales almost linearly with the number of processors. We also verify through empirical evaluations that the proposed algorithm is competitive with other parallel, general-purpose stochastic and batch optimization algorithms for regularized risk minimization.
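One standard way to write such a saddle-point reformulation (the paper's formulation may differ in details) uses the convex conjugate $\ell^{\star}$ of a loss that is convex and closed in its first argument:
$$\min_{\mathbf{w}} \; \lambda\,\Omega(\mathbf{w}) + \frac{1}{m}\sum_{i=1}^{m} \ell(\mathbf{w}^{\top}\mathbf{x}_i, y_i) \;=\; \min_{\mathbf{w}} \max_{\mathbf{u}\in\mathbb{R}^{m}} \; \lambda\,\Omega(\mathbf{w}) + \frac{1}{m}\sum_{i=1}^{m} \Big( u_i \sum_{j=1}^{d} w_j x_{ij} - \ell^{\star}(u_i, y_i) \Big).$$
Because the bilinear term decomposes over (sample, coordinate) pairs $(i, j)$, disjoint blocks of samples and coordinates can be updated simultaneously on different processors, which underlies the near-linear scaling with the number of processors.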