Terada, Yoshikazu
Nonparametric logistic regression with deep learning
Yara, Atsutomo, Terada, Yoshikazu
Consider the nonparametric logistic regression problem. In logistic regression, we usually consider the maximum likelihood estimator, and the excess risk is the expectation of the Kullback-Leibler (KL) divergence between the true and estimated conditional class probabilities. However, in nonparametric logistic regression, the KL divergence can easily diverge, so convergence of the excess risk is difficult to prove or may not hold at all. Several existing studies show convergence of the KL divergence only under strong assumptions. In most cases, our goal is to estimate the true conditional class probabilities; thus, instead of analyzing the excess risk itself, it suffices to show the consistency of the maximum likelihood estimator in a suitable metric. In this paper, using a simple unified approach for analyzing the nonparametric maximum likelihood estimator (NPMLE), we directly derive convergence rates of the NPMLE in the Hellinger distance under mild assumptions. Although our results are similar to those in some existing studies, we provide simpler and more direct proofs. As an important application, we derive convergence rates of the NPMLE with deep neural networks and show that the derived rate nearly achieves the minimax optimal rate.
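For the binary case, the two discrepancies compared in this abstract have standard closed forms (stated here with the usual definitions; $p(x)$ denotes the true conditional class probability and $\hat p(x)$ its estimate):
\[
\mathrm{KL}\big(p(x)\,\|\,\hat p(x)\big) = p(x)\log\frac{p(x)}{\hat p(x)} + \big(1-p(x)\big)\log\frac{1-p(x)}{1-\hat p(x)},
\]
\[
H^2\big(p(x),\hat p(x)\big) = \frac{1}{2}\Big[\Big(\sqrt{p(x)}-\sqrt{\hat p(x)}\Big)^2 + \Big(\sqrt{1-p(x)}-\sqrt{1-\hat p(x)}\Big)^2\Big],
\]
where the squared Hellinger distance uses the common $1/2$ normalization so that $H^2 \le 1$. The KL term blows up as $\hat p(x)$ approaches $0$ or $1$ while $p(x)$ does not, whereas the Hellinger distance stays bounded, which is why convergence in the Hellinger distance can be established under milder assumptions.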
Convex Clustering through MM: An Efficient Algorithm to Perform Hierarchical Clustering
Touw, Daniel J. W., Groenen, Patrick J. F., Terada, Yoshikazu
Convex clustering is a modern method with characteristics of both hierarchical and $k$-means clustering. Although convex clustering can capture complex clustering structures hidden in data, the existing convex clustering algorithms do not scale to large data sets with sample sizes greater than several thousand. Moreover, it is known that convex clustering sometimes fails to produce a complete hierarchical clustering structure; this issue arises when clusters split up or when the minimum number of possible clusters is larger than the desired number of clusters. In this paper, we propose convex clustering through majorization-minimization (CCMM) -- an iterative algorithm that uses cluster fusions and a highly efficient updating scheme derived using diagonal majorization. Additionally, we explore different strategies to ensure that the hierarchical clustering structure terminates in a single cluster. On a current desktop computer, CCMM efficiently solves convex clustering problems featuring over one million objects in seven-dimensional space, achieving a solution time of 51 seconds on average.
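As context, a common formulation of the convex clustering problem (not necessarily the exact objective solved by CCMM) estimates a centroid $u_i$ for each object $x_i$ by minimizing
\[
\frac{1}{2}\sum_{i=1}^{n}\|x_i - u_i\|_2^2 \;+\; \lambda \sum_{i<j} w_{ij}\,\|u_i - u_j\|_2 ,
\]
where $\lambda \ge 0$ and the weights $w_{ij} \ge 0$ control the fusion penalty. As $\lambda$ increases, centroids fuse, which produces the (ideally hierarchical) clustering path discussed above.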
More Powerful Selective Kernel Tests for Feature Selection
Lim, Jen Ning, Yamada, Makoto, Jitkrittum, Wittawat, Terada, Yoshikazu, Matsui, Shigeyuki, Shimodaira, Hidetoshi
Refining one's hypotheses in the light of data is a commonplace scientific practice; however, this approach introduces selection bias and can lead to specious statistical analysis. One approach to addressing this phenomenon is to condition on the selection procedure, i.e., on how we have used the data to generate our hypotheses, which prevents that information from being used again after selection. Many selective inference (a.k.a. post-selection inference) algorithms take this approach but "over-condition" for the sake of tractability. While this practice yields well-calibrated $p$-values, it can incur a major loss in power. In our work, we extend two recent proposals for selecting features using the Maximum Mean Discrepancy and the Hilbert-Schmidt Independence Criterion to condition on the minimal conditioning event. We show how recent advances in the multiscale bootstrap make conditioning on the minimal selection event possible and demonstrate our proposal over a range of synthetic and real-world experiments. Our results show that our proposed test is indeed more powerful in most scenarios.
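For reference, the first of the two kernel criteria mentioned above has the standard population definition (stated generically for a kernel $k$ and distributions $P$, $Q$; this is not specific to the selective test proposed in the paper):
\[
\mathrm{MMD}^2(P,Q) = \mathbb{E}\big[k(X,X')\big] + \mathbb{E}\big[k(Y,Y')\big] - 2\,\mathbb{E}\big[k(X,Y)\big],
\]
with $X, X' \sim P$ and $Y, Y' \sim Q$ independent; HSIC is the analogous kernel measure of dependence between two random variables.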
Fast generalization error bound of deep learning without scale invariance of activation functions
Terada, Yoshikazu, Hirose, Ryoma
In the theoretical analysis of deep learning, discovering which features of deep learning lead to good performance is an important task. In this paper, using the framework for analyzing the generalization error developed in Suzuki (2018), we derive a fast learning rate for deep neural networks with more general activation functions. In Suzuki (2018), a tight generalization error bound for deep learning was derived under the assumption that the activation functions are scale invariant, and the authors mention that this scale invariance is essential for deriving tight error bounds. Whereas the rectified linear unit (ReLU; Nair and Hinton, 2010) satisfies the scale invariance, other popular activation functions, including the sigmoid, the hyperbolic tangent, and the exponential linear unit (ELU; Clevert et al., 2016), do not satisfy this condition. The existing analysis thus indicates the possibility that deep learning with non-scale-invariant activations may converge only at the slower rate $O(1/\sqrt{n})$, while deep learning with scale-invariant activations can reach a rate faster than $O(1/\sqrt{n})$. In this paper, without assuming the scale invariance of activation functions, we derive a tight generalization error bound that is essentially the same as that of Suzuki (2018). From this result, at least within the framework of Suzuki (2018), the scale invariance of the activation functions is shown to be not essential for obtaining the fast convergence rate. At the same time, it is also shown that the theoretical framework proposed by Suzuki (2018) can be widely applied to the analysis of deep learning with general activation functions.
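For clarity, the scale invariance discussed above is standard terminology (positive homogeneity) rather than a notion specific to this paper: an activation function $\sigma$ is scale invariant if
\[
\sigma(a x) = a\,\sigma(x) \quad \text{for all } a > 0 \text{ and } x \in \mathbb{R},
\]
which holds for $\mathrm{ReLU}(x) = \max(x, 0)$ but fails for the sigmoid, the hyperbolic tangent, and the ELU.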
Clustering for high-dimension, low-sample size data using distance vectors
Terada, Yoshikazu
In high-dimension, low-sample size (HDLSS) data, it is not always true that closeness of two objects reflects a hidden cluster structure. We point out the important fact that it is not the closeness but the "values" of the distances that contain the information about the cluster structure in high-dimensional space. Based on this fact, we propose an efficient and simple clustering approach, called distance vector clustering, for HDLSS data. Under the asymptotic setting of Hall et al. (2005), we show that the proposed approach recovers the true cluster labels under milder conditions when the dimension tends to infinity with the sample size fixed. The effectiveness of the distance vector clustering approach is illustrated through a numerical experiment and real data analysis. A minimal code sketch of the distance-vector idea is given below.
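To make the idea concrete, here is a minimal Python sketch of one way to cluster objects via their distance vectors. The function name distance_vector_clustering and the use of $k$-means on the rows of the distance matrix are illustrative assumptions, not the exact procedure of the paper.

    import numpy as np
    from scipy.spatial.distance import pdist, squareform
    from sklearn.cluster import KMeans

    def distance_vector_clustering(X, n_clusters=2, random_state=0):
        # Pairwise Euclidean distances between the n objects (rows of X).
        D = squareform(pdist(X, metric="euclidean"))
        # Use each row of D (the distance vector of one object) as that
        # object's representation and cluster these vectors: the point is
        # that the *values* of the distances, not mere closeness, carry
        # the cluster information in high dimensions.
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
        return km.fit_predict(D)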