Oono, Kenta
Virtual Human Generative Model: Masked Modeling Approach for Learning Human Characteristics
Oono, Kenta, Charoenphakdee, Nontawat, Bito, Kotatsu, Gao, Zhengyan, Ota, Yoshiaki, Yamaguchi, Shoichiro, Sugawara, Yohei, Maeda, Shin-ichi, Miyoshi, Kunihiko, Saito, Yuki, Tsuda, Koki, Maruyama, Hiroshi, Hayashi, Kohei
Identifying the relationships among healthcare attributes, lifestyles, and personality is vital for understanding and improving physical and mental conditions. Machine learning approaches are promising for modeling these relationships and offering actionable suggestions. In this paper, we propose the Virtual Human Generative Model (VHGM), a machine learning model for estimating healthcare, lifestyle, and personality attributes. VHGM is a deep generative model trained with masked modeling to learn the joint distribution of attributes conditioned on known ones. Using heterogeneous tabular datasets, VHGM efficiently learns more than 1,800 attributes. We numerically evaluate the performance of VHGM and its training techniques. As a proof of concept of VHGM, we present several applications demonstrating user scenarios, such as virtual measurement of healthcare attributes and hypothesis verification of lifestyles.
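To make the masked-modeling idea concrete, here is a minimal, self-contained PyTorch sketch that trains a toy reconstructor to fill in randomly masked numeric attributes from the observed ones; the network, the 32-attribute synthetic table, and the masking rate are illustrative assumptions, not the actual VHGM architecture or data.

    # Minimal masked-modeling sketch on synthetic tabular data (not the actual VHGM).
    import torch
    import torch.nn as nn

    n_attrs = 32  # stand-in for the ~1,800 real attributes
    model = nn.Sequential(  # toy reconstructor: (observed values, mask) -> all values
        nn.Linear(2 * n_attrs, 256), nn.ReLU(), nn.Linear(256, n_attrs)
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    for step in range(100):
        x = torch.randn(64, n_attrs)               # synthetic numeric attributes
        mask = (torch.rand_like(x) < 0.3).float()  # 1 = hidden, 0 = observed
        pred = model(torch.cat([x * (1 - mask), mask], dim=1))
        loss = ((pred - x) ** 2 * mask).sum() / mask.sum().clamp(min=1)
        opt.zero_grad()
        loss.backward()
        opt.step()

At inference time, the same kind of model can be queried with any subset of known attributes by setting the mask accordingly, mirroring the conditional-estimation usage described above.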
Controlling Posterior Collapse by an Inverse Lipschitz Constraint on the Decoder Network
Kinoshita, Yuri, Oono, Kenta, Fukumizu, Kenji, Yoshida, Yuichi, Maeda, Shin-ichi
However, in practice, they suffer from a problem called posterior collapse, which occurs when the encoder coincides, or collapses, with the prior, taking no information from the latent structure of the input data into consideration. In this work, we introduce an inverse Lipschitz neural network into the decoder and, based on this architecture, provide a new method that can control in a simple and clear manner the degree of posterior collapse for a wide range of VAE models equipped with a concrete theoretical guarantee. We also illustrate the effectiveness of our method through several numerical experiments.
While VAEs are nowadays omnipresent in the field of machine learning, it is also widely recognized that there remain in practice some major challenges that still require effective solutions. Notably, they suffer from the problem of posterior collapse, which occurs when the distribution corresponding to the encoder coincides, or collapses, with the prior, taking no information from the latent structure of the input data into consideration. Also known as KL vanishing or over-pruning, this phenomenon makes VAEs incapable of producing pertinent representations and has been reportedly observed in many fields (e.g., Bowman et al. (2016); Fu et al. (2019); Wang & Ziyin (2022); Yeung et al. (2017)). There exists now a large body of literature that examines its underlying ...
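As a rough illustration of the architectural ingredient, the sketch below builds an inverse Lipschitz map as g(z) = L z + r(z) with a spectrally normalized (hence roughly 1-Lipschitz) residual branch, which guarantees ||g(z1) - g(z2)|| >= (L - 1) ||z1 - z2||. This is a generic construction written in PyTorch for illustration, not necessarily the paper's exact decoder parameterization.

    # Hypothetical inverse-Lipschitz block: g(z) = L*z + r(z) with Lip(r) <= 1 < L.
    import torch
    import torch.nn as nn
    from torch.nn.utils import spectral_norm

    class InverseLipschitzBlock(nn.Module):
        def __init__(self, dim, L=2.0):
            super().__init__()
            self.L = L
            # spectral_norm keeps each linear map (approximately) 1-Lipschitz,
            # and ReLU is 1-Lipschitz, so the residual branch r is ~1-Lipschitz.
            self.residual = nn.Sequential(
                spectral_norm(nn.Linear(dim, dim)), nn.ReLU(),
                spectral_norm(nn.Linear(dim, dim)),
            )

        def forward(self, z):
            return self.L * z + self.residual(z)

    block = InverseLipschitzBlock(dim=8)
    z1, z2 = torch.randn(4, 8), torch.randn(4, 8)
    expansion = (block(z1) - block(z2)).norm(dim=1) / (z1 - z2).norm(dim=1)
    print(expansion)  # should stay (approximately) above L - 1 = 1.0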
TabRet: Pre-training Transformer-based Tabular Models for Unseen Columns
Onishi, Soma, Oono, Kenta, Hayashi, Kohei
TabRet is a pre-trainable Transformer-based model for tabular data, designed to work on downstream tasks that contain columns not seen in pre-training. Unlike other methods, TabRet has an extra learning step before fine-tuning called retokenizing, which calibrates feature embeddings based on the masked autoencoding loss. In experiments, we pre-trained TabRet on a large collection of public health surveys and fine-tuned it on classification tasks in healthcare, and TabRet achieved the best AUC performance on four datasets. In addition, an ablation study shows that retokenizing and random shuffle augmentation of columns during pre-training contribute to performance gains. Transformer-based pre-trained models have been successfully applied to various domains such as text and images (Bommasani et al., 2021). The Transformer-like architecture consists of two modules: a tokenizer, which converts an input feature into a token embedding, and a mixer, which repeatedly manipulates the tokens with attention and feed-forward networks (FFNs) (Lin et al., 2021; Yu et al., 2022). During pre-training, both modules are trained to learn representations that generalize to downstream tasks. What has often been overlooked in the literature are scenarios where the input space changes between pretext and downstream tasks. A supervised problem on tabular data is a typical example, where rows or records represent data points and columns represent input features. Since the data scale is not as large as that of text and images, pre-trained models are expected to be beneficial (Borisov et al., 2022).
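The sketch below illustrates the retokenizing step on synthetic data: the pre-trained mixer is frozen, and only new per-column tokenizers (plus a reconstruction head) are fitted with a masked-reconstruction loss on the downstream table. The module names, shapes, and the exact masking scheme are assumptions for illustration, not the released TabRet implementation.

    # Illustrative retokenizing: freeze the pre-trained mixer, fit new column tokenizers.
    import torch
    import torch.nn as nn

    d = 64
    mixer = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True), num_layers=2)
    for p in mixer.parameters():
        p.requires_grad_(False)          # pre-trained mixer stays fixed

    n_new_cols = 10                       # unseen downstream columns
    tokenizer = nn.ModuleList(nn.Linear(1, d) for _ in range(n_new_cols))
    head = nn.Linear(d, 1)                # per-token reconstruction head
    opt = torch.optim.Adam(list(tokenizer.parameters()) + list(head.parameters()), lr=1e-3)

    x = torch.randn(128, n_new_cols)      # downstream rows (synthetic)
    for step in range(50):
        mask = torch.rand(x.shape) < 0.3
        tokens = torch.stack([tokenizer[j](x[:, j:j + 1]) for j in range(n_new_cols)], dim=1)
        tokens = tokens.masked_fill(mask.unsqueeze(-1), 0.0)   # hide masked cells
        recon = head(mixer(tokens)).squeeze(-1)
        loss = ((recon - x) ** 2)[mask].mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

After retokenizing, fine-tuning would proceed as usual with a task-specific head on top of the mixer.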
Fast Estimation Method for the Stability of Ensemble Feature Selectors
Onda, Rina, Gao, Zhengyan, Kotera, Masaaki, Oono, Kenta
It is preferred that feature selectors be \textit{stable} for better interpretability and robust prediction. Ensembling is known to be effective for improving the stability of feature selectors. Since ensembling is time-consuming, it is desirable to reduce the computational cost of estimating the stability of ensemble feature selectors. We propose a simulator of a feature selector and apply it to fast estimation of the stability of ensemble feature selectors. To the best of our knowledge, this is the first study that estimates the stability of ensemble feature selectors and reduces the computation time, with both theoretical and empirical justification.
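For reference, the direct (and computationally expensive) way to measure the quantity in question is sketched below: run an ensemble selector on several resampled datasets and average the pairwise Jaccard similarity of the selected feature sets. The correlation-based base selector, the voting aggregation, and the Jaccard-based stability measure are illustrative choices; the simulator-based fast estimator proposed in the paper is not reproduced here.

    # Direct estimate of ensemble feature-selector stability via bootstrap resampling.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))
    y = X[:, :5] @ rng.normal(size=5) + 0.5 * rng.normal(size=200)

    def ensemble_select(X, y, k=5, n_base=20):
        """Aggregate top-k correlation rankings over random subsamples (one ensemble run)."""
        votes = np.zeros(X.shape[1])
        for _ in range(n_base):
            idx = rng.choice(len(y), size=len(y) // 2, replace=False)
            corr = np.abs([np.corrcoef(X[idx, j], y[idx])[0, 1] for j in range(X.shape[1])])
            votes[np.argsort(corr)[-k:]] += 1
        return set(np.argsort(votes)[-k:])

    selected = []
    for _ in range(10):  # bootstrap copies of the dataset
        boot = rng.choice(len(y), size=len(y))
        selected.append(ensemble_select(X[boot], y[boot]))
    jaccard = [len(a & b) / len(a | b) for i, a in enumerate(selected) for b in selected[i + 1:]]
    print("estimated stability (mean pairwise Jaccard):", np.mean(jaccard))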
Universal Approximation Property of Neural Ordinary Differential Equations
Teshima, Takeshi, Tojo, Koichi, Ikeda, Masahiro, Ishikawa, Isao, Oono, Kenta
Neural ordinary differential equations (NODEs) are an invertible neural network architecture that is promising for its free-form Jacobian and the availability of a tractable Jacobian determinant estimator. Recently, the representation power of NODEs has been partly uncovered: they form an $L^p$-universal approximator for continuous maps under certain conditions. However, $L^p$-universality may fail to guarantee an approximation for the entire input domain, as it may still hold even if the approximator largely differs from the target function on a small region of the input space. To further uncover the potential of NODEs, we show their stronger approximation property, namely the $\sup$-universality for approximating a large class of diffeomorphisms. This is shown by leveraging a structure theorem of the diffeomorphism group, and the result complements the existing literature by establishing a fairly large set of mappings that NODEs can approximate with a stronger guarantee.
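For concreteness, the objects involved can be written schematically as follows; the notation ($f_\theta$, $\Phi_\theta$, $K$, $\varepsilon$) is chosen here for illustration rather than taken from the paper.
\[
  \frac{dz(t)}{dt} = f_\theta(z(t), t), \qquad z(0) = x, \qquad \Phi_\theta(x) := z(1),
\]
so a NODE realizes the flow map $\Phi_\theta$, which is invertible by integrating the ODE backward in time. $L^p$-universality for a target class $\mathcal{F}$ asks that for every $F \in \mathcal{F}$, compact set $K$, and $\varepsilon > 0$ there exist parameters with $\|\Phi_\theta - F\|_{L^p(K)} < \varepsilon$, whereas $\sup$-universality requires $\sup_{x \in K} \|\Phi_\theta(x) - F(x)\| < \varepsilon$, which rules out large deviations concentrated on a small region.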
Coupling-based Invertible Neural Networks Are Universal Diffeomorphism Approximators
Teshima, Takeshi, Ishikawa, Isao, Tojo, Koichi, Oono, Kenta, Ikeda, Masahiro, Sugiyama, Masashi
Invertible neural networks based on coupling flows (CF-INNs) are neural network architectures with invertibility by design [1, 2]. Endowed with analytic-form invertibility and the tractability of the Jacobian, CF-INNs have demonstrated their usefulness in various machine learning tasks such as generative modeling [3-7], probabilistic inference [8-10], solving inverse problems [11], and feature extraction and manipulation [4, 12-14]. The attractive properties of CF-INNs come at the cost of potential restrictions on the set of functions that they can approximate because they rely on carefully designed network layers. To circumvent this potential drawback, a variety of layer designs have been proposed to construct CF-INNs with high representation power, e.g., the affine coupling flow [3, 4, 15-17], the neural autoregressive flow [18-20], and the polynomial flow [21], each demonstrating enhanced empirical performance. Despite the diversity of layer designs [1, 2], the theoretical understanding of the representation power of CF-INNs has been limited. Indeed, the most basic property as a function approximator, namely the universal approximation property (or universality for short) [22], has not been elucidated for CF-INNs. Universality can be crucial when CF-INNs are used to learn an invertible transformation (e.g., feature extraction [12] or independent component analysis [14]) because, informally speaking, a lack of universality implies that there exists an invertible transformation, even among well-behaved ones, that a CF-INN can never approximate, and this would render the model class unreliable for the task of function approximation.
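As a concrete instance of the layer designs mentioned above, the following is a minimal PyTorch sketch of an affine coupling layer with its analytic inverse and Jacobian log-determinant; the split, conditioner network, and dimensions are illustrative assumptions.

    # Affine coupling layer: y1 = x1, y2 = x2 * exp(s(x1)) + t(x1).
    # Analytic inverse; log|det J| = sum(s(x1)). Conditioner network is a toy choice.
    import torch
    import torch.nn as nn

    class AffineCoupling(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.d = dim // 2
            self.net = nn.Sequential(nn.Linear(self.d, 64), nn.ReLU(),
                                     nn.Linear(64, 2 * (dim - self.d)))

        def forward(self, x):
            x1, x2 = x[:, :self.d], x[:, self.d:]
            s, t = self.net(x1).chunk(2, dim=1)
            y2 = x2 * torch.exp(s) + t
            return torch.cat([x1, y2], dim=1), s.sum(dim=1)   # output, log|det J|

        def inverse(self, y):
            y1, y2 = y[:, :self.d], y[:, self.d:]
            s, t = self.net(y1).chunk(2, dim=1)
            return torch.cat([y1, (y2 - t) * torch.exp(-s)], dim=1)

    layer = AffineCoupling(dim=6)
    x = torch.randn(5, 6)
    y, logdet = layer(x)
    print(torch.allclose(layer.inverse(y), x, atol=1e-5))   # exact analytic inverse

Stacking such layers with coordinate permutations gives a CF-INN; the universality question above asks which invertible maps such stacks can approximate.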
Optimization and Generalization Analysis of Transduction through Gradient Boosting and Application to Multi-scale Graph Neural Networks
Oono, Kenta, Suzuki, Taiji
It is known that current graph neural networks (GNNs) are difficult to make deep due to the problem known as over-smoothing. Multi-scale GNNs are a promising approach for mitigating the over-smoothing problem. However, there is little explanation from the viewpoint of learning theory of why they work empirically. In this study, we derive optimization and generalization guarantees for transductive learning algorithms that include multi-scale GNNs. Using boosting theory, we prove the convergence of the training error under weak-learning-type conditions. By combining this with generalization gap bounds in terms of transductive Rademacher complexity, we show a test error bound for a specific type of multi-scale GNN that decreases with the number of node aggregations under some conditions. Our results offer theoretical explanations for the effectiveness of the multi-scale structure against the over-smoothing problem. We apply boosting algorithms to the training of multi-scale GNNs for real-world node prediction tasks. We confirm that their performance is comparable to existing GNNs and that the practical behaviors are consistent with theoretical observations. Code is available at https://github.com/delta2323/GB-GNN.
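A minimal numpy sketch of the boosting view follows: weak learners acting on features aggregated at increasing scales (X, AX, A^2X, ...) are fit sequentially to the residual of the current transductive predictor. The ridge weak learner, the row-normalized aggregation, and the step size are illustrative assumptions; the actual algorithms are in the repository linked above.

    # Gradient-boosting-style training over multi-scale aggregated node features.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 100, 8
    A = (rng.random((n, n)) < 0.05).astype(float)
    np.fill_diagonal(A, 0.0)
    A = np.maximum(A, A.T) + np.eye(n)                  # symmetric adjacency with self-loops
    A_hat = A / A.sum(axis=1, keepdims=True)            # row-normalized aggregation operator
    X, y = rng.normal(size=(n, d)), rng.normal(size=n)  # synthetic features and targets
    train = rng.random(n) < 0.5                         # labeled nodes (transductive setting)

    pred, H, eta, lam = np.zeros(n), X.copy(), 0.5, 1e-2
    for scale in range(5):
        r = y - pred                                    # residual of the current predictor
        w = np.linalg.solve(H[train].T @ H[train] + lam * np.eye(d), H[train].T @ r[train])
        pred += eta * (H @ w)                           # boosting update on all nodes
        H = A_hat @ H                                   # aggregate once more for the next scale
        print(f"scale {scale}: train MSE {np.mean((y - pred)[train] ** 2):.3f}")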
Weisfeiler-Lehman Embedding for Molecular Graph Neural Networks
Ishiguro, Katsuhiko, Oono, Kenta, Hayashi, Kohei
A graph neural network (GNN) is a good choice for predicting the chemical properties of molecules. Compared with other deep networks, however, the current performance of a GNN is limited owing to the "curse of depth." Inspired by long-established feature engineering in the field of chemistry, we expand the atom representation using a Weisfeiler-Lehman (WL) embedding, which is designed to capture the local atomic patterns dominating the chemical properties of a molecule. In terms of representability, we show that the WL embedding can replace the first two layers of a ReLU GNN -- a normal embedding and a hidden GNN layer -- with a smaller weight norm. We then demonstrate that the WL embedding consistently improves empirical performance over multiple GNN architectures and several molecular graph datasets.
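The following is a small self-contained sketch of WL relabeling on a toy molecular graph: each atom's new label encodes its own label together with the multiset of its neighbors' labels, which is the kind of local atomic pattern the embedding is meant to capture. The toy molecule, the signature compression, and the final embedding lookup are illustrative, not the paper's exact construction.

    # Weisfeiler-Lehman relabeling on a toy molecular graph (no external dependencies).
    atoms = ["C", "C", "O", "H"]                        # toy molecule: atom symbols
    neighbors = {0: [1, 3], 1: [0, 2], 2: [1], 3: [0]}  # adjacency list

    def wl_step(labels, neighbors):
        """Relabel each node by (own label, sorted multiset of neighbor labels)."""
        signatures = [(labels[v], tuple(sorted(labels[u] for u in neighbors[v])))
                      for v in range(len(labels))]
        vocab = {}  # compress signatures into fresh integer labels
        return [vocab.setdefault(sig, len(vocab)) for sig in signatures]

    labels = atoms
    for it in range(2):
        labels = wl_step(labels, neighbors)
        print(f"iteration {it + 1}: {labels}")
    # Each final label would index a learnable embedding table, replacing the plain
    # atom-type embedding as the atom's initial representation.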
Graph Residual Flow for Molecular Graph Generation
Honda, Shion, Akita, Hirotaka, Ishiguro, Katsuhiko, Nakanishi, Toshiki, Oono, Kenta
Statistical generative models for molecular graphs have attracted attention from many researchers in the fields of bio- and chemo-informatics. Among these models, invertible flow-based approaches have not been fully explored yet. In this paper, we propose a powerful invertible flow for molecular graphs, called the graph residual flow (GRF). The GRF is based on residual flows, which are known to admit more flexible and complex non-linear mappings than traditional coupling flows. We theoretically derive non-trivial conditions under which the GRF is invertible, and present a way of keeping the entire flow invertible throughout training and sampling. Experimental results show that a generative model based on the proposed GRF achieves comparable generation performance with a much smaller number of trainable parameters compared to an existing flow-based model.
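To illustrate the residual-flow building block (independent of the graph-specific details), the sketch below composes y = x + s*g(x) with a spectrally normalized branch so that the residual part is contractive and the map is invertible, and inverts it by Banach fixed-point iteration; the architecture, scaling, and iteration count are illustrative assumptions, not the GRF itself.

    # Residual flow block y = x + s*g(x) with Lip(s*g) < 1; inverse by fixed-point iteration.
    import torch
    import torch.nn as nn
    from torch.nn.utils import spectral_norm

    g = nn.Sequential(spectral_norm(nn.Linear(8, 8)), nn.Tanh(),
                      spectral_norm(nn.Linear(8, 8)))
    scale = 0.5  # keeps the residual branch strictly contractive

    def forward(x):
        return x + scale * g(x)

    def inverse(y, n_iter=50):
        x = y.clone()
        for _ in range(n_iter):        # Banach fixed-point iteration for x = y - s*g(x)
            x = y - scale * g(x)
        return x

    x = torch.randn(4, 8)
    with torch.no_grad():
        y = forward(x)
        print(torch.allclose(inverse(y), x, atol=1e-5))  # True: the block is invertible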
On Asymptotic Behaviors of Graph CNNs from Dynamical Systems Perspective
Oono, Kenta, Suzuki, Taiji
Graph Convolutional Neural Networks (graph CNNs) are a promising deep learning approach for analyzing graph-structured data. However, it is known that they do not improve (and sometimes even worsen) their predictive performance as more layers are stacked and the networks are made deeper. To tackle this problem, we investigate the expressive power of graph CNNs by analyzing their asymptotic behaviors as the layer size tends to infinity. Our strategy is to generalize the forward propagation of a Graph Convolutional Network (GCN), which is one of the most popular graph CNN variants, as a specific dynamical system. In the case of GCNs, we show that when the weights satisfy conditions determined by the spectra of the (augmented) normalized Laplacian, the output of a GCN exponentially approaches the set of signals that carry only the information of connected components and node degrees for distinguishing nodes. Our theory enables us to directly relate the expressive power of GCNs to the topological information of the underlying graphs, which is inherent in the graph spectra. To demonstrate this, we characterize the asymptotic behavior of GCNs on the Erd\H{o}s--R\'{e}nyi graph. We show that when the Erd\H{o}s--R\'{e}nyi graph is sufficiently dense and large, a wide range of GCNs on it suffer from this ``information loss'' in the limit of infinite layers with high probability. Furthermore, our theory provides principled guidelines for the weight normalization of graph CNNs. We experimentally confirmed that weight scaling based on our theory enhanced the predictive performance of GCNs on real data.
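As a rough summary of the dynamical-systems view (the notation $\hat{A}$, $s_\ell$, $\lambda$, $\mathcal{M}$ is chosen here for illustration; the precise constants and conditions are those of the paper):
\[
  X_{\ell+1} = \sigma\!\left(\hat{A} X_\ell W_\ell\right), \qquad
  \hat{A} = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2},
\]
where $\tilde{A}$ is the adjacency matrix with self-loops and $\tilde{D}$ its degree matrix. Writing $\mathcal{M}$ for the invariant subspace of signals determined only by connected components and node degrees, and $d_{\mathcal{M}}$ for the distance to it, one roughly has $d_{\mathcal{M}}(X_{\ell+1}) \le s_\ell \lambda \, d_{\mathcal{M}}(X_\ell)$, where $s_\ell$ is the maximum singular value of $W_\ell$ and $\lambda < 1$ is the largest magnitude among the remaining eigenvalues of $\hat{A}$; hence if $\sup_\ell s_\ell \lambda < 1$, the output converges exponentially to $\mathcal{M}$ as the depth grows, which is the ``information loss'' described above.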