AITopics | Kawarabayashi, Ken-ichi

Collaborating Authors

Kawarabayashi, Ken-ichi

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

New classes of the greedy-applicable arm feature distributions in the sparse linear bandit problem

Ichikawa, Koji, Ito, Shinji, Hatano, Daisuke, Sumita, Hanna, Fukunaga, Takuro, Kakimura, Naonori, Kawarabayashi, Ken-ichi

arXiv.org Machine LearningDec-19-2023

We consider the sparse contextual bandit problem where arm feature affects reward through the inner product of sparse parameters. Recent studies have developed sparsity-agnostic algorithms based on the greedy arm selection policy. However, the analysis of these algorithms requires strong assumptions on the arm feature distribution to ensure that the greedily selected samples are sufficiently diverse; One of the most common assumptions, relaxed symmetry, imposes approximate origin-symmetry on the distribution, which cannot allow distributions that has origin-asymmetric support. In this paper, we show that the greedy algorithm is applicable to a wider range of the arm feature distributions from two aspects. Firstly, we show that a mixture distribution that has a greedy-applicable component is also greedy-applicable. Second, we propose new distribution classes, related to Gaussian mixture, discrete, and radial distribution, for which the sample diversity is guaranteed. The proposed classes can describe distributions with origin-asymmetric support and, in conjunction with the first claim, provide theoretical guarantees of the greedy policy for a very wide range of the arm feature distributions.

artificial intelligence, big data, data mining, (16 more...)

arXiv.org Machine Learning

2312.124

Country: Asia > Japan (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.84)

Add feedback

A Neighbourhood-Aware Differential Privacy Mechanism for Static Word Embeddings

Bollegala, Danushka, Otake, Shuichi, Machide, Tomoya, Kawarabayashi, Ken-ichi

arXiv.org Artificial IntelligenceSep-19-2023

We propose a Neighbourhood-Aware Differential Privacy (NADP) mechanism considering the neighbourhood of a word in a pretrained static word embedding space to determine the minimal amount of noise required to guarantee a specified privacy level. We first construct a nearest neighbour graph over the words using their embeddings, and factorise it into a set of connected components (i.e. neighbourhoods). We then separately apply different levels of Gaussian noise to the words in each neighbourhood, determined by the set of words in that neighbourhood. Experiments show that our proposed NADP mechanism consistently outperforms multiple previously proposed DP mechanisms such as Laplacian, Gaussian, and Mahalanobis in multiple downstream tasks, while guaranteeing higher levels of privacy.

data mining, machine learning, mechanism, (21 more...)

arXiv.org Artificial Intelligence

2309.10551

Country:

Europe (1.00)
North America > United States > New York (0.14)
North America > United States > Minnesota (0.14)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(2 more...)

Add feedback

Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions

Takemura, Kei, Ito, Shinji, Hatano, Daisuke, Sumita, Hanna, Fukunaga, Takuro, Kakimura, Naonori, Kawarabayashi, Ken-ichi

arXiv.org Machine LearningJan-19-2021

The contextual combinatorial semi-bandit problem with linear payoff functions is a decision-making problem in which a learner chooses a set of arms with the feature vectors in each round under given constraints so as to maximize the sum of rewards of arms. Several existing algorithms have regret bounds that are optimal with respect to the number of rounds $T$. However, there is a gap of $\tilde{O}(\max(\sqrt{d}, \sqrt{k}))$ between the current best upper and lower bounds, where $d$ is the dimension of the feature vectors, $k$ is the number of the chosen arms in a round, and $\tilde{O}(\cdot)$ ignores the logarithmic factors. The dependence of $k$ and $d$ is of practical importance because $k$ may be larger than $T$ in real-world applications such as recommender systems. In this paper, we fill the gap by improving the upper and lower bounds. More precisely, we show that the C${}^2$UCB algorithm proposed by Qin, Chen, and Zhu (2014) has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints. For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C${}^2$UCB algorithm and demonstrate that it enjoys the optimal regret bound for a more general problem that can take into account other objectives simultaneously. We also show that our technique would be applicable to related problems. Numerical experiments support our theoretical results and considerations.

algorithm, artificial intelligence, big data, (19 more...)

arXiv.org Machine Learning

2101.07957

Country: Asia > Japan (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.48)

Add feedback

How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks

Xu, Keyulu, Li, Jingling, Zhang, Mozhi, Du, Simon S., Kawarabayashi, Ken-ichi, Jegelka, Stefanie

arXiv.org Artificial IntelligenceSep-30-2020

We study how neural networks trained by gradient descent extrapolate, i.e., what they learn outside the support of the training distribution. Previous works report mixed empirical results when extrapolating with neural networks: while multilayer perceptrons (MLPs) do not extrapolate well in certain simple tasks, Graph Neural Network (GNN), a structured network with MLP modules, has shown some success in more complex tasks. Working towards a theoretical explanation, we identify conditions under which MLPs and GNNs extrapolate well. First, we quantify the observation that ReLU MLPs quickly converge to linear functions along any direction from the origin, which implies that ReLU MLPs do not extrapolate most non-linear functions. But, they can provably learn a linear target function when the training distribution is sufficiently "diverse". Second, in connection to analyzing successes and limitations of GNNs, these results suggest a hypothesis for which we provide theoretical and empirical evidence: the success of GNNs in extrapolating algorithmic tasks to new data (e.g., larger graphs or edge weights) relies on encoding task-specific non-linearities in the architecture or features.

artificial intelligence, neural network, us government, (17 more...)

arXiv.org Artificial Intelligence

2009.11848

Country: North America > United States > Massachusetts (0.14)

Genre: Research Report > New Finding (0.87)

Industry: Government > Regional Government > North America Government > United States Government (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.68)

Add feedback

Are Girls Neko or Sh\=ojo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization

Zhang, Mozhi, Xu, Keyulu, Kawarabayashi, Ken-ichi, Jegelka, Stefanie, Boyd-Graber, Jordan

arXiv.org Artificial IntelligenceJun-4-2019

Cross-lingual word embeddings (CLWE) underlie many multilingual natural language processing systems, often through orthogonal transformations of pre-trained monolingual embeddings. However, orthogonal mapping only works on language pairs whose embeddings are naturally isomorphic. For non-isomorphic pairs, our method (Iterative Normalization) transforms monolingual embeddings to make orthogonal alignment easier by simultaneously enforcing that (1) individual word vectors are unit length, and (2) each language's average vector is zero. Iterative Normalization consistently improves word translation accuracy of three CLWE methods, with the largest improvement observed on English-Japanese (from 2% to 44% test accuracy).

artificial intelligence, machine translation, normalization, (19 more...)

arXiv.org Artificial Intelligence

1906.01622

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Maryland > Prince George's County > College Park (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.35)

Add feedback

What Can Neural Networks Reason About?

Xu, Keyulu, Li, Jingling, Zhang, Mozhi, Du, Simon S., Kawarabayashi, Ken-ichi, Jegelka, Stefanie

arXiv.org Artificial IntelligenceMay-31-2019

Neural networks have successfully been applied to solving reasoning tasks, ranging from learning simple concepts like "close to", to intricate questions whose reasoning procedures resemble algorithms. Empirically, not all network structures work equally well for reasoning. For example, Graph Neural Networks have achieved impressive empirical results, while the less structured neural networks may fail to learn to reason. Theoretically, there is currently limited understanding of the interplay between reasoning tasks and network learning. In this paper, we develop a framework to characterize which tasks a neural network can learn well, by studying how well its structure aligns with the algorithmic structure of the relevant reasoning procedure. This suggests that Graph Neural Networks can learn dynamic programming, a powerful algorithmic strategy that solves a broad class of reasoning problems, such as relational question answering, sorting, intuitive physics, and shortest paths. Our perspective also implies strategies to design neural architectures for complex reasoning. On several abstract reasoning tasks, we see empirically that our theory aligns well with practice.

deep learning, mlp, neural network, (20 more...)

arXiv.org Artificial Intelligence

1905.13211

Country: North America > United States > Massachusetts (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Representation Learning on Graphs with Jumping Knowledge Networks

Xu, Keyulu, Li, Chengtao, Tian, Yonglong, Sonobe, Tomohiro, Kawarabayashi, Ken-ichi, Jegelka, Stefanie

arXiv.org Artificial IntelligenceJun-25-2018

Recent deep learning approaches for representation learning on graphs follow a neighborhood aggregation procedure. We analyze some important properties of these models, and propose a strategy to overcome those. In particular, the range of "neighboring" nodes that a node's representation draws from strongly depends on the graph structure, analogous to the spread of a random walk. To adapt to local neighborhood properties and tasks, we explore an architecture -- jumping knowledge (JK) networks -- that flexibly leverages, for each node, different neighborhood ranges to enable better structure-aware representation. In a number of experiments on social, bioinformatics and citation networks, we demonstrate that our model achieves state-of-the-art performance. Furthermore, combining the JK framework with models like Graph Convolutional Networks, GraphSAGE and Graph Attention Networks consistently improves those models' performance.

deep learning, neural network, representation, (19 more...)

arXiv.org Artificial Intelligence

1806.03536

Genre: Research Report (0.64)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Causal Bandits with Propagating Inference

Yabe, Akihiro, Hatano, Daisuke, Sumita, Hanna, Ito, Shinji, Kakimura, Naonori, Fukunaga, Takuro, Kawarabayashi, Ken-ichi

arXiv.org Machine LearningJun-6-2018

Bandit is a framework for designing sequential experiments. In each experiment, a learner selects an arm $A \in \mathcal{A}$ and obtains an observation corresponding to $A$. Theoretically, the tight regret lower-bound for the general bandit is polynomial with respect to the number of arms $|\mathcal{A}|$. This makes bandit incapable of handling an exponentially large number of arms, hence the bandit problem with side-information is often considered to overcome this lower bound. Recently, a bandit framework over a causal graph was introduced, where the structure of the causal graph is available as side-information. A causal graph is a fundamental model that is frequently used with a variety of real problems. In this setting, the arms are identified with interventions on a given causal graph, and the effect of an intervention propagates throughout all over the causal graph. The task is to find the best intervention that maximizes the expected value on a target node. Existing algorithms for causal bandit overcame the $\Omega(\sqrt{|\mathcal{A}|/T})$ simple-regret lower-bound; however, their algorithms work only when the interventions $\mathcal{A}$ are localized around a single node (i.e., an intervention propagates only to its neighbors). We propose a novel causal bandit algorithm for an arbitrary set of interventions, which can propagate throughout the causal graph. We also show that it achieves $O(\sqrt{ \gamma^*\log(|\mathcal{A}|T) / T})$ regret bound, where $\gamma^*$ is determined by using a causal graph structure. In particular, if the in-degree of the causal graph is bounded, then $\gamma^* = O(N^2)$, where $N$ is the number $N$ of nodes.

artificial intelligence, big data, intervention, (18 more...)

arXiv.org Machine Learning

1806.02252

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)

Add feedback

ClassiNet -- Predicting Missing Features for Short-Text Classification

Bollegala, Danushka, Atanasov, Vincent, Maehara, Takanori, Kawarabayashi, Ken-ichi

arXiv.org Artificial IntelligenceApr-14-2018

The fundamental problem in short-text classification is \emph{feature sparseness} -- the lack of feature overlap between a trained model and a test instance to be classified. We propose \emph{ClassiNet} -- a network of classifiers trained for predicting missing features in a given instance, to overcome the feature sparseness problem. Using a set of unlabeled training instances, we first learn binary classifiers as feature predictors for predicting whether a particular feature occurs in a given instance. Next, each feature predictor is represented as a vertex $v_i$ in the ClassiNet where a one-to-one correspondence exists between feature predictors and vertices. The weight of the directed edge $e_{ij}$ connecting a vertex $v_i$ to a vertex $v_j$ represents the conditional probability that given $v_i$ exists in an instance, $v_j$ also exists in the same instance. We show that ClassiNets generalize word co-occurrence graphs by considering implicit co-occurrences between features. We extract numerous features from the trained ClassiNet to overcome feature sparseness. In particular, for a given instance $\vec{x}$, we find similar features from ClassiNet that did not appear in $\vec{x}$, and append those features in the representation of $\vec{x}$. Moreover, we propose a method based on graph propagation to find features that are indirectly related to a given short-text. We evaluate ClassiNets on several benchmark datasets for short-text classification. Our experimental results show that by using ClassiNet, we can statistically significantly improve the accuracy in short-text classification tasks, without having to use any external resources such as thesauri for finding related features.

classinet, deep learning, neural network, (23 more...)

arXiv.org Artificial Intelligence

1804.0526

Country:

Europe (0.67)
Asia > Middle East > Qatar (0.14)
North America > United States > Colorado (0.14)
North America > United States > Massachusetts (0.14)

Genre: Research Report > New Finding (0.88)

Industry:

Media (0.67)
Banking & Finance (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (1.00)
(6 more...)

Add feedback

Using k-Way Co-Occurrences for Learning Word Embeddings

Bollegala, Danushka (The University of Liverpool) | Yoshida, Yuichi (National Institute of Informatics) | Kawarabayashi, Ken-ichi (National Institute of Informatics)

AAAI ConferencesFeb-8-2018

Co-occurrences between two words provide useful insights into the semantics of those words.Consequently, numerous prior work on word embedding learning has used co-occurrences between two wordsas the training signal for learning word embeddings.However, in natural language texts it is common for multiple words to be related and co-occurring in the same context.We extend the notion of co-occurrences to cover k (≥2)-way co-occurrences among a set of k- words.Specifically, we prove a theoretical relationship between the joint probability of k (≥2) words, and the sum of l_2 norms of their embeddings. Next, we propose a learning objective motivated by our theoretical resultthat utilises k- way co-occurrences for learning word embeddings.Our experimental results show that the derived theoretical relationship does indeed hold empirically, anddespite data sparsity, for some smaller k (≤5) values, k- way embeddings perform comparably or better than 2-way embeddings in a range of tasks.

artificial intelligence, dataset, text processing, (19 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country: Asia > Japan (0.14)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback