Collaborating Authors

 Tolmachev, Arseny


LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

arXiv.org Artificial Intelligence

This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). LLM-jp aims to develop strong, fully open Japanese LLMs, and as of this writing, more than 1,500 participants from academia and industry are working together toward this goal. This paper presents the background of the establishment of LLM-jp, summaries of its activities, and technical reports on the LLMs developed by LLM-jp.


Crowdsourcing Evaluation of Saliency-based XAI Methods

arXiv.org Artificial Intelligence

Understanding the reasons behind the predictions made by deep neural networks is critical for gaining human trust in many important applications, which is reflected in the increasing demand for explainable AI (XAI) in recent years. Saliency-based feature attribution methods, which highlight the parts of an image that contribute to a classifier's decision, are often used as XAI methods, especially in the field of computer vision. To compare various saliency-based XAI methods quantitatively, several automated evaluation schemes have been proposed; however, there is no guarantee that such automated metrics correctly evaluate explainability, and a high rating from an automated evaluation scheme does not necessarily mean high explainability for humans. In this study, instead of automated evaluation, we propose a new human-based evaluation scheme that uses crowdsourcing to evaluate XAI methods. Our method is inspired by the human computation game "Peek-a-boom" and can efficiently compare different XAI methods by exploiting the power of crowds. We evaluate the saliency maps of various XAI methods on two datasets with both automated and crowd-based evaluation schemes. Our experiments show that the results of our crowd-based evaluation scheme differ from those of the automated schemes. In addition, treating the crowd-based evaluation results as ground truth, we provide a quantitative performance measure for comparing different automated evaluation schemes. We also discuss the impact of crowd workers on the results and show that the varying ability of crowd workers does not significantly affect them.
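To make the contrast concrete, here is a minimal numpy sketch of one common family of automated evaluation schemes the abstract refers to: a deletion-style metric, which masks out pixels in order of decreasing saliency and watches how fast a classifier's score collapses. Everything here is illustrative and not from the paper: `toy_score` is a stand-in for a real network, and the names `deletion_curve` and `TARGET` are assumptions of this sketch.

```python
import numpy as np

# Toy stand-in "classifier": scores an image by how much mass remains in a
# fixed target region. Purely illustrative -- not a real model.
TARGET = np.zeros((8, 8), dtype=bool)
TARGET[2:5, 2:5] = True

def toy_score(image):
    return float(image[TARGET].sum())

def deletion_curve(image, saliency, steps=16):
    """Mask out pixels in decreasing order of saliency and record the
    classifier score after each deletion step."""
    order = np.argsort(-saliency.ravel())  # most salient pixels first
    masked = image.copy().ravel()
    scores = [toy_score(masked.reshape(image.shape))]
    chunk = max(1, order.size // steps)
    for start in range(0, order.size, chunk):
        masked[order[start:start + chunk]] = 0.0
        scores.append(toy_score(masked.reshape(image.shape)))
    return np.array(scores)

rng = np.random.default_rng(0)
image = rng.random((8, 8))
good_saliency = TARGET.astype(float)   # highlights the true evidence
bad_saliency = rng.random((8, 8))      # uninformative baseline

good = deletion_curve(image, good_saliency)
bad = deletion_curve(image, bad_saliency)
# A faster score drop (smaller area under the deletion curve) means the
# saliency map pointed at pixels the classifier actually relied on.
print(good.mean() < bad.mean())
```

The paper's point is that a metric like this ranks saliency maps without ever asking a human, which is exactly the gap the crowd-based scheme is designed to probe.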


Bermuda Triangles: GNNs Fail to Detect Simple Topological Structures

arXiv.org Artificial Intelligence

Most graph neural network architectures work by message-passing node vector embeddings over the adjacency matrix, and it is assumed that they capture graph topology by doing so. We design two synthetic tasks focusing purely on topological problems - triangle detection and clique distance - on which graph neural networks perform surprisingly badly, failing to detect those "bermuda" triangles. Many tasks need to handle the graph representation of data in areas such as chemistry (Wale & Karypis, 2006), social networks (Fan et al., 2019), and transportation (Zhao et al., 2019). Nor is this limited to inherently graph-structured tasks: images (Chen et al., 2019) and 3D polygons (Shi & Rajkumar, 2020) can also be converted to graph data formats. Because of these broad applications, graph deep learning is an important field in machine learning research. Graph neural networks (GNNs; Scarselli et al., 2008) are a common approach to performing machine learning on graphs. Most graph neural networks update the node vector embeddings using message passing. Node vector embeddings are usually initialized with data features and local graph features such as node degrees. Then, for the (n+1)-th stacked layer, the new node state is computed from the node vector representations of the previous layer (n).

Accuracy (%) on the two synthetic tasks:

Method          Triangles  Clique
GCN             50.0       50.0
GCN D           75.7       83.2
GCN D ID        80.4       83.4
GIN             74.1       97.0
GIN D           75.0       99.4
GIN D ID        70.5       100.0
GAT             50.0       50.0
GAT D           88.5       99.9
GAT D ID        94.1       100.0
SVM WL          67.2       73.1
SVM Graphlets   99.6       60.3
FCNN            55.6       54.6
TF              100.0      70.0
TF AM           100.0      100.0
TF-IS AM        86.7       100.0
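The layer-wise update described in the last sentence, and the exact combinatorial answer the GNNs struggle to match, can both be sketched in a few lines of numpy. The GCN-style symmetric normalisation, the toy weight matrix, and the names `gcn_layer` and `triangle_count` are illustrative assumptions of this sketch, not code from the paper; the triangle identity trace(A^3)/6, by contrast, is a standard graph-theoretic fact.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One message-passing step: each node's new state aggregates its
    neighbours' previous states through a self-loop-augmented,
    degree-normalised adjacency matrix (GCN-style)."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)
    A_norm = A_hat / np.sqrt(np.outer(d, d))  # symmetric normalisation
    return np.maximum(A_norm @ H @ W, 0.0)    # ReLU non-linearity

def triangle_count(A):
    """Exact triangle count: trace(A^3) counts closed 3-walks, and every
    triangle contributes exactly 6 of them."""
    return int(np.trace(A @ A @ A)) // 6

# A 4-node graph with exactly one triangle (0-1-2) plus a pendant node 3.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = np.eye(4)               # one-hot initial node features
W = np.ones((4, 2)) * 0.5   # toy weight matrix

print(gcn_layer(A, H, W).shape)  # (4, 2): one vector per node
print(triangle_count(A))         # -> 1
```

The contrast is the point of the paper's experiments: the exact count is a one-line polynomial in the adjacency matrix, yet message-passing architectures trained end-to-end score near chance on detecting it, as the table above shows.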