Aramaki, Eiji
Investigating Neurons and Heads in Transformer-based LLMs for Typographical Errors
Tsuji, Kohei, Hiraoka, Tatsuya, Cheng, Yuchang, Aramaki, Eiji, Iwakura, Tomoya
This paper investigates how LLMs encode inputs with typos. We hypothesize that specific neurons and attention heads recognize typos and fix them internally using local and global contexts. We introduce a method to identify typo neurons and typo heads that are active when inputs contain typos. Our experimental results suggest the following: 1) LLMs can fix typos with local contexts when the typo neurons in either the early or late layers are activated, even if those in the other are not. 2) Typo neurons in the middle layers are responsible for the core of typo-fixing with global contexts. 3) Typo heads fix typos by considering the context broadly rather than focusing on specific tokens. 4) Typo neurons and typo heads work not only for typo-fixing but also for understanding general contexts.
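The abstract does not spell out how typo neurons are identified, so the following is only a minimal sketch of one plausible scoring scheme, not the authors' method: rank neurons by how much their mean activation rises on typo inputs relative to clean inputs. The function names and the toy activation values are hypothetical.

```python
from statistics import mean


def typo_neuron_scores(clean_acts, typo_acts):
    """Return, per neuron, mean activation on typo inputs minus mean on clean inputs.

    Both arguments are lists of rows; each row holds one activation per neuron.
    This difference-of-means score is an illustrative assumption.
    """
    n_neurons = len(clean_acts[0])
    scores = []
    for j in range(n_neurons):
        clean_mean = mean(row[j] for row in clean_acts)
        typo_mean = mean(row[j] for row in typo_acts)
        scores.append(typo_mean - clean_mean)
    return scores


def top_k_neurons(scores, k):
    """Return the indices of the k neurons with the largest scores."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Toy activations for 3 neurons over 2 inputs each (made-up numbers).
clean = [[0.0, 1.0, 0.5], [0.0, 1.0, 0.5]]
typo = [[2.0, 1.0, 0.0], [4.0, 1.0, 1.0]]
scores = typo_neuron_scores(clean, typo)
candidates = top_k_neurons(scores, 1)
```

In this toy data only neuron 0 responds more strongly to typo inputs, so it is the single candidate returned.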
LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs
LLM-jp: Aizawa, Akiko, Aramaki, Eiji, Chen, Bowen, Cheng, Fei, Deguchi, Hiroyuki, Enomoto, Rintaro, Fujii, Kazuki, Fukumoto, Kensuke, Fukushima, Takuya, Han, Namgi, Harada, Yuto, Hashimoto, Chikara, Hiraoka, Tatsuya, Hisada, Shohei, Hosokawa, Sosuke, Jie, Lu, Kamata, Keisuke, Kanazawa, Teruhito, Kanezashi, Hiroki, Kataoka, Hiroshi, Katsumata, Satoru, Kawahara, Daisuke, Kawano, Seiya, Keyaki, Atsushi, Kiryu, Keisuke, Kiyomaru, Hirokazu, Kodama, Takashi, Kubo, Takahiro, Kuga, Yohei, Kumon, Ryoma, Kurita, Shuhei, Kurohashi, Sadao, Li, Conglong, Maekawa, Taiki, Matsuda, Hiroshi, Miyao, Yusuke, Mizuki, Kentaro, Mizuki, Sakae, Murawaki, Yugo, Nakamura, Ryo, Nakamura, Taishi, Nakayama, Kouta, Nakazato, Tomoka, Niitsuma, Takuro, Nishitoba, Jiro, Oda, Yusuke, Ogawa, Hayato, Okamoto, Takumi, Okazaki, Naoaki, Oseki, Yohei, Ozaki, Shintaro, Ryu, Koki, Rzepka, Rafal, Sakaguchi, Keisuke, Sasaki, Shota, Sekine, Satoshi, Suda, Kohei, Sugawara, Saku, Sugiura, Issa, Sugiyama, Hiroaki, Suzuki, Hisami, Suzuki, Jun, Suzumura, Toyotaro, Tachibana, Kensuke, Takagi, Yu, Takami, Kyosuke, Takeda, Koichi, Takeshita, Masashi, Tanaka, Masahiro, Taura, Kenjiro, Tolmachev, Arseny, Ueda, Nobuhiro, Wan, Zhen, Yada, Shuntaro, Yahata, Sakiko, Yamamoto, Yuya, Yamauchi, Yusuke, Yanaka, Hitomi, Yokota, Rio, Yoshino, Koichiro
This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). LLM-jp aims to develop open-source and strong Japanese LLMs, and as of this writing, more than 1,500 participants from academia and industry are working together for this purpose. This paper presents the background of the establishment of LLM-jp, summaries of its activities, and technical reports on the LLMs developed by LLM-jp.
Enhancing In-Context Learning with Semantic Representations for Relation Extraction
Han, Peitao, Pereira, Lis Kanashiro, Cheng, Fei, She, Wan Jou, Aramaki, Eiji
In this work, we employ two semantic representations enhanced with Abstract Meaning Representation (AMR) for in-context learning (ICL) on relation extraction (RE): one that explores the AMR structure generated for a sentence at the subgraph level (shortest AMR path), and another that explores the full AMR structure generated for a sentence. In both cases, we demonstrate that all settings benefit from the fine-grained semantic structure of AMR. We evaluate our model on four RE datasets. Our results show that our model can outperform the GPT-based baselines, achieving SOTA performance on two of the datasets and competitive performance on the other two.
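To illustrate the subgraph-level representation mentioned above, here is a minimal sketch of extracting the shortest path between two entity nodes in an AMR graph with breadth-first search. It assumes the graph is given as (source, relation, target) triples and is traversed without regard to edge direction; the function name and the toy graph are hypothetical, and the paper's actual extraction procedure may differ.

```python
from collections import deque


def shortest_amr_path(edges, start, goal):
    """Return the node sequence of a shortest path from start to goal, or None.

    edges is a list of (source, relation, target) triples. Edges are treated
    as undirected, which is an assumption made for this sketch.
    """
    adjacency = {}
    for source, _relation, target in edges:
        adjacency.setdefault(source, []).append(target)
        adjacency.setdefault(target, []).append(source)

    if start == goal:
        return [start]

    previous = {start: None}  # maps each visited node to its BFS parent
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor in previous:
                continue
            previous[neighbor] = node
            if neighbor == goal:
                # Walk the parent links back to the start node.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Toy AMR-like graph for "The drug caused a severe headache."
toy_edges = [
    ("cause-01", ":ARG0", "drug"),
    ("cause-01", ":ARG1", "headache"),
    ("headache", ":mod", "severe"),
]
path = shortest_amr_path(toy_edges, "drug", "headache")
```

Here the path between the two entity nodes passes through the predicate node `cause-01`, which is the kind of compact relational context a shortest-path representation is meant to expose.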
A Dataset for Pharmacovigilance in German, French, and Japanese: Annotating Adverse Drug Reactions across Languages
Raithel, Lisa, Yeh, Hui-Syuan, Yada, Shuntaro, Grouin, Cyril, Lavergne, Thomas, Névéol, Aurélie, Paroubek, Patrick, Thomas, Philippe, Nishiyama, Tomohiro, Möller, Sebastian, Aramaki, Eiji, Matsumoto, Yuji, Roller, Roland, Zweigenbaum, Pierre
User-generated data sources have gained significance in uncovering Adverse Drug Reactions (ADRs), with an increasing number of discussions occurring in the digital world. However, the existing clinical corpora predominantly revolve around scientific articles in English. This work presents a multilingual corpus of texts concerning ADRs gathered from diverse sources, including patient fora, social media, and clinical reports in German, French, and Japanese. Our corpus contains annotations covering 12 entity types, four attribute types, and 13 relation types. It contributes to the development of real-world multilingual language models for healthcare. We provide statistics to highlight certain challenges associated with the corpus and conduct preliminary experiments resulting in strong baselines for extracting entities and relations between these entities, both within and across languages.
JaMIE: A Pipeline Japanese Medical Information Extraction System
Cheng, Fei, Yada, Shuntaro, Tanaka, Ribeka, Aramaki, Eiji, Kurohashi, Sadao
We present an open-access natural language processing toolkit for Japanese medical information extraction. We first propose a novel relation annotation schema for investigating the medical and temporal relations between medical entities in Japanese medical reports. We experiment with practical annotation scenarios by separately annotating two different types of reports. We design a pipeline system with three components for recognizing medical entities, classifying entity modalities, and extracting relations. The empirical results show accurate analysis performance and suggest satisfactory annotation quality, an effective annotation strategy for the targeted report types, and the superiority of the latest contextual embedding models.
Single Model for Influenza Forecasting of Multiple Countries by Multi-task Learning
Murayama, Taichi, Wakamiya, Shoko, Aramaki, Eiji
The accurate forecasting of infectious epidemic diseases such as influenza is a crucial task undertaken by medical institutions. Although previous studies have proposed numerous flu forecasting methods and models based mainly on historical flu activity data and online user-generated content, no existing flu forecasting model targets multiple countries using both types of data. Our paper leverages multi-task learning to build a single flu forecasting model for multiple countries, treating each country as a separate task. To improve the model's performance, we address two issues: finding suitable search queries, which are part of the user-generated content, and leveraging those search queries efficiently during model creation. For the first issue, we propose transfer approaches from English to other languages. For the second issue, we propose a novel flu forecasting model that takes advantage of search queries using an attention mechanism, and we extend it to a multi-task model for forecasting flu in multiple countries. Experiments on forecasting flu epidemics in five countries demonstrate that, compared to the baselines, our model significantly improves performance by leveraging search queries and multi-task learning.
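As a rough illustration of how an attention mechanism can weight search queries, the sketch below applies a softmax to per-query relevance scores and pools the query volumes with the resulting weights. It is a generic attention-pooling example under stated assumptions, not the architecture from the paper; the function names, scores, and volumes are all hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores to weights that sum to 1 (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query_values, relevance_scores):
    """Pool query values with softmax attention weights.

    query_values: one scalar signal per search query (e.g. weekly volume).
    relevance_scores: one raw relevance score per query; in a trained model
    these would be learned, here they are supplied by hand.
    """
    weights = softmax(relevance_scores)
    pooled = sum(w * v for w, v in zip(weights, query_values))
    return weights, pooled


# Two hypothetical queries with equal relevance: pooling reduces to the mean.
weights, pooled = attend([1.0, 3.0], [0.0, 0.0])
```

A higher relevance score for a query shifts weight toward it, so queries that track flu activity well would dominate the pooled signal.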
Smartphone-Based Self Management System for Type-2 Diabetes Patients
Aramaki, Eiji (University of Tokyo) | Miyabe, Mai (University of Tokyo) | Waki, Kayo (University of Tokyo) | Fujita, Hideo (University of Tokyo) | Uchimura, Yuji (University of Tokyo) | Omae, Koji (University of Tokyo) | Hayakawa, Masayo (University of Tokyo) | Kadowaki, Takashi (University of Tokyo) | Ohe, Kazuhiko (University of Tokyo)
This paper proposes a novel telemedicine system for type 2 diabetes patients. The proposed system supports patient self-management via a set of telemedicine devices consisting of health sensors and a smartphone. The system covers not only sensor data but also diet (food) and exercise data. To capture food information, we also developed a voice recognition module focused on food names. The basic feasibility of the system is demonstrated in a preliminary experiment.
Influenza Patients Are Invisible in the Web: Traditional Model Still Improves the State of the Art Web Based Influenza Surveillance
Aramaki, Eiji (University of Tokyo) | Maskawa, Sachiko (University of Tokyo) | Morita, Mizuki
Although web-based information extraction systems draw much attention, most such systems assume that the web directly reflects the real world. For instance, Google Flu Trends, one of the state-of-the-art influenza surveillance systems, relies on the basic idea that the volume of influenza-related search queries directly correlates with the number of influenza patients. However, real patients suffering from influenza symptoms are invisible on the web, because they do not use the Internet. Considering this gap, this paper employs an infectious model, assuming that a potential patient uses the Internet at the first sign of flu. The proposed model improves two types of state-of-the-art systems: a Google-based system (from 0.837 correlation to 0.928) and a Twitter-based system (from 0.898 correlation to 0.918). This study demonstrates that a simple model can easily improve web-based surveillance.
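The results above are reported as correlations between a system's estimates and actual patient counts. The abstract does not state which coefficient is used; assuming it is the Pearson correlation, a minimal sketch of the computation looks like this. The function name and the sample series are hypothetical.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(var_x * var_y)


# Made-up weekly values: system estimates vs. reported patient counts.
estimates = [1.0, 2.0, 3.0]
patients = [2.0, 4.0, 6.0]
r = pearson(estimates, patients)
```

A value close to 1 means the estimates rise and fall in step with the reported counts, which is the sense in which 0.928 improves on 0.837.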