Rzepka, Rafal
Speciesism in Natural Language Processing Research
Takeshita, Masashi, Rzepka, Rafal
Natural Language Processing (NLP) research on AI Safety and social bias in AI has focused on safety for humans and on social bias against human minorities. However, some AI ethicists have argued that the moral significance of nonhuman animals has been ignored in AI research. The purpose of this study is therefore to investigate whether speciesism, i.e., discrimination against nonhuman animals, exists in NLP research. First, we explain why nonhuman animals are relevant to NLP research. Next, we survey existing findings on speciesism among NLP researchers and in NLP data and models, and investigate the problem further through our own experiments. Our findings suggest that speciesism exists among researchers as well as in data and models. Specifically, our survey and experiments show that (a) NLP researchers, even those who study social bias in AI, do not recognize speciesism or speciesist bias; (b) speciesist bias is inherent in the annotations of datasets used to evaluate NLP models; and (c) OpenAI GPTs, recent NLP models, exhibit speciesist bias by default. Finally, we discuss how speciesism in NLP research can be reduced.
LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs
LLM-jp, Aizawa, Akiko, Aramaki, Eiji, Chen, Bowen, Cheng, Fei, Deguchi, Hiroyuki, Enomoto, Rintaro, Fujii, Kazuki, Fukumoto, Kensuke, Fukushima, Takuya, Han, Namgi, Harada, Yuto, Hashimoto, Chikara, Hiraoka, Tatsuya, Hisada, Shohei, Hosokawa, Sosuke, Jie, Lu, Kamata, Keisuke, Kanazawa, Teruhito, Kanezashi, Hiroki, Kataoka, Hiroshi, Katsumata, Satoru, Kawahara, Daisuke, Kawano, Seiya, Keyaki, Atsushi, Kiryu, Keisuke, Kiyomaru, Hirokazu, Kodama, Takashi, Kubo, Takahiro, Kuga, Yohei, Kumon, Ryoma, Kurita, Shuhei, Kurohashi, Sadao, Li, Conglong, Maekawa, Taiki, Matsuda, Hiroshi, Miyao, Yusuke, Mizuki, Kentaro, Mizuki, Sakae, Murawaki, Yugo, Nakamura, Ryo, Nakamura, Taishi, Nakayama, Kouta, Nakazato, Tomoka, Niitsuma, Takuro, Nishitoba, Jiro, Oda, Yusuke, Ogawa, Hayato, Okamoto, Takumi, Okazaki, Naoaki, Oseki, Yohei, Ozaki, Shintaro, Ryu, Koki, Rzepka, Rafal, Sakaguchi, Keisuke, Sasaki, Shota, Sekine, Satoshi, Suda, Kohei, Sugawara, Saku, Sugiura, Issa, Sugiyama, Hiroaki, Suzuki, Hisami, Suzuki, Jun, Suzumura, Toyotaro, Tachibana, Kensuke, Takagi, Yu, Takami, Kyosuke, Takeda, Koichi, Takeshita, Masashi, Tanaka, Masahiro, Taura, Kenjiro, Tolmachev, Arseny, Ueda, Nobuhiro, Wan, Zhen, Yada, Shuntaro, Yahata, Sakiko, Yamamoto, Yuya, Yamauchi, Yusuke, Yanaka, Hitomi, Yokota, Rio, Yoshino, Koichiro
This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). LLM-jp aims to develop strong open-source Japanese LLMs, and as of this writing, more than 1,500 participants from academia and industry are working together toward this goal. This paper presents the background to the establishment of LLM-jp, summaries of its activities, and technical reports on the LLMs developed by LLM-jp.
Global Brain That Makes You Think Twice
Rzepka, Rafal (Hokkaido University) | Mazur, Michal (Hokkaido University) | Clapp, Austin (Stanford University) | Araki, Kenji (Hokkaido University)
In this position paper we introduce our approach to positive computing: developing and integrating methods for future assistant and companion agents that could help us a) avoid mistakes caused by biases stemming from insufficient knowledge, b) be more empathic and righteous, and c) be more sensitive and thoughtful. We present text processing techniques for automatically discovering possible reasoning errors and for providing hints that make users doubt their beliefs when harm is possible. We present existing sources and methods, discuss how natural language processing technologies could contribute to various aspects of well-being, giving examples of systems we are developing, and describe the strengths and weaknesses of our approach.
Japanese Puns Are Not Necessarily Jokes
Dybala, Pawel (Otaru University of Commerce) | Rzepka, Rafal (Hokkaido University) | Araki, Kenji (Hokkaido University) | Sayama, Kohichi (Otaru University of Commerce)
In English, “puns” are usually perceived as a subclass of “jokes”. In Japanese, however, this is not necessarily true. In this paper we investigate whether native speakers of Japanese perceive dajare (puns) as jooku (jokes). We first summarize existing research in the field of computational humor, both in English and Japanese, focusing on how these two terms are used. This shows that in works by native Japanese speakers, puns are not commonly treated as jokes. Next, we present some dictionary definitions of dajare and jooku, which show that the terms may in fact be used in a manner similar to their English counterparts. To study this issue, we conducted a survey in which we asked Japanese participants three questions: whether they like jokes (jooku), whether they like puns (dajare), and whether dajare are jooku. The results showed no common agreement on whether dajare are a genre of jokes. We analyze the outcome of this experiment and discuss it from different points of view.
Just Keep Tweeting, Dear: Web-Mining Methods for Helping a Social Robot Understand User Needs
Takagi, Keisuke (Hokkaido University) | Rzepka, Rafal (Hokkaido University) | Araki, Kenji (Hokkaido University)
An intelligent system of the future should make its user feel comfortable, which is impossible without understanding the context in which they coexist. However, our past research did not treat language information as part of the context in which a robot works, and no data were obtained on the reasons behind users' decisions. We therefore decided to use the Web as a knowledge source to discover context information that could suggest a robot's behavior when it acquires verbal information from its user or users. By comparing user utterances (blog, Twitter, or Facebook entries, not direct orders) with other people's written experiences (mostly blogs), a system can judge whether the current situation is one in which the robot can act or improve its performance. In this paper we introduce several methods that can be applied to a simple floor-cleaning robot. We describe basic experiments showing that text processing is helpful when dealing with multiple users who are unwilling to give rich feedback. For example, we describe a method for finding common reasons for cleaning on the Web by using Okapi BM25 to extract feature words from sentences retrieved with the query word "cleaning". We then introduce our ideas for dealing with conflicts of interest in multiuser environments and possible methods for avoiding such conflicts through better situation understanding. Finally, we describe an emotion recognizer for guessing user needs and moods, and a method for calculating situation naturalness.
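The BM25 feature-word step mentioned above can be sketched roughly as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: sentences are already tokenized (English tokens are used here for readability), each retrieved sentence is treated as one document, the non-negative (Lucene-style) IDF variant is used, a term's feature weight is its BM25 weight summed over all sentences, and the function names `bm25_term_weights` and `top_feature_words`, the stopword list, and the defaults `k1=1.5`, `b=0.75` are hypothetical choices.

```python
import math
from collections import Counter


def bm25_term_weights(docs, k1=1.5, b=0.75):
    """Map each term to its BM25 weight summed over all docs (hypothetical helper).

    docs: list of token lists, e.g. sentences retrieved for the query "cleaning".
    """
    if not docs:
        return Counter()
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of sentences that contain each term.
    df = Counter(term for d in docs for term in set(d))
    weights = Counter()
    for d in docs:
        tf = Counter(d)
        dl = len(d)
        for term, f in tf.items():
            # Non-negative IDF variant (as used in Lucene), so weights stay positive.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation (k1) and length normalization (b).
            norm = f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl))
            weights[term] += idf * norm
    return weights


def top_feature_words(docs, k=5, stopwords=frozenset()):
    """Return the k highest-weighted non-stopword terms as candidate feature words."""
    weights = bm25_term_weights(docs)
    ranked = sorted(
        ((w, t) for t, w in weights.items() if t not in stopwords), reverse=True
    )
    return [t for _, t in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical, pre-tokenized sentences retrieved for the query word "cleaning".
    sentences = [
        "i clean the room because guests are coming".split(),
        "we clean up because guests visit tomorrow".split(),
        "cleaning before the party starts".split(),
        "i clean when dust piles up on the floor".split(),
    ]
    stop = {"i", "we", "the", "because", "are", "up", "on", "when", "before"}
    print(top_feature_words(sentences, k=3, stopwords=stop))
```

Because IDF is computed only within the retrieved set, terms that occur in every sentence are down-weighted; scoring the retrieved sentences against a larger background corpus would be an equally plausible design.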
A Japanese Natural Language Toolset Implementation for ConceptNet
Roberts, Tyson Michael (Hokkaido University) | Rzepka, Rafal (Hokkaido University) | Araki, Kenji (Hokkaido University)
In recent years, ConceptNet has gained attention in the Natural Language Processing (NLP) community as a textual commonsense knowledge base (CSKB) because its use of k-lines (Liu and Singh, 2004a) makes it suitable for practical inference over corpora (Liu and Singh, 2004b). However, until now ConceptNet has lacked support for many non-English languages. To alleviate this problem, we have implemented a software toolset for Japanese that allows the language to be used with ConceptNet's concept inference system. This paper discusses the implementation of this toolset and a possible path for developing toolsets for other languages with similar features.
CAO: A Fully Automatic Emoticon Analysis System
Ptaszynski, Michal (Hokkaido University) | Maciejewski, Jacek (Hokkaido University) | Dybala, Pawel (Hokkaido University) | Rzepka, Rafal (Hokkaido University) | Araki, Kenji (Hokkaido University)
This paper presents CAO, a system for affect analysis of emoticons. Emoticons are strings of symbols widely used in text-based online communication to convey emotions. CAO extracts emoticons from input and determines the specific emotions they express. First, it matches the extracted emoticons against a raw emoticon database containing over ten thousand emoticon samples extracted from the Web and annotated automatically. Emoticons whose emotion types cannot be determined from this database alone are automatically divided into semantic areas representing "mouths" or "eyes," based on the theory of kinesics. These areas are automatically annotated according to their co-occurrence in the database. Annotation is first attempted on the eye-mouth-eye triplet; if no such triplet is found, each semantic area is estimated separately. This gives the system a coverage of over 3 million possibilities. The evaluation, performed on both training and test sets, confirmed the system's ability to detect and extract any emoticon, analyze its semantic structure, and estimate the emotion types it potentially expresses. The system achieved nearly ideal scores, outperforming existing emoticon analysis systems.
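The three-stage lookup described above (raw database match, then the eye-mouth-eye triplet, then each semantic area separately) can be sketched as a simple fallback chain. Everything here is a hypothetical toy rather than CAO's actual code: the dictionaries `RAW_DB`, `TRIPLET_DB`, and `AREA_DB`, their emotion labels and counts, and the naive `split_areas` helper, which assumes a bracketed three-character emoticon. CAO itself relies on a Web-derived database and kinesics-based segmentation.

```python
from collections import Counter

# Hypothetical toy data. CAO's real raw database holds over ten thousand Web emoticons.
RAW_DB = {"(^_^)": "joy", "(T_T)": "sadness"}

# Hypothetical annotations for whole eye-mouth-eye triplets.
TRIPLET_DB = {("^", "o", "^"): "joy"}

# Hypothetical per-area emotion counts, standing in for co-occurrence statistics.
AREA_DB = {
    "^": Counter(joy=5),
    "T": Counter(sadness=7),
    "o": Counter(surprise=4),
    "_": Counter(joy=2, sadness=2),
}


def split_areas(emoticon):
    """Naively split "(EME)" into (eye, mouth, eye); None if the shape differs."""
    inner = emoticon.strip("()")
    if len(inner) != 3:
        return None
    return inner[0], inner[1], inner[2]


def estimate_emotion(emoticon):
    # Stage 1: exact match against the raw emoticon database.
    if emoticon in RAW_DB:
        return RAW_DB[emoticon]
    areas = split_areas(emoticon)
    if areas is None:
        return None
    # Stage 2: look up the eye-mouth-eye triplet as a whole.
    if areas in TRIPLET_DB:
        return TRIPLET_DB[areas]
    # Stage 3: estimate each semantic area separately and combine their votes.
    votes = Counter()
    for area in areas:
        votes.update(AREA_DB.get(area, Counter()))
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    for emo in ["(^_^)", "(^o^)", "(ToT)", "(?!?)"]:
        print(emo, estimate_emotion(emo))
```

Here `(^o^)` is resolved at the triplet stage and `(ToT)` only at the per-area stage, mirroring how the fallback extends coverage beyond the emoticons stored verbatim in the database.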
A Pragmatic Approach to Implementation of Emotional Intelligence in Machines
Ptaszynski, Michal (Hokkaido University) | Rzepka, Rafal (Hokkaido University) | Araki, Kenji (Hokkaido University)
With this paper we would like to open a discussion on the need for Emotional Intelligence as a feature of machines interacting with humans. However, we refrain from making any statement about the need for emotional experience in machines. We argue that providing machines with computable means of processing emotions is a practical need, requiring the implementation of a set of abilities included in the Emotional Intelligence Framework. We introduce our methods and present the results of some of the first experiments we have performed in this area.