Collaborating Authors

 Suzuki, Masahiro


JMedLoRA: Medical Domain Adaptation on Japanese Large Language Models using Instruction-tuning

arXiv.org Artificial Intelligence

In the ongoing wave of impact driven by large language models (LLMs) like ChatGPT, the adaptation of LLMs to the medical domain has emerged as a crucial research frontier. Since mainstream LLMs tend to be designed for general-purpose applications, constructing a medical LLM through domain adaptation is a major challenge. While instruction-tuning is used to fine-tune some LLMs, its precise role in domain adaptation remains unknown. Here we show the contribution of LoRA-based instruction-tuning to performance on Japanese medical question-answering tasks. In doing so, we employ a multifaceted evaluation of multiple-choice questions, scoring by "Exact match" and "Gestalt distance" in addition to conventional accuracy. Our findings suggest that LoRA-based instruction-tuning can partially incorporate domain-specific knowledge into LLMs, with larger models demonstrating more pronounced effects. Furthermore, our results underscore the potential of adapting English-centric models for Japanese applications in domain adaptation, while also highlighting the persisting limitations of Japanese-centric models. This initiative represents a pioneering effort in enabling medical institutions to fine-tune and operate models without relying on external services.
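The paper's evaluation code is not reproduced here, but the "Gestalt distance" scoring it describes can be sketched with Python's standard library: `difflib.SequenceMatcher` implements Ratcliff/Obershelp gestalt pattern matching, which gives partial credit to near-miss answers that exact match would score as zero. The example strings below are illustrative, not taken from the paper's benchmark.

```python
import difflib

def exact_match(prediction: str, answer: str) -> float:
    """1.0 only when the normalized strings are identical."""
    return float(prediction.strip() == answer.strip())

def gestalt_similarity(prediction: str, answer: str) -> float:
    """Similarity in [0, 1] via Ratcliff/Obershelp gestalt pattern matching."""
    return difflib.SequenceMatcher(None, prediction.strip(), answer.strip()).ratio()

# A partially correct answer scores 0 on exact match but still
# receives substantial credit under the gestalt-style score.
pred, gold = "アスピリン投与", "アスピリンの投与"
print(exact_match(pred, gold))            # 0.0
print(gestalt_similarity(pred, gold))     # close to 1.0
```

Combining both scores, as the paper does alongside accuracy, distinguishes models that produce nearly correct strings from those that answer in an entirely wrong direction.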


From Base to Conversational: Japanese Instruction Dataset and Tuning Large Language Models

arXiv.org Artificial Intelligence

Instruction tuning is essential for large language models (LLMs) to become interactive. While many instruction-tuning datasets exist in English, there is a noticeable lack of them in other languages, and their effectiveness has not been well verified in non-English languages. We construct a Japanese instruction dataset by expanding and filtering existing datasets and apply it to a Japanese pre-trained base model. We performed Low-Rank Adaptation (LoRA) tuning on both Japanese and English existing models using our instruction dataset, and evaluated these models from both quantitative and qualitative perspectives. The results confirm the effectiveness of Japanese instruction datasets and indicate that, even with relatively small LLMs, performance on downstream tasks can be improved through instruction tuning. Our instruction dataset, tuned models, and implementation are publicly available online.
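The LoRA technique this abstract relies on can be sketched in plain Python (this is the general idea, not the authors' training code): a frozen pretrained weight W is adapted as W' = W + (alpha/r) * B @ A, where B (d×r) and A (r×k) are small trainable matrices, so only r·(d+k) parameters are updated instead of d·k.

```python
# Minimal plain-Python sketch of the Low-Rank Adaptation (LoRA) update.
# Illustrative only: real LoRA training optimizes A and B by gradient descent.

def matmul(X, Y):
    """Matrix product for small illustrative matrices."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_update(W, A, B, alpha, r):
    """Return W + (alpha / r) * B @ A without modifying the frozen W."""
    delta = matmul(B, A)
    s = alpha / r
    return [[W[i][j] + s * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

d, k, r = 4, 6, 1
W = [[0.0] * k for _ in range(d)]   # frozen pretrained weight (d x k)
B = [[1.0] for _ in range(d)]       # trainable, d x r
A = [[2.0] * k]                     # trainable, r x k
W_adapted = lora_update(W, A, B, alpha=1.0, r=r)

full_params = d * k                 # 24 parameters to fully fine-tune
lora_params = r * (d + k)           # 10 trainable LoRA parameters
print(W_adapted[0][0], full_params, lora_params)
```

The parameter-count gap is what makes LoRA attractive for tuning large models on modest hardware, which is why the paper applies it to both Japanese and English base models.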


Personalized human mobility prediction for HuMob challenge

arXiv.org Artificial Intelligence

We explain the methodology used to create the data submitted to the HuMob Challenge, a data analysis competition for human mobility prediction. Based on the hypothesis that human movement is unique to each person, we adopted a personalized model that predicts an individual's trajectory from their own data rather than from the overall movement. We devised features such as the date and time, activity time, day of the week, time of day, and frequency of visits to POIs (Points of Interest). As additional features, we incorporated the movement of other individuals with similar behavior patterns, identified through clustering. The machine learning model we adopted was Support Vector Regression (SVR). We assessed accuracy through offline evaluation and carried out feature selection and parameter tuning. Although the provided dataset consists of the trajectories of 100,000 users, our method uses only the data of the 20,000 target users and does not need the other 80,000. Despite its traditional feature-engineering approach, the personalized model yields reasonably good accuracy at a lower computational cost.
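The competition code is not shown here, but the kinds of per-visit features the abstract lists (day of week, time of day, POI visit frequency) can be sketched with the standard library alone. The record layout and bucket thresholds below are illustrative assumptions, not the authors' actual schema.

```python
from collections import Counter
from datetime import datetime

def time_bucket(hour: int) -> str:
    """Coarse time-of-day bucket (illustrative thresholds)."""
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    return "night"

def featurize(visits):
    """visits: list of (iso_timestamp, poi_id) pairs for one user.

    Returns one feature dict per visit: day of week, time-of-day
    bucket, and how often this user visits that POI overall.
    """
    poi_counts = Counter(poi for _, poi in visits)
    features = []
    for ts, poi in visits:
        dt = datetime.fromisoformat(ts)
        features.append({
            "weekday": dt.weekday(),      # 0 = Monday
            "bucket": time_bucket(dt.hour),
            "poi_freq": poi_counts[poi],
        })
    return features

feats = featurize([("2023-04-03T08:30", "cafe"),
                   ("2023-04-03T19:00", "gym"),
                   ("2023-04-04T08:45", "cafe")])
print(feats[0])
```

Feature vectors like these would then be fed, per user, to a regressor such as SVR, keeping the model personalized and cheap to train.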


Pixyz: a Python library for developing deep generative models

arXiv.org Artificial Intelligence

With the recent rapid progress in the study of deep generative models (DGMs), there is a need for a framework that can implement them in a simple and generic way. In this research, we focus on two features of DGMs: (1) deep neural networks are encapsulated by probability distributions, and (2) models are designed and learned based on an objective function. Taking these features into account, we propose a new Python library for implementing DGMs called Pixyz. This library adopts a step-by-step implementation method with three APIs, which allows us to implement various DGMs more concisely and intuitively. In addition, the library introduces memoization, which avoids duplicate computations in DGMs and thereby speeds up computation. We demonstrate experimentally that this library is faster than existing probabilistic programming languages in training DGMs.
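Pixyz's internal caching is not reproduced here, but the memoization idea the abstract mentions can be sketched with `functools.lru_cache`: when several terms of a DGM objective (e.g. a reconstruction term and a regularizer) share the same expensive forward pass, caching makes that pass run only once per distinct input. The names below are illustrative stand-ins.

```python
from functools import lru_cache

calls = {"n": 0}

@lru_cache(maxsize=None)
def encode(x: int) -> int:
    """Stand-in for an expensive network forward pass shared by
    several terms of a DGM objective."""
    calls["n"] += 1
    return x * x  # placeholder computation

def objective(x: int) -> int:
    # Both terms reuse the same encoding; memoization ensures the
    # "forward pass" runs only once per distinct input.
    reconstruction = encode(x) + 1
    regularizer = encode(x) - 1
    return reconstruction + regularizer

print(objective(3), calls["n"])   # encode is evaluated once, not twice
```

In a real DGM the cached quantity would be a distribution's sufficient statistics or samples, and the cache would be invalidated per training step; the mechanism, however, is the same.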


End-to-end Training of Deep Boltzmann Machines by Unbiased Contrastive Divergence with Local Mode Initialization

arXiv.org Artificial Intelligence

We address the problem of biased gradient estimation in deep Boltzmann machines (DBMs). The existing method for obtaining an unbiased estimator uses a maximal coupling based on a Gibbs sampler, but when the state is high-dimensional, it takes a long time to converge. In this study, we propose to use a coupling based on the Metropolis-Hastings (MH) algorithm and to initialize the state around a local mode. Boltzmann machines have been applied to a range of tasks (Wu, 2017; Ma, 2020), including multimodal learning (Srivastava & Salakhutdinov, 2012) and collaborative filtering (Salakhutdinov et al., 2007). They also have potential as powerful generative models, because the Boltzmann machine is known to be a universal approximator of probability mass functions on discrete variables (Le Roux & Bengio, 2008). Among them, deep Boltzmann machines (DBMs) (Salakhutdinov & Larochelle, 2010), which are multi-layered undirected models, can capture complex structures through their depth while retaining the advantages of the Boltzmann machine.
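The paper's maximal coupling and local-mode initialization are not reproduced here, but the basic mechanism such unbiased estimators build on, two MCMC chains that coalesce, can be sketched generically: run two Metropolis-Hastings chains from different states, drive them with common random numbers, and record when they meet. The toy target distribution below is an assumption for illustration only.

```python
import random

def mh_step(x, u_prop, u_acc, weights):
    """One Metropolis-Hastings step on {0..n-1} with a +/-1 proposal,
    driven by externally supplied uniforms (common random numbers)."""
    n = len(weights)
    proposal = x + (1 if u_prop < 0.5 else -1)
    if not 0 <= proposal < n:
        return x                              # reject out-of-range proposal
    accept = u_acc < min(1.0, weights[proposal] / weights[x])
    return proposal if accept else x

def meeting_time(x0, y0, weights, max_steps=100_000, seed=0):
    """Run two coupled chains until they coalesce. Because the chains
    share all randomness, they stay together once they meet."""
    rng = random.Random(seed)
    x, y = x0, y0
    for step in range(max_steps):
        if x == y:
            return step
        u_prop, u_acc = rng.random(), rng.random()
        x = mh_step(x, u_prop, u_acc, weights)
        y = mh_step(y, u_prop, u_acc, weights)
    return None

weights = [i + 1 for i in range(10)]          # unnormalized target on {0..9}
print(meeting_time(0, 9, weights))
```

In coupling-based unbiased estimation, the gap between the two chains before they meet supplies the bias-correction term; the paper's contribution is making the meeting happen quickly in high dimensions via an MH coupling and local-mode starts.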


llm-japanese-dataset v0: Construction of Japanese Chat Dataset for Large Language Models and its Methodology

arXiv.org Artificial Intelligence

This study constructed a Japanese chat dataset for tuning large language models (LLMs), consisting of about 8.4 million records. Recently, LLMs have been developed rapidly and are gaining popularity; however, high-performing LLMs are usually designed mainly for English. There are two ways to support languages other than English with such LLMs: constructing LLMs from scratch or tuning existing models. In both cases, datasets are a necessary component. In this study, we focused on supporting Japanese in LLMs and constructed a dataset for training or tuning LLMs in Japanese. The dataset consists of various tasks, such as translation and knowledge tasks. In our experiment, we tuned an existing LLM on our dataset and evaluated its performance qualitatively. The results suggest that our dataset is potentially beneficial for LLMs. However, we also revealed some difficulties in constructing LLMs for languages other than English.
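A minimal sketch of the kind of record construction and filtering such a dataset pipeline involves is shown below. The instruction/input/output layout is a common convention for instruction-tuning data and is an assumption here; the actual llm-japanese-dataset schema and filtering rules may differ.

```python
import json

# Illustrative records in a common instruction-tuning layout.
raw_records = [
    {"instruction": "次の英文を日本語に翻訳してください。",
     "input": "Good morning.", "output": "おはようございます。"},
    {"instruction": "", "input": "", "output": "(broken row)"},
    {"instruction": "富士山の高さは？", "input": "",
     "output": "3776メートルです。"},
]

def is_valid(record):
    """Drop records with an empty instruction or empty output."""
    return bool(record["instruction"].strip()) and bool(record["output"].strip())

dataset = [r for r in raw_records if is_valid(r)]
print(len(dataset))                               # 2 usable records
print(json.dumps(dataset[0], ensure_ascii=False)) # JSON-serializable for release
```

Filtering of this sort, applied at the scale of millions of records drawn from heterogeneous sources, is where much of the construction effort in such datasets goes.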


World Models and Predictive Coding for Cognitive and Developmental Robotics: Frontiers and Challenges

arXiv.org Artificial Intelligence

Creating autonomous robots that can actively explore the environment, acquire knowledge and learn skills continuously is the ultimate achievement envisioned in cognitive and developmental robotics. Their learning processes should be based on interactions with their physical and social world in the manner of human learning and cognitive development. Based on this context, in this paper, we focus on the two concepts of world models and predictive coding. Recently, world models have attracted renewed attention as a topic of considerable interest in artificial intelligence. Cognitive systems learn world models to better predict future sensory observations and optimize their policies, i.e., controllers. Alternatively, in neuroscience, predictive coding proposes that the brain continuously predicts its inputs and adapts to model its own dynamics and control behavior in its environment. Both ideas may be considered as underpinning the cognitive development of robots and humans capable of continual or lifelong learning. Although many studies have been conducted on predictive coding in cognitive robotics and neurorobotics, the relationship between world model-based approaches in AI and predictive coding in robotics has rarely been discussed. Therefore, in this paper, we clarify the definitions, relationships, and status of current research on these topics, as well as missing pieces of world models and predictive coding in conjunction with crucially related concepts such as the free-energy principle and active inference in the context of cognitive and developmental robotics. Furthermore, we outline the frontiers and challenges involved in world models and predictive coding toward the further integration of AI and robotics, as well as the creation of robots with real cognitive and developmental capabilities in the future.


A survey of multimodal deep generative models

arXiv.org Machine Learning

Multimodal learning is a framework for building models that make predictions based on different types of modalities. Important challenges in multimodal learning are the inference of shared representations from arbitrary modalities and cross-modal generation via these representations; however, achieving this requires taking the heterogeneous nature of multimodal data into account. In recent years, deep generative models, i.e., generative models in which distributions are parameterized by deep neural networks, have attracted much attention, especially variational autoencoders, which are suitable for accomplishing the above challenges because they can consider heterogeneity and infer good representations of data. Therefore, various multimodal generative models based on variational autoencoders, called multimodal deep generative models, have been proposed in recent years. In this paper, we provide a categorized survey of studies on multimodal deep generative models.


Improving the Robustness to Variations of Objects and Instructions with a Neuro-Symbolic Approach for Interactive Instruction Following

arXiv.org Artificial Intelligence

An interactive instruction following task has been proposed as a benchmark for learning to map natural language instructions and first-person vision into sequences of actions to interact with objects in a 3D simulated environment. We find that an existing end-to-end neural model for this task is not robust to variations of objects and language instructions. We assume that this problem is due to the high sensitivity of neural feature extraction to small changes in vision and language inputs. To mitigate this problem, we propose a neuro-symbolic approach that performs reasoning over high-level symbolic representations that are robust to small changes in raw inputs. Our experiments on the ALFRED dataset show that our approach significantly outperforms the existing model, by 18, 52, and 73 points in success rate on the ToggleObject, PickupObject, and SliceObject subtasks in unseen environments, respectively.


Whole brain Probabilistic Generative Model toward Realizing Cognitive Architecture for Developmental Robots

arXiv.org Artificial Intelligence

Through the developmental process, humans acquire basic physical skills (such as reaching and grasping), perceptual skills (such as object recognition and phoneme recognition), and social skills (such as linguistic communication and intention estimation) (Taniguchi et al., 2018). This open-ended online learning process involving many types of modalities, tasks, and interactions is often referred to as lifelong learning (Oudeyer et al., 2007; Parisi et al., 2019). The central question in next-generation artificial intelligence (AI) and developmental robotics is how to build an integrative cognitive system that is capable of lifelong learning and humanlike behavior in environments such as homes, offices, and outdoor spaces. In this paper, inspired by the human whole brain architecture (WBA) approach, we introduce the idea of building an integrative cognitive system using a whole brain probabilistic generative model (WB-PGM) (see 2.1). The integrative cognitive system can alternatively be referred to as artificial general intelligence (AGI) (Yamakawa, 2021). Against this backdrop, we explore the process of establishing a cognitive architecture for developmental robots. A cognitive architecture is a hypothesis about the mechanisms of human intelligence underlying our behaviors (Rosenbloom, 2011). The study of cognitive architecture involves developing a presumably standard model of the humanlike mind (Laird et al., 2017).