AITopics | Suzuki, Jun

Collaborating Authors

Suzuki, Jun

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization

Nakamura, Taishi, Akiba, Takuya, Fujii, Kazuki, Oda, Yusuke, Yokota, Rio, Suzuki, Jun

arXiv.org Artificial IntelligenceMar-15-2025

The Mixture of Experts (MoE) architecture reduces the training and inference cost significantly compared to a dense model of equivalent capacity. Upcycling is an approach that initializes and trains an MoE model using a pre-trained dense model. While upcycling leads to initial performance gains, the training progresses slower than when trained from scratch, leading to suboptimal performance in the long term. We propose Drop-Upcycling - a method that effectively addresses this problem. Drop-Upcycling combines two seemingly contradictory approaches: utilizing the knowledge of pre-trained dense models while statistically re-initializing some parts of the weights. This approach strategically promotes expert specialization, significantly enhancing the MoE model's efficiency in knowledge acquisition. Extensive large-scale experiments demonstrate that Drop-Upcycling significantly outperforms previous MoE construction methods in the long term, specifically when training on hundreds of billions of tokens or more. As a result, our MoE model with 5.9B active parameters achieves comparable performance to a 13B dense model in the same model family, while requiring approximately 1/4 of the training FLOPs. All experimental resources, including source code, training data, model checkpoints and logs, are publicly available to promote reproducibility and future research on MoE.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2502.19261

Country:

Asia > Japan > Honshū (0.14)
North America > Canada (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Reference-free Evaluation Metrics for Text Generation: A Survey

Ito, Takumi, van Deemter, Kees, Suzuki, Jun

arXiv.org Artificial IntelligenceJan-21-2025

A number of automatic evaluation metrics have been proposed for natural language generation systems. The most common approach to automatic evaluation is the use of a reference-based metric that compares the model's output with gold-standard references written by humans. However, it is expensive to create such references, and for some tasks, such as response generation in dialogue, creating references is not a simple matter. Therefore, various reference-free metrics have been developed in recent years. In this survey, which intends to cover the full breadth of all NLG tasks, we investigate the most commonly used approaches, their application, and their other uses beyond evaluating models. The survey concludes by highlighting some promising directions for future research.

computational linguistic, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2501.12011

Country:

Asia > Japan > Honshū (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Portugal > Lisbon > Lisbon (0.14)

Genre: Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Can Input Attributions Interpret the Inductive Reasoning Process Elicited in In-Context Learning?

Ye, Mengyu, Kuribayashi, Tatsuki, Kobayashi, Goro, Suzuki, Jun

arXiv.org Artificial IntelligenceDec-20-2024

Elucidating the rationale behind neural models' outputs has been challenging in the machine learning field, which is indeed applicable in this age of large language models (LLMs) and in-context learning (ICL). When it comes to estimating input attributions (IA), ICL poses a new issue of interpreting which example in the prompt, consisting of a set of examples, contributed to identifying the task/rule to be solved. To this end, in this paper, we introduce synthetic diagnostic tasks inspired by the poverty of the stimulus design in inductive reasoning; here, most in-context examples are ambiguous w.r.t. their underlying rule, and one critical example disambiguates the task demonstrated. The question is whether conventional IA methods can identify such an example in interpreting the inductive reasoning process in ICL. Our experiments provide several practical findings; for example, a certain simple IA method works the best, and the larger the model, the generally harder it is to interpret the ICL with gradient-based IA methods.

computational linguistic, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2412.15628

Country:

North America > United States (1.00)
Europe (1.00)
Asia (0.68)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Add feedback

Pruning Multilingual Large Language Models for Multilingual Inference

Kim, Hwichan, Suzuki, Jun, Hirasawa, Tosho, Komachi, Mamoru

arXiv.org Artificial IntelligenceOct-2-2024

Multilingual large language models (MLLMs), trained on multilingual balanced data, demonstrate better zero-shot learning performance in non-English languages compared to large language models trained on English-dominant data. However, the disparity in performance between English and non-English languages remains a challenge yet to be fully addressed. A distinctive characteristic of MLLMs is their high-quality translation capabilities, indicating an acquired proficiency in aligning between languages. This study explores how to enhance the zero-shot performance of MLLMs in non-English languages by leveraging their alignment capability between English and non-English languages. To achieve this, we first analyze the behavior of MLLMs when performing translation and reveal that there are large magnitude features that play a critical role in the translation process. Inspired by these findings, we retain the weights associated with operations involving the large magnitude features and prune other weights to force MLLMs to rely on these features for tasks beyond translation. We empirically demonstrate that this pruning strategy can enhance the MLLMs' performance in non-English language.

demonstration, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2409.16911

Country:

Asia (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.93)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

LLM-jp, null, :, null, Aizawa, Akiko, Aramaki, Eiji, Chen, Bowen, Cheng, Fei, Deguchi, Hiroyuki, Enomoto, Rintaro, Fujii, Kazuki, Fukumoto, Kensuke, Fukushima, Takuya, Han, Namgi, Harada, Yuto, Hashimoto, Chikara, Hiraoka, Tatsuya, Hisada, Shohei, Hosokawa, Sosuke, Jie, Lu, Kamata, Keisuke, Kanazawa, Teruhito, Kanezashi, Hiroki, Kataoka, Hiroshi, Katsumata, Satoru, Kawahara, Daisuke, Kawano, Seiya, Keyaki, Atsushi, Kiryu, Keisuke, Kiyomaru, Hirokazu, Kodama, Takashi, Kubo, Takahiro, Kuga, Yohei, Kumon, Ryoma, Kurita, Shuhei, Kurohashi, Sadao, Li, Conglong, Maekawa, Taiki, Matsuda, Hiroshi, Miyao, Yusuke, Mizuki, Kentaro, Mizuki, Sakae, Murawaki, Yugo, Nakamura, Ryo, Nakamura, Taishi, Nakayama, Kouta, Nakazato, Tomoka, Niitsuma, Takuro, Nishitoba, Jiro, Oda, Yusuke, Ogawa, Hayato, Okamoto, Takumi, Okazaki, Naoaki, Oseki, Yohei, Ozaki, Shintaro, Ryu, Koki, Rzepka, Rafal, Sakaguchi, Keisuke, Sasaki, Shota, Sekine, Satoshi, Suda, Kohei, Sugawara, Saku, Sugiura, Issa, Sugiyama, Hiroaki, Suzuki, Hisami, Suzuki, Jun, Suzumura, Toyotaro, Tachibana, Kensuke, Takagi, Yu, Takami, Kyosuke, Takeda, Koichi, Takeshita, Masashi, Tanaka, Masahiro, Taura, Kenjiro, Tolmachev, Arseny, Ueda, Nobuhiro, Wan, Zhen, Yada, Shuntaro, Yahata, Sakiko, Yamamoto, Yuya, Yamauchi, Yusuke, Yanaka, Hitomi, Yokota, Rio, Yoshino, Koichiro

arXiv.org Artificial IntelligenceJul-4-2024

This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). LLM-jp aims to develop open-source and strong Japanese LLMs, and as of this writing, more than 1,500 participants from academia and industry are working together for this purpose. This paper presents the background of the establishment of LLM-jp, summaries of its activities, and technical reports on the LLMs developed by LLM-jp.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2407.03963

Country:

North America (1.00)
Europe (1.00)
Asia > Japan > Honshū > Kantō (0.14)

Genre:

Research Report (0.50)
Questionnaire & Opinion Survey (0.46)

Industry:

Education (0.93)
Health & Medicine > Therapeutic Area (0.68)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Detecting Response Generation Not Requiring Factual Judgment

Kamei, Ryohei, Shiono, Daiki, Akama, Reina, Suzuki, Jun

arXiv.org Artificial IntelligenceJun-14-2024

With the remarkable development of large language models (LLMs), ensuring the factuality of output has become a challenge. However, having all the contents of the response with given knowledge or facts is not necessarily a good thing in dialogues. This study aimed to achieve both attractiveness and factuality in a dialogue response for which a task was set to predict sentences that do not require factual correctness judgment such as agreeing, or personal opinions/feelings. We created a dataset, dialogue dataset annotated with fact-check-needed label (DDFC), for this task via crowdsourcing, and classification tasks were performed on several models using this dataset. The model with the highest classification accuracy could yield about 88% accurate classification results.

computational linguistic, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2406.09702

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)

Add feedback

A Large Collection of Model-generated Contradictory Responses for Consistency-aware Dialogue Systems

Sato, Shiki, Akama, Reina, Suzuki, Jun, Inui, Kentaro

arXiv.org Artificial IntelligenceMar-19-2024

Mitigating the generation of contradictory responses poses a substantial challenge in dialogue response generation. The quality and quantity of available contradictory response data play a vital role in suppressing these contradictions, offering two significant benefits. First, having access to large contradiction data enables a comprehensive examination of their characteristics. Second, data-driven methods to mitigate contradictions may be enhanced with large-scale contradiction data for training. Nevertheless, no attempt has been made to build an extensive collection of model-generated contradictory responses. In this paper, we build a large dataset of response generation models' contradictions for the first time. Then, we acquire valuable insights into the characteristics of model-generated contradictions through an extensive analysis of the collected responses. Lastly, we also demonstrate how this dataset substantially enhances the performance of data-driven contradiction suppression methods.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2403.125

Country: Asia (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Spike No More: Stabilizing the Pre-training of Large Language Models

Takase, Sho, Kiyono, Shun, Kobayashi, Sosuke, Suzuki, Jun

arXiv.org Artificial IntelligenceFeb-2-2024

Loss spikes often occur during pre-training of large language models. The spikes degrade the performance of large language models and sometimes ruin the pre-training. Since the pre-training needs a vast computational budget, we should avoid such spikes. To investigate the cause of loss spikes, we focus on gradients of internal layers. Through theoretical analyses, we reveal two causes of the exploding gradients, and provide requirements to prevent the explosion. In addition, we propose a method to satisfy the requirements by combining the initialization method and a simple modification to embeddings. We conduct various experiments to verify our theoretical analyses empirically. Experimental results indicate that the combination is effective in preventing spikes during pre-training.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2312.16903

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions

Tanaka, Ryota, Iki, Taichi, Nishida, Kyosuke, Saito, Kuniko, Suzuki, Jun

arXiv.org Artificial IntelligenceJan-24-2024

We study the problem of completing various visual document understanding (VDU) tasks, e.g., question answering and information extraction, on real-world documents through human-written instructions. To this end, we propose InstructDoc, the first large-scale collection of 30 publicly available VDU datasets, each with diverse instructions in a unified format, which covers a wide range of 12 tasks and includes open document types/formats. Furthermore, to enhance the generalization performance on VDU tasks, we design a new instruction-based document reading and understanding model, InstructDr, that connects document images, image encoders, and large language models (LLMs) through a trainable bridging module. Experiments demonstrate that InstructDr can effectively adapt to new VDU datasets, tasks, and domains via given instructions and outperforms existing multimodal LLMs and ChatGPT without specific training.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2401.13313

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

A Challenging Multimodal Video Summary: Simultaneously Extracting and Generating Keyframe-Caption Pairs from Video

Kudo, Keito, Nagasawa, Haruki, Suzuki, Jun, Shimizu, Nobuyuki

arXiv.org Artificial IntelligenceDec-3-2023

This paper proposes a practical multimodal video summarization task setting and a dataset to train and evaluate the task. The target task involves summarizing a given video into a predefined number of keyframe-caption pairs and displaying them in a listable format to grasp the video content quickly. This task aims to extract crucial scenes from the video in the form of images (keyframes) and generate corresponding captions explaining each keyframe's situation. This task is useful as a practical application and presents a highly challenging problem worthy of study. Specifically, achieving simultaneous optimization of the keyframe selection performance and caption quality necessitates careful consideration of the mutual dependence on both preceding and subsequent keyframes and captions. To facilitate subsequent research in this field, we also construct a dataset by expanding upon existing datasets and propose an evaluation framework. Furthermore, we develop two baseline systems and report their respective performance.

caption, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2312.01575

Country:

Asia (0.14)
Europe > Netherlands (0.14)

Genre: Research Report > New Finding (0.67)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Add feedback