Chi, Zewen
ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training
Zhuo, Le, Chi, Zewen, Xu, Minghao, Huang, Heyan, Zheng, Heqi, He, Conghui, Mao, Xian-Ling, Zhang, Wentao
We propose ProtLLM, a versatile cross-modal large language model (LLM) for both protein-centric and protein-language tasks. ProtLLM features a unique dynamic protein mounting mechanism, enabling it to handle complex inputs where the natural language text is interspersed with an arbitrary number of proteins. In addition, we propose the protein-as-word language modeling approach to train ProtLLM. By developing a specialized protein vocabulary, we equip the model with the capability to predict not just natural language but also proteins from a vast pool of candidates. Additionally, we construct a large-scale interleaved protein-text dataset, named InterPT, for pre-training. This dataset comprehensively encompasses both (1) structured data sources like protein annotations and (2) unstructured data sources like biological research papers, thereby endowing ProtLLM with crucial knowledge for understanding proteins. We evaluate ProtLLM on classic supervised protein-centric tasks and explore its novel protein-language applications. Experimental results demonstrate that ProtLLM not only achieves superior performance over protein-specialized baselines on protein-centric tasks but also induces zero-shot and in-context learning capabilities on protein-language tasks.
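To make the protein-as-word idea concrete, here is a minimal sketch assuming a joint softmax over the union of a word vocabulary and a protein vocabulary; the sizes and the two-head design are illustrative assumptions, not ProtLLM's actual implementation:

```python
# Minimal sketch of protein-as-word language modeling: the output space is the
# union of a word vocabulary and a protein vocabulary, so the model can predict
# either a word or a protein at each step. Sizes and the two-head fusion are
# illustrative assumptions, not ProtLLM's actual code.
import torch
import torch.nn as nn

class ProteinAsWordHead(nn.Module):
    def __init__(self, hidden_size: int, word_vocab_size: int, protein_vocab_size: int):
        super().__init__()
        # One linear head scores words, another scores candidate proteins;
        # their logits are concatenated into a single softmax over both vocabularies.
        self.word_head = nn.Linear(hidden_size, word_vocab_size)
        self.protein_head = nn.Linear(hidden_size, protein_vocab_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        word_logits = self.word_head(hidden_states)
        protein_logits = self.protein_head(hidden_states)
        return torch.cat([word_logits, protein_logits], dim=-1)

head = ProteinAsWordHead(hidden_size=768, word_vocab_size=32000, protein_vocab_size=5000)
hidden = torch.randn(1, 16, 768)      # (batch, sequence, hidden)
logits = head(hidden)                 # one distribution over 37000 joint entries
loss = nn.functional.cross_entropy(logits.view(-1, 37000),
                                   torch.randint(0, 37000, (16,)))
```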
Optimizing Prompts for Text-to-Image Generation
Hao, Yaru, Chi, Zewen, Dong, Li, Wei, Furu
Well-designed prompts can guide text-to-image models to generate amazing images. However, performant prompts are often model-specific and misaligned with user input. Instead of laborious human engineering, we propose prompt adaptation, a general framework that automatically adapts original user input to model-preferred prompts. Specifically, we first perform supervised fine-tuning with a pretrained language model on a small collection of manually engineered prompts. Then we use reinforcement learning to explore better prompts. We define a reward function that encourages the policy to generate more aesthetically pleasing images while preserving the original user intentions. Experimental results on Stable Diffusion show that our method outperforms manual prompt engineering in terms of both automatic metrics and human preference ratings. Moreover, reinforcement learning further boosts performance, especially on out-of-domain prompts. The pretrained checkpoints are available at https://aka.ms/promptist. The demo can be found at https://aka.ms/promptist-demo.
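The reward described above can be sketched as follows; the additive form, the weight alpha, and the scorer interfaces are our assumptions, not the paper's exact reward definition:

```python
# Minimal sketch of a reward that favors more aesthetically pleasing images
# while preserving the original user intent. The weighting and the two scoring
# functions are illustrative assumptions; the paper's reward may differ.
def prompt_reward(original_prompt: str, adapted_prompt: str,
                  aesthetic_score, relevance_score,
                  generate_image, alpha: float = 1.0) -> float:
    """Score an adapted prompt for the RL policy.

    aesthetic_score(image) -> float        # e.g., a learned aesthetic predictor
    relevance_score(text, image) -> float  # e.g., CLIP text-image similarity
    generate_image(prompt) -> image        # the frozen text-to-image model
    """
    image = generate_image(adapted_prompt)
    # Reward rises with image quality, but only counts if the image still
    # matches what the *original* prompt asked for.
    return aesthetic_score(image) + alpha * relevance_score(original_prompt, image)
```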
Measuring Cross-Lingual Transferability of Multilingual Transformers on Sentence Classification
Chi, Zewen, Huang, Heyan, Mao, Xian-Ling
Recent studies have exhibited remarkable capabilities of pre-trained multilingual Transformers, especially cross-lingual transferability. However, current methods do not measure cross-lingual transferability well, hindering the understanding of multilingual Transformers. In this paper, we propose IGap, a cross-lingual transferability metric for multilingual Transformers on sentence classification tasks. IGap takes training error into consideration, and can also estimate transferability without end-task data. Experimental results show that IGap outperforms baseline metrics at measuring transferability and ranking transfer directions. Besides, we conduct extensive systematic experiments where we compare transferability among various multilingual Transformers, fine-tuning algorithms, and transfer directions. More importantly, our results reveal three findings about cross-lingual transfer, which help us better understand multilingual Transformers.
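The abstract does not state IGap's formula, so the following is only an illustrative gap-style score under our own assumptions (target-language error measured relative to source-language training error); it shows how such a metric can rank transfer directions, not how IGap is actually computed:

```python
# Illustrative gap-style transferability score (NOT the paper's IGap formula):
# a well-fit source model whose target-language error stays close to its
# source training error ranks as highly transferable.
def transfer_gap(source_train_error: float, target_eval_error: float) -> float:
    # Smaller gap => target-language performance is close to what the model
    # already achieves on its own training data.
    return target_eval_error - source_train_error

# Ranking transfer directions: a lower gap means better expected transfer.
directions = {"en->de": transfer_gap(0.08, 0.15),
              "en->sw": transfer_gap(0.08, 0.31)}
best = min(directions, key=directions.get)  # "en->de"
```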
Language Is Not All You Need: Aligning Perception with Language Models
Huang, Shaohan, Dong, Li, Wang, Wenhui, Hao, Yaru, Singhal, Saksham, Ma, Shuming, Lv, Tengchao, Cui, Lei, Mohammed, Owais Khan, Patra, Barun, Liu, Qiang, Aggarwal, Kriti, Chi, Zewen, Bjorck, Johan, Chaudhary, Vishrav, Som, Subhojit, Song, Xia, Wei, Furu
A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence. In this work, we introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot). Specifically, we train Kosmos-1 from scratch on web-scale multimodal corpora, including arbitrarily interleaved text and images, image-caption pairs, and text data. We evaluate various settings, including zero-shot, few-shot, and multimodal chain-of-thought prompting, on a wide range of tasks without any gradient updates or finetuning. Experimental results show that Kosmos-1 achieves impressive performance on (i) language understanding, generation, and even OCR-free NLP (directly fed with document images), (ii) perception-language tasks, including multimodal dialogue, image captioning, visual question answering, and (iii) vision tasks, such as image recognition with descriptions (specifying classification via text instructions). We also show that MLLMs can benefit from cross-modal transfer, i.e., transfer knowledge from language to multimodal, and from multimodal to language. In addition, we introduce a Raven IQ test dataset, which diagnoses the nonverbal reasoning capability of MLLMs.
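A minimal sketch of how arbitrarily interleaved text and images can be packed into one token sequence for such a model; the marker embeddings and the encoder interfaces are illustrative assumptions, not Kosmos-1's actual preprocessing:

```python
# Sketch of interleaved multimodal sequence construction: text spans and image
# patch embeddings are concatenated in document order so a single language
# model attends over both uniformly. Interfaces are illustrative assumptions.
import torch

def build_interleaved_sequence(segments, embed_text, embed_image,
                               boi_embedding, eoi_embedding):
    """segments: list of ("text", str) or ("image", tensor) in document order.

    embed_text(str) -> (n_tokens, hidden) tensor
    embed_image(tensor) -> (n_patches, hidden) tensor (e.g., a vision encoder)
    boi_embedding / eoi_embedding: (1, hidden) markers delimiting image spans.
    """
    parts = []
    for kind, content in segments:
        if kind == "text":
            parts.append(embed_text(content))
        else:
            # Image patches are dropped into the stream between marker tokens,
            # so image content occupies ordinary positions in the sequence.
            parts.append(boi_embedding)
            parts.append(embed_image(content))
            parts.append(eoi_embedding)
    return torch.cat(parts, dim=0)  # one (total_len, hidden) sequence
```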
Bridging The Gap: Entailment Fused-T5 for Open-retrieval Conversational Machine Reading Comprehension
Zhang, Xiao, Huang, Heyan, Chi, Zewen, Mao, Xian-Ling
Open-retrieval conversational machine reading comprehension (OCMRC) simulates real-life conversational interaction scenes. Machines are required to make a decision of Yes/No/Inquire, or to generate a follow-up question when the decision is Inquire, based on retrieved rule texts, the user scenario, the user question, and the dialogue history. Recent studies have explored methods to reduce the information gap between decision-making and question generation and thereby improve generation performance. However, the information gap persists because these pipeline structures remain limited to three stages: decision-making, span extraction, and question rephrasing. Decision-making and generation reason separately, and the entailment reasoning used in decision-making is hard to share across all stages. To tackle this problem, we propose a novel one-stage end-to-end framework, called Entailment Fused-T5 (EFT), to bridge the information gap between decision-making and generation in a global understanding manner. Extensive experimental results demonstrate that our proposed framework achieves new state-of-the-art performance on the OR-ShARC benchmark.
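The one-stage idea can be sketched with an off-the-shelf seq2seq model: a single decoding pass emits either a decision word or a follow-up question, so no separate pipeline stages are needed. The input format and the use of Hugging Face T5 below are illustrative assumptions, not the EFT code:

```python
# Sketch of one-stage decision/generation: a single T5 reads the rule texts,
# scenario, question, and history, and directly emits either a decision word
# or a follow-up question. Formatting is an illustrative assumption.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

source = ("rules: You qualify if you are over 65. "
          "scenario: I retired last year. "
          "question: Can I claim the benefit? "
          "history: ")
inputs = tokenizer(source, return_tensors="pt")
# One decoding pass covers all outcomes ("yes", "no", or an inquiry such as
# "are you over 65?"), so the reasoning is shared rather than split across
# pipeline stages.
output_ids = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```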
TorchScale: Transformers at Scale
Ma, Shuming, Wang, Hongyu, Huang, Shaohan, Wang, Wenhui, Chi, Zewen, Dong, Li, Benhaim, Alon, Patra, Barun, Chaudhary, Vishrav, Song, Xia, Wei, Furu
Large Transformers have achieved state-of-the-art performance across many tasks. Most open-source libraries on scaling Transformers focus on improving training or inference with better parallelization. In this work, we present TorchScale, an open-source toolkit that allows researchers and developers to scale up Transformers efficiently and effectively. TorchScale implements several modeling techniques that can improve modeling generality and capability, as well as training stability and efficiency. Experimental results on language modeling and neural machine translation demonstrate that TorchScale can successfully scale Transformers to different sizes without tears. The library is available at https://aka.ms/torchscale.
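Basic usage follows the pattern documented in the repository README; module paths and config fields may differ across versions, so treat this as a sketch:

```python
# Creating a BERT-like encoder with TorchScale, following the repository's
# documented usage; exact paths and fields may vary by version.
from torchscale.architecture.config import EncoderConfig
from torchscale.architecture.encoder import Encoder

config = EncoderConfig(vocab_size=64000)  # encoder configuration
model = Encoder(config)
print(model)
```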
A Robust and Domain-Adaptive Approach for Low-Resource Named Entity Recognition
Yu, Houjin, Mao, Xian-Ling, Chi, Zewen, Wei, Wei, Huang, Heyan
Building reliable named entity recognition (NER) systems with limited annotated data has recently attracted much attention. Nearly all existing works heavily rely on domain-specific resources, such as external lexicons and knowledge bases. However, such domain-specific resources are often unavailable, and they are difficult and expensive to construct, which has become a key obstacle to wider adoption. To tackle this problem, we propose RDANER, a novel robust and domain-adaptive approach for low-resource NER, which only uses cheap and easily obtainable resources. Extensive experiments on three benchmark datasets demonstrate that our approach achieves the best performance when only using cheap and easily obtainable resources, and delivers competitive results against state-of-the-art methods that use hard-to-obtain domain-specific resources. All our code and corpora can be found at https://github.com/houking-can/RDANER.