AITopics | Jiao, Jian

Collaborating Authors

Jiao, Jian


CAPSTONE: Curriculum Sampling for Dense Retrieval with Document Expansion

He, Xingwei, Gong, Yeyun, Jin, A-Long, Zhang, Hang, Dong, Anlei, Jiao, Jian, Yiu, Siu Ming, Duan, Nan

arXiv.org Artificial Intelligence, Oct-29-2023

The dual-encoder has become the de facto architecture for dense retrieval. Typically, it computes the latent representations of the query and the document independently, and thus fails to fully capture the interactions between them. To alleviate this, recent research has focused on obtaining query-informed document representations. During training, such methods expand the document with a real query, but during inference they replace the real query with a generated one. This inconsistency between training and inference causes the dense retrieval model to prioritize query information while disregarding the document when computing the document representation. Consequently, these methods perform even worse than the vanilla dense retrieval model, because their performance depends heavily on how relevant the generated queries are to the real query. In this paper, we propose a curriculum sampling strategy that uses pseudo queries during training and progressively increases the relevance between the generated query and the real query. By doing so, the retrieval model learns to extend its attention from the document alone to both the document and the query, resulting in high-quality query-informed document representations. Experimental results on both in-domain and out-of-domain datasets demonstrate that our approach outperforms previous dense retrieval models.
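A minimal sketch of the curriculum idea, under assumptions that are ours rather than the paper's: relevance between a generated query and the real query is approximated by token-level Jaccard overlap, and training progress slides a small sampling window from the least relevant to the most relevant pseudo query. The helpers `jaccard`, `curriculum_sample` and `expand_document`, and the `[SEP]` input format, are hypothetical.

```python
import random
from typing import Sequence


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard overlap, used here as a stand-in relevance score (assumption)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def curriculum_sample(generated: Sequence[str], real_query: str,
                      step: int, total_steps: int, rng=random) -> str:
    """Hypothetical helper: pick a pseudo query whose relevance to the real query
    grows with training progress.

    Candidates are ranked from least to most relevant; a window of up to three
    candidates slides toward the high-relevance end as step approaches total_steps.
    """
    if not generated:
        raise ValueError("need at least one generated query")
    ranked = sorted(generated, key=lambda q: jaccard(q, real_query))
    progress = min(max(step / total_steps, 0.0), 1.0)
    centre = round(progress * (len(ranked) - 1))
    window = ranked[max(0, centre - 1):centre + 2]
    return rng.choice(window)


def expand_document(doc: str, query: str) -> str:
    """Query-informed input: prepend the (pseudo) query to the document (format is assumed)."""
    return f"{query} [SEP] {doc}"
```

Early in training the model mostly sees weakly related pseudo queries, so it cannot lean on the query and must attend to the document; later the pseudo queries approach the real query, matching the shift of attention the abstract describes.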

information retrieval, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2212.09114

Country:

North America > United States (0.15)
Asia > China (0.14)
Asia > Middle East > UAE (0.14)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


Evoke: Evoking Critical Thinking Abilities in LLMs via Reviewer-Author Prompt Editing

Hu, Xinyu, Tang, Pengfei, Zuo, Simiao, Wang, Zihan, Song, Bowen, Lou, Qiang, Jiao, Jian, Charles, Denis

arXiv.org Artificial Intelligence, Oct-20-2023

Large language models (LLMs) have made impressive progress in natural language processing. These models rely on proper human instructions (or prompts) to generate suitable responses. However, the potential of LLMs is not fully harnessed by commonly used prompting methods: many human-in-the-loop algorithms employ ad hoc procedures for prompt selection, while automatic prompt generation approaches essentially search the space of possible prompts randomly and inefficiently. We propose Evoke, an automatic prompt refinement framework. In Evoke, there are two instances of the same LLM: one acts as a reviewer (LLM-Reviewer) and scores the current prompt; the other acts as an author (LLM-Author) and edits the prompt by considering the edit history and the reviewer's feedback. This author-reviewer feedback loop ensures that the prompt is refined in each iteration. We further integrate a data selection approach into Evoke, where only the hard samples are exposed to the LLM. The hard samples are more important because the LLM can develop a deeper understanding of the task from them, while the model may already know how to solve the easier cases. Experimental results show that Evoke significantly outperforms existing methods. For instance, in the challenging task of logical fallacy detection, Evoke scores above 80, while all other baseline methods struggle to reach 20.
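The reviewer-author loop with hard-sample selection can be sketched as follows. This is an illustration under stated assumptions, not Evoke's implementation: `solve`, `reviewer` and `author` are hypothetical callables standing in for calls to the same LLM, a sample counts as "hard" if the current prompt answers it incorrectly, and the best-scored prompt seen so far is returned.

```python
from typing import Callable, List, Tuple

Sample = Tuple[str, str]  # (input, expected answer)


def select_hard_samples(prompt: str, data: List[Sample],
                        solve: Callable[[str, str], str]) -> List[Sample]:
    """Keep only samples the current prompt gets wrong (assumed definition of 'hard')."""
    return [(x, y) for x, y in data if solve(prompt, x) != y]


def evoke_loop(prompt: str, data: List[Sample],
               solve: Callable[[str, str], str],
               reviewer: Callable[[str, List[Sample]], float],
               author: Callable[[str, List[Tuple[str, float]], List[Sample]], str],
               iterations: int = 3) -> str:
    """Alternate hypothetical LLM-Reviewer scoring and LLM-Author editing."""
    history: List[Tuple[str, float]] = []
    for _ in range(iterations):
        score = reviewer(prompt, data)  # reviewer scores the current prompt
        history.append((prompt, score))
        # Expose only hard cases to the author; fall back to all data if none remain.
        hard = select_hard_samples(prompt, data, solve) or data
        prompt = author(prompt, history, hard)  # author edits using history + feedback
    history.append((prompt, reviewer(prompt, data)))
    # Return the earliest prompt with the highest reviewer score.
    return max(history, key=lambda item: item[1])[0]
```

Passing the full edit history to the author mirrors the abstract's description; how Evoke actually formats that history inside the LLM prompt is not shown here.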

artificial intelligence, large language model, natural language, (4 more...)

arXiv.org Artificial Intelligence

2310.13855

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)


AutoHint: Automatic Prompt Optimization with Hint Generation

Sun, Hong, Li, Xue, Xu, Yinchuan, Homma, Youkow, Cao, Qi, Wu, Min, Jiao, Jian, Charles, Denis

arXiv.org Artificial Intelligence, Aug-8-2023

This paper presents AutoHint, a novel framework for automatic prompt engineering and optimization for large language models (LLMs). While LLMs have demonstrated a remarkable ability to produce high-quality annotations in various tasks, the key to applying this ability to specific tasks lies in developing high-quality prompts. We therefore propose a framework that inherits the merits of both in-context learning and zero-shot learning by incorporating enriched instructions, derived from input-output demonstrations, to optimize the original prompt. We refer to this enrichment as the hint and propose a framework to generate it automatically from labeled data. More concretely, starting from an initial prompt, our method first instructs an LLM to deduce new hints for samples selected from incorrect predictions, then summarizes the per-sample hints and adds the result back to the initial prompt to form a new, enriched instruction. The proposed method is evaluated on the BIG-Bench Instruction Induction dataset with both zero-shot and few-shot prompts, where experiments show that it significantly boosts accuracy on multiple tasks.
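One AutoHint-style iteration might look like the sketch below. `predict`, `deduce_hint` and `summarize` are hypothetical stand-ins for LLM calls, and selecting samples from the incorrect predictions is simplified to taking the first `k`; the paper's actual prompts and sampling strategy are not reproduced here.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (input, gold label)


def autohint_step(prompt: str, labeled: List[Example],
                  predict: Callable[[str, str], str],
                  deduce_hint: Callable[[str, str, str, str], str],
                  summarize: Callable[[List[str]], str],
                  k: int = 4) -> str:
    """Return an enriched prompt: the original instruction plus a summarized hint."""
    # 1. Collect samples the current prompt predicts incorrectly.
    wrong = []
    for x, y in labeled:
        pred = predict(prompt, x)
        if pred != y:
            wrong.append((x, y, pred))
    if not wrong:
        return prompt  # nothing to learn from
    # 2. Ask the (hypothetical) LLM for a per-sample hint on a selection of errors.
    hints = [deduce_hint(prompt, x, y, pred) for x, y, pred in wrong[:k]]
    # 3. Summarize the per-sample hints and append the result to the initial prompt.
    return f"{prompt}\nHint: {summarize(hints)}"
```

Calling `autohint_step` repeatedly on its own output would give a multi-round variant; whether and how the paper iterates is not assumed here.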

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2307.07415

Country: North America > United States (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)


PROD: Progressive Distillation for Dense Retrieval

Lin, Zhenghao, Gong, Yeyun, Liu, Xiao, Zhang, Hang, Lin, Chen, Dong, Anlei, Jiao, Jian, Lu, Jingwen, Jiang, Daxin, Majumder, Rangan, Duan, Nan

arXiv.org Artificial Intelligence, Jun-24-2023

Knowledge distillation is an effective way to transfer knowledge from a strong teacher to an efficient student model. Ideally, the better the teacher, the better the student. However, this expectation does not always hold: a better teacher model often yields a worse student through distillation because of the non-negligible gap between teacher and student. To bridge this gap, we propose PROD, a PROgressive Distillation method for dense retrieval. PROD consists of teacher progressive distillation and data progressive distillation, which gradually improve the student. We conduct extensive experiments on five widely used benchmarks (MS MARCO Passage, TREC Passage 19, TREC Document 19, MS MARCO Document, and Natural Questions), where PROD achieves state-of-the-art results among distillation methods for dense retrieval. The code and models will be released.
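Distillation for dense retrieval is commonly driven by a KL divergence between teacher and student score distributions over a query's candidate passages. The sketch below shows that loss in plain Python, together with a hypothetical schedule that moves the student from weaker to stronger teachers. The temperature, the even split of training across teachers, and the teacher names are assumptions, not PROD's configuration.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over relevance scores."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_distill_loss(teacher_scores: Sequence[float], student_scores: Sequence[float],
                    temperature: float = 1.0) -> float:
    """KL(teacher || student) over one query's candidate passages."""
    p = softmax(teacher_scores, temperature)
    q = softmax(student_scores, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def teacher_for_step(step: int, total_steps: int, teachers: List[str]) -> str:
    """Hypothetical progressive schedule: split training evenly across teachers, weakest first."""
    stage = min(step * len(teachers) // total_steps, len(teachers) - 1)
    return teachers[stage]
```

For example, with hypothetical teachers `["dual-encoder", "late-interaction", "cross-encoder"]` and 90 steps, step 45 falls in the second stage, so the student is distilled from the intermediate teacher before facing the strongest one.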

distillation, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2209.13335

Country: North America > United States (1.00)

Genre:

Research Report (1.00)
Workflow (0.68)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)


Dual-Alignment Pre-training for Cross-lingual Sentence Embedding

Li, Ziheng, Huang, Shaohan, Zhang, Zihan, Deng, Zhi-Hong, Lou, Qiang, Huang, Haizhen, Jiao, Jian, Wei, Furu, Deng, Weiwei, Zhang, Qi

arXiv.org Artificial Intelligence, May-15-2023

Recent studies have shown that dual encoder models trained with the sentence-level translation ranking task are an effective approach to cross-lingual sentence embedding. However, our research indicates that token-level alignment, which has not been fully explored previously, is also crucial in multilingual scenarios. Based on our findings, we propose a dual-alignment pre-training (DAP) framework for cross-lingual sentence embedding that incorporates both sentence-level and token-level alignment. To achieve this, we introduce a novel representation translation learning (RTL) task, in which the model learns to use the contextualized token representations of one side to reconstruct their translation counterparts. This reconstruction objective encourages the model to embed translation information into the token representations. Compared to other token-level alignment methods such as translation language modeling, RTL is better suited to dual encoder architectures and is computationally efficient. Extensive experiments on three sentence-level cross-lingual benchmarks demonstrate that our approach can significantly improve sentence embedding. Our code is available at https://github.com/ChillingDream/DAP.
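The sentence-level translation ranking task mentioned above is commonly implemented as an in-batch contrastive loss: each source sentence should score its own translation above the other translations in the batch. The sketch below computes a one-directional version of that loss from precomputed embeddings in plain Python. It does not implement RTL, and the temperature value is an assumption.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def translation_ranking_loss(src: List[Vector], tgt: List[Vector],
                             temperature: float = 0.05) -> float:
    """Mean in-batch softmax cross-entropy; src[i] and tgt[i] are translation pairs."""
    if not src or len(src) != len(tgt):
        raise ValueError("need matching, non-empty batches")
    total = 0.0
    for i, s in enumerate(src):
        logits = [cosine(s, t) / temperature for t in tgt]
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_norm - logits[i]  # -log p(correct translation | source i)
    return total / len(src)
```

A symmetric variant would average this with the same loss computed from target to source; which direction(s) DAP uses is not assumed here.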

computational linguistic, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2305.09148

Country:

Europe (0.68)
Asia > China (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)


AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators

He, Xingwei, Lin, Zhenghao, Gong, Yeyun, Jin, A-Long, Zhang, Hang, Lin, Chen, Jiao, Jian, Yiu, Siu Ming, Duan, Nan, Chen, Weizhu

arXiv.org Artificial Intelligence, Mar-29-2023

Many natural language processing (NLP) tasks rely on labeled data to train machine learning models to high performance. However, data annotation can be time-consuming and expensive, especially when the task involves a large amount of data or specialized domains. Recently, GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot ability across various NLP tasks. In this paper, we first claim that large language models (LLMs), such as GPT-3.5, can serve as excellent crowdsourced annotators when provided with sufficient guidance and demonstrated examples. To make LLMs better annotators, we propose a two-step approach, 'explain-then-annotate'. Specifically, we first create a prompt for every demonstrated example and use it to ask an LLM to explain why the specific ground-truth answer/label was chosen for that example. We then construct a few-shot chain-of-thought prompt with the self-generated explanations and use it to annotate the unlabeled data. We conduct experiments on three tasks: user input and keyword relevance assessment, BoolQ, and WiC. The annotation results from GPT-3.5 surpass those from crowdsourced annotation for user input and keyword relevance assessment. For the other two tasks, GPT-3.5 achieves results comparable to those obtained through crowdsourced annotation.
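The 'explain-then-annotate' procedure can be sketched as two prompt builders. The templates below are illustrative assumptions, not the prompts used in the paper, and `llm` is a hypothetical text-in/text-out callable.

```python
from typing import Callable, List, Tuple


def explanation_prompt(text: str, label: str) -> str:
    """Step 1: ask the LLM why the gold label fits this demonstration (template is assumed)."""
    return (f"Input: {text}\nLabel: {label}\n"
            "Explain briefly why this label is correct.")


def build_cot_prompt(demos: List[Tuple[str, str]], query: str,
                     llm: Callable[[str], str]) -> str:
    """Step 2: few-shot chain-of-thought prompt built from self-generated explanations."""
    parts = []
    for text, label in demos:
        explanation = llm(explanation_prompt(text, label))
        parts.append(f"Input: {text}\nExplanation: {explanation}\nLabel: {label}")
    # The unlabeled item ends with "Explanation:" so the model reasons before labeling.
    parts.append(f"Input: {query}\nExplanation:")
    return "\n\n".join(parts)
```

Placing the explanation before the label in each demonstration is what makes the prompt chain-of-thought style: the model is led to produce a rationale first and the annotation second.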

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2303.16854

Country:

Asia (0.68)
Europe (0.67)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.50)

Industry:

Media (1.00)
Leisure & Entertainment > Games > Computer Games (1.00)
Law (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Pre-training Transformers for Knowledge Graph Completion

Chen, Sanxing, Cheng, Hao, Liu, Xiaodong, Jiao, Jian, Ji, Yangfeng, Gao, Jianfeng

arXiv.org Artificial Intelligence, Mar-27-2023

As a fundamental component of human intelligence, relational knowledge plays a crucial role in imitating human cognitive abilities with machine learning (Halford et al., 2010). Knowledge graphs (KGs) are the most widely used representation of relational knowledge, with well-known examples such as Freebase (Bollacker et al., 2008), YAGO (Suchanek et al., 2007), and Wikidata (Vrandečić and Krötzsch, 2014). KG is also a key ingredient …

Co-training LMs and KG completion models has been shown to be effective in improving the performance of downstream knowledge-intensive NLP tasks, but not so much for the KG completion task itself (Wang et al., 2021; Yasunaga et al., 2022). Despite the progress on transferring knowledge between structured KGs and unstructured texts, the generalization from one KG to another is still an open problem that is rarely studied (Kocijan and Lukasiewicz, 2021).

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2303.15682

Country:

Asia (0.68)
Europe (0.68)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)


Efficient Long Sequence Modeling via State Space Augmented Transformer

Zuo, Simiao, Liu, Xiaodong, Jiao, Jian, Charles, Denis, Manavoglu, Eren, Zhao, Tuo, Gao, Jianfeng

arXiv.org Artificial Intelligence, Dec-15-2022

Transformer models have achieved superior performance in various natural language processing tasks. However, the quadratic computational cost of the attention mechanism limits their practicality for long sequences. Existing attention variants improve computational efficiency, but they have limited ability to compute global information effectively. In parallel to Transformer models, state space models (SSMs) are tailored for long sequences, but they are not flexible enough to capture complicated local information. We propose SPADE, short for State sPace AugmenteD TransformEr. Specifically, we add an SSM to the bottom layer of SPADE and employ efficient local attention methods in the other layers. The SSM supplies global information, which compensates for the lack of long-range dependencies in local attention methods. Experimental results on the Long Range Arena benchmark and language modeling tasks demonstrate the effectiveness of the proposed method. To further demonstrate the scalability of SPADE, we pre-train large encoder-decoder models and present fine-tuning results on natural language understanding and natural language generation tasks.
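To make the two ingredients concrete, the sketch below pairs them in plain Python: a scalar linear recurrence standing in for the SSM (it carries information across the whole sequence in O(L) time) and a sliding-window mask of the kind used by local attention in the upper layers. The parameters `a`, `b`, `c` and the window size are illustrative assumptions; SPADE's actual SSM parameterization is not reproduced here.

```python
from typing import List


def ssm_scan(x: List[float], a: float = 0.9, b: float = 1.0, c: float = 1.0) -> List[float]:
    """Toy scalar SSM: h_t = a * h_{t-1} + b * x_t, y_t = c * h_t, with h_0 = 0."""
    h, out = 0.0, []
    for xt in x:
        h = a * h + b * xt
        out.append(c * h)
    return out


def local_attention_mask(length: int, window: int) -> List[List[bool]]:
    """mask[i][j] is True when token i may attend to token j, i.e. |i - j| <= window."""
    return [[abs(i - j) <= window for j in range(length)] for i in range(length)]
```

The contrast is the point of the design: in the mask, token 0 cannot see token 3 when `window=1`, whereas the recurrence lets an input at position 0 still influence every later output, which is the global information the SSM layer is meant to supply.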

computational linguistic, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2212.08136

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
