AITopics | Ru, Dongyu

Collaborating Authors

Ru, Dongyu

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Towards Competitive Search Relevance For Inference-Free Learned Sparse Retrievers

Geng, Zhichao, Ru, Dongyu, Yang, Yang

arXiv.org Artificial IntelligenceNov-6-2024

Learned sparse retrieval, which can efficiently perform retrieval through mature inverted-index engines, has garnered growing attention in recent years. Particularly, the inference-free sparse retrievers are attractive as they eliminate online model inference in the retrieval phase thereby avoids huge computational cost, offering reasonable throughput and latency. However, even the state-of-the-art (SOTA) inference-free sparse models lag far behind in terms of search relevance when compared to both sparse and dense siamese models. Towards competitive search relevance for inference-free sparse retrievers, we argue that they deserve dedicated training methods other than using same ones with siamese encoders. In this paper, we propose two different approaches for performance improvement. First, we introduce the IDF-aware FLOPS loss, which introduces Inverted Document Frequency (IDF) to the sparsification of representations. We find that it mitigates the negative impact of the FLOPS regularization on search relevance, allowing the model to achieve a better balance between accuracy and efficiency. Moreover, we propose a heterogeneous ensemble knowledge distillation framework that combines siamese dense and sparse retrievers to generate supervisory signals during the pre-training phase. The ensemble framework of dense and sparse retriever capitalizes on their strengths respectively, providing a strong upper bound for knowledge distillation. To concur the diverse feedback from heterogeneous supervisors, we normalize and then aggregate the outputs of the teacher models to eliminate score scale differences. On the BEIR benchmark, our model outperforms existing SOTA inference-free sparse model by \textbf{3.3 NDCG@10 score}. It exhibits search relevance comparable to siamese sparse retrievers and client-side latency only \textbf{1.1x that of BM25}.

information retrieval, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2411.04403

Country:

Asia > China (0.14)
North America > United States (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Information Management (0.93)

Add feedback

RefChecker: Reference-based Fine-grained Hallucination Checker and Benchmark for Large Language Models

Hu, Xiangkun, Ru, Dongyu, Qiu, Lin, Guo, Qipeng, Zhang, Tianhang, Xu, Yang, Luo, Yun, Liu, Pengfei, Zhang, Yue, Zhang, Zheng

arXiv.org Artificial IntelligenceMay-23-2024

Large Language Models (LLMs) have shown impressive capabilities but also a concerning tendency to hallucinate. This paper presents RefChecker, a framework that introduces claim-triplets to represent claims in LLM responses, aiming to detect fine-grained hallucinations. In RefChecker, an extractor generates claim-triplets from a response, which are then evaluated by a checker against a reference. We delineate three task settings: Zero, Noisy and Accurate Context, to reflect various real-world use cases. We curated a benchmark spanning various NLP tasks and annotated 11k claim-triplets from 2.1k responses by seven LLMs. RefChecker supports both proprietary and open-source models as the extractor and checker. Experiments demonstrate that claim-triplets enable superior hallucination detection, compared to other granularities such as response, sentence and sub-sentence level claims. RefChecker outperforms prior methods by 6.8 to 26.1 points on our benchmark and the checking results of RefChecker are strongly aligned with human judgments. This work is open sourced at https://github.com/amazon-science/RefChecker

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2405.14486

Country: Europe > Ireland (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)

Add feedback

StoryAnalogy: Deriving Story-level Analogies from Large Language Models to Unlock Analogical Understanding

Jiayang, Cheng, Qiu, Lin, Chan, Tsz Ho, Fang, Tianqing, Wang, Weiqi, Chan, Chunkit, Ru, Dongyu, Guo, Qipeng, Zhang, Hongming, Song, Yangqiu, Zhang, Yue, Zhang, Zheng

arXiv.org Artificial IntelligenceOct-23-2023

Analogy-making between narratives is crucial for human reasoning. In this paper, we evaluate the ability to identify and generate analogies by constructing a first-of-its-kind large-scale story-level analogy corpus, \textsc{StoryAnalogy}, which contains 24K story pairs from diverse domains with human annotations on two similarities from the extended Structure-Mapping Theory. We design a set of tests on \textsc{StoryAnalogy}, presenting the first evaluation of story-level analogy identification and generation. Interestingly, we find that the analogy identification tasks are incredibly difficult not only for sentence embedding models but also for the recent large language models (LLMs) such as ChatGPT and LLaMa. ChatGPT, for example, only achieved around 30% accuracy in multiple-choice questions (compared to over 85% accuracy for humans). Furthermore, we observe that the data in \textsc{StoryAnalogy} can improve the quality of analogy generation in LLMs, where a fine-tuned FlanT5-xxl model achieves comparable performance to zero-shot ChatGPT.

analogy, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2310.12874

Country: Asia > China (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

Add feedback

Distributed Marker Representation for Ambiguous Discourse Markers and Entangled Relations

Ru, Dongyu, Qiu, Lin, Qiu, Xipeng, Zhang, Yue, Zhang, Zheng

arXiv.org Artificial IntelligenceJun-18-2023

Discourse analysis is an important task because it models intrinsic semantic structures between sentences in a document. Discourse markers are natural representations of discourse in our daily language. One challenge is that the markers as well as pre-defined and human-labeled discourse relations can be ambiguous when describing the semantics between sentences. We believe that a better approach is to use a contextual-dependent distribution over the markers to express discourse information. In this work, we propose to learn a Distributed Marker Representation (DMR) by utilizing the (potentially) unlimited discourse marker data with a latent discourse sense, thereby bridging markers with sentence pairs. Such representations can be learned automatically from data without supervision, and in turn provide insights into the data itself. Experiments show the SOTA performance of our DMR on the implicit discourse relation recognition task and strong interpretability. Our method also offers a valuable tool to understand complex ambiguity and entanglement among discourse markers and manually defined discourse relations.

artificial intelligence, natural language, relation, (16 more...)

arXiv.org Artificial Intelligence

2306.10658

Country:

North America > United States (1.00)
Europe (0.68)

Genre: Research Report (0.64)

Industry: Media (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Add feedback

QA4IE: A Question Answering based Framework for Information Extraction

Qiu, Lin, Zhou, Hao, Qu, Yanru, Zhang, Weinan, Li, Suoheng, Rong, Shu, Ru, Dongyu, Qian, Lihua, Tu, Kewei, Yu, Yong

arXiv.org Artificial IntelligenceJan-28-2019

Information Extraction (IE) refers to automatically extracting structured relation tuples from unstructured texts. Common IE solutions, including Relation Extraction (RE) and open IE systems, can hardly handle cross-sentence tuples, and are severely restricted by limited relation types as well as informal relation specifications (e.g., free-text based relation tuples). In order to overcome these weaknesses, we propose a novel IE framework named QA4IE, which leverages the flexible question answering (QA) approaches to produce high quality relation triples across sentences. Based on the framework, we develop a large IE benchmark with high quality human evaluation. This benchmark contains 293K documents, 2M golden relation triples, and 636 relation types. We compare our system with some IE baselines on our benchmark and the results show that our system achieves great improvements.

benchmark, deep learning, neural network, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-030-00671-6_12

1804.03396

Country:

North America (0.14)
Europe > United Kingdom > Scotland (0.14)
Asia > China (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.71)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.71)

Add feedback

Approximate Random Dropout

Song, Zhuoran, Ru, Dongyu, Wang, Ru, Huang, Hongru, Peng, Zhenghao, Ke, Jing, Liang, Xiaoyao, Jiang, Li

arXiv.org Machine LearningMay-22-2018

The training phases of Deep neural network (DNN) consume enormous processing time and energy. Compression techniques for inference acceleration leveraging the sparsity of DNNs, however, can be hardly used in the training phase. Because the training involves dense matrix-multiplication using GPGPU, which endorse regular and structural data layout. In this paper, we exploit the sparsity of DNN resulting from the random dropout technique to eliminate the unnecessary computation and data access for those dropped neurons or synapses in the training phase. Experiments results on MLP and LSTM on standard benchmarks show that the proposed Approximate Random Dropout can reduce the training time by half on average with ignorable accuracy loss.

deep learning, dropout pattern, neural network, (20 more...)

arXiv.org Machine Learning

1805.08939

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback