AITopics | Jiang, Haoming

Collaborating Authors

Jiang, Haoming

Graph Reasoning for Question Answering with Triplet Retrieval

Li, Shiyang, Gao, Yifan, Jiang, Haoming, Yin, Qingyu, Li, Zheng, Yan, Xifeng, Zhang, Chao, Yin, Bing

arXiv.org Artificial Intelligence, May-30-2023

Answering complex questions often requires reasoning over knowledge graphs (KGs). State-of-the-art methods often use entities in questions to retrieve local subgraphs, which are then fed into a KG encoder, e.g., a graph neural network (GNN), to model their local structures and integrated into language models for question answering. However, this paradigm constrains retrieved knowledge to local subgraphs and discards more diverse triplets buried in KGs that are disconnected but useful for question answering. In this paper, we propose a simple yet effective method that first retrieves the most relevant triplets from KGs and then reranks them; the reranked triplets are then concatenated with questions and fed into language models. Extensive results on both the CommonsenseQA and OpenbookQA datasets show that our method can outperform the state of the art by up to 4.6% absolute accuracy.
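The retrieve-then-rerank idea in this abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration: the abstract does not say which retriever or reranker the authors use, so simple token-overlap scores stand in for both, and the `[SEP]` separator, the stopword list, and the toy triplets are hypothetical.

```python
# Toy retrieve-then-rerank over KG triplets.
# Both scoring functions are stand-ins (assumptions), not the paper's models.
STOPWORDS = {"a", "an", "the", "of", "is", "at", "to", "in"}


def tokenize(text):
    """Lowercase, strip question marks, and drop stopwords."""
    words = text.lower().replace("?", " ").split()
    return {w for w in words if w not in STOPWORDS}


def triplet_text(triplet):
    """Flatten a (head, relation, tail) triplet into a plain string."""
    head, relation, tail = triplet
    return f"{head} {relation} {tail}"


def coverage_score(question, triplet):
    """Stage-1 score: fraction of triplet tokens that appear in the question."""
    q, t = tokenize(question), tokenize(triplet_text(triplet))
    return len(q & t) / len(t) if t else 0.0


def jaccard_score(question, triplet):
    """Stage-2 score: Jaccard similarity between question and triplet tokens."""
    q, t = tokenize(question), tokenize(triplet_text(triplet))
    union = q | t
    return len(q & t) / len(union) if union else 0.0


def retrieve(question, triplets, k):
    """Keep the k triplets with the highest stage-1 score."""
    return sorted(triplets, key=lambda t: coverage_score(question, t), reverse=True)[:k]


def rerank(question, candidates, top_n):
    """Reorder the retrieved candidates with the stage-2 score and keep top_n."""
    return sorted(candidates, key=lambda t: jaccard_score(question, t), reverse=True)[:top_n]


def build_input(question, triplets):
    """Concatenate the selected triplets with the question for a language model."""
    knowledge = " ; ".join(triplet_text(t) for t in triplets)
    return f"{knowledge} [SEP] {question}"


kg = [
    ("bird", "capable of", "fly"),
    ("fish", "at location", "water"),
    ("penguin", "is a", "bird"),
    ("car", "has a", "engine"),
]
question = "Where would a fish live?"
selected = rerank(question, retrieve(question, kg, k=2), top_n=1)
print(build_input(question, selected))
```

Note that the triplets are scored against the whole KG rather than a subgraph around the question entities, which is the contrast the abstract draws with subgraph-based methods.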

computational linguistic, machine learning, question answering, (19 more...)

arXiv.org Artificial Intelligence

2305.18742

Country:

Europe (0.94)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

CCGen: Explainable Complementary Concept Generation in E-Commerce

Huang, Jie, Gao, Yifan, Li, Zheng, Yang, Jingfeng, Song, Yangqiu, Zhang, Chao, Zhu, Zining, Jiang, Haoming, Chang, Kevin Chen-Chuan, Yin, Bing

arXiv.org Artificial Intelligence, May-19-2023

We propose and study Complementary Concept Generation (CCGen): given a concept of interest, e.g., "Digital Cameras", generating a list of complementary concepts, e.g., 1) Camera Lenses 2) Batteries 3) Camera Cases 4) Memory Cards 5) Battery Chargers. CCGen is beneficial for various applications like query suggestion and item recommendation, especially in the e-commerce domain. To solve CCGen, we propose to train language models to generate ranked lists of concepts with a two-step training strategy. We also teach the models to generate explanations by incorporating explanations distilled from large teacher models. Extensive experiments and analysis demonstrate that our model can generate high-quality concepts complementary to the input concept while producing explanations to justify the predictions.

explanation, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2305.1148

Country:

North America > United States (1.00)
Asia (1.00)
North America > Canada (0.68)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Services > e-Commerce Services (0.61)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond

Yang, Jingfeng, Jin, Hongye, Tang, Ruixiang, Han, Xiaotian, Feng, Qizhang, Jiang, Haoming, Yin, Bing, Hu, Xia

arXiv.org Artificial Intelligence, Apr-27-2023

This paper presents a comprehensive and practical guide for practitioners and end-users working with Large Language Models (LLMs) in their downstream natural language processing (NLP) tasks. We provide discussions and insights into the usage of LLMs from the perspectives of models, data, and downstream tasks. Firstly, we offer an introduction and brief summary of current GPT- and BERT-style LLMs. Then, we discuss the influence of pre-training data, training data, and test data. Most importantly, we provide a detailed discussion about the use and non-use cases of large language models for various natural language processing tasks, such as knowledge-intensive tasks, traditional natural language understanding tasks, natural language generation tasks, emergent abilities, and considerations for specific tasks. We present various use cases and non-use cases to illustrate the practical applications and limitations of LLMs in real-world scenarios. We also try to understand the importance of data and the specific challenges associated with each NLP task. Furthermore, we explore the impact of spurious biases on LLMs and delve into other essential considerations, such as efficiency, cost, and latency, to ensure a comprehensive understanding of deploying LLMs in practice. This comprehensive guide aims to provide researchers and practitioners with valuable insights and best practices for working with LLMs, thereby enabling the successful implementation of these models in a wide range of NLP tasks. A curated list of practical guide resources of LLMs, regularly updated, can be found at https://github.com/Mooler0410/LLMsPracticalGuide.

arxiv preprint arxiv, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2304.13712

Country: North America > United States (0.69)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers

Liang, Chen, Jiang, Haoming, Li, Zheng, Tang, Xianfeng, Yin, Bin, Zhao, Tuo

arXiv.org Artificial Intelligence, Feb-19-2023

Knowledge distillation has been shown to be a powerful model compression approach to facilitate the deployment of pre-trained language models in practice. This paper focuses on task-agnostic distillation. It produces a compact pre-trained model that can be easily fine-tuned on various tasks with small computational costs and memory footprints. Despite the practical benefits, task-agnostic distillation is challenging. Since the teacher model has a significantly larger capacity and stronger representation power than the student model, it is very difficult for the student to produce predictions that match the teacher's over a massive amount of open-domain training data. Such a large prediction discrepancy often diminishes the benefits of knowledge distillation. To address this challenge, we propose Homotopic Distillation (HomoDistil), a novel task-agnostic distillation approach equipped with iterative pruning. Specifically, we initialize the student model from the teacher model, and iteratively prune the student's neurons until the target width is reached. Such an approach maintains a small discrepancy between the teacher's and student's predictions throughout the distillation process, which ensures the effectiveness of knowledge transfer. Extensive experiments demonstrate that HomoDistil achieves significant improvements over existing baselines.
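The schedule described in this abstract (start the student as a copy of the teacher, then shrink it a little at a time until the target width is reached) can be sketched as follows. This is only an illustration under stated assumptions: the squared-norm importance score, the linear width schedule, and the function names are hypothetical, and the distillation update that would run between pruning steps is left as a comment because the abstract does not specify it.

```python
import copy


def neuron_importance(weights):
    """Assumed importance score: squared L2 norm of a neuron's weights."""
    return sum(w * w for w in weights)


def prune_step(neurons, n_remove):
    """Drop the n_remove least important neurons, keeping the original order."""
    if n_remove <= 0:
        return neurons
    ranked = sorted(range(len(neurons)), key=lambda i: neuron_importance(neurons[i]))
    dropped = set(ranked[:n_remove])
    return [row for i, row in enumerate(neurons) if i not in dropped]


def iterative_prune(teacher_neurons, target_width, steps):
    """Shrink a copy of the teacher layer to target_width over a linear schedule."""
    student = copy.deepcopy(teacher_neurons)  # student starts identical to the teacher
    start_width = len(student)
    widths = [start_width]
    for step in range(1, steps + 1):
        # Width the student should have after this step (linear interpolation).
        planned = start_width - round((start_width - target_width) * step / steps)
        student = prune_step(student, len(student) - planned)
        # A distillation update on the remaining weights would go here.
        widths.append(len(student))
    return student, widths


teacher = [[0.1, 0.0], [2.0, 1.0], [0.5, 0.5], [3.0, 0.0], [0.01, 0.02], [1.0, 1.0]]
student, widths = iterative_prune(teacher, target_width=2, steps=2)
print(widths)
```

Because only a few neurons are removed per step, the student stays close to its previous state at every point of the schedule, which is the property the abstract credits for keeping the prediction discrepancy small.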

arxiv preprint arxiv, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2302.09632

Country:

North America > United States (0.46)
North America > Canada (0.28)

Genre: Research Report (0.40)

Industry: Education > Educational Technology > Educational Software (0.56)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)

Context-Aware Query Rewriting for Improving Users' Search Experience on E-commerce Websites

Zuo, Simiao, Yin, Qingyu, Jiang, Haoming, Xi, Shaohui, Yin, Bing, Zhang, Chao, Zhao, Tuo

arXiv.org Artificial Intelligence, Sep-24-2022

E-commerce queries are often short and ambiguous. Consequently, query understanding often uses query rewriting to disambiguate user-input queries. While using e-commerce search tools, users tend to enter multiple searches, which we call context, before purchasing. These historical searches contain contextual insights about users' true shopping intents. Therefore, modeling such contextual information is critical to a better query rewriting model. However, existing query rewriting models ignore users' search history and consider only the instant search query, which is often a short string offering limited information about the true shopping intent. We propose an end-to-end context-aware query rewriting model to bridge this gap, which takes the search context into account. Specifically, our model builds a session graph using the historical search queries and their contained words. We then employ a graph attention mechanism that models cross-query relations and computes contextual information of the session. The model subsequently calculates session representations by combining the contextual information with the instant search query using an aggregation network. The session representations are then decoded to generate rewritten queries. Empirically, we demonstrate the superiority of our method over state-of-the-art approaches under various metrics. On in-house data from an online shopping platform, by introducing contextual information, our model achieves 11.6% improvement under the MRR (Mean Reciprocal Rank) metric and 20.1% improvement under the HIT@16 metric (a hit rate metric), in comparison with the best baseline method (Transformer-based model).
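The two metrics named at the end of this abstract have standard definitions, sketched below. The function names and the toy data are hypothetical, and the sketch assumes each session comes with a ranked list of candidate rewrites and a set of acceptable ones; it is not the authors' evaluation code.

```python
def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant item (1-based), or 0 if none is found."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(all_ranked, all_relevant):
    """Average reciprocal rank over all sessions."""
    if not all_ranked:
        return 0.0
    total = sum(reciprocal_rank(r, rel) for r, rel in zip(all_ranked, all_relevant))
    return total / len(all_ranked)


def hit_at_k(all_ranked, all_relevant, k):
    """Fraction of sessions with at least one relevant item in the top k."""
    if not all_ranked:
        return 0.0
    hits = sum(
        1 for r, rel in zip(all_ranked, all_relevant)
        if any(item in rel for item in r[:k])
    )
    return hits / len(all_ranked)


# Three toy sessions: ranked rewrites and the set of acceptable rewrites.
ranked_lists = [["a", "b", "c"], ["x", "y", "z"], ["p", "q"]]
relevant_sets = [{"b"}, {"x"}, {"r"}]
print(mean_reciprocal_rank(ranked_lists, relevant_sets))
print(hit_at_k(ranked_lists, relevant_sets, k=2))
```

MRR rewards placing a correct rewrite near the top of the list, while HIT@k only checks whether one appears anywhere in the first k positions, so the two can move differently for the same model.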

artificial intelligence, information management, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2209.07584

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry: Information Technology > Services > e-Commerce Services (1.00)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Named Entity Recognition with Small Strongly Labeled and Large Weakly Labeled Data

Jiang, Haoming, Zhang, Danqing, Cao, Tianyu, Yin, Bing, Zhao, Tuo

arXiv.org Artificial Intelligence, Jun-16-2021

Weak supervision has shown promising results in many natural language processing tasks, such as Named Entity Recognition (NER). Existing work mainly focuses on learning deep NER models only with weak supervision, i.e., without any human annotation, and shows that by merely using weakly labeled data, one can achieve good performance, though it still underperforms fully supervised NER with manually/strongly labeled data. In this paper, we consider a more practical scenario, where we have both a small amount of strongly labeled data and a large amount of weakly labeled data. Unfortunately, we observe that weakly labeled data does not necessarily improve, and can even degrade, model performance (due to the extensive noise in the weak labels) when we train deep NER models over a simple or weighted combination of the strongly labeled and weakly labeled data. To address this issue, we propose a new multi-stage computational framework, NEEDLE, with three essential ingredients: (1) weak label completion, (2) a noise-aware loss function, and (3) final fine-tuning over the strongly labeled data. Through experiments on E-commerce query NER and Biomedical NER, we demonstrate that NEEDLE can effectively suppress the noise of the weak labels and outperforms existing methods. In particular, we achieve new SOTA F1-scores on 3 Biomedical NER datasets: BC5CDR-chem 93.74, BC5CDR-disease 90.69, NCBI-disease 92.28.

deep learning, labeled data, neural network, (21 more...)

arXiv.org Artificial Intelligence

2106.08977

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (1.00)

Industry: Information Technology > Services (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach

Jiang, Haoming, Dai, Bo, Yang, Mengjiao, Zhao, Tuo, Wei, Wei

arXiv.org Artificial Intelligence, Feb-28-2021

Reliable automatic evaluation of dialogue systems under an interactive environment has long been overdue. An ideal environment for evaluating dialog systems, also known as the Turing test, needs to involve human interaction, which is usually not affordable for large-scale experiments. Though researchers have attempted to use metrics (e.g., perplexity, BLEU) in language generation tasks or some model-based reinforcement learning methods (e.g., self-play evaluation) for automatic evaluation, these methods only show a very weak correlation with the actual human evaluation in practice. To bridge such a gap, we propose a new framework named ENIGMA for estimating human evaluation scores based on recent advances of off-policy evaluation in reinforcement learning. ENIGMA only requires a handful of pre-collected experience data, and therefore does not involve human interaction with the target policy during the evaluation, making automatic evaluations feasible. More importantly, ENIGMA is model-free and agnostic to the behavior policies for collecting the experience data (see details in Section 2), which significantly alleviates the technical difficulties of modeling complex dialogue environments and human behaviors. Our experiments show that ENIGMA significantly outperforms existing methods in terms of correlation with human evaluation scores.

artificial intelligence, evaluation, survey article, (17 more...)

arXiv.org Artificial Intelligence

2102.10242

Genre: Research Report > New Finding (0.46)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Calibrated Language Model Fine-Tuning for In- and Out-of-Distribution Data

Kong, Lingkai, Jiang, Haoming, Zhuang, Yuchen, Lyu, Jie, Zhao, Tuo, Zhang, Chao

arXiv.org Artificial Intelligence, Oct-22-2020

Fine-tuned pre-trained language models can suffer from severe miscalibration for both in-distribution and out-of-distribution (OOD) data due to over-parameterization. To mitigate this issue, we propose a regularized fine-tuning method. Our method introduces two types of regularization for better calibration: (1) On-manifold regularization, which generates pseudo on-manifold samples through interpolation within the data manifold. Augmented training with these pseudo samples imposes a smoothness regularization to improve in-distribution calibration. (2) Off-manifold regularization, which encourages the model to output uniform distributions for pseudo off-manifold samples to address the over-confidence issue for OOD data. Our experiments demonstrate that the proposed method outperforms existing calibration methods for text classification in terms of expected calibration error, misclassification detection, and OOD detection on six datasets. Our code can be found at https://github.com/Lingkai-Kong/Calibrated-BERT-Fine-Tuning.
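The calibration error named in this abstract is commonly estimated by binning predictions by confidence and comparing each bin's average confidence with its accuracy. The sketch below shows that standard binned estimate; the number of bins, the half-open binning convention, and the function name are assumptions, and this is not the authors' evaluation code.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned calibration error: weighted mean of |avg confidence - accuracy| per bin.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct: booleans, True where the prediction matched the label.
    Bins are [i/n_bins, (i+1)/n_bins), with confidence 1.0 placed in the last bin.
    """
    n = len(confidences)
    if n == 0:
        return 0.0
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))
    error = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        error += len(members) / n * abs(avg_conf - accuracy)
    return error


# Four predictions at 90% confidence, three of them right: the model is
# over-confident by 0.90 - 0.75.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False]))
```

A perfectly calibrated model scores zero under this estimate, and an over-confident model, the failure mode the abstract describes for OOD data, scores higher as its stated confidence drifts above its accuracy.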

artificial intelligence, dataset, natural language, (16 more...)

arXiv.org Artificial Intelligence

2010.11506

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.87)

BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision

Liang, Chen, Yu, Yue, Jiang, Haoming, Er, Siawpeng, Wang, Ruijia, Zhao, Tuo, Zhang, Chao

arXiv.org Artificial Intelligence, Jun-28-2020

We study the open-domain named entity recognition (NER) problem under distant supervision. Distant supervision, though it does not require large amounts of manual annotation, yields highly incomplete and noisy distant labels via external knowledge bases. To address this challenge, we propose a new computational framework, BOND, which leverages the power of pre-trained language models (e.g., BERT and RoBERTa) to improve the prediction performance of NER models. Specifically, we propose a two-stage training algorithm: in the first stage, we adapt the pre-trained language model to the NER tasks using the distant labels, which can significantly improve the recall and precision; in the second stage, we drop the distant labels and propose a self-training approach to further improve the model performance. Thorough experiments on 5 benchmark datasets demonstrate the superiority of BOND over existing distantly supervised NER methods. The code and distantly labeled data have been released at https://github.com/cliang1453/BOND.

deep learning, named entity recognition, soccer, (21 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3394486.3403149

2006.15509

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Sports > Soccer (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Picasso: A Sparse Learning Library for High Dimensional Data Analysis in R and Python

Ge, Jason, Li, Xingguo, Jiang, Haoming, Liu, Han, Zhang, Tong, Wang, Mengdi, Zhao, Tuo

arXiv.org Machine Learning, Jun-26-2020

We describe a new library named picasso, which implements a unified framework of pathwise coordinate optimization for a variety of sparse learning problems (e.g., sparse linear regression, sparse logistic regression, sparse Poisson regression and scaled sparse linear regression) combined with efficient active set selection strategies. Besides, the library allows users to choose different sparsity-inducing regularizers, including the convex $\ell_1$, nonconvex MCP and SCAD regularizers. The library is coded in C++ and has user-friendly R and Python wrappers. Numerical experiments demonstrate that picasso can scale up to large problems efficiently.
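Pathwise coordinate optimization, as named in this abstract, can be illustrated for the simplest case: $\ell_1$-penalized least squares solved over a decreasing sequence of penalties with warm starts. The sketch below is plain Python and does not use or imitate the picasso API; the function names are hypothetical, and it omits the active set selection and the MCP and SCAD regularizers that the abstract mentions.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the l1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, beta=None, n_sweeps=100, tol=1e-8):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p if beta is None else list(beta)
    residual = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            if scale == 0.0:
                continue  # an all-zero column cannot be updated
            # Correlation of column j with the residual that excludes its own effect.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_value = soft_threshold(rho, lam) / scale
            delta = new_value - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_value
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def lasso_path(X, y, lambdas):
    """Solve from the largest penalty to the smallest, warm-starting each solve."""
    beta = None
    path = []
    for lam in sorted(lambdas, reverse=True):
        beta = lasso_coordinate_descent(X, y, lam, beta)
        path.append((lam, beta))
    return path


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
for lam, coefficients in lasso_path(X, y, [0.5, 0.01]):
    print(lam, coefficients)
```

Starting each solve from the previous solution is what makes the path cheap: at a large penalty most coefficients are exactly zero, and only a few change as the penalty decreases.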

artificial intelligence, coordinate optimization, health & medicine, (17 more...)

arXiv.org Machine Learning

2006.15261

Genre: Research Report (0.95)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
