AITopics | Zhang, June

Collaborating Authors

Zhang, June

REAL: Response Embedding-based Alignment for LLMs

Zhang, Honggen, Zhao, Xufeng, Molybog, Igor, Zhang, June

arXiv.org Artificial Intelligence | Dec-20-2024

Aligning large language models (LLMs) to human preferences is a crucial step in building helpful and safe AI tools, and it usually involves training on supervised datasets. Popular algorithms such as Direct Preference Optimization rely on pairs of AI-generated responses ranked according to human feedback. The response pair annotation process is the most labor-intensive and costly part of the alignment pipeline, and improving its efficiency and annotation quality would have a meaningful impact on AI development. We propose REAL: Response Embedding-based Alignment for LLMs, a strategy for constructing a high-quality training dataset that focuses on acquiring the most informative response pairs for labeling out of a set of response candidates. Our selection process is based on embedding responses independently of prompts. Experimental results on the real-world dataset SHP2 and the synthetic HH-RLHF benchmarks indicate that choosing dissimilar response pairs enhances the direct alignment of LLMs while reducing inherited labeling errors. The model aligned on dissimilar response pairs obtained a better margin and win rate on the dialogue task. Our findings suggest that focusing on distinct pairs can reduce label error and thereby improve the efficiency of LLM alignment, saving up to 65% of annotators' work.
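As a rough illustration of the selection idea in this abstract, the sketch below picks the least similar pair from a set of candidate response embeddings. The use of cosine similarity, the toy two-dimensional vectors, and the function names are assumptions made for the example; the abstract does not specify the embedding model or the dissimilarity measure.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_dissimilar_pair(embeddings):
    """Return the index pair (i, j), i < j, with the lowest cosine similarity."""
    return min(
        combinations(range(len(embeddings)), 2),
        key=lambda ij: cosine_similarity(embeddings[ij[0]], embeddings[ij[1]]),
    )


# Hypothetical response embeddings for a single prompt (toy values).
candidates = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
]

pair = most_dissimilar_pair(candidates)
print(pair)  # indices of the pair that would be sent for labeling
```

In this toy setup the first and third candidates are orthogonal, so they form the pair chosen for annotation.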

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2409.17169

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Out-of-Distribution Detection using Maximum Entropy Coding

Abolfazli, Mojtaba, Amirani, Mohammad Zaeri, Høst-Madsen, Anders, Zhang, June, Bratincsak, Andras

arXiv.org Artificial Intelligence | Apr-25-2024

Given a default distribution $P$ and a set of test data $x^M=\{x_1,x_2,\ldots,x_M\}$, this paper asks whether it is likely that $x^M$ was generated by $P$. For discrete distributions, the definitive answer is in principle given by Kolmogorov-Martin-Löf randomness. In this paper we seek to generalize this to continuous distributions. We consider a set of statistics $T_1(x^M),T_2(x^M),\ldots$. To each statistic we associate its maximum entropy distribution and, with it, a universal source coder. The maximum entropy distributions are subsequently combined to give a total codelength, which is compared with $-\log P(x^M)$. We show that this approach satisfies a number of theoretical properties. For real-world data, $P$ is usually unknown. We transform data into a standard distribution in the latent space using a bidirectional generative network and use maximum entropy coding there. We compare the resulting method to other methods that also use generative neural networks to detect anomalies. In most cases, our results show better performance.

artificial intelligence, machine learning, maximum entropy distribution, (16 more...)

arXiv.org Artificial Intelligence

2404.17023

Country:

North America > United States > Hawaii (0.14)
Oceania > Australia (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.68)

HaSa: Hardness and Structure-Aware Contrastive Knowledge Graph Embedding

Zhang, Honggen, Zhang, June, Molybog, Igor

arXiv.org Artificial Intelligence | Oct-14-2023

We consider a contrastive learning approach to knowledge graph embedding (KGE) via InfoNCE. For KGE, efficient learning relies on augmenting the training data with negative triples. However, most KGE works overlook the bias introduced by generating negative triples, namely false negative triples (factual triples missing from the knowledge graph). We argue that the generation of high-quality (i.e., hard) negative triples might lead to an increase in false negative triples. To mitigate the impact of false negative triples during the generation of hard negative triples, we propose the Hardness and Structure-aware (HaSa) contrastive KGE method, which alleviates the effect of false negative triples while generating the hard negative triples. Experiments show that HaSa improves the performance of InfoNCE-based KGE approaches and achieves state-of-the-art results in several metrics for WN18RR datasets and competitive results for FB15k-237 datasets compared to both classic and pre-trained LM-based KGE methods.
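For readers unfamiliar with InfoNCE, the sketch below computes the standard InfoNCE loss for one positive triple score against a set of negative triple scores. It is the generic loss only, not the HaSa method; the scores, the temperature parameter, and the function name are assumptions chosen for illustration.

```python
import math


def info_nce_loss(pos_score, neg_scores, temperature=1.0):
    """Standard InfoNCE: -log(exp(s+/t) / (exp(s+/t) + sum_k exp(s_k-/t)))."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(
        sum(math.exp(logit - max_logit) for logit in logits)
    )
    return log_denominator - logits[0]


# Hypothetical scores: a harder negative (score close to the positive)
# contributes more to the loss than an easy one.
easy = info_nce_loss(2.0, [-1.0])
hard = info_nce_loss(2.0, [1.5])
print(easy < hard)
```

The comparison shows why hard negatives are informative, and also why a false negative (a true triple scored highly but treated as negative) can distort training.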

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2305.10563

Country: North America > United States (0.69)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.85)

How News Evolves? Modeling News Text and Coverage using Graphs and Hawkes Process

Zhang, Honggen, Zhang, June

arXiv.org Artificial Intelligence | Nov-18-2021

Monitoring news content automatically is an important problem. The news content, unlike traditional text, has a temporal component. However, few works have explored the combination of natural language processing and dynamic system models. One reason is that it is challenging to mathematically model the nuances of natural language. In this paper, we discuss how we built a novel dataset of news articles collected over time. Then, we present a method of converting news text collected over time to a sequence of directed multi-graphs, which represent semantic triples (Subject → Predicate → Object). We model the dynamics of specific topological changes from these graphs using discrete-time Hawkes processes. With our real-world data, we show that analyzing the structures of the graphs and the discrete-time Hawkes process model can yield insights on how the news events were covered and how they may be covered in the future.
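A discrete-time Hawkes process can be sketched as follows: the expected event count at each time step is a baseline rate plus a decaying contribution from past counts. The geometric decay kernel, the parameter names `mu`, `alpha`, and `beta`, and the toy count series are assumptions for illustration; the abstract does not state which kernel or parameterization the paper uses.

```python
def hawkes_intensities(counts, mu, alpha, beta):
    """Conditional intensity of a discrete-time Hawkes process.

    lambda_t = mu + sum over s < t of alpha * counts[s] * beta ** (t - s)

    mu is the baseline rate, alpha the excitation per past event, and
    beta (between 0 and 1) the per-step geometric decay.
    """
    intensities = []
    excitation = 0.0  # decayed influence of all events before the current step
    for y in counts:
        intensities.append(mu + excitation)
        # Add this step's events, then decay everything by one step.
        excitation = beta * (excitation + alpha * y)
    return intensities


# Hypothetical per-step counts of a graph change (e.g. new edges per day).
observed = [2, 0, 0]
rates = hawkes_intensities(observed, mu=0.5, alpha=0.4, beta=0.5)
print(rates)
```

A burst of two events at the first step raises the expected count at the next step, and the effect then fades, which is the self-exciting behavior a Hawkes model captures.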

artificial intelligence, Hawkes process, natural language, (17 more...)

arXiv.org Artificial Intelligence

2112.03008

Country:

North America > United States (1.00)
Asia > Middle East > Iran (0.29)

Genre: Research Report (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Media > News (0.71)
Government > Military (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Differential Description Length for Hyperparameter Selection in Machine Learning

Høst-Madsen, Anders, Abolfazli, Mojtaba, Zhang, June

arXiv.org Machine Learning | Feb-12-2019

This paper introduces a new method for model selection and, more generally, hyperparameter selection in machine learning. The paper first proves a relationship between generalization error and a difference of description lengths of the training data; we call this difference differential description length (DDL). This allows prediction of generalization error from the training data alone by encoding the training data. This can then be used for model selection by choosing the model that has the smallest predicted generalization error. We show how this encoding can be done for linear regression and neural networks. We provide experiments showing that this leads to smaller generalization error than cross-validation and traditional MDL and Bayes methods.

artificial intelligence, bayesian inference, generalization error, (17 more...)

arXiv.org Machine Learning

1902.04699

Country:

North America > United States > New York (0.14)
North America > United States > Hawaii (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)
