
Mixture Proportion Estimation and PU Learning: A Modern Approach

Neural Information Processing Systems

Given only positive examples and unlabeled examples (from both positive and negative classes), we might hope nevertheless to estimate an accurate positive-versus-negative classifier. Formally, this task is broken down into two subtasks: (i) Mixture Proportion Estimation (MPE)--determining the fraction of positive examples in the unlabeled data; and (ii) PU-learning--given such an estimate, learning the desired positive-versus-negative classifier.
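The abstract does not spell out the paper's own estimators, but the classic Elkan–Noto setup illustrates how the two subtasks fit together: train any probabilistic labeled-vs-unlabeled scorer, estimate c = P(labeled | positive) from scores on held-out labeled positives, then rescale scores and read off the mixture proportion. The sketch below assumes such scores are already available; function names are illustrative, not from the paper.

```python
def estimate_c(scores_labeled_pos):
    # Elkan-Noto estimator: c = P(labeled | positive), approximated by the
    # mean labeled-vs-unlabeled score over held-out labeled positives.
    return sum(scores_labeled_pos) / len(scores_labeled_pos)

def correct_scores(scores_unlabeled, c):
    # Convert labeled-vs-unlabeled scores s(x) into positive-vs-negative
    # probabilities via P(y=1 | x) = s(x) / c, clipped to [0, 1].
    return [min(s / c, 1.0) for s in scores_unlabeled]

def mixture_proportion(scores_unlabeled, c):
    # MPE: the fraction of positives in the unlabeled data is E[s(x)] / c.
    return sum(scores_unlabeled) / (len(scores_unlabeled) * c)
```

For example, with c = 0.8, an unlabeled score of 0.4 becomes a positive-class probability of 0.5. This is a baseline sketch; the paper's "modern approach" refines both steps.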


"Eddington" Is a Lethally Self-Satisfied COVID Satire

The New Yorker

"Eddington" is a slog, but a slog with ambitions--and its director and screenwriter, Ari Aster, is savvy enough to cultivate an air of mystery about what those ambitions are. His earlier chillers, "Hereditary" (2018) and "Midsommar" (2019), had their labyrinthine ambiguities, too, but they also had propulsive craft and cunning, plus a resolute commitment to scaring us stupid. Then came the ungainly "Beau Is Afraid" (2023), a cavalcade of Oedipal neuroses both showy and coy, in which Aster didn't seem to lose focus so much as sacrifice it on the altar of auteurism. With "Eddington," his high-minded unravelling continues. No longer a horror wunderkind, Aster, at thirty-nine, yearns to be an impish anatomist of the body politic.


The Truth Becomes Clearer Through Debate! Multi-Agent Systems with Large Language Models Unmask Fake News

Liu, Yuhan, Liu, Yuxuan, Zhang, Xiaoqing, Chen, Xiuying, Yan, Rui

arXiv.org Artificial Intelligence

In today's digital environment, the rapid propagation of fake news via social networks poses significant social challenges. Most existing detection methods either employ traditional classification models, which suffer from low interpretability and limited generalization capabilities, or craft specific prompts for large language models (LLMs) to produce explanations and results directly, failing to fully leverage LLMs' reasoning abilities. Inspired by the saying that "truth becomes clearer through debate," our study introduces a novel multi-agent system with LLMs named TruEDebate (TED) to enhance the interpretability and effectiveness of fake news detection. TED employs a rigorous debate process inspired by formal debate settings. Central to our approach are two innovative components: the DebateFlow Agents and the InsightFlow Agents. The DebateFlow Agents organize agents into two teams, where one supports and the other challenges the truth of the news. These agents engage in opening statements, cross-examination, rebuttal, and closing statements, simulating a structured debate akin to human discourse analysis and allowing for a thorough evaluation of news content. Concurrently, the InsightFlow Agents consist of two specialized sub-agents: the Synthesis Agent and the Analysis Agent. The Synthesis Agent summarizes the debates and provides an overarching viewpoint, ensuring a coherent and comprehensive evaluation. The Analysis Agent, which includes a role-aware encoder and a debate graph, integrates role embeddings and models the interactions between debate roles and arguments using an attention mechanism, providing the final judgment.
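The debate schedule described above (two teams, four phases, then synthesis) can be sketched as a plain orchestration loop. Everything here is an assumption for illustration: `stub_agent` stands in for an LLM call, and the synthesis step is reduced to a turn tally rather than the paper's Synthesis/Analysis Agents.

```python
PHASES = ["opening statement", "cross-examination", "rebuttal", "closing statement"]

def stub_agent(team, phase, claim, transcript):
    # Placeholder for an LLM call; a real system would prompt a model here,
    # conditioning on the claim and the debate transcript so far.
    stance = "supports" if team == "pro" else "challenges"
    return f"[{team}/{phase}] {stance} the claim: {claim!r}"

def run_debate(claim, agent=stub_agent):
    # DebateFlow-style loop: both teams speak in each of the four phases.
    transcript = []
    for phase in PHASES:
        for team in ("pro", "con"):
            turn = agent(team, phase, claim, list(transcript))
            transcript.append((team, phase, turn))
    return transcript

def synthesize(transcript):
    # Stand-in for the Synthesis Agent: here just a tally of turns per team.
    pro = sum(1 for team, _, _ in transcript if team == "pro")
    return {"pro_turns": pro, "con_turns": len(transcript) - pro}
```

The design point is that the debate structure is fixed and model-agnostic: swapping `stub_agent` for a real LLM caller changes nothing else.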


TED: Turn Emphasis with Dialogue Feature Attention for Emotion Recognition in Conversation

Ono, Junya, Wakaki, Hiromi

arXiv.org Artificial Intelligence

Emotion recognition in conversation (ERC) has attracted attention through methods that model multi-turn contexts. Feeding multiple turns to a pretrained model implicitly assumes that the current turn and the other turns are distinguished during training by special tokens inserted into the input sequence. This paper proposes a priority-based attention method, called Turn Emphasis with Dialogue (TED), that distinguishes each turn explicitly by adding dialogue features to the attention mechanism. TED assigns each turn a priority according to its position and speaker information, used as dialogue features. It applies multi-head self-attention over turn-based vectors for the multi-turn input and adjusts the attention scores with these dialogue features. We evaluate TED on four typical benchmarks. The experimental results demonstrate that TED performs well across all datasets and achieves state-of-the-art performance on IEMOCAP, which has numerous turns.


Benchmarking symbolic regression constant optimization schemes

Reis, L. G. A dos, Caminha, V. L. P. S., Penna, T. J. P.

arXiv.org Artificial Intelligence

Symbolic regression is a machine learning technique that has seen many advancements in recent years, especially in genetic programming approaches (GPSR). It has also long been known that optimizing a model's constants during the evolutionary search greatly improves GPSR performance. However, different authors approach this task differently, and no consensus exists on which methods perform best. In this work, we evaluate eight parameter optimization methods, applied during the evolutionary search, on ten known benchmark problems in two different scenarios. We also propose using an under-explored metric, the Tree Edit Distance (TED), to quantify symbolic accuracy. Together with classical error measures, this yields a combined analysis of model performance in symbolic regression. We show that different constant optimization methods perform better in certain scenarios and that no single choice is best for every problem. Finally, we discuss how common metric choices may be biased and make some models appear better than they are.
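To make "symbolic accuracy" concrete: a tree edit distance compares two expression trees by the cost of relabeling, inserting, and deleting nodes. The toy variant below (not the full Zhang–Shasha algorithm, and not necessarily the paper's formulation) uses unit relabel cost at the root plus an alignment of child sequences where inserting or deleting a subtree costs its size; trees are `(label, children)` tuples.

```python
from functools import lru_cache

def size(tree):
    label, children = tree
    return 1 + sum(size(c) for c in children)

@lru_cache(maxsize=None)
def ted(a, b):
    # Simplified ordered tree edit distance: unit relabel cost at the root,
    # plus a sequence alignment of the child lists where deleting or
    # inserting a whole subtree costs its node count.
    cost = 0 if a[0] == b[0] else 1
    xs, ys = a[1], b[1]
    n, m = len(xs), len(ys)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + size(xs[i - 1])
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + size(ys[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j] + size(xs[i - 1]),                # delete subtree
                dp[i][j - 1] + size(ys[j - 1]),                # insert subtree
                dp[i - 1][j - 1] + ted(xs[i - 1], ys[j - 1]),  # align subtrees
            )
    return cost + dp[n][m]
```

For example, `x + y` versus `x + z` differs in one leaf label, so the distance is 1 even if both models fit the data equally well numerically, which is exactly the structural information error metrics miss.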


Black Myth: Wukong – the summer's most exciting, and most controversial, video game

The Guardian

When Chinese developer Game Science revealed its debut console game Black Myth: Wukong last year, it immediately caused a stir. Inspired by the great 16th-century Chinese novel, Journey to the West, the action-packed footage featured the titular mythological monkey Sun Wukong battling Buddhist-folklore demons and sword-wielding anthropomorphic foxes in lusciously rendered forests. Smartphone games are inordinately popular in China, but console game developers are still few and far between, and the excitement for Wukong in Game Science's homeland reached fever pitch. Within 24 hours, the trailer racked up 2m views on YouTube and more than 10m on Chinese video sharing site Bilibili, much to its creators' shock and delight. One excited fan even broke into the developer's office, desperate for more info on the game.


Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models

Dong, Yihong, Jiang, Xue, Liu, Huanyu, Jin, Zhi, Gu, Bin, Yang, Mengfei, Li, Ge

arXiv.org Artificial Intelligence

Recent claims about the impressive capabilities of large language models (LLMs) are usually supported by evaluation on open-access benchmarks. Given the vast size and wide-ranging sources of LLMs' training data, that data could explicitly or implicitly include test data, making LLMs more susceptible to data contamination. However, the opacity of training data, black-box access to models, and the rapid growth of synthetic training data make detecting and mitigating data contamination for LLMs a significant challenge. In this paper, we propose CDD, which stands for Contamination Detection via output Distribution for LLMs. CDD requires only sampled texts to detect data contamination, by identifying the peakedness of the LLM's output distribution. To mitigate the impact of data contamination in evaluation, we also present TED: Trustworthy Evaluation via output Distribution, based on correcting the LLM's output distribution. To facilitate this study, we introduce two benchmarks, DetCon and ComiEval, for the data contamination detection and contamination mitigation evaluation tasks. Extensive experimental results show that CDD achieves average relative improvements of 21.8%-30.2% over other contamination detection approaches in terms of Accuracy, F1 Score, and AUC, and can effectively detect implicit contamination. TED substantially mitigates performance improvements of up to 66.9% attributable to data contamination across various contamination setups. In real-world applications, we reveal that ChatGPT exhibits a high risk of data contamination on the HumanEval benchmark.
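The abstract does not define its peakedness statistic, but the intuition is checkable with a crude proxy: sample the model several times on the same prompt and measure how concentrated the outputs are. Below, peakedness is simply the fraction of samples equal to the modal output; the function names and the 0.9 threshold are illustrative assumptions, not CDD's actual measure.

```python
from collections import Counter

def peakedness(samples):
    # Fraction of samples equal to the most common output. Values near 1.0
    # suggest the model reproduces one memorized continuation; lower values
    # suggest a flatter, more generative output distribution.
    counts = Counter(samples)
    return max(counts.values()) / len(samples)

def flag_contaminated(samples, threshold=0.9):
    # Crude contamination flag on a single prompt's samples.
    return peakedness(samples) >= threshold
```

A real detector would compare continuations at finer granularity (e.g. token- or edit-distance level) rather than exact string matches, but the signal being exploited is the same.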