

PSSD: Making Large Language Models Self-denial via Human Psyche Structure

Liao, Jinzhi, Liao, Zenghua, Zhao, Xiang

arXiv.org Artificial Intelligence

Improving the accuracy of LLM reasoning has attracted the community's interest, and pioneering studies have investigated post-hoc strategies to rectify potential mistakes. Despite extensive efforts, these approaches remain locked in resource competition, demanding significant time and computing expense. The cause lies in a failure to identify the fundamental feature of solutions in this line, which we coin the self-denial of LLMs: LLMs should confidently determine whether mistakes may exist and carefully execute targeted corrections. Because the whole procedure is conducted within LLMs, supporting and persuasive references are hard to acquire, and specific steps for refining hidden mistakes remain absent even when errors are acknowledged. In response to these challenges, we present PSSD, which draws on and implements the human psyche structure, in which three distinct and interconnected roles contribute to human reasoning. Specifically, PSSD leverages the recent multi-agent paradigm and is further enhanced with three innovatively conceived roles: (1) the intuition-based id role, which provides initial attempts based on benign LLMs; (2) the rule-driven superego role, which summarizes rules to regulate these attempts and returns specific key points as guidance; and (3) the script-centric ego role, which absorbs all procedural information to generate an executable script for the final answer prediction. Extensive experiments demonstrate that the proposed design not only better enhances reasoning capabilities but also integrates seamlessly with current models, leading to superior performance.
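The three-role flow in the abstract (id → superego → ego) can be sketched as follows. All function names and the toy arithmetic task are illustrative assumptions, not the authors' implementation; in practice each role would be an LLM call.

```python
# Minimal sketch of the id/superego/ego pipeline, assuming a toy task.

def id_role(question):
    # Intuition-based first attempt: a quick, possibly sloppy draft answer.
    return {"question": question, "draft": "17 * 24 = 398"}  # deliberately wrong

def superego_role(attempt):
    # Rule-driven critique: summarize rules and return key points as guidance.
    rules = ["multiply tens and units separately", "sum the partial products"]
    return {"rules": rules, "key_points": "17*24 = 17*20 + 17*4"}

def ego_role(attempt, guidance):
    # Script-centric synthesis: turn the procedural information into an
    # executable script whose result is the final prediction.
    script = "result = 17 * 20 + 17 * 4"
    env = {}
    exec(script, env)
    return env["result"]

question = "What is 17 * 24?"
attempt = id_role(question)
guidance = superego_role(attempt)
answer = ego_role(attempt, guidance)
print(answer)  # 408
```

The ego role's use of an executable script mirrors the abstract's claim that the final answer is produced by running generated code rather than by free-form text completion.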


Growing a Tail: Increasing Output Diversity in Large Language Models

Shur-Ofry, Michal, Horowitz-Amsalem, Bar, Rahamim, Adir, Belinkov, Yonatan

arXiv.org Artificial Intelligence

How diverse are the outputs of large language models when diversity is desired? We examine the diversity of responses of various models to questions with multiple possible answers, comparing them with human responses. Our findings suggest that models' outputs are highly concentrated, reflecting a narrow, mainstream 'worldview', compared to humans, whose responses exhibit a much longer tail. We examine three ways to increase models' output diversity: 1) increasing generation randomness via temperature sampling; 2) prompting models to answer from diverse perspectives; 3) aggregating outputs from several models. A combination of these measures significantly increases models' output diversity, reaching that of humans. We discuss the implications of these findings for AI policy that wishes to preserve cultural diversity, an essential building block of a democratic social fabric. Conversely, a lack of diversity can result in extremism and exclusion (e.g., 1, 2).
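One of the measures above, aggregating outputs from several models, can be illustrated with a toy diversity metric. The response counts and the use of Shannon entropy as the diversity measure are assumptions for illustration, not the paper's methodology.

```python
# Toy illustration: response concentration before and after pooling two
# hypothetical "models". Higher entropy = more diverse output distribution.
from collections import Counter
import math

def entropy(responses):
    counts = Counter(responses)
    n = len(responses)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A single model's answers to "name a famous scientist" are concentrated...
model_a = ["Einstein"] * 8 + ["Newton"] * 2
# ...while pooling a second model's answers lengthens the tail.
model_b = ["Curie"] * 5 + ["Darwin"] * 3 + ["Einstein"] * 2
pooled = model_a + model_b

print(round(entropy(model_a), 3))  # 0.722
print(round(entropy(pooled), 3))   # 1.743
```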


RAC: Efficient LLM Factuality Correction with Retrieval Augmentation

Li, Changmao, Flanigan, Jeffrey

arXiv.org Artificial Intelligence

Large Language Models (LLMs) exhibit impressive results across a wide range of natural language processing (NLP) tasks, yet they can often produce factually incorrect outputs. This paper introduces a simple but effective low-latency post-correction method, Retrieval Augmented Correction (RAC), aimed at enhancing the factual performance of LLMs without requiring additional fine-tuning. Our method is general, can be used with any instruction-tuned LLM, and has greatly reduced latency compared to prior approaches. RAC decomposes the LLM's output into atomic facts and applies a fine-grained verification and correction process with retrieved content to verify and correct the LLM-generated output. Our extensive experiments show that RAC yields up to 30% improvements over state-of-the-art baselines across two popular factuality evaluation datasets, validating its efficacy and robustness both with and without the integration of Retrieval-Augmented Generation (RAG) across different LLMs. Our code is at https://github.com/jlab-nlp/Retrieval-Augmented-Correction.
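The decompose-verify-correct loop described above can be sketched as follows. The naive sentence split, the hard-coded retrieval verdicts, and the correction table are toy stand-ins; RAC would use an LLM and a real retriever for each step.

```python
# Hedged sketch of fine-grained verification and correction, assuming
# fixed retrieval verdicts and corrections for two example facts.

RETRIEVED = {
    "Paris is the capital of France.": True,
    "The Eiffel Tower is in Lyon.": False,  # retrieval contradicts this fact
}

CORRECTIONS = {"The Eiffel Tower is in Lyon.": "The Eiffel Tower is in Paris."}

def decompose(output):
    # Split the LLM output into atomic facts (naive sentence split here).
    return [s.strip() + "." for s in output.split(".") if s.strip()]

def verify_and_correct(fact):
    # Keep facts supported by retrieval; replace contradicted ones.
    if RETRIEVED.get(fact, True):
        return fact
    return CORRECTIONS.get(fact, fact)

output = "Paris is the capital of France. The Eiffel Tower is in Lyon."
corrected = " ".join(verify_and_correct(f) for f in decompose(output))
print(corrected)
```

Operating at the level of atomic facts, rather than rewriting the whole output, is what allows a correction pass like this to stay fine-grained and low-latency.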


Generation with Dynamic Vocabulary

Liu, Yanting, Ji, Tao, Sun, Changzhi, Wu, Yuanbin, Wang, Xiaoling

arXiv.org Artificial Intelligence

We introduce a new dynamic vocabulary for language models. It can incorporate arbitrary text spans during generation. These text spans act as basic generation bricks, akin to tokens in traditional static vocabularies. We show that the ability to generate multiple tokens atomically improves both generation quality and efficiency (compared to the standard language model, the MAUVE metric increases by 25% and latency decreases by 20%). The dynamic vocabulary can be deployed in a plug-and-play way and is thus attractive for various downstream applications. For example, we demonstrate that the dynamic vocabulary can be applied to different domains in a training-free manner. It also helps generate reliable citations in question answering tasks (substantially enhancing citation results without compromising answer accuracy).
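The idea of multi-token spans acting as atomic "generation bricks" can be sketched with a single greedy decoding step. The vocabularies and scores below are invented for illustration; a real model would produce them.

```python
# Illustrative sketch: one decoding step over a vocabulary that merges
# single tokens with multi-token span entries, assuming fixed scores.

STATIC_VOCAB = {"the": 0.2, "large": 0.1, "language": 0.1, "model": 0.1}
# Dynamic entries: whole text spans are selectable as atomic units.
DYNAMIC_VOCAB = {"large language model": 0.5}

def step(prefix):
    # One greedy decoding step over the merged vocabulary.
    merged = {**STATIC_VOCAB, **DYNAMIC_VOCAB}
    return max(merged, key=merged.get)

prefix = ["the"]
prefix.append(step(prefix))
generated = " ".join(prefix)
print(generated)  # the span is emitted atomically in one step
```

Emitting the three-word span in a single step is the source of the efficiency gain the abstract reports: fewer decoding steps are needed for the same text.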


Mitigating the Learning Bias towards Repetition by Self-Contrastive Training for Open-Ended Generation

Guan, Jian, Huang, Minlie

arXiv.org Artificial Intelligence

Despite the huge progress in myriad generation tasks, pretrained language models (LMs) such as GPT2 still tend to generate repetitive texts with maximization-based decoding algorithms for open-ended generation. We attribute their overestimation of token-level repetition probabilities to the learning bias: LMs capture simple repetitive patterns faster with the MLE loss. We propose self-contrastive training to penalize the output of a premature checkpoint of the same model when it incorrectly predicts repetition, which is shown to mitigate repetition effectively while maintaining fluency on two datasets. Furthermore, we find that LMs use longer-range dependencies to predict repetitive tokens than non-repetitive ones, which may be the cause of sentence-level repetition loops.
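The self-contrastive signal described above can be sketched as a per-token loss. The exact loss form below is an illustrative assumption, not the paper's formula: it adds a penalty when a premature checkpoint confidently predicts a repetition that the gold text does not contain.

```python
# Toy sketch of a self-contrastive training signal, assuming a penalty term
# of the form -log(1 - p_premature) on wrongly-predicted repetitions.
import math

def self_contrastive_loss(p_mature, p_premature, is_wrong_repetition):
    # Standard MLE term for the current (mature) model...
    loss = -math.log(p_mature)
    # ...plus a penalty when the premature checkpoint confidently predicts
    # a repetition the gold continuation does not contain.
    if is_wrong_repetition:
        loss += -math.log(1.0 - p_premature)
    return loss

# A token the premature checkpoint wrongly repeats is penalized more heavily.
print(round(self_contrastive_loss(0.6, 0.9, True), 3))   # 2.813
print(round(self_contrastive_loss(0.6, 0.9, False), 3))  # 0.511
```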


"Demon Slayer": The Viral Blockbuster from Japan

The New Yorker

One of the seismic cultural shifts of the pandemic era has been a migration into fantasies. Some of them are troubling, such as the conspiratorial prejudice that has fuelled QAnon and the recent surge in violence against those of Asian descent. Others are restorative: the immersive worlds of books, the virtual realities of video games, the hypnotic lull of binge-streamed television series. Many of the escapes that we use to nourish ourselves originated in Japan. The stunning success of Nintendo's Animal Crossing: New Horizons, which sold thirty-one million copies worldwide last year, is a striking example.


Residual Energy-Based Models for Text

Bakhtin, Anton, Deng, Yuntian, Gross, Sam, Ott, Myle, Ranzato, Marc'Aurelio, Szlam, Arthur

arXiv.org Machine Learning

Current large-scale auto-regressive language models (Radford et al., 2019; Liu et al., 2018; Graves, 2013) display impressive fluency and can generate convincing text. In this work we start by asking the question: Can the generations of these models be reliably distinguished from real text by statistical discriminators? We find experimentally that the answer is affirmative when we have access to the training data for the model, and guardedly affirmative even if we do not. This suggests that the auto-regressive models can be improved by incorporating the (globally normalized) discriminators into the generative process. We give a formalism for this using the Energy-Based Model framework, and show that it indeed improves the results of the generative models, measured both in terms of perplexity and in terms of human evaluation.
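The residual formulation described above, p(x) ∝ p_LM(x) · exp(−E(x)), can be sketched by reweighting samples drawn from the base language model. The candidate texts and energy values are invented; a trained discriminator would supply the energies.

```python
# Sketch of residual EBM reweighting, assuming invented energies.
# Because candidates are drawn from the base LM, self-normalized importance
# weights over those samples reduce to w(x) = exp(-E(x)).
import math

samples = ["a b", "a a"]            # candidates drawn from the base LM
energy = {"a b": 0.2, "a a": 2.0}   # discriminator assigns high energy to
                                    # text it judges machine-like

weights = {x: math.exp(-energy[x]) for x in samples}
z = sum(weights.values())
p_joint = {x: weights[x] / z for x in samples}

best = max(p_joint, key=p_joint.get)
print(best)  # the low-energy candidate is preferred
```

The global normalization over whole sequences is what distinguishes this from token-by-token auto-regressive scoring, and is why the abstract calls the discriminator "globally normalized".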


Real or Fake? Learning to Discriminate Machine from Human Generated Text

Bakhtin, Anton, Gross, Sam, Ott, Myle, Deng, Yuntian, Ranzato, Marc'Aurelio, Szlam, Arthur

arXiv.org Machine Learning

Recent advances in generative modeling of text have demonstrated remarkable improvements in terms of fluency and coherency. In this work we investigate to which extent a machine can discriminate real from machine generated text. This is important in itself for automatic detection of computer generated stories, but can also serve as a tool for further improving text generation. We show that learning a dedicated scoring function to discriminate between real and fake text achieves higher precision than employing the likelihood of a generative model. The scoring functions generalize to other generators than those used for training as long as these generators have comparable model complexity and are trained on similar datasets.
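The contrast the abstract draws, a dedicated scoring function versus generator likelihood, can be illustrated with a toy scorer. The single hand-set feature (type-token ratio) is an assumption standing in for a trained discriminator's output.

```python
# Minimal sketch of scoring real vs. machine-generated text, assuming a
# single repetition-sensitive feature instead of a learned classifier.

def score_real(text):
    # A trained discriminator would output P(real | text); here the
    # type-token ratio stands in: repetitive text looks machine-generated.
    tokens = text.split()
    return len(set(tokens)) / len(tokens)  # higher = more likely human

human = "the cat sat on the mat near the door"
machine = "the the the cat cat sat sat the the"
print(score_real(human) > score_real(machine))  # True
```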


Showtime Orders 10 Episodes Of Long-Dormant 'Halo' TV Series

International Business Times

A long-dormant TV project appears to have finally emerged from its hibernation, if Showtime is to be believed. The premium cable network announced Thursday it had ordered 10 episodes of a television series based on the popular "Halo" video game franchise. The project was originally announced five years ago. The news was confirmed in a Thursday blog post on the official Halo Waypoint website by Kiki Wolfkill, the head of transmedia for "Halo" developer 343 Industries. The show will be headed by "Lone Star" creator Kyle Killen and "Rise of the Planet of the Apes" director Rupert Wyatt, with Steven Spielberg's Amblin Television producing.


Want to get more family-friendly entertainment on TV and in the movies? Here's how

FOX News

Some nights I flip and flip and flip through the television channels searching for something family-friendly to watch. It blows my mind that I have several hundred channels and yet there are so few choices. The same is often true when looking for a movie for the family to go see. I grew up watching shows that built my character. I watched television series about family members who clearly loved and respected each other.