AITopics

2402.10618

Country:

Europe > France (0.04)
Asia > Singapore (0.04)

Genre:

Personal > Interview (0.94)
Research Report > New Finding (0.67)

Industry:

Leisure & Entertainment (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

arXiv.org Artificial Intelligence Feb-16-2024

Can We Verify Step by Step for Incorrect Answer Detection?

Xu, Xin, Diao, Shizhe, Yang, Can, Wang, Yang

Chain-of-Thought (CoT) prompting has significantly advanced the reasoning capabilities of large language models (LLMs). Previous studies have developed various extensions of CoT that focus primarily on improving end-task performance. In addition, there has been research on assessing the quality of reasoning chains in CoT. This raises an intriguing question: is it possible to predict the accuracy of LLM outputs by scrutinizing the reasoning chains they generate? To answer this research question, we introduce a benchmark, R2PE, designed specifically to explore the relationship between reasoning chains and performance in various reasoning tasks spanning five different domains. This benchmark aims to measure whether the final output of an LLM is false based on its reasoning steps. To make full use of the information in multiple reasoning chains, we propose the process discernibility score (PDS) framework, which beats the answer-checking baseline by a large margin. Concretely, PDS yields an average increase of 5.1% in F1 score across all 45 subsets within R2PE. We further demonstrate the efficacy of PDS in improving open-domain QA accuracy. Data and code are available at https://github.com/XinXU-USTC/R2PE.
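The abstract does not spell out how the answer-checking baseline or PDS is computed, so the following is only a minimal sketch under stated assumptions. It treats an answer-checking style baseline as a self-consistency check that flags a question as likely answered incorrectly when too few sampled reasoning chains agree with the majority answer, and it scores those flags with F1, taking "incorrect answer" as the positive class. The function names and the 0.5 agreement threshold are hypothetical; this is not the authors' PDS.

```python
from collections import Counter


def consistency_flags(answers_per_question, threshold=0.5):
    """Return one flag per question: True means 'likely incorrect'.

    answers_per_question: list of lists holding the final answers extracted
    from several sampled reasoning chains for the same question.
    threshold: hypothetical minimum share of chains that must agree.
    """
    flags = []
    for answers in answers_per_question:
        _, count = Counter(answers).most_common(1)[0]
        agreement = count / len(answers)  # share of chains backing the majority
        flags.append(agreement < threshold)
    return flags


def f1_incorrect(predicted, actual):
    """F1 score with 'incorrect answer' (True) as the positive class."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


answers = [
    ["42", "42", "42", "41", "42"],  # 4/5 agree -> not flagged
    ["7", "8", "9", "7", "6"],       # 2/5 agree -> flagged
    ["3", "3", "5", "5", "4"],       # 2/5 agree -> flagged
]
flags = consistency_flags(answers)
print(flags, f1_incorrect(flags, [False, True, False]))
```

With ground truth in which only the second answer is wrong, the check has precision 0.5 and recall 1.0, so F1 is 2/3. A process-level score such as PDS would replace the simple agreement ratio with information drawn from the reasoning steps themselves.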

arxiv preprint, gpt-3, subset

2402.10528

Country:

Europe > Czechia (0.14)
Asia > Pakistan (0.05)
North America > Canada > Alberta (0.04)

Genre:

Personal > Obituary (0.68)
Research Report > New Finding (0.66)

Industry:

Media > Film (1.00)
Leisure & Entertainment > Sports > Motorsports (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Wettig, Alexander, Gupta, Aatmik, Malik, Saumya, Chen, Danqi

QuRating: Selecting High-Quality Data for Training Language Models

Selecting high-quality pre-training data is important for creating capable language models, but existing methods rely on simple heuristics. We introduce QuRating, a method for selecting pre-training data that captures the abstract qualities of texts that humans intuitively perceive. In this paper, we investigate four qualities: writing style, required expertise, facts & trivia, and educational value. We find that LLMs are able to discern these qualities and observe that they are better at making pairwise judgments of texts than at rating the quality of a text directly. We train a QuRater model to learn scalar ratings from pairwise judgments, and use it to annotate a 260B training corpus with quality ratings for each of the four criteria. In our experiments, we select 30B tokens according to the different quality ratings and train 1.3B-parameter language models on the selected data. We find that it is important to balance quality and diversity, as selecting only the highest-rated documents leads to poor results. When we sample using quality ratings as logits over documents, our models achieve lower perplexity and stronger in-context learning performance than baselines. Beyond data selection, we use the quality ratings to construct a training curriculum that improves performance without changing the training dataset. We extensively analyze the quality ratings and discuss their characteristics, biases, and wider implications.
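Sampling "using quality ratings as logits over documents" can be read as drawing documents without replacement from a softmax over rating / temperature. A minimal sketch, assuming a temperature parameter and using the Gumbel-top-k trick (add independent standard Gumbel noise to each logit and keep the k largest), which is one standard way to draw such a sample; the function name and interface are hypothetical, not QuRating's released code.

```python
import math
import random


def sample_by_quality(ratings, k, temperature=1.0, seed=None):
    """Pick k distinct document indices, sampled without replacement from
    softmax(ratings / temperature), using the Gumbel-top-k trick."""
    if not 0 <= k <= len(ratings):
        raise ValueError("k must be between 0 and the number of documents")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = random.Random(seed)
    keyed = []
    for index, rating in enumerate(ratings):
        u = max(rng.random(), 1e-300)  # random() is in [0, 1); avoid log(0)
        gumbel = -math.log(-math.log(u))  # standard Gumbel(0, 1) noise
        keyed.append((rating / temperature + gumbel, index))
    keyed.sort(reverse=True)  # largest perturbed logits first
    return [index for _, index in keyed[:k]]


ratings = [0.1, 2.0, -1.0, 5.0, 0.5]  # hypothetical per-document ratings
print(sample_by_quality(ratings, k=3, temperature=2.0, seed=0))
print(sample_by_quality(ratings, k=3, temperature=1e-6, seed=0))  # ~top-k
```

As the temperature approaches 0, the sample collapses to the k highest-rated documents, the strategy the abstract reports as leading to poor results; larger temperatures trade some quality for diversity.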

large language model, machine learning, southern asia 10

2402.09739

Country:

North America > United States > Texas (0.67)
Europe > Russia (0.67)
Asia > Middle East > Republic of Türkiye (0.67)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Personal (1.00)
Instructional Material (1.00)

Industry:

Transportation > Passenger (1.00)
Transportation > Air (1.00)
Retail (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Lemercier, Jean-Marie, Richter, Julius, Welker, Simon, Moliner, Eloi, Välimäki, Vesa, Gerkmann, Timo

Diffusion Models for Audio Restoration

With the development of audio playback devices and fast data transmission, the demand for high sound quality is rising, for both entertainment and communications. In this quest for better sound quality, challenges emerge from distortions and interferences originating at the recording side or caused by an imperfect transmission pipeline. To address this problem, audio restoration methods aim to recover clean sound signals from the corrupted input data. We present here audio restoration algorithms based on diffusion models, with a focus on speech enhancement and music restoration tasks. Traditional approaches, often grounded in handcrafted rules and statistical heuristics, have shaped our understanding of audio signals. In the past decades, there has been a notable shift towards data-driven methods that exploit the modeling capabilities of deep neural networks (DNNs). Deep generative models, and among them diffusion models, have emerged as powerful techniques for learning complex data distributions. However, relying solely on DNN-based learning approaches carries the risk of reducing interpretability, particularly when employing end-to-end models. Nonetheless, data-driven approaches allow more flexibility than statistical model-based frameworks, whose performance depends on distributional and statistical assumptions that can be difficult to guarantee. Here, we aim to show that diffusion models can combine the best of both worlds and offer the opportunity to design audio restoration algorithms with a good degree of interpretability and remarkable performance in terms of sound quality.

artificial intelligence, diffusion model, machine learning

2402.09821

Country:

Europe > Germany (0.46)
Europe > Finland (0.14)
North America > United States (0.14)

Genre:

Research Report (0.82)
Personal (0.67)

Industry:

Energy > Oil & Gas (0.68)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)

Darwin Turing Dawkins: Building a General Theory of Evolution

Adleman, Leonard M.

Living things, computers, societies, and even books are part of a grand evolutionary struggle to survive. That struggle shapes nature, nations, religions, art, science, and you. What you think, feel, and do is determined by it. Darwinian evolution does not apply solely to the genes that are stored in DNA. Using the insights of Alan Turing and Richard Dawkins, we will see that it also applies to the memes we store in our brains and the information we store in our computers. The next time you run for president, fight a war, or just deal with the ordinary problems humans are heir to, perhaps this book will be of use. If you want to understand why and when you will die, or if you want to achieve greatness, this book may help. If you are concerned about where the computer revolution is headed, this book may provide some answers.

brain, computer, prene

2402.10393

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Sweden (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)

Genre:

Research Report (1.00)
Personal (1.00)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.67)

BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data

Łajszczak, Mateusz, Cámbara, Guillermo, Li, Yang, Beyhan, Fatih, van Korlaar, Arent, Yang, Fan, Joly, Arnaud, Martín-Cortinas, Álvaro, Abbas, Ammar, Michalski, Adam, Moinet, Alexis, Karlapati, Sri, Muszyńska, Ewa, Guo, Haohan, Putrycz, Bartosz, Gambino, Soledad López, Yoo, Kayeon, Sokolova, Elena, Drugman, Thomas

We introduce a text-to-speech (TTS) model called BASE TTS, which stands for Big Adaptive Streamable TTS with Emergent abilities. BASE TTS is the largest TTS model to date, trained on 100K hours of public domain speech data, achieving a new state-of-the-art in speech naturalness. It deploys a 1-billion-parameter autoregressive Transformer that converts raw texts into discrete codes ("speechcodes"), followed by a convolution-based decoder that converts these speechcodes into waveforms in an incremental, streamable manner. Further, our speechcodes are built using a novel speech tokenization technique that features speaker ID disentanglement and compression with byte-pair encoding. Echoing the widely reported "emergent abilities" of large language models when trained on increasing volumes of data, we show that BASE TTS variants built with 10K+ hours and 500M+ parameters begin to demonstrate natural prosody on textually complex sentences. We design and share a specialized dataset to measure these emergent abilities for text-to-speech. We showcase the state-of-the-art naturalness of BASE TTS by evaluating against baselines that include publicly available large-scale text-to-speech systems: YourTTS, Bark and TortoiseTTS. Audio samples generated by the model can be heard at https://amazon-ltts-paper.com/.
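The abstract says BASE TTS compresses its speechcodes with byte-pair encoding but gives no further detail. A minimal sketch of generic BPE applied to sequences of integer code IDs: repeatedly find the most frequent adjacent pair across all sequences and replace it with a new ID. The assumption that new IDs are allocated after the existing codebook (first_new_id), the toy sequences, and all function names are hypothetical.

```python
from collections import Counter


def most_frequent_pair(seqs):
    """Most common adjacent pair of codes across all sequences, or None."""
    pairs = Counter()
    for seq in seqs:
        pairs.update(zip(seq, seq[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(seq, pair, new_code):
    """Replace every non-overlapping occurrence of `pair` with `new_code`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_code)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bpe(seqs, num_merges, first_new_id):
    """Learn up to num_merges merges; return compressed sequences and merges."""
    merges, next_id = [], first_new_id
    for _ in range(num_merges):
        pair = most_frequent_pair(seqs)
        if pair is None:
            break
        seqs = [merge_pair(seq, pair, next_id) for seq in seqs]
        merges.append((pair, next_id))
        next_id += 1
    return seqs, merges


codes = [[5, 5, 7, 5, 5, 7], [5, 5, 9]]  # toy speechcode sequences
print(learn_bpe(codes, num_merges=2, first_new_id=100))
```

Each merge shortens the sequences the autoregressive model must predict, at the cost of a larger vocabulary; the learned merge list is applied in order to encode new sequences.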

arxiv preprint arxiv, representation, speechcode

2402.08093

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > Jordan (0.04)
Asia > Maldives (0.04)

Genre:

Personal (1.00)
Research Report > Experimental Study (0.92)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

The Guardian Feb-14-2024, 14:00:40 GMT

Voices of the dead: shooting victims plead for gun reform with AI-voice messages

Six years ago today, Joaquin Oliver was killed in a hallway outside his Florida classroom, one of 17 students and staff murdered in the worst high school shooting in the US. On Wednesday, lawmakers in Washington DC will hear his voice, recreated by artificial intelligence, in phone calls demanding to know why they've done nothing to tackle the plague of gun violence. "It's been six years and you've done nothing. Not a thing to stop all the shootings that have happened since," the message from Oliver, who was 17 when he died in the 2018 Valentine's Day tragedy at Parkland's Marjory Stoneman Douglas high school, says. "I'm back today because my parents used AI to recreate my voice to call you. Other victims like me will be calling too, again and again, to demand action. How many calls will it take for you to care? How many dead voices will you hear before you finally listen?"

ai-voice message, gun reform, joaquin

The Guardian

Country:

North America > United States > District of Columbia > Washington (0.25)
North America > United States > Maryland (0.06)
North America > United States > Connecticut (0.06)

Genre: Personal > Obituary (0.70)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Government (1.00)
Education > Health & Safety > School Safety & Security > School Violence (1.00)
Health & Medicine > Therapeutic Area (0.97)

Technology: Information Technology > Artificial Intelligence > Applied AI (0.35)

arXiv.org Artificial Intelligence Feb-14-2024

Long-form evaluation of model editing

Rosati, Domenic, Gonzales, Robie, Chen, Jinkun, Yu, Xuemin, Erkan, Melis, Kayani, Yahya, Chavatapalli, Satya Deepika, Rudzicz, Frank, Sajjad, Hassan

Evaluations of model editing currently use only the 'next few token' completions after a prompt. As a result, the impact of these methods on longer natural language generation is largely unknown. We introduce long-form evaluation of model editing (LEME), a novel evaluation protocol that measures the efficacy and impact of model editing in long-form generative settings. Our protocol consists of a machine-rated survey and a classifier that correlates well with human ratings. Importantly, we find that our protocol has very little relationship with previous short-form metrics (despite being designed to extend efficacy, generalization, locality, and portability into a long-form setting), indicating that our method introduces a novel set of dimensions for understanding model editing methods. Using this protocol, we benchmark a number of model editing techniques and present several findings, including that, while some methods (ROME and MEMIT) perform well in making consistent edits within a limited scope, they suffer much more from factual drift than other methods. Finally, we present a qualitative analysis that illustrates common failure modes in long-form generative settings, including internal consistency, lexical cohesion, and locality issues.
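The claims that the LEME classifier "correlates well with human ratings" and that LEME has "very little relationship" with short-form metrics are statements about correlation. The abstract does not name the coefficient; as a sketch, assuming a rank correlation is appropriate for survey-style ratings, Spearman's rho can be computed as the Pearson correlation of average ranks, where tied values share the mean of their positions. The example scores and ratings below are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


classifier_scores = [0.91, 0.40, 0.75, 0.10, 0.55]  # hypothetical
human_ratings = [5, 2, 4, 1, 4]                     # hypothetical 1-5 ratings
print(spearman(classifier_scores, human_ratings))
```

A value near 1 indicates that the classifier orders outputs much as human raters do; a value near 0 against a short-form metric would correspond to the "very little relationship" the abstract reports.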

consistency, eiffel tower, evaluation

2402.09394

Country:

Europe > Italy > Lazio > Rome (0.05)
North America > United States > New York (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre:

Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)
Personal (0.93)
Research Report > New Finding (0.68)

Industry:

Education (0.92)
Leisure & Entertainment > Games > Computer Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)

Fang, Yihao, Thomas, Stephen W., Zhu, Xiaodan

HGOT: Hierarchical Graph of Thoughts for Retrieval-Augmented In-Context Learning in Factuality Evaluation

arXiv.org Artificial Intelligence Feb-14-2024

With the widespread adoption of large language models (LLMs) in numerous applications, the challenge of factuality and the propensity for hallucinations raise significant concerns. To address this issue, particularly in retrieval-augmented in-context learning, we introduce the hierarchical graph of thoughts (HGOT), a structured, multi-layered graph approach designed to enhance the retrieval of pertinent passages during in-context learning. The framework utilizes the emergent planning capabilities of LLMs, employing a divide-and-conquer strategy to break down complex queries into manageable sub-queries. It refines self-consistency majority voting for answer selection by incorporating the recently proposed citation recall and precision metrics to assess the quality of thoughts, linking an answer's credibility intrinsically to the thought's quality. This methodology introduces a weighted system in majority voting, prioritizing answers based on the citation quality of their thoughts. Additionally, we propose a scoring mechanism for evaluating retrieved passages, considering factors such as citation frequency and quality, self-consistency confidence, and the retrieval module's ranking. Experiments reveal that HGOT outperforms other retrieval-augmented in-context learning methods, including Demonstrate-Search-Predict (DSP), ReAct, Self-Ask, and Retrieve-then-Read, on different datasets by as much as 7%, demonstrating its efficacy in enhancing the factuality of LLMs.
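HGOT weights majority voting by the citation quality of each thought. The abstract does not give the weighting formula, so this minimal sketch assumes each thought's vote is weighted by the harmonic mean (F1) of its citation recall and citation precision; the triple layout and function names are hypothetical, and HGOT's actual scheme may combine these metrics differently.

```python
from collections import defaultdict


def citation_f1(recall, precision):
    """Harmonic mean of citation recall and precision (0 if both are 0)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


def weighted_vote(thoughts):
    """thoughts: (answer, citation_recall, citation_precision) triples.

    Each thought votes for its answer with weight citation_f1(...);
    returns the winning answer and the per-answer totals.
    """
    if not thoughts:
        raise ValueError("need at least one thought")
    totals = defaultdict(float)
    for answer, recall, precision in thoughts:
        totals[answer] += citation_f1(recall, precision)
    winner = max(totals, key=totals.get)
    return winner, dict(totals)


thoughts = [
    ("A", 0.9, 0.9),  # one well-cited thought
    ("B", 0.3, 0.3),  # two poorly cited thoughts
    ("B", 0.2, 0.2),
]
print(weighted_vote(thoughts))
```

Plain majority voting would pick B here (two votes to one), but the single well-cited thought for A carries weight 0.9 against 0.5 for B, which illustrates how citation quality can override raw vote counts.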

president, step 1, todd boehly

2402.0939

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Oklahoma (0.06)

Genre:

Personal (0.93)
Research Report (0.82)

Industry:

Media (1.00)
Law (1.00)
Health & Medicine > Therapeutic Area (0.95)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

TIME - Tech Feb-13-2024, 16:14:54 GMT

Meta's AI Chief Yann LeCun on AGI, Open-Source, and AI Risk

Meta's chief AI scientist, Yann LeCun, received another accolade to add to his long list of awards on Sunday, when he was recognized with a TIME100 Impact Award for his contributions to the world of artificial intelligence. Ahead of the award ceremony in Dubai, LeCun sat down with TIME to discuss the barriers to achieving "artificial general intelligence" (AGI), the merits of Meta's open-source approach, and what he sees as the "preposterous" claim that AI could pose an existential risk to the human race. TIME spoke with LeCun on Jan. 26. This conversation has been condensed and edited for clarity. Many people in the tech world today believe that training large language models (LLMs) on more computing power and more data will lead to artificial general intelligence.

general intelligence, intelligence, meta

TIME - Tech

Country:

Asia > Middle East > UAE > Dubai Emirate > Dubai (0.24)
North America > United States (0.15)

Genre: Personal (0.89)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)