Long-form factuality in large language models

Wei, Jerry, Yang, Chengrun, Song, Xinying, Lu, Yifeng

Neural Information Processing Systems

To benchmark a model's long-form factuality in open domains, we first use GPT-4 to generate LongFact, a prompt set comprising thousands of questions spanning 38 topics. We then propose that LLM agents can be used as automated evaluators for long-form factuality through a method which we call Search-Augmented Factuality Evaluator (SAFE).



WavePulse: Real-time Content Analytics of Radio Livestreams

Mittal, Govind, Gupta, Sarthak, Wagle, Shruti, Chopra, Chirag, DeMattee, Anthony J, Memon, Nasir, Ahamad, Mustaque, Hegde, Chinmay

arXiv.org Artificial Intelligence

Radio remains a pervasive medium for mass information dissemination, with AM/FM stations reaching more Americans than either smartphone-based social networking or live television. Increasingly, radio broadcasts are also streamed online and accessed over the Internet. We present WavePulse, a framework that records, documents, and analyzes radio content in real-time. While our framework is generally applicable, we showcase the efficacy of WavePulse in a collaborative project with a team of political scientists focusing on the 2024 Presidential Elections. We use WavePulse to monitor livestreams of 396 news radio stations over a period of three months, processing close to 500,000 hours of audio streams. These streams were converted into time-stamped, diarized transcripts and analyzed to answer key political science questions at both the national and state levels. Our analysis revealed how local issues interacted with national trends, providing insights into information flow. Our results demonstrate WavePulse's efficacy in capturing and analyzing content from radio livestreams sourced from the Web. Code and dataset can be accessed at https://wave-pulse.io.


Long-form factuality in large language models

Wei, Jerry, Yang, Chengrun, Song, Xinying, Lu, Yifeng, Hu, Nathan, Huang, Jie, Tran, Dustin, Peng, Daiyi, Liu, Ruibo, Huang, Da, Du, Cosmo, Le, Quoc V.

arXiv.org Artificial Intelligence

Large language models (LLMs) often generate content that contains factual errors when responding to fact-seeking prompts on open-ended topics. To benchmark a model's long-form factuality in open domains, we first use GPT-4 to generate LongFact, a prompt set comprising thousands of questions spanning 38 topics. We then propose that LLM agents can be used as automated evaluators for long-form factuality through a method which we call Search-Augmented Factuality Evaluator (SAFE). SAFE utilizes an LLM to break down a long-form response into a set of individual facts and to evaluate the accuracy of each fact using a multi-step reasoning process comprising sending search queries to Google Search and determining whether a fact is supported by the search results. Furthermore, we propose extending F1 score as an aggregated metric for long-form factuality. To do so, we balance the percentage of supported facts in a response (precision) with the percentage of provided facts relative to a hyperparameter representing a user's preferred response length (recall). Empirically, we demonstrate that LLM agents can outperform crowdsourced human annotators - on a set of ~16k individual facts, SAFE agrees with crowdsourced human annotators 72% of the time, and on a random subset of 100 disagreement cases, SAFE wins 76% of the time. At the same time, SAFE is more than 20 times cheaper than human annotators. We also benchmark thirteen language models on LongFact across four model families (Gemini, GPT, Claude, and PaLM-2), finding that larger language models generally achieve better long-form factuality. LongFact, SAFE, and all experimental code are available at https://github.com/google-deepmind/long-form-factuality.
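The extended F1 described above can be sketched in a few lines. This is my reading of the abstract, not the paper's reference implementation: `f1_at_k` and the exact recall definition (supported facts capped at the preferred-length hyperparameter K) are assumptions; the authors' released code defines the metric authoritatively.

```python
def f1_at_k(num_supported: int, num_facts: int, k: int) -> float:
    """Aggregate long-form factuality score (sketch of the F1@K idea).

    precision: fraction of the provided facts that are supported.
    recall: supported facts relative to the preferred response length K,
    capped at 1 so responses longer than K are not rewarded further.
    """
    if num_facts == 0:
        return 0.0
    precision = num_supported / num_facts
    recall = min(num_supported / k, 1.0)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, a response with 80 facts of which 64 are supported, scored with K = 64, gets precision 0.8 and recall 1.0, so F1 ≈ 0.89.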


Forecasting Post-Wildfire Vegetation Recovery in California using a Convolutional Long Short-Term Memory Tensor Regression Network

Liu, Jiahe, Wang, Xiaodi

arXiv.org Artificial Intelligence

The study of post-wildfire plant regrowth is essential for developing successful ecosystem recovery strategies. Prior research mainly examines key ecological and biogeographical factors influencing post-fire succession. This research proposes a novel approach for predicting and analyzing post-fire plant recovery. We develop a Convolutional Long Short-Term Memory Tensor Regression (ConvLSTMTR) network that predicts future Normalized Difference Vegetation Index (NDVI) based on short-term plant growth data after fire containment. The model is trained and tested on 104 major California wildfires occurring between 2013 and 2020, each with burn areas exceeding 3000 acres. The integration of ConvLSTM with tensor regression enables the calculation of an overall logistic growth rate k using predicted NDVI. Overall, our k-value predictions demonstrate impressive performance, with 50% of predictions exhibiting an absolute error of 0.12 or less, and 75% having an error of 0.24 or less. Finally, we employ Uniform Manifold Approximation and Projection (UMAP) and KNN clustering to identify recovery trends, offering insights into regions with varying rates of recovery. This study pioneers the combined use of tensor regression and ConvLSTM, and introduces the application of UMAP for clustering similar wildfires. This advances predictive ecological modeling and could inform future post-fire vegetation management strategies.
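The logistic growth rate k mentioned in the abstract can be illustrated with a minimal stand-alone estimator. This is not the paper's ConvLSTMTR pipeline: the carrying-capacity heuristic (`headroom` above the observed NDVI maximum) and the linearised least-squares fit are assumptions made for the sketch.

```python
import numpy as np

def logistic_growth_rate(t, ndvi, headroom=1.05):
    """Estimate the logistic growth rate k of an NDVI recovery series.

    Assumes y(t) = L / (1 + exp(-k * (t - t0))). L is unknown, so we take
    it slightly above the observed maximum (headroom is a heuristic), then
    linearise: log(L / y - 1) = -k * (t - t0), a straight line in t.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(ndvi, dtype=float)
    L = headroom * y.max()
    z = np.log(L / y - 1.0)           # linearised response
    slope, _ = np.polyfit(t, z, 1)    # z ≈ -k * t + k * t0
    return -slope
```

On clean synthetic logistic data this recovers k closely; noisy NDVI would call for a robust fit instead of plain least squares.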


#cx_2022-01-16_16-58-09.xlsx

#artificialintelligence

The graph represents a network of 2,962 Twitter users whose tweets in the requested range contained "#cx", or who were replied to or mentioned in those tweets. The network was obtained from the NodeXL Graph Server on Monday, 17 January 2022 at 01:14 UTC. The requested start date was Sunday, 16 January 2022 at 01:01 UTC and the maximum number of days (going backward) was 14. The maximum number of tweets collected was 7,500. The tweets in the network were tweeted over the 3-day, 7-hour, 28-minute period from Wednesday, 12 January 2022 at 17:28 UTC to Sunday, 16 January 2022 at 00:56 UTC.


Differentially Private M-band Wavelet-Based Mechanisms in Machine Learning Environments

Choi, Kenneth, Lee, Tony

arXiv.org Machine Learning

In the post-industrial world, data science and analytics have gained paramount importance regarding digital data privacy. Improper methods of establishing privacy for accessible datasets can compromise large amounts of user data even if the adversary has a small amount of preliminary knowledge of a user. Many researchers have been developing high-level privacy-preserving mechanisms that also retain the statistical integrity of the data to apply to machine learning. Recent developments of differential privacy, such as the Laplace and Privelet mechanisms, drastically decrease the probability that an adversary can distinguish the elements in a data set and thus extract user information. In this paper, we develop three privacy-preserving mechanisms with the discrete M-band wavelet transform that embed noise into data. The first two methods (LS and LS+) add noise through a Laplace-Sigmoid distribution that multiplies Laplace-distributed values with the sigmoid function, and the third method utilizes pseudo-quantum steganography to embed noise into the data. We then show that our mechanisms successfully retain both differential privacy and learnability through statistical analysis in various machine learning environments.
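The LS mechanism's noise construction, "Laplace-distributed values multiplied with the sigmoid function," can be sketched as follows. The exact pairing, which argument feeds the sigmoid and how the noise is embedded after the wavelet transform, is an assumption here; this sketch only shows the multiplicative Laplace-sigmoid perturbation, not the full M-band-wavelet mechanism or its privacy accounting.

```python
import numpy as np

def laplace_sigmoid_noise(data, scale=1.0, rng=None):
    """Perturb data with sigmoid-modulated Laplace noise (sketch).

    Draws one Laplace sample per element and damps it with the sigmoid
    of the same draw, then adds the result to the data.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(data, dtype=float)
    lap = rng.laplace(loc=0.0, scale=scale, size=x.shape)
    noise = lap * (1.0 / (1.0 + np.exp(-lap)))  # sigmoid applied to the draw
    return x + noise
```

Seeding the generator makes the perturbation reproducible for testing, which a deployed privacy mechanism would of course not do.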


Stock Forecasting using M-Band Wavelet-Based SVR and RNN-LSTMs Models

Nguyen, Hieu Quang, Rahimyar, Abdul Hasib, Wang, Xiaodi

arXiv.org Machine Learning

The task of predicting future stock values has always been one that is heavily desired albeit very difficult. This difficulty arises from stocks with non-stationary behavior, and without any explicit form. Hence, predictions are best made through analysis of financial stock data. To handle big data sets, current convention involves the use of the Moving Average. However, by utilizing the Wavelet Transform in place of the Moving Average to denoise stock signals, financial data can be smoothened and more accurately broken down. This newly transformed, denoised, and more stable stock data can be followed up by non-parametric statistical methods, such as Support Vector Regression (SVR) and Recurrent Neural Network (RNN) based Long Short-Term Memory (LSTM) networks to predict future stock prices. Through the implementation of these methods, one is left with a more accurate stock forecast, and in turn, increased profits.
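The denoise-then-predict idea above can be illustrated with the simplest possible wavelet. The paper uses M-band wavelets; this 2-band Haar version with soft thresholding is only a stand-in showing the decompose, threshold, reconstruct step that precedes the SVR or LSTM model.

```python
import numpy as np

def haar_denoise(signal, threshold=0.1):
    """One-level Haar wavelet soft-threshold denoising (illustrative).

    Splits the signal into low-pass (approximation) and high-pass
    (detail) coefficients, shrinks the details, and reconstructs.
    Signal length must be even.
    """
    x = np.asarray(signal, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)   # low-pass coefficients
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-pass coefficients
    # Soft-threshold the detail coefficients (noise lives mostly here).
    detail = np.sign(detail) * np.maximum(np.abs(detail) - threshold, 0.0)
    out = np.empty_like(x)
    out[0::2] = (approx + detail) / np.sqrt(2)
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out
```

With `threshold=0.0` the transform round-trips exactly, which is a handy sanity check before tuning the shrinkage.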


10 Principles for Winning the Game of Digital Disruption

#artificialintelligence

A version of this article appeared in the Spring 2018 issue of strategy+business. If you haven't noticed, a high-stakes global game of digital disruption is currently under way. It is fueled by the latest wave of technology: advances in artificial intelligence, data analytics, robotics, the Internet of Things, and new software-enabled industrial platforms that incorporate all these technologies and more. Every enterprise leader recognizes that, as a result, the prevailing business models in his or her industry could drastically and fundamentally change. A wide range of industries, such as entertainment and media, military contracting, and grocery retail have already been profoundly affected. No enterprise, including yours, can afford to ignore the threat. Yet most companies are still not moving fast enough to meet this change. Some leaders are still in denial about it, some are reluctant to upend the status quo in their companies, and some are unaware of the necessary steps to take. But these excuses are not good enough. If your company is already struggling, then digital disruption will accentuate your problems. You may not have needed a plan for the new digital age yet, if only because it didn't seem relevant to your industry. But you will need it now. Otherwise, no matter how well you run your business, it will not produce results at a scale that will allow you to compete.
