ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems

arXiv.org Artificial Intelligence

Evaluating retrieval-augmented generation (RAG) systems traditionally relies on hand annotations for input queries, passages to retrieve, and responses to generate. We introduce ARES, an Automated RAG Evaluation System, for evaluating RAG systems along the dimensions of context relevance, answer faithfulness, and answer relevance. Using synthetic training data, ARES finetunes lightweight LM judges to assess the quality of individual RAG components. To mitigate potential prediction errors, ARES utilizes a small set of human-annotated datapoints for prediction-powered inference (PPI). Across six different knowledge-intensive tasks in KILT and SuperGLUE, ARES accurately evaluates RAG systems while using a few hundred human annotations during evaluation. Furthermore, ARES judges remain effective across domain shifts, proving accurate even after changing the type of queries and/or documents used in the evaluated RAG systems. We make our datasets and code for replication and deployment available at https://github.com/stanford-futuredata/ARES.
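The role of the human-annotated datapoints in PPI can be illustrated with a minimal sketch. It assumes the standard prediction-powered point estimate of a mean: the average judge score on the large unlabeled set, corrected by the average gap between human labels and judge scores on the small labeled set. The function name `ppi_mean` and the toy scores are hypothetical and are not taken from the ARES codebase, which also reports confidence intervals that this sketch omits.

```python
from statistics import mean


def ppi_mean(judge_unlabeled, judge_labeled, human_labeled):
    """Prediction-powered point estimate of a mean score (illustrative only).

    judge_unlabeled: judge scores on datapoints without human labels
    judge_labeled:   judge scores on the human-annotated datapoints
    human_labeled:   human labels for those same datapoints
    """
    if len(judge_labeled) != len(human_labeled):
        raise ValueError("labeled judge scores and human labels must align")
    # Rectifier: average amount by which the judge misses the human label.
    rectifier = mean([h - j for h, j in zip(human_labeled, judge_labeled)])
    # Judge-only estimate on the large set, shifted by the rectifier.
    return mean(judge_unlabeled) + rectifier


# Hypothetical binary scores (1 = relevant, 0 = not relevant).
judge_unlabeled = [1, 1, 0, 1, 1, 0, 1, 1]  # judge alone says 0.75
judge_labeled = [1, 1, 0, 1]
human_labeled = [1, 0, 0, 1]                # judge over-predicts on one item
estimate = ppi_mean(judge_unlabeled, judge_labeled, human_labeled)
print(estimate)
```

In this toy case the judge is too optimistic on one of the four annotated items, so the estimate is pulled below the judge-only average.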


Alternatives to the Scaled Dot Product for Attention in the Transformer Neural Network Architecture

arXiv.org Artificial Intelligence

The transformer neural network architecture uses a form of attention in which the dot product of query and key is divided by the square root of the key dimension before applying softmax. This scaling of the dot product is designed to avoid the absolute value of the dot products becoming so large that applying softmax leads to vanishing gradients. In this paper, we propose some alternative scalings, including dividing the dot product instead by the sum of the key lengths before applying softmax. We use simulated keys and queries to show that in many situations this appears to be more effective at avoiding regions where applying softmax leads to vanishing gradients. Attention plays a prominent role in the transformer neural network architecture, as indicated by the title of the landmark paper introducing the architecture, "Attention Is All You Need" [1], by Vaswani et al.
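The two scalings can be compared with a short sketch. The abstract does not define "the sum of the key lengths", so the code assumes it means the sum of the Euclidean norms of the keys being attended to; the function names and toy vectors are hypothetical and are not the authors' implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(v):
    return math.sqrt(dot(v, v))


def attention_weights(query, keys, scaling="sqrt_dk"):
    """Attention weights of one query over a list of keys."""
    scores = [dot(query, k) for k in keys]
    if scaling == "sqrt_dk":
        # Standard scaling: square root of the key dimension.
        denom = math.sqrt(len(query))
    elif scaling == "sum_key_lengths":
        # Assumed reading of the alternative: sum of the key norms.
        denom = sum(norm(k) for k in keys)
        if denom == 0.0:
            raise ValueError("all keys have zero length")
    else:
        raise ValueError(f"unknown scaling: {scaling}")
    return softmax([s / denom for s in scores])


# Hypothetical large-magnitude vectors that saturate the standard softmax.
query = [10.0, 10.0, 10.0, 10.0]
keys = [
    [10.0, 10.0, 10.0, 10.0],
    [-10.0, -10.0, -10.0, -10.0],
    [10.0, -10.0, 10.0, -10.0],
]
w_std = attention_weights(query, keys, "sqrt_dk")
w_alt = attention_weights(query, keys, "sum_key_lengths")
print(w_std)
print(w_alt)
```

With these inputs the standard scaling puts essentially all weight on the first key, while the alternative leaves a small but nonzero share for the others, which is the kind of saturation the abstract is concerned with.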


When does In-context Learning Fall Short and Why? A Study on Specification-Heavy Tasks

arXiv.org Artificial Intelligence

In-context learning (ICL) has become the default method for using large language models (LLMs), making the exploration of its limitations and understanding the underlying causes crucial. In this paper, we find that ICL falls short of handling specification-heavy tasks, which are tasks with complicated and extensive task specifications, requiring several hours for ordinary humans to master, such as traditional information extraction tasks. The performance of ICL on these tasks mostly cannot reach half of the state-of-the-art results. To explore the reasons behind this failure, we conduct comprehensive experiments on 18 specification-heavy tasks with various LLMs and identify three primary reasons: inability to specifically understand context, misalignment in task schema comprehension with humans, and inadequate long-text understanding ability. Furthermore, we demonstrate that through fine-tuning, LLMs can achieve decent performance on these tasks, indicating that the failure of ICL is not an inherent flaw of LLMs, but rather a drawback of existing alignment methods that renders LLMs incapable of handling complicated specification-heavy tasks via ICL. To substantiate this, we perform dedicated instruction tuning on LLMs for these tasks and observe a notable improvement. We hope the analyses in this paper could facilitate advancements in alignment methods enabling LLMs to meet more sophisticated human demands.


Estimating Appearance Models for Image Segmentation via Tensor Factorization

arXiv.org Machine Learning

Image segmentation is one of the core tasks in computer vision, and solving it often depends on modeling the image appearance data via the color distributions of each of its constituent regions. Whereas many segmentation algorithms handle the dependence on appearance models using alternation or implicit methods, we propose here a new approach to directly estimate them from the image without prior information on the underlying segmentation. Our method uses local high-order color statistics from the image as input to a tensor factorization-based estimator for latent variable models. This approach is able to estimate models in multiregion images and automatically output the region proportions without prior user interaction, overcoming the drawbacks of a prior attempt at this problem. We also demonstrate the performance of our proposed method in many challenging synthetic and real imaging scenarios and show that it leads to an efficient segmentation algorithm.


Biden and Xi look to put floor under plummeting U.S.-China ties

The Japan Times

Nearly a year to the date since their last meeting, U.S. President Joe Biden and Chinese leader Xi Jinping will sit down Wednesday in the San Francisco Bay Area to try and put a floor under ties that have plummeted to fresh lows in recent months. When Biden and Xi meet on the sidelines of the Asia-Pacific Economic Cooperation forum in San Francisco, both will have a laundry list of concerns to discuss. From military-to-military lines of communication, Taiwan, and the South and East China Seas to tough U.S. semiconductor export controls, the manufacture and export of fentanyl, and artificial intelligence threats -- all will be on the table during several hours of discussions. But don't expect the talks -- the pair's seventh interaction since the start of the Biden administration but just the second in-person meeting -- to yield any dramatic breakthroughs.


Improving Zero-shot Reader by Reducing Distractions from Irrelevant Documents in Open-Domain Question Answering

arXiv.org Artificial Intelligence

Large language models (LLMs) enable zero-shot approaches in open-domain question answering (ODQA), yet advances in the reader have been limited compared with those in the retriever. This study examines the feasibility of a zero-shot reader that addresses the challenges of computational cost and the need for labeled data. We find that, when used as zero-shot readers, LLMs are distracted by irrelevant documents in the retrieved set and by overconfidence in their generated answers. To tackle these problems, we mitigate the impact of such documents via Distraction-aware Answer Selection (DAS) with a negation-based instruction and score adjustment for proper answer selection. Experimental results show that our approach successfully handles distraction across diverse scenarios, enhancing the performance of zero-shot readers. Furthermore, unlike supervised readers struggling with unseen data, zero-shot readers demonstrate outstanding transferability without any training.


The US Wants China to Start Talking About AI Weapons

WIRED

When US President Joe Biden meets with his Chinese counterpart Xi Jinping in the San Francisco Bay Area this week, the pair will have a long list of matters to discuss, including the Israel-Hamas war and Russia's ongoing invasion of Ukraine. Behind the scenes at the APEC summit, however, US officials hope to strike up a dialogue with China about placing guardrails around military use of artificial intelligence, with the ultimate goal of lessening the potential risks that rapid adoption--and reckless use--of the technology might bring. "We have a collective interest in reducing the potential risks from the deployment of unreliable AI applications," because of risks of unintended escalation, says a senior State Department official familiar with recent efforts to broach the issue, who spoke on condition of anonymity. "We very much hope to have a further conversation with China on this issue." Biden's meeting with Xi this week may provide momentum for more military dialogue.


Consistency Analysis of ChatGPT

arXiv.org Artificial Intelligence

ChatGPT has gained huge popularity since its introduction. Its positive aspects have been reported through many media platforms, and some analyses even showed that ChatGPT achieved a decent grade in professional exams, adding extra support to the claim that AI can now assist and even replace humans in industrial fields. Others, however, doubt its reliability and trustworthiness. This paper investigates the trustworthiness of ChatGPT and GPT-4 regarding logically consistent behaviour, focusing specifically on semantic consistency and the properties of negation, symmetric, and transitive consistency. Our findings suggest that while both models appear to show an enhanced language understanding and reasoning ability, they still frequently fall short of generating logically consistent predictions. We also ascertain via experiments that prompt design, few-shot learning, and employing larger large language models (LLMs) are unlikely to be the ultimate solution to resolve the inconsistency issue of LLMs.


DynaConF: Dynamic Forecasting of Non-Stationary Time-Series

arXiv.org Machine Learning

Deep learning has shown impressive results in a variety of time series forecasting tasks, where modeling the conditional distribution of the future given the past is the essence. However, when this conditional distribution is non-stationary, it poses challenges for these models to learn consistently and to predict accurately. In this work, we propose a new method to model non-stationary conditional distributions over time by clearly decoupling stationary conditional distribution modeling from non-stationary dynamics modeling. Our method is based on a Bayesian dynamic model that can adapt to conditional distribution changes and a deep conditional distribution model that handles multivariate time series using a factorized output space. Our experimental results on synthetic and real-world datasets show that our model can adapt to non-stationary time series better than state-of-the-art deep learning solutions.


Abandoned America: AI images what famous US cities would look like after 100 years - if they were deserted by humans

Daily Mail - Science & tech

What would American cities look like 100 years after human beings have left, with the streets devoid of human life - and beginning to be reclaimed by nature? While the chatbot put our future world in text, the AI photo generator Midjourney painted pictures of these abandoned metropolises, showing the concrete jungles transforming into jungles. Kieron Connolly, author of Abandoned Places and Abandoned Civilizations, says that visions of abandoned cities have a unique power. 'This isn't what city life is supposed to look like. Nature is allowed to reclaim the land,' Connolly said. ChatGPT writes, 'In the year 2123, the once-thriving metropolis of Chicago stands as a haunting testament to the passage of time and the resilience of nature.