HalluCana: Fixing LLM Hallucination with A Canary Lookahead

arXiv.org Artificial Intelligence

In this paper, we present HalluCana, a canary lookahead to detect and correct factuality hallucinations of Large Language Models (LLMs) in long-form generation. HalluCana detects and intervenes as soon as traces of hallucination emerge, during and even before generation. To support timely detection, we exploit the internal factuality representation in the LLM hidden space: we investigate various proxies for the LLM's factuality self-assessment and discuss how it relates to the model's familiarity with the context from pre-training. On biography generation, our method improves generation quality by up to 2.5x while using less than one sixth of the compute.
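One common way to read a "factuality representation" out of a hidden space is a linear probe. The sketch below assumes exactly that: a logistic-regression probe over a hidden-state vector, plus a threshold check that flags generation for intervention. It is a hypothetical illustration, not HalluCana's actual proxy; the names `train_probe`, `probe_score`, and `should_intervene`, and the toy vectors, are all assumptions.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def probe_score(hidden: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Probe's estimated probability that the upcoming content will be factual."""
    return sigmoid(sum(h * w for h, w in zip(hidden, weights)) + bias)


def train_probe(xs: Sequence[Sequence[float]], ys: Sequence[int],
                lr: float = 0.5, epochs: int = 200) -> Tuple[List[float], float]:
    """Fit a logistic-regression probe with plain per-sample gradient descent."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            g = probe_score(x, w, b) - y  # derivative of log-loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def should_intervene(hidden: Sequence[float], weights: Sequence[float],
                     bias: float, threshold: float = 0.5) -> bool:
    """Canary check (hypothetical): flag generation when the probe is unsure."""
    return probe_score(hidden, weights, bias) < threshold
```

In a real system the vectors would be activations from some chosen layer of the LLM and the labels would come from fact-checked generations; here two-dimensional toy vectors stand in for both. Running the check on the prompt's hidden state, before any tokens are produced, is what would let such a probe act "even before generation".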


My Words Imply Your Opinion: Reader Agent-Based Propagation Enhancement for Personalized Implicit Emotion Analysis

arXiv.org Artificial Intelligence

In implicit emotion analysis (IEA), the subtlety of emotional expressions makes the task particularly sensitive to user-specific characteristics. Existing studies often inject personalization into the analysis by focusing on the authorial dimension of the emotional text. However, these methods overlook the potential influence of the intended reader on implicit emotional reactions. In this paper, we refine the IEA task to Personalized Implicit Emotion Analysis (PIEA) and introduce the RAPPIE model, a novel framework designed to address missing user information in this task. In particular, 1) we create reader agents based on large language models to simulate reader reactions, addressing the spiral of silence and the data incompleteness encountered when acquiring reader feedback. 2) We establish a reader propagation role system and develop a role-aware emotion propagation multi-view graph learning model, which effectively deals with the sparsity of reader information by utilizing the distribution of propagation roles. 3) We annotate two Chinese PIEA datasets with detailed user metadata, addressing the limitation of prior datasets that focus primarily on textual content annotation. Extensive experiments on these datasets indicate that the RAPPIE model outperforms current state-of-the-art baselines, highlighting the significance and efficacy of incorporating reader feedback into the PIEA process.


2025 will be the year Arm dominates PCs

PCWorld

Qualcomm's 2024 debut of new Arm processors for Windows laptops was arguably the most important PC hardware announcement since the introduction of Intel's 486 processors in 1989. Just as that CPU line heralded an age of Intel-driven x86 dominance, Qualcomm's Snapdragon X Elite chips have now taken us into a new era of competition. But 2024 was only the preview. Qualcomm's Snapdragon debut was limited, targeting a specific subset of premium, thin-and-light Windows laptops that don't require discrete graphics. I spoke with two expert analysts in the hardware space for insights on how Arm PCs will continue to grow going forward.


The 50 greatest innovations of 2024

Popular Science

In 1988, we launched the Best of What's New Awards. The original list highlighted "the very things that make our lives more comfortable, more rewarding, more exciting, and more fun," to quote then-Publisher Grant A. Burnett. Now, in 2024, we continue our decades-old tradition of honoring big ideas. We even see hints of our original honorees in this year's list: Sea-Doo and Ford made both lists, 36 years apart. We're proud to bring you promising innovations, from things that make life at home easier to literal out-of-this-world explorations. This is the Best of What's New 2024.

Had you asked me at the beginning of 2024 what our best gadgets list would look like, I'd have guessed it would be filled with quirky AI-driven devices like the rabbit R1 or the Humane Ai Pin. "Now with AI" is a phrase that has dominated consumer electronics in the 2020s. These devices promised unadulterated access to the power of neural networks in ways that would seamlessly integrate into our lives without relying on phones or smart fridges. Then the devices came out. The software is slow and buggy, and the hardware is clunky. Maybe the stand-alone AI device will still have its year, and we'll look back and chuckle at these humble beginnings. In reality, 2024's big breakthrough came from Apple in the form of its long-rumored Vision Pro headset. The device has its own hurdles to clear, but after just a few minutes of using it, it was clear that it's something different, important, and honestly pretty amazing. The list also includes Sony's innovative pro-grade camera, the most accessible drone we've ever used, and a no-fun phone (no fun in a good way, of course).

Credible rumors of Apple's VR headset bounced around the gadget blogs and tech sites for nearly a decade. It was consumer tech's sasquatch: people claimed to have seen it, but no one knew if it even existed. Then the Vision Pro emerged from the proverbial forest in February with a surprising design and a massive $3,500 price tag. It also came toting a new R-series chip and a dedicated OS meant for spatial computing.


Efficient Task Grouping Through Samplewise Optimisation Landscape Analysis

arXiv.org Artificial Intelligence

Shared training approaches, such as multi-task learning (MTL) and gradient-based meta-learning, are widely used in machine learning applications, but they often suffer from negative transfer, which degrades performance on specific tasks. While several optimisation techniques have been developed to mitigate this issue for pre-selected task cohorts, identifying optimal task combinations for joint learning, known as task grouping, remains underexplored and computationally challenging: the number of possible task combinations grows exponentially, and each candidate requires extensive training and evaluation cycles. This paper introduces an efficient task grouping framework designed to reduce the heavy computational demands of existing methods. The proposed framework infers pairwise task similarities through a sample-wise optimisation landscape analysis, eliminating the shared model training that existing methods require to infer task similarities. With task similarities acquired, a graph-based clustering algorithm is employed to pinpoint near-optimal task groups, providing an approximate yet efficient and effective solution to the originally NP-hard problem. Empirical assessments on 8 different datasets highlight the effectiveness of the proposed framework, revealing a five-fold speed-up over previous state-of-the-art methods. Moreover, the framework consistently achieves performance comparable to those methods, confirming its efficiency and effectiveness in task grouping.
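The second stage described above (pairwise similarities in, task groups out) can be sketched as follows. This is a simplified stand-in under stated assumptions: each task is summarised by a hypothetical numeric "signature" vector, similarity is cosine similarity, and the graph-based clustering is just thresholding plus connected components via union-find. The paper's actual similarity measure and clustering algorithm may differ; `similarity_matrix` and `group_tasks` are illustrative names.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def similarity_matrix(signatures: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise task similarities from per-task signature vectors (assumed input)."""
    n = len(signatures)
    return [[cosine(signatures[i], signatures[j]) for j in range(n)] for i in range(n)]


def group_tasks(sim: List[List[float]], threshold: float) -> List[List[int]]:
    """Link tasks whose similarity meets the threshold; return connected components."""
    n = len(sim)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Connected components are the simplest graph clustering but can chain weakly related tasks together through intermediate ones; a community-detection or spectral method would be a natural replacement when that matters.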


Fuzzy Norm-Explicit Product Quantization for Recommender Systems

arXiv.org Artificial Intelligence

As data resources grow, providing recommendations that best meet users' demands has become a vital requirement in business and daily life for overcoming information overload. However, how to build a system that suggests relevant recommendations has long been debated. One of the most cost-efficient techniques for producing relevant recommendations at low complexity is Product Quantization (PQ), and PQ approaches have continued to develop in recent years. The crucial challenge is improving PQ's recall without compromising its complexity. Meeting this challenge makes the algorithm suitable for problems that must retrieve a larger number of potentially relevant items without disregarding others, at high speed and low cost to keep up with traffic. This is the case for online shops, where recommendations matched to the customer's purpose are important but customers may also be open to exploring other products. This research proposes a fuzzy approach to norm-based product quantization. Type-2 fuzzy sets (T2FSs) define the codebook, allowing each sub-vector to be associated with more than one codebook element, and the norm calculation is then resolved by integration. Our method improves recall, making the algorithm suitable for problems that require retrieving as many potentially relevant items as possible without disregarding others. The proposed method outperforms the PQ approaches NEQ, PQ, and RQ by up to +6%, +5%, and +8%, achieving recalls of 94%, 69%, and 59% on the Netflix, Audio, and Cifar60k datasets, respectively. Moreover, its computing time and complexity are nearly equal to those of the most computationally efficient existing PQ method.
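To make "associated with more than one element of the codebook" concrete, the sketch below contrasts classic crisp PQ encoding (one nearest codeword per subspace) with soft memberships over all codewords. As a simplifying assumption it uses ordinary type-1 fuzzy c-means memberships rather than the paper's type-2 fuzzy sets, and it omits norm quantization entirely; all function names and the toy codebooks are hypothetical.

```python
from typing import List, Sequence

Codebook = List[List[float]]  # one sub-codebook: K codewords of equal length


def split(v: Sequence[float], m: int) -> List[List[float]]:
    """Split a vector into m equal-length sub-vectors (assumes len(v) % m == 0)."""
    d = len(v) // m
    return [list(v[i * d:(i + 1) * d]) for i in range(m)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pq_encode(v: Sequence[float], codebooks: List[Codebook]) -> List[int]:
    """Classic (crisp) PQ: index of the nearest codeword in each subspace."""
    codes = []
    for sub, cb in zip(split(v, len(codebooks)), codebooks):
        codes.append(min(range(len(cb)), key=lambda k: sq_dist(sub, cb[k])))
    return codes


def fuzzy_memberships(sub: Sequence[float], codebook: Codebook,
                      fuzzifier: float = 2.0) -> List[float]:
    """Fuzzy c-means style memberships of one sub-vector over all codewords."""
    dists = [sq_dist(sub, c) for c in codebook]
    for k, d in enumerate(dists):
        if d == 0.0:  # exact match: crisp membership avoids division by zero
            return [1.0 if j == k else 0.0 for j in range(len(dists))]
    # With squared distances d_k, u_k is proportional to d_k ** (-1 / (m - 1)).
    inv = [d ** (-1.0 / (fuzzifier - 1.0)) for d in dists]
    total = sum(inv)
    return [x / total for x in inv]


def soft_reconstruct(v: Sequence[float], codebooks: List[Codebook],
                     fuzzifier: float = 2.0) -> List[float]:
    """Approximate v as a membership-weighted mix of codewords in each subspace."""
    out: List[float] = []
    for sub, cb in zip(split(v, len(codebooks)), codebooks):
        u = fuzzy_memberships(sub, cb, fuzzifier)
        for i in range(len(cb[0])):
            out.append(sum(u[k] * cb[k][i] for k in range(len(cb))))
    return out
```

Because every codeword keeps a nonzero weight, an item near a codebook boundary still contributes to candidates on both sides of it, which is the intuition behind trading a little precision for higher recall.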


Integrative Decoding: Improve Factuality via Implicit Self-consistency

arXiv.org Artificial Intelligence

Self-consistency-based approaches, which repeatedly sample multiple outputs and select the most consistent one as the final response, have proven remarkably effective at improving the factual accuracy of large language models. Nonetheless, existing methods usually impose strict constraints on the task format, which largely limits their applicability. In this paper, we present Integrative Decoding (ID) to unlock the potential of self-consistency in open-ended generation tasks. ID constructs a set of inputs, each prepended with a previously sampled response, and processes them concurrently; at each decoding step, the next token is selected by aggregating all of their corresponding predictions. In essence, this simple approach implicitly incorporates self-consistency into the decoding objective. Extensive evaluation shows that ID consistently enhances factuality across a wide range of language models, with substantial improvements on the TruthfulQA (+11.2%), Biographies (+15.4%) and LongFact (+8.5%) benchmarks. The performance gains grow progressively as the number of sampled responses increases, indicating the potential of ID to scale up with repeated sampling.
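The decoding loop described above can be sketched with a toy model. Assumptions: the model is a hypothetical function mapping a text prefix to a dictionary of next-token log-probabilities, "aggregating" is taken to mean averaging log-probabilities across inputs followed by greedy selection, and the whitespace-joined token format is purely illustrative. This is not the authors' implementation.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical model interface: prefix text -> {token: log-probability}.
NextTokenFn = Callable[[str], Dict[str, float]]


def integrative_decode(model: NextTokenFn, prompt: str, samples: Sequence[str],
                       max_tokens: int = 20, eos: str = "<eos>") -> List[str]:
    generated: List[str] = []
    for _ in range(max_tokens):
        totals: Dict[str, float] = {}
        counts: Dict[str, int] = {}
        # One input per previously sampled response, prepended to the prompt.
        for sample in samples:
            prefix = sample + "\n" + prompt + "\n" + " ".join(generated)
            for tok, lp in model(prefix).items():
                totals[tok] = totals.get(tok, 0.0) + lp
                counts[tok] = counts.get(tok, 0) + 1
        # Aggregate (assumed): mean log-prob over inputs, for tokens every input scored.
        shared = [t for t in totals if counts[t] == len(samples)]
        best = max(shared, key=lambda t: totals[t] / len(samples))
        if best == eos:
            break
        generated.append(best)  # the same chosen token extends every input
    return generated


def toy_model(prefix: str) -> Dict[str, float]:
    """Toy stand-in for an LLM: leans toward whatever its prepended sample says."""
    sample, _prompt, so_far = prefix.split("\n")
    if so_far.strip():
        return {"<eos>": math.log(0.9), "Paris": math.log(0.05), "Lyon": math.log(0.05)}
    other = "Lyon" if sample == "Paris" else "Paris"
    return {sample: math.log(0.7), other: math.log(0.2), "<eos>": math.log(0.1)}
```

With samples `["Paris", "Paris", "Lyon"]`, two of the three inputs favour "Paris", so the averaged score picks it: the majority vote of self-consistency emerges token by token without ever comparing whole responses.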


Ethnography and Machine Learning: Synergies and New Directions

arXiv.org Artificial Intelligence

Ethnography (social scientific methods that illuminate how people understand, navigate and shape the real-world contexts in which they live their lives) and machine learning (computational techniques that use big data and statistical learning models to perform quantifiable tasks) are each core to contemporary social science. Yet these tools have remained largely separate in practice. This chapter draws on a growing body of scholarship arguing that ethnography and machine learning can be usefully combined, particularly for large comparative studies. Specifically, this chapter (a) explains the value (and challenges) of using machine learning alongside qualitative field research for certain types of projects, (b) discusses recent methodological trends to this effect, (c) provides examples, drawn from several large projects, that illustrate the workflow, and (d) concludes with a roadmap for enabling productive coevolution of field methods and machine learning.


Murdered health insurance boss Brian Thompson backed 'malicious' AI that denied 90% of patient coverage

Daily Mail - Science & tech

A controversial AI program used to deny elderly people health coverage is now at the center of questions about the shooting of the UnitedHealthcare CEO. Brian Thompson, 50, was gunned down Wednesday outside a Hilton in Midtown Manhattan in what police have described as a 'brazen' and 'targeted' attack. The killer is still on the loose and the motive is not yet known, but a former FBI agent told Newsweek that he may have been denied health coverage. UnitedHealthcare became the largest denier of insurance plans in 2023, dismissing one in every three claims. It has now emerged that in the years before that, the company implemented AI software that had a 90 percent denial rate.


Are Frontier Large Language Models Suitable for Q&A in Science Centres?

arXiv.org Artificial Intelligence

This paper investigates the suitability of frontier Large Language Models (LLMs) for Q&A interactions in science centres, with the aim of boosting visitor engagement while maintaining factual accuracy. Using a dataset of questions collected from the National Space Centre in Leicester (UK), we evaluated responses generated by three leading models: OpenAI's GPT-4, Claude 3.5 Sonnet, and Google Gemini 1.5. Each model was prompted for both standard and creative responses tailored to an 8-year-old audience, and these responses were assessed by space science experts based on accuracy, engagement, clarity, novelty, and deviation from expected answers. The results revealed a trade-off between creativity and accuracy, with Claude outperforming GPT and Gemini in both maintaining clarity and engaging young audiences, even when asked to generate more creative responses. Nonetheless, experts observed that higher novelty was generally associated with reduced factual reliability across all models. This study highlights the potential of LLMs in educational settings, emphasizing the need for careful prompt engineering to balance engagement with scientific rigor.