 Generative AI


Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models

arXiv.org Artificial Intelligence

While recent Large Vision-Language Models (LVLMs) have shown remarkable performance in multi-modal tasks, they are prone to generating hallucinatory text responses that do not align with the given visual input, which restricts their practical applicability in real-world scenarios. In this work, inspired by the observation that the text-to-image generation process is the inverse of image-conditioned response generation in LVLMs, we explore the potential of leveraging text-to-image generative models to assist in mitigating hallucinations in LVLMs. We discover that generative models can offer valuable self-feedback for mitigating hallucinations at both the response and token levels. Building on this insight, we introduce self-correcting Decoding with Generative Feedback (DeGF), a novel training-free algorithm that incorporates feedback from text-to-image generative models into the decoding process to effectively mitigate hallucinations in LVLMs. Specifically, DeGF generates an image from the initial response produced by LVLMs, which acts as an auxiliary visual reference and provides self-feedback to verify and correct the initial response through complementary or contrastive decoding. Extensive experimental results validate the effectiveness of our approach in mitigating diverse types of hallucinations, consistently surpassing state-of-the-art methods across six benchmarks. Code is available at https://github.com/zhangce01/DeGF.
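The choice between complementary and contrastive decoding described above can be illustrated with a toy per-token rule. This is only a sketch under stated assumptions, not the authors' implementation: the Jensen-Shannon agreement test, the `threshold`, the weights `alpha` and `beta`, and both combination formulas are hypothetical choices made for illustration, and the two logit vectors stand in for next-token logits conditioned on the original image and on the generated image. The linked repository is the reference for the actual method.

```python
import math

def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    def kl(a, b):
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def combine_logits(logits_orig, logits_gen, threshold=0.1, alpha=1.0, beta=1.0):
    """Pick a decoding mode for one token and return (mode, probabilities).

    logits_orig: next-token logits given the original image.
    logits_gen:  next-token logits given the image generated from the
                 initial response (the auxiliary visual reference).
    All hyperparameters and formulas here are illustrative assumptions.
    """
    p, q = softmax(logits_orig), softmax(logits_gen)
    if js_divergence(p, q) < threshold:
        # The two views agree: let the generated image reinforce the prediction.
        mode = "complementary"
        combined = [a + alpha * b for a, b in zip(logits_orig, logits_gen)]
    else:
        # The two views disagree: push away from what the generated image suggests.
        mode = "contrastive"
        combined = [(1 + beta) * a - beta * b for a, b in zip(logits_orig, logits_gen)]
    return mode, softmax(combined)
```

Calling `combine_logits` with identical logit vectors selects the complementary branch, while strongly conflicting vectors select the contrastive branch and keep the token favoured by the original image.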


Big Tech whistleblower's parents sue, sounding alarm over son's unexpected death

FOX News

If you or someone you know is having thoughts of suicide, please contact the Suicide & Crisis Lifeline at 988 or 1-800-273-TALK (8255). The parents of a young California tech whistleblower whose 2024 death was ruled a suicide are now suing the City and County of San Francisco, alleging they violated public records laws by refusing to fulfill their requests for information about their son's death. Suchir Balaji, 26, was an employee at OpenAI, the artificial intelligence company behind ChatGPT, at the time of his Nov. 26, 2024, death. A San Francisco County medical examiner concluded the next day he died from a self-inflicted gunshot wound inside his apartment. "In the two-plus months since their son's passing, Petitioners and their counsel have been stymied at every turn as they have sought more information about the cause of and circumstances surrounding Suchir's tragic death. This petition, they hope, is the beginning of the end of that obstruction," the lawsuit states.


20 million OpenAI users hacked? Here's how to stay safe, just in case

PCWorld

Have you ever tried ChatGPT? You may want to take a quick moment to freshen up your account's security. A Russian hacker is claiming to have login data for over 20 million OpenAI users--and the information includes email addresses and passwords. On Friday, samples of OpenAI logins emerged on the dark web, along with an offer to sell the full trove of data. Currently, OpenAI says it has not yet found evidence of compromised systems (as per The Independent).


Neural Genetic Search in Discrete Spaces

arXiv.org Artificial Intelligence

Effective search methods are crucial for improving the performance of deep generative models at test time. In this paper, we introduce a novel test-time search method, Neural Genetic Search (NGS), which incorporates the evolutionary mechanism of genetic algorithms into the generation procedure of deep models. The core idea behind NGS is its crossover, which is defined as parent-conditioned generation using trained generative models. This approach offers a versatile and easy-to-implement search algorithm for deep generative models. We demonstrate the effectiveness and flexibility of NGS through experiments across three distinct domains: routing problems, adversarial prompt generation for language models, and molecular design.
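A toy sketch of the idea of crossover as parent-conditioned generation is given here. Everything in it is an assumption made for illustration: `toy_model_probs` is a hypothetical stand-in for a trained generative model, "conditioning on the parents" is implemented as restricting each position to the tokens the two parents use there, and the fitness function simply counts the letter `A`. It is not the paper's algorithm, only one simple instance of the mechanism the abstract names.

```python
import random

VOCAB = list("ABCD")

def toy_model_probs(prefix):
    """Hypothetical stand-in for a trained model: uniform next-token weights."""
    return {tok: 1.0 for tok in VOCAB}

def crossover(parent_a, parent_b, rng):
    """Generate a child token by token, conditioned on two parents.

    At each position the model's distribution is restricted to the tokens
    that the parents use at that position (an illustrative assumption).
    """
    child = []
    for pos in range(len(parent_a)):
        allowed = sorted({parent_a[pos], parent_b[pos]})
        probs = toy_model_probs(child)
        weights = [probs[tok] for tok in allowed]
        child.append(rng.choices(allowed, weights=weights, k=1)[0])
    return "".join(child)

def fitness(seq):
    """Toy objective: number of 'A' tokens in the sequence."""
    return seq.count("A")

def neural_genetic_search(pop_size=8, length=6, generations=10, seed=0):
    """Evolve a population by keeping the best half and refilling via crossover."""
    rng = random.Random(seed)
    population = ["".join(rng.choice(VOCAB) for _ in range(length))
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = [crossover(*rng.sample(parents, 2), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)
```

Because the best half of the population is carried over unchanged in every generation, the best fitness in this sketch never decreases; a real setup would replace `toy_model_probs` with the trained model's next-token distribution.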


2025: The Year of the AI App

WIRED

What a great idea I had for the first Plaintext of 2025. After following the frantic competition between OpenAI, Google, Meta, and Anthropic to churn out brainier and deeper "frontier" foundation models, I settled on a thesis about what's ahead: In the new year, those mighty trailblazers will consume billions of dollars, countless gigawatts, and all the silicon Nvidia can muster in their pursuit of AGI. We'll be bombarded by press releases boasting advanced reasoning, more tokens, and maybe even assurances that their models won't make up crazy facts. But people are tired of hearing about how AI is transformational and seeing few transformations to their day-to-day existence. Getting an AI summary of Google search results or having Facebook ask if you want to pose a follow-up question on a post doesn't make you a traveler to the neo-human future.


Review for NeurIPS paper: Accelerating Reinforcement Learning through GPU Atari Emulation

Neural Information Processing Systems

Weaknesses: My main concern is that the results seem to contradict what the authors claim as the benefit of leveraging GPU acceleration. Specifically, in the "impact statement" the authors state that CuLE can "provide access to an accelerated training environment to researchers with limited computational capabilities," but the results show that the acceleration won't take effect unless you use more computation: in Figure 2, CuLE runs slower than OpenAI when using fewer environments. If someone can only afford to run 100 environments, would this mean CuLE is not useful here? The memory limitation is noted in the paper, which is good. I was confused when looking at Table 3. First, why is there no 120 envs experiment for CuLE?


BF-GAN: Development of an AI-driven Bubbly Flow Image Generation Model Using Generative Adversarial Networks

arXiv.org Artificial Intelligence

A generative AI architecture called bubbly flow generative adversarial networks (BF-GAN) is developed, designed to generate realistic and high-quality bubbly flow images through physically conditioned inputs, jg and jf. Initially, 52 sets of bubbly flow experiments under varying conditions are conducted to collect 140,000 bubbly flow images with physical labels of jg and jf for training data. A multi-scale loss function is then developed, incorporating mismatch loss and pixel loss to further enhance the generative performance of BF-GAN. On evaluation metrics for generative AI, BF-GAN surpasses a conventional GAN. Physically, key parameters of bubbly flow generated by BF-GAN are extracted and compared with measurement values and empirical correlations, validating BF-GAN's generative performance. The comparative analysis demonstrates that BF-GAN can generate realistic and high-quality bubbly flow images with any given jg and jf within the research scope. BF-GAN offers a generative AI solution for two-phase flow research, substantially lowering the time and cost required to obtain high-quality data. In addition, it can function as a benchmark dataset generator for bubbly flow detection and segmentation algorithms, enhancing overall productivity in this research domain. The BF-GAN model is available online (https://github.com/zhouzhouwen/BF-GAN).


An Annotated Reading of 'The Singer of Tales' in the LLM Era

arXiv.org Artificial Intelligence

The Parry-Lord oral-formulaic theory was a breakthrough in understanding how oral narrative poetry is learned, composed, and transmitted by illiterate bards. In this paper, we provide an annotated reading of the mechanism underlying this theory through the lens of large language models (LLMs) and generative artificial intelligence (AI). We point out the similarities and differences between oral composition and LLM generation, and comment on the implications for society and AI policy.


Brief analysis of DeepSeek R1 and its implications for Generative AI

arXiv.org Artificial Intelligence

The relatively short history of Generative AI has been punctuated with big steps forward in model capability. This happened again over the last few weeks, triggered by a couple of papers released by the Chinese company DeepSeek [1]. In late December they released DeepSeek-V3 [2], a direct competitor to OpenAI's GPT-4o, apparently trained in two months for approximately $5.6 million [3, 4], which equates to 1/50th of the cost of other comparable models [5]. On the 20th of January they released DeepSeek-R1 [6], a set of reasoning models containing "numerous powerful and intriguing reasoning behaviours" [6] and achieving comparable performance to OpenAI's o1 model, and they are open for researchers to examine [7]. This openness is a welcome move for many AI researchers keen to understand more about the models they are using. It should be noted that these models are released as 'open weights', meaning the model can be built upon and freely used (under the MIT license), but without the training data it is not truly open source. However, more details than usual were shared about the training process in the associated documentation.


Deep Generative model that uses physical quantities to generate and retrieve solar magnetic active regions

arXiv.org Machine Learning

Deep generative models have shown immense potential in generating unseen data that has the properties of real data. These models learn complex data-generating distributions starting from a smaller set of latent dimensions. However, generative models have encountered great skepticism in scientific domains due to the disconnection between generative latent vectors and scientifically relevant quantities. In this study, we integrate three types of machine learning models to generate solar magnetic patches in a physically interpretable manner and use those as a query to find matching patches in real observations. We use the magnetic field measurements from Space-weather HMI Active Region Patches (SHARPs) to train a Generative Adversarial Network (GAN). We connect the physical properties of GAN-generated images with their latent vectors to train Support Vector Machines (SVMs) that map between physical and latent spaces. These produce directions in the GAN latent space along which known physical parameters of the SHARPs change. We train a self-supervised learner (SSL) to make queries with generated images and find matches from real data. We find that the GAN-SVM combination enables users to produce high-quality patches that change smoothly only with a prescribed physical quantity, making generative models physically interpretable. We also show that GAN outputs can be used to retrieve real data that shares the same physical properties as the generated query. This elevates Generative Artificial Intelligence (AI) from a means of producing artificial data to a novel tool for scientific data interrogation, supporting its applicability beyond the domain of heliophysics.
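The idea of using an SVM to find a latent direction tied to a physical quantity can be sketched on toy data. This is a minimal illustration under loud assumptions, not the study's pipeline: the 2-D "latent vectors" are random Gaussian samples rather than GAN latents, the "physical quantity" is assumed to be high exactly when the first latent coordinate is positive, and the linear SVM is trained with a simple hinge-loss subgradient loop written for this example. The learned weight vector is the normal of the separating hyperplane, and its unit vector serves as the editing direction.

```python
import math
import random

def train_linear_svm(points, labels, epochs=50, lr=0.1, reg=0.01, seed=0):
    """Train a linear SVM (hinge loss + L2 penalty) by stochastic subgradient descent."""
    rng = random.Random(seed)
    w = [0.0] * len(points[0])
    b = 0.0
    order = list(range(len(points)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = points[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # Shrink the weights (L2 regularisation step).
            w = [wj * (1 - lr * reg) for wj in w]
            if margin < 1:
                # Sample violates the margin: move the hyperplane toward it.
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b

def unit_direction(w):
    """Normalise the hyperplane normal to unit length."""
    norm = math.sqrt(sum(x * x for x in w))
    return [x / norm for x in w]

def move_latent(z, direction, step):
    """Shift a latent vector along the learned direction."""
    return [zi + step * di for zi, di in zip(z, direction)]

# Toy data (assumption): the physical label depends only on latent coordinate 0.
rng = random.Random(0)
latents = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
labels = [1 if z[0] > 0 else -1 for z in latents]

w, b = train_linear_svm(latents, labels)
direction = unit_direction(w)
```

Moving a latent vector with `move_latent(z, direction, step)` for a positive `step` raises the SVM decision score, which in this toy setup corresponds to increasing the assumed physical quantity; in the setting the abstract describes, the shifted vector would then be passed to the GAN generator.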