When Machine Unlearning Meets Retrieval-Augmented Generation (RAG): Keep Secret or Forget Knowledge?

arXiv.org Artificial Intelligence

The deployment of large language models (LLMs) such as ChatGPT and Gemini has demonstrated their powerful natural language generation capabilities. However, these models can inadvertently learn and retain sensitive information and harmful content during training, raising significant ethical and legal concerns. Machine unlearning has been introduced as a potential solution to these issues. Although existing unlearning methods take into account the specific characteristics of LLMs, they often suffer from high computational demands, limited applicability, or the risk of catastrophic forgetting. To address these limitations, we propose a lightweight unlearning framework based on Retrieval-Augmented Generation (RAG). By modifying the external knowledge base of RAG, we simulate the effects of forgetting without directly interacting with the unlearned LLM. We formulate the construction of unlearned knowledge as a constrained optimization problem and derive two key components that underpin the effectiveness of RAG-based unlearning. This RAG-based approach is particularly effective for closed-source LLMs, where existing unlearning methods often fail. We evaluate our framework through extensive experiments on both open-source and closed-source models, including ChatGPT, Gemini, Llama-2-7b-chat-hf, and PaLM 2. The results demonstrate that our approach meets five key unlearning criteria: effectiveness, universality, harmlessness, simplicity, and robustness. The approach can also be extended to multimodal large language models and LLM-based agents.
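The core mechanism, simulating forgetting by editing only the retrieval side, can be illustrated with a toy sketch. It is not the paper's construction: the abstract does not spell out its two derived components, so the word-overlap retriever, the `forget` helper, and the wording of the withholding instruction below are all hypothetical. The point is only that the model itself is never modified; the entry added for a forgotten target is retrieved for related queries and constrains the answer through the prompt.

```python
from dataclasses import dataclass, field

# Hypothetical toy stopword list so that function words do not drive retrieval.
STOPWORDS = {"a", "an", "the", "is", "of", "and", "who", "what"}


def tokens(text: str) -> set[str]:
    # Crude bag-of-words: lowercase, strip surrounding punctuation, drop stopwords.
    words = {w.strip(".,?!:;\"'()").lower() for w in text.split()}
    return words - STOPWORDS - {""}


@dataclass
class KnowledgeBase:
    docs: list[str] = field(default_factory=list)

    def add(self, doc: str) -> None:
        self.docs.append(doc)

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        # Rank documents by the number of words they share with the query
        # and return the top k that share at least one word.
        q = tokens(query)
        scored = sorted(
            ((len(q & tokens(doc)), doc) for doc in self.docs),
            key=lambda pair: pair[0],
            reverse=True,
        )
        return [doc for score, doc in scored[:k] if score > 0]


def forget(kb: KnowledgeBase, target: str, keywords: list[str]) -> None:
    # Hypothetical "unlearned knowledge" entry: the target and its keywords make
    # it retrievable for related queries; the instruction constrains the answer.
    kb.add(
        f"{target} ({', '.join(keywords)}): treat everything about {target} "
        f"as unknown and reply that you cannot provide this information."
    )


def build_prompt(kb: KnowledgeBase, query: str) -> str:
    # Prompt for an unmodified, possibly closed-source LLM; only the context changes.
    context = "\n".join(kb.retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"


kb = KnowledgeBase()
kb.add("Paris is the capital of France.")
forget(kb, "Harry Potter", ["wizard", "Hogwarts", "Rowling"])
print(build_prompt(kb, "Who is Harry Potter?"))
```

A real system would use an embedding-based retriever and would need to verify that the unlearned entry is reliably retrieved for paraphrased queries; the toy overlap score offers no such guarantee.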


A humanoid robot's painting called 'AI God' may sell for over $120,000

Popular Science

A humanoid robot is slated to become the first of its kind to have its artwork sold by a major auction house. On October 16, Sotheby's announced it will soon begin accepting bids starting at $120,000 for "AI God." The abstract portrait of Alan Turing was painted by Ai-Da, an ongoing, experimental AI-powered robotics project that cites a pivotal 1980s transhumanist feminist manifesto as its inspiration. The auction is scheduled to run from October 31st through November 7th. Completed in 2019 by gallerist Aidan Meller in collaboration with Oxford University researchers and the robotics company Engineered Arts, Ai-Da uses cameras to capture visual input, which onboard graphics algorithms then use to generate images with some human guidance and adjustment.


Identifying Privacy Personas

arXiv.org Artificial Intelligence

Privacy personas capture the differences between user segments with respect to their knowledge, behavioural patterns, level of self-efficacy, and perception of the importance of privacy protection. Modelling these differences is essential for choosing appropriate personalised communication about privacy (e.g. to increase literacy) and for defining suitable choices for privacy enhancing technologies (PETs). Although various privacy personas have been derived in the literature, they group together people who differ from each other in important attributes such as their perceived or desired level of control and their motivation to use PETs. To address this lack of granularity and comprehensiveness in describing personas, we propose eight personas that we derive by combining qualitative and quantitative analysis of the responses to an interactive educational questionnaire. We design an analysis pipeline that uses divisive hierarchical clustering and Boschloo's statistical test of homogeneity of proportions to ensure that the elicited clusters differ from each other based on a statistical measure. Additionally, we propose a new measure for calculating distances between questionnaire responses that accounts for the type of question (closed- vs open-ended) used to derive traits. We statistically validate the proposed personas, showing that they differ from each other, and compare them with personas in the literature, showing that they provide a more granular and comprehensive understanding of user segments, which will make it possible to better assist users with their privacy needs.
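The abstract does not give the formula for its question-type-aware distance. As an illustration only, the sketch below uses a Gower-style weighted average, a common substitute for mixed-type data: closed questions contribute a 0/1 mismatch, and open-ended answers, assumed here to have already been coded into sets of traits, contribute a Jaccard distance. The question names, trait labels, and the `weights` parameter are hypothetical.

```python
from typing import FrozenSet, Mapping, Optional, Union

# A closed answer is a single chosen option; an open answer is a set of coded traits.
Answer = Union[str, FrozenSet[str]]


def question_distance(a: Answer, b: Answer, kind: str) -> float:
    if kind == "closed":
        # Categorical mismatch: 0 if the same option was chosen, else 1.
        return 0.0 if a == b else 1.0
    if kind == "open":
        # Jaccard distance between the trait sets coded from free-text answers.
        union = a | b
        if not union:
            return 0.0
        return 1.0 - len(a & b) / len(union)
    raise ValueError(f"unknown question kind: {kind!r}")


def response_distance(
    r1: Mapping[str, Answer],
    r2: Mapping[str, Answer],
    kinds: Mapping[str, str],
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    # Weighted mean of per-question distances (Gower-style), always in [0, 1].
    if weights is None:
        weights = {q: 1.0 for q in kinds}
    total = sum(weights[q] for q in kinds)
    return sum(weights[q] * question_distance(r1[q], r2[q], kinds[q]) for q in kinds) / total


kinds = {"uses_vpn": "closed", "why_care": "open"}
alice = {"uses_vpn": "yes", "why_care": frozenset({"tracking", "ads"})}
bob = {"uses_vpn": "no", "why_care": frozenset({"tracking"})}
print(response_distance(alice, bob, kinds))  # 0.75
```

A matrix of such pairwise distances is the kind of input a divisive hierarchical clustering step would consume; the per-question weights are an assumed knob for emphasising some questions over others.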


Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation

arXiv.org Artificial Intelligence

In this paper, we introduce Janus, an autoregressive framework that unifies multimodal understanding and generation. Prior research, such as Chameleon, often relies on a single visual encoder for both tasks. However, because multimodal understanding and generation require different levels of information granularity, this approach can lead to suboptimal performance, particularly in multimodal understanding. To address this issue, we decouple visual encoding into separate pathways while still leveraging a single, unified transformer architecture for processing. The decoupling not only alleviates the conflict between the visual encoder's roles in understanding and generation but also enhances the framework's flexibility: for instance, the multimodal understanding and generation components can each independently select their most suitable encoding method. Experiments show that Janus surpasses previous unified models and matches or exceeds the performance of task-specific models. The simplicity, high flexibility, and effectiveness of Janus make it a strong candidate for next-generation unified multimodal models.
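The decoupling idea can be sketched as two task-specific encoders feeding one shared backbone. The sketch assumes, for illustration, a continuous feature encoder for understanding and a discrete, VQ-style tokenizer for generation. Every function and constant here is a hypothetical stand-in (the "transformer" is just a causal running mean), and none of it reflects Janus's actual encoders or dimensions.

```python
Vector = list[float]
D_MODEL = 4  # hypothetical shared hidden width of the backbone

# Hypothetical codebook for the generation pathway: scalar code values and
# the embedding each code index maps to.
CODE_VALUES = [0.0, 1.0]
CODE_EMBEDDINGS = [[0.0] * D_MODEL, [1.0] * D_MODEL]


def understanding_encoder(image: list[list[float]]) -> list[Vector]:
    # Stand-in for a continuous semantic encoder: one feature vector per image
    # row, zero-padded or truncated to the backbone width.
    return [(row + [0.0] * D_MODEL)[:D_MODEL] for row in image]


def generation_encoder(image: list[list[float]]) -> list[int]:
    # Stand-in for a discrete, VQ-style tokenizer: quantize each row's mean to
    # the index of the nearest code value.
    codes = []
    for row in image:
        mean = sum(row) / len(row)
        codes.append(min(range(len(CODE_VALUES)), key=lambda i: abs(CODE_VALUES[i] - mean)))
    return codes


def shared_transformer(seq: list[Vector]) -> list[Vector]:
    # Placeholder for the single shared autoregressive backbone: a causal
    # running mean, so output t only depends on inputs 0..t.
    out, running = [], [0.0] * D_MODEL
    for t, vec in enumerate(seq, start=1):
        running = [r + v for r, v in zip(running, vec)]
        out.append([r / t for r in running])
    return out


def run(task: str, image: list[list[float]]) -> list[Vector]:
    # Decoupled encoding: each task picks its own pathway, then both share one backbone.
    if task == "understand":
        seq = understanding_encoder(image)
    elif task == "generate":
        seq = [CODE_EMBEDDINGS[c] for c in generation_encoder(image)]
    else:
        raise ValueError(f"unknown task: {task!r}")
    return shared_transformer(seq)
```

Because each pathway only has to emit vectors of the shared width `D_MODEL`, either encoder can be swapped without touching the backbone, which is the kind of flexibility the abstract describes.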


Jennifer Doudna on the Brave New World Being Ushered In by Gene Editing

The New Yorker

In 2012, the biochemist Jennifer Doudna and her colleague Emmanuelle Charpentier developed a method for using RNA-guided proteins to edit specific sections of DNA. Their innovation--for which the two won the Nobel Prize in Chemistry, in 2020--is known as the CRISPR-Cas9 gene-editing system. CRISPR has since been used to alter plants (to, for instance, produce greater yields), insects (preventing them from carrying certain diseases), and people (to treat sickle-cell disease). The technology's promise can sound as if derived from science fiction: it might help us adapt to a radically different climate, or grow organs for those in need, or reprogram a cancer patient's own cells to target tumors. But there are also worries about its possible side effects, both biological and social.


TechScape: Elon Musk is stumping hard for Donald Trump

The Guardian

Thank you for joining me. Elon Musk is stumping hard for Donald Trump. The Tesla and SpaceX CEO has funded a pro-Trump political action committee with tens of millions of dollars and planned a packed campaign schedule to boost the former president in Pennsylvania. He speaks to Trump multiple times per week and has urged other billionaires to endorse the Republican candidate en masse in private gatherings, according to the New York Times. Taken together, Musk's actions amount to something unprecedented in modern times: a man who is both the richest in the world and the owner of an influential means of mass communication throwing all his weight behind a political candidate.


A data bottleneck is holding AI science back, says new Nobel winner

MIT Technology Review

AI has been a game changer for biochemists like David Baker. Seeing what DeepMind was able to do with AlphaFold made it clear that deep learning was going to be a powerful tool for their work. "There's just all these problems that were really hard before that we are now having much more success with thanks to generative AI methods. We can do much more complicated things," Baker says. Baker is already hard at work.


Facing Identity: The Formation and Performance of Identity via Face-Based Artificial Intelligence Technologies

arXiv.org Artificial Intelligence

How is identity constructed and performed in the digital via face-based artificial intelligence technologies? While questions of identity on the textual Internet have been thoroughly explored, the Internet has progressed to a multimedia form that centers not only the visual but specifically the face. At the same time, a wealth of scholarship has centered and continues to center the topics of surveillance and control through facial recognition technologies (FRTs), which have extended the logics of the racist pseudoscience of physiognomy. Much less work has been devoted to understanding how such face-based artificial intelligence technologies have influenced the formation and performance of identity. This literature review considers how such technologies interact with faciality, which entails the construction of what a face may represent or signify, along axes of identity such as race, gender, and sexuality. In grappling with recent advances in AI such as image generation and deepfakes, I propose that we are now in an era of "post-facial" technologies that build off our existing culture of faciality while eschewing the analog face, complicating our relationship with identity vis-à-vis the face. Drawing from previous frameworks of identity play in the digital, as well as trans practices that have historically played with or transgressed the boundaries of identity classification, we can develop concepts adequate for analyzing digital faciality and identity given the current landscape of post-facial artificial intelligence technologies that allow users to interface with the digital in an entirely novel manner.
To ground this framework of transgression, I conclude by proposing an interview study with VTubers (online streamers who perform using motion-captured avatars instead of their real-life faces) to gain qualitative insight into the experiences and perceptions of users of post-facial technologies and how these sociotechnical experiences interface anew with our relationships with identity and the digital.


OMCAT: Omni Context Aware Transformer

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have made significant strides in text generation and comprehension, with recent advancements extending into multimodal LLMs that integrate visual and audio inputs. However, these models continue to struggle with fine-grained, cross-modal temporal understanding, particularly when correlating events across audio and video streams. We address these challenges with two key contributions: a new dataset, OCTAV, and a new model, OMCAT. First, OCTAV (Omni Context and Temporal Audio Video) is a novel dataset designed to capture event transitions across audio and video. Second, OMCAT (Omni Context Aware Transformer) is a model that leverages RoTE (Rotary Time Embeddings), an extension of RoPE, to enhance temporal grounding and computational efficiency in time-anchored tasks. Our model demonstrates state-of-the-art performance on Audio-Visual Question Answering (AVQA) tasks and the OCTAV benchmark, showing significant gains in temporal reasoning and cross-modal alignment, as validated through comprehensive experiments and ablation studies. Our dataset and code will be made publicly available. The link to our demo page is https://om-cat.github.io.

Figure 1: Illustration of a video sequence from our proposed OCTAV dataset.

Large language models (LLMs) (Achiam et al., 2023; Touvron et al., 2023) have achieved remarkable breakthroughs in both text generation and comprehension (McKeown, 1992; Achiam et al., 2023) tasks.
Since then, significant progress has been made in extending LLMs to multimodal LLMs (Cheng et al., 2024; Li et al., 2023b; Maaz et al., 2023; Li et al., 2024), which integrate visual and audio inputs with textual instructions to provide understanding in multimodal contexts (Yang et al., 2022b; Chen et al., 2023a;b). However, these models still face challenges in handling fine-grained, cross-modal temporal understanding when both audio and video are provided. In this paper, we address these limitations by proposing a new dataset, OCTAV, and a model called OMCAT. The Omni Context and Temporal Audio Video dataset, OCTAV, consists of question-answer pairs for videos. The Omni Context Aware Transformer, OMCAT, addresses the limitations of existing models (Maaz et al., 2023; Tang et al., 2024; Su et al., 2023; Cheng et al., 2024) through a unified audio and visual language model that effectively incorporates time representations to ground the modalities temporally.
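RoTE is described here only as an extension of RoPE. A minimal sketch, assuming it keeps RoPE's pairwise rotations but drives the rotation angle with a frame's timestamp (in seconds) instead of its token index; the paper's exact formulation may differ. The `rotate` and `dot` helpers and the example vectors are hypothetical.

```python
import math


def rotate(x: list[float], t: float, base: float = 10000.0) -> list[float]:
    # RoPE-style rotation of each consecutive pair (x[2j], x[2j+1]) by the angle
    # t * base**(-2j/d). Standard RoPE uses the token index for t; here t is
    # assumed to be the timestamp of the frame or audio segment, in seconds.
    d = len(x)
    if d % 2:
        raise ValueError("embedding dimension must be even")
    out: list[float] = []
    for i in range(0, d, 2):  # i plays the role of 2j
        theta = t * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out


def dot(a: list[float], b: list[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.4]
# Both query/key pairs are 2 seconds apart, so their attention scores agree.
print(dot(rotate(q, 3.0), rotate(k, 1.0)), dot(rotate(q, 12.5), rotate(k, 10.5)))
```

Because each rotation is orthogonal, the query-key dot product depends only on the time difference between the two tokens, not on their absolute times. Under this assumption, an audio token and a video frame sampled at the same moment receive the same rotation even if they sit at different positions in an interleaved sequence.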


MIND: Math Informed syNthetic Dialogues for Pretraining LLMs

arXiv.org Artificial Intelligence

The utility of synthetic data for enhancing pretraining data quality, and hence improving downstream task accuracy, has been widely explored in recent large language models (LLMs). Yet these approaches fall short on complex, multi-hop and mathematical reasoning tasks, as the synthetic data typically fails to add complementary knowledge to the existing raw corpus. In this work, we propose a novel large-scale and diverse Math Informed syNthetic Dialogue (MIND) generation method that improves the mathematical reasoning ability of LLMs. Specifically, using MIND, we generate synthetic conversations based on OpenWebMath (OWM), resulting in a new math corpus, MIND-OWM. Our experiments with different conversational settings reveal that incorporating knowledge gaps between dialog participants is essential for generating high-quality math data. We further identify an effective way to format and integrate synthetic and raw data during pretraining to maximize the gain in mathematical reasoning, emphasizing the need to restructure raw data rather than use it as-is. Compared to pretraining on raw data alone, a model pretrained on MIND-OWM shows a significant boost in mathematical reasoning (GSM8K: +13.42%, MATH: +2.30%), as well as superior performance in specialized knowledge (MMLU: +4.55%, MMLU-STEM: +4.28%) and general-purpose reasoning tasks (GENERAL REASONING: +2.51%).