Generative AI
OpenAI announces surprise 'Deep Research' stream tonight
OpenAI announced on X that it's hosting a livestream from Tokyo tonight, offering no context beyond "Deep Research." Just a few days ago, OpenAI released its new reasoning model, o3-mini. The company says it produces "more accurate and clearer answers, with stronger reasoning abilities" than its predecessor, and "works with search to find up-to-date answers with links to relevant web sources." CEO Sam Altman and other members of the OpenAI team held an AMA on Reddit on Friday to talk about it.
The AI business model is built on hype. That's the real reason the tech bros fear DeepSeek | Kenan Malik
No, it was not a "Sputnik moment". The launch last month of DeepSeek R1, the Chinese generative AI or chatbot, created mayhem in the tech world, with stocks plummeting and much chatter about the US losing its supremacy in AI technology. Yet, for all the disruption, the Sputnik analogy reveals less about DeepSeek than about American neuroses. The original Sputnik moment came on 4 October 1957 when the Soviet Union shocked the world by launching Sputnik 1, the first time humanity had sent a satellite into orbit. It was, to anachronistically borrow a phrase from a later and even more momentous landmark, "one giant leap for mankind", in Neil Armstrong's historic words as he took a "small step" on to the surface of the moon.
Secure & Personalized Music-to-Video Generation via CHARCHA
Agarwal, Mehul, Agarwal, Gauri, Benoit, Santiago, Lippman, Andrew, Oh, Jean
Music is a deeply personal experience and our aim is to enhance this with a fully automated pipeline for personalized music video generation. Our work allows listeners to not just be consumers but co-creators in the music video generation process by creating personalized, consistent and context-driven visuals based on lyrics, rhythm and emotion in the music. The pipeline combines multimodal translation and generation techniques and utilizes low-rank adaptation on listeners' images to create immersive music videos that reflect both the music and the individual. To ensure the ethical use of users' identity, we also introduce CHARCHA, a facial identity verification protocol that protects people against unauthorized use of their face while at the same time collecting authorized images from users for personalizing their videos. This paper thus provides a secure and innovative framework for creating deeply personalized music videos. Figure 1: Image stills and lyrics from generated music videos for Rick Astley's "Never Gonna Give You Up," with character reference from CHARCHA. The videos use Queratogray Sketch[1], Western Animation Diffusion[2], and Realistic Vision V5.1[3] checkpoint models.
Guidance Source Matters: How Guidance from AI, Expert, or a Group of Analysts Impacts Visual Data Preparation and Analysis
Narechania, Arpit, Endert, Alex, Sinha, Atanu R
The progress in generative AI has fueled AI-powered tools like co-pilots and assistants to provision better guidance, particularly during data analysis. However, research on guidance has not yet examined the perceived efficacy of the source from which guidance is offered and the impact of this source on the user's perception and usage of guidance. We ask whether users perceive all guidance sources as equal, with particular interest in three sources: (i) AI, (ii) human expert, and (iii) a group of human analysts. As a benchmark, we consider a fourth source, (iv) unattributed guidance, where guidance is provided without attribution to any source, enabling isolation of and comparison with the effects of source-specific guidance. We design a five-condition between-subjects study, with one condition for each of the four guidance sources and an additional (v) no-guidance condition, which serves as a baseline to evaluate the influence of any kind of guidance. We situate our study in a custom data preparation and analysis tool wherein we task users to select relevant attributes from an unfamiliar dataset to inform a business report. Depending on the assigned condition, users can request guidance, which the system then provides in the form of attribute suggestions. To ensure internal validity, we control for the quality of guidance across source-conditions. Through several metrics of usage and perception, we statistically test five preregistered hypotheses and report on additional analysis. We find that the source of guidance matters to users, but not in a manner that matches received wisdom. For instance, users utilize guidance differently at various stages of analysis, including expressing varying levels of regret, despite receiving guidance of similar quality. Notably, users in the AI condition reported both higher post-task benefit and regret.
ChartCitor: Multi-Agent Framework for Fine-Grained Chart Visual Attribution
Goswami, Kanika, Mathur, Puneet, Rossi, Ryan, Dernoncourt, Franck
Large Language Models (LLMs) can perform chart question-answering tasks but often generate unverified hallucinated responses. Existing answer attribution methods struggle to ground responses in source charts due to limited visual-semantic context, complex visual-text alignment requirements, and difficulties in bounding box prediction across complex layouts. We present ChartCitor, a multi-agent framework that provides fine-grained bounding box citations by identifying supporting evidence within chart images. The system orchestrates LLM agents to perform chart-to-table extraction, answer reformulation, table augmentation, evidence retrieval through pre-filtering and re-ranking, and table-to-chart mapping. ChartCitor outperforms existing baselines across different chart types. Qualitative user studies show that ChartCitor helps increase user trust in Generative AI by providing enhanced explainability for LLM-assisted chart QA and enables professionals to be more productive.
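The "evidence retrieval through pre-filtering and re-ranking" step can be illustrated with a generic two-stage retrieval sketch. This is not ChartCitor's implementation: the abstract describes LLM agents, whereas the sketch below swaps in plain word overlap as a stand-in scorer, and every name in it (`prefilter`, `rerank`, the sample rows) is hypothetical.

```python
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lower-case whitespace tokenization (a deliberately crude stand-in)."""
    return set(text.lower().split())


def prefilter(query: str, rows: List[str], min_overlap: int = 1) -> List[str]:
    """Stage 1: cheaply drop rows sharing fewer than min_overlap words with the query."""
    q = tokenize(query)
    return [row for row in rows if len(q & tokenize(row)) >= min_overlap]


def rerank(query: str, rows: List[str]) -> List[str]:
    """Stage 2: order the surviving rows by Jaccard similarity to the query."""
    q = tokenize(query)

    def jaccard(row: str) -> float:
        t = tokenize(row)
        return len(q & t) / len(q | t)

    return sorted(rows, key=jaccard, reverse=True)


# Hypothetical table rows, as if extracted from a chart.
rows = ["2021 revenue 10", "2022 revenue 14", "2022 profit 3", "region north"]
query = "revenue in 2022"

candidates = prefilter(query, rows)
evidence = rerank(query, candidates)
print(evidence[0])  # 2022 revenue 14
```

In a real system the second stage would typically use a stronger (and more expensive) scorer than the first, which is the reason for splitting retrieval into two passes.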
AI is not just powerful. What's really worrying is that DeepSeek has made it cheap, too | John Naughton
Nothing cheers up a tech columnist more than the sight of $600bn being wiped off the market cap of an overvalued tech giant in a single day. And yet last Monday that's what happened to Nvidia, the leading maker of electronic picks and shovels for the AI gold rush. It was the biggest one-day slump for any company in history, and it was not alone: shares of companies in semiconductor, power and infrastructure industries exposed to AI collectively shed more than $1tn in value on the same day. The proximate cause of this chaos was the news that a Chinese tech startup of whom few had hitherto heard had released DeepSeek R1, a powerful AI assistant that was much cheaper to train and operate than the dominant models of the US tech giants, and yet was comparable in competence to OpenAI's o1 "reasoning" model. Just to illustrate the difference: R1 was said to have cost only $5.58m to build, which is small change compared with the billions that OpenAI and co have spent on their models; and R1 is about 15 times more efficient (in terms of resource use) than anything comparable made by Meta. The DeepSeek app immediately zoomed to the top of the Apple app store, where it attracted huge numbers of users who were clearly unfazed by the fact that the terms and conditions and the privacy policy they needed to accept were in Chinese.
Strengthening Generative Robot Policies through Predictive World Modeling
Qi, Han, Yin, Haocheng, Du, Yilun, Yang, Heng
We present generative predictive control (GPC), a learning control framework that (i) clones a generative diffusion-based policy from expert demonstrations, (ii) trains a predictive action-conditioned world model from both expert demonstrations and random explorations, and (iii) synthesizes an online planner that ranks and optimizes the action proposals from (i) by looking ahead into the future using the world model from (ii). Crucially, we show that conditional video diffusion allows learning (near) physics-accurate visual world models and enables robust visual foresight. Focusing on planar pushing with rich contact and collision, we show GPC dominates behavior cloning across state-based and vision-based, simulated and real-world experiments.
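The ranking half of step (iii) can be sketched in a few lines. This is a toy illustration under loud assumptions, not the authors' code: the state is a single float, the "world model" is simple addition, the "policy" samples random action sequences instead of running a diffusion model, and the reward is negative distance to a hypothetical goal. The optimization half of the planner is omitted.

```python
import random
from typing import Callable, List, Sequence

State = float   # toy 1-D object position (assumption)
Action = float  # toy 1-D push displacement (assumption)
Plan = List[Action]


def rollout(world_model: Callable[[State, Action], State],
            state: State, plan: Sequence[Action]) -> State:
    """Predict the final state by applying the world model one action at a time."""
    for action in plan:
        state = world_model(state, action)
    return state


def rank_proposals(state: State, proposals: Sequence[Plan],
                   world_model: Callable[[State, Action], State],
                   reward: Callable[[State], float]) -> List[Plan]:
    """Sort action proposals from highest to lowest predicted reward."""
    return sorted(proposals,
                  key=lambda plan: reward(rollout(world_model, state, plan)),
                  reverse=True)


def toy_world_model(state: State, action: Action) -> State:
    """Stand-in for a learned action-conditioned predictor."""
    return state + action


def toy_policy(k: int, horizon: int, rng: random.Random) -> List[Plan]:
    """Stand-in for a generative policy: k random plans of the given horizon."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(horizon)] for _ in range(k)]


GOAL = 2.0  # hypothetical target position


def toy_reward(state: State) -> float:
    """Higher is better: negative distance to the goal."""
    return -abs(state - GOAL)


# Deterministic example with three hand-written proposals.
fixed = [[1.0, 1.0], [0.0, 0.0], [3.0, 0.0]]
ranked = rank_proposals(0.0, fixed, toy_world_model, toy_reward)
print(ranked[0])  # [1.0, 1.0]

# Same ranking applied to sampled proposals.
sampled = toy_policy(k=8, horizon=4, rng=random.Random(0))
best = rank_proposals(0.0, sampled, toy_world_model, toy_reward)[0]
```

The point of the look-ahead is that the policy only has to propose plausible plans; the world model decides which proposal to execute.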
Lessons for GenAI Literacy From a Field Study of Human-GenAI Augmentation in the Workplace
Johri, Aditya, Schleiss, Johannes, Ranade, Nupoor
Generative artificial intelligence (GenAI) is increasingly becoming a part of work practices across the technology industry and being used across a range of industries. This has created a need to better understand how GenAI is being used by professionals in the field so that we can better prepare students for the workforce. An improved understanding of the use of GenAI in practice can help provide guidance on the design of GenAI literacy efforts including how to integrate it within courses and curriculum, what aspects of GenAI to teach, and even how to teach it. This paper presents a field study that compares the use of GenAI across three different functions - product development, software engineering, and digital content creation - to identify how GenAI is currently being used in the industry. This study takes a human augmentation approach with a focus on human cognition and addresses three research questions: how is GenAI augmenting work practices; what knowledge is important and how are workers learning; and what are the implications for training the future workforce. Findings show a wide variance in the use of GenAI and in the level of computing knowledge of users. In some industries GenAI is being used in a highly technical manner with deployment of fine-tuned models across domains, whereas in others only off-the-shelf applications are being used for generating content. This means that what users need to know about GenAI varies, and so does the background knowledge needed to utilize it. For the purposes of teaching and learning, our findings indicate that different levels of GenAI understanding need to be integrated into courses. From a faculty perspective, the work has implications for training faculty so that they are aware of the advances and how students are possibly, as early adopters, already using GenAI to augment their learning practices.
Semantic Communication based on Generative AI: A New Approach to Image Compression and Edge Optimization
As digital technologies advance, communication networks face challenges in handling the vast data generated by intelligent devices. Autonomous vehicles, smart sensors, and IoT systems necessitate new paradigms. This thesis addresses these challenges by integrating semantic communication and generative models for optimized image compression and edge network resource allocation. Unlike bit-centric systems, semantic communication prioritizes transmitting meaningful data specifically selected to convey the meaning rather than obtain a faithful representation of the original data. The communication infrastructure can benefit from significant improvements in bandwidth efficiency and latency reduction. Central to this work is the design of semantic-preserving image compression using Generative Adversarial Networks and Denoising Diffusion Probabilistic Models. These models compress images by encoding only semantically relevant features, allowing for high-quality reconstruction with minimal transmission. Additionally, a Goal-Oriented edge network optimization framework is introduced, leveraging the Information Bottleneck principle and stochastic optimization to dynamically allocate resources and enhance efficiency. By integrating semantic communication into edge networks, this approach balances computational efficiency and communication effectiveness, making it suitable for real-time applications. The thesis compares semantic-aware models with conventional image compression techniques using classical and semantic evaluation metrics. Results demonstrate the potential of combining generative AI and semantic communication to create more efficient semantic-goal-oriented communication networks that meet the demands of modern data-driven applications.
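For readers unfamiliar with the Information Bottleneck principle mentioned above, its textbook form is the trade-off below. The symbols are assumptions, not taken from the abstract: X is the source data, Y the task-relevant variable, T the compressed representation, I(·;·) mutual information, and β > 0 a weight on relevance. The thesis may well use a goal-oriented variant of this objective.

```latex
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta \, I(T;Y)
```

Minimizing the first term compresses the representation, while the second term rewards keeping information about the task; a larger β favors relevance over compression.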
Ethics of generative AI and manipulation: a design-oriented research agenda
Generative AI enables automated, effective manipulation at scale. Despite the growing general ethical discussion around generative AI, the specific manipulation risks remain inadequately investigated. This article outlines essential inquiries encompassing conceptual, empirical, and design dimensions of manipulation, pivotal for comprehending and curbing manipulation risks. By highlighting these questions, the article underscores the necessity of an appropriate conceptualisation of manipulation to ensure the responsible development of Generative AI technologies.