Goto

Collaborating Authors

 Generative AI


5 Things to Know Before Using an AI Browser

TIME - Tech

A smartphone shows the official website of ChatGPT Atlas. A smartphone shows the official website of ChatGPT Atlas. "It'd be really nice to have a service that was sort of just observing your life and proactively helping you when you needed it," said OpenAI CEO Sam Altman in a recent Q&A about OpenAI's plans. This vision is at the heart of a new crop of AI browsers, notably OpenAI's ChatGPT Atlas and Perplexity's Comet. AI browsers differ from traditional browsers in at least two important ways.


OpenAI's Open-Weight Models Are Coming to the US Military

WIRED

OpenAI's Open-Weight Models Are Coming to the US Military The gpt-oss models are being tested for use on sensitive military computers. But some defense insiders say that OpenAI is still behind the competition. When OpenAI unveiled its first open-weight models in years this August, it wasn't just tech companies that were paying attention. The release also excited US military and defense contractors, which saw a chance to use them for highly secure operations. Initial results show that OpenAI's tools lag behind competitors in desired capabilities, some military vendors tell WIRED.


China's AI is quietly making big inroads in Silicon Valley

Al Jazeera

China's AI is quietly making big inroads in Silicon Valley China's AI models are quickly gaining traction in Silicon Valley, becoming integral to the operations of American companies and earning the praise of a growing list of tech leaders. Their rapid ascent has highlighted the competitive edge that Chinese developers such as Alibaba, Z.ai, Moonshot, and MiniMax have been able to gain by offering so-called "open" language models at much lower costs than their rivals in the United States. Airbnb CEO Brian Chesky generated headlines in October when he revealed that the short-term rental platform had opted for Alibaba's Qwen over OpenAI's ChatGPT, praising the Chinese model as "fast and cheap". Social Capital CEO Chamath Palihapitiya revealed the same month that his company had migrated much of its work to Moonshot's Kimi K2 as it was "way more performant" and "a ton cheaper" than models from OpenAI and Anthropic. Programmers on social media also recently highlighted evidence that two popular US-developed coding assistants, Composer and Windsurf, were built on Chinese models.


DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks

arXiv.org Artificial Intelligence

LLM-integrated applications and agents-such as Bing Copilot [1], Google search with AI overviews [2], and Amazon's review highlights [3]-are emerging applications built upon large language models (LLMs). The growing popularity of LLM-integrated applications has led to the emergence of app stores, such as OpenAI's GPT Store and Poe [4], where developers can publish their LLMintegrated applications and users can access them, much like the Google Play and App Store for mobile apps. In general, an LLM-integrated application intends to perform a task (referred to as target task), such as webpage summarization in AI-assisted search. Towards this goal, an LLM-integrated application takes a prompt, which is the concatenation of an instruction (referred to as target instruction) and data (referred to as target data), as an input to query the backend LLM, whose response would solve the target task. The target instruction is often designed by an application developer to direct the backend LLM to perform the target task, while the data is the information to be processed by the backend LLM and is usually from an external source, e.g., the Internet. For instance, when the target task is webpage summarization in AI-assisted search, the target instruction can be "Please summarize the following web pages: [Text from relevant web pages].",


MADD: Multi-Agent Drug Discovery Orchestra

arXiv.org Artificial Intelligence

Hit identification is a central challenge in early drug discovery, traditionally requiring substantial experimental resources. Recent advances in artificial intelligence, particularly large language models (LLMs), have enabled virtual screening methods that reduce costs and improve efficiency. However, the growing complexity of these tools has limited their accessibility to wet-lab researchers. Multi-agent systems offer a promising solution by combining the interpretability of LLMs with the precision of specialized models and tools. In this work, we present MADD, a multi-agent system that builds and executes customized hit identification pipelines from natural language queries. MADD employs four coordinated agents to handle key subtasks in de novo compound generation and screening. We evaluate MADD across seven drug discovery cases and demonstrate its superior performance compared to existing LLM-based solutions. Using MADD, we pioneer the application of AI-first drug design to five biological targets and release the identified hit molecules. Finally, we introduce a new benchmark of query-molecule pairs and docking scores for over three million compounds to contribute to the agentic future of drug design.


Consensus Sampling for Safer Generative AI

arXiv.org Artificial Intelligence

Many approaches to AI safety rely on inspecting model outputs or activations, yet certain risks are inherently undetectable by inspection alone. We propose a complementary, architecture-agnostic approach that enhances safety through the aggregation of multiple generative models, with the aggregated model inheriting its safety from the safest subset of a given size among them. Specifically, we present a consensus sampling algorithm that, given $k$ models and a prompt, achieves risk competitive with the average risk of the safest $s$ of the $k$ models, where $s$ is a chosen parameter, while abstaining when there is insufficient agreement between them. The approach leverages the models' ability to compute output probabilities, and we bound the probability of abstention when sufficiently many models are safe and exhibit adequate agreement. The algorithm is inspired by the provable copyright protection algorithm of Vyas et al. (2023). It requires some overlap among safe models, offers no protection when all models are unsafe, and may accumulate risk over repeated use. Nonetheless, our results provide a new, model-agnostic approach for AI safety by amplifying safety guarantees from an unknown subset of models within a collection to that of a single reliable model.


Efficient Reasoning via Reward Model

arXiv.org Artificial Intelligence

Reinforcement learning with verifiable rewards (RLVR) has been shown to enhance the reasoning capabilities of large language models (LLMs), enabling the development of large reasoning models (LRMs). However, LRMs such as DeepSeek-R1 and OpenAI o1 often generate verbose responses containing redundant or irrelevant reasoning step-a phenomenon known as overthinking-which substantially increases computational costs. Prior efforts to mitigate this issue commonly incorporate length penalties into the reward function, but we find they frequently suffer from two critical issues: length collapse and training collapse, resulting in sub-optimal performance. To address them, we propose a pipeline for training a Conciseness Reward Model (CRM) that scores the conciseness of reasoning path. Additionally, we introduce a novel reward formulation named Conciseness Reward Function (CRF) with explicit dependency between the outcome reward and conciseness score, thereby fostering both more effective and more efficient reasoning. From a theoretical standpoint, we demonstrate the superiority of the new reward from the perspective of variance reduction and improved convergence properties. Besides, on the practical side, extensive experiments on five mathematical benchmark datasets demonstrate the method's effectiveness and token efficiency, which achieves an 8.1% accuracy improvement and a 19.9% reduction in response token length on Qwen2.5-7B. Furthermore, the method generalizes well to other LLMs including Llama and Mistral. The implementation code and datasets are publicly available for reproduction: https://anonymous.4open.science/r/CRM.


Advancing Autonomous Emergency Response Systems: A Generative AI Perspective

arXiv.org Artificial Intelligence

Abstract--Autonomous V ehicles (A Vs) are poised to revolutionize emergency services by enabling faster, safer, and more efficient responses. This transformation is driven by advances in Artificial Intelligence (AI), particularly Reinforcement Learning (RL), which allows A Vs to navigate complex environments and make critical decisions in real time. However, conventional RL paradigms often suffer from poor sample efficiency and lack adaptability in dynamic emergency scenarios. This paper reviews next-generation A V optimization strategies to address these limitations. We analyze the shift from conventional RL to Diffusion Model (DM)-augmented RL, which enhances policy robustness through synthetic data generation, albeit with increased computational cost. Additionally, we explore the emerging paradigm of Large Language Model (LLM)-assisted In-Context Learning (ICL), which offers a lightweight and interpretable alternative by enabling rapid, on-the-fly adaptation without retraining. By reviewing the state of the art in A V intelligence, DM-augmented RL, and LLM-assisted ICL, this paper provides a critical framework for understanding the next generation of autonomous emergency response systems from a Generative AI perspective. Autonomous vehicles (A Vs) are poised to transform emergency services by enabling faster, safer, and more intelligent responses. Uncrewed Aerial V ehicles (UA Vs), as key enablers within the A V ecosystem, provide rapid deployment and precise mobility. They can serve as both aerial base stations and data collectors, enhancing connectivity and information gathering for A V operations.



How to Talk to ChatGPT for Free Inside WhatsApp (While You Still Can)

WIRED

Meta's messaging app offers free access to the AI chatbot, but only until January 2026. There are plenty of places you can get access to ChatGPT: Not just in the official apps for the web and mobile devices, but also through Copilot from Microsoft, and in Apple's Siri assistant ... and inside the messaging app WhatsApp . WhatsApp, run by Facebook developer Meta, is available free of charge on the web, and on Android and iOS . It's used by billions of people worldwide, which helps to explain why OpenAI has made ChatGPT available here as well as everywhere else. Unfortunately, OpenAI will be pulling free access to its chatbot within WhatsApp on January 15, 2026.