 Generative AI


Virology Capabilities Test (VCT): A Multimodal Virology Q&A Benchmark

arXiv.org Artificial Intelligence

We present the Virology Capabilities Test (VCT), a large language model (LLM) benchmark that measures the capability to troubleshoot complex virology laboratory protocols. Constructed from the inputs of dozens of PhD-level expert virologists, VCT consists of 322 multimodal questions covering fundamental, tacit, and visual knowledge that is essential for practical work in virology laboratories. VCT is difficult: expert virologists with access to the internet score an average of 22.1% on questions specifically in their sub-areas of expertise. However, the most performant LLM, OpenAI's o3, reaches 43.8% accuracy, outperforming 94% of expert virologists even within their sub-areas of specialization. The ability to provide expert-level virology troubleshooting is inherently dual-use: it is useful for beneficial research, but it can also be misused. Therefore, the fact that publicly available models outperform virologists on VCT raises pressing governance considerations. We propose that the capability of LLMs to provide expert-level troubleshooting of dual-use virology work should be integrated into existing frameworks for handling dual-use technologies in the life sciences.


OpenAI rolls back update that made ChatGPT an ass-kissing weirdo

Engadget

OpenAI is rolling back a recent update to GPT-4o, the default model that powers ChatGPT, following complaints from users that it made the chatbot act like a weirdo. "The last couple of GPT-4o updates have made the personality too sycophant-y and annoying (even though there are some very good parts of it), and we are working on fixes asap, some today and some this week," said OpenAI CEO Sam Altman in an X post spotted by TechCrunch. As of midday Tuesday, Altman said ChatGPT was running on an older, less sycophantic version of GPT-4o for all free users. The company hopes to get paid users back on an older release of the model by later today. "We're working on additional fixes to model personality and will share more in the coming days," Altman said, adding OpenAI would share more information about what went wrong "at some point."


Meta has a plan to bring AI to WhatsApp chats without breaking privacy

Engadget

As Meta's first-ever generative AI conference gets underway, the company is also previewing a significant update on its plans to bring AI features to WhatsApp chats. Buried in its LlamaCon updates, the company shared that it's working on something called "Private Processing," which will allow users to take advantage of generative AI capabilities within WhatsApp without eroding its privacy features. According to Meta, Private Processing is an "optional capability" that will enable people to "leverage AI capabilities for things like summarizing unread messages or refining them, while keeping messages private." WhatsApp, of course, is known for its strong privacy protections and end-to-end encryption. That would seem incompatible with cloud-based AI features like Meta AI.


WhatsApp Is Walking a Tightrope Between AI Features and Privacy

WIRED

The end-to-end encrypted communication app WhatsApp, used by roughly 3 billion people around the world, will roll out cloud-based AI capabilities in the coming weeks that are designed to preserve WhatsApp's defining security and privacy guarantees while offering users access to message summarization and composition tools. Meta has been incorporating generative AI features across its services that are built on its open source large language model, Llama. And WhatsApp already incorporates a light blue circle that gives users access to the Meta AI assistant. But many users have balked at this addition, given that interactions with the AI assistant aren't shielded from Meta the way end-to-end encrypted WhatsApp chats are. The new feature, dubbed Private Processing, is meant to address these concerns with what the company says is a carefully architected and purpose-built platform devoted to processing data for AI tasks without the information being accessible to Meta, WhatsApp, or any other party.


OpenAI adds shopping features to ChatGPT Search

Engadget

OpenAI, which spends far more money than it takes in, is trying something new to stanch the bleeding. The company just announced that all users, including on the free tier, can shop from ChatGPT Search. "You can now search for a product, compare options and buy products in ChatGPT," OpenAI said in a press release. Categories currently available include fashion, beauty, home goods and electronics, with expansion to more categories set to come later. The search results you'll obtain are "chosen independently and are not ads," the company promises. The updates are available in 4o and are rolling out to ChatGPT Plus, Pro, Free and even logged-out users.


The AI Hype Index: AI agent cyberattacks, racing robots, and musical models

MIT Technology Review

That's why we've created the AI Hype Index--a simple, at-a-glance summary of everything you need to know about the state of the industry. AI agents are the AI industry's hypiest new product--intelligent assistants capable of completing tasks without human supervision. But while they can be theoretically useful--Simular AI's S2 agent, for example, intelligently switches between models depending on what it's been told to do--they could also be weaponized to execute cyberattacks. Elsewhere, OpenAI is reported to be throwing its hat into the social media arena, and AI models are getting more adept at making music. Oh, and if the results of the first half-marathon pitting humans against humanoid robots are anything to go by, we won't have to worry about the robot uprising any time soon.


Enhancing Surgical Documentation through Multimodal Visual-Temporal Transformers and Generative AI

arXiv.org Artificial Intelligence

The automatic summarization of surgical videos is essential for enhancing procedural documentation, supporting surgical training, and facilitating post-operative analysis. This paper presents a novel method at the intersection of artificial intelligence and medicine, aiming to develop machine learning models with direct real-world applications in surgical contexts. We propose a multi-modal framework that leverages recent advancements in computer vision and large language models to generate comprehensive video summaries. The approach is structured in three key stages. First, surgical videos are divided into clips, and visual features are extracted at the frame level using visual transformers. This step focuses on detecting tools, tissues, organs, and surgical actions. Second, the extracted features are transformed into frame-level captions via large language models. These are then combined with temporal features, captured using a ViViT-based encoder, to produce clip-level summaries that reflect the broader context of each video segment. Finally, the clip-level descriptions are aggregated into a full surgical report using a dedicated LLM tailored for the summarization task. We evaluate our method on the CholecT50 dataset, using instrument and action annotations from 50 laparoscopic videos. The results show strong performance, achieving 96% precision in tool detection and a BERT score of 0.74 for temporal context summarization. This work contributes to the advancement of AI-assisted tools for surgical reporting, offering a step toward more intelligent and reliable clinical documentation.
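
The three stages described in the abstract above can be pictured as a simple data flow. The following is only a minimal sketch, not the authors' implementation: every function name (`split_into_clips`, `caption_frame`, `summarize_clip`, `build_report`, `summarize_video`) is hypothetical, and the model calls (visual transformer, ViViT-based encoder, LLMs) are replaced by string placeholders so that the structure is runnable on its own.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Clip:
    frames: List[str]  # placeholder frame identifiers, not real image data


def split_into_clips(frames: List[str], clip_len: int) -> List[Clip]:
    """Stage 1 (part): divide the video's frames into fixed-length clips."""
    return [Clip(frames[i:i + clip_len]) for i in range(0, len(frames), clip_len)]


def caption_frame(frame: str) -> str:
    """Stage 1-2 stand-in: a real system would extract visual features
    (tools, tissues, organs, actions) and have an LLM turn them into a caption."""
    return f"caption({frame})"


def summarize_clip(clip: Clip) -> str:
    """Stage 2 stand-in: a real system would combine frame captions with
    temporal features; here the captions are simply joined."""
    return " ; ".join(caption_frame(f) for f in clip.frames)


def build_report(clip_summaries: List[str]) -> str:
    """Stage 3 stand-in: a real system would use a summarization LLM;
    here the clip summaries are listed in order."""
    return "\n".join(f"Clip {i + 1}: {s}" for i, s in enumerate(clip_summaries))


def summarize_video(frames: List[str], clip_len: int = 2) -> str:
    clips = split_into_clips(frames, clip_len)
    return build_report([summarize_clip(c) for c in clips])


if __name__ == "__main__":
    print(summarize_video(["f0", "f1", "f2"], clip_len=2))
```

The point of the sketch is the hierarchy (frames, then clips, then report); each placeholder would be swapped for an actual model call in a real pipeline.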


Taming the Titans: A Survey of Efficient LLM Inference Serving

arXiv.org Artificial Intelligence

Large Language Models (LLMs) for Generative AI have achieved remarkable progress, evolving into sophisticated and versatile tools widely adopted across various domains and applications. However, the substantial memory overhead caused by their vast number of parameters, combined with the high computational demands of the attention mechanism, poses significant challenges in achieving low latency and high throughput for LLM inference services. Recent advancements, driven by groundbreaking research, have significantly accelerated progress in this field. This paper provides a comprehensive survey of these methods, covering fundamental instance-level approaches, in-depth cluster-level strategies, emerging scenario directions, and other miscellaneous but important areas. At the instance level, we review model placement, request scheduling, decoding length prediction, storage management, and the disaggregation paradigm. At the cluster level, we explore GPU cluster deployment, multi-instance load balancing, and cloud service solutions. For emerging scenarios, we organize the discussion around specific tasks, modules, and auxiliary methods. To ensure a holistic overview, we also highlight several niche yet critical areas. Finally, we outline potential research directions to further advance the field of LLM inference serving.


Generative AI in Education: Student Skills and Lecturer Roles

arXiv.org Artificial Intelligence

Generative Artificial Intelligence (GenAI) tools such as ChatGPT are emerging as revolutionary tools in education that bring both benefits and challenges for educators and students, reshaping how learning and teaching are approached. This study aims to identify and evaluate the key competencies students need to effectively engage with GenAI in education and to provide strategies for lecturers to integrate GenAI into teaching practices. The study applied a mixed-methods approach, combining a literature review with a quantitative survey of 130 students from South Asia and Europe. The literature review identified 14 essential student skills for GenAI engagement, with AI literacy, critical thinking, and ethical AI practices emerging as the most critical. The student survey revealed gaps in prompt engineering, bias awareness, and AI output management. In our study of lecturer strategies, we identified six key areas, with GenAI Integration and Curriculum Design being the most emphasised. Our findings highlight the importance of incorporating GenAI into education. While the literature prioritized ethics and policy development, students favour hands-on, project-based learning and practical AI applications. To foster inclusive and responsible GenAI adoption, institutions should ensure equitable access to GenAI tools, establish clear academic integrity policies, and advocate for global GenAI research initiatives.


CapsFake: A Multimodal Capsule Network for Detecting Instruction-Guided Deepfakes

arXiv.org Artificial Intelligence

Unlike traditional text-to-image generation, where the entire image is synthesized from scratch, instruction-guided editing targets real images and modifies specific semantic attributes (such as object identity, background context, or visual style) while preserving global visual coherence. These manipulations are particularly concerning from a cybersecurity standpoint because they maintain the illusion of authenticity while enabling adversaries to alter identity, fabricate visual evidence, or inject misinformation into trusted media pipelines. As illustrated in Figure 2, the instruction-guided image editing pipeline comprises three key AI components, each playing a distinct role in enabling semantically precise and visually coherent manipulations. First, an image translation model is used to convert the raw source image into a descriptive textual caption that semantically captures its visual content. This step, commonly implemented with models like CLIP [22] or BLIP-2 [23], provides a language-based anchor that enables subsequent manipulation. For example, a facial image may be described as "a girl wearing a blue and white striped shirt", forming the basis for meaningful transformation prompts.

Figure 2: Malicious Image Manipulation Pipeline. A threat actor uses generative AI tools to manipulate specific elements of an image, leveraging image translation and understanding models to guide semantic edits. These capabilities facilitate identity obfuscation, impersonation, and disinformation.