Reformulating Zero-shot Action Recognition for Multi-label Actions (Supplementary Material)

Neural Information Processing Systems

Standard video models expect frames with equal height and width, so we crop a square region around the actor and resize it to the network-specific dimensions (112 × 112). We present examples of AVA video frames with their annotations, as well as the generated crops, in Figure 1. This square crop can cause multiple actors to appear within one clip, as seen in the second example, but it ensures the aspect ratio of the person is not altered, matching the conditions under which the video model was trained.

Figure 1: Examples of original ground-truth bounding boxes (left) in the AVA dataset, with the cropped actors on the right.

For PS-ZSAR, prediction confidences are obtained from the softmax probabilities output by our pair-wise similarity function.
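The cropping step described above is straightforward to sketch in code. The following is a minimal illustration, assuming a PIL image and an actor bounding box in pixel coordinates; the function name and the border-clamping behavior are our assumptions, not the paper's implementation.

```python
# A minimal sketch of the square-crop step, assuming a PIL image and an
# actor bounding box (x1, y1, x2, y2) in pixels. The function name and the
# border clamping are our assumptions, not the paper's implementation.
from PIL import Image

def square_actor_crop(frame: Image.Image, box, size: int = 112):
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0   # actor center
    half = max(x2 - x1, y2 - y1) / 2.0          # half the longer box side
    # Clamp the square to the frame; near borders the crop may be slightly
    # non-square, and a fuller implementation would pad instead.
    left = max(0, int(cx - half))
    top = max(0, int(cy - half))
    right = min(frame.width, int(cx + half))
    bottom = min(frame.height, int(cy + half))
    crop = frame.crop((left, top, right, bottom))
    # Resizing a square crop to size x size keeps the actor's aspect ratio.
    return crop.resize((size, size))
```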


The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Neural Information Processing Systems

The performance of a large language model (LLM) depends heavily on the quality and size of its pretraining dataset. However, the pretraining datasets for state-of-the-art open LLMs like Llama 3 and Mixtral are not publicly available and very little is known about how they were created. In this work, we introduce FineWeb, a 15-trillion token dataset derived from 96 Common Crawl snapshots that produces better-performing LLMs than other open pretraining datasets. To advance the understanding of how best to curate high-quality pretraining datasets, we carefully document and ablate all of the design choices used in FineWeb, including in-depth investigations of deduplication and filtering strategies. In addition, we introduce FineWeb-Edu, a 1.3-trillion token collection of educational text filtered from FineWeb.
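As a rough illustration of the fuzzy deduplication the abstract alludes to, here is a minimal MinHash similarity check in plain Python. The shingle length, number of hash functions, and duplicate threshold are illustrative assumptions for the sketch, not FineWeb's actual configuration.

```python
# Illustrative MinHash near-duplicate detection for web documents. The
# shingle length, number of hash functions, and threshold are assumptions
# for the sketch, not the settings used to build FineWeb.
import hashlib

NUM_HASHES = 64
SHINGLE = 5  # 5-word shingles

def shingles(text: str) -> set[str]:
    words = text.lower().split()
    return {" ".join(words[i:i + SHINGLE])
            for i in range(max(1, len(words) - SHINGLE + 1))}

def minhash(text: str) -> list[int]:
    # One seeded 64-bit hash per signature slot; keep the minimum over
    # all shingles of the document.
    return [
        min(int.from_bytes(
                hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest(),
                "big")
            for s in shingles(text))
        for seed in range(NUM_HASHES)
    ]

def similarity(sig_a: list[int], sig_b: list[int]) -> float:
    # The fraction of matching slots estimates the Jaccard similarity
    # between the two documents' shingle sets.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / NUM_HASHES

# Documents whose signature similarity exceeds a threshold (say 0.8)
# would be treated as near-duplicates and removed.
```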


On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion

Neural Information Processing Systems

Efficient fine-tuning of large language models for task-specific applications is imperative, yet the vast number of parameters in these models makes their training increasingly challenging. Despite numerous proposals for effective methods, a substantial memory overhead remains for gradient computations during updates. Can we fine-tune a series of task-specific small models and transfer their knowledge directly to a much larger model without additional training? In this paper, we explore weak-to-strong specialization using logit arithmetic, facilitating a direct answer to this question. Existing weak-to-strong methods often employ a static knowledge transfer ratio and a single small model for transferring complex knowledge, which leads to suboptimal performance.
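The logit-arithmetic mechanism the abstract describes can be sketched compactly. Below is a minimal PyTorch illustration of the static baseline the paper improves on, assuming the large and small models share a tokenizer and vocabulary; the fixed transfer ratio alpha is our assumption, while the paper's contribution is choosing the ratio dynamically per step and fusing several small models, which is not shown here.

```python
# Minimal sketch of weak-to-strong logit arithmetic: the static baseline the
# paper improves on. `alpha` is a fixed transfer ratio here; the paper's
# method selects it dynamically at each decoding step and can fuse several
# small models.
import torch

def fused_next_token_logits(large_logits: torch.Tensor,
                            small_tuned_logits: torch.Tensor,
                            small_base_logits: torch.Tensor,
                            alpha: float = 1.0) -> torch.Tensor:
    """All tensors are next-token logits over a shared vocabulary, shape (batch, vocab)."""
    # The tuned-minus-base difference isolates the task knowledge the small
    # model gained from fine-tuning; adding it steers the large model with
    # no gradient updates to the large model's parameters.
    return large_logits + alpha * (small_tuned_logits - small_base_logits)

# A greedy decoding step under the fused distribution would then be:
# next_token = fused_next_token_logits(L, T, B).argmax(dim=-1)
```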


MediQ: Question-Asking LLMs and a Benchmark for Reliable Interactive Clinical Reasoning

Neural Information Processing Systems

Users typically engage with LLMs interactively, yet most existing benchmarks evaluate them in a static, single-turn format, posing reliability concerns in interactive scenarios. We identify a key obstacle towards reliability: LLMs are trained to answer any question, even with incomplete context or insufficient knowledge.
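The obstacle has a simple structural reading: a reliable interactive system needs an explicit choice between answering now and asking for missing information. The sketch below shows that control flow; the threshold, turn budget, and all four callables are hypothetical stand-ins, not the benchmark's actual API.

```python
# Sketch of an interactive loop that chooses between answering and asking a
# follow-up question. The threshold, turn budget, and the four callables are
# hypothetical stand-ins, not the benchmark's actual API.
CONFIDENCE_THRESHOLD = 0.8  # illustrative abstention threshold
MAX_TURNS = 5               # illustrative interaction budget

def interactive_answer(question, context, confidence, followup, ask, answer):
    """`confidence`, `followup`, `ask`, and `answer` are caller-supplied
    callables wrapping the LLM and the user."""
    for _ in range(MAX_TURNS):
        if confidence(question, context) >= CONFIDENCE_THRESHOLD:
            return answer(question, context)
        # Context is insufficient: ask for the missing information rather
        # than guessing an answer.
        q = followup(question, context)
        context = context + [(q, ask(q))]
    return answer(question, context)  # best effort once the budget is spent
```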


iPhone design guru and OpenAI chief promise an AI device revolution

The Guardian

Everything over the last 30 years, according to Sir Jony Ive, has led to this moment: a partnership between the iPhone designer and the developer of ChatGPT. Ive has sold his hardware startup, io, to OpenAI and will take on creative and design leadership across the merged businesses. "I have a growing sense that everything I have learned over the last 30 years has led me to this place, to this moment," he says in a video announcing the $6.4bn (£4.8bn) deal. The main aim will be to move on from Ive's signature achievement: designing Apple's most successful product, the iPhone, as well as the iPod, iPad and Apple Watch. The British-born designer has already developed a prototype io device, and one of its users is OpenAI's chief executive, Sam Altman.


AI Is Eating Data Center Power Demand--and It's Only Getting Worse

WIRED

AI's energy use already represents as much as 20 percent of global data-center power demand, research published Thursday in the journal Joule shows. That demand from AI, the research states, could double by the end of this year, comprising nearly half of total data-center electricity consumption worldwide, excluding the electricity used for bitcoin mining. The new research is published in a commentary by Alex de Vries-Gao, the founder of Digiconomist, a research company that evaluates the environmental impact of technology. De Vries-Gao started Digiconomist in the late 2010s to explore the impact bitcoin mining, another extremely energy-intensive activity, would have on the environment. Looking at AI, he says, has grown more urgent over the past few years because of the widespread adoption of ChatGPT and other large language models that use massive amounts of energy. According to his research, worldwide AI energy demand is now set to surpass demand from bitcoin mining by the end of this year.


Anthropic's latest Claude AI models are here - and you can try one for free today

ZDNet

Since its founding in 2021, Anthropic has quickly become one of the leading AI companies and a worthy competitor to OpenAI, Google, and Microsoft with its Claude models. Building on this momentum, the company held its first developer conference, Code with Claude, on Thursday, showcasing what the company has done so far and where it is going next. Anthropic used the event stage to unveil two highly anticipated models, Claude Opus 4 and Claude Sonnet 4. Both offer improvements over their preceding models, including better performance in coding and reasoning. Beyond that, the company launched new features and tools for its models that should improve the user experience. Keep reading to learn more about the new models.


A United Arab Emirates Lab Announces Frontier AI Projects--and a New Outpost in Silicon Valley

WIRED

A United Arab Emirates (UAE) academic lab today launched an artificial intelligence world model and agent, two large language models (LLMs), and a new research center in Silicon Valley as it ramps up its investment in the cutting-edge field. The UAE's Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) revealed an AI world model called PAN, which can be used to build physically realistic simulations for testing and honing the performance of AI agents. Eric Xing, president and professor of MBZUAI and a leading AI researcher, revealed the models and the new lab at the Computer History Museum in Mountain View, California, today. The UAE has made big investments in AI in recent years under the guidance of Sheikh Tahnoun bin Zayed al Nahyan, the nation's tech-savvy national security advisor and younger brother of president Mohamed bin Zayed Al Nahyan. Xing says the UAE's new center in Sunnyvale, California, will help the nation tap into the world's most concentrated source of AI knowledge and talent.


DOGE Used Meta AI Model to Review Emails From Federal Workers

WIRED

Elon Musk's so-called Department of Government Efficiency (DOGE) used artificial intelligence from Meta's Llama model to comb through and analyze emails from federal workers. Materials viewed by WIRED show that DOGE affiliates within the Office of Personnel Management (OPM) tested and used Meta's Llama 2 model to review and classify responses from federal workers to the infamous "Fork in the Road" email that was sent across the government in late January. The email offered deferred resignation to anyone opposed to changes the Trump administration was making to its federal workforce, including an enforced return-to-office policy, downsizing, and a requirement to be "loyal." To leave their position, recipients merely needed to reply with the word "resign." This email closely mirrored one that Musk sent to Twitter employees shortly after he took over the company in 2022.


Leak reveals what Sam Altman and Jony Ive are cooking up: 100 million AI companion devices

Mashable

OpenAI and Jony Ive's vision for their AI device is a screenless companion that knows everything about you. Details leaked to the Wall Street Journal give us a clearer picture of OpenAI's acquisition of io, cofounded by Ive, the iconic iPhone designer. The ChatGPT maker reportedly plans to ship 100 million AI devices designed to fit in with users' everyday life. "The product will be capable of being fully aware of a user's surroundings and life, will be unobtrusive, able to rest in one's pocket or on one's desk," according to a recording of an OpenAI staff meeting reviewed by the Journal. The device "will be a third core device a person would put on a desk after a MacBook Pro and an iPhone," per the meeting, which occurred the same day (Wednesday) that OpenAI announced its acquisition of Ive's company.