Generating QM1B with PySCF_IPU

Neural Information Processing Systems

The emergence of foundation models in Computer Vision and Natural Language Processing has resulted in immense progress on downstream tasks. This progress was enabled by datasets with billions of training examples. Similar benefits are yet to be unlocked for quantum chemistry, where the potential of deep learning is constrained by comparatively small datasets with 100k to 20M training examples. These datasets are limited in size because the labels are computed using the accurate (but computationally demanding) predictions of Density Functional Theory (DFT). Notably, prior DFT datasets were created using CPU supercomputers without leveraging hardware acceleration.


How to use Windows Photo Viewer's AI tools to perfect your pictures

PCWorld

If you're into trying out new AI tools, you'll dig this: the latest Windows 11 24H2 update brings some cool upgrades to the Photos app. Microsoft added AI-powered editing that makes touching up your pics super easy. One standout feature is the "Restyle Image" option. It lets you apply different artistic styles to your photos. This feature really opens up a whole world of possibilities.


The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit

Neural Information Processing Systems

In deep learning theory, the covariance matrix of the representations serves as a proxy to examine the network's trainability. Motivated by the success of Transformers, we study the covariance matrix of a modified Softmax-based attention model with skip connections in the proportional limit of infinite-depth-and-width. We show that at initialization the limiting distribution can be described by a stochastic differential equation (SDE) indexed by the depth-to-width ratio. To achieve a well-defined stochastic limit, the Transformer's attention mechanism is modified by centering the Softmax output at identity, and scaling the Softmax logits by a width-dependent temperature parameter. We examine the stability of the network through the corresponding SDE, showing how the scale of both the drift and diffusion can be elegantly controlled with the aid of residual connections.
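The two modifications the abstract names (centering the Softmax at identity and a width-dependent logit temperature) can be sketched concretely. The snippet below is a minimal NumPy illustration of one plausible reading, not the paper's implementation; the exact temperature scaling `tau0 / sqrt(width)` is our assumption for the example.

```python
import numpy as np

def softmax(x):
    """Row-wise Softmax with max-subtraction for numerical stability."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def shaped_attention(logits, width, tau0=1.0):
    """Attention matrix centered at identity with temperature-scaled logits.

    The Softmax output is shifted by its uniform fixed point (1/n per entry)
    and added to the identity, so as the temperature shrinks with width the
    attention matrix stays close to the identity rather than to the uniform
    averaging matrix.
    """
    n = logits.shape[-1]
    tau = tau0 / np.sqrt(width)                 # width-dependent temperature (illustrative choice)
    centered = softmax(tau * logits) - 1.0 / n  # deviation from uniform attention
    return np.eye(n) + centered
```

With zero logits the Softmax is exactly uniform, the centered term vanishes, and the attention matrix is exactly the identity; each row still sums to one, so the output remains a residual perturbation of the input sequence.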


Weakly Supervised 3D Open-vocabulary Segmentation

Neural Information Processing Systems

Open-vocabulary segmentation of 3D scenes is a fundamental function of human perception and thus a crucial objective in computer vision research. However, this task is heavily impeded by the lack of large-scale and diverse 3D open-vocabulary segmentation datasets for training robust and generalizable models. Distilling knowledge from pre-trained 2D open-vocabulary segmentation models helps but it compromises the open-vocabulary feature as the 2D models are mostly finetuned with close-vocabulary datasets. We tackle the challenges in 3D open-vocabulary segmentation by exploiting pre-trained foundation models CLIP and DINO in a weakly supervised manner. Specifically, given only the open-vocabulary text descriptions of the objects in a scene, we distill the open-vocabulary multimodal knowledge and object reasoning capability of CLIP and DINO into a neural radiance field (NeRF), which effectively lifts 2D features into view-consistent 3D segmentation.
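The core mechanism the abstract describes is lifting frozen 2D features into a NeRF by distillation. As a generic sketch of that kind of objective (the loss form and per-pixel framing are our assumptions, not the paper's stated training details), a cosine-similarity distillation loss between NeRF-rendered features and frozen CLIP features might look like:

```python
import numpy as np

def distillation_loss(rendered_feats, clip_feats):
    """Mean cosine distance between per-pixel feature vectors.

    `rendered_feats` stands in for features rendered from the NeRF and
    `clip_feats` for the frozen CLIP features of the same pixels; both
    have shape (num_pixels, feature_dim). Minimizing this pulls the 3D
    field's features toward the 2D foundation-model features.
    """
    a = rendered_feats / np.linalg.norm(rendered_feats, axis=-1, keepdims=True)
    b = clip_feats / np.linalg.norm(clip_feats, axis=-1, keepdims=True)
    return float(np.mean(1.0 - np.sum(a * b, axis=-1)))
```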


Fine-Tuning Language Models with Just Forward Passes

Neural Information Processing Systems

Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but as LMs grow in size, backpropagation requires a prohibitively large amount of memory. Zeroth-order (ZO) methods can in principle estimate gradients using only two forward passes but are theorized to be catastrophically slow for optimizing large models. In this work, we propose a memory-efficient zeroth-order optimizer (MeZO), adapting the classical ZO-SGD method to operate in-place, thereby fine-tuning LMs with the same memory footprint as inference. For example, with a single A100 80GB GPU, MeZO can train a 30-billion parameter model, whereas fine-tuning with backpropagation can train only a 2.7B LM with the same budget. We conduct comprehensive experiments across model types (masked and autoregressive LMs), model scales (up to 66B), and downstream tasks (classification, multiple-choice, and generation).
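The "two forward passes" estimator the abstract refers to can be sketched in a few lines. This is a simplified SPSA-style illustration of the idea, not MeZO's actual implementation: the random perturbation is regenerated from a seed rather than stored, which is what makes the in-place, inference-memory-footprint update possible.

```python
import numpy as np

def mezo_step(params, loss_fn, lr=1e-3, eps=1e-3, seed=0):
    """One zeroth-order SGD step using two forward passes.

    The directional derivative along a random direction z is estimated
    from loss evaluations at params + eps*z and params - eps*z; the
    update then moves along z scaled by that estimate. Because z is
    reproducible from `seed`, no gradient-sized buffer is ever needed.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(params.shape)
    loss_plus = loss_fn(params + eps * z)
    loss_minus = loss_fn(params - eps * z)
    grad_scale = (loss_plus - loss_minus) / (2 * eps)  # projected gradient estimate
    return params - lr * grad_scale * z
```

For a smooth loss and a small learning rate this is a descent direction in expectation, which is why it can fine-tune at inference memory cost, at the price of slower convergence than backpropagation.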


Super Speeders are deadly. This technology can slow them down.

Popular Science

In 2013, Amy Cohen experienced the unthinkable for a parent. It was a mild October day in New York City and her 12-year-old son Sammy stopped by the house to grab a snack on his way from school to soccer practice. When he stepped out onto their street in Brooklyn, Sammy was struck and killed by a speeding van. "It's a horror no parent should ever experience," Cohen told Popular Science.


SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs

Neural Information Processing Systems

In this work, we introduce Semantic Pyramid AutoEncoder (SPAE) for enabling frozen LLMs to perform both understanding and generation tasks involving non-linguistic modalities such as images or videos. SPAE converts between raw pixels and interpretable lexical tokens (or words) extracted from the LLM's vocabulary. The resulting tokens capture both the rich semantic meaning and the fine-grained details needed for visual reconstruction, effectively translating the visual content into a language comprehensible to the LLM, and empowering it to perform a wide array of multimodal tasks. Our approach is validated through in-context learning experiments with frozen PaLM 2 and GPT 3.5 on a diverse set of image understanding and generation tasks. Our method marks the first successful attempt to enable a frozen LLM to generate image content while surpassing state-of-the-art performance in image understanding tasks, under the same setting, by over 25%.
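The step the abstract hinges on is mapping visual content onto tokens drawn from the LLM's own vocabulary. The snippet below is a deliberately simplified nearest-neighbour reading of that pixel-to-token quantization; SPAE itself uses a learned pyramid encoder, and `vocab_embeddings` here is a stand-in for a frozen LLM's token embedding table.

```python
import numpy as np

def quantize_to_vocab(features, vocab_embeddings):
    """Assign each visual feature vector to its nearest vocabulary embedding.

    `features` has shape (num_features, dim) and `vocab_embeddings` has
    shape (vocab_size, dim); the return value is an array of token indices,
    one per feature, i.e. the "words" the visual content is translated into.
    """
    # Pairwise Euclidean distances, shape (num_features, vocab_size).
    d = np.linalg.norm(features[:, None, :] - vocab_embeddings[None, :, :], axis=-1)
    return d.argmin(axis=1)
```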


Box's new AI agents can organize, find, and extract data from documents for you

ZDNet

AI agents, as you've probably noticed, are all the rage in Silicon Valley. On Thursday, the content management platform Box joined a growing list of companies hoping to cash in on this latest tech trend. The new Box AI Agents are designed to help enterprise customers organize and retrieve critical information from files across the platform. Like many new "agentic" products, the agents are promoted as time-saving tools that enterprise customers can harness to reduce mundane tasks that tend to eat up large chunks of employees' workdays, like summarizing HR forms or pulling key details from lengthy contracts. The agents are being released as part of Box AI, the company's AI-powered content management tool, which debuted in late 2023.


AI PCs rely on NPUs. So what exactly are these newfangled chips?

PCWorld

CPUs and GPUs are old news. These days, the cutting edge is all about NPUs, and hardware manufacturers are talking up NPU performance. The NPU is a computer component designed to accelerate AI tasks in a power-efficient manner, paving the way for new Windows desktop applications with powerful AI features. All PCs will eventually have NPUs, but at the moment only some laptops have them. Here's everything you need to know about NPUs and why they're such a hot topic in the computer industry right now.


GPT-4.1 makes ChatGPT smarter, faster, and more useful for paying users, especially coders

ZDNet

OpenAI is now bringing GPT-4.1 to the Plus, Pro, and Team tiers of ChatGPT. GPT-4.1 was previously available only to API users. Since that's a lot of buzzwords at once, let's spend a minute unpacking the terms. Back in April, OpenAI released GPT-4.1 for developers to use via the API.