The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit
In deep learning theory, the covariance matrix of the representations serves as a proxy to examine the network's trainability. Motivated by the success of Transformers, we study the covariance matrix of a modified Softmax-based attention model with skip connections in the proportional limit of infinite-depth-and-width. We show that at initialization the limiting distribution can be described by a stochastic differential equation (SDE) indexed by the depth-to-width ratio. To achieve a well-defined stochastic limit, the Transformer's attention mechanism is modified by centering the Softmax output at identity, and scaling the Softmax logits by a width-dependent temperature parameter. We examine the stability of the network through the corresponding SDE, showing how the scale of both the drift and diffusion can be elegantly controlled with the aid of residual connections.
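The abstract's two modifications, centering the Softmax output at identity and scaling the logits by a width-dependent temperature, can be sketched concretely. The snippet below is an illustrative NumPy toy, not the paper's exact parameterization: the function name, shapes, and the choice of temperature are assumptions.

```python
import numpy as np

def shaped_attention(X, tau):
    """Illustrative sketch of a 'shaped' attention layer: Softmax logits
    are divided by a width-dependent temperature tau, and the attention
    matrix is centered at the identity (identity plus the deviation of
    the Softmax from the uniform matrix). Names and scalings here are
    assumptions for illustration, not the paper's exact formulation."""
    n, d = X.shape
    logits = (X @ X.T) / tau                       # temperature-scaled logits
    A = np.exp(logits - logits.max(axis=1, keepdims=True))
    A = A / A.sum(axis=1, keepdims=True)           # row-wise Softmax
    # Center at identity: as tau grows, A tends to uniform and the
    # layer tends to the identity map, keeping the residual stream stable.
    A_shaped = np.eye(n) + (A - np.ones((n, n)) / n)
    return A_shaped @ X
```

In this toy version, a large temperature pushes the attention matrix toward the identity, so each layer becomes a small perturbation of the skip connection, which is the qualitative behavior the SDE limit formalizes.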
Weakly Supervised 3D Open-vocabulary Segmentation
Open-vocabulary segmentation of 3D scenes mirrors a fundamental capability of human perception and is thus a crucial objective in computer vision research. However, this task is heavily impeded by the lack of large-scale and diverse 3D open-vocabulary segmentation datasets for training robust and generalizable models. Distilling knowledge from pre-trained 2D open-vocabulary segmentation models helps, but it compromises the open-vocabulary feature as the 2D models are mostly finetuned with closed-vocabulary datasets. We tackle the challenges in 3D open-vocabulary segmentation by exploiting the pre-trained foundation models CLIP and DINO in a weakly supervised manner. Specifically, given only the open-vocabulary text descriptions of the objects in a scene, we distill the open-vocabulary multimodal knowledge and object reasoning capability of CLIP and DINO into a neural radiance field (NeRF), which effectively lifts 2D features into view-consistent 3D segmentation.
Fine-Tuning Language Models with Just Forward Passes
Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but as LMs grow in size, backpropagation requires a prohibitively large amount of memory. Zeroth-order (ZO) methods can in principle estimate gradients using only two forward passes but are theorized to be catastrophically slow for optimizing large models. In this work, we propose a memory-efficient zeroth-order optimizer (MeZO), adapting the classical ZO-SGD method to operate in-place, thereby fine-tuning LMs with the same memory footprint as inference. For example, with a single A100 80GB GPU, MeZO can train a 30-billion parameter model, whereas fine-tuning with backpropagation can train only a 2.7B LM with the same budget. We conduct comprehensive experiments across model types (masked and autoregressive LMs), model scales (up to 66B), and downstream tasks (classification, multiple-choice, and generation).
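The core idea, estimating a gradient from two forward passes and applying the update in place so the perturbation vector never needs to be stored, can be sketched as follows. This is a simplified SPSA-style illustration of the ZO-SGD idea, not MeZO's actual implementation; the function names, constants, and the seed-regeneration trick's exact form are assumptions.

```python
import numpy as np

def mezo_step(params, loss_fn, lr=1e-3, eps=1e-3, seed=0):
    """One sketched zeroth-order (SPSA-style) step in the spirit of MeZO:
    perturb parameters in place by +eps*z and -eps*z, regenerating the
    random direction z from a fixed seed rather than storing it, estimate
    the directional derivative from two forward passes, then update.
    Illustrative only; names and constants are assumptions."""
    def perturb(scale):
        rng = np.random.default_rng(seed)          # same z every call
        for p in params:
            p += scale * eps * rng.standard_normal(p.shape)

    perturb(+1)
    loss_plus = loss_fn(params)                    # forward pass at theta + eps*z
    perturb(-2)
    loss_minus = loss_fn(params)                   # forward pass at theta - eps*z
    perturb(+1)                                    # restore theta exactly
    grad_est = (loss_plus - loss_minus) / (2 * eps)

    rng = np.random.default_rng(seed)              # regenerate z for the update
    for p in params:
        p -= lr * grad_est * rng.standard_normal(p.shape)
    return loss_fn(params)
```

Because z is regenerated from the seed instead of being materialized alongside the parameters, the optimizer's memory footprint stays at the level of inference, which is the property the abstract highlights.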
Super Speeders are deadly. This technology can slow them down.
In 2013, Amy Cohen experienced the unthinkable for a parent. It was a mild October day in New York City and her 12-year-old son Sammy stopped by the house to grab a snack on his way from school to soccer practice. When he stepped out onto their street in Brooklyn, Sammy was struck and killed by a speeding van. "It's a horror no parent should ever experience," Cohen told Popular Science.
SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
In this work, we introduce Semantic Pyramid AutoEncoder (SPAE) for enabling frozen LLMs to perform both understanding and generation tasks involving non-linguistic modalities such as images or videos. SPAE converts between raw pixels and interpretable lexical tokens (or words) extracted from the LLM's vocabulary. The resulting tokens capture both the rich semantic meaning and the fine-grained details needed for visual reconstruction, effectively translating the visual content into a language comprehensible to the LLM, and empowering it to perform a wide array of multimodal tasks. Our approach is validated through in-context learning experiments with frozen PaLM 2 and GPT-3.5 on a diverse set of image understanding and generation tasks. Our method marks the first successful attempt to enable a frozen LLM to generate image content while surpassing state-of-the-art performance in image understanding tasks, under the same setting, by over 25%.
Box's new AI agents can organize, find, and extract data from documents for you
AI agents, as you've probably noticed, are all the rage in Silicon Valley. On Thursday, the content management platform Box joined a growing list of companies hoping to cash in on this latest tech trend. The new Box AI Agents are designed to help enterprise customers organize and retrieve critical information from files across the platform. Like many new "agentic" products, the agents are promoted as time-saving tools that enterprise customers can harness to reduce mundane tasks that tend to eat up large chunks of employees' workdays, like summarizing HR forms or pulling key details from lengthy contracts. The agents are being released as part of Box AI, the company's AI-powered content management tool, which debuted in late 2023.
AI PCs rely on NPUs. So what exactly are these newfangled chips?
CPUs and GPUs are old news. These days, the cutting edge is all about NPUs, and hardware manufacturers are talking up NPU performance. The NPU is a computer component designed to accelerate AI tasks in a power-efficient manner, paving the way for new Windows desktop applications with powerful AI features. All PCs will eventually have NPUs, but at the moment only some laptops have them. Here's everything you need to know about NPUs and why they're such a hot topic in the computer industry right now.
GPT-4.1 makes ChatGPT smarter, faster, and more useful for paying users, especially coders
OpenAI is now bringing GPT-4.1 to the Plus, Pro, and Team tiers of ChatGPT. GPT-4.1 was previously available only to API users. Since I'm throwing a whole lot of buzzwords at you, let's spend a minute deconstructing all these terms. OK, so that should bring you up to speed. Back in April, OpenAI released GPT-4.1 for developers to use via the API.
The Download: Montana's experimental treatments, and Google DeepMind's new AI agent
The news: A bill that allows clinics to sell unproven treatments has been passed in Montana. Under the legislation, doctors can apply for a license to open an experimental treatment clinic and recommend and sell therapies not approved by the Food and Drug Administration (FDA) to their patients. Why it matters: Once it's signed by the governor, the law will be the most expansive in the country in allowing access to drugs that have not been fully tested. The bill allows for any drug produced in the state to be sold in it, providing it has been through phase I clinical trials--but these trials do not determine if the drug is effective. The big picture: The bill was drafted and lobbied for by people interested in extending human lifespans.
LaSCal: Label-Shift Calibration without target labels
When machine learning systems face dataset shift, model calibration plays a pivotal role in ensuring their reliability. Calibration error (CE) provides insights into the alignment between the predicted confidence scores and the classifier accuracy. While prior works have delved into the implications of dataset shift on calibration, existing CE estimators either (i) assume access to labeled data from the target domain, often unavailable in practice, or (ii) are derived under a covariate shift assumption. In this work we propose a novel, label-free, consistent CE estimator under label shift. Label shift is characterized by changes in the marginal label distribution p(Y), with a constant conditional distribution p(X | Y) between the source and target. We introduce a novel calibration method, called LaSCal, which uses the estimator in conjunction with a post-hoc calibration strategy, to perform unsupervised calibration on the target distribution. Our thorough empirical analysis demonstrates the effectiveness and reliability of the proposed approach across different modalities, model architectures and label shift intensities.
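The label shift setting described above (p(Y) changes while p(X | Y) stays fixed) can be made concrete with a small example. The snippet below is a standard confusion-matrix (BBSE-style) estimator of the target label marginal without target labels; it only illustrates the label-shift setting and is not the LaSCal CE estimator itself, whose details the abstract does not give.

```python
import numpy as np

def estimate_target_priors(C, mu_t):
    """Estimate the target label marginal q = p_t(Y) under label shift,
    without any target labels, by solving C @ q = mu_t, where
    C[i, j] = p_s(pred = i | Y = j) is the classifier's confusion matrix
    on labeled source data and mu_t[i] is the marginal of the classifier's
    predictions on unlabeled target data. This is a standard BBSE-style
    estimator, shown only to illustrate the setting, not LaSCal itself."""
    q = np.linalg.solve(C, mu_t)   # valid when C is invertible
    q = np.clip(q, 0.0, None)      # project onto the simplex (non-negativity)
    return q / q.sum()
```

Under the label shift assumption, p_t(pred = i) = sum_j p_s(pred = i | Y = j) p_t(Y = j), which is exactly the linear system solved here; the recovered priors can then feed importance weights p_t(y)/p_s(y) into a downstream post-hoc calibration step.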