AITopics | Technology

Technology

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning

Neural Information Processing Systems | Jun-15-2026, 12:18:26 GMT

While fine-tuning large language models (LLMs) for specific tasks often yields impressive results, it comes at the cost of memory inefficiency due to back-propagation in gradient-based training. Memory-efficient Zeroth-order (MeZO) optimizers, recently proposed to address this issue, require only forward passes during training, making them more memory-friendly. However, compared with exact gradients, ZO-based gradients usually exhibit an estimation error, which can significantly hurt the optimization process, leading to slower convergence and suboptimal solutions. In addition, we find that the estimation error hurts more when it is added to large weights than to small weights. Based on this observation, this paper introduces Sparse-MeZO, a novel memory-efficient zeroth-order optimization approach that applies ZO only to a carefully chosen subset of parameters. We propose a simple yet effective parameter selection scheme that yields significant performance gains with Sparse-MeZO. Additionally, we develop a memory-optimized implementation for sparse masking, ensuring the algorithm requires only inference-level memory consumption, allowing Sparse-MeZO to fine-tune LLaMA-30b on a single A100 GPU. Experimental results illustrate that Sparse-MeZO consistently improves both performance and convergence speed over MeZO without any overhead. For example, it achieves a 9% absolute accuracy improvement and a 3.5x speedup over MeZO on the RTE task.
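
To make the idea concrete, here is a minimal sketch of one masked zeroth-order update, assuming the two-point (SPSA-style) estimator that MeZO builds on. The function name `sparse_zo_step`, the magnitude `threshold`, and the rule "perturb only weights whose magnitude is below the threshold" are illustrative assumptions motivated by the observation about large weights; they are not the paper's actual selection scheme or its memory-optimized implementation.

```python
import random


def sparse_zo_step(theta, loss_fn, lr=0.01, eps=1e-3, threshold=1.0, seed=0):
    """One masked two-point zeroth-order update (two forward passes, no backprop)."""
    rng = random.Random(seed)
    # Assumed selection rule: only small-magnitude weights are perturbed and updated.
    mask = [1.0 if abs(w) < threshold else 0.0 for w in theta]
    # Random direction, zeroed out on the weights excluded by the mask.
    z = [m * rng.gauss(0.0, 1.0) for m in mask]
    theta_plus = [w + eps * zi for w, zi in zip(theta, z)]
    theta_minus = [w - eps * zi for w, zi in zip(theta, z)]
    # Finite-difference estimate of the directional derivative along z.
    projected_grad = (loss_fn(theta_plus) - loss_fn(theta_minus)) / (2 * eps)
    # Step along z; masked-out weights have z = 0 and therefore do not move.
    return [w - lr * projected_grad * zi for w, zi in zip(theta, z)]


if __name__ == "__main__":
    weights = [0.5, -0.2, 5.0]  # the last weight is "large" and stays frozen
    quadratic = lambda ws: sum(w * w for w in ws)
    for step in range(200):
        weights = sparse_zo_step(weights, quadratic, seed=step)
    print(weights)
```

In this toy run the large weight is never touched, while the two small weights are driven toward the minimum using loss evaluations only.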

arxiv preprint arxiv, large language model, machine learning

Country: North America > United States > California (0.28)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Online Multi-Class Selection with Group Fairness Guarantee

Neural Information Processing Systems | Jun-15-2026, 12:17:28 GMT

We study the online multi-class selection problem with group fairness guarantees, where limited resources must be allocated to sequentially arriving agents.

artificial intelligence, machine learning, natural language

Country: North America > Canada > Alberta (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry: Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.68)
Information Technology > Communications > Networks (0.67)

A Unified Stability Analysis of SAM vs SGD: Role of Data Coherence and Emergence of Simplicity Bias

Neural Information Processing Systems | Jun-15-2026, 12:17:08 GMT

Understanding the dynamics of optimization in deep learning is increasingly important as models scale. While stochastic gradient descent (SGD) and its variants reliably find solutions that generalize well, the mechanisms driving this generalization remain unclear. Notably, these algorithms often prefer flatter or simpler minima--particularly in overparameterized settings. Prior work has linked flatness to generalization, and methods like Sharpness-Aware Minimization (SAM) explicitly encourage flatness, but a unified theory connecting data structure, optimization dynamics, and the nature of learned solutions is still lacking. In this work, we develop a linear stability framework that analyzes the behavior of SGD, random perturbations, and SAM--particularly in two-layer ReLU networks. Central to our analysis is a coherence measure that quantifies how gradient curvature aligns across data points, revealing why certain minima are stable and favored during training.
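
For readers unfamiliar with SAM, the update it performs can be sketched as follows, assuming the standard two-step form: move to a nearby point along the normalized gradient, then descend using the gradient measured there. The names `sam_step`, `grad_fn`, `lr`, and `rho` are illustrative and not taken from the paper, whose contribution is a stability analysis rather than a new optimizer.

```python
import math


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM-style update: perturb toward higher loss, then descend from there."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        # At a stationary point there is no ascent direction; leave w unchanged.
        return list(w)
    # Ascent step of radius rho along the normalized gradient.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Descent step uses the gradient evaluated at the perturbed point.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


if __name__ == "__main__":
    # Toy quadratic with one sharp direction (curvature 10) and one flat direction.
    grad = lambda w: [w[0], 10.0 * w[1]]
    point = [1.0, 1.0]
    for _ in range(5):
        point = sam_step(point, grad)
    print(point)
```

Setting `rho=0` recovers a plain gradient descent step, which makes the two methods easy to compare on the same toy problem.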

artificial intelligence, machine learning, stability

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

The Download: cutting AC emissions, and nature's drug designer

MIT Technology Review | Jun-15-2026, 12:10:00 GMT

Plus: Anthropic has shut down access to its top models after a US directive. Air conditioning is good for our health, but bad for the planet: it already accounts for 7% of global electricity use and 3% of greenhouse-gas emissions. Feeling the heat, scientists and startups are hoping to amp up solid-state cooling. These systems move heat through conductive materials, which could cool spaces and surfaces with fewer messy side effects. The catch is whether they can match the efficiency of traditional AC. Find out how the unconventional coolers aim to dial down AC emissions.

artificial intelligence, machine learning, natural language

Country: North America > United States (0.30)

Industry:

Energy (0.55)
Information Technology > Security & Privacy (0.49)
Health & Medicine > Pharmaceuticals & Biotechnology (0.49)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.31)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.31)

CAMILA: Context-Aware Masking for Image Editing with Language Alignment

Neural Information Processing Systems | Jun-15-2026, 12:07:43 GMT

Text-guided image editing allows users to transform and synthesize images through natural language instructions, offering considerable flexibility. However, most existing image editing models naively attempt to follow all user instructions, even if those instructions are inherently infeasible or contradictory, often resulting in nonsensical output. To address these challenges, we propose a context-aware method for image editing named CAMILA (Context-Aware Masking for Image Editing with Language Alignment). CAMILA is designed to validate the contextual coherence between instructions and the image, ensuring that only relevant edits are applied to the designated regions while ignoring non-executable instructions. For comprehensive evaluation of this new method, we constructed datasets for both single- and multi-instruction image editing, incorporating the presence of infeasible requests. Our method achieves better performance and higher semantic alignment than state-of-the-art models, demonstrating its effectiveness in handling complex instruction challenges while preserving image integrity.

large language model, machine learning, natural language

Genre: Research Report > Experimental Study (1.00)

Industry: Media > Photography (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Iterative Missing Data Imputation with Model Form Adaptation and Non-Missing Feature Supervision

Neural Information Processing Systems | Jun-15-2026, 12:07:24 GMT

Iterative imputation is a prevalent method for missing data imputation, where each feature is imputed iteratively by treating it as a target variable estimated from all other features. However, the iterative imputation method suffers from two principal limitations: (1) it imposes a single parametric model form to impute all features, neglecting the potential for optimal models to vary among features, which risks model misspecification; and (2) it assumes every feature contains missing values, overlooking the potential presence of non-missing features, termed oracle features, which are informative for imputation. To address these limitations, we propose kernel point imputation (KPI), a bi-level optimization framework for iterative missing data imputation. At the inner level, KPI adaptively learns the optimal model form for each feature within a reproducing kernel Hilbert space, addressing limitation (1). At the outer level, KPI utilizes oracle features as supervisory signals to iteratively refine the imputations, addressing limitation (2). Experiments demonstrate that KPI outperforms competitive imputation methods. Code is available at https://github.com/FMLYD/kpi.git.
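
The baseline that the abstract criticizes can be sketched in a few lines. The following is a minimal illustration of plain iterative imputation, not of KPI: it assumes one linear model form for every feature (exactly the first limitation named above), uses `None` to mark missing entries, and assumes each column has at least one observed value. All function names here are hypothetical and unrelated to the linked repository.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear(xs, ys, ridge=1e-8):
    """Least-squares weights via the normal equations; the last weight is the intercept."""
    rows = [x + [1.0] for x in xs]
    d = len(rows[0])
    # A tiny ridge term keeps the system solvable when features are collinear.
    gram = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    return solve(gram, rhs)


def others(row, j):
    """All entries of a row except column j."""
    return [v for k, v in enumerate(row) if k != j]


def iterative_impute(data, n_iters=10):
    """Impute None entries by repeatedly regressing each feature on all the others."""
    n_cols = len(data[0])
    missing = [[v is None for v in row] for row in data]
    filled = [row[:] for row in data]
    # Initial guess: fill each missing entry with the observed mean of its column.
    for j in range(n_cols):
        observed = [row[j] for row in data if row[j] is not None]
        mean = sum(observed) / len(observed)
        for row in filled:
            if row[j] is None:
                row[j] = mean
    for _ in range(n_iters):
        for j in range(n_cols):
            if not any(flags[j] for flags in missing):
                continue  # fully observed column: nothing to impute
            train = [(others(r, j), r[j]) for r, flags in zip(filled, missing) if not flags[j]]
            weights = fit_linear([x for x, _ in train], [y for _, y in train])
            for r, flags in zip(filled, missing):
                if flags[j]:
                    features = others(r, j) + [1.0]
                    r[j] = sum(w * f for w, f in zip(weights, features))
    return filled


if __name__ == "__main__":
    table = [[1.0, 3.0], [2.0, 5.0], [3.0, None], [4.0, 9.0]]
    print(iterative_impute(table))
```

Note that the fully observed first column is only ever used as a predictor here; the abstract's second point is that such columns could also serve as supervision, which this baseline does not attempt.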

data quality, imputation, machine learning

Country: Asia > China (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.67)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Streaming Attention Approximation via Discrepancy Theory

Neural Information Processing Systems | Jun-15-2026, 12:06:26 GMT

Large language models (LLMs) have achieved impressive success, but their high memory requirements present challenges for long-context token generation. In this paper we study the streaming complexity of attention approximation, a key computational primitive underlying token generation. Our main contribution is BalanceKV, a streaming algorithm for ϵ-approximating attention computations based on a geometric process for selecting a balanced collection of Key and Value tokens as per Banaszczyk's vector balancing theory. We complement our algorithm with space lower bounds for streaming attention computation. Besides strong theoretical guarantees, BalanceKV exhibits empirically validated performance improvements over existing methods, both for attention approximation and end-to-end performance on various long-context benchmarks.
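
As a point of reference, the quantity being approximated is ordinary softmax attention for one query over all cached keys and values. The sketch below assumes the common scaled dot-product form (the paper's exact normalization may differ) and uses the hypothetical name `attention`. It is not BalanceKV; it is only the exact computation, whose key/value cache grows with every generated token.

```python
import math


def attention(query, keys, values):
    """Exact softmax attention output for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    # Weighted average of the value vectors, one output coordinate at a time.
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


if __name__ == "__main__":
    cached_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    cached_values = [[1.0], [2.0], [3.0]]
    print(attention([1.0, 0.0], cached_keys, cached_values))
```

A streaming approximation would keep only a subset of `keys` and `values` while trying to keep this output within a small error of the exact result.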

large language model, machine learning, natural language

Country: North America > United States (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem

Neural Information Processing Systems | Jun-15-2026, 12:03:34 GMT

Mathematical modeling is a cornerstone of scientific discovery and engineering practice, enabling the translation of real-world problems into formal systems across domains such as physics, biology, and economics. Unlike mathematical reasoning, which assumes a predefined formulation, modeling requires open-ended problem analysis, abstraction, and principled formalization. While Large Language Models (LLMs) have shown strong reasoning capabilities, they fall short in rigorous model construction, limiting their utility in real-world problem-solving. To this end, we formalize the task of LLM-powered real-world mathematical modeling, where agents must analyze problems, construct domain-appropriate formulations, and generate complete end-to-end solutions. We introduce MM-Bench, a curated benchmark of 111 problems from the Mathematical Contest in Modeling (MCM/ICM), spanning the years 2000 to 2025 and across ten diverse domains such as physics, biology, and economics. To tackle this task, we propose MM-Agent, an expert-inspired framework that decomposes mathematical modeling into four stages: open-ended problem analysis, structured model formulation, computational problem solving, and report generation. Experiments on MM-Bench show that MM-Agent significantly outperforms baseline agents, achieving an 11.88% improvement over human expert solutions while requiring only 15 minutes and $0.88 per task using GPT-4o. Furthermore, under official MCM/ICM protocols, MM-Agent assisted two undergraduate teams in winning the Finalist Award (top 2.0% among 27,456 teams) in MCM/ICM 2025, demonstrating its practical effectiveness as a modeling copilot.

large language model, machine learning, natural language

Country:

Asia > China (0.46)
Europe > Austria (0.28)

Genre:

Workflow (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (0.67)

Industry:

Health & Medicine (0.92)
Education (0.67)
Leisure & Entertainment > Sports (0.46)
Information Technology > Security & Privacy (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

OmniFC: Rethinking Federated Clustering via Lossless and Secure Distance Reconstruction

Neural Information Processing Systems | Jun-15-2026, 12:02:12 GMT

Federated clustering (FC) aims to discover global cluster structures across decentralized clients without sharing raw data, making privacy preservation a fundamental requirement. There are two critical challenges: (1) privacy leakage during collaboration, and (2) robustness degradation due to aggregation of proxy information from non-independent and identically distributed (Non-IID) local data, leading to inaccurate or inconsistent global clustering. Existing solutions typically rely on model-specific local proxies, which are sensitive to data heterogeneity and inherit inductive biases from their centralized counterparts, thus limiting robustness and generality. We propose Omni Federated Clustering (OmniFC), a unified and model-agnostic framework. Leveraging Lagrange coded computing, our method enables clients to share only encoded data, allowing exact reconstruction of the global distance matrix--a fundamental representation of sample relationships--without leaking private information, even under client collusion. This construction is naturally resilient to Non-IID data distributions. This approach decouples FC from model-specific proxies, providing a unified extension mechanism applicable to diverse centralized clustering methods. Theoretical analysis confirms both reconstruction fidelity and privacy guarantees, while comprehensive experiments demonstrate OmniFC's superior robustness, effectiveness, and generality across various benchmarks compared to state-of-the-art methods.

artificial intelligence, data mining, machine learning

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

From Cradle to Cane: A Two-Pass Framework for High-Fidelity Lifespan Face Aging

Neural Information Processing Systems | Jun-15-2026, 12:01:15 GMT

Face aging has become a crucial task in computer vision, with applications ranging from entertainment to healthcare. However, existing methods struggle with achieving a realistic and seamless transformation across the entire lifespan, especially when handling large age gaps or extreme head poses. The core challenge lies in balancing age accuracy and identity preservation--what we refer to as the Age-ID trade-off.

artificial intelligence, machine learning, natural language

Country: Asia > China (0.68)

Genre: