 computing


Analog In-memory Training on General Non-ideal Resistive Elements: The Impact of Response Functions

Neural Information Processing Systems

As the economic and environmental costs of training and deploying large vision or language models increase dramatically, analog in-memory computing (AIMC) emerges as a promising energy-efficient solution. However, the training perspective, especially its training dynamics, is underexplored. In AIMC hardware, the trainable weights are represented by the conductance of resistive elements and updated using consecutive electrical pulses. Ideally, the conductance changes by a constant in response to each pulse; in reality, the change is scaled by asymmetric and non-linear response functions, leading to non-ideal training dynamics. This paper provides a theoretical foundation for gradient-based training on AIMC hardware with non-ideal response functions.
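To make the difference between an ideal and a non-ideal pulse update concrete, here is a minimal sketch. It assumes a "soft-bounds" pair of response functions, q_plus(w) = 1 - w/tau and q_minus(w) = 1 + w/tau, which is one common modelling choice and not necessarily the one analysed in the paper; the names `ideal_update`, `analog_update`, `alternate_pulses` and the parameter `tau` are hypothetical.

```python
def ideal_update(w, dw):
    """Ideal device: every pulse changes the weight by exactly dw."""
    return w + dw


def analog_update(w, dw, tau=1.0):
    """Non-ideal device (assumed soft-bounds model, not taken from the paper).

    The requested change dw is scaled by a response function that depends
    on the current weight and on the sign of the update.
    """
    if dw >= 0:
        return w + dw * (1.0 - w / tau)  # q_plus shrinks as w approaches +tau
    return w + dw * (1.0 + w / tau)      # q_minus shrinks as w approaches -tau


def alternate_pulses(w, dw, cycles):
    """Apply `cycles` pairs of one up pulse followed by one down pulse."""
    for _ in range(cycles):
        w = analog_update(w, dw)
        w = analog_update(w, -dw)
    return w


if __name__ == "__main__":
    w = 0.5
    print(ideal_update(w, 0.1), ideal_update(w, -0.1))    # equal step sizes
    print(analog_update(w, 0.1), analog_update(w, -0.1))  # unequal step sizes
    print(alternate_pulses(0.8, 0.1, 200))                # drifts toward zero
```

On the ideal device an up pulse followed by a down pulse cancels exactly. Under the assumed response functions the down step is larger than the up step whenever w > 0, so alternating pulses pull the weight toward the point where the two responses balance, which is the kind of bias a training analysis has to account for.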


Near-Optimal Quantum Algorithms for Computing (Coarse) Correlated Equilibria of General-Sum Games

Neural Information Processing Systems

Computing Nash equilibria of zero-sum games in classical and quantum settings has been extensively studied. For general-sum games, computing Nash equilibria is PPAD-hard, and the computation of a more general concept called correlated equilibria has been widely explored in game theory. In this paper, we initiate the study of quantum algorithms for computing $\varepsilon$-approximate correlated equilibria (CE) and coarse correlated equilibria (CCE) in multi-player normal-form games. Our approach utilizes quantum improvements to the multi-scale Multiplicative Weight Update (MWU) method for CE calculations, achieving a query complexity of $\tilde{O}(m\sqrt{n})$ for fixed $\varepsilon$. For CCE, we extend techniques from quantum algorithms for zero-sum games to multi-player settings, achieving query complexity $\tilde{O}(m\sqrt{n}/\varepsilon^{2.5})$. Both algorithms demonstrate a near-optimal scaling in the number of players $m$ and actions $n$, as confirmed by our quantum query lower bounds.
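For readers unfamiliar with MWU, the following is a minimal classical sketch, not the quantum or multi-scale algorithm from the paper. It assumes a two-player game with utilities in [0, 1], lets each player run the standard multiplicative weights (Hedge) update against the other's current mixed strategy, and averages the resulting joint distributions, which is a well-known way to obtain an approximate CCE. The function names, the step size, and the example payoff matrices are all illustrative assumptions.

```python
import math


def normalize(v):
    total = sum(v)
    return [p / total for p in v]


def hedge_cce(A, B, T=2000):
    """A[i][j], B[i][j]: row and column player's utility in [0, 1]."""
    n1, n2 = len(A), len(A[0])
    eta = math.sqrt(math.log(max(n1, n2)) / T)  # assumed step size
    x = [1.0 / n1] * n1                         # row player's mixed strategy
    y = [1.0 / n2] * n2                         # column player's mixed strategy
    sigma = [[0.0] * n2 for _ in range(n1)]     # time-averaged joint distribution
    for _ in range(T):
        for i in range(n1):
            for j in range(n2):
                sigma[i][j] += x[i] * y[j] / T
        # Expected utility of each pure action against the opponent's strategy.
        ux = [sum(A[i][j] * y[j] for j in range(n2)) for i in range(n1)]
        uy = [sum(B[i][j] * x[i] for i in range(n1)) for j in range(n2)]
        # Multiplicative weight update: up-weight actions that did well.
        x = normalize([x[i] * math.exp(eta * ux[i]) for i in range(n1)])
        y = normalize([y[j] * math.exp(eta * uy[j]) for j in range(n2)])
    return sigma


def cce_gap(A, B, sigma):
    """Largest gain any player gets by deviating to one fixed action."""
    n1, n2 = len(A), len(A[0])
    row = [sum(sigma[i]) for i in range(n1)]
    col = [sum(sigma[i][j] for i in range(n1)) for j in range(n2)]
    val_a = sum(sigma[i][j] * A[i][j] for i in range(n1) for j in range(n2))
    val_b = sum(sigma[i][j] * B[i][j] for i in range(n1) for j in range(n2))
    dev_a = max(sum(A[i][j] * col[j] for j in range(n2)) for i in range(n1))
    dev_b = max(sum(B[i][j] * row[i] for i in range(n1)) for j in range(n2))
    return max(dev_a - val_a, dev_b - val_b)


if __name__ == "__main__":
    A = [[0.5, 0.0, 1.0], [1.0, 0.5, 0.0], [0.0, 1.0, 0.5]]
    B = [[0.2, 0.9, 0.1], [0.3, 0.4, 0.8], [0.7, 0.1, 0.6]]
    sigma = hedge_cce(A, B)
    print("CCE gap:", cce_gap(A, B, sigma))
```

The gap shrinks roughly like the square root of log(n)/T because it is bounded by each player's average regret. This sketch reads every payoff entry in every round, which is the per-round cost that query-efficient methods aim to reduce.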


FlowMoE: A Scalable Pipeline Scheduling Framework for Distributed Mixture-of-Experts Training

Neural Information Processing Systems

The parameter size of modern large language models (LLMs) can be scaled up to the trillion level via the sparsely activated Mixture-of-Experts (MoE) technique without an excessive increase in computational cost. To further improve training efficiency, pipelining computation and communication has become a promising solution for distributed MoE training. However, existing work primarily focuses on scheduling tasks within the MoE layer, such as expert computing and all-to-all (A2A) communication, while neglecting other key operations including multi-head attention (MHA) computing, gating, and all-reduce communication. In this paper, we propose FlowMoE, a scalable framework for scheduling multi-type task pipelines.


QCircuitBench: A Large-Scale Dataset for Benchmarking Quantum Algorithm Design

Neural Information Processing Systems

Quantum computing is an emerging field recognized for the significant speedup it offers over classical computing through quantum algorithms. However, designing and implementing quantum algorithms poses challenges due to the complex nature of quantum mechanics and the necessity for precise control over quantum states. Despite the significant advancements in AI, there has been a lack of datasets specifically tailored for this purpose. In this work, we introduce QCircuitBench, the first benchmark dataset designed to evaluate AI's capability in designing and implementing quantum algorithms in the form of quantum circuit codes. Unlike using AI for writing traditional codes, this task is fundamentally more complicated due to the highly flexible design space. Our key contributions include: 1. A general framework which formulates the key features of the quantum algorithm design task for Large Language Models. 2. Implementation for quantum algorithms from basic primitives to advanced applications, spanning 3 task suites, 25 algorithms, and 120,290 data points. 3.


Half of AI health answers are wrong even though they sound convincing – new study

AIHub

Imagine you have just been diagnosed with early-stage cancer and, before your next appointment, you type a question into an AI chatbot: "Which alternative clinics can successfully treat cancer?" Within seconds you get a polished, footnoted answer that reads like it was written by a doctor. Except some of the claims are unfounded, the footnotes lead nowhere, and the chatbot never once suggests that the question itself might be the wrong one to ask. That scenario is not hypothetical. It is, roughly speaking, what a team of seven researchers found when they put five of the world's most popular chatbots through a systematic health-information stress test. The results are published in BMJ Open.


Report on foundation model impacts released

AIHub

Partnership on AI has published a progress report on post-deployment governance practices pertaining to foundation models. The document, entitled "2026 Transparency Report on Foundation Model Impacts", measures the progress of 13 foundation model providers in publicly documenting the impacts of their foundation models. In carrying out their analysis, authors Jacob Pratt and Albert Tanjaya reviewed more than 150 papers, articles, websites, and reports. For assessment, these four practices were broken down into 19 processes, or activities, that support how foundation model providers adopt practices. Although several leading organizations are defining what information to share and how, the rest are slow in adopting information-sharing practices.



Distributed Deep Learning In Open Collaborations

Neural Information Processing Systems

Modern deep learning applications require increasingly more compute to train state-of-the-art models. To address this demand, large corporations and institutions use dedicated High-Performance Computing clusters, whose construction and maintenance are both environmentally costly and well beyond the budget of most organizations. As a result, some research directions become the exclusive domain of a few large industrial and even fewer academic actors. To alleviate this disparity, smaller groups may pool their computational resources and run collaborative experiments that benefit all participants. This paradigm, known as grid or volunteer computing, has seen successful applications in numerous scientific areas. However, using this approach for machine learning is difficult due to high latency, asymmetric bandwidth, and several challenges unique to volunteer computing. In this work, we carefully analyze these constraints and propose a novel algorithmic framework designed specifically for collaborative training. We demonstrate the effectiveness of our approach for SwAV and ALBERT pretraining in realistic conditions and achieve performance comparable to traditional setups at a fraction of the cost. Finally, we provide a detailed report of successful collaborative language model pretraining with 40 participants.


Differentiable Analog Quantum Computing for Optimization and Control

Neural Information Processing Systems

We formulate the first differentiable analog quantum computing framework with specific parameterization design at the analog signal (pulse) level to better exploit near-term quantum devices via variational methods. We further propose a scalable approach to estimate the gradients of quantum dynamics using a forward pass with Monte Carlo sampling, which leads to a quantum stochastic gradient descent algorithm for scalable gradient-based training in our framework. Applying our framework to quantum optimization and control, we observe a significant advantage of differentiable analog quantum computing against SOTAs based on parameterized digital quantum circuits by orders of magnitude.