

A Outline and notation

Neural Information Processing Systems

We start by choosing the positional embedding E and constructing a Transformer network that implements quantization of the input, contextual mapping of the quantized input, and value mapping of the context ids. 1. Choose the positional embedding E according to γ in Assumption 1.2.
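The quantization step above can be sketched as follows. This is a minimal illustration of snapping continuous inputs to a uniform grid so that each entry gets a discrete id for the later contextual mapping; the grid width `delta` and the id scheme are our assumptions, not the construction's exact choice.

```python
import numpy as np

def quantize(X, delta=0.25):
    """Map each entry of X in [0, 1) to the integer id of its grid cell
    of width delta, so downstream layers operate on discrete ids."""
    return np.floor(np.asarray(X) / delta).astype(int)

# Each value in [0, 1) is replaced by the index of the cell it falls in.
ids = quantize([[0.1, 0.3], [0.6, 0.9]])
```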


Efficient Foundation Models for PDEs

Neural Information Processing Systems

It is based on a multiscale operator transformer, with time-conditioned layer norms that enable continuous-in-time evaluations. A novel training strategy leveraging the semi-group property of time-dependent PDEs to allow for significant scaling-up of the training data is also proposed.
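The semi-group property mentioned above, S_{t+s} = S_s ∘ S_t for the solution operator of a time-dependent PDE, implies that any ordered pair of snapshots along one trajectory is a valid (input, target, time-gap) training triple. The sketch below shows this data-scaling idea under that reading; it is our paraphrase, not the paper's code.

```python
def semigroup_pairs(snapshots, times):
    """Return all (input, target, dt) training triples with dt > 0.
    Any later snapshot is a valid target for any earlier one, because
    the solution operator of the PDE forms a semi-group in time."""
    triples = []
    for i in range(len(snapshots)):
        for j in range(i + 1, len(snapshots)):
            triples.append((snapshots[i], snapshots[j], times[j] - times[i]))
    return triples

# A trajectory with T snapshots yields T*(T-1)/2 triples instead of T-1
# pairs anchored at t = 0, which is where the data scaling comes from.
triples = semigroup_pairs(["u0", "u1", "u2", "u3"], [0.0, 0.1, 0.2, 0.3])
```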


Tighter Convergence Bounds for Shuffled SGD via Primal-Dual Perspective

Neural Information Processing Systems

Stochastic gradient descent (SGD) is perhaps the most prevalent optimization method in modern machine learning. Contrary to the empirical practice of sampling from the dataset without replacement and with (possible) reshuffling at each epoch, the theoretical counterpart of SGD usually relies on the assumption of sampling with replacement. It is only very recently that SGD using sampling without replacement - shuffled SGD - has been analyzed with matching upper and lower bounds. However, we observe that those bounds are too pessimistic to explain the often superior empirical performance of data permutations (sampling without replacement) over vanilla counterparts (sampling with replacement) on machine learning problems. Through fine-grained analysis through the lens of primal-dual cyclic coordinate methods and the introduction of novel smoothness parameters, we present several results for shuffled SGD on smooth and non-smooth convex losses, where our novel analysis framework provides tighter convergence bounds over all popular shuffling schemes (IG, SO, and RR). Notably, our new bounds predict faster convergence than existing bounds in the literature - by up to a factor of O(√n), mirroring benefits from tighter convergence bounds using component smoothness parameters in randomized coordinate methods. Lastly, we numerically demonstrate on common machine learning datasets that our bounds are indeed much tighter, thus offering a bridge between theory and practice.
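The two sampling schemes the bounds compare can be sketched as follows. This is a generic illustration of random reshuffling (RR) versus with-replacement sampling, not the paper's analysis; `grad_fn(w, i)` returning the i-th component gradient is a placeholder.

```python
import random

def sgd_epoch_rr(w, grad_fn, n, lr, rng):
    """One epoch of shuffled SGD (Random Reshuffling): every component
    gradient is visited exactly once, in a fresh random order."""
    order = list(range(n))
    rng.shuffle(order)
    for i in order:
        w = w - lr * grad_fn(w, i)
    return w

def sgd_epoch_with_replacement(w, grad_fn, n, lr, rng):
    """n steps of vanilla SGD: indices drawn i.i.d., so some components
    may be visited repeatedly and others skipped within the epoch."""
    for _ in range(n):
        i = rng.randrange(n)
        w = w - lr * grad_fn(w, i)
    return w

# Toy least-squares components: f_i(w) = 0.5 * (w - t[i])**2.
t = [1.0, 2.0, 3.0, 4.0]
grad = lambda w, i: w - t[i]
w_rr = sgd_epoch_rr(0.0, grad, len(t), 0.1, random.Random(0))
```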


ADOPT: Modified Adam Can Converge with Any β₂ with the Optimal Rate
Keno Harada, The University of Tokyo

Neural Information Processing Systems

Adam is one of the most popular optimization algorithms in deep learning. However, it is known that Adam does not converge in theory unless the hyperparameter β₂ is chosen in a problem-dependent manner.
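To make the role of β₂ concrete, here is textbook Adam for a scalar parameter; β₂ is the decay rate of the second-moment estimate v, and it is this quantity whose problem-dependent choice the convergence theory above concerns. This is plain Adam for illustration, not the ADOPT modification itself.

```python
import math

def adam_step(theta, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter theta with gradient g.
    t is the 1-based step count used for bias correction."""
    m = beta1 * m + (1 - beta1) * g        # first moment (momentum)
    v = beta2 * v + (1 - beta2) * g * g    # second moment, governed by beta2
    m_hat = m / (1 - beta1 ** t)           # bias-corrected estimates
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v
```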


84ca3f2d9d9bfca13f69b48ea63eb4a5-Paper-Conference.pdf

Neural Information Processing Systems

Motivated by the neuromorphic principles that regulate biological neural behaviors, PPLNs are ideal for processing data captured by event cameras, which are built to simulate neural activities in the human retina. We discuss how to represent the membrane potential of an artificial neuron by a parametric piecewise linear function with learnable coefficients. This design echoes the idea of building deep models from learnable parametric functions recently popularized by Kolmogorov-Arnold Networks (KANs). Experiments demonstrate the state-of-the-art performance of PPLNs in event-based and image-based vision applications, including steering prediction, human pose estimation, and motion deblurring.
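The basic building block described above, a membrane potential represented by a piecewise linear function with learnable coefficients, can be sketched as a function of time with learnable values at fixed knots, evaluated by linear interpolation. The knot placement and parameterization here are assumptions for illustration, not PPLN's exact design.

```python
import numpy as np

class PiecewiseLinear:
    """Piecewise linear function of time: fixed knot times, learnable
    knot values, evaluated by linear interpolation between knots."""
    def __init__(self, knots_t, knots_v):
        self.t = np.asarray(knots_t, dtype=float)  # fixed knot times
        self.v = np.asarray(knots_v, dtype=float)  # learnable values

    def __call__(self, t):
        # np.interp performs piecewise linear interpolation between knots.
        return np.interp(t, self.t, self.v)

# A toy "membrane potential": rises to 1.0 at t=1, decays to 0.5 at t=2.
membrane = PiecewiseLinear([0.0, 1.0, 2.0], [0.0, 1.0, 0.5])
```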


9ec51f6eb240fb631a35864e13737bca-AuthorFeedback.pdf

Neural Information Processing Systems

We thank all reviewers for their careful reading of the paper, thoughtful feedback, and constructive suggestions. Each reviewer's major comments are addressed below. Reviewer 1: Thanks for your time and effort devoted to reviewing our submission, as well as for the positive comments. Distinct novelties relative to Ref. [31] are: i) Algorithm: the present submission develops [...]. Following your suggestion, [31] will be discussed more thoroughly in the revised paper. We respectfully disagree that it "makes more sense to take a decaying step-size." Due to space limitations, the focus of this paper was placed on analysis under both IID and Markovian data.


Physics-informed Neural Networks for Functional Differential Equations: Cylindrical Approximation and Its Convergence Guarantees

Neural Information Processing Systems

We propose the first learning scheme for functional differential equations (FDEs). FDEs play a fundamental role in physics, mathematics, and optimal control. However, the numerical analysis of FDEs has faced challenges due to their prohibitive computational costs and has remained a long-standing problem for decades. Thus, numerical approximations of FDEs have been developed, but they often oversimplify the solutions. To tackle these two issues, we propose a hybrid approach combining physics-informed neural networks (PINNs) with the cylindrical approximation. The cylindrical approximation expands functions and functional derivatives in an orthonormal basis and transforms FDEs into high-dimensional PDEs. To validate the reliability of the cylindrical approximation for FDE applications, we prove convergence theorems for the approximated functional derivatives and solutions. Then, the derived high-dimensional PDEs are numerically solved with PINNs. Through the capabilities of PINNs, our approach can handle a broader class of functional derivatives more efficiently than conventional discretization-based methods, improving the scalability of the cylindrical approximation.
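The expansion step behind the cylindrical approximation, representing a function by finitely many coefficients in an orthonormal basis, can be sketched as follows. The basis (normalized Legendre polynomials on [-1, 1]) and the truncation level are illustrative assumptions, not the paper's specific choices.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 20001)  # quadrature grid on [-1, 1]

def trapezoid(y, x):
    """Trapezoid-rule integral of samples y over a uniform grid x."""
    dx = x[1] - x[0]
    return float((y[0] + y[-1]) * dx / 2 + y[1:-1].sum() * dx)

def legendre_basis(n, x):
    """n-th Legendre polynomial, normalized to unit L2 norm on [-1, 1]."""
    c = np.zeros(n + 1)
    c[n] = 1.0
    return np.sqrt((2 * n + 1) / 2.0) * np.polynomial.legendre.legval(x, c)

def expand(f_vals, num_terms):
    """Project f onto the first num_terms orthonormal basis functions."""
    return [trapezoid(f_vals * legendre_basis(n, x), x) for n in range(num_terms)]

# f(x) = x^2 has only even-degree Legendre components.
coeffs = expand(x**2, 4)
```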


How practical AI prevailed over hype at Red Hat Summit 2025

ZDNet

At the Red Hat Summit and Ansible Fest in Boston this month, much of the hype and overpromising about generative AI took a back seat to conversations about how organizations can actually build and deploy AI for their own business using their own data. Of course, this being a Red Hat Summit, there was plenty of focus on core topics like open source, with the release of Red Hat Enterprise Linux 10, and on automation and management with Ansible. AI still took up much of the attention at the conference, but at least much of it was refreshingly, critically practical. Rather than the more hyped AI areas such as AI assistants, which a recent Aberdeen/ZDNet poll found to be of limited interest to a majority of users, most of the sessions and even the major announcements focused on technologies and strategies that businesses can use today to get the most out of AI while leveraging their own data in a secure and efficient manner. For example, there was a great deal of focus on inferencing, the process of running an AI model on new data to make predictions or decisions. Announcements of technologies such as vLLM and llm-d provide improved scaling and deployment options that simplify the complexities of inferencing while spreading compute loads.


EEG2Video: Towards Decoding Dynamic Visual Perception from EEG Signals

Neural Information Processing Systems

Our visual experience in daily life is dominated by dynamic change. Decoding such dynamic information from brain activity can enhance our understanding of the brain's visual processing system. However, previous studies predominantly focus on reconstructing static visual stimuli. In this paper, we explore decoding dynamic visual perception from electroencephalography (EEG), a neuroimaging technique able to record brain activity with high temporal resolution (1000 Hz) to capture rapid changes in the brain. Our contributions are threefold: First, we develop a large dataset recording signals from 20 subjects while they were watching 1400 dynamic video clips of 40 concepts. This dataset fills the gap caused by the lack of EEG-video pairs. Second, we annotate each video clip to investigate the potential for decoding specific meta-information (e.g., color, dynamics, human or not) from EEG. Third, we propose EEG2Video, a novel baseline for video reconstruction from EEG signals that better aligns dynamic movements with high-temporal-resolution brain signals via a Seq2Seq architecture. EEG2Video achieves a 2-way accuracy of 79.8% on semantic classification tasks and a structural similarity index (SSIM) of 0.256. Overall, our work takes an important step towards decoding dynamic visual perception from EEG signals.
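The 2-way accuracy reported above can be sketched as follows, under the standard protocol as we read it (not necessarily EEG2Video's exact evaluation code): a decoded feature scores a win when it is closer to its ground-truth target than to one randomly drawn distractor target.

```python
import numpy as np

def two_way_accuracy(preds, targets, rng):
    """Fraction of predictions closer to their true target than to one
    randomly chosen distractor target (chance level is 0.5)."""
    n = len(preds)
    wins = 0
    for i in range(n):
        j = rng.choice([k for k in range(n) if k != i])  # random distractor
        d_true = np.linalg.norm(preds[i] - targets[i])
        d_distr = np.linalg.norm(preds[i] - targets[j])
        wins += d_true < d_distr
    return wins / n

# Synthetic check: a near-perfect decoder should score close to 1.0.
rng = np.random.default_rng(0)
targets = rng.standard_normal((50, 8))
preds = targets + 0.1 * rng.standard_normal((50, 8))
acc = two_way_accuracy(preds, targets, rng)
```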


DISP-LLM: Dimension-Independent Structural Pruning for Large Language Models

Neural Information Processing Systems

Large Language Models (LLMs) have achieved remarkable success in various natural language processing tasks, including language modeling, understanding, and generation. However, the increased memory and computational costs associated with these models pose significant challenges for deployment on resource-limited devices. Structural pruning has emerged as a promising solution to reduce the costs of LLMs without requiring post-processing steps. Prior structural pruning methods either follow structural dependence at the cost of limited flexibility, or introduce non-trivial additional parameters by incorporating different projection matrices. In this work, we propose a novel approach that relaxes the constraint imposed by regular structural pruning methods and eliminates the structural dependence along the embedding dimension.