AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.60)

Neural Information Processing Systems, Dec-24-2025, 08:43:08 GMT

Qimera: Data-free Quantization with Synthetic Boundary Supporting Samples

Model quantization is known as a promising method to compress deep neural networks, especially for inference on lightweight mobile or edge devices. However, model quantization usually requires access to the original training data to maintain the accuracy of the full-precision models, which is often infeasible in real-world scenarios because of security and privacy concerns. A popular approach to perform quantization without access to the original data is to use synthetically generated samples, based on batch-normalization statistics or adversarial learning. However, the drawback of such approaches is that they primarily rely on random noise input to the generator to attain diversity of the synthetic samples. We find that this is often insufficient to capture the distribution of the original data, especially around the decision boundaries. To this end, we propose Qimera, a method that uses superposed latent embeddings to generate synthetic boundary supporting samples. For the superposed embeddings to better reflect the original distribution, we also propose using an additional disentanglement mapping layer and extracting information from the full-precision model. The experimental results show that Qimera achieves state-of-the-art performance for various settings on data-free quantization.
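As a rough illustration of what "superposed latent embeddings" could mean, the sketch below forms a weighted sum of two per-class embedding vectors. The embedding table, its values, the weights, and the function name `superpose` are all hypothetical; the abstract does not give Qimera's exact formulation, and the generator, the disentanglement mapping layer, and the full-precision model are not modelled here.

```python
def superpose(embeddings, class_ids, weights):
    """Return the weighted sum of the embeddings of the given classes.

    Assumption: a "superposed" embedding is a weighted mix of per-class
    embeddings, so it lies between classes and could be fed to a generator
    to produce samples near a decision boundary.
    """
    if len(class_ids) != len(weights):
        raise ValueError("need one weight per class id")
    dim = len(embeddings[class_ids[0]])
    mixed = [0.0] * dim
    for class_id, weight in zip(class_ids, weights):
        for i, value in enumerate(embeddings[class_id]):
            mixed[i] += weight * value
    return mixed


# Hypothetical 2-D class embeddings for two classes.
class_embeddings = {0: [1.0, 0.0], 1: [0.0, 1.0]}

# An equal mix of class 0 and class 1 sits halfway between the two.
boundary_embedding = superpose(class_embeddings, [0, 1], [0.5, 0.5])
print(boundary_embedding)
```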

data-free quantization, name change, synthetic boundary, (6 more...)

Genre: Research Report (0.60)

Industry: Information Technology > Security & Privacy (0.60)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.60)

Neural Information Processing Systems, Dec-24-2025, 01:44:02 GMT

Learning from Few Samples: Transformation-Invariant SVMs with Composition and Locality at Multiple Scales

Motivated by the problem of learning with small sample sizes, this paper shows how to incorporate into support-vector machines (SVMs) those properties that have made convolutional neural networks (CNNs) successful. Particularly important is the ability to incorporate domain knowledge of invariances, e.g., translational invariance of images. Kernels based on the maximum similarity over a group of transformations are not generally positive definite. Perhaps it is for this reason that they have not been studied theoretically. We address this lacuna and show that positive definiteness indeed holds with high probability for kernels based on the maximum similarity in the small training sample set regime of interest, and that they do yield the best results in that regime. We also show how additional properties such as their ability to incorporate local features at multiple spatial scales, e.g., as done in CNNs through max pooling, and to provide the benefits of composition through the architecture of multiple layers, can also be embedded into SVMs. We verify through experiments on widely available image sets that the resulting SVMs do provide superior accuracy in comparison to well-established deep neural network benchmarks for small sample sizes.
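A kernel based on the maximum similarity over a group of transformations can be sketched as follows. This is a toy version under stated assumptions: the inputs are short 1-D signals, the transformation group is the set of cyclic shifts, and the base similarity is an RBF kernel with a hypothetical `gamma`. The paper's actual kernels, image transformations, and multi-scale construction are not reproduced.

```python
import math


def cyclic_shifts(signal):
    """Yield every cyclic translation of a 1-D signal (the toy group)."""
    for shift in range(len(signal)):
        yield signal[shift:] + signal[:shift]


def rbf(x, y, gamma=0.5):
    """Plain RBF similarity between two equal-length signals."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def max_similarity_kernel(x, y, gamma=0.5):
    """Best RBF similarity between x and any cyclic shift of y."""
    return max(rbf(x, shifted, gamma) for shifted in cyclic_shifts(y))


def gram_matrix(samples, gamma=0.5):
    """Pairwise kernel values; not guaranteed positive definite in general."""
    return [[max_similarity_kernel(a, b, gamma) for b in samples]
            for a in samples]


samples = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],  # a translated copy of the first signal
    [1.0, 1.0, 0.0, 0.0],
]
K = gram_matrix(samples)
for row in K:
    print(row)
```

Because the first two signals differ only by a translation, the kernel treats them as identical, which is the invariance the abstract describes. Whether a given Gram matrix is positive definite would still have to be checked, for example through its eigenvalues.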

composition and locality, name change, transformation-invariant svm, (10 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.60)

Neural Information Processing Systems, May-27-2025, 11:07:54 GMT

Improving Adversarial Robust Fairness via Anti-Bias Soft Label Distillation

Adversarial Training (AT) has been widely shown to be an effective method for improving the robustness of Deep Neural Networks (DNNs) against adversarial examples. As a variant of AT, Adversarial Robustness Distillation (ARD) has demonstrated superior performance in improving the robustness of small student models with the guidance of large teacher models. However, both AT and ARD encounter the robust fairness problem: these models exhibit strong robustness on some classes (easy classes) but weak robustness on others (hard classes). In this paper, we give an in-depth analysis of the potential factors and argue, from both empirical observation and theoretical analysis, that the smoothness degree of samples' soft labels for different classes (i.e., hard class or easy class) affects the robust fairness of DNNs. Based on this finding, we propose an Anti-Bias Soft Label Distillation (ABSLD) method to mitigate the adversarial robust fairness problem within the framework of Knowledge Distillation (KD). Specifically, ABSLD adaptively reduces the student's error risk gap between different classes to achieve fairness by adjusting the class-wise smoothness degree of samples' soft labels during the training process, and the smoothness degree of soft labels is controlled by assigning different temperatures in KD to different classes.
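The link between temperature and soft-label smoothness can be illustrated with a temperature-scaled softmax. This is only a minimal sketch: the logits, the per-class temperature table, and the helper names are hypothetical, and the adaptive rule ABSLD uses to update temperatures during training is not given in the abstract and is not implemented here.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature; larger temperatures give smoother labels."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probabilities):
    """Shannon entropy, used here as a simple smoothness measure."""
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)


# Hypothetical temperatures keyed by the sample's ground-truth class.
class_temperature = {0: 1.0, 1: 4.0}


def soft_label(teacher_logits, true_class):
    """Soft label whose smoothness depends on the sample's class."""
    return softmax_with_temperature(teacher_logits, class_temperature[true_class])


teacher_logits = [4.0, 1.0, 0.0]  # made-up teacher output
sharp = soft_label(teacher_logits, true_class=0)
smooth = soft_label(teacher_logits, true_class=1)
print(sharp, entropy(sharp))
print(smooth, entropy(smooth))
```

The same logits yield a flatter distribution under the larger temperature, which is the knob the abstract says is assigned per class.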

artificial intelligence, distillation, machine learning, (8 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.61)

arXiv.org Artificial Intelligence, May-13-2025

Attention Mechanisms in Dynamical Systems: A Case Study with Predator-Prey Models

Balaban, David

Attention mechanisms are widely used in artificial intelligence to enhance performance and interpretability. In this paper, we investigate their utility in modeling classical dynamical systems -- specifically, a noisy predator-prey (Lotka-Volterra) system. We train a simple linear attention model on perturbed time-series data to reconstruct system trajectories. Remarkably, the learned attention weights align with the geometric structure of the Lyapunov function: high attention corresponds to flat regions (where perturbations have small effect), and low attention aligns with steep regions (where perturbations have large effect). We further demonstrate that attention-based weighting can serve as a proxy for sensitivity analysis, capturing key phase-space properties without explicit knowledge of the system equations. These results suggest a novel use of AI-derived attention for interpretable, data-driven analysis and control of nonlinear systems. For example, our framework could support future work in biological modeling of circadian rhythms and in interpretable machine learning for dynamical environments.
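For readers unfamiliar with the Lyapunov function of a predator-prey system, the sketch below integrates the noise-free Lotka-Volterra equations and evaluates the standard conserved quantity V(x, y) = delta*x - gamma*ln(x) + beta*y - alpha*ln(y) along the trajectory. The parameter values, step size, and initial state are hypothetical, and it is an assumption that the paper uses this textbook form; the noise and the attention model are not included.

```python
import math

# Hypothetical parameters for dx/dt = ALPHA*x - BETA*x*y, dy/dt = DELTA*x*y - GAMMA*y.
ALPHA, BETA, DELTA, GAMMA = 1.0, 0.5, 0.2, 0.6


def derivative(x, y):
    """Right-hand side of the noise-free Lotka-Volterra equations."""
    return ALPHA * x - BETA * x * y, DELTA * x * y - GAMMA * y


def rk4_step(x, y, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1x, k1y = derivative(x, y)
    k2x, k2y = derivative(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y)
    k3x, k3y = derivative(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y)
    k4x, k4y = derivative(x + dt * k3x, y + dt * k3y)
    return (x + dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6,
            y + dt * (k1y + 2 * k2y + 2 * k3y + k4y) / 6)


def lyapunov(x, y):
    """Textbook conserved quantity of the system; minimal at the equilibrium."""
    return DELTA * x - GAMMA * math.log(x) + BETA * y - ALPHA * math.log(y)


x, y = 2.0, 1.0
v0 = lyapunov(x, y)
max_drift = 0.0
for _ in range(2000):
    x, y = rk4_step(x, y, 0.01)
    max_drift = max(max_drift, abs(lyapunov(x, y) - v0))

print("largest change in V along the trajectory:", max_drift)
print("V at the equilibrium:", lyapunov(GAMMA / DELTA, ALPHA / BETA))
```

Without noise, V stays essentially constant along each orbit, so its level sets trace the closed trajectories; flat and steep regions of V are then the regions where a perturbation moves the state a little or a lot between orbits.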

artificial intelligence, attention mechanism, machine learning, (15 more...)

2505.06503

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)

Shi, Yaozhong, Gao, Angela F., Ross, Zachary E., Azizzadenesheli, Kamyar

Universal Functional Regression with Neural Operator Flows

arXiv.org Machine Learning, Apr-3-2024

The notion of inference on function spaces is essential to the physical sciences and engineering, where the governing equations are frequently partial differential equations (PDEs) describing the evolution of functions in space and time. In particular, it is often desirable to infer the values of a function everywhere in a physical domain given a sparse number of observation points. There are numerous types of problems in which functional regression plays an important role, such as inverse problems, time series forecasting, and data imputation/assimilation. Functional regression problems can be particularly challenging for real-world datasets because the underlying stochastic process is often unknown. Much of the work on functional regression and inference has relied on Gaussian processes (GPs) (Rasmussen and Williams, 2006), a specific type of stochastic process in which any finite collection of points has a multivariate Gaussian distribution. Some of the earliest applications focused on analyzing geological data, such as the locations of valuable ore deposits, to identify where new deposits might be found (Chiles and Delfiner, 2012). GP regression (GPR) provides several advantages for functional inference, including robustness and mathematical tractability for various problems. This has led to the use of GPR in an assortment of scientific and engineering fields, where precision and reliability in predictions and inferences can significantly impact outcomes (Deringer et al., 2021; Aigrain and Foreman-Mackey, 2023). Despite widespread adoption, the assumption of a GP prior for functional inference problems can be rather limiting, particularly in scenarios where the data exhibit heavy-tailed or multimodal distributions, e.g.

artificial intelligence, bayesian inference, machine learning, (17 more...)

2404.02986

Country:

South America > Chile (0.24)
North America > United States > California (0.14)
North America > United States > Michigan (0.14)
Asia > Japan (0.14)

Genre: Research Report (1.00)

Industry:

Energy > Oil & Gas > Upstream (1.00)
Materials (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

arXiv.org Artificial Intelligence, Dec-25-2023

GenCast: Diffusion-based ensemble forecasting for medium-range weather

Price, Ilan, Sanchez-Gonzalez, Alvaro, Alet, Ferran, Ewalds, Timo, El-Kadi, Andrew, Stott, Jacklynn, Mohamed, Shakir, Battaglia, Peter, Lam, Remi, Willson, Matthew

Probabilistic weather forecasting is critical for decision-making in high-impact domains such as flood forecasting, energy system planning or transportation routing, where quantifying the uncertainty of a forecast -- including probabilities of extreme events -- is essential to guide important cost-benefit trade-offs and mitigation measures. Traditional probabilistic approaches rely on producing ensembles from physics-based models, which sample from a joint distribution over spatio-temporally coherent weather trajectories, but are expensive to run. An efficient alternative is to use a machine learning (ML) forecast model to generate the ensemble; however, state-of-the-art ML forecast models for medium-range weather are largely trained to produce deterministic forecasts which minimise mean-squared-error. Despite improving skill scores, they lack physical consistency, a limitation that grows at longer lead times and impacts their ability to characterize the joint distribution. We introduce GenCast, an ML-based generative model for ensemble weather forecasting, trained from reanalysis data. It forecasts ensembles of trajectories for 84 weather variables, for up to 15 days at 1 degree resolution globally, taking around a minute per ensemble member on a single Cloud TPU v4 device. We show that GenCast is more skillful than ENS, a top operational ensemble forecast, for more than 96% of all 1320 verification targets on CRPS and Ensemble-Mean RMSE, while maintaining good reliability and physically consistent power spectra. Together our results demonstrate that ML-based probabilistic weather forecasting can now outperform traditional ensemble systems at 1 degree, opening new doors to skillful, fast weather forecasts that are useful in key applications.
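The two verification metrics named above can be illustrated for a single scalar variable. The sketch uses the common empirical ensemble estimator CRPS = mean|x_i - y| - 0.5 * mean|x_i - x_j|; whether GenCast's evaluation uses this estimator or a bias-corrected variant is not stated in the abstract, and the ensemble values and observation below are made up.

```python
import math


def ensemble_crps(members, observation):
    """Empirical CRPS of an ensemble for one scalar observation (lower is better)."""
    count = len(members)
    # Average distance of the members from the observation.
    skill = sum(abs(x - observation) for x in members) / count
    # Average distance between all pairs of members (the ensemble spread).
    spread = sum(abs(a - b) for a in members for b in members) / (count * count)
    return skill - 0.5 * spread


def ensemble_mean_rmse(ensembles, observations):
    """RMSE of the ensemble mean over several forecast cases."""
    squared_errors = []
    for members, observation in zip(ensembles, observations):
        mean = sum(members) / len(members)
        squared_errors.append((mean - observation) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up temperature forecasts (three ensemble members) for two cases.
ensembles = [[14.0, 15.0, 17.0], [9.0, 10.0, 11.0]]
observations = [15.5, 10.0]

for members, observation in zip(ensembles, observations):
    print(ensemble_crps(members, observation))
print(ensemble_mean_rmse(ensembles, observations))
```

For a one-member ensemble the spread term vanishes and CRPS reduces to the absolute error, which is why CRPS can compare deterministic and probabilistic forecasts on the same scale.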

artificial intelligence, machine learning, modeling & simulation, (14 more...)

2312.15796

Country:

Europe > United Kingdom > England (0.14)
North America > United States (0.14)

Genre: Research Report > New Finding (0.54)

Industry: Energy > Oil & Gas > Upstream (0.68)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial Intelligence, Nov-18-2023

Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking

Krasowski, Hanna, Thumm, Jakob, Müller, Marlon, Schäfer, Lukas, Wang, Xiao, Althoff, Matthias

Ensuring the safety of reinforcement learning (RL) algorithms is crucial to unlock their potential for many real-world tasks. However, vanilla RL and most safe RL approaches do not guarantee safety. In recent years, several methods have been proposed to provide hard safety guarantees for RL, which is essential for applications where unsafe actions could have disastrous consequences. Nevertheless, there is no comprehensive comparison of these provably safe RL methods. Therefore, we introduce a categorization of existing provably safe RL methods, present the conceptual foundations for both continuous and discrete action spaces, and empirically benchmark existing methods. We categorize the methods based on how they adapt the action: action replacement, action projection, and action masking. Our experiments on an inverted pendulum and a quadrotor stabilization task indicate that action replacement is the best-performing approach for these applications despite its comparatively simple realization. Furthermore, adding a reward penalty every time the safety verification is engaged improved training performance in our experiments. Finally, we provide practical guidance on selecting provably safe RL approaches depending on the safety specification, RL algorithm, and type of action space.
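The three categories can be sketched in a few lines each. This is a simplified illustration, assuming the safe set is already known: a 1-D interval for the continuous case and a boolean mask for the discrete case. The function names, the fallback action, and the numbers are hypothetical; in the surveyed methods the safe set comes from a formal safety verification, which is not modelled here.

```python
def action_replacement(action, is_safe, fallback_action):
    """Keep the agent's action if it is verified safe, else use a safe fallback."""
    return action if is_safe(action) else fallback_action


def action_projection(action, low, high):
    """Map the action to the closest safe action; for an interval this is clipping."""
    return min(max(action, low), high)


def action_masking(probabilities, safe_mask):
    """Restrict a discrete policy to safe actions and renormalize."""
    masked = [p if safe else 0.0 for p, safe in zip(probabilities, safe_mask)]
    total = sum(masked)
    if total == 0.0:
        raise ValueError("no safe action has positive probability")
    return [p / total for p in masked]


def torque_is_safe(torque):
    """Hypothetical safe set for a 1-D torque command: the interval [-1, 1]."""
    return -1.0 <= torque <= 1.0


proposed = 1.7  # unsafe action proposed by the agent
print(action_replacement(proposed, torque_is_safe, fallback_action=0.0))
print(action_projection(proposed, low=-1.0, high=1.0))
print(action_masking([0.5, 0.3, 0.2], [True, False, True]))
```

Replacement and projection correct an action after the agent has chosen it, while masking changes what the agent can choose in the first place.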

action replacement, machine learning, reinforcement learning, (16 more...)

2205.0675

Country:

Europe > Germany (0.14)
North America > United States (0.14)

Genre: Research Report > New Finding (0.87)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)
Energy > Oil & Gas (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence, Feb-24-2023

Adapting Pre-trained Language Models for Quantum Natural Language Processing

Li, Qiuchi, Wang, Benyou, Zhu, Yudong, Lioma, Christina, Liu, Qun

The emerging classical-quantum transfer learning paradigm has brought decent performance to quantum computational models in many tasks, such as computer vision, by enabling a combination of quantum models and classical pre-trained neural networks. However, using quantum computing with pre-trained models has yet to be explored in natural language processing (NLP). Due to the high linearity constraints of the underlying quantum computing infrastructures, existing Quantum NLP models are limited in performance on real tasks. We fill this gap by pretraining a sentence state with a complex-valued BERT-like architecture and adapting it to the classical-quantum transfer learning scheme for sentence classification. On quantum simulation experiments, the pre-trained representation can bring 50% to 60% increases to the capacity of end-to-end quantum models. Quantum computing combines quantum mechanics and computer science. The concepts of superposition and entanglement bring inherent parallelism between qubits, the basic computational elements, endowing quantum devices with enormous computational power. Classical-quantum transfer learning (Mari et al., 2020) has emerged as an appealing quantum machine learning technique.

artificial intelligence, machine learning, natural language, (21 more...)

2302.13812

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(6 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

#artificialintelligence, Apr-6-2022, 11:10:24 GMT

GitHub - deel-ai/xplique: 👋 Xplique is a Neural Networks Explainability Toolbox

Xplique (pronounced \ɛks.plik\) is a Python toolkit dedicated to explainability, currently based on TensorFlow. The goal of this library is to gather the state of the art of Explainable AI to help you understand your complex neural network models. The library is composed of several modules. The Attributions Methods module implements various methods (e.g., Saliency, Grad-CAM, Integrated-Gradients), with explanations, examples, and links to official papers. The Feature Visualization module lets you see how neural networks build their understanding of images by finding inputs that maximize neurons, channels, layers, or compositions of these elements. The Concepts module allows you to extract human concepts from a model and to test their usefulness with respect to a class.

library, neural network explainability toolbox, xplique, (10 more...)

Country: Europe > France > Occitanie > Haute-Garonne > Toulouse (0.06)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)