AITopics | pvq

New Insights and Algorithms for Optimal Diagonal Preconditioning

Ghadimi, Saeed, Jung, Woosuk L., Sujanani, Arnesh, Torregrosa-Belén, David, Wolkowicz, Henry

arXiv.org Artificial Intelligence, Sep-30-2025

Preconditioning (scaling) is essential in many areas of mathematics, and in particular in optimization. In this work, we study the problem of finding an optimal diagonal preconditioner. We focus on minimizing two different notions of condition number: the classical, worst-case type, $κ$-condition number, and the more averaging-motivated $ω$-condition number. We provide affine-based pseudoconvex reformulations of both optimization problems. The advantage of our formulations is that the gradient of the objective is inexpensive to compute and the optimization variable is just an $n\times 1$ vector. We also provide elegant characterizations of the optimality conditions of both problems. We develop a competitive subgradient method, with convergence guarantees, for $κ$-optimal diagonal preconditioning that scales much better and is more efficient than existing SDP-based approaches. We also show that the preconditioners found by our subgradient method lead to better PCG performance for solving linear systems than other approaches. Finally, we show the interesting phenomenon that we can apply the $ω$-optimal preconditioner to the exact $κ$-optimally diagonally preconditioned matrix $A$ and get consistent, significantly improved convergence results for PCG methods.
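To make the two condition numbers in this abstract concrete, here is a minimal NumPy sketch. It assumes a symmetric positive definite matrix and takes the $ω$-condition number to be the arithmetic mean of the eigenvalues divided by their geometric mean, which is the usual definition in the literature but is not spelled out in the abstract. The function names are hypothetical, and the Jacobi (unit-diagonal) scaling is only a familiar baseline, not the subgradient method the authors propose.

```python
import numpy as np

def kappa(A):
    """Classical condition number of a symmetric positive definite matrix:
    largest eigenvalue divided by smallest eigenvalue."""
    eig = np.linalg.eigvalsh(A)  # eigenvalues in ascending order
    return eig[-1] / eig[0]

def omega(A):
    """Assumed definition of the omega-condition number:
    arithmetic mean of the eigenvalues over their geometric mean."""
    eig = np.linalg.eigvalsh(A)
    return eig.mean() / np.exp(np.log(eig).mean())

def jacobi_scale(A):
    """Symmetric diagonal scaling D A D with D = diag(A)^(-1/2).
    This is a simple baseline, not the paper's optimal preconditioner."""
    d = 1.0 / np.sqrt(np.diag(A))
    return d[:, None] * A * d[None, :]

if __name__ == "__main__":
    A = np.array([[4.0, 1.0], [1.0, 9.0]])
    S = jacobi_scale(A)
    print("kappa before/after:", kappa(A), kappa(S))
    print("omega before/after:", omega(A), omega(S))
```

Both quantities equal 1 for the identity matrix and grow as the spectrum spreads out; $κ$ looks only at the two extreme eigenvalues, while $ω$ averages over all of them.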

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv.org Artificial Intelligence

2509.23439

Country: North America > Canada > Ontario (0.28)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.86)

Pyramid Vector Quantization for LLMs

van der Ouderaa, Tycho F. A., Croci, Maximilian L., Hilmkil, Agrin, Hensman, James

arXiv.org Artificial Intelligence, Dec-4-2024

Recent works on compression of large language models (LLMs) using quantization considered reparameterizing the architecture such that weights are distributed on the sphere. This demonstrably improves the ability to quantize by increasing the mathematical notion of coherence, resulting in fewer weight outliers without affecting the network output. In this work, we aim to further exploit this spherical geometry of the weights when performing quantization by considering Pyramid Vector Quantization (PVQ) for large language models. Arranging points evenly on the sphere is notoriously difficult, especially in high dimensions, and even when approximate solutions exist, representing points explicitly in a codebook is typically not feasible due to its additional memory cost. Instead, PVQ uses a fixed integer lattice on the sphere by projecting points onto the 1-sphere, which allows for efficient encoding and decoding without requiring an explicit codebook in memory. To obtain a practical algorithm, we propose to combine PVQ with scale quantization for which we derive theoretically optimal quantizations, under empirically verified assumptions. Further, we extend pyramid vector quantization to use Hessian information to minimize quantization error under expected feature activations, instead of only relying on weight magnitudes. Experimentally, we achieve state-of-the-art quantization performance with a Pareto-optimal trade-off between performance and bits per weight and bits per activation, compared to competing methods. In the weight-only setting, we find that we can quantize a Llama-3 70B model to 3.25 bits per weight and retain 98% accuracy on downstream tasks.
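The lattice idea in this abstract can be illustrated with a small sketch of classical pyramid vector quantization: a vector is mapped to an integer vector whose absolute values sum to a fixed pulse budget K, and decoding rescales that integer vector back to the unit sphere. The function names and the largest-remainder rounding rule below are illustrative assumptions; the paper's actual encoder, its scale quantization, and its Hessian-aware variant are not reproduced here.

```python
import numpy as np

def pvq_quantize(x, K):
    """Map x to an integer vector y with sum(|y|) == K (a point on the
    integer L1 'pyramid'). Uses largest-remainder rounding, which is a
    simple heuristic and not guaranteed to return the nearest lattice point."""
    x = np.asarray(x, dtype=float)
    l1 = np.abs(x).sum()
    if l1 == 0.0:
        y = np.zeros(x.shape, dtype=int)
        y[0] = K  # arbitrary convention for the zero vector
        return y
    target = np.abs(x) * (K / l1)       # non-negative, sums to K
    y = np.floor(target).astype(int)    # never exceeds the pulse budget
    remaining = int(K - y.sum())
    if remaining > 0:
        # Hand the leftover pulses to the largest fractional parts.
        frac = target - y
        y[np.argsort(-frac)[:remaining]] += 1
    sign = np.where(x < 0, -1, 1)
    return sign * y

def pvq_dequantize(y):
    """Project the integer point back onto the unit L2 sphere."""
    y = np.asarray(y, dtype=float)
    return y / np.linalg.norm(y)

if __name__ == "__main__":
    w = np.array([0.5, -0.25, 0.25])
    code = pvq_quantize(w, K=4)
    print(code, pvq_dequantize(code))
```

Because every codeword is determined by the dimension and K, no explicit codebook has to be stored, which is the memory advantage the abstract refers to.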

empirical, groupsize, weight layer, (15 more...)

arXiv.org Artificial Intelligence

2410.16926

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

GPU-Accelerated Counterfactual Regret Minimization

Kim, Juho

arXiv.org Artificial Intelligence, Sep-6-2024

Counterfactual regret minimization is a family of algorithms of no-regret learning dynamics capable of solving large-scale imperfect information games. We propose implementing this algorithm as a series of dense and sparse matrix and vector operations, thereby making it highly parallelizable for a graphics processing unit, at the cost of higher memory usage. Our experiments show that our implementation performs up to about 352.5 times faster than OpenSpiel's Python implementation and up to about 22.2 times faster than OpenSpiel's C++ implementation, and the speedup becomes more pronounced as the size of the game being solved grows. Counterfactual regret minimization (CFR) (Zinkevich et al., 2007) is a family of algorithms of no-regret learning dynamics capable of solving large-scale imperfect information games. Its variants dominated the development of AI agents for large imperfect information games like Poker (Tammelin et al., 2015; Moravčík et al., 2017; Brown & Sandholm, 2018; 2019b) and The Resistance: Avalon (Serrino et al., 2019) and were components of ReBeL (Brown et al., 2020) and Student of Games (Schmid et al., 2023).
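As a rough illustration of what expressing CFR through matrix and vector operations can look like, the sketch below vectorizes regret matching, the strategy-update rule at the core of CFR, over many information sets at once. The array layout (one row per information set, one column per action, every set having the same number of actions) and the function name are simplifying assumptions for this example; it is not the author's implementation and it omits the game-tree traversal entirely.

```python
import numpy as np

def regret_matching(regrets):
    """Batched regret matching.

    regrets: array of shape (num_infosets, num_actions) holding cumulative
    counterfactual regrets. Returns a strategy of the same shape where each
    row is proportional to the positive part of the regrets, or uniform
    when no action in that row has positive regret."""
    pos = np.maximum(regrets, 0.0)
    total = pos.sum(axis=1, keepdims=True)
    uniform = np.full_like(pos, 1.0 / pos.shape[1])
    safe_total = np.where(total > 0, total, 1.0)  # avoid division by zero
    return np.where(total > 0, pos / safe_total, uniform)

if __name__ == "__main__":
    r = np.array([[1.0, 3.0, -2.0],
                  [-1.0, -1.0, -1.0]])
    print(regret_matching(r))
```

Since the update has no Python-level loop over information sets, the same expression can be moved to a GPU array library with little change, at the price of materializing whole arrays in memory.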

implementation, openspiel, pvq, (16 more...)

arXiv.org Artificial Intelligence

2408.14778

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Texas (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.34)

A low-rank non-convex norm method for multiview graph clustering

Zahir, Alaeddine, Jbilou, Khalide, Ratnani, Ahmed

arXiv.org Artificial Intelligence, Dec-18-2023

This study introduces a novel technique for multi-view clustering known as the "Consensus Graph-Based Multi-View Clustering Method Using Low-Rank Non-Convex Norm" (CGMVC-NC). Multi-view clustering is a challenging task in machine learning as it requires the integration of information from multiple data sources or views to cluster data points accurately. The suggested approach makes use of the structural characteristics of multi-view data tensors, introducing a non-convex tensor norm to identify correlations between these views. In contrast to conventional methods, this approach demonstrates superior clustering accuracy across several benchmark datasets. Despite the non-convex nature of the tensor norm used, the proposed method remains amenable to efficient optimization using existing algorithms. The approach provides a valuable tool for multi-view data analysis and has the potential to enhance our understanding of complex systems in various fields. Further research can explore the application of this method to other types of data and extend it to other machine-learning tasks.

graph, matrix, pvq, (15 more...)

arXiv.org Artificial Intelligence

2312.11157

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
Europe > France (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)
Africa > Middle East > Morocco (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Weight Compander: A Simple Weight Reparameterization for Regularization

Cakaj, Rinor, Mehnert, Jens, Yang, Bin

arXiv.org Artificial Intelligence, Jun-29-2023

Regularization is a set of techniques that are used to improve the generalization ability of deep neural networks. In this paper, we introduce weight compander (WC), a novel, effective method to improve generalization by reparameterizing each weight in deep neural networks using a nonlinear function. It is a general, intuitive, cheap, and easy-to-implement method, which can be combined with various other regularization techniques. Large weights in deep neural networks are a sign of a more complex network that is overfitted to the training data. Moreover, regularized networks tend to have a greater range of weights around zero with fewer weights centered at zero. We introduce a weight reparameterization function which is applied to each weight and implicitly reduces overfitting by restricting the magnitude of the weights while forcing them away from zero at the same time. This leads to more democratic decision-making in the network. Firstly, individual weights cannot have too much influence in the prediction process due to the restriction of their magnitude. Secondly, more weights are used in the prediction process, since they are forced away from zero during training. This promotes the extraction of more features from the input data and increases the level of weight redundancy, which makes the network less sensitive to statistical differences between training and test data. We extend our method to learn the hyperparameters of the introduced weight reparameterization function. This avoids hyperparameter search and gives the network the opportunity to align the weight reparameterization with the training progress. We show experimentally that using weight compander in addition to standard regularization methods improves the performance of neural networks.

artificial intelligence, machine learning, weight compander, (15 more...)

arXiv.org Artificial Intelligence

2306.16993

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
North America > Canada > Ontario > Toronto (0.14)
Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
(11 more...)

Genre: Research Report (0.83)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

On Identifiability of Conditional Causal Effects

Kivva, Yaroslav, Etesami, Jalal, Kiyavash, Negar

arXiv.org Artificial Intelligence, Jun-19-2023

We address the problem of identifiability of an arbitrary conditional causal effect given both the causal graph and a set of any observational and/or interventional distributions of the form $Q[S]:=P(S|do(V\setminus S))$, where $V$ denotes the set of all observed variables and $S\subseteq V$. We call this problem conditional generalized identifiability (c-gID in short) and prove the completeness of Pearl's $do$-calculus for the c-gID problem by providing a sound and complete algorithm for the c-gID problem. This work revisits the c-gID problem in Lee et al. [2020], Correa et al. [2021] by adding explicitly the positivity assumption which is crucial for identifiability. It extends the results of [Lee et al., 2019, Kivva et al., 2022] on general identifiability (gID) which studied the problem for unconditional causal effects and Shpitser and Pearl [2006b] on identifiability of conditional causal effects given merely the observational distribution $P(\mathbf{V})$ as our algorithm generalizes the algorithms proposed in [Kivva et al., 2022] and [Shpitser and Pearl, 2006b].

artificial intelligence, machine learning, realization, (17 more...)

arXiv.org Artificial Intelligence

2306.11755

Country:

Europe > Switzerland > Vaud > Lausanne (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)
Information Technology > Artificial Intelligence > Machine Learning (0.67)

Time-uniform confidence bands for the CDF under nonstationarity

Mineiro, Paul, Howard, Steven R.

arXiv.org Artificial Intelligence, Feb-27-2023

What would have happened if I had acted differently? Although this question is as old as time itself, successful companies have recently embraced this question via counterfactual estimation of outcomes from the exhaust of their controlled experimentation platforms, e.g., based upon A/B testing or contextual bandits. These experiments are run in the real (digital) world, which is rich enough to demand statistical techniques that are non-asymptotic, non-parametric, and non-stationary. Although recent advances admit characterizing counterfactual average outcomes in this general setting, counterfactually estimating a complete distribution of outcomes is heretofore only possible with additional assumptions. Nonetheless, the practical importance of this problem has motivated multiple solutions: see Table 1 for a summary, and Section 5 for complete discussion. Intriguingly, this problem is provably impossible in the data dependent setting without additional assumptions [Rakhlin et al., 2015]. Consequently, our bounds always achieve non-asymptotic coverage, but may converge to zero width slowly or not at all, depending on the hardness of the instance. We call this design principle AVAST (Always Valid And Sometimes Trivial). In pursuit of our ultimate goal, we derive factual distribution estimators which are useful for estimating the complete distribution of outcomes from direct experience.

artificial intelligence, confidence sequence, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2302.14248

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Experimental Study (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)

Sampling Streaming Data with Parallel Vector Quantization -- PVQ

Sultan, Mujahid

arXiv.org Artificial Intelligence, Oct-4-2022

Accumulation of corporate data in the cloud has attracted more enterprise applications to the cloud, creating data gravity. As a consequence, network traffic has become more cloud-centric. This increase in cloud-centric traffic poses new challenges in designing learning systems for streaming data due to class imbalance. The number of classes plays a vital role in the accuracy of the classifiers built from the data streams. In this paper, we present a vector quantization-based sampling method, which substantially reduces the class imbalance in data streams. We demonstrate its effectiveness by conducting experiments on a network traffic and anomaly dataset with commonly used ML model building methods: Multilayer Perceptron on a TensorFlow backend, Support Vector Machines, K-Nearest Neighbour, and Random Forests. We built models using parallel processing, batch processing, and randomly selecting samples. We show that the accuracy of classification models improves when the data streams are pre-processed with our method. We used out-of-the-box hyperparameters of these classifiers and auto-sklearn for hyperparameter optimization.

artificial intelligence, dataset, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2210.01792

Country:

North America > United States > New York (0.04)
North America > United States > Indiana > Marion County > Lawrence (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.68)

Kernel Mode Decomposition and programmable/interpretable regression networks

Owhadi, Houman, Scovel, Clint, Yoo, Gene Ryan

arXiv.org Machine Learning, Jul-19-2019

Mode decomposition is a prototypical pattern recognition problem that can be addressed from the (a priori distinct) perspectives of numerical approximation, statistical inference and deep learning. Could its analysis through these combined perspectives be used as a Rosetta stone for deciphering mechanisms at play in deep learning? Motivated by this question, we introduce programmable and interpretable regression networks for pattern recognition and address mode decomposition as a prototypical problem. The programming of these networks is achieved by assembling elementary modules decomposing and recomposing kernels and data. These elementary steps are repeated across levels of abstraction and interpreted from the equivalent perspectives of optimal recovery, game theory and Gaussian process regression (GPR). The prototypical mode/kernel decomposition module produces an optimal approximation $(w_1,w_2,\cdots,w_m)$ of an element $(v_1,v_2,\ldots,v_m)$ of a product of Hilbert subspaces of a common Hilbert space from the observation of the sum $v:=v_1+\cdots+v_m$. The prototypical mode/kernel recomposition module performs partial sums of the recovered modes $w_i$ based on the alignment between each recovered mode $w_i$ and the data $v$. We illustrate the proposed framework by programming regression networks approximating the modes $v_i= a_i(t)y_i\big(\theta_i(t)\big)$ of a (possibly noisy) signal $\sum_i v_i$ when the amplitudes $a_i$, instantaneous phases $\theta_i$ and periodic waveforms $y_i$ may all be unknown, and show near machine precision recovery under regularity and separation assumptions on the instantaneous amplitudes $a_i$ and frequencies $\dot{\theta}_i$. The structure of some of these networks shares intriguing similarities with convolutional neural networks while being interpretable, programmable and amenable to theoretical analysis.

artificial intelligence, machine learning, ptq, (19 more...)

arXiv.org Machine Learning

1907.08592

Country: North America > United States (0.28)

Genre:

Overview (0.45)
Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Markov Properties for Graphical Models with Cycles and Latent Variables

Forré, Patrick, Mooij, Joris M.

arXiv.org Machine Learning, Oct-24-2017

We investigate probabilistic graphical models that allow for both cycles and latent variables. For this we introduce directed graphs with hyperedges (HEDGes), generalizing and combining both marginalized directed acyclic graphs (mDAGs) that can model latent (dependent) variables, and directed mixed graphs (DMGs) that can model cycles. We define and analyse several different Markov properties that relate the graphical structure of a HEDG with a probability distribution on a corresponding product space over the set of nodes, for example factorization properties, structural equations properties, ordered/local/global Markov properties, and marginal versions of these. The various Markov properties for HEDGes are in general not equivalent to each other when cycles or hyperedges are present, in contrast with the simpler case of directed acyclic graphical (DAG) models (also known as Bayesian networks). We show how the Markov properties for HEDGes - and thus the corresponding graphical Markov models - are logically related to each other.

artificial intelligence, bayesian inference, machine learning, (16 more...)

arXiv.org Machine Learning

1710.08775

Country:

Europe (1.00)
North America > United States > California (0.67)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)