AITopics | Tonin, Francesco

Plotting

Tonin, Francesco

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Quantum-PEFT: Ultra parameter-efficient fine-tuning

Koike-Akino, Toshiaki, Tonin, Francesco, Wu, Yongtao, Wu, Frank Zhengqing, Candogan, Leyla Naz, Cevher, Volkan

arXiv.org Artificial IntelligenceMar-7-2025

This paper introduces Quantum-PEFT that leverages quantum computations for parameter-efficient fine-tuning (PEFT). Unlike other additive PEFT methods, such as low-rank adaptation (LoRA), Quantum-PEFT exploits an underlying full-rank yet surprisingly parameter efficient quantum unitary parameterization. With the use of Pauli parameterization, the number of trainable parameters grows only logarithmically with the ambient dimension, as opposed to linearly as in LoRA-based PEFT methods. Quantum-PEFT achieves vanishingly smaller number of trainable parameters than the lowest-rank LoRA as dimensions grow, enhancing parameter efficiency while maintaining a competitive performance. We apply Quantum-PEFT to several transfer learning benchmarks in language and vision, demonstrating significant advantages in parameter efficiency.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2503.05431

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Hardware (0.94)

Add feedback

Linear Attention for Efficient Bidirectional Sequence Modeling

Afzal, Arshia, Rocamora, Elias Abad, Candogan, Leyla Naz, Puigdemont, Pol, Tonin, Francesco, Wu, Yongtao, Shoaran, Mahsa, Cevher, Volkan

arXiv.org Artificial IntelligenceFeb-22-2025

Transformers with linear attention enable fast and parallel training. Moreover, they can be formulated as Recurrent Neural Networks (RNNs), for efficient linear-time inference. While extensively evaluated in causal sequence modeling, they have yet to be extended to the bidirectional setting. This work introduces the LION framework, establishing new theoretical foundations for linear transformers in bidirectional sequence modeling. LION constructs a bidirectional RNN equivalent to full Linear Attention. This extends the benefits of linear transformers: parallel training, and efficient inference, into the bidirectional setting. Using LION, we cast three linear transformers to their bidirectional form: LION-LIT, the bidirectional variant corresponding to (Katharopoulos et al., 2020); LION-D, extending RetNet (Sun et al., 2023); and LION-S, a linear transformer with a stable selective mask inspired by selectivity of SSMs (Dao & Gu, 2024). Replacing the attention block with LION (-LIT, -D, -S) achieves performance on bidirectional tasks that approaches that of Transformers and State-Space Models (SSMs), while delivering significant improvements in training speed. Our implementation is available in http://github.com/LIONS-EPFL/LION.

artificial intelligence, ion, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2502.16249

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Membership Inference Attacks against Large Vision-Language Models

Li, Zhan, Wu, Yongtao, Chen, Yihang, Tonin, Francesco, Rocamora, Elias Abad, Cevher, Volkan

arXiv.org Artificial IntelligenceNov-5-2024

Large vision-language models (VLLMs) exhibit promising capabilities for processing multi-modal tasks across various application scenarios. However, their emergence also raises significant data security concerns, given the potential inclusion of sensitive information, such as private photos and medical records, in their training datasets. Detecting inappropriately used data in VLLMs remains a critical and unresolved issue, mainly due to the lack of standardized datasets and suitable methodologies. In this study, we introduce the first membership inference attack (MIA) benchmark tailored for various VLLMs to facilitate training data detection. Then, we propose a novel MIA pipeline specifically designed for token-level image detection. Lastly, we present a new metric called MaxR\'enyi-K%, which is based on the confidence of the model output and applies to both text and image data. We believe that our work can deepen the understanding and methodology of MIAs in the context of VLLMs. Our code and datasets are available at https://github.com/LIONS-EPFL/VL-MIA.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2411.02902

Country:

North America > United States (0.28)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > New Finding (0.66)

Industry:

Information Technology > Security & Privacy (1.00)
Transportation > Ground > Road (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(4 more...)

Add feedback

Learning in Feature Spaces via Coupled Covariances: Asymmetric Kernel SVD and Nystr\"om method

Tao, Qinghua, Tonin, Francesco, Lambert, Alex, Chen, Yingyi, Patrinos, Panagiotis, Suykens, Johan A. K.

arXiv.org Machine LearningJun-12-2024

In contrast with Mercer kernel-based approaches as used e.g., in Kernel Principal Component Analysis (KPCA), it was previously shown that Singular Value Decomposition (SVD) inherently relates to asymmetric kernels and Asymmetric Kernel Singular Value Decomposition (KSVD) has been proposed. However, the existing formulation to KSVD cannot work with infinite-dimensional feature mappings, the variational objective can be unbounded, and needs further numerical evaluation and exploration towards machine learning. In this work, i) we introduce a new asymmetric learning paradigm based on coupled covariance eigenproblem (CCE) through covariance operators, allowing infinite-dimensional feature maps. The solution to CCE is ultimately obtained from the SVD of the induced asymmetric kernel matrix, providing links to KSVD. ii) Starting from the integral equations corresponding to a pair of coupled adjoint eigenfunctions, we formalize the asymmetric Nystr\"om method through a finite sample approximation to speed up training. iii) We provide the first empirical evaluations verifying the practical utility and benefits of KSVD and compare with methods resorting to symmetrization or linear SVD across multiple tasks.

artificial intelligence, machine learning, matrix, (16 more...)

arXiv.org Machine Learning

2406.08748

Country:

Europe > Belgium > Flanders (0.14)
North America > United States > New York (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

HeNCler: Node Clustering in Heterophilous Graphs through Learned Asymmetric Similarity

Achten, Sonny, Tonin, Francesco, Cevher, Volkan, Suykens, Johan A. K.

arXiv.org Artificial IntelligenceMay-27-2024

Graph neural networks (GNNs) have substantially advanced machine learning applications to graph-structured data by effectively propagating node attributes end-to-end. Typically, GNNs rely on the assumption of homophily, where nodes with similar labels are more likely to be connected [39, 36]. The homophily assumption holds true in contexts such as social networks and citation graphs, where models like GCN [14], GIN [37], and GraphSAGE [11] excel at tasks like node classification and graph prediction. However, this is not the case in heterophilous datasets, such as web page and transaction networks, where edges often link nodes with differing labels. Models such as GAT [35] and various graph transformers [38, 9] show improved performance on these datasets. With their attention mechanisms that learns edge importances, they reduce the dependency on the homophily. In this setting, our work specifically addresses unsupervised attributed node clustering tasks, which require models to function without any label information during training.

artificial intelligence, graph, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2405.1705

Country:

Europe (0.94)
North America > United States (0.68)

Genre: Research Report (1.00)

Industry: Information Technology (0.66)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Self-Attention through Kernel-Eigen Pair Sparse Variational Gaussian Processes

Chen, Yingyi, Tao, Qinghua, Tonin, Francesco, Suykens, Johan A. K.

arXiv.org Artificial IntelligenceFeb-2-2024

While the great capability of Transformers significantly boosts prediction accuracy, it could also yield overconfident predictions and require calibrated uncertainty estimation, which can be commonly tackled by Gaussian processes (GPs). Existing works apply GPs with symmetric kernels under variational inference to the attention kernel; however, omitting the fact that attention kernels are in essence asymmetric. Moreover, the complexity of deriving the GP posteriors remains high for large-scale data. In this work, we propose Kernel-Eigen Pair Sparse Variational Gaussian Processes (KEP-SVGP) for building uncertainty-aware self-attention where the asymmetry of attention kernels is tackled by Kernel SVD (KSVD) and a reduced complexity is acquired. Through KEP-SVGP, i) the SVGP pair induced by the two sets of singular vectors from KSVD w.r.t. the attention kernel fully characterizes the asymmetry; ii) using only a small set of adjoint eigenfunctions from KSVD, the derivation of SVGP posteriors can be based on the inversion of a diagonal matrix containing singular values, contributing to a reduction in time complexity; iii) an evidence lower bound is derived so that variational parameters can be optimized towards this objective. Experiments verify our excellent performances and efficiency on in-distribution, distribution-shift and out-of-distribution benchmarks.

artificial intelligence, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2402.01476

Country: Europe > Belgium > Flanders (0.14)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

Unsupervised Neighborhood Propagation Kernel Layers for Semi-supervised Node Classification

Achten, Sonny, Tonin, Francesco, Patrinos, Panagiotis, Suykens, Johan A. K.

arXiv.org Artificial IntelligenceDec-15-2023

We present a deep Graph Convolutional Kernel Machine (GCKM) for semi-supervised node classification in graphs. The method is built of two main types of blocks: (i) We introduce unsupervised kernel machine layers propagating the node features in a one-hop neighborhood, using implicit node feature mappings. (ii) We specify a semi-supervised classification kernel machine through the lens of the Fenchel-Young inequality. We derive an effective initialization scheme and efficient end-to-end training algorithm in the dual variables for the full architecture. The main idea underlying GCKM is that, because of the unsupervised core, the final model can achieve higher performance in semi-supervised node classification when few labels are available for training. Experimental results demonstrate the effectiveness of the proposed framework.

artificial intelligence, machine learning, representation, (17 more...)

arXiv.org Artificial Intelligence

2301.13764

Country:

North America > United States (0.46)
Europe (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation

Chen, Yingyi, Tao, Qinghua, Tonin, Francesco, Suykens, Johan A. K.

arXiv.org Artificial IntelligenceDec-5-2023

Recently, a new line of works has emerged to understand and improve self-attention in Transformers by treating it as a kernel machine. However, existing works apply the methods for symmetric kernels to the asymmetric self-attention, resulting in a nontrivial gap between the analytical understanding and numerical implementation. In this paper, we provide a new perspective to represent and optimize self-attention through asymmetric Kernel Singular Value Decomposition (KSVD), which is also motivated by the low-rank property of self-attention normally observed in deep layers. Through asymmetric KSVD, $i$) a primal-dual representation of self-attention is formulated, where the optimization objective is cast to maximize the projection variances in the attention outputs; $ii$) a novel attention mechanism, i.e., Primal-Attention, is proposed via the primal representation of KSVD, avoiding explicit computation of the kernel matrix in the dual; $iii$) with KKT conditions, we prove that the stationary solution to the KSVD optimization in Primal-Attention yields a zero-value objective. In this manner, KSVD optimization can be implemented by simply minimizing a regularization loss, so that low-rank property is promoted without extra decomposition. Numerical experiments show state-of-the-art performance of our Primal-Attention with improved efficiency. Moreover, we demonstrate that the deployed KSVD optimization regularizes Primal-Attention with a sharper singular value decay than that of the canonical self-attention, further verifying the great potential of our method. To the best of our knowledge, this is the first work that provides a primal-dual representation for the asymmetric kernel in self-attention and successfully applies it to modeling and optimization.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2305.19798

Country: Europe > Belgium > Flanders (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)

Add feedback

Combining Primal and Dual Representations in Deep Restricted Kernel Machines Classifiers

Tonin, Francesco, Patrinos, Panagiotis, Suykens, Johan A. K.

arXiv.org Artificial IntelligenceAug-29-2023

In the context of deep learning with kernel machines, the deep Restricted Kernel Machine (DRKM) framework allows multiple levels of kernel PCA (KPCA) and Least-Squares Support Vector Machines (LSSVM) to be combined into a deep architecture using visible and hidden units. We propose a new method for DRKM classification coupling the objectives of KPCA and classification levels, with the hidden feature matrix lying on the Stiefel manifold. The classification level can be formulated as an LSSVM or as an MLP feature map, combining depth in terms of levels and layers. The classification level is expressed in its primal formulation, as the deep KPCA levels, in their dual formulation, can embed the most informative components of the data in a much lower dimensional space. The dual setting is independent of the dimension of the inputs and the primal setting is parametric, which makes the proposed method computationally efficient for both high-dimensional inputs and large datasets. In the experiments, we show that our developed algorithm can effectively learn from small datasets, while using less memory than the convolutional neural network (CNN) with high-dimensional data. and that models with multiple KPCA levels can outperform models with a single level. On the tested larger-scale datasets, DRKM is more energy efficient than CNN while maintaining comparable performance.

artificial intelligence, machine learning, representation, (18 more...)

arXiv.org Artificial Intelligence

2306.07015

Country: Europe > Belgium > Flanders (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Nonlinear SVD with Asymmetric Kernels: feature learning and asymmetric Nystr\"om method

Tao, Qinghua, Tonin, Francesco, Patrinos, Panagiotis, Suykens, Johan A. K.

arXiv.org Artificial IntelligenceJun-12-2023

Asymmetric data naturally exist in real life, such as directed graphs. Different from the common kernel methods requiring Mercer kernels, this paper tackles the asymmetric kernel-based learning problem. We describe a nonlinear extension of the matrix Singular Value Decomposition through asymmetric kernels, namely KSVD. First, we construct two nonlinear feature mappings w.r.t. rows and columns of the given data matrix. The proposed optimization problem maximizes the variance of each mapping projected onto the subspace spanned by the other, subject to a mutual orthogonality constraint. Through Lagrangian duality, we show that it can be solved by the left and right singular vectors in the feature space induced by the asymmetric kernel. Moreover, we start from the integral equations with a pair of adjoint eigenfunctions corresponding to the singular vectors on an asymmetrical kernel, and extend the Nystr\"om method to asymmetric cases through the finite sample approximation, which can be applied to speedup the training in KSVD. Experiments show that asymmetric KSVD learns features outperforming Mercer-kernel based methods that resort to symmetrization, and also verify the effectiveness of the asymmetric Nystr\"om method.

artificial intelligence, machine learning, matrix, (15 more...)

arXiv.org Artificial Intelligence

2306.0704

Country: Europe > Belgium > Flanders (0.14)

Genre: Research Report (0.64)

Industry:

Health & Medicine (0.47)
Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback