AITopics

2309.08201

Country:

North America > Canada > British Columbia (0.14)
Asia > China > Sichuan Province (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial Intelligence, Oct-1-2023

Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners

Yadav, Sarthak, Theodoridis, Sergios, Hansen, Lars Kai, Tan, Zheng-Hua

In this work, we propose a Multi-Window Masked Autoencoder (MW-MAE) fitted with a novel Multi-Window Multi-Head Attention (MW-MHA) module that facilitates the modelling of local-global interactions in every decoder transformer block through attention heads of several distinct local and global windows. Empirical results on ten downstream audio tasks show that MW-MAEs consistently outperform standard MAEs in overall performance and learn better general-purpose audio representations, along with demonstrating considerably better scaling characteristics. Investigating attention distances and entropies reveals that MW-MAE encoders learn heads with broader local and global attention. Analyzing attention head feature representations through Projection Weighted Canonical Correlation Analysis (PWCCA) shows that attention heads with the same window sizes across the decoder layers of the MW-MAE learn correlated feature representations, which enables each block to independently capture local and global information, leading to a decoupled decoder feature hierarchy. Code for feature extraction and downstream experiments along with pre-trained models will be released publicly.
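The idea of giving each attention head its own local or global window can be illustrated with a minimal sketch. It assumes a simple sliding band around each position and uses `None` to mark a global head; both conventions, and the function names, are hypothetical and may differ from the windowing scheme used in the paper.

```python
def window_mask(seq_len, window):
    """Boolean mask: entry [i][j] is True when position i may attend to j.

    window=None marks a global head (a convention assumed for this sketch).
    """
    if window is None:
        return [[True] * seq_len for _ in range(seq_len)]
    half = window // 2
    return [[abs(i - j) <= half for j in range(seq_len)] for i in range(seq_len)]


def multi_window_masks(seq_len, windows):
    """Build one mask per attention head, each with its own window size."""
    return [window_mask(seq_len, w) for w in windows]


windows = [2, 4, None]  # two local heads and one global head
masks = multi_window_masks(seq_len=8, windows=windows)
for w, mask in zip(windows, masks):
    # Number of positions visible from position 0 for this head.
    print(w, sum(mask[0]))
```

In a real attention layer, positions where the mask is False would have their scores set to a large negative value before the softmax, so each head only mixes information inside its own window.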

artificial intelligence, machine learning, natural language, (15 more...)

2306.00561

Country:

Europe (0.93)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Sep-3-2023

Towards Efficient Modeling and Inference in Multi-Dimensional Gaussian Process State-Space Models

Lin, Zhidi, Maroñas, Juan, Li, Ying, Yin, Feng, Theodoridis, Sergios

The Gaussian process state-space model (GPSSM) has attracted extensive attention for modeling complex nonlinear dynamical systems. However, the existing GPSSM employs separate Gaussian processes (GPs) for each latent state dimension, leading to escalating computational complexity and parameter proliferation, thus posing challenges for modeling dynamical systems with high-dimensional latent states. To surmount this obstacle, we propose to integrate the efficient transformed Gaussian process (ETGP) into the GPSSM, which involves pushing a shared GP through multiple normalizing flows to efficiently model the transition function in high-dimensional latent state space. Additionally, we develop a corresponding variational inference algorithm that surpasses existing methods in terms of parameter count and computational complexity. Experimental results on diverse synthetic and real-world datasets corroborate the efficiency of the proposed method, while also demonstrating inference performance comparable to that of existing methods. Code is available at https://github.com/zhidilin/gpssmProj.

artificial intelligence, machine learning, modeling & simulation, (14 more...)

2309.01074

Country:

Europe (1.00)
Asia > China (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Machine Learning, Dec-5-2021

Stochastic Local Winner-Takes-All Networks Enable Profound Adversarial Robustness

Panousis, Konstantinos P., Chatzis, Sotirios, Theodoridis, Sergios

This work explores the potency of stochastic competition-based activations, namely Stochastic Local Winner-Takes-All (LWTA), against powerful (gradient-based) white-box and black-box adversarial attacks; we especially focus on Adversarial Training settings. In our work, we replace the conventional ReLU-based nonlinearities with blocks comprising locally and stochastically competing linear units. Each network layer now yields a sparse output, depending on the outcome of winner sampling in each block. We rely on the Variational Bayesian framework for training and inference; we incorporate conventional PGD-based adversarial training arguments to increase the overall adversarial robustness. As we experimentally show, the arising networks yield state-of-the-art robustness against powerful adversarial attacks while retaining a very high classification rate in the benign case.

artificial intelligence, machine learning, robustness, (17 more...)

2112.02671

Country:

Europe > Middle East > Cyprus (0.15)
Europe > Denmark (0.14)
Europe > Greece (0.14)

Genre: Research Report (0.82)

Industry:

Information Technology (0.55)
Government (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

arXiv.org Artificial Intelligence, Sep-15-2021

Dialog speech sentiment classification for imbalanced datasets

Nicolaou, Sergis, Mavrides, Lambros, Tryfou, Georgina, Tolias, Kyriakos, Panousis, Konstantinos, Chatzis, Sotirios, Theodoridis, Sergios

Speech is the most common way humans express their feelings, and sentiment analysis is the use of tools such as natural language processing and computational algorithms to identify the polarity of these feelings. Even though this field has seen tremendous advancements in the last two decades, effectively detecting underrepresented sentiments in different kinds of datasets is still a challenging task. In this paper, we use single and bi-modal analysis of short dialog utterances and gain insights on the main factors that aid in sentiment detection, particularly in the underrepresented classes, in datasets with and without an inherent sentiment component. Furthermore, we propose an architecture which uses a learning rate scheduler and different monitoring criteria and provides state-of-the-art results for the SWITCHBOARD imbalanced sentiment dataset.

dataset, deep learning, neural network, (20 more...)

2109.07228

Country:

Europe > Middle East > Cyprus (0.14)
Europe > Denmark (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.79)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.79)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.66)

arXiv.org Machine Learning, Jan-4-2021

Local Competition and Stochasticity for Adversarial Robustness in Deep Learning

Panousis, Konstantinos P., Chatzis, Sotirios, Alexos, Antonios, Theodoridis, Sergios

This work addresses adversarial robustness in deep learning by considering deep networks with stochastic local winner-takes-all (LWTA) nonlinearities. This type of network unit results in sparse representations from each model layer, as the units are organized in blocks where only one unit generates non-zero output. The main operating principle of the introduced units relies on stochastic arguments, as the network performs posterior sampling over competing units to select the winner. We combine these LWTA arguments with tools from the field of Bayesian non-parametrics, specifically the stick-breaking construction of the Indian Buffet Process, to allow for inferring the sub-part of each layer that is essential for modeling the data at hand. Inference for the proposed network is performed by means of stochastic variational Bayes. We perform a thorough experimental evaluation of our model using benchmark datasets, assuming gradient-based adversarial attacks. As we show, our method achieves high robustness to adversarial perturbations, with state-of-the-art performance in powerful white-box attacks.
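The block-wise competition described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the winner is drawn from a softmax over the linear activations inside each block, and the names `stochastic_lwta`, `block_size` and `temperature` are hypothetical. The paper's actual sampling and variational training procedure is not reproduced here.

```python
import math
import random


def stochastic_lwta(activations, block_size, rng, temperature=1.0):
    """Keep one sampled winner per block and zero every other unit."""
    if len(activations) % block_size != 0:
        raise ValueError("layer width must be a multiple of block_size")
    out = [0.0] * len(activations)
    for start in range(0, len(activations), block_size):
        block = activations[start:start + block_size]
        # Unnormalised softmax weights (assumed winner distribution);
        # subtracting the block maximum keeps exp() numerically stable.
        peak = max(block)
        weights = [math.exp((a - peak) / temperature) for a in block]
        winner = rng.choices(range(block_size), weights=weights, k=1)[0]
        # Only the winner passes its linear activation through.
        out[start + winner] = block[winner]
    return out


rng = random.Random(0)
layer = [0.3, -1.2, 2.0, 0.1, 0.5, 0.4]
sparse = stochastic_lwta(layer, block_size=2, rng=rng)
print(sparse)
```

Because the winner is sampled rather than chosen by an argmax, repeated forward passes on the same input can activate different units, which is the source of the stochasticity the abstract refers to.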

adversarial attack, deep learning, neural network, (14 more...)

2101.01121

Country: North America > United States > California > Orange County > Irvine (0.14)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine Learning, Sep-5-2020

Towards Probabilistic Tensor Canonical Polyadic Decomposition 2.0: Automatic Tensor Rank Learning Using Generalized Hyperbolic Prior

Cheng, Lei, Chen, Zhongtao, Shi, Qingjiang, Wu, Yik-Chung, Theodoridis, Sergios

Tensor rank learning for canonical polyadic decomposition (CPD) has long been deemed an essential but challenging problem. In particular, since the tensor rank controls the complexity of the CPD model, its inaccurate learning would cause overfitting to noise or underfitting to the signal sources, and even destroy the interpretability of model parameters. However, the optimal determination of a tensor rank is known to be a non-deterministic polynomial-time hard (NP-hard) task. Rather than exhaustively searching for the best tensor rank via trial-and-error experiments, Bayesian inference under the Gaussian-gamma prior was introduced in the context of probabilistic CPD modeling and it was shown to be an effective strategy for automatic tensor rank determination. This triggered flourishing research on other structured tensor CPDs with automatic tensor rank learning. On the other side of the coin, these research works also reveal that the Gaussian-gamma model does not perform well for high-rank tensors and/or low signal-to-noise ratios (SNRs). To overcome these drawbacks, in this paper, we introduce a more advanced generalized hyperbolic (GH) prior to the probabilistic CPD model, which not only includes the Gaussian-gamma model as a special case, but also provides more flexibility to adapt to different levels of sparsity. Based on this novel probabilistic model, an algorithm is developed under the framework of variational inference, where each update is obtained in closed form. Extensive numerical results, using synthetic data and real-world datasets, demonstrate the excellent performance of the proposed method in learning both low as well as high tensor ranks even for low SNR cases.

bayesian inference, health & medicine, tensor rank, (14 more...)

2009.02472

Country: Asia > China (0.46)

Genre: Research Report (0.40)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

arXiv.org Machine Learning, May-12-2020

Early soft and flexible fusion of EEG and fMRI via tensor decompositions

Chatzichristos, Christos, Kofidis, Eleftherios, De Lathauwer, Lieven, Theodoridis, Sergios, Van Huffel, Sabine

Data fusion refers to the joint analysis of multiple datasets which provide complementary views of the same task. In this preprint, the problem of jointly analyzing electroencephalography (EEG) and functional Magnetic Resonance Imaging (fMRI) data is considered. Jointly analyzing EEG and fMRI measurements is highly beneficial for studying brain function because these modalities have complementary spatiotemporal resolution: EEG offers good temporal resolution while fMRI is better in its spatial resolution. The fusion methods reported so far ignore the underlying multi-way nature of the data in at least one of the modalities and/or rely on very strong assumptions about the relation of the two datasets. In this preprint, these two points are addressed by adopting for the first time tensor models in the two modalities while also exploring double coupled tensor decompositions and by following soft and flexible coupling approaches to implement the multi-modal analysis. To cope with the Event Related Potential (ERP) variability in EEG, the PARAFAC2 model is adopted. The results obtained are compared against those of parallel Independent Component Analysis (ICA) and hard coupling alternatives in both simulated and real data. Our results confirm the superiority of tensorial methods over methods based on ICA. In scenarios that do not meet the assumptions underlying hard coupling, the advantage of soft and flexible coupled decompositions is clearly demonstrated.

decomposition, information fusion, survey article, (21 more...)

2005.07134

Country:

Europe (1.00)
North America > United States > Maryland (0.14)
Asia > Middle East > Israel (0.14)

Genre: Research Report > New Finding (0.86)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Data Science > Data Integration (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

arXiv.org Machine Learning, Apr-21-2019

Linear Multiple Low-Rank Kernel Based Stationary Gaussian Processes Regression for Time Series

Yin, Feng, Pan, Lishuo, He, Xinwei, Chen, Tianshi, Theodoridis, Sergios, Luo, Zhi-Quan

Gaussian processes (GP) for machine learning have been studied systematically over the past two decades and they are by now widely used in a number of diverse applications. However, GP kernel design and the associated hyper-parameter optimization are still hard and to a large extent open problems. In this paper, we consider the task of GP regression for time series modeling and analysis. The underlying stationary kernel can be approximated arbitrarily closely by a newly proposed grid spectral mixture (GSM) kernel, which turns out to be a linear combination of low-rank sub-kernels. In the case where a large number of the sub-kernels are used, either the Nyström or the random Fourier feature approximations can be adopted to deal efficiently with the computational demands. The unknown GP hyper-parameters consist of the non-negative weights of all sub-kernels as well as the noise variance; their estimation is performed via the maximum-likelihood (ML) estimation framework. Two efficient numerical optimization methods for solving the unknown hyper-parameters are derived, including a sequential majorization-minimization (MM) method and a non-linearly constrained alternating direction method of multipliers (ADMM). The MM matches perfectly with the proven low-rank property of the proposed GSM sub-kernels and turns out to be a stable and efficient solver, while the ADMM has the potential to generate a better local minimum in terms of the test MSE. Experimental results, based on various classic time series data sets, corroborate that the proposed GSM kernel-based GP regression model outperforms several salient competitors of similar kind in terms of prediction mean-squared-error and numerical stability.
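A kernel built as a non-negative linear combination of fixed sub-kernels can be sketched as follows. The sketch assumes each sub-kernel takes the standard one-dimensional spectral mixture form exp(-2π²σ²τ²)·cos(2πμτ), with the frequency μ and bandwidth σ fixed on a grid; the grid values, weights and function names are hypothetical and not taken from the paper.

```python
import math


def sm_subkernel(tau, mu, sigma):
    """Spectral mixture component with fixed centre frequency mu and bandwidth sigma."""
    envelope = math.exp(-2.0 * math.pi ** 2 * sigma ** 2 * tau ** 2)
    return envelope * math.cos(2.0 * math.pi * mu * tau)


def gsm_kernel(tau, weights, grid):
    """Weighted sum of fixed sub-kernels; only the weights are free parameters."""
    if any(w < 0 for w in weights):
        raise ValueError("sub-kernel weights must be non-negative")
    return sum(w * sm_subkernel(tau, mu, sigma)
               for w, (mu, sigma) in zip(weights, grid))


# Hypothetical grid of (mu, sigma) pairs and example weights.
grid = [(0.0, 0.05), (0.1, 0.05), (0.2, 0.05)]
weights = [0.5, 1.0, 0.0]
print(gsm_kernel(0.0, weights, grid))
```

With the grid held fixed, the kernel is linear in the weights, so hyper-parameter estimation reduces to choosing the non-negative weights and the noise variance, as the abstract states. A zero weight switches the corresponding sub-kernel off.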

bayesian inference, kernel, survey article, (20 more...)

1904.09559

Country:

North America > United States (1.00)
Asia (0.93)
North America > Canada > Ontario > Toronto (0.14)
(2 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.87)

arXiv.org Machine Learning, Jan-23-2019

Nonparametric Bayesian Deep Networks with Local Competition

Panousis, Konstantinos P., Chatzis, Sotirios, Theodoridis, Sergios

Local competition among neighboring neurons is a common procedure taking place in biological systems. This finding has inspired research on more biologically plausible deep networks that comprise competing linear units, as opposed to nonlinear units that do not entail any form of (local) competition. This paper revisits this modeling paradigm, with the aim of enabling inference of networks that retain state-of-the-art accuracy for the least possible model complexity; this includes the needed number of connections or locally competing sets of units, as well as the required floating-point precision for storing the network weights. To this end, we leverage solid arguments from the field of Bayesian nonparametrics. Specifically, we introduce auxiliary discrete latent variables of model component utility, and perform Bayesian inference over them. Then, we impose appropriate stick-breaking priors over the introduced discrete latent variables; these give rise to a well-established sparsity-inducing mechanism. As we experimentally show using benchmark datasets, our approach yields networks with a smaller memory footprint than the state-of-the-art, and with no compromises in predictive accuracy.

bayesian inference, lwta block, neural network, (19 more...)

1805.07624

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)