Collaborating Authors

 Hosseini, Reshad


Subgoal Discovery Using a Free Energy Paradigm and State Aggregations

arXiv.org Artificial Intelligence

Reinforcement learning (RL) plays a major role in solving complex sequential decision-making tasks. Hierarchical and goal-conditioned RL are promising methods for dealing with two major problems in RL, namely sample inefficiency and the difficulty of reward shaping. These methods tackle these problems by decomposing a task into simpler subtasks and by temporally abstracting the task in the action space. A key component of task decomposition in these methods is subgoal discovery: subgoal states can be used to define hierarchies of actions and to decompose complex tasks. Under the assumption that subgoal states are more unpredictable, we propose a free energy paradigm to discover them. This is achieved by using free energy to select between two spaces, the main space and an aggregation space. The model changes from neighboring states to a given state indicate the unpredictability of that state and are therefore used in this paper for subgoal discovery. Our empirical results on navigation tasks such as grid-world environments show that the proposed method can discover subgoals without prior knowledge of the task. The proposed method is also robust to the stochasticity of environments.
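The abstract does not spell out how free energy selects between the main space and the aggregation space. The sketch below shows one plausible reading under stated assumptions: each space gets a Dirichlet-categorical model of observed next-state transitions, free energy is taken as the negative log marginal likelihood (exact for this conjugate model), and the aggregated model spreads each group's probability uniformly over its member states so that both models score the same data. The aggregation map, the prior `alpha`, and all function names are hypothetical, not the authors' formulation.

```python
import math
from collections import Counter


def dirichlet_categorical_log_evidence(counts, alpha=1.0):
    """Log marginal likelihood of a sequence with the given category counts,
    under a symmetric Dirichlet(alpha) prior on the categorical probabilities."""
    n = sum(counts)
    a0 = alpha * len(counts)
    log_ev = math.lgamma(a0) - math.lgamma(a0 + n)
    for c in counts:
        log_ev += math.lgamma(alpha + c) - math.lgamma(alpha)
    return log_ev


def free_energy_main(next_states, n_states, alpha=1.0):
    """Free energy (negative log evidence) of next-state data in the main space."""
    counts = Counter(next_states)
    return -dirichlet_categorical_log_evidence(
        [counts[s] for s in range(n_states)], alpha)


def free_energy_aggregated(next_states, aggregation, alpha=1.0):
    """Free energy in a hypothetical aggregation space: a Dirichlet-categorical
    model over groups, with each group's mass spread uniformly over its states."""
    groups = sorted(set(aggregation.values()))
    group_sizes = Counter(aggregation.values())
    group_counts = Counter(aggregation[s] for s in next_states)
    log_ev = dirichlet_categorical_log_evidence(
        [group_counts[g] for g in groups], alpha)
    # Probability of the specific state within its group: 1 / group size.
    log_ev -= sum(math.log(group_sizes[aggregation[s]]) for s in next_states)
    return -log_ev


def select_space(next_states, n_states, aggregation, alpha=1.0):
    """Pick the space with the lower free energy (higher model evidence)."""
    f_main = free_energy_main(next_states, n_states, alpha)
    f_agg = free_energy_aggregated(next_states, aggregation, alpha)
    return ("main" if f_main <= f_agg else "aggregated"), f_main, f_agg
```

Under this model, transitions concentrated on a single state favor the finer main space, while transitions spread evenly within a group favor the aggregation space.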


Artificial Data Point Generation in Clustered Latent Space for Small Medical Datasets

arXiv.org Artificial Intelligence

One growing trend in machine learning is the use of data generation techniques, since the performance of machine learning models depends on the amount of training data. However, in many medical applications, collecting large datasets is challenging due to resource constraints, which can lead to overfitting and poor generalization. This paper introduces a novel method, Artificial Data Point Generation in Clustered Latent Space (AGCL), designed to enhance classification performance on small medical datasets through synthetic data generation. The AGCL framework involves feature extraction, K-means clustering, cluster evaluation based on a class separation metric, and the generation of synthetic data points from clusters with distinct class representations. The method was applied to Parkinson's disease screening using facial expression data and evaluated across multiple machine learning classifiers. Experimental results demonstrate that AGCL significantly improves classification accuracy compared with the baseline, GN, and kNNMTD. AGCL achieved the highest overall test accuracy of 83.33% and a cross-validation accuracy of 90.90% with majority voting over different emotions, confirming its effectiveness in augmenting small datasets.
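The AGCL steps can be sketched as follows, assuming features have already been extracted. The class separation metric is replaced here by simple cluster purity (the majority-class fraction), and synthetic points are generated by interpolating between same-class members of sufficiently pure clusters. Both choices, and names such as `purity_threshold`, are assumptions for illustration rather than the paper's exact procedure.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns cluster labels and centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:           # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels, centers


def class_purity(y_cluster):
    """Stand-in separation metric: fraction of the cluster's majority class."""
    _, counts = np.unique(y_cluster, return_counts=True)
    return counts.max() / counts.sum()


def agcl_like_augment(X, y, k=4, purity_threshold=0.9, n_new_per_cluster=5, seed=0):
    rng = np.random.default_rng(seed)
    labels, _ = kmeans(X, k, seed=seed)
    new_X, new_y = [], []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if len(idx) < 2 or class_purity(y[idx]) < purity_threshold:
            continue                       # skip clusters with mixed classes
        values, counts = np.unique(y[idx], return_counts=True)
        majority = values[counts.argmax()]
        pool = idx[y[idx] == majority]
        if len(pool) < 2:
            continue
        for _ in range(n_new_per_cluster):
            a, b = rng.choice(pool, size=2, replace=False)
            lam = rng.random()
            # Interpolate between two same-class members of the cluster.
            new_X.append(X[a] + lam * (X[b] - X[a]))
            new_y.append(majority)
    if not new_X:
        return X, y
    return np.vstack([X, np.array(new_X)]), np.concatenate([y, np.array(new_y)])
```

Restricting generation to clusters dominated by one class is what keeps the synthetic points away from class boundaries in this sketch.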


Stochastic First-Order Learning for Large-Scale Flexibly Tied Gaussian Mixture Model

arXiv.org Artificial Intelligence

Gaussian Mixture Models (GMMs) are among the most powerful parametric density models and are used extensively in many applications. Flexibly tied factorization of the covariance matrices in GMMs is a powerful approach for coping with the challenges that common GMMs face with high-dimensional data and complex densities, which often demand a large number of Gaussian components. However, the expectation-maximization algorithm for fitting flexibly tied GMMs still struggles with streaming and very high-dimensional data. To overcome these challenges, this paper suggests the use of first-order stochastic optimization algorithms. Specifically, we propose a new stochastic optimization algorithm on the manifold of orthogonal matrices. Numerous experiments on both synthetic and real datasets show that stochastic optimization methods can outperform the expectation-maximization algorithm by attaining better likelihood, needing fewer epochs to converge, and consuming less time per epoch.
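A minimal sketch of one stochastic first-order step on the orthogonal manifold, assuming a flexibly tied parameterization with a shared orthogonal $U$ and per-component diagonal $D_k$, i.e. $\Sigma_k^{-1} = U D_k U^\top$. It computes the Euclidean gradient of the minibatch negative log-likelihood with respect to $U$, projects it onto the tangent space of the orthogonal group, and retracts with a QR decomposition. This is a generic Riemannian SGD step, not necessarily the new algorithm proposed in the paper, and only $U$ is updated.

```python
import numpy as np


def log_gauss_tied(X, mu, U, d):
    """log N(x; mu, Sigma) for each row of X, with Sigma^{-1} = U diag(d) U^T."""
    R = (X - mu) @ U                      # rows are (U^T (x - mu))^T
    quad = (R ** 2 * d).sum(axis=1)
    dim = X.shape[1]
    # log det(Sigma^{-1}) = sum(log d) when U is orthogonal
    return -0.5 * dim * np.log(2 * np.pi) + 0.5 * np.log(d).sum() - 0.5 * quad


def nll_and_grad_U(X, pis, mus, U, ds):
    """Mean minibatch NLL and its Euclidean gradient with respect to U."""
    K = len(pis)
    logp = np.stack([np.log(pis[k]) + log_gauss_tied(X, mus[k], U, ds[k])
                     for k in range(K)], axis=1)          # (n, K)
    m = logp.max(axis=1, keepdims=True)
    log_norm = m[:, 0] + np.log(np.exp(logp - m).sum(axis=1))
    gamma = np.exp(logp - log_norm[:, None])              # responsibilities
    G = np.zeros_like(U)
    for k in range(K):
        Rk = X - mus[k]
        # sum_n gamma_nk (x_n - mu_k)(x_n - mu_k)^T U D_k
        G += (Rk * gamma[:, k:k + 1]).T @ Rk @ U @ np.diag(ds[k])
    return -log_norm.mean(), G / len(X)


def riemannian_sgd_step(U, G, lr):
    """Project G onto the tangent space of the orthogonal group, then QR-retract."""
    A = U.T @ G
    rgrad = G - U @ (A + A.T) / 2
    Q, R = np.linalg.qr(U - lr * rgrad)
    return Q * np.sign(np.diag(R))        # sign fix makes the QR factor unique
```

In practice the step would be applied to successive minibatches, alongside updates of the means, weights, and $D_k$, which are omitted here.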


Out-of-distribution detection using normalizing flows on the data manifold

arXiv.org Artificial Intelligence

A common approach for out-of-distribution detection involves estimating the underlying data distribution, which assigns lower likelihood values to out-of-distribution data. Normalizing flows are likelihood-based generative models that provide tractable density estimation via dimension-preserving invertible transformations. Conventional normalizing flows are prone to failure in out-of-distribution detection because of the well-known curse of dimensionality that affects likelihood-based models. According to the manifold hypothesis, real-world data often lie on a low-dimensional manifold. This study investigates the effect of manifold learning with normalizing flows on out-of-distribution detection. We estimate the density on a low-dimensional manifold and measure the distance from the manifold, and use both as criteria for out-of-distribution detection; individually, neither is sufficient for this task. Extensive experimental results show that manifold learning improves the out-of-distribution detection ability of normalizing flows, a class of likelihood-based models. This improvement is achieved without modifying the model structure or using auxiliary out-of-distribution data during training.
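To illustrate why the two criteria complement each other, the sketch below uses a linear stand-in for a manifold-learning flow: a PCA subspace plays the role of the learned manifold, a diagonal Gaussian models the density on it, and the squared reconstruction error measures the distance from it. The class name `LinearManifoldOOD` and the weighting `beta` are hypothetical; the paper uses normalizing flows, not PCA.

```python
import numpy as np


class LinearManifoldOOD:
    """Hypothetical linear stand-in for a manifold-learning flow."""

    def __init__(self, n_components, beta=1.0):
        self.k = n_components
        self.beta = beta

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        Xc = X - self.mean_
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        self.V_ = Vt[: self.k].T               # (D, k) orthonormal basis
        self.var_ = (Xc @ self.V_).var(axis=0) + 1e-6
        return self

    def components(self, X):
        Xc = X - self.mean_
        Z = Xc @ self.V_                       # coordinates on the manifold
        nll = 0.5 * (Z ** 2 / self.var_ + np.log(2 * np.pi * self.var_)).sum(axis=1)
        dist = ((Xc - Z @ self.V_.T) ** 2).sum(axis=1)  # squared off-manifold distance
        return nll, dist

    def scores(self, X):
        nll, dist = self.components(X)
        return nll + self.beta * dist          # higher means more likely OOD
```

With training data near a line, the on-manifold density alone would rate a point far off the line as typical, and the distance alone would rate a point far along the line as typical; the combined score flags both.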


Efficient Relation-aware Neighborhood Aggregation in Graph Neural Networks via Tensor Decomposition

arXiv.org Artificial Intelligence

Many Graph Neural Networks (GNNs) have been proposed for Knowledge Graph Embedding (KGE). However, many of these methods neglect the importance of relation information and combine it inefficiently with entity information, leading to low expressiveness. To address this issue, we introduce a general knowledge graph encoder that incorporates tensor decomposition into the aggregation function of the Relational Graph Convolutional Network (R-GCN). In our model, neighbor entities are transformed using projection matrices of a low-rank tensor, defined by relation types, to benefit from multi-task learning and produce expressive relation-aware representations. In addition, we propose a low-rank estimation of the core tensor using CP decomposition to compress and regularize our model. We use a training method inspired by contrastive learning, which relieves the training limitations of the 1-N method on huge graphs. We achieve competitive results on FB15k-237 and WN18RR with embeddings of comparatively low dimension.
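A sketch of relation-aware aggregation with CP-factorized projection matrices, under assumptions: each relation $r$ has an embedding $e_r$, the core tensor is approximated by $K$ rank-one terms with factor matrices $A$, $B$, $C$, so that $W_r = B\,\mathrm{diag}(A^\top e_r)\,C^\top$, and messages are averaged over all incoming edges and added to an R-GCN style self-connection $W_0 h_i$. The exact tensor layout and normalization in the paper may differ, and the function names are hypothetical. Computing $W_r h_j$ as $B\big((A^\top e_r) \odot (C^\top h_j)\big)$ avoids ever forming the $d \times d$ matrices.

```python
import numpy as np


def relation_projection(rel_emb, A, B, C):
    """Explicit W_r = B diag(A^T e_r) C^T (d_e x d_e); used only for checking."""
    return B @ np.diag(A.T @ rel_emb) @ C.T


def cp_messages(h_src, rel_emb_edges, A, B, C):
    """Row-wise W_r h_j = B((A^T e_r) * (C^T h_j)) without forming W_r."""
    return ((rel_emb_edges @ A) * (h_src @ C)) @ B.T


def aggregate(H, edges, rel_emb, A, B, C, W0):
    """One layer: mean of relation-aware messages plus a self-connection, then ReLU.

    edges: list of (source entity j, relation r, destination entity i)."""
    src = np.array([e[0] for e in edges])
    rel = np.array([e[1] for e in edges])
    dst = np.array([e[2] for e in edges])
    msgs = cp_messages(H[src], rel_emb[rel], A, B, C)
    out = np.zeros_like(H)
    np.add.at(out, dst, msgs)                    # sum messages per destination
    deg = np.bincount(dst, minlength=len(H)).reshape(-1, 1)
    out = out / np.maximum(deg, 1)               # simple mean normalization
    return np.maximum(0.0, H @ W0.T + out)
```

The factor matrices $A$, $B$, $C$ are shared across all relations, which is where the parameter savings and the regularizing effect of the low-rank estimate come from in this sketch.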


Joint Manifold Learning and Density Estimation Using Normalizing Flows

arXiv.org Machine Learning

Based on the manifold hypothesis, real-world data often lie on a low-dimensional manifold, whereas normalizing flows, as likelihood-based generative models, are incapable of finding this manifold due to their structural constraints. This raises an interesting question: "Can we find sub-manifold(s) of data in normalizing flows and estimate the density of the data on the sub-manifold(s)?" In this paper, we introduce two approaches to answer this question, namely per-pixel penalized log-likelihood and hierarchical training. We propose a single-step method for joint manifold learning and density estimation that disentangles the transformed space obtained by normalizing flows into manifold and off-manifold parts. This is done with a per-pixel penalized likelihood function for learning a sub-manifold of the data. Normalizing flows assume that the transformed data are Gaussianized, but this imposed assumption does not necessarily hold, especially in high dimensions. To tackle this problem, a hierarchical training approach is employed to improve density estimation on the sub-manifold. The results validate the superiority of the proposed methods for simultaneous manifold learning and density estimation with normalizing flows, in terms of both generated image quality and likelihood.
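The abstract does not give the exact per-pixel penalty, so the following is a hypothetical sketch of the idea: given flow outputs $z$ (one value per pixel) and the log-determinant of the flow's Jacobian, pixels selected by a mask are scored under a standard normal density, and the remaining off-manifold pixels receive a quadratic penalty weighted by `lam`. The mask, the penalty form, and the function name are assumptions.

```python
import numpy as np


def penalized_nll(z, log_det, manifold_mask, lam):
    """Hypothetical per-pixel penalized NLL for a flow with outputs z.

    z: (batch, n_pixels) flow outputs; log_det: (batch,) log|det dz/dx|;
    manifold_mask: boolean (n_pixels,), True for on-manifold pixels.
    """
    z_on = z[:, manifold_mask]
    z_off = z[:, ~manifold_mask]
    # Standard-normal log-density on the manifold pixels.
    log_p_on = -0.5 * (z_on ** 2 + np.log(2 * np.pi)).sum(axis=1)
    # Quadratic penalty pulls off-manifold pixels toward zero.
    penalty = lam * (z_off ** 2).sum(axis=1)
    return float(np.mean(-(log_p_on + log_det) + penalty))
```

A large `lam` pushes the flow to encode the data using only the masked pixels, which is one way to realize the manifold/off-manifold split described above.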


Vector Transport Free Riemannian LBFGS for Optimization on Symmetric Positive Definite Matrix Manifolds

arXiv.org Machine Learning

This work focuses on optimization on Riemannian manifolds. The Limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) algorithm is a commonly used quasi-Newton method for numerical optimization in Euclidean spaces. Riemannian LBFGS (RLBFGS) extends this method to Riemannian manifolds. RLBFGS involves computationally expensive vector transports, as well as unfolding recursions that use adjoint vector transports. In this article, we propose two mappings in the tangent space, using the inverse square root and the Cholesky decomposition. These mappings make both the vector transport and the adjoint vector transport the identity, and therefore isometric. Identity vector transport makes RLBFGS less computationally expensive, and its isometry is also very useful in the convergence analysis of RLBFGS. Moreover, under the proposed mappings, the Riemannian metric reduces to the Euclidean inner product, which is much cheaper to compute. We focus on Symmetric Positive Definite (SPD) manifolds, which are useful in various fields such as data science and statistics. This work opens up the possibility of extending the proposed mappings to other well-known manifolds.
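The claim that the metric reduces to the Euclidean inner product can be checked numerically, assuming the affine-invariant metric $\langle \xi, \eta \rangle_X = \mathrm{tr}(X^{-1}\xi X^{-1}\eta)$ on SPD matrices. Mapping a tangent vector by $\xi \mapsto X^{-1/2}\xi X^{-1/2}$, or by $\xi \mapsto L^{-1}\xi L^{-\top}$ with $X = LL^\top$, turns this metric into the Frobenius inner product; for the Cholesky map, $\mathrm{tr}(L^{-1}\xi L^{-\top} L^{-1}\eta L^{-\top}) = \mathrm{tr}(\xi X^{-1}\eta X^{-1})$ by cyclicity of the trace, since $L^{-\top}L^{-1} = X^{-1}$. The function names are illustrative.

```python
import numpy as np


def inv_sqrt_spd(X):
    """X^{-1/2} for an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(X)
    return (V / np.sqrt(w)) @ V.T


def affine_invariant_metric(X, xi, eta):
    """<xi, eta>_X = tr(X^{-1} xi X^{-1} eta)."""
    Xinv = np.linalg.inv(X)
    return np.trace(Xinv @ xi @ Xinv @ eta)


def map_inv_sqrt(X, xi):
    S = inv_sqrt_spd(X)
    return S @ xi @ S


def map_cholesky(X, xi):
    Linv = np.linalg.inv(np.linalg.cholesky(X))   # X = L L^T
    return Linv @ xi @ Linv.T


def euclidean_inner(a, b):
    return np.sum(a * b)                          # Frobenius inner product
```

In the mapped coordinates, the inner product no longer depends on the base point $X$, which is what lets the same matrices be reused as is across iterations.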


FRMDN: Flow-based Recurrent Mixture Density Network

arXiv.org Machine Learning

Recurrent Mixture Density Networks (RMDNs) consist of two main parts: a Recurrent Neural Network (RNN) and a Gaussian Mixture Model (GMM), in which a kind of RNN (usually an LSTM) is used to find the parameters of a GMM at every time step. However, existing RMDNs face several difficulties, the most important of which is high-dimensional problems. Estimating the covariance matrix is more difficult for high-dimensional problems because of the correlations between dimensions and the need to satisfy the positive-definiteness condition. Consequently, existing methods have usually used an RMDN with a diagonal covariance matrix for high-dimensional problems, assuming independence among dimensions. In this paper, inspired by a common approach in the GMM literature, we instead consider a tied configuration for each precision matrix (the inverse of the covariance matrix) in the RMDN, $\Sigma_k^{-1} = U D_k U^\top$, to enrich the GMM rather than using a diagonal form. For simplicity, however, we assume that $U$ is the identity matrix and $D_k$ is a diagonal matrix specific to the $k$-th component. At this point we only have a diagonal matrix, which does not differ from existing diagonal RMDNs. However, flow-based neural networks are a new group of generative models that can transform a distribution into a simpler distribution, and vice versa, through a sequence of invertible functions. Therefore, we apply a diagonal GMM to transformed observations: at every time step, the next observation, $y_{t+1}$, is passed through a flow-based neural network to obtain a much simpler distribution. Experimental results on a reinforcement learning problem verify the superiority of the proposed method over the baseline method in terms of the negative log-likelihood (NLL) of the RMDN and the cumulative reward of a controller with a smaller population size.
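A sketch of the per-step NLL, assuming the RNN has already produced mixture weights, means, and diagonal precisions for step $t+1$ (the $U = I$ case). A hypothetical elementwise affine flow stands in for the flow-based network. By the change-of-variables formula, the log-likelihood of $y_{t+1}$ is the diagonal-GMM log-density of the transformed observation plus the log-determinant of the flow's Jacobian.

```python
import numpy as np


def diag_gmm_logpdf(z, log_pis, mus, precisions):
    """log p(z) for a GMM with Sigma_k^{-1} = diag(precisions[k]) (the U = I case).

    z: (D,); log_pis: (K,); mus, precisions: (K, D)."""
    D = z.shape[0]
    quad = ((z - mus) ** 2 * precisions).sum(axis=1)
    log_comp = (log_pis - 0.5 * D * np.log(2 * np.pi)
                + 0.5 * np.log(precisions).sum(axis=1) - 0.5 * quad)
    m = log_comp.max()
    return m + np.log(np.exp(log_comp - m).sum())   # log-sum-exp over components


def affine_flow(y, shift, log_scale):
    """Hypothetical invertible elementwise flow z = (y - shift) / exp(log_scale)."""
    z = (y - shift) * np.exp(-log_scale)
    log_det = -log_scale.sum()                      # log|det dz/dy|
    return z, log_det


def frmdn_nll(y_next, log_pis, mus, precisions, shift, log_scale):
    """NLL of y_{t+1}: diagonal GMM on the flow output plus the change-of-variables term."""
    z, log_det = affine_flow(y_next, shift, log_scale)
    return -(diag_gmm_logpdf(z, log_pis, mus, precisions) + log_det)
```

Even though the GMM itself is diagonal, a non-trivial flow makes the implied density of $y_{t+1}$ non-diagonal, which is the motivation for combining the two.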


Active Transfer Learning for Persian Offline Signature Verification

arXiv.org Machine Learning

Offline Signature Verification (OSV) remains a challenging pattern recognition task, especially in the presence of skilled forgeries that are not available during training. This challenge is aggravated when only a small amount of labeled training data is available and intra-personal variations are large. In this study, we address this issue by employing an active learning approach, which selects the most informative instances to label and therefore significantly reduces the human labeling effort. Our proposed OSV system includes three steps: feature learning, active learning, and final verification. We benefit from transfer learning by using a pre-trained CNN for feature learning. We also propose SVM-based active learning for each user to separate that user's genuine signatures from random forgeries. Finally, we use the SVMs to verify the authenticity of the questioned signature. We evaluated our proposed active transfer learning method on UTSig, a Persian offline signature dataset. We achieved an improvement of nearly 13% compared with random selection of instances. Our results also showed a 1% improvement over the state-of-the-art method, which used a fully supervised setting with five more labeled instances per user.
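A sketch of per-user SVM-based active learning with uncertainty sampling, under assumptions: features come from a pre-trained CNN (not shown), genuine signatures are labeled +1 and random forgeries -1, a linear SVM is trained with a Pegasos-style subgradient method, and each round queries the pool instances closest to the decision boundary. The `oracle` callback standing in for the human labeler, the batch size, and all names are hypothetical; the paper's SVM kernel and query criterion may differ. For brevity the bias is folded into the weight vector and is therefore also regularized.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, n_iter=2000, seed=0):
    """Pegasos-style linear SVM; y in {-1, +1}; bias folded into the weights."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for t in range(1, n_iter + 1):
        i = rng.integers(len(Xb))
        eta = 1.0 / (lam * t)
        if y[i] * (Xb[i] @ w) < 1:             # hinge loss is active
            w = (1 - eta * lam) * w + eta * y[i] * Xb[i]
        else:
            w = (1 - eta * lam) * w
    return w


def decision(w, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ w


def select_most_informative(w, X_pool, n):
    """Uncertainty sampling: the n pool points closest to the hyperplane."""
    return np.argsort(np.abs(decision(w, X_pool)))[:n]


def active_learning(X_lab, y_lab, X_pool, oracle, n_rounds=3, batch=2):
    for _ in range(n_rounds):
        w = train_linear_svm(X_lab, y_lab)
        pick = select_most_informative(w, X_pool, batch)
        X_lab = np.vstack([X_lab, X_pool[pick]])
        y_lab = np.concatenate([y_lab, oracle(X_pool[pick])])   # human labels
        X_pool = np.delete(X_pool, pick, axis=0)
    return train_linear_svm(X_lab, y_lab), X_lab, y_lab
```

Querying points near the boundary spends the labeling budget where the current SVM is least certain, which is why it can beat random selection for the same number of labels.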


Deep-RBF Networks Revisited: Robust Classification with Rejection

arXiv.org Machine Learning

One of the main drawbacks of deep neural networks, like many other classifiers, is their vulnerability to adversarial attacks. An important reason for this vulnerability is that they assign high confidence to regions with few or even no feature points. By feature points, we mean points obtained by a nonlinear transformation of the input space that extracts a meaningful representation of the input data. Deep-RBF networks, on the other hand, assign high confidence only to regions containing enough feature points, but they have been discounted due to the widely held belief that they suffer from the vanishing gradient problem. In this paper, we revisit deep-RBF networks by first giving a general formulation for them and then proposing a family of cost functions for them inspired by metric learning. In the proposed deep-RBF learning algorithm, the vanishing gradient problem does not occur. We make these networks robust to adversarial attacks by adding a reject option to their output layer. Through several experiments on the MNIST dataset, we demonstrate that the proposed method not only achieves high classification accuracy but is also very resistant to various adversarial attacks.
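A sketch of an RBF output layer with a reject option, assuming one center per class in feature space: the predicted class is the nearest center, and the input is rejected when even the nearest center is farther than a threshold, so regions without feature points receive no confident prediction. The margin loss is one hypothetical metric-learning-style cost (pull toward the true center, push other centers beyond a margin), not necessarily one of the paper's cost functions; all names are illustrative.

```python
import numpy as np


def sq_dists(features, centers):
    """Squared Euclidean distance from each feature point to each class center."""
    return ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)


def rbf_predict_with_reject(features, centers, threshold):
    d2 = sq_dists(features, centers)
    pred = d2.argmin(axis=1)
    reject = d2.min(axis=1) > threshold      # no center close enough: no confident label
    return np.where(reject, -1, pred)         # -1 marks the reject option


def rbf_margin_loss(features, labels, centers, margin):
    d2 = sq_dists(features, centers)
    rows = np.arange(len(features))
    pull = d2[rows, labels]                   # pull toward the true class center
    push = np.maximum(0.0, margin - d2)       # push other centers beyond the margin
    push[rows, labels] = 0.0
    return float(np.mean(pull + push.sum(axis=1)))
```

Because the decision depends on absolute distances rather than on which side of a hyperplane a point falls, an input pushed far from all class centers is rejected instead of being classified with high confidence.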