learning architecture
38db3aed920cf82ab059bfccbd02be6a-Reviews.html
It is known that adding additive Gaussian noise to the features is equivalent to l_2 regularization in a least-squares problem (Bishop). This paper studies multiplicative Bernoulli feature noising in a shallow learning architecture with a general loss function, and shows that it has the effect of adapting the geometry through an l_2 regularizer that rescales the features (beta^{\top} D(beta,X) beta). The matrix D(beta,X) is an estimate of the inverse diagonal Fisher information. It is worth noting that D does not depend on the labels. The equivalent regularizer of dropout is non-convex in general.
- Summary/Review (0.48)
- Research Report > New Finding (0.30)
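The classical result cited in the review's first sentence (additive Gaussian feature noise acting as l_2 regularization in least squares) can be checked numerically. The sketch below is only an illustration of that background fact, not of the paper's dropout regularizer. It assumes an unnormalized sum-of-squares loss, for which the matching ridge penalty is lambda = n * sigma^2; the data, the dimensions, and the function names `ridge_solution` and `noisy_least_squares` are hypothetical.

```python
import numpy as np

def ridge_solution(X, y, lam):
    # Closed-form minimizer of ||y - X b||^2 + lam * ||b||^2.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def noisy_least_squares(X, y, sigma, copies, rng):
    # Ordinary least squares on many independently noised copies of X,
    # a Monte Carlo approximation of the expected noisy loss.
    n, d = X.shape
    X_noisy = np.concatenate(
        [X + sigma * rng.standard_normal((n, d)) for _ in range(copies)]
    )
    y_rep = np.tile(y, copies)
    beta, *_ = np.linalg.lstsq(X_noisy, y_rep, rcond=None)
    return beta

rng = np.random.default_rng(0)
n, d, sigma = 50, 3, 0.5  # illustrative sizes, not taken from the paper
X = rng.standard_normal((n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(n)

beta_ridge = ridge_solution(X, y, lam=n * sigma**2)
beta_noisy = noisy_least_squares(X, y, sigma, copies=2000, rng=rng)

# Largest coordinate-wise difference; it shrinks as `copies` grows.
gap = float(np.max(np.abs(beta_ridge - beta_noisy)))
print(gap)
```

The two estimates agree up to Monte Carlo error because the expectation of (X+E)^T (X+E) is X^T X + n * sigma^2 * I when the entries of E are independent with variance sigma^2.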
Leveraging Self-Supervised Learning Methods for Remote Screening of Subjects with Paroxysmal Atrial Fibrillation
Atienza, Adrian, Manimaran, Gouthamaan, Puthusserypady, Sadasivan, Dominguez, Helena, Jacobsen, Peter K., Bardram, Jakob E.
The integration of Artificial Intelligence (AI) into clinical research has great potential to reveal patterns that are difficult for humans to detect, creating impactful connections between inputs and clinical outcomes. However, these methods often require large amounts of labeled data, which can be difficult to obtain in healthcare due to strict privacy laws and the need for experts to annotate data. This requirement creates a bottleneck when investigating unexplored clinical questions. This study explores the application of Self-Supervised Learning (SSL) as a way to obtain preliminary results from clinical studies with limited-size cohorts. To assess our approach, we focus on an underexplored clinical task: screening subjects for Paroxysmal Atrial Fibrillation (P-AF) using remote monitoring, single-lead ECG signals captured during normal sinus rhythm. We evaluate state-of-the-art SSL methods alongside supervised learning approaches, where SSL outperforms supervised learning in this task of interest. More importantly, it prevents misleading conclusions that may arise from poor performance in the latter paradigm when dealing with limited cohort settings.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
A Unified Multi-Task Learning Architecture for Hate Detection Leveraging User-Based Information
Hate speech, offensive language, aggression, racism, sexism, and other abusive language are common phenomena in social media. There is a need for Artificial Intelligence (AI)-based intervention that can filter hate content at scale. Most existing hate speech detection solutions have utilized the features by treating each post as an isolated input instance for the classification. This paper addresses this issue by introducing a unique model that improves hate speech identification for the English language by utilising intra-user and inter-user-based information. The experiment is conducted over single-task learning (STL) and multi-task learning (MTL) paradigms that use deep neural networks, such as convolutional neural networks (CNN), gated recurrent unit (GRU), bidirectional encoder representations from the transformer (BERT), and A Lite BERT (ALBERT). We use three benchmark datasets and conclude that combining certain user features with textual features gives significant improvements in macro-F1 and weighted-F1.
Permutative redundancy and uncertainty of the objective in deep learning
Implications of uncertain objective functions and permutative symmetry of traditional deep learning architectures are discussed. It is shown that traditional architectures are polluted by an astronomical number of equivalent global and local optima. Uncertainty of the objective makes local optima unattainable, and, as the size of the network grows, the global optimization landscape likely becomes a tangled web of valleys and ridges. Some remedies which reduce or eliminate ghost optima are discussed including forced pre-pruning, re-ordering, ortho-polynomial activations, and modular bio-inspired architectures.
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > Illinois (0.04)
- Europe > Ukraine (0.04)
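The permutative symmetry described in this abstract can be seen in a small example: reordering the hidden units of a fully connected layer, together with the matching rows of the next layer's weights, leaves the network function unchanged. The following is a minimal sketch under assumed settings (one hidden layer, tanh activation, arbitrary sizes); the function `mlp` and all dimensions are hypothetical and not taken from the paper.

```python
import math
import numpy as np

def mlp(x, W1, b1, W2, b2):
    # One-hidden-layer network with an elementwise tanh activation.
    return np.tanh(x @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 4, 6, 2  # illustrative sizes
W1 = rng.standard_normal((n_in, n_hidden))
b1 = rng.standard_normal(n_hidden)
W2 = rng.standard_normal((n_hidden, n_out))
b2 = rng.standard_normal(n_out)
x = rng.standard_normal((5, n_in))

# Relabel the hidden units: permute the columns of W1 and entries of b1,
# and apply the same permutation to the rows of W2.
perm = rng.permutation(n_hidden)
out_original = mlp(x, W1, b1, W2, b2)
out_permuted = mlp(x, W1[:, perm], b1[perm], W2[perm, :], b2)

same_output = bool(np.allclose(out_original, out_permuted))

# Each hidden layer of width h contributes h! equivalent parameter settings.
equivalent_copies = math.factorial(n_hidden)
print(same_output, equivalent_copies)
```

Since every optimum has this many functionally identical copies per hidden layer, the count grows factorially with width, which is the redundancy the abstract refers to.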
Understanding Deep Learning via Notions of Rank
Despite the extreme popularity of deep learning in science and industry, its formal understanding is limited. This thesis puts forth notions of rank as key for developing a theory of deep learning, focusing on the fundamental aspects of generalization and expressiveness. In particular, we establish that gradient-based training can induce an implicit regularization towards low rank for several neural network architectures, and demonstrate empirically that this phenomenon may facilitate an explanation of generalization over natural data (e.g., audio, images, and text). Then, we characterize the ability of graph neural networks to model interactions via a notion of rank, which is commonly used for quantifying entanglement in quantum physics. A central tool underlying these results is a connection between neural networks and tensor factorizations. Practical implications of our theory for designing explicit regularization schemes and data preprocessing algorithms are presented.
- North America > United States (0.13)
- Africa > Senegal > Kolda Region > Kolda (0.04)
- Asia (0.04)
Deep learning architectures for data-driven damage detection in nonlinear dynamic systems
Joseph, Harrish, Quaranta, Giuseppe, Carboni, Biagio, Lacarbonara, Walter
The primary goal of structural health monitoring is to detect damage at its onset before it reaches a critical level. The in-depth investigation in the present work addresses deep learning applied to data-driven damage detection in nonlinear dynamic systems. In particular, autoencoders (AEs) and generative adversarial networks (GANs) are implemented leveraging 1D convolutional neural networks. The onset of damage is detected in the investigated nonlinear dynamic systems by exciting random vibrations of varying intensity, without prior knowledge of the system or the excitation and in an unsupervised manner. The comprehensive numerical study is conducted on dynamic systems exhibiting different types of nonlinear behavior. An experimental application related to a magneto-elastic nonlinear system is also presented to corroborate the conclusions.
- North America > United States > Florida (0.14)
- South America > Brazil > São Paulo (0.04)
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
MUSTAN: Multi-scale Temporal Context as Attention for Robust Video Foreground Segmentation
Pokala, Praveen Kumar, Patibandla, Jaya Sai Kiran, Pandey, Naveen Kumar, Pailla, Balakrishna Reddy
Video foreground segmentation (VFS) is an important computer vision task wherein one aims to segment the objects under motion from the background. Most of the current methods are image-based, i.e., they rely only on spatial cues while ignoring motion cues. Therefore, they tend to overfit the training data and do not generalize well to out-of-domain (OOD) distributions. To solve this problem, prior works exploited several cues such as optical flow, background subtraction masks, etc. However, obtaining video data with annotations such as optical flow is challenging. In this paper, we utilize the temporal information and the spatial cues from the video data to improve OOD performance. The challenge lies in how to model the temporal information in the video data in an interpretable way, since this choice makes a very noticeable difference. We therefore devise a strategy that integrates the temporal context of the video in the development of VFS. Our approach gives rise to two deep learning architectures, MUSTAN1 and MUSTAN2, which are based on the idea of using multi-scale temporal context as attention, i.e., it helps our models learn better representations that are beneficial for VFS. Further, we introduce a new video dataset for VFS, the Indoor Surveillance Dataset (ISD). It has multiple frame-level annotations such as foreground binary masks, depth maps, and instance semantic annotations. Therefore, ISD can benefit other computer vision tasks. We validate the efficacy of our architectures and compare their performance with baselines. We demonstrate that the proposed methods significantly outperform the benchmark methods on OOD data. In addition, the performance of MUSTAN2 is significantly improved on certain video categories on OOD data due to ISD.
- Research Report (0.50)
- Overview (0.46)
An appointment with Reproducing Kernel Hilbert Space generated by Generalized Gaussian RBF as $L^2-$measure
Gaussian Radial Basis Function (RBF) kernels are the most often employed kernels in artificial intelligence and machine learning routines, as they tend to provide better results than their counterparts. However, little is known about the application of the Generalized Gaussian Radial Basis Function to various machine learning algorithms, namely kernel regression, support vector machines (SVM), and pattern recognition via neural networks. The results yielded by the Generalized Gaussian RBF in the kernel sense markedly outperform those of the Gaussian RBF kernel, the sigmoid function, and the ReLU function. This manuscript demonstrates the application of the Generalized Gaussian RBF in the kernel sense to the aforementioned machine learning routines, along with comparisons against the aforementioned functions.
- North America > United States > Florida (0.04)
- North America > United States > Texas > Smith County > Tyler (0.04)
- North America > United States > Pennsylvania (0.04)
A Blockchain-empowered Multi-Aggregator Federated Learning Architecture in Edge Computing with Deep Reinforcement Learning Optimization
Federated learning (FL) is emerging as a sought-after distributed machine learning architecture, offering the advantage of model training without direct exposure of raw data. With advancements in network infrastructure, FL has been seamlessly integrated into edge computing. However, the limited resources on edge devices introduce security vulnerabilities to FL in this context. While blockchain technology promises to bolster security, practical deployment on resource-constrained edge devices remains a challenge. Moreover, the exploration of FL with multiple aggregators in edge computing is still new in the literature. Addressing these gaps, we introduce the Blockchain-empowered Heterogeneous Multi-Aggregator Federated Learning Architecture (BMA-FL). We design a novel lightweight Byzantine consensus mechanism, namely PBCM, to enable secure and fast model aggregation and synchronization in BMA-FL. We also examine the heterogeneity problem in BMA-FL, in which the aggregators are associated with varying numbers of connected trainers with non-IID data distributions and diverse training speeds. We propose a multi-agent deep reinforcement learning algorithm to help aggregators decide the best training strategies. Experiments on real-world datasets demonstrate that BMA-FL achieves better models faster than baselines, showing the efficacy of PBCM and the proposed deep reinforcement learning algorithm.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
- North America > United States > Texas > Dallas County > Richardson (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- Information Technology > Security & Privacy (1.00)
- Energy (0.93)
Active Learning with Statistical Models
An active learning problem is one where the learner has the ability or need to influence or select its own training data. Many problems of great practical interest allow active learning, and many even require it. We consider the problem of actively learning a mapping $X \to Y$ based on a set of training examples $\{(x_i, y_i)\}_{i=1}^{m}$, where $x_i \in X$ and $y_i \in Y$. The learner is allowed to iteratively select new inputs x (possibly from a constrained set), observe the resulting output y, and incorporate the new examples (x, y) into its training set. The primary question of active learning is how to choose which x to try next. There are many heuristics for choosing x based on intuition, including choosing places where we don't have data, where we perform poorly [Linden and Weber, 1993], where we have low confidence [Thrun and Moller, 1992], where we expect it