Goto

Collaborating Authors

 South America


The Dilemma Between Dimensionality Reduction and Adversarial Robustness

arXiv.org Machine Learning

Recent work has shown the tremendous vulnerability to adversarial samples that are nearly indistinguishable from benign data but are improperly classified by the deep learning model. Some of the latest findings suggest the existence of adversarial attacks may be an inherent weakness of these models as a direct result of its sensitivity to well-generalizing features in high dimensional data. We hypothesize that data transformations can influence this vulnerability since a change in the data manifold directly determines the adversary's ability to create these adversarial samples. To approach this problem, we study the effect of dimensionality reduction through the lens of adversarial robustness. This study raises awareness of the positive and negative impacts of five commonly used data transformation techniques on adversarial robustness. The evaluation shows how these techniques contribute to an overall increased vulnerability where accuracy is only improved when the dimensionality reduction technique approaches the data's optimal intrinsic dimension. The conclusions drawn from this work contribute to understanding and creating more resistant learning models.


Twitter discussions and emotions about COVID-19 pandemic: a machine learning approach

arXiv.org Machine Learning

The objective of the study is to examine coronavirus disease (COVID-19) related discussions, concerns, and sentiments that emerged from tweets posted by Twitter users. We analyze 4 million Twitter messages related to the COVID-19 pandemic using a list of 25 hashtags such as "coronavirus," "COVID-19," "quarantine" from March 1 to April 21 in 2020. We use a machine learning approach, Latent Dirichlet Allocation (LDA), to identify popular unigram, bigrams, salient topics and themes, and sentiments in the collected Tweets. Popular unigrams include "virus," "lockdown," and "quarantine." Popular bigrams include "COVID-19," "stay home," "corona virus," "social distancing," and "new cases." We identify 13 discussion topics and categorize them into five different themes, such as "public health measures to slow the spread of COVID-19," "social stigma associated with COVID-19," "coronavirus news cases and deaths," "COVID-19 in the United States," and "coronavirus cases in the rest of the world". Across all identified topics, the dominant sentiments for the spread of coronavirus are anticipation that measures that can be taken, followed by a mixed feeling of trust, anger, and fear for different topics. The public reveals a significant feeling of fear when they discuss the coronavirus new cases and deaths than other topics. The study shows that Twitter data and machine learning approaches can be leveraged for infodemiology study by studying the evolving public discussions and sentiments during the COVID-19. Real-time monitoring and assessment of the Twitter discussion and concerns can be promising for public health emergency responses and planning. Already emerged pandemic fear, stigma, and mental health concerns may continue to influence public trust when there occurs a second wave of COVID-19 or a new surge of the imminent pandemic.


SXL: Spatially explicit learning of geographic processes with auxiliary tasks

arXiv.org Machine Learning

From earth system sciences to climate modeling and ecology, many of the greatest empirical modeling challenges are geographic in nature. As these processes are characterized by spatial dynamics, we can exploit their autoregressive nature to inform learning algorithms. We introduce SXL, a method for learning with geospatial data using explicitly spatial auxiliary tasks. We embed the local Moran's I, a well-established measure of local spatial autocorrelation, into the training process, "nudging" the model to learn the direction and magnitude of local autoregressive effects in parallel with the primary task. Further, we propose an expansion of Moran's I to multiple resolutions to capture effects at different spatial granularities and over varying distance scales. We show the superiority of this method for training deep neural networks using experiments with real-world geospatial data in both generative and predictive modeling tasks. Our approach can be used with arbitrary network architectures and, in our experiments, consistently improves their performance. We also outperform appropriate, domain-specific interpolation benchmarks. Our work highlights how integrating the geographic information sciences and spatial statistics into machine learning models can address the specific challenges of spatial data.


A Framework for Sample Efficient Interval Estimation with Control Variates

arXiv.org Machine Learning

We consider the problem of estimating confidence intervals for the mean of a random variable, where the goal is to produce the smallest possible interval for a given number of samples. While minimax optimal algorithms are known for this problem in the general case, improved performance is possible under additional assumptions. In particular, we design an estimation algorithm to take advantage of side information in the form of a control variate, leveraging order statistics. Under certain conditions on the quality of the control variates, we show improved asymptotic efficiency compared to existing estimation algorithms. Empirically, we demonstrate superior performance on several real world surveying and estimation tasks where we use the output of regression models as the control variates.


Apprentice raises $7.5 million to expedite lab work with AI and AR

#artificialintelligence

Apprentice.io, a startup developing a conversational AI and augmented reality platform for pharmaceutical, biotech, and chemical companies, today announced it has raised $7.5 million. CEO and cofounder Angelo Stracquatanio says the capital will enable Apprentice to scale to accommodate customer growth attributable to the pandemic. A shortage of lab workers is hastening the adoption of automation-driven "augmentation" technologies. An American Society for Clinical Pathology study revealed that the increasing workload is compelling lab managers to hire recent graduates or candidates with bachelor's degrees but no laboratory training. Automation and digital guidance tools like Apprentice's can upskill young professionals while ensuring quality standards aren't compromised.


Deep Categorization with Semi-Supervised Self-Organizing Maps

arXiv.org Machine Learning

Nowadays, with the advance of technology, there is an increasing amount of unstructured data being generated every day. However, it is a painful job to label and organize it. Labeling is an expensive, time-consuming, and difficult task. It is usually done manually, which collaborates with the incorporation of noise and errors to the data. Hence, it is of great importance to developing intelligent models that can benefit from both labeled and unlabeled data. Currently, works on unsupervised and semi-supervised learning are still being overshadowed by the successes of purely supervised learning. However, it is expected that they become far more important in the longer term. This article presents a semi-supervised model, called Batch Semi-Supervised Self-Organizing Map (Batch SS-SOM), which is an extension of a SOM incorporating some advances that came with the rise of Deep Learning, such as batch training. The results show that Batch SS-SOM is a good option for semi-supervised classification and clustering. It performs well in terms of accuracy and clustering error, even with a small number of labeled samples, as well as when presented to unsupervised data, and shows competitive results in transfer learning scenarios in traditional image classification benchmark datasets.


Deep Learning feature selection to unhide demographic recommender systems factors

arXiv.org Machine Learning

Extracting demographic features from hidden factors is an innovative concept that provides multiple and relevant applications. The matrix factorization model generates factors which do not incorporate semantic knowledge. This paper provides a deep learningbased method: DeepUnHide, able to extract demographic information from the users and items factors in collaborative filtering recommender systems. The core of the proposed method is the gradient-based localization used in the image processing literature to highlight the representative areas of each classification class. Validation experiments make use of two public datasets and current baselines. Results show the superiority of DeepUnHide to make feature selection and demographic classification, compared to the state of art of feature selection methods. Relevant and direct applications include recommendations explanation, fairness in collaborative filtering and recommendation to groups of users. Keywords: Feature selection, collaborative filtering, demographic information, matrix factorization, gradient based localization, deep learning.


Restricted Boltzmann Machine Flows and The Critical Temperature of Ising models

arXiv.org Machine Learning

We explore alternative experimental setups for the iterative sampling (flow) from Restricted Boltzmann Machines (RBM) mapped on the temperature space of square lattice Ising models by a neural network thermometer. This framework has been introduced to explore connections between RBM-based deep neural networks and the Renormalization Group (RG). It has been found that, under certain conditions, the flow of an RBM trained with Ising spin configurations approaches in the temperature space a value around the critical one: $ k_B T_c / J \approx 2.269$. In this paper we consider datasets with no information about model topology to argue that a neural network thermometer is not an accurate way to detect whether the RBM has learned scale invariance or not.


Deep Learning Meets SAR

arXiv.org Machine Learning

Deep learning in remote sensing has become an international hype, but it is mostly limited to the evaluation of optical data. Although deep learning has been introduced in SAR data processing, despite successful first attempts, its huge potential remains locked. For example, to the best knowledge of the authors, there is no single example of deep learning in SAR that has been developed up to operational processing of big data or integrated into the production chain of any satellite mission. In this paper, we provide an introduction to the most relevant deep learning models and concepts, point out possible pitfalls by analyzing special characteristics of SAR data, review the state-of-the-art of deep learning applied to SAR in depth, summarize available benchmarks, and recommend some important future research directions. With this effort, we hope to stimulate more research in this interesting yet under-exploited research field.


CoSE: Compositional Stroke Embeddings

arXiv.org Machine Learning

We present a generative model for stroke-based drawing tasks which is able to model complex free-form structures. While previous approaches rely on sequence-based models for drawings of basic objects or handwritten text, we propose a model that treats drawings as a collection of strokes that can be composed into complex structures such as diagrams (e.g., flow-charts). At the core of the approach lies a novel auto-encoder that projects variable-length strokes into a latent space of fixed dimension. This representation space allows a relational model, operating in latent space, to better capture the relationship between strokes and to predict subsequent strokes. We demonstrate qualitatively and quantitatively that our proposed approach is able to model the appearance of individual strokes, as well as the compositional structure of larger diagram drawings. Our approach is suitable for interactive use cases such as auto-completing diagrams.