Minimum adjusted Rand index for two clusterings of a given size

arXiv.org Machine Learning

The adjusted Rand index is one of the most commonly used similarity measures to compare two clusterings of a given set of objects. Indeed, it is the recommended criterion for external clustering evaluation in the seminal study of Milligan and Cooper (1986). Nevertheless, many other measures for external clustering evaluation were recently surveyed in Meilă (2016). Initially, Rand (1971) considered a similarity index between two clusterings (the Rand index) defined as the proportion of object pairs that are either assigned to the same cluster in both clusterings or to different clusters in both clusterings. However, Morey and Agresti (1984) noted that such an index does not take into account the possible agreement by chance, and Hubert and Arabie (1985) introduced a corrected-for-chance version of the Rand index, which is usually known as the adjusted Rand index (ARI).
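The pair-counting definition above can be illustrated with a short sketch. It assumes each clustering is given as a list of cluster labels, one per object, with at least two objects; the function names are illustrative and not taken from the paper. The ARI is computed with the standard Hubert and Arabie (1985) contingency-table formula, and returning 1.0 when the denominator vanishes is a convention chosen here for the degenerate case.

```python
from collections import Counter
from itertools import combinations
from math import comb


def rand_index(labels_a, labels_b):
    """Proportion of object pairs on which the two clusterings agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    # A pair agrees when it is "together" in both clusterings or "apart" in both.
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def adjusted_rand_index(labels_a, labels_b):
    """Rand index corrected for chance agreement (Hubert and Arabie, 1985)."""
    n = len(labels_a)
    # Pairs placed together in both clusterings, from the contingency table.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        # Degenerate case (e.g. both clusterings trivial); assumed convention.
        return 1.0
    return (together_both - expected) / (maximum - expected)


if __name__ == "__main__":
    a = [0, 0, 1, 1]
    b = [0, 1, 0, 1]
    print(rand_index(a, b), adjusted_rand_index(a, b))
```

For the two clusterings in the usage example, the Rand index is 1/3 while the ARI is -0.5, which shows that the chance correction can push the index below zero.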


A Reinforcement Learning Framework for Time-Dependent Causal Effects Evaluation in A/B Testing

arXiv.org Machine Learning

A/B testing, or online experimentation, is a standard business strategy to compare a new product with an old one in pharmaceutical, technological, and traditional industries. Major challenges arise in online experiments where there is only one unit that receives a sequence of treatments over time. In those experiments, the treatment at a given time impacts the current outcome as well as future outcomes. The aim of this paper is to introduce a reinforcement learning framework for carrying out A/B testing, while characterizing the long-term treatment effects. Our proposed testing procedure allows for sequential monitoring and online updating, so it is generally applicable to a variety of treatment designs in different industries. In addition, we systematically investigate the theoretical properties (e.g., asymptotic distribution and power) of our testing procedure. Finally, we apply our framework to both synthetic datasets and a real-world data example obtained from a ride-sharing company to illustrate its usefulness.


Cross-modal variational inference for bijective signal-symbol translation

arXiv.org Machine Learning

Extraction of symbolic information from signals is an active field of research enabling numerous applications, especially in the Musical Information Retrieval domain. This complex task, which is also related to other topics such as pitch extraction or instrument recognition, is a demanding subject that gave birth to numerous approaches, mostly based on advanced signal processing-based algorithms. However, these techniques are often non-generic, allowing the extraction of definite physical properties of the signal (pitch, octave), but not allowing arbitrary vocabularies or more general annotations. On top of that, these techniques are one-sided, meaning that they can extract symbolic data from an audio signal, but cannot perform the reverse process of symbol-to-signal generation. In this paper, we propose a bijective approach for signal/symbol translation by turning this problem into a density estimation task over signal and symbolic domains, considered both as related random variables. We estimate this joint distribution with two different variational auto-encoders, one for each domain, whose inner representations are forced to match with an additive constraint, allowing both models to learn and generate separately while allowing signal-to-symbol and symbol-to-signal inference. In this article, we test our models on pitch, octave and dynamics symbols, which comprise a fundamental step towards music transcription and label-constrained audio generation. In addition to its versatility, this system is rather light during training and generation while allowing several interesting creative uses that we outline at the end of the article.


Sparse and Smooth: improved guarantees for Spectral Clustering in the Dynamic Stochastic Block Model

arXiv.org Machine Learning

In this paper, we analyse classical variants of the Spectral Clustering (SC) algorithm in the Dynamic Stochastic Block Model (DSBM). Existing results show that, in the relatively sparse case where the expected degree grows logarithmically with the number of nodes, guarantees in the static case can be extended to the dynamic case and yield improved error bounds when the DSBM is sufficiently smooth in time, that is, the communities do not change too much between two time steps. We improve over these results by drawing a new link between the sparsity and the smoothness of the DSBM: the more regular the DSBM is, the more sparse it can be, while still guaranteeing consistent recovery. In particular, a mild condition on the smoothness allows us to treat the sparse case with bounded degree. We also extend these guarantees to the normalized Laplacian, and as a by-product of our analysis, we obtain, to our knowledge, the best spectral concentration bound available for the normalized Laplacian of matrices with independent Bernoulli entries.


Mean-Field Analysis of Two-Layer Neural Networks: Non-Asymptotic Rates and Generalization Bounds

arXiv.org Machine Learning

Deep learning has achieved tremendous practical success in a wide range of machine learning tasks (Krizhevsky et al., 2012; Hinton et al., 2012; Silver et al., 2016). However, due to the nonconvex and over-parameterized nature of modern neural networks, the success of deep learning cannot be fully explained by conventional optimization and machine learning theory. Recently, a line of work utilized a mean-field framework to study the training of extremely wide (or even infinitely wide) neural networks (Chizat and Bach, 2018; Mei et al., 2018, 2019; Wei et al., 2019; Fang et al., 2019a,b). It has been shown that over-parameterized two-layer neural networks can be trained to a global optimizer of the training loss, despite the non-convex optimization landscape. However, most of the global convergence results proved in this line of work are asymptotic, and the convergence rate of the training algorithm is largely unknown, except for some specifically designed training procedures (Wei et al., 2019). Moreover, the generalization performance of neural networks trained in the mean-field regime has not been well-studied. Compared with the mean-field analysis, another line of work studying the learning of overparameterized neural network in the so-called "neural tangent kernel (NTK) regime" (Jacot et al.,


Search for Smart Evaders with Sweeping Agents

arXiv.org Artificial Intelligence

Suppose that in a given planar circular region, there are some smart mobile evaders and we would like to find them using sweeping agents. We assume that the sweeping agents are in a line formation whose total length is 2r. We propose procedures for designing a sweeping process that ensures the successful completion of the task, thereby deriving conditions on the sweeping velocity of the linear formation and its path. Successful completion of the task means that evaders with a given limit on their velocity cannot escape the sweeping agents. A simpler task for the sweeping formation is the confinement of the evaders to their initial domain. The feasibility of completing these tasks depends on geometric and dynamic constraints that impose a lower bound on the velocity that the sweeper line formation must have. This critical velocity is derived to ensure the satisfaction of the confinement task. Increasing the velocity above the lower bound enables the agents to complete the search task as well. We present results on the total search time as a function of the sweeping velocity of the formation given the initial conditions on the size of the search region and the maximal velocity of the evaders.


Real-Time target detection in maritime scenarios based on YOLOv3 model

arXiv.org Machine Learning

In this work, a novel ship dataset is proposed, consisting of more than 56k images of marine vessels collected by means of web-scraping and including 12 ship categories. A YOLOv3 single-stage detector based on the Keras API is built on top of this dataset. Current results on four categories (cargo ship, naval ship, oil ship and tug ship) show Average Precision up to 96% for Intersection over Union (IoU) of 0.5 and satisfactory detection performance up to IoU of 0.8. A Data Analytics GUI service based on the QT framework and the Darknet-53 engine is also implemented in order to simplify the deployment process and analyse massive amounts of images, even for people without Data Science expertise.
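The IoU thresholds quoted above compare a predicted bounding box against a ground-truth box. A minimal sketch of that computation follows, assuming axis-aligned boxes given as (x_min, y_min, x_max, y_max) tuples; the box format and the function name are assumptions made for illustration and are not taken from the paper.

```python
def intersection_over_union(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    # Two zero-area boxes have no meaningful overlap; report 0 by convention.
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0, 0, 2, 2)
    ground_truth = (1, 1, 3, 3)
    print(intersection_over_union(predicted, ground_truth))
```

Under an IoU threshold of 0.5, a detection would count as correct only if this value is at least 0.5; the pair of boxes in the usage example overlaps by 1/7 and would not qualify.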


Exploring Chemical Space using Natural Language Processing Methodologies for Drug Discovery

arXiv.org Machine Learning

Biochemical methods that measure affinity and biophysical methods that describe the interaction in atomistic level detail have provided valuable information toward a mechanistic explanation for bimolecular recognition [1]. However, more often than not, compounds with drug potential are discovered serendipitously or by phenotypic drug discovery [2] since this highly specific interaction is still difficult to predict [3]. Protein structure based computational strategies such as docking [4], ultra-large library docking for discovering new chemotypes [5], and molecular dynamics simulations [4] or ligand based strategies such as quantitative structure-activity relationship (QSAR) [6, 7], and molecular similarity [8] have been powerful at narrowing down the list of compounds to be tested experimentally. With the increase in available data, machine learning and deep learning architectures are also starting to play a significant role in cheminformatics and drug discovery [9]. These approaches often require extensive computational resources or they are limited by the availability of 3D information. On the other hand, text based representations of biochemical entities are more readily available as evidenced by the 19,588 biomolecular complexes (3D structures) in PDB-Bind [10] (accessed on Nov 13, 2019) compared with 561,356 (manually annotated and reviewed) protein sequences in Uniprot [11] (accessed on Nov 13, 2019) or 97 million compounds in Pubchem [12] (accessed on Nov 13, 2019). The advances in natural language processing (NLP) methodologies make processing of text based representations of biomolecules an area of intense research interest. The discipline of natural language processing (NLP) comprises a variety of methods that explore a large amount of textual data in order to bring unstructured, latent (or hidden) knowledge to the fore [13]. Advances in this field are beneficial for tasks that use language (textual data) to build insight.


Improving Deep Learning For Airbnb Search

arXiv.org Machine Learning

The application of deep learning to search ranking was one of the most impactful product improvements at Airbnb. But what comes next after you launch a deep learning model? In this paper we describe the journey beyond, discussing what we refer to as the ABCs of improving search: A for architecture, B for bias and C for cold start. For architecture, we describe a new ranking neural network, focusing on the process that evolved our existing DNN beyond a fully connected two layer network. On handling positional bias in ranking, we describe a novel approach that led to one of the most significant improvements in tackling inventory that the DNN historically found challenging. To solve cold start, we describe our perspective on the problem and changes we made to improve the treatment of new listings on the platform. We hope ranking teams transitioning to deep learning will find this a practical case study of how to iterate on DNNs.


Combining Machine Learning with Knowledge-Based Modeling for Scalable Forecasting and Subgrid-Scale Closure of Large, Complex, Spatiotemporal Systems

arXiv.org Machine Learning

We consider the commonly encountered situation (e.g., in weather forecasting) where the goal is to predict the time evolution of a large, spatiotemporally chaotic dynamical system when we have access to both time series data of previous system states and an imperfect model of the full system dynamics. Specifically, we attempt to utilize machine learning as the essential tool for integrating the use of past data into predictions. In order to facilitate scalability to the common scenario of interest where the spatiotemporally chaotic system is very large and complex, we propose combining two approaches: (i) a parallel machine learning prediction scheme; and (ii) a hybrid technique, for a composite prediction system composed of a knowledge-based component and a machine-learning-based component. We demonstrate that not only can this method combining (i) and (ii) be scaled to give excellent performance for very large systems, but also that the length of time series data needed to train our multiple, parallel machine learning components is dramatically less than that necessary without parallelization. Furthermore, considering cases where computational realization of the knowledge-based component does not resolve subgrid-scale processes, our scheme is able to use training data to incorporate the effect of the unresolved short-scale dynamics upon the resolved longer-scale dynamics ("subgrid-scale closure").