AITopics

Local ancestry inference (LAI) allows identification of the ancestry of all chromosomal segments in admixed individuals, and it is a critical step in the analysis of human genomes with applications from pharmacogenomics and precision medicine to genome-wide association studies. In recent years, many LAI techniques have been developed in both industry and academic research. However, these methods require large training data sets of human genomic sequences from the ancestries of interest. Such reference data sets are usually limited, proprietary, protected by privacy restrictions, or otherwise not accessible to the public. Techniques to generate training samples that resemble real haploid sequences from ancestries of interest can be useful tools in such scenarios, since a generalized model can often be shared, but the unique human sample sequences cannot. In this work we present a class-conditional VAE-GAN to generate new human genomic sequences that can be used to train local ancestry inference (LAI) algorithms. We evaluate the quality of our generated data by comparing the performance of a state-of-the-art LAI method when trained with generated versus real data.

ae-gan, ancestry, sequence, (16 more...)

1911.1322

Country:

Africa (0.08)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > Montserrat (0.04)

Genre: Research Report (0.84)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Kratimenos, Agelos, Avramidis, Kleanthis, Garoufis, Christos, Zlatintsi, Athanasia, Maragos, Petros

Augmentation Methods on Monophonic Audio for Instrument Classification in Polyphonic Music

Instrument classification is one of the fields in Music Information Retrieval (MIR) that has attracted a lot of research interest. However, the majority of that is dealing with monophonic music, while efforts on polyphonic material mainly focus on predominant instrument recognition or multi-instrument recognition for entire tracks. We present an approach for instrument classification in polyphonic music using monophonic training data that involves mixing-augmentation methods. Specifically, we experiment with pitch and tempo-based synchronization, as well as mixes of tracks with similar music genres. Further, a custom CNN model is proposed, that uses the augmented training data efficiently and a plethora of suitable evaluation metrics are discussed as well. The tempo-sync and genre techniques stand out, achieving an 81% label ranking average precision accuracy, detecting up to 9 instruments in over 2300 testing tracks.

dataset, instrument, music, (14 more...)

1911.12505

Country:

Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > Spain > Andalusia > Málaga Province > Málaga (0.04)
Europe > Greece (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)

Genre: Research Report (0.50)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Dual-Attention Graph Convolutional Network

Zhang, Xueya, Zhang, Tong, Zhao, Wenting, Cui, Zhen, Yang, Jian

Graph convolutional networks (GCNs) have shown the powerful ability in text structure representation and effectively facilitate the task of text classification. However, challenges still exist in adapting GCN on learning discriminative features from texts due to the main issue of graph variants incurred by the textual complexity and diversity. In this paper, we propose a dual-attention GCN to model the structural information of various texts as well as tackle the graph-invariant problem through embedding two types of attention mechanisms, i.e. the connection-attention and hop-attention, into the classic GCN. To encode various connection patterns between neighbour words, connection-attention adaptively imposes different weights specified to neighbourhoods of each word, which captures the short-term dependencies. On the other hand, the hop-attention applies scaled coefficients to different scopes during the graph diffusion process to make the model learn more about the distribution of context, which captures long-term semantics in an adaptive way. Extensive experiments are conducted on five widely used datasets to evaluate our dual-attention GCN, and the achieved state-of-the-art performance verifies the effectiveness of dual-attention mechanisms.

classification, mechanism, text classification, (14 more...)

1911.12486

Country: Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.70)

Product Knowledge Graph Embedding for E-commerce

Xu, Da, Ruan, Chuanwei, Korpeoglu, Evren, Kumar, Sushant, Achan, Kannan

In this paper, we propose a new product knowledge graph (PKG) embedding approach for learning the intrinsic product relations as product knowledge for e-commerce. We define the key entities and summarize the pivotal product relations that are critical for general e-commerce applications including marketing, advertisement, search ranking and recommendation. We first provide a comprehensive comparison between PKG and ordinary knowledge graph (KG) and then illustrate why KG embedding methods are not suitable for PKG learning. We construct a self-attention-enhanced distributed representation learning model for learning PKG embeddings from raw customer activity data in an end-to-end fashion. We design an effective multi-task learning schema to fully leverage the multi-modal e-commerce data. The Poincare embedding is also employed to handle complex entity structures. We use a real-world dataset from grocery.walmart.com to evaluate the performances on knowledge completion, search ranking and recommendation. The proposed approach compares favourably to baselines in knowledge completion and downstream tasks.

complement, relation, representation, (15 more...)

doi: 10.1145/3336191.3371778

1911.12481

Country:

North America > United States > Texas > Harris County > Houston (0.04)
North America > United States > California > Santa Clara County > Sunnyvale (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Services > e-Commerce Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Belbahri, Mouloud, Murua, Alejandro, Gandouet, Olivier, Nia, Vahid Partovi

Qini-based Uplift Regression

This article proposes methodology that identifies characteristics associated with a home insurance policy that can be used to infer the link between marketing intervention and policy renewal rate. Using the resulting statistical model, the goal is to predict which customers the company should focus on, in order to deploy future retention campaigns. A subscription-based company loses its customers when they stop doing business with their service. Also known as customer attrition, customer churn can be a drag on the business growth. It is less expensive to retain existing customers than to acquire new customers, so businesses put effort into marketing strategies to reduce customer attrition. Customer loyalty, on the other hand, is usually more profitable because the company have already earned the trust and loyalty of existing customers. Businesses mostly have a defined strategy for fighting customer churn over a period of time. Organizations are able to determine their success rate in customer loyalty and identify improvement strategies using available data and learning about churn.

coefficient, uplift, uplift model, (16 more...)

1911.12474

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > Montserrat (0.04)
Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Banking & Finance > Insurance (1.00)
Marketing (0.87)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.95)

Bayesian Optimization for Categorical and Category-Specific Continuous Inputs

Nguyen, Dang, Gupta, Sunil, Rana, Santu, Shilton, Alistair, Venkatesh, Svetha

Many real-world functions are defined over both categorical and category-specific continuous variables and thus cannot be optimized by traditional Bayesian optimization (BO) methods. To optimize such functions, we propose a new method that formulates the problem as a multi-armed bandit problem, wherein each category corresponds to an arm with its reward distribution centered around the optimum of the objective function in continuous variables. Our goal is to identify the best arm and the maximizer of the corresponding continuous function simultaneously. Our algorithm uses a Thompson sampling scheme that helps connecting both multi-arm bandit and BO in a unified framework. We extend our method to batch BO to allow parallel optimization when multiple resources are available. We theoretically analyze our method for convergence and prove sub-linear regret bounds. We perform a variety of experiments: optimization of several benchmark functions, hyper-parameter tuning of a neural network, and automatic selection of the best machine learning model along with its optimal hyper-parameters (a.k.a automated machine learning). Comparisons with other methods demonstrate the effectiveness of our proposed method.

algorithm, category, optimization, (15 more...)

1911.12473

Country: Oceania > Australia (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Data Science > Data Mining > Big Data (0.88)

Javed, Ali, Hamshaw, Scott D., Rizzo, Donna M., Lee, Byung Suk

Analysis of Hydrological and Suspended Sediment Events from Mad River Wastershed using Multivariate Time Series Clustering

Hydrological storm events are a primary driver for transporting water quality constituents such as turbidity, suspended sediments and nutrients. Analyzing the concentration (C) of these water quality constituents in response to increased streamflow discharge (Q), particularly when monitored at high temporal resolution during a hydrological event, helps to characterize the dynamics and flux of such constituents. A conventional approach to storm event analysis is to reduce the C-Q time series to two-dimensional (2-D) hysteresis loops and analyze these 2-D patterns. While effective and informative to some extent, this hysteresis loop approach has limitations because projecting the C-Q time series onto a 2-D plane obscures detail (e.g., temporal variation) associated with the C-Q relationships. In this paper, we address this issue using a multivariate time series clustering approach. Clustering is applied to sequences of river discharge and suspended sediment data (acquired through turbidity-based monitoring) from six watersheds located in the Lake Champlain Basin in the northeastern United States. While clusters of the hydrological storm events using the multivariate time series approach were found to be correlated to 2-D hysteresis loop classifications and watershed locations, the clusters differed from the 2-D hysteresis classifications. Additionally, using available meteorological data associated with storm events, we examine the characteristics of computational clusters of storm events in the study watersheds and identify the features driving the clustering approach.

characteristic, storm event, time sery, (16 more...)

1911.12466

Country:

North America > United States > Vermont > Chittenden County > Burlington (0.14)
North America > United States > California (0.04)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.67)

Industry: Water & Waste Management > Water Management (0.86)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Information-Geometric Set Embeddings (IGSE): From Sets to Probability Distributions

Sun, Ke, Nielsen, Frank

This letter introduces an abstract learning problem called the ``set embedding'': The objective is to map sets into probability distributions so as to lose less information. We relate set union and intersection operations with corresponding interpolations of probability distributions. We also demonstrate a preliminary solution with experimental results on toy set embedding examples.

divergence, gaussian distribution, subset, (13 more...)

1911.12463

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > Canada (0.04)

Genre: Research Report (0.40)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Bosch, Samuel, de la Cerda, Alexander Sanchez, Rosing, Tajana Simunic, De Micheli, Giovanni

QubitHD: A Stochastic Acceleration Method for HD Computing-Based Machine Learning

Machine Learning algorithms based on Brain-inspired Hyperdimensional (HD) computing imitate cognition by exploiting statistical properties of high-dimensional vector spaces. It is a promising solution for achieving high energy-efficiency in different machine learning tasks, such as classification, semi-supervised learning and clustering. A weakness of existing HD computing-based ML algorithms is the fact that they have to be binarized for achieving very high energy-efficiency. At the same time, binarized models reach lower classification accuracies. To solve the problem of the trade-off between energy-efficiency and classification accuracy, we propose the QubitHD algorithm. It stochastically binarizes HD-based algorithms, while maintaining comparable classification accuracies to their non-binarized counterparts. The FPGA implementation of QubitHD provides a 65% improvement in terms of energy-efficiency, and a 95% improvement in terms of the training time, as compared to state-of-the-art HD-based ML algorithms. It also outperforms state-of-the-art low-cost classifiers (like Binarized Neural Networks) in terms of speed and energy-efficiency by an order of magnitude during training and inference.

accuracy, algorithm, classification accuracy, (13 more...)

1911.12446

Country:

Europe > Switzerland > Vaud > Lausanne (0.04)
Africa > Chad > Salamat (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > San Diego County > La Jolla (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)

Pramanik, Aniket, Aggarwal, Hemant, Jacob, Mathews

Calibrationless Parallel MRI using Model based Deep Learning (C-MODL)

We introduce a fast model based deep learning approach for calibrationless parallel MRI reconstruction. The proposed scheme is a non-linear generalization of structured low rank (SLR) methods that self learn linear annihilation filters from the same subject. It pre-learns non-linear annihilation relations in the Fourier domain from exemplar data. The pre-learning strategy significantly reduces the computational complexity, making the proposed scheme three orders of magnitude faster than SLR schemes. The proposed framework also allows the use of a complementary spatial domain prior; the hybrid regularization scheme offers improved performance over calibrated image domain MoDL approach. The calibrationless strategy minimizes potential mismatches between calibration data and the main scan, while eliminating the need for a fully sampled calibration region.

annihilation relation, coil sensitivity, reconstruction, (13 more...)

1911.12443

Country: North America > United States > Iowa > Johnson County > Iowa City (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)