Zhang, Shuai
Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting
Zhou, Haoyi, Zhang, Shanghang, Peng, Jieqi, Zhang, Shuai, Li, Jianxin, Xiong, Hui, Zhang, Wancai
Many real-world applications require the prediction of long sequence time-series, such as electricity consumption planning. Long sequence time-series forecasting (LSTF) demands a high prediction capacity of the model, which is the ability to capture precise long-range dependency coupling between output and input efficiently. Recent studies have shown the potential of Transformer to increase the prediction capacity. However, there are several severe issues with Transformer that prevent it from being directly applicable to LSTF, such as quadratic time complexity, high memory usage, and the inherent limitation of the encoder-decoder architecture. To address these issues, we design an efficient transformer-based model for LSTF, named Informer, with three distinctive characteristics: (i) a $ProbSparse$ self-attention mechanism, which achieves $O(L \log L)$ in time complexity and memory usage, and has comparable performance on sequences' dependency alignment; (ii) self-attention distilling, which highlights dominating attention by halving the cascading layer input and efficiently handles extremely long input sequences; (iii) a generative-style decoder, which, while conceptually simple, predicts long time-series sequences in one forward operation rather than step by step, drastically improving the inference speed of long-sequence predictions. Extensive experiments on four large-scale datasets demonstrate that Informer significantly outperforms existing methods and provides a new solution to the LSTF problem.
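The abstract only sketches the $ProbSparse$ mechanism. As a rough illustration (a simplified reading, not the authors' implementation), the paper's query sparsity measurement can be taken as the gap between the max and mean of a query's scaled dot-product scores; only the top-scoring queries attend in full, while the rest fall back to a mean of the values. The function name and the exact fallback below are assumptions of this sketch.

```python
import numpy as np

def probsparse_attention(Q, K, V, u):
    """Illustrative sketch of ProbSparse-style self-attention.

    Q, K, V: (L, d) arrays; u: number of "active" queries kept (roughly c * ln L).
    The official implementation additionally samples keys to approximate the
    sparsity measurement; here it is computed exactly for clarity.
    """
    L, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                  # (L, L) scaled dot products
    # Sparsity measurement M(q_i, K): max minus mean of each query's scores.
    M = scores.max(axis=1) - scores.mean(axis=1)
    top = np.argsort(-M)[:u]                       # keep the u most "active" queries
    out = np.repeat(V.mean(axis=0, keepdims=True), L, axis=0)  # lazy queries: mean of V
    attn = np.exp(scores[top])                     # softmax over keys for active queries
    attn /= attn.sum(axis=1, keepdims=True)
    out[top] = attn @ V
    return out
```

Only the selected $u \approx c \ln L$ queries incur full attention, which is where the $O(L \log L)$ cost claimed in the abstract comes from.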
xFraud: Explainable Fraud Transaction Detection on Heterogeneous Graphs
Rao, Susie Xi, Zhang, Shuai, Han, Zhichao, Zhang, Zitao, Min, Wei, Chen, Zhiyao, Shan, Yinan, Zhao, Yang, Zhang, Ce
At online retail platforms, it is crucial to actively detect the risk of fraudulent transactions to improve the customer experience, minimize loss, and prevent unauthorized chargebacks. Traditional rule-based methods and simple feature-based models are either inefficient or brittle and uninterpretable. The graph structure that exists among the heterogeneously typed entities of transaction logs is informative and difficult to fake. To exploit these heterogeneous graph relationships and enrich explainability, we present xFraud, an explainable fraud transaction prediction system. xFraud is composed of a predictor, which learns expressive representations for malicious-transaction detection from the heterogeneous transaction graph via a self-attentive heterogeneous graph neural network, and an explainer, which generates meaningful and human-understandable explanations from the graph to facilitate further processing in the business unit. In our experiments with xFraud on two real transaction networks with up to ten million transactions, we achieve area under the curve (AUC) scores that outperform baseline models and graph embedding methods. In addition, we show how the explainer improves the understanding of model predictions and enhances model trustworthiness for real-world fraud transaction cases.
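As a generic illustration of what self-attentive aggregation over heterogeneous (typed) neighbors can look like (this is not the xFraud codebase; the type-specific projections `W_type` and attention vector `a` are assumptions of the sketch):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def hetero_self_attention_layer(h, node_type, neighbors, W_type, a):
    """Toy self-attentive aggregation over typed neighbors.

    h:          (N, d) node features
    node_type:  length-N list of type ids (e.g., transaction / buyer / device)
    neighbors:  dict node id -> list of neighbor node ids
    W_type:     dict type id -> (d, d) type-specific projection (assumed)
    a:          (2d,) attention vector (assumed)
    """
    out = np.zeros_like(h)
    for i in range(len(h)):
        hi = W_type[node_type[i]] @ h[i]
        msgs, logits = [], []
        for j in neighbors.get(i, [i]):
            hj = W_type[node_type[j]] @ h[j]
            logits.append(a @ np.concatenate([hi, hj]))   # attention score for edge (i, j)
            msgs.append(hj)
        alpha = softmax(np.array(logits))                 # self-attention over neighbors
        out[i] = np.tanh((alpha[:, None] * np.stack(msgs)).sum(axis=0))
    return out
```

The explainer component then attributes a prediction to the edges and neighbors that received the most attention mass, which is what makes the output human-understandable.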
MicroRec: Accelerating Deep Recommendation Systems to Microseconds by Hardware and Data Structure Solutions
Jiang, Wenqi, He, Zhenhao, Zhang, Shuai, Preußer, Thomas B., Zeng, Kai, Feng, Liang, Zhang, Jiansong, Liu, Tongxuan, Li, Yong, Zhou, Jingren, Zhang, Ce, Alonso, Gustavo
Deep neural networks are widely used in personalized recommendation systems. Unlike regular DNN inference workloads, recommendation inference is memory-bound due to the many random memory accesses needed to look up the embedding tables. The inference is also heavily constrained in terms of latency because producing a recommendation for a user must be done within tens of milliseconds. In this paper, we propose MicroRec, a high-performance inference engine for recommendation systems. MicroRec accelerates recommendation inference by (1) redesigning the data structures involved in the embeddings to reduce the number of lookups needed and (2) taking advantage of the availability of High-Bandwidth Memory (HBM) in FPGA accelerators to tackle latency by enabling parallel lookups. We have implemented the resulting design on an FPGA board, including the embedding lookup step as well as the complete inference process. Compared to an optimized CPU baseline (16 vCPUs, AVX2-enabled), MicroRec achieves a 13.8~14.7x speedup on embedding lookup alone and a 2.5~5.4x speedup for the entire recommendation inference in terms of throughput. As for latency, CPU-based engines need milliseconds to infer a recommendation, while MicroRec takes only microseconds, a significant advantage in real-time recommendation systems.
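The abstract does not detail the data-structure redesign. One plausible reading (an assumption of this sketch, not a quotation of the paper) is that several small embedding tables can be merged via a cartesian product into one wider table, so a single lookup returns the concatenation that previously required several memory accesses:

```python
import numpy as np

# Two small embedding tables (vocab sizes 4 and 8, embedding dims 2 and 3).
emb_a = np.random.rand(4, 2)
emb_b = np.random.rand(8, 3)

# Cartesian-product table: one row per (a, b) pair, embeddings concatenated.
combined = np.concatenate(
    [np.repeat(emb_a, len(emb_b), axis=0),   # index of table A varies slowly
     np.tile(emb_b, (len(emb_a), 1))],       # index of table B varies quickly
    axis=1)                                  # shape (4 * 8, 2 + 3)

def lookup(a_idx, b_idx):
    # One random memory access instead of two: trades storage for fewer lookups.
    return combined[a_idx * len(emb_b) + b_idx]

assert np.allclose(lookup(2, 5), np.concatenate([emb_a[2], emb_b[5]]))
```

The trade-off only pays off for small tables, since the combined table grows with the product of the vocabularies; large tables are better served by the parallel HBM lookups mentioned in point (2).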
Clustering COVID-19 Lung Scans
Householder, Jacob, Householder, Andrew, Gomez-Reed, John Paul, Park, Fredrick, Zhang, Shuai
With the recent outbreak of COVID-19, creating a means to stop its spread and eventually develop a vaccine are among the most important and challenging tasks facing the scientific community right now. The first step towards these goals is to correctly identify patients infected with the virus. Our group applied an unsupervised machine learning technique to identify COVID-19 cases. This is an important topic, as COVID-19 is a novel disease currently being studied in detail, and our methodology has the potential to reveal important differences between it and other viral pneumonias. This could, in turn, enable doctors to more confidently help each patient. Our experiments utilize Principal Component Analysis (PCA), t-distributed Stochastic Neighbor Embedding (t-SNE), and the recently developed Robust Continuous Clustering algorithm (RCC). We show the performance of RCC in identifying COVID-19 patients and its ability to compete with other unsupervised algorithms, namely K-Means++ (KM++). Using a COVID-19 Radiography dataset, we found that RCC outperformed KM++; we used the Adjusted Mutual Information (AMI) score to measure the effectiveness of both algorithms. The AMI for the two- and three-class cases of KM++ was 0.0250 and 0.054, respectively. In comparison, RCC scored 0.5044 in the two-class case and 0.267 in the three-class case, clearly showing RCC to be the superior algorithm. This not only opens new possible applications of RCC, but could also aid in the creation of a new tool for COVID-19 identification.
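A hedged sketch of the kind of pipeline the abstract describes is shown below. RCC is not available in scikit-learn, so only the KM++ baseline with the AMI metric is illustrated, and the image data here is a random placeholder standing in for flattened lung scans:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_mutual_info_score

# X: flattened lung-scan images, y: ground-truth labels (placeholder data here).
rng = np.random.default_rng(0)
X = rng.random((200, 64 * 64))
y = rng.integers(0, 2, size=200)

X_pca = PCA(n_components=50, random_state=0).fit_transform(X)       # linear reduction
X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X_pca)  # nonlinear embedding

pred = KMeans(n_clusters=2, init="k-means++", random_state=0).fit_predict(X_tsne)
print("AMI (KM++ baseline):", adjusted_mutual_info_score(y, pred))
```

Swapping the KMeans step for an RCC implementation while keeping the same AMI evaluation reproduces the comparison reported in the abstract.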
Computational prediction of RNA tertiary structures using machine learning methods
Huang, Bin, Du, Yuanyang, Zhang, Shuai, Li, Wenfei, Wang, Jun, Zhang, Jian
RNAs play crucial and versatile roles in biological processes. Computational prediction approaches can help to understand RNA structures and their stabilizing factors, thus providing information on their functions and facilitating the design of new RNAs. Machine learning (ML) techniques have made tremendous progress in many fields in the past few years. Although their usage in protein-related fields has a long history, the use of ML methods in predicting RNA tertiary structures is new and rare. Here, we review recent advances in using ML methods for RNA structure prediction and discuss the advantages and limitations, as well as the difficulties and potential, of these approaches when applied in the field. RNAs are macromolecules with crucial and versatile biological functions. To fully understand their functions, knowledge of their three-dimensional (3D) structures is essential. Since experimental approaches to determine RNA 3D structures are difficult and expensive, many computational approaches have been developed for this purpose. To date, although template-based and homology-modeling methods can achieve high accuracies, de novo predictions still depend on the size and complexity of the RNA, and further improvements in predicting non-canonical interactions are required, according to the recent RNA-Puzzles round III. For a comprehensive survey of recent work, we refer readers to the relevant literature.
A Practical Chinese Dependency Parser Based on A Large-scale Dataset
Zhang, Shuai, Wang, Lijie, Sun, Ke, Xiao, Xinyan
Dependency parsing is a longstanding natural language processing task, and its outputs are crucial to various downstream tasks. Recently, neural network based (NN-based) dependency parsing has achieved significant progress and obtained state-of-the-art results. However, NN-based approaches require massive amounts of labeled training data, which is very expensive because it requires human annotation by experts. Thus, few industry-oriented dependency parsing tools are publicly available. In this report, we present Baidu Dependency Parser (DDParser), a new Chinese dependency parser trained on a large-scale manually labeled dataset called Baidu Chinese Treebank (DuCTB). DuCTB consists of about one million annotated sentences from multiple sources, including search logs, Chinese newswire, various forum discourses, and conversation programs. DDParser extends the graph-based biaffine parser to accommodate the characteristics of the Chinese dataset. We conduct experiments on two test sets: a standard test set with the same distribution as the training set and a random test set sampled from other sources; the labeled attachment scores (LAS) on them are 92.9% and 86.9%, respectively. DDParser achieves state-of-the-art results and is released at https://github.com/baidu/DDParser.
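For readers unfamiliar with the graph-based biaffine parser that DDParser builds on, arc scores are typically a biaffine function of head and dependent representations. The sketch below is generic (not DDParser's code), and the weight names are illustrative:

```python
import numpy as np

def biaffine_arc_scores(H_dep, H_head, U, u_bias):
    """Generic biaffine arc scoring (Dozat-and-Manning style).

    H_dep, H_head: (n, d) dependent / head representations for n words
    U:             (d, d) bilinear weight; u_bias: (d,) linear head-bias weight
    Returns an (n, n) matrix where entry [i, j] scores "word j heads word i".
    """
    return H_dep @ U @ H_head.T + H_head @ u_bias   # bias broadcast over head columns

n, d = 5, 16
rng = np.random.default_rng(0)
scores = biaffine_arc_scores(rng.random((n, d)), rng.random((n, d)),
                             rng.random((d, d)), rng.random(d))
heads = scores.argmax(axis=1)   # greedy head per word; a real parser uses tree decoding
```

Label prediction works analogously with a second biaffine classifier over the chosen arcs.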
Fast Learning of Graph Neural Networks with Guaranteed Generalizability: One-hidden-layer Case
Zhang, Shuai, Wang, Meng, Liu, Sijia, Chen, Pin-Yu, Xiong, Jinjun
Although graph neural networks (GNNs) have made great progress recently on learning from graph-structured data in practice, their theoretical guarantee on generalizability remains elusive in the literature. In this paper, we provide a theoretically-grounded generalizability analysis of GNNs with one hidden layer for both regression and binary classification problems. Under the assumption that there exists a ground-truth GNN model (with zero generalization error), the objective of GNN learning is to estimate the ground-truth GNN parameters from the training data. To achieve this objective, we propose a learning algorithm that is built on tensor initialization and accelerated gradient descent. We then show that the proposed learning algorithm converges to the ground-truth GNN model for the regression problem, and to a model sufficiently close to the ground-truth for the binary classification problem. Moreover, for both cases, the convergence rate of the proposed learning algorithm is proven to be linear and faster than the vanilla gradient descent algorithm. We further explore the relationship between the sample complexity of GNNs and their underlying graph properties. Lastly, we provide numerical experiments to demonstrate the validity of our analysis and the effectiveness of the proposed learning algorithm for GNNs.
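To fix ideas, the model class under study is a GNN with a single hidden layer. A hedged sketch of such a forward pass (the paper's exact parameterization and pooling may differ) aggregates neighbor features, applies one nonlinear layer, and maps to a scalar output per node:

```python
import numpy as np

def one_hidden_layer_gnn(A, X, W, v):
    """Sketch of a one-hidden-layer GNN for node-level regression.

    A: (n, n) (normalized) adjacency matrix, X: (n, d) node features,
    W: (d, k) hidden-layer weights, v: (k,) output-layer weights.
    """
    Z = A @ X                        # neighbor aggregation
    H = np.maximum(Z @ W, 0.0)       # ReLU hidden layer
    return H @ v                     # scalar prediction per node
```

Under the realizability assumption in the abstract, some ground-truth pair (W*, v*) generates the labels, and the proposed tensor-initialized accelerated gradient descent provably recovers (or closely approximates) it.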
Quaternion Knowledge Graph Embedding
Zhang, Shuai, Tay, Yi, Yao, Lina, Liu, Qi
Complex-valued representations have demonstrated promising results on modeling relational data, i.e., knowledge graphs. This paper proposes a new knowledge graph embedding method. More concretely, we move beyond standard complex representations, adopting expressive hypercomplex representations for learning representations of entities and relations. Hypercomplex embeddings, or Quaternion embeddings (QuatE), are hypercomplex-valued embeddings with three imaginary components. Different from the standard complex (Hermitian) inner product, latent interdependencies (between all components) are aptly captured via the Hamilton product in Quaternion space, encouraging a more efficient and expressive representation learning process. Moreover, Quaternions are intuitively desirable for smooth and pure rotation in vector space, preventing noise from shear/scaling operators. Finally, Quaternion inductive biases enjoy and satisfy the key desiderata of relational representation learning (i.e., modeling symmetry, anti-symmetry, and inversion). Experimental results demonstrate that QuatE achieves state-of-the-art performance on four well-established knowledge graph completion benchmarks.
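A minimal sketch of the scoring idea, assuming embeddings are stored as 4-tuples of component vectors (the function names are illustrative, not the released code): rotate the head by the unit relation quaternion via the Hamilton product, then take the inner product with the tail.

```python
import numpy as np

def hamilton_product(q, p):
    """Hamilton product of quaternions q = (a, b, c, d) and p = (e, f, g, h),
    each given as a tuple of component arrays (vectorized over embedding dims)."""
    a, b, c, d = q
    e, f, g, h = p
    return (a*e - b*f - c*g - d*h,
            a*f + b*e + c*h - d*g,
            a*g - b*h + c*e + d*f,
            a*h + b*g - c*f + d*e)

def quate_score(head, rel, tail):
    """QuatE-style score: Hamilton-rotate the head by the normalized relation,
    then sum the component-wise inner products with the tail."""
    norm = np.sqrt(sum(x**2 for x in rel)) + 1e-12   # per-dimension quaternion norm
    rel_unit = tuple(x / norm for x in rel)
    rotated = hamilton_product(head, rel_unit)
    return sum((r * t).sum() for r, t in zip(rotated, tail))
```

Because the Hamilton product is non-commutative and the relation is normalized to a unit quaternion (a pure rotation), the score naturally distinguishes symmetric, anti-symmetric, and inverse relations.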
Understanding Straight-Through Estimator in Training Activation Quantized Neural Nets
Yin, Penghang, Lyu, Jiancheng, Zhang, Shuai, Osher, Stanley, Qi, Yingyong, Xin, Jack
Training activation-quantized neural networks involves minimizing a piecewise constant function whose gradient vanishes almost everywhere, which is undesirable for standard back-propagation or the chain rule. An empirical way around this issue is to use a straight-through estimator (STE) (Bengio et al., 2013) in the backward pass, so that the "gradient" through the modified chain rule becomes non-trivial. Since this unusual "gradient" is certainly not the gradient of the loss function, the following question arises: why does searching in its negative direction minimize the training loss? In this paper, we provide theoretical justification for the concept of STE by answering this question. We consider the problem of learning a two-linear-layer network with binarized ReLU activation and Gaussian input data. We refer to the unusual "gradient" given by the STE-modified chain rule as the coarse gradient. The choice of STE is not unique. We prove that if the STE is properly chosen, the expected coarse gradient correlates positively with the population gradient (which is not available for training), and its negation is a descent direction for minimizing the population loss. We further show that the associated coarse gradient descent algorithm converges to a critical point of the population loss minimization problem. Moreover, we show that a poor choice of STE leads to instability of the training algorithm near certain local minima, which is verified with CIFAR-10 experiments.
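To make the STE idea concrete, here is a minimal PyTorch sketch (a generic illustration, not the authors' code): the forward pass is a piecewise-constant binarized activation, and the backward pass substitutes the derivative of a clipped-ReLU proxy, one of the STE choices discussed in the paper.

```python
import torch

class BinarizedReLU(torch.autograd.Function):
    """Binarized activation trained with a straight-through estimator."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return (x > 0).float()              # piecewise-constant forward: outputs 0 or 1

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # STE: use the derivative of the clipped-ReLU proxy min(max(x, 0), 1),
        # i.e. pass the gradient only where 0 < x < 1 (the true gradient is 0 a.e.).
        return grad_output * ((x > 0) & (x < 1)).float()

# Usage: y = BinarizedReLU.apply(x); the "coarse gradient" flows through the STE above.
```

Replacing the clipped-ReLU proxy with a poor choice (e.g., the identity) yields a coarse gradient that can point away from descent directions near certain minima, which is the instability the abstract refers to.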
AutoShuffleNet: Learning Permutation Matrices via an Exact Lipschitz Continuous Penalty in Deep Convolutional Neural Networks
Lyu, Jiancheng, Zhang, Shuai, Qi, Yingyong, Xin, Jack
ShuffleNet is a state-of-the-art lightweight convolutional neural network architecture. Its basic operations include grouped, channel-wise convolution and channel shuffling. However, channel shuffling is manually and empirically designed. Mathematically, shuffling is multiplication by a permutation matrix. In this paper, we propose to automate channel shuffling by learning permutation matrices during network training. We introduce an exact Lipschitz continuous non-convex penalty so that it can be incorporated into stochastic gradient descent to approximate permutations at high precision. Exact permutations are obtained by simple rounding at the end of training and are used in inference. The resulting network, referred to as AutoShuffleNet, achieved improved classification accuracies on the CIFAR-10 and ImageNet datasets. In addition, we found experimentally that the standard convex relaxation of permutation matrices into stochastic matrices leads to poor performance. We theoretically prove the exactness (error bounds) of recovering permutation matrices when our penalty function is zero (very small). We present examples of permutation optimization through graph matching and two-layer neural network models where the loss functions are calculated in closed analytical form. In these examples, convex relaxation failed to capture permutations whereas our penalty succeeded.
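The abstract does not spell out the penalty. Under the assumption that it is an L1-minus-L2 style penalty on a doubly stochastic relaxation (which vanishes exactly on permutation matrices), a sketch could look like the following; the assumption is flagged in the docstring.

```python
import numpy as np

def permutation_penalty(P):
    """L1-minus-L2 penalty on a (doubly stochastic) relaxation P.

    For a nonnegative matrix with unit row and column sums, each term
    ||p||_1 - ||p||_2 >= 0 with equality iff p is one-hot, so the total is
    zero exactly when P is a permutation matrix. (This is an assumed reading
    of the paper's "exact Lipschitz continuous non-convex penalty".)
    """
    rows = np.abs(P).sum(axis=1) - np.linalg.norm(P, axis=1)
    cols = np.abs(P).sum(axis=0) - np.linalg.norm(P, axis=0)
    return rows.sum() + cols.sum()

perm = np.eye(4)[[2, 0, 3, 1]]                # a permutation matrix
soft = np.full((4, 4), 0.25)                  # uniform doubly stochastic matrix
print(permutation_penalty(perm), permutation_penalty(soft))   # 0.0 vs. > 0
```

Because the penalty is Lipschitz continuous, it can be added directly to the training loss and driven toward zero by stochastic gradient descent, after which rounding recovers an exact permutation.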