AITopics

2009.04462

Country:

Europe (1.00)
North America > United States > New Jersey > Mercer County > Princeton (0.14)
North America > United States > California > Los Angeles County (0.14)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Media (1.00)
Marketing (1.00)
Information Technology > Security & Privacy (1.00)
(4 more...)

Technology:

Information Technology > e-Commerce (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Information Management (1.00)
(6 more...)

arXiv.org Artificial IntelligenceOct-27-2020

Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation

Liu, Junhao, Shou, Linjun, Pei, Jian, Gong, Ming, Yang, Min, Jiang, Daxin

Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging problem due to the lack of large-scale annotated datasets in low-source languages, such as Arabic, Hindi, and Vietnamese. Many previous approaches use translation data by translating from a rich-source language, such as English, to low-source languages as auxiliary supervision. However, how to effectively leverage translation data and reduce the impact of noise introduced by translation remains onerous. In this paper, we tackle this challenge and enhance the cross-lingual transferring performance by a novel augmentation approach named Language Branch Machine Reading Comprehension (LBMRC). A language branch is a group of passages in one single language paired with questions in all target languages. We train multiple machine reading comprehension (MRC) models proficient in individual language based on LBMRC. Then, we devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models to a single model for all target languages. Combining the LBMRC and multilingual distillation can be more robust to the data noises, therefore, improving the model's cross-lingual ability. Meanwhile, the produced single multilingual model is applicable to all target languages, which saves the cost of training, inference, and maintenance for multiple models. Extensive experiments on two CLMRC benchmarks clearly show the effectiveness of our proposed method.

artificial intelligence, dataset, machine translation, (18 more...)

2010.14271

Country: Asia > China (0.28)

Genre: Research Report (1.00)

Industry: Education > Assessment & Standards > Student Performance (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial IntelligenceOct-14-2020

A Graph Representation of Semi-structured Data for Web Question Answering

Zhang, Xingyao, Shou, Linjun, Pei, Jian, Gong, Ming, Wen, Lijie, Jiang, Daxin

The abundant semi-structured data on the Web, such as HTML-based tables and lists, provide commercial search engines a rich information source for question answering (QA). Different from plain text passages in Web documents, Web tables and lists have inherent structures, which carry semantic correlations among various elements in tables and lists. Many existing studies treat tables and lists as flat documents with pieces of text and do not make good use of semantic information hidden in structures. In this paper, we propose a novel graph representation of Web tables and lists based on a systematic categorization of the components in semi-structured data as well as their relations. We also develop pre-training and reasoning techniques on the graph model for the QA task. Extensive experiments on several real datasets collected from a commercial engine verify the effectiveness of our approach. Our method improves F1 score by 3.90 points over the state-of-the-art baselines.

deep learning, neural network, relation, (20 more...)

2010.06801

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)

arXiv.org Machine LearningAug-6-2020

Momentum-Based Policy Gradient Methods

Huang, Feihu, Gao, Shangqian, Pei, Jian, Huang, Heng

In the paper, we propose a class of efficient momentum-based policy gradient methods for the model-free reinforcement learning, which use adaptive learning rates and do not require any large batches. Specifically, we propose a fast important-sampling momentum-based policy gradient (IS-MBPG) method based on a new momentum-based variance reduced technique and the importance sampling technique. We also propose a fast Hessian-aided momentum-based policy gradient (HA-MBPG) method based on the momentum-based variance reduced technique and the Hessian-aided technique. Moreover, we prove that both the IS-MBPG and HA-MBPG methods reach the best known sample complexity of $O(\epsilon^{-3})$ for finding an $\epsilon$-stationary point of the non-concave performance function, which only require one trajectory at each iteration. In particular, we present a non-adaptive version of IS-MBPG method, i.e., IS-MBPG*, which also reaches the best known sample complexity of $O(\epsilon^{-3})$ without any large batches. In the experiments, we apply four benchmark tasks to demonstrate the effectiveness of our algorithms.

algorithm, artificial intelligence, reinforcement learning, (12 more...)

2007.0668

Country:

North America > United States (0.14)
North America > Canada (0.14)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

arXiv.org Machine LearningJul-7-2020

Personalized Federated Learning: An Attentive Collaboration Approach

Huang, Yutao, Chu, Lingyang, Zhou, Zirui, Wang, Lanjun, Liu, Jiangchuan, Pei, Jian, Zhang, Yong

For the challenging computational environment of IOT/edge computing, personalized federated learning allows every client to train a strong personalized cloud model by effectively collaborating with the other clients in a privacy-preserving manner. The performance of personalized federated learning is largely determined by the effectiveness of inter-client collaboration. However, when the data is non-IID across all clients, it is challenging to infer the collaboration relationships between clients without knowing their data distributions. In this paper, we propose to tackle this problem by a novel framework named federated attentive message passing (FedAMP) that allows each client to collaboratively train its own personalized cloud model without using a global model. FedAMP implements an attentive collaboration mechanism by iteratively encouraging clients with more similar model parameters to have stronger collaborations. This adaptively discovers the underlying collaboration relationships between clients, which significantly boosts effectiveness of collaboration and leads to the outstanding performance of FedAMP. We establish the convergence of FedAMP for both convex and non-convex models, and further propose a heuristic method that resembles the FedAMP framework to further improve its performance for federated learning with deep neural networks. Extensive experiments demonstrate the superior performance of our methods in handling non-IID data, dirty data and dropped clients.

deep learning, heurfedamp, neural network, (17 more...)

2007.03797

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Burnaby (0.14)

Genre: Research Report (1.00)

Industry:

Information Technology (0.68)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

arXiv.org Machine LearningJun-16-2020

Measuring Model Complexity of Neural Networks with Curve Activation Functions

Hu, Xia, Liu, Weiqing, Bian, Jiang, Pei, Jian

It is fundamental to measure model complexity of deep neural networks. The existing literature on model complexity mainly focuses on neural networks with piecewise linear activation functions. Model complexity of neural networks with general curve activation functions remains an open problem. To tackle the challenge, in this paper, we first propose the linear approximation neural network (LANN for short), a piecewise linear framework to approximate a given deep model with curve activation function. LANN constructs individual piecewise linear approximation for the activation function of each neuron, and minimizes the number of linear regions to satisfy a required approximation degree. Then, we analyze the upper bound of the number of linear regions formed by LANNs, and derive the complexity measure based on the upper bound. To examine the usefulness of the complexity measure, we experimentally explore the training process of neural networks and detect overfitting. Our results demonstrate that the occurrence of overfitting is positively correlated with the increase of model complexity during training. We find that the $L^1$ and $L^2$ regularizations suppress the increase of model complexity. Finally, we propose two approaches to prevent overfitting by directly constraining model complexity, namely neuron pruning and customized $L^1$ regularization.

complexity, deep learning, neural network, (18 more...)

2006.08962

Country: North America > Canada (0.28)

Genre: Research Report > New Finding (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

arXiv.org Machine LearningJun-7-2020

Eigen-GNN: A Graph Structure Preserving Plug-in for GNNs

Zhang, Ziwei, Cui, Peng, Pei, Jian, Wang, Xin, Zhu, Wenwu

Graph Neural Networks (GNNs) are emerging machine learning models on graphs. Although sufficiently deep GNNs are shown theoretically capable of fully preserving graph structures, most existing GNN models in practice are shallow and essentially feature-centric. We show empirically and analytically that the existing shallow GNNs cannot preserve graph structures well. To overcome this fundamental challenge, we propose Eigen-GNN, a simple yet effective and general plug-in module to boost GNNs ability in preserving graph structures. Specifically, we integrate the eigenspace of graph structures with GNNs by treating GNNs as a type of dimensionality reduction and expanding the initial dimensionality reduction bases. Without needing to increase depths, Eigen-GNN possesses more flexibilities in handling both feature-driven and structure-driven tasks since the initial bases contain both node features and graph structures. We present extensive experimental results to demonstrate the effectiveness of Eigen-GNN for tasks including node classification, link prediction, and graph isomorphism tests.

artificial intelligence, graph structure, neural network, (19 more...)

2006.0433

Country: North America > United States (0.14)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Industry:

Information Technology (0.47)
Energy > Power Industry (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial IntelligenceSep-12-2019

Exact and Consistent Interpretation for Piecewise Linear Neural Networks: A Closed Form Solution

Chu, Lingyang, Hu, Xia, Hu, Juhua, Wang, Lanjun, Pei, Jian

Strong intelligent machines powered by deep neural networks are increasingly deployed as black boxes to make decisions in risk-sensitive domains, such as finance and medical. To reduce potential risk and build trust with users, it is critical to interpret how such machines make their decisions. Existing works interpret a pre-trained neural network by analyzing hidden neurons, mimicking pre-trained models or approximating local predictions. However, these methods do not provide a guarantee on the exactness and consistency of their interpretation. In this paper, we propose an elegant closed form solution named $OpenBox$ to compute exact and consistent interpretations for the family of Piecewise Linear Neural Networks (PLNN). The major idea is to first transform a PLNN into a mathematically equivalent set of linear classifiers, then interpret each linear classifier by the features that dominate its prediction. We further apply $OpenBox$ to demonstrate the effectiveness of non-negative and sparse constraints on improving the interpretability of PLNNs. The extensive experiments on both synthetic and real world data sets clearly demonstrate the exactness and consistency of our interpretation.

decision feature, deep learning, neural network, (17 more...)

1802.06259

Country:

North America > Canada (0.28)
North America > United States > New York (0.14)

Genre: Research Report (1.00)

Industry: Transportation (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

arXiv.org Machine LearningJul-29-2019

Nonconvex Zeroth-Order Stochastic ADMM Methods with Lower Function Query Complexity

Huang, Feihu, Gao, Shangqian, Pei, Jian, Huang, Heng

Zeroth-order (gradient-free) method is a class of powerful optimization tool for many machine learning problems because it only needs function values (not gradient) in the optimization. In particular, zeroth-order method is very suitable for many complex problems such as black-box attacks and bandit feedback, whose explicit gradients are difficult or infeasible to obtain. Recently, although many zeroth-order methods have been developed, these approaches still exist two main drawbacks: 1) high function query complexity; 2) not being well suitable for solving the problems with complex penalties and constraints. To address these challenging drawbacks, in this paper, we propose a novel fast zeroth-order stochastic alternating direction method of multipliers (ADMM) method (\emph{i.e.}, ZO-SPIDER-ADMM) with lower function query complexity for solving nonconvex problems with multiple nonsmooth penalties. Moreover, we prove that our ZO-SPIDER-ADMM has the optimal function query complexity of $O(dn + dn^{\frac{1}{2}}\epsilon^{-1})$ for finding an $\epsilon$-approximate local solution, where $n$ and $d$ denote the sample size and dimension of data, respectively. In particular, the ZO-SPIDER-ADMM improves the existing best nonconvex zeroth-order ADMM methods by a factor of $O(d^{\frac{1}{3}}n^{\frac{1}{6}})$. Moreover, we propose a fast online ZO-SPIDER-ADMM (\emph{i.e.,} ZOO-SPIDER-ADMM). Our theoretical analysis shows that the ZOO-SPIDER-ADMM has the function query complexity of $O(d\epsilon^{-\frac{3}{2}})$, which improves the existing best result by a factor of $O(\epsilon^{-\frac{1}{2}})$. Finally, we utilize a task of structured adversarial attack on black-box deep neural networks to demonstrate the efficiency of our algorithms.

information retrieval query processing, null 2, optimization problem, (18 more...)

1907.13463

Country: North America (0.14)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)

arXiv.org Machine LearningJun-17-2019

Exact and Consistent Interpretation of Piecewise Linear Models Hidden behind APIs: A Closed Form Solution

Cong, Zicun, Chu, Lingyang, Wang, Lanjun, Hu, Xia, Pei, Jian

More and more AI services are provided through APIs on cloud where predictive models are hidden behind APIs. To build trust with users and reduce potential application risk, it is important to interpret how such predictive models hidden behind APIs make their decisions. The biggest challenge of interpreting such predictions is that no access to model parameters or training data is available. Existing works interpret the predictions of a model hidden behind an API by heuristically probing the response of the API with perturbed input instances. However, these methods do not provide any guarantee on the exactness and consistency of their interpretations. In this paper, we propose an elegant closed form solution named \texttt{OpenAPI} to compute exact and consistent interpretations for the family of Piecewise Linear Models (PLM), which includes many popular classification models. The major idea is to first construct a set of overdetermined linear equation systems with a small set of perturbed instances and the predictions made by the model on those instances. Then, we solve the equation systems to identify the decision features that are responsible for the prediction on an input instance. Our extensive experiments clearly demonstrate the exactness and consistency of our method.

deep learning, interpretation, neural network, (21 more...)

1906.06857

Country: Europe (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)