Chen, Cheng
Certifiably-Robust Federated Adversarial Learning via Randomized Smoothing
Chen, Cheng, Kailkhura, Bhavya, Goldhahn, Ryan, Zhou, Yi
Federated learning is an emerging data-private distributed learning framework, which, however, is vulnerable to adversarial attacks. Although several heuristic defenses have been proposed to enhance the robustness of federated learning, they do not provide certifiable robustness guarantees. In this paper, we incorporate randomized smoothing techniques into federated adversarial training to enable data-private distributed learning with certifiable robustness to test-time adversarial perturbations. Our experiments show that such an advanced federated adversarial learning framework can deliver models as robust as those trained by centralized training. Further, this enables classifiers that are provably robust to $\ell_2$-bounded adversarial perturbations in a distributed setup. We also show that a one-point gradient estimation based training approach is $2$-$3\times$ faster than the popular stochastic estimator based approach, without any noticeable difference in certified robustness.
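The paper's contribution is on the federated training side; the certification it targets is the standard randomized-smoothing procedure. Below is a minimal, illustrative sketch of that prediction-and-certification step in Python (the base_classifier, noise level, and sample count are assumptions for illustration, not the authors' code).

    import torch
    from scipy.stats import norm

    def smoothed_predict(base_classifier, x, sigma=0.25, n_samples=1000, num_classes=10):
        # Majority vote of the base classifier under Gaussian input noise.
        counts = torch.zeros(num_classes)
        with torch.no_grad():
            for _ in range(n_samples):
                noisy = x + sigma * torch.randn_like(x)
                pred = base_classifier(noisy.unsqueeze(0)).argmax(dim=1)
                counts[pred.item()] += 1
        top_class = int(counts.argmax())
        p_top = counts[top_class].item() / n_samples  # in practice, use a lower confidence bound
        if p_top <= 0.5:
            return top_class, 0.0                     # cannot certify a radius
        radius = sigma * norm.ppf(p_top)              # certified l2 radius around x
        return top_class, radius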
C2C-GenDA: Cluster-to-Cluster Generation for Data Augmentation of Slot Filling
Hou, Yutai, Chen, Sanyuan, Che, Wanxiang, Chen, Cheng, Liu, Ting
Slot filling, a fundamental module of spoken language understanding, often suffers from insufficient quantity and diversity of training data. To remedy this, we propose a novel Cluster-to-Cluster generation framework for Data Augmentation (DA), named C2C-GenDA. It enlarges the training set by reconstructing existing utterances into alternative expressions while keeping the semantics. Different from previous DA works that reconstruct utterances one by one independently, C2C-GenDA jointly encodes multiple existing utterances of the same semantics and simultaneously decodes multiple unseen expressions. Jointly generating multiple new utterances allows the model to consider the relations between generated instances and encourages diversity. Besides, encoding multiple existing utterances gives C2C-GenDA a wider view of existing expressions, helping to reduce generations that duplicate existing data. Experiments on the ATIS and Snips datasets show that instances augmented by C2C-GenDA improve slot filling by 7.99 (11.9%) and 5.76 (13.6%) F-scores respectively, when there are only hundreds of training utterances.
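As a rough illustration of the cluster-to-cluster idea, the hypothetical helper below groups utterances by their semantic frame and splits each group into an input cluster (jointly encoded) and an output cluster (jointly decoded). The function names and cluster sizes are illustrative assumptions, not the authors' implementation.

    from collections import defaultdict
    import random

    def build_c2c_pairs(utterances, frame_of, k_in=3, k_out=3):
        # Group utterances that share the same delexicalized slot frame.
        clusters = defaultdict(list)
        for u in utterances:
            clusters[frame_of(u)].append(u)
        pairs = []
        for frame, group in clusters.items():
            if len(group) < k_in + k_out:
                continue
            random.shuffle(group)
            # One subset of known expressions is encoded jointly; the model
            # learns to decode the other subset as "unseen" paraphrases.
            pairs.append((group[:k_in], group[k_in:k_in + k_out]))
        return pairs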
Neural Network Training Techniques Regularize Optimization Trajectory: An Empirical Study
Chen, Cheng, Yang, Junjie, Zhou, Yi
Modern deep neural network (DNN) training utilizes various techniques, e.g., nonlinear activation functions, batch normalization, skip connections, etc. Despite their effectiveness, it remains unclear how they help accelerate DNN training in practice. In this paper, we provide an empirical study of the regularization effect of these training techniques on DNN optimization. Specifically, we find that the optimization trajectories of successful DNN trainings consistently obey a certain regularity principle that regularizes the model update direction to be aligned with the trajectory direction. Theoretically, we show that such a regularity principle leads to a convergence guarantee in nonconvex optimization, with a convergence rate that depends on a regularization parameter. Empirically, we find that DNN trainings that apply these techniques achieve fast convergence and obey the regularity principle with a large regularization parameter, implying that the model updates are well aligned with the trajectory. On the other hand, DNN trainings without these techniques converge slowly and obey the regularity principle with only a small regularization parameter, implying that the model updates are not well aligned with the trajectory. Therefore, different training techniques regularize the model update direction via the regularity principle to facilitate convergence.
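One simple way to probe this kind of alignment (a rough proxy, not necessarily the paper's exact regularity condition) is to record the flattened parameter vector after each update and compare every update direction with the direction toward the final iterate:

    import torch

    def trajectory_alignment(params_history):
        # params_history: list of flattened parameter tensors w_0, ..., w_T
        # recorded during training. Returns the cosine similarity between each
        # update w_{t+1} - w_t and the remaining trajectory w_T - w_t.
        w_final = params_history[-1]
        cosines = []
        for w_t, w_next in zip(params_history[:-1], params_history[1:]):
            update = w_next - w_t
            trajectory = w_final - w_t
            cos = torch.nn.functional.cosine_similarity(update, trajectory, dim=0)
            cosines.append(cos.item())
        return cosines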
FedCluster: Boosting the Convergence of Federated Learning via Cluster-Cycling
Chen, Cheng, Chen, Ziyi, Zhou, Yi, Kailkhura, Bhavya
We develop FedCluster, a novel federated learning framework with improved optimization efficiency, and investigate its theoretical convergence properties. FedCluster groups the devices into multiple clusters that perform federated learning cyclically in each learning round. Therefore, each learning round of FedCluster consists of multiple cycles of meta-updates that boost the overall convergence. In nonconvex optimization, we show that FedCluster with devices implementing the local stochastic gradient descent (SGD) algorithm achieves a faster convergence rate than the conventional federated averaging (FedAvg) algorithm in the presence of device-level data heterogeneity. We conduct experiments on deep learning applications and demonstrate that FedCluster converges significantly faster than conventional federated learning under diverse levels of device-level data heterogeneity for a variety of local optimizers.
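A minimal sketch of the cluster-cycling idea is given below, assuming a PyTorch model, a user-supplied local_sgd routine, and devices already grouped into clusters; the helper names are illustrative rather than the authors' reference implementation.

    import copy
    import torch

    def fedcluster_round(global_model, clusters, local_sgd, lr=0.1, local_steps=5):
        # clusters: list of clusters, each a list of client datasets.
        # Clusters are visited cyclically, so one learning round contains
        # len(clusters) federated meta-updates of the global model.
        for cluster in clusters:
            client_models = []
            for client_data in cluster:
                local_model = copy.deepcopy(global_model)
                local_sgd(local_model, client_data, lr=lr, steps=local_steps)
                client_models.append(local_model)
            global_model = average_models(client_models)  # intra-cluster FedAvg
        return global_model

    def average_models(models):
        # Element-wise average of model parameters (assumes float parameters).
        avg = copy.deepcopy(models[0])
        with torch.no_grad():
            for params in zip(avg.parameters(), *[m.parameters() for m in models]):
                params[0].copy_(sum(params[1:]) / len(params[1:]))
        return avg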
FewJoint: A Few-shot Learning Benchmark for Joint Language Understanding
Hou, Yutai, Mao, Jiafeng, Lai, Yongkui, Chen, Cheng, Che, Wanxiang, Chen, Zhigang, Liu, Ting
Few-shot learning (FSL) is one of the key future steps in machine learning and has attracted a lot of attention. However, in contrast to the rapid development in other domains, such as Computer Vision, the progress of FSL in Natural Language Processing (NLP) is much slower. One of the key reasons for this is the lack of public benchmarks. NLP FSL studies usually report new results on their own constructed few-shot datasets, which makes results hard to compare and thus impedes cumulative progress. In this paper, we present FewJoint, a novel Few-Shot Learning benchmark for NLP. Different from most NLP FSL research that focuses only on simple N-classification problems, our benchmark introduces few-shot joint dialogue language understanding, which additionally covers structure prediction and multi-task reliance problems. This allows our benchmark to reflect real-world NLP complexity beyond simple N-classification. Our benchmark is used in the few-shot learning contest of SMP2020-ECDT task-1. We also provide a compatible FSL platform to ease experiment set-up.
Revisiting Co-Occurring Directions: Sharper Analysis and Efficient Algorithm for Sparse Matrices
Luo, Luo, Chen, Cheng, Xie, Guangzeng, Ye, Haishan
We study the streaming model for approximate matrix multiplication (AMM). We are interested in the scenario in which the algorithm can take only one pass over the data with limited memory. The state-of-the-art deterministic sketching algorithm for streaming AMM is co-occurring directions (COD), which has much smaller approximation errors than randomized algorithms and empirically outperforms other deterministic sketching methods. In this paper, we provide a tighter error bound for COD whose leading term accounts for the potential approximate low-rank structure and the correlation of the input matrices. We prove that COD is space-optimal with respect to our improved error bound. We also propose a variant of COD for sparse matrices with theoretical guarantees. Experiments on real-world sparse datasets show that the proposed algorithm is more efficient than baseline methods.
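For reference, the following is a hedged sketch of a co-occurring-directions-style streaming update in NumPy, assuming column streams X (d_x by n) and Y (d_y by n) with d_x, d_y >= ell. Details such as the shrinkage threshold should be checked against the paper, and the paper's sparse variant and sharper analysis are not reflected here.

    import numpy as np

    def cod_sketch(X, Y, ell):
        # Maintain sketches B_x, B_y with X @ Y.T approximately B_x @ B_y.T.
        d_x, n = X.shape
        d_y, _ = Y.shape
        B_x, B_y = np.zeros((d_x, ell)), np.zeros((d_y, ell))
        filled = 0
        for i in range(n):
            B_x[:, filled], B_y[:, filled] = X[:, i], Y[:, i]
            filled += 1
            if filled == ell:                    # buffers full: compress
                Q_x, R_x = np.linalg.qr(B_x)
                Q_y, R_y = np.linalg.qr(B_y)
                U, s, Vt = np.linalg.svd(R_x @ R_y.T)
                delta = s[ell // 2]              # frequent-directions-style shrinkage
                s_shrunk = np.sqrt(np.maximum(s - delta, 0.0))
                B_x = (Q_x @ U) * s_shrunk       # rescale columns
                B_y = (Q_y @ Vt.T) * s_shrunk
                filled = ell // 2                # trailing columns are now zero
        return B_x, B_y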
A Stochastic Proximal Point Algorithm for Saddle-Point Problems
Luo, Luo, Chen, Cheng, Li, Yujun, Xie, Guangzeng, Zhang, Zhihua
We consider saddle-point problems whose objective functions are the average of $n$ strongly convex-concave individual components. Recently, researchers have exploited variance reduction methods to solve such problems and achieve linear convergence guarantees. However, these methods converge slowly when the condition number of the problem is very large. In this paper, we propose a stochastic proximal point algorithm, which accelerates the variance reduction method SAGA for saddle-point problems. Compared with the catalyst framework, our algorithm removes a logarithmic factor in the condition number from the iteration complexity. We apply our algorithm to policy evaluation, and the empirical results show that our method is much more efficient than state-of-the-art methods.
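For context, a deterministic proximal point step for the saddle-point problem $\min_x \max_y f(x,y) = \frac{1}{n}\sum_{i=1}^n f_i(x,y)$ takes the generic form below; the paper's algorithm replaces the exact subproblem with a SAGA-type stochastic approximation (see the paper for the precise scheme).

$$(x_{t+1}, y_{t+1}) = \arg\min_{x}\max_{y} \; f(x, y) + \frac{1}{2\eta}\|x - x_t\|^2 - \frac{1}{2\eta}\|y - y_t\|^2,$$

where $\eta > 0$ is the proximal step size.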
A Multi-Agent Reinforcement Learning Method for Impression Allocation in Online Display Advertising
Wu, Di, Chen, Cheng, Yang, Xun, Chen, Xiujun, Tan, Qing, Xu, Jian, Gai, Kun
In online display advertising, guaranteed contracts and real-time bidding (RTB) are two major ways for a publisher to sell impressions. Despite the increasing popularity of RTB, half of online display advertising revenue is still generated from guaranteed contracts. Therefore, simultaneously selling impressions through both guaranteed contracts and RTB is a straightforward choice for a publisher to maximize its yield. However, deriving the optimal strategy to allocate impressions is not a trivial task, especially when the environment is unstable in real-world applications. In this paper, we formulate the impression allocation problem as an auction problem where each contract can submit virtual bids for individual impressions. With this formulation, we derive the optimal impression allocation strategy by solving for the optimal bidding functions of the contracts. Since the bids from contracts are decided by the publisher, we propose a multi-agent reinforcement learning (MARL) approach to derive cooperative policies for the publisher to maximize its yield in an unstable environment. The proposed approach also addresses common challenges in MARL such as input dimension explosion, reward credit assignment, and non-stationary environments. Experimental evaluations on large-scale real datasets demonstrate the effectiveness of our approach.
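To make the auction formulation concrete, a minimal hypothetical allocation rule is sketched below: each guaranteed contract's learned policy emits a virtual bid for the impression, and the impression goes to the highest bid among the contracts and the RTB market. The names and the tie-breaking rule are illustrative assumptions, not the paper's exact mechanism.

    def allocate_impression(impression_features, contract_policies, rtb_bid):
        # contract_policies: dict mapping contract id -> policy(features) -> virtual bid.
        virtual_bids = {cid: policy(impression_features)
                        for cid, policy in contract_policies.items()}
        best_contract = max(virtual_bids, key=virtual_bids.get)
        if virtual_bids[best_contract] >= rtb_bid:
            return best_contract   # serve the winning guaranteed contract
        return "RTB"               # otherwise sell the impression via RTB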
A Parallel algorithm for $\mathcal{X}$-Armed bandits
Chen, Cheng, Liu, Shuang, Zhang, Zhihua, Li, Wu-Jun
The goal of the $\mathcal{X}$-armed bandit problem is to find the global maximum of an unknown stochastic function $f$, given a finite budget of $n$ evaluations. Recently, $\mathcal{X}$-armed bandits have been widely used in many applications, many of which need to deal with large-scale data sets. To handle such data sets, we study a distributed setting of $\mathcal{X}$-armed bandits, where $m$ players collaborate to find the maximum of the unknown function. We develop a novel anytime distributed $\mathcal{X}$-armed bandit algorithm. Compared with prior work on $\mathcal{X}$-armed bandits, our algorithm uses a quite different search strategy tailored to distributed learning scenarios. Our theoretical analysis shows that our distributed algorithm is $m$ times faster than the classical single-player algorithm, while the number of communication rounds is only logarithmic in $mn$. Numerical results show that our method makes effective use of every player to minimize the loss, which makes the distributed approach attractive and useful.
Regret vs. Communication: Distributed Stochastic Multi-Armed Bandits and Beyond
Liu, Shuang, Chen, Cheng, Zhang, Zhihua
In this paper, we consider the distributed stochastic multi-armed bandit problem, where a global arm set can be accessed by multiple players independently. The players are allowed to exchange their histories of observations with each other at specific points in time. We study the relationship between regret and communication. When the time horizon is known, we propose the Over-Exploration strategy, which requires only one round of communication and whose regret does not scale with the number of players. When the time horizon is unknown, we measure the frequency of communication through a new notion called the density of the communication set, and give an exact characterization of the interplay between regret and communication. Specifically, we establish a lower bound and develop stable strategies that match it. The results and analyses in this paper are presented in a specific setting but can be translated into more general settings.