
Collaborating Authors

 Yang, Jian


Spatio-Temporal Graph Convolution for Skeleton Based Action Recognition

AAAI Conferences

Variations of human body skeletons may be considered dynamic graphs, which are a generic data representation for numerous real-world applications. In this paper, we propose a spatio-temporal graph convolution (STGC) approach that combines the success of local convolutional filtering with the sequence-learning ability of autoregressive moving average models. To encode dynamic graphs, the constructed multi-scale local graph convolution filters, consisting of matrices of local receptive fields and signal mappings, are performed recursively on structured graph data over the temporal and spatial domains. The proposed model is generic and principled, as it can be generalized to other dynamic models. We theoretically prove the stability of STGC and provide an upper bound on the signal transformation to be learned. Further, the proposed recursive model can be stacked into a multi-layer architecture. To evaluate our model, we conduct extensive experiments on four benchmark skeleton-based action datasets, including the large-scale and challenging NTU RGB+D. The experimental results demonstrate the effectiveness of our proposed model and its improvement over the state of the art.
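
As a rough illustration of the ingredients named above, the Python sketch below applies multi-scale local graph filters (powers of a normalized skeleton adjacency) to each frame's joint features and combines them through a simple autoregressive recursion over time. The layer shapes, the tanh recursion, and the toy skeleton are illustrative assumptions, not the paper's exact formulation.

import numpy as np

def normalize_adjacency(A):
    """Symmetrically normalize an adjacency matrix with self-loops."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def stgc_layer(X_seq, A, W_list, U):
    """
    X_seq : (T, N, F) sequence of joint features (T frames, N joints, F dims)
    A     : (N, N) skeleton adjacency
    W_list: list of (F, H) filter weights, one per neighborhood scale k
    U     : (H, H) recurrent weights for the temporal recursion
    Returns (T, N, H) hidden states.
    """
    A_norm = normalize_adjacency(A)
    T, N, F = X_seq.shape
    H_dim = W_list[0].shape[1]
    H_prev = np.zeros((N, H_dim))
    # Pre-compute multi-scale local receptive fields A^0, A^1, ..., A^{K-1}.
    scales, A_power = [], np.eye(N)
    for _ in W_list:
        scales.append(A_power)
        A_power = A_power @ A_norm
    outputs = []
    for t in range(T):
        spatial = sum(S @ X_seq[t] @ W for S, W in zip(scales, W_list))
        H_prev = np.tanh(spatial + H_prev @ U)   # simple temporal recursion
        outputs.append(H_prev)
    return np.stack(outputs)

# Toy usage: 5-joint chain skeleton, 8 frames, 3-d features, 2 scales, 4 hidden dims.
rng = np.random.default_rng(0)
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
X_seq = rng.standard_normal((8, 5, 3))
W_list = [rng.standard_normal((3, 4)) * 0.1 for _ in range(2)]
U = rng.standard_normal((4, 4)) * 0.1
print(stgc_layer(X_seq, A, W_list, U).shape)  # (8, 5, 4)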


Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift

arXiv.org Machine Learning

This paper first answers the question "why do the two most powerful techniques, Dropout and Batch Normalization (BN), often lead to worse performance when they are combined?" from both theoretical and statistical perspectives. Theoretically, we find that Dropout shifts the variance of a specific neural unit when the network is transferred from its training state to its test state. BN, however, maintains in the test phase the statistical variance accumulated over the entire learning procedure. This inconsistency of variance (which we name "variance shift") causes unstable numerical behavior at inference time and ultimately leads to more erroneous predictions when Dropout is applied before BN. Thorough experiments on DenseNet, ResNet, ResNeXt and Wide ResNet confirm our findings. Based on the uncovered mechanism, we then explore several strategies that modify Dropout and try to overcome the limitations of the combination by avoiding the variance-shift risk.
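
The toy numpy experiment below illustrates the variance-shift effect in isolation: with inverted Dropout (keep rate p), the variance a unit exhibits in train mode differs from its test-mode variance, while a downstream BN layer would keep normalizing with the moving variance accumulated during training. The data and keep rate are illustrative assumptions, not the paper's exact setup.

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)          # pre-dropout activations of one unit
p = 0.5                                     # keep probability

mask = rng.binomial(1, p, size=x.shape)
x_train = x * mask / p                      # inverted dropout, train mode
x_test = x                                  # dropout is an identity at test time

var_train = x_train.var()                   # what BN accumulates during training
var_test = x_test.var()                     # what the unit actually emits at test time

print(f"train-mode variance (seen by BN): {var_train:.3f}")
print(f"test-mode variance  (actual):     {var_test:.3f}")
# A BN layer placed after the dropout would normalize test activations by
# sqrt(var_train), so the mismatch above propagates as mis-scaled inputs to
# the layers that follow.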


LightRNN: Memory and Computation-Efficient Recurrent Neural Networks

Neural Information Processing Systems

Recurrent neural networks (RNNs) have achieved state-of-the-art performance in many natural language processing tasks, such as language modeling and machine translation. However, when the vocabulary is large, the RNN model becomes very big (e.g., possibly beyond the memory capacity of a GPU device) and its training becomes very inefficient. In this work, we propose a novel technique to tackle this challenge. The key idea is to use a 2-Component (2C) shared embedding for word representations. We allocate every word in the vocabulary into a table, each row of which is associated with a vector, and each column of which is associated with another vector. Depending on its position in the table, a word is jointly represented by two components: a row vector and a column vector. Since the words in the same row share the row vector and the words in the same column share the column vector, we only need $2 \sqrt{|V|}$ vectors to represent a vocabulary of $|V|$ unique words, far fewer than the $|V|$ vectors required by existing approaches. Based on the 2-Component shared embedding, we design a new RNN algorithm and evaluate it on the language modeling task over several benchmark datasets. The results show that our algorithm significantly reduces the model size and speeds up the training process without sacrificing accuracy (it achieves similar, if not better, perplexity compared to state-of-the-art language models). Remarkably, on the One-Billion-Word benchmark dataset, our algorithm achieves comparable perplexity to previous language models while reducing the model size by a factor of 40-100 and speeding up the training process by a factor of 2. We name our proposed algorithm \emph{LightRNN} to reflect its very small model size and very high training speed.
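
A minimal sketch of the 2-Component shared embedding lookup is given below: each word is assigned a cell in a sqrt(|V|) x sqrt(|V|) table and represented by that cell's row vector and column vector. The word-to-cell assignment here is simply word-id order; LightRNN additionally refines this allocation during training, a detail the sketch omits.

import math
import numpy as np

class TwoComponentEmbedding:
    def __init__(self, vocab_size, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.side = math.ceil(math.sqrt(vocab_size))        # table is side x side
        self.row_vectors = rng.standard_normal((self.side, dim)) * 0.1
        self.col_vectors = rng.standard_normal((self.side, dim)) * 0.1

    def lookup(self, word_id):
        """Return the (row, column) pair of vectors representing one word."""
        r, c = divmod(word_id, self.side)
        return self.row_vectors[r], self.col_vectors[c]

# Toy usage: a 10,000-word vocabulary needs only 2 * 100 = 200 stored vectors.
emb = TwoComponentEmbedding(vocab_size=10_000, dim=8)
row_vec, col_vec = emb.lookup(word_id=4242)
print(row_vec.shape, col_vec.shape)   # (8,) (8,)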


Large Margin Discriminant Dimensionality Reduction in Prediction Space

Neural Information Processing Systems

In this paper we establish a duality between boosting and SVM, and use it to derive a novel discriminant dimensionality reduction algorithm. In particular, using the multiclass formulations of boosting and SVM, we note that both use a combination of a mapping and linear classifiers to maximize the multiclass margin. In SVM this is implemented using a predefined mapping (induced by the kernel) and optimizing the linear classifiers. In boosting the linear classifiers are predefined and the mapping (predictor) is learned through a combination of weak learners. We argue that the intermediate mapping, i.e. the boosting predictor, preserves the discriminant aspects of the data, and that by controlling the dimension of this mapping it is possible to obtain discriminant low-dimensional representations of the data. We use the aforementioned duality to propose a new method, Large Margin Discriminant Dimensionality Reduction (LADDER), that jointly learns the mapping and the linear classifiers in an efficient manner. This leads to a data-driven mapping which can embed data into any number of dimensions. Experimental results show that this embedding can significantly improve performance on tasks such as hashing and image/scene classification.
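
The numpy sketch below conveys the jointly-learned-mapping idea in miniature: a low-dimensional nonlinear embedding and a set of linear classifiers are trained together under a multiclass hinge (large-margin) loss, and the embedding layer serves as the discriminant representation. The one-hidden-layer mapping, subgradient descent, and toy data are illustrative assumptions; LADDER itself builds the mapping from boosting-style weak learners.

import numpy as np

rng = np.random.default_rng(0)
n, in_dim, embed_dim, n_classes = 300, 20, 2, 3
X = rng.standard_normal((n, in_dim))
y = rng.integers(0, n_classes, size=n)
# Make the classes separable in a hidden 3-d subspace so training succeeds.
X[:, :n_classes] += 3.0 * np.eye(n_classes)[y]

V = rng.standard_normal((in_dim, embed_dim)) * 0.1     # mapping (embedding) weights
W = rng.standard_normal((embed_dim, n_classes)) * 0.1  # linear classifiers
lr = 0.05

for epoch in range(200):
    Z = np.tanh(X @ V)                      # low-dimensional embedding
    S = Z @ W                               # class scores
    # Crammer-Singer multiclass hinge: margin violation against the best wrong class.
    S_true = S[np.arange(n), y]
    S_wrong = S.copy()
    S_wrong[np.arange(n), y] = -np.inf
    k_star = S_wrong.argmax(axis=1)
    violated = S_wrong[np.arange(n), k_star] + 1.0 - S_true > 0
    # Subgradient of the hinge loss with respect to the scores.
    G = np.zeros_like(S)
    G[violated, k_star[violated]] = 1.0
    G[violated, y[violated]] = -1.0
    G /= n
    # Backpropagate into the classifiers and the mapping, then update both jointly.
    dW = Z.T @ G
    dZ = G @ W.T
    dV = X.T @ (dZ * (1.0 - Z ** 2))
    W -= lr * dW
    V -= lr * dV

Z = np.tanh(X @ V)
acc = ((Z @ W).argmax(axis=1) == y).mean()
print(f"training accuracy of the 2-d embedding plus linear classifiers: {acc:.2f}")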


Inventory Control Involving Unknown Demand of Discrete Nonperishable Items - Analysis of a Newsvendor-based Policy

arXiv.org Machine Learning

Inventory control with an unknown demand distribution is considered, with emphasis on the case of discrete nonperishable items. We focus on an adaptive policy which, in every period, uses as much as possible the optimal newsvendor ordering quantity for the empirical distribution learned up to that period. The policy is assessed using the regret criterion, which measures the price paid for ambiguity about the demand distribution over $T$ periods. When there are guarantees on the demand distribution's separation from the critical newsvendor parameter $\beta=b/(h+b)$, a constant upper bound on the regret can be found. Without any prior information on the demand distribution, we show that the regret grows no faster than the rate $T^{1/2+\epsilon}$ for any $\epsilon>0$. In view of a known lower bound, this is almost the best one could hope for. Simulation studies involving this policy along with other policies are also conducted.
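
The policy itself is easy to state in code: in each period, order the $\beta$-quantile of the empirical demand distribution observed so far, with $\beta = b/(h+b)$. The sketch below simulates it against a toy demand stream; the Poisson demand and the cost values are illustrative assumptions.

import numpy as np

def empirical_newsvendor_order(demand_history, h, b):
    """Order quantity = smallest q with empirical CDF(q) >= b/(h+b)."""
    beta = b / (h + b)
    if len(demand_history) == 0:
        return 0                                # no information yet
    sorted_d = np.sort(demand_history)
    k = int(np.ceil(beta * len(sorted_d))) - 1  # index of the beta-quantile
    return int(sorted_d[max(k, 0)])

# Simulate T periods of the policy against an unknown discrete demand distribution.
rng = np.random.default_rng(0)
h, b = 1.0, 3.0                                 # holding and backlog costs (toy values)
T = 20
history, total_cost = [], 0.0
for t in range(T):
    q = empirical_newsvendor_order(history, h, b)
    d = rng.poisson(7)                          # true (unknown) demand
    total_cost += h * max(q - d, 0) + b * max(d - q, 0)
    history.append(d)
print(f"average per-period cost after {T} periods: {total_cost / T:.2f}")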


Sparse Deep Stacking Network for Image Classification

AAAI Conferences

Sparse coding can learn representations that are robust to noise and can model higher-order structure for image classification. However, its inference algorithm is computationally expensive, even when supervised signals are used to learn compact and discriminative dictionaries. Fortunately, the simplified neural network module (SNNM) has been proposed to learn discriminative dictionaries directly and thereby avoid the expensive inference; the SNNM module, however, ignores sparse representations. We therefore propose a sparse SNNM module by adding mixed-norm regularization (the $\ell_1/\ell_2$ norm). The sparse SNNM modules are further stacked to build a sparse deep stacking network (S-DSN). In the experiments, we evaluate S-DSN on four databases: Extended YaleB, AR, 15-Scene, and Caltech101. Experimental results show that our model outperforms related classification methods with only a linear classifier. It is worth noting that we reach 98.8% recognition accuracy on 15-Scene.
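
The sketch below shows one common reading of a mixed-norm ($\ell_1/\ell_2$) regularizer on a module's hidden representations: an $\ell_2$ norm within each group of hidden units and an $\ell_1$ sum across groups, which drives whole groups toward zero. The grouping, the penalty weight, and the way it would enter the training objective are illustrative assumptions.

import numpy as np

def mixed_norm_l1_l2(H, group_size):
    """
    H : (n_samples, n_hidden) hidden representations of one module.
    Splits the hidden units into consecutive groups of `group_size` and returns
    the sum, over samples and groups, of the within-group l2 norms.
    """
    n_samples, n_hidden = H.shape
    assert n_hidden % group_size == 0
    groups = H.reshape(n_samples, n_hidden // group_size, group_size)
    return np.linalg.norm(groups, axis=2).sum()

# Toy usage: a regularized objective for one module would look roughly like
#   task loss + lam * mixed_norm_l1_l2(H, group_size)
rng = np.random.default_rng(0)
H = rng.standard_normal((32, 64))
lam = 0.01
print("penalty term:", lam * mixed_norm_l1_l2(H, group_size=8))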


Causal Inference via Sparse Additive Models with Application to Online Advertising

AAAI Conferences

Advertising effectiveness measurement is a fundamental problem in online advertising. Various causal inference methods have been employed to measure the causal effects of ad treatments. However, existing methods mainly focus on linear logistic regression for univariate and binary treatments and are not well suited for complex, multi-dimensional ad treatments, where each dimension can be discrete or continuous. In this paper we propose a novel two-stage causal inference framework for assessing the impact of complex ad treatments. In the first stage, we estimate the propensity parameter via a sparse additive model; in the second stage, a propensity-adjusted regression model is applied to measure the treatment effect. Our approach is shown to provide an unbiased estimate of ad effectiveness under regularity conditions. To demonstrate the efficacy of our approach, we apply it to a real online advertising campaign to evaluate the impact of three ad treatments: ad frequency, ad channel, and ad size. We show that ad frequency usually has a treatment-effect cap when ads are shown on mobile devices. In addition, the strategies for choosing the best ad size are completely different for mobile ads and online ads.
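
The Python sketch below mimics the two-stage structure on simulated data: stage one fits a sparse additive model (per-covariate polynomial basis with a Lasso penalty) of the treatment given the covariates, and stage two regresses the outcome on the treatment adjusted by that first-stage estimate. The basis, the penalty, the simulated data, and the exact form of the adjustment are illustrative assumptions rather than the paper's estimator.

import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n, p = 2000, 10
X = rng.standard_normal((n, p))                                       # user covariates
t = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.standard_normal(n) * 0.3    # treatment (e.g., ad frequency)
y = 2.0 * t + X[:, 0] ** 2 + rng.standard_normal(n) * 0.5             # outcome

def additive_basis(X, degree=3):
    """Per-covariate polynomial terms, so the fitted model stays additive."""
    return np.hstack([X ** d for d in range(1, degree + 1)])

# Stage 1: sparse additive model of the treatment given the covariates.
B = additive_basis(X)
stage1 = Lasso(alpha=0.01).fit(B, t)
t_hat = stage1.predict(B)                             # propensity-style estimate

# Stage 2: propensity-adjusted regression for the treatment effect.
stage2 = LinearRegression().fit(np.column_stack([t, t_hat]), y)
print(f"estimated treatment effect: {stage2.coef_[0]:.3f} (data generated with effect 2.0)")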


Feature Selection in Conditional Random Fields for Map Matching of GPS Trajectories

arXiv.org Machine Learning

Map matching of a GPS trajectory serves the purpose of recovering the original route on a road network from a sequence of noisy GPS observations. It is a fundamental technique for many Location-Based Services. However, map matching at a low sampling rate on an urban road network is still a challenging task. In this paper, the characteristics of Conditional Random Fields with regard to inducing many contextual features, together with feature selection, are explored for map matching of GPS trajectories at a low sampling rate. Experiments on a taxi trajectory dataset show that our method can achieve competitive results while also reducing model complexity for computation-limited applications.


Feature Engineering for Map Matching of Low-Sampling-Rate GPS Trajectories in Road Network

arXiv.org Machine Learning

Map matching of GPS trajectories serves the purpose of recovering the original routes in a road network from a sequence of noisy observations. In this work in progress, we share our experience of feature construction in a spatial database by reporting our ongoing experiments on feature extraction in Conditional Random Fields (CRFs) for map matching. Our preliminary results are obtained from real-world taxi GPS trajectories.


Delivering Guaranteed Display Ads under Reach and Frequency Requirements

AAAI Conferences

We propose a novel idea for the allocation and serving of online advertising. We show that by using predetermined fixed-length streams of ads (which we call patterns) to serve advertising, we can incorporate a variety of interesting features into the ad allocation optimization problem. In particular, our formulation optimizes for representativeness as well as user-level diversity and pacing of ads, under reach and frequency requirements. We show how the problem can be solved efficiently using a column generation scheme in which only a small set of the best patterns is kept in the optimization problem. Our numerical tests suggest that, with parallelization of the pattern generation process, the algorithm has a promising running time and memory usage.
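
The sketch below illustrates the column-generation skeleton on a stripped-down covering version of the problem: a restricted master LP decides how many times each known pattern is served so that every ad meets an impression target, and a pricing step greedily assembles a new pattern from the LP duals, stopping when no pattern has negative reduced cost. The pattern length, frequency caps, demands, and the simple objective are illustrative assumptions; the paper's master problem also encodes representativeness, diversity, and pacing.

import numpy as np
from scipy.optimize import linprog

L = 5                                   # slots per pattern (fixed length)
demand = np.array([120.0, 80.0, 60.0])  # impressions required per ad (reach proxy)
freq_cap = np.array([3, 2, 2])          # max slots of one ad per pattern (frequency)
m = len(demand)

# Start with one trivial pattern per ad (the ad alone, up to its frequency cap).
patterns = [np.eye(m, dtype=int)[i] * min(freq_cap[i], L) for i in range(m)]

for _ in range(20):
    A = np.column_stack(patterns)                     # m x (number of patterns)
    # Restricted master LP: minimize total pattern streams s.t. A x >= demand.
    res = linprog(c=np.ones(A.shape[1]), A_ub=-A, b_ub=-demand,
                  bounds=(0, None), method="highs")
    duals = -res.ineqlin.marginals                    # one nonnegative dual per ad
    # Pricing: fill L slots greedily with the highest-dual ads, respecting caps.
    order = np.argsort(-duals)
    new_pattern = np.zeros(m, dtype=int)
    slots_left = L
    for i in order:
        take = min(freq_cap[i], slots_left)
        new_pattern[i] = take
        slots_left -= take
    reduced_cost = 1.0 - duals @ new_pattern
    if reduced_cost >= -1e-9:                         # no improving pattern left
        break
    patterns.append(new_pattern)

print("patterns in the final master LP:", A.T.tolist())
print("streams per pattern:", np.round(res.x, 2).tolist())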