Asia
The Machines are Coming: China's role in the future of artificial intelligence
Try typing "the machines" into Google and chances are that one of the top results the artificial intelligence-powered search engine will return is the phrase: "The Machines are Coming". After a 2016 filled with high-profile advances in artificial intelligence (AI), leading technologists say this could be a breakout year in the development of intelligent machines that emulate humans. Asia, until now lagging Silicon Valley in AI, will play a bigger role as the field cements itself at the pinnacle of the technology world in 2017, the experts say. AI, technically a computing field that involves the analysis of large troves of data to predict outcomes and patterns, is as old as modern computers, but its esoteric nature means its actual potential has long been obscured by caricature: think, for example, of the 1960s space-age cartoon The Jetsons, which featured a sentient robot maid and automated flying cars (both of which we are still waiting for, even 50 years on). Now, a confluence of factors has given rise to hopes that computers with human-like cognitive ability may soon be a reality.
Android Circuit: New Galaxy S8 Leaks, Android Biggest Success In 2016, New Google Pixel Problem
Taking a look back at seven days of news and headlines across the world of Android, this week's Android Circuit includes a new voice for the Galaxy S8, the return of the S-Pen, Pixel power problems, Android's battery win, the shutdown of Cyanogen, WileyFox's quick change to Nougat, a North Korean Android tablet's spyware, and Super Mario Run's preparations for its Android arrival. Android Circuit is here to remind you of a few of the many things that have happened around Android in the last week (and you can find the weekly Apple news digest here). The Samsung Galaxy S8 could be picking up a new tool named Bixby, a voice-powered digital assistant along the lines of Siri and Google Assistant. Viv Labs is the company behind the technology, and Samsung recently acquired it, so it makes sense for the South Korean firm to stake its claim in this space. But will that upset Google?
Big Ideas in 2016: How 2016's tech trends are setting the stage for a smarter 2017
Our day-to-day lives have become increasingly akin to something out of a sci-fi film, as the boundaries between real life and the digital world blur and technology is ever more integrated into our lives. For example, the average person now has the ability to converse and work with artificial intelligence, while businesses are transforming the way they operate thanks to the advent of blockchain solutions. Some of these innovations have been ramping up for the past few years, but they secured their place as an increasingly integral part of our lives in 2016, making businesses more efficient and able to deepen their reach in key markets. Over the past year, computers have become smarter than ever. We've been talking to machines for decades, but in 2016 they began to talk back, transforming how we work and play.
Dual Learning for Machine Translation
He, Di, Xia, Yingce, Qin, Tao, Wang, Liwei, Yu, Nenghai, Liu, Tie-Yan, Ma, Wei-Ying
While neural machine translation (NMT) has made good progress over the past two years, tens of millions of bilingual sentence pairs are needed for its training. However, human labeling is very costly. To tackle this training data bottleneck, we develop a dual-learning mechanism, which can enable an NMT system to automatically learn from unlabeled data through a dual-learning game. This mechanism is inspired by the following observation: any machine translation task has a dual task, e.g., English-to-French translation (primal) versus French-to-English translation (dual); the primal and dual tasks can form a closed loop, and generate informative feedback signals to train the translation models, even without the involvement of a human labeler. In the dual-learning mechanism, we use one agent to represent the model for the primal task and the other agent to represent the model for the dual task, then ask them to teach each other through a reinforcement learning process. Based on the feedback signals generated during this process (e.g., the language-model likelihood of the output of a model, and the reconstruction error of the original sentence after the primal and dual translations), we can iteratively update the two models until convergence (e.g., using the policy gradient methods). We call the corresponding approach to neural machine translation \emph{dual-NMT}. Experiments show that dual-NMT works very well on English$\leftrightarrow$French translation; in particular, by learning from monolingual data (with 10\% bilingual data for warm start), it achieves a comparable accuracy to NMT trained on the full bilingual data for the French-to-English translation task.
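The closed-loop feedback signal can be illustrated with a deliberately tiny sketch. The word-level dictionary "models" below are hypothetical stand-ins for the paper's neural translators; the point is only that a round trip through the primal and dual models yields a reconstruction error without any human labels.

```python
# Toy sketch of the dual-learning closed loop (all names hypothetical).
# Two "agents" translate in opposite directions; the reconstruction error
# after a round trip is a training signal that needs no human labeler.

# Hypothetical word-level "models": primal (English -> French) and dual.
primal = {"hello": "bonjour", "world": "monde", "cat": "chat"}
dual = {"bonjour": "hello", "monde": "world", "chat": "dog"}  # one wrong entry

def translate(model, sentence):
    return [model.get(word, "<unk>") for word in sentence]

def reconstruction_error(sentence):
    """Round-trip the sentence and count words that fail to come back."""
    round_trip = translate(dual, translate(primal, sentence))
    return sum(a != b for a, b in zip(sentence, round_trip))

monolingual = ["hello", "world", "cat"]  # unlabeled English data
print(reconstruction_error(monolingual))  # nonzero: the loop exposes the error
```

The nonzero error localizes to the faulty "chat" -> "dog" entry; in dual-NMT this signal (combined with a language-model score) would drive policy-gradient updates of both models rather than a table lookup.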
A Multi-Batch L-BFGS Method for Machine Learning
Berahas, Albert S., Nocedal, Jorge, Takac, Martin
The question of how to parallelize the stochastic gradient descent (SGD) method has received much attention in the literature. In this paper, we focus instead on batch methods that use a sizeable fraction of the training set at each iteration to facilitate parallelism, and that employ second-order information. In order to improve the learning process, we follow a multi-batch approach in which the batch changes at each iteration. This can cause difficulties because L-BFGS employs gradient differences to update the Hessian approximations, and when these gradients are computed using different data points the process can be unstable. This paper shows how to perform stable quasi-Newton updating in the multi-batch setting, illustrates the behavior of the algorithm in a distributed computing platform, and studies its convergence properties for both the convex and nonconvex cases.
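The key stabilization idea is that the curvature pair (s, y) should be formed from gradients evaluated on the *same* data, i.e., the overlap of two consecutive batches. The sketch below is a minimal illustration of that idea on a small least-squares problem; the batch size, memory length, and fixed step length are all illustrative choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)
b = A @ x_star                      # consistent least-squares problem

def grad(x, idx):
    """Gradient of 0.5*||A x - b||^2 / |idx| restricted to a sample subset."""
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ x - bi) / len(idx)

def two_loop(g, mem):
    """Standard L-BFGS two-loop recursion applying the inverse-Hessian
    approximation built from the stored (s, y) pairs to g."""
    q, alphas = g.copy(), []
    for s, y in reversed(mem):
        a = (s @ q) / (y @ s)
        alphas.append(a)
        q -= a * y
    if mem:
        s, y = mem[-1]
        q *= (s @ y) / (y @ y)      # initial Hessian scaling
    for (s, y), a in zip(mem, reversed(alphas)):
        q += (a - (y @ q) / (y @ s)) * s
    return q

x = np.zeros(d)
mem, prev_batch = [], None
for k in range(50):
    batch = rng.choice(n, size=64, replace=False)
    p = two_loop(grad(x, batch), mem)
    x_new = x - 0.5 * p
    # Stable multi-batch updating: form (s, y) on the OVERLAP of consecutive
    # batches, so the gradient difference uses the same data points.
    if prev_batch is not None:
        overlap = np.intersect1d(batch, prev_batch)
        if len(overlap) > 0:
            s = x_new - x
            y = grad(x_new, overlap) - grad(x, overlap)
            if s @ y > 1e-10:       # curvature condition
                mem.append((s, y))
                mem = mem[-5:]      # limited memory
    x, prev_batch = x_new, batch
print(np.linalg.norm(x - x_star))
```

Computing the gradient difference on the overlap costs two extra gradient evaluations on a small subset, but keeps the quasi-Newton update consistent even though the full batch changes every iteration.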
Barzilai-Borwein Step Size for Stochastic Gradient Descent
Tan, Conghui, Ma, Shiqian, Dai, Yu-Hong, Qian, Yuqiu
One of the major issues in stochastic gradient descent (SGD) methods is how to choose an appropriate step size while running the algorithm. Since the traditional line search technique does not apply to stochastic optimization methods, the common practice in SGD is either to use a diminishing step size, or to tune a step size by hand, which can be time-consuming in practice. In this paper, we propose to use the Barzilai-Borwein (BB) method to automatically compute step sizes for SGD and its variant, the stochastic variance reduced gradient (SVRG) method, which leads to two algorithms: SGD-BB and SVRG-BB. We prove that SVRG-BB converges linearly for strongly convex objective functions. As a by-product, we prove the linear convergence of SVRG with Option I proposed in [10], a result that had been missing from the literature. Numerical experiments on standard data sets show that the performance of SGD-BB and SVRG-BB is comparable to, and sometimes even better than, SGD and SVRG with best-tuned step sizes, and is superior to some advanced SGD variants.
Understanding the Effective Receptive Field in Deep Convolutional Neural Networks
Luo, Wenjie, Li, Yujia, Urtasun, Raquel, Zemel, Richard
We study characteristics of receptive fields of units in deep convolutional networks. The receptive field size is a crucial issue in many visual tasks, as the output must respond to large enough areas in the image to capture information about large objects. We introduce the notion of an effective receptive field size, and show that it both has a Gaussian distribution and only occupies a fraction of the full theoretical receptive field size. We analyze the effective receptive field in several architecture designs, and the effect of sub-sampling, skip connections, dropout and nonlinear activations on it. This leads to suggestions for ways to address its tendency to be too small.
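The Gaussian shape has a simple intuition: for linear convolutions with uniform weights, the gradient of the centre output unit with respect to the input is the n-fold convolution of the kernels, which tends to a Gaussian by the central limit theorem. The 1D sketch below uses a hypothetical uniform 3-tap kernel purely for illustration.

```python
import numpy as np

# With linear convolutions and uniform weights, the effective receptive field
# of the centre output is the n-fold convolution of the kernels: a bell curve
# occupying only a fraction of the full theoretical receptive field.
kernel = np.ones(3) / 3.0       # hypothetical uniform 3-tap kernel
n_layers = 10
erf = np.array([1.0])
for _ in range(n_layers):
    erf = np.convolve(erf, kernel)

theoretical_rf = len(erf)       # full theoretical RF: 1 + n_layers*(k-1) taps
center, edge = erf[len(erf) // 2], erf[0]
print(theoretical_rf, edge / center)  # edge contributes ~0 relative to centre
```

Although the theoretical receptive field spans all 21 input positions here, almost all of the gradient mass concentrates in a narrow Gaussian around the centre, which is exactly the "effective" versus "theoretical" gap the paper analyzes.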
LightRNN: Memory and Computation-Efficient Recurrent Neural Networks
Li, Xiang, Qin, Tao, Yang, Jian, Liu, Tie-Yan
Recurrent neural networks (RNNs) have achieved state-of-the-art performance in many natural language processing tasks, such as language modeling and machine translation. However, when the vocabulary is large, the RNN model will become very big (e.g., possibly beyond the memory capacity of a GPU device) and its training will become very inefficient. In this work, we propose a novel technique to tackle this challenge. The key idea is to use 2-Component (2C) shared embedding for word representations. We allocate every word in the vocabulary to a cell of a table, where each row is associated with one vector and each column with another vector. Depending on its position in the table, a word is jointly represented by two components: a row vector and a column vector. Since the words in the same row share the row vector and the words in the same column share the column vector, we only need $2 \sqrt{|V|}$ vectors to represent a vocabulary of $|V|$ unique words, far fewer than the $|V|$ vectors required by existing approaches. Based on the 2-Component shared embedding, we design a new RNN algorithm and evaluate it using the language modeling task on several benchmark datasets. The results show that our algorithm significantly reduces the model size and speeds up the training process, without sacrificing accuracy (it achieves similar, if not better, perplexity compared to state-of-the-art language models). Remarkably, on the One-Billion-Word benchmark dataset, our algorithm achieves comparable perplexity to previous language models, whilst reducing the model size by a factor of 40-100, and speeding up the training process by a factor of 2. We name our proposed algorithm \emph{LightRNN} to reflect its very small model size and very high training speed.
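The parameter saving follows directly from the table construction. The sketch below uses a fixed divmod assignment of words to table cells for illustration; in the paper the allocation is itself learned via a bootstrap procedure, and the vocabulary size and embedding dimension here are arbitrary assumptions.

```python
import numpy as np

V, dim = 10_000, 64                 # hypothetical vocabulary and embedding size
rows = int(np.ceil(np.sqrt(V)))     # a 100x100 table holds all 10,000 words

row_vecs = np.random.default_rng(2).standard_normal((rows, dim))
col_vecs = np.random.default_rng(3).standard_normal((rows, dim))

def embed(word_id):
    """A word is jointly represented by its row vector and column vector."""
    r, c = divmod(word_id, rows)    # illustrative fixed cell assignment
    return np.concatenate([row_vecs[r], col_vecs[c]])

params_2c = 2 * rows * dim          # 2*sqrt(|V|) shared vectors
params_full = V * dim               # |V| vectors in a conventional embedding
print(params_full // params_2c)     # 50x fewer embedding parameters here
```

Words in the same row reuse one row vector and words in the same column reuse one column vector, which is where the $2\sqrt{|V|}$ versus $|V|$ reduction comes from.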
Deep Neural Networks with Inexact Matching for Person Re-Identification
Subramaniam, Arulkumar, Chatterjee, Moitreya, Mittal, Anurag
Person Re-Identification is the task of matching images of a person across multiple camera views. Almost all prior approaches address this challenge by attempting to learn the possible transformations that relate the different views of a person from a training corpus. Then, they utilize these transformation patterns for matching a query image to those in a gallery image bank at test time. This necessitates learning good feature representations of the images and having a robust feature matching technique. Deep learning approaches, such as Convolutional Neural Networks (CNN), simultaneously do both and have shown great promise recently. In this work, we propose two CNN-based architectures for Person Re-Identification. In the first, given a pair of images, we extract feature maps from these images via multiple stages of convolution and pooling. A novel inexact matching technique then matches pixels in the first representation with those of the second. Furthermore, we search across a wider region in the second representation for matching. Our novel matching technique allows us to tackle the challenges posed by large viewpoint variations, illumination changes or partial occlusions. Our approach shows promising performance and requires only about half the parameters of a current state-of-the-art technique. Nonetheless, it also suffers from false matches at times. In order to mitigate this issue, we propose a fused architecture that combines our inexact matching pipeline with a state-of-the-art exact matching technique. We observe substantial gains with the fused model over the current state-of-the-art on multiple challenging datasets of varying sizes, with gains of up to about 21%.
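The difference between exact and inexact matching can be shown on a toy 1D "feature map": instead of comparing position i in one map only with position i in the other, each position is compared against a wider neighbourhood and the best similarity is kept. The function, similarity measure, and search radius below are all illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def inexact_match(fa, fb, radius=2):
    """Match each position of feature map fa against a (2*radius+1)-wide
    neighbourhood of fb; return the best per-position similarity (1D toy)."""
    n = len(fa)
    scores = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        # Negative absolute difference as a simple similarity score.
        scores[i] = -np.min(np.abs(fb[lo:hi] - fa[i]))
    return scores

fa = np.array([1.0, 2.0, 3.0, 4.0])
fb = np.array([2.0, 3.0, 4.0, 5.0])   # same pattern shifted by one position
print(inexact_match(fa, fb).sum())     # near-zero: the shift is tolerated
```

An exact matcher comparing only aligned positions would penalize every element of the shifted pattern; searching a wider region is what lets the approach absorb viewpoint shifts and partial misalignments.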
Stochastic Gradient Geodesic MCMC Methods
Liu, Chang, Zhu, Jun, Song, Yang
We propose two stochastic gradient MCMC methods for sampling from Bayesian posterior distributions defined on Riemannian manifolds with a known geodesic flow, e.g. hyperspheres. Our methods are the first scalable sampling methods on these manifolds, with the aid of stochastic gradients. Novel dynamics are conceived and second-order integrators are developed. By adopting embedding techniques and the geodesic integrator, the methods do not require a global coordinate system of the manifold and do not involve inner iterations. Synthetic experiments show the validity of the methods, and their application to the challenging inference for spherical topic models indicates practical usability and efficiency.
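The geodesic-integrator component is easy to illustrate in isolation: on the unit hypersphere the geodesic flow has the closed form $x(t) = x\cos(\|v\|t) + (v/\|v\|)\sin(\|v\|t)$ for a tangent velocity $v$, so the iterate never leaves the manifold and no retraction or inner iteration is needed. The sketch below shows only this integrator; the full samplers add stochastic gradients and injected noise, which are omitted here.

```python
import numpy as np

def geodesic_step(x, v, t):
    """Exact geodesic flow on the unit hypersphere: move from x along
    tangent velocity v for time t, staying exactly on the manifold."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return x
    return np.cos(norm_v * t) * x + np.sin(norm_v * t) * v / norm_v

rng = np.random.default_rng(4)
x = np.array([1.0, 0.0, 0.0])
for _ in range(100):
    v = rng.standard_normal(3)
    v -= (v @ x) * x                 # project velocity onto the tangent space
    x = geodesic_step(x, v, 0.1)
print(np.linalg.norm(x))             # remains on the sphere, no projection step
```

Because the update is intrinsic to the manifold, it avoids the drift off the constraint surface that a plain Euclidean gradient step followed by renormalization can accumulate.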