 Deep Learning


Collaborative Deep Learning in Fixed Topology Networks

arXiv.org Machine Learning

In this paper, we address the scalability of optimization algorithms for deep learning in a distributed setting. Scaling up deep learning [1] is becoming increasingly crucial for large-scale applications where the sizes of both the available data and the models are massive [2]. Among various algorithmic advances, many recent attempts have been made to parallelize stochastic gradient descent (SGD) based learning schemes across multiple computing agents. An early approach called Downpour SGD [3], developed within Google's DistBelief software framework, primarily focuses on model parallelization (i.e., splitting the model across the agents). A different approach known as elastic averaging SGD (EASGD) [4] attempts to perform multiple SGDs in parallel; this method uses a central parameter server that helps in assimilating parameter updates from the computing agents. However, none of the above approaches concretely addresses the issue of data parallelization, which is important for several learning scenarios: for example, data parallelization enables privacy-preserving learning in scenarios such as distributed learning with a network of mobile and Internet-of-Things (IoT) devices. A recent scheme called Federated Averaging SGD [5] attempts such data parallelization in the context of deep learning with significant success; however, it still uses a central parameter server. In contrast, deep learning with decentralized computation can be achieved via gossip SGD algorithms [6, 7], where agents communicate probabilistically without the aid of a parameter server.
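
The idea of agents communicating probabilistically without a parameter server can be illustrated with a toy sketch. This is not the algorithm from [6, 7] or from the paper itself; it is a minimal illustration under stated assumptions: each agent minimizes a hypothetical quadratic loss toward its own target, and with probability `gossip_prob` a randomly chosen pair of agents averages their parameters. All function and parameter names here are invented for illustration.

```python
import random


def gossip_average(params, i, j):
    """Replace the parameters of agents i and j with their pairwise mean."""
    mean = [(a + b) / 2.0 for a, b in zip(params[i], params[j])]
    params[i] = list(mean)
    params[j] = list(mean)


def local_sgd_step(theta, target, lr):
    """One gradient step on the toy loss 0.5 * ||theta - target||^2 (assumed loss)."""
    return [t - lr * (t - y) for t, y in zip(theta, target)]


def run_gossip_sgd(targets, steps=200, lr=0.1, gossip_prob=0.5, seed=0):
    """Alternate local gradient steps with occasional pairwise averaging.

    No central server is involved: agents only ever exchange parameters
    with one randomly selected peer.
    """
    rng = random.Random(seed)
    n = len(targets)
    params = [[0.0] * len(targets[0]) for _ in range(n)]
    for _ in range(steps):
        # Every agent takes a step using only its own local data (its target).
        params = [local_sgd_step(p, y, lr) for p, y in zip(params, targets)]
        # With some probability, one random pair of agents gossips.
        if rng.random() < gossip_prob:
            i, j = rng.sample(range(n), 2)
            gossip_average(params, i, j)
    return params
```

Calling `run_gossip_sgd([[1.0], [3.0], [5.0]])` returns one parameter vector per agent. Pairwise averaging leaves the mean of the two participating agents unchanged, which is what lets information spread without a coordinator.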


Deep Interest Network for Click-Through Rate Prediction

arXiv.org Machine Learning

The display advertising business brings in billions of dollars of income yearly at Alibaba. In a cost-per-click (CPC) advertising system, advertisements are ranked by eCPM (effective cost per mille), which is the product of the bid price and the CTR (click-through rate). Hence, the performance of the CTR prediction model has a direct impact on the final revenue and plays a key role in the advertising system. Driven by the success of deep learning in image recognition, computer vision and natural language processing, a number of deep learning based methods have been proposed for the CTR prediction task [1, 2, 3, 4]. These methods usually first employ an embedding layer on the input, mapping the original large-scale sparse id features to distributed representations, then add fully connected layers (in other words, multilayer perceptrons, MLPs) to automatically learn the nonlinear relations among features. Compared to the traditional, commonly used logistic regression model [5, 6], MLPs can reduce a lot of feature engineering work, which is time- and manpower-consuming in industry applications. MLPs have now become a popular model structure for the CTR prediction problem. However, in fields with rich internet-scale user behavior data, such as online advertising and recommendation systems in the e-commerce industry, these MLP models often lack a deep understanding of, and fail to exploit, the specific structures of behavior data, and thus leave room for further improvement.
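
The ranking rule described above (eCPM as the product of bid price and CTR) can be sketched in a few lines. The ad records, field names, and numbers below are hypothetical and only illustrate the arithmetic; a constant per-mille scaling factor, if one is applied, would not change the ordering.

```python
def ecpm(bid_price, predicted_ctr):
    """eCPM as described in the text: bid price times click-through rate."""
    return bid_price * predicted_ctr


def rank_ads(ads):
    """Sort ads by eCPM, highest first.

    `ads` is a list of dicts with hypothetical keys "name", "bid", "ctr".
    """
    return sorted(ads, key=lambda ad: ecpm(ad["bid"], ad["ctr"]), reverse=True)


# Made-up example: a high bid does not win if the predicted CTR is low.
ads = [
    {"name": "a", "bid": 2.0, "ctr": 0.01},
    {"name": "b", "bid": 0.5, "ctr": 0.10},
    {"name": "c", "bid": 1.0, "ctr": 0.03},
]
ranked_names = [ad["name"] for ad in rank_ads(ads)]
```

Because the CTR enters the score multiplicatively, any error in the predicted CTR directly changes which ad is shown, which is why prediction quality affects revenue.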


Adversarial Neural Machine Translation

arXiv.org Machine Learning

In this paper, we study a new learning paradigm for Neural Machine Translation (NMT). Instead of maximizing the likelihood of the human translation as in previous works, we minimize the distinction between human translation and the translation given by an NMT model. To achieve this goal, inspired by the recent success of generative adversarial networks (GANs), we employ an adversarial training architecture and name it Adversarial-NMT. In Adversarial-NMT, the training of the NMT model is assisted by an adversary, which is an elaborately designed Convolutional Neural Network (CNN). The goal of the adversary is to differentiate the translation generated by the NMT model from that produced by a human. The goal of the NMT model is to produce high-quality translations so as to fool the adversary. A policy gradient method is leveraged to co-train the NMT model and the adversary. Experimental results on English$\rightarrow$French and German$\rightarrow$English translation tasks show that Adversarial-NMT can achieve significantly better translation quality than several strong baselines.


Deep Learning Research Review Week 1: Generative Adversarial Nets

@machinelearnbot

This week, I'll be doing a new series called Deep Learning Research Review. Every couple weeks or so, I'll be summarizing and explaining research papers in specific subfields of deep learning. This week I'll begin with Generative Adversarial Networks. According to Yann LeCun, "adversarial training is the coolest thing since sliced bread". I'm inclined to believe so because I don't think sliced bread ever created this much buzz and excitement within the deep learning community.


Using Apache Spark with Intel BigDL on Mesosphere DC/OS · Blog

#artificialintelligence

Deep learning is becoming more and more pervasive as a machine learning technique across domains such as healthcare, transportation, communications, manufacturing and many others. As part of our work on Lightbend Fast Data Platform, we have been exploring various deep learning libraries. BigDL is a distributed deep learning library from Intel, released and open-sourced in 2016. Besides offering most of the popular neural net topologies out of the box, BigDL boasts extremely high performance through its use of the Intel MKL library for numerical computation. BigDL supports import/export of networks pre-trained in TensorFlow, Caffe or Torch, and there are plans to include interoperability with other libraries in the market.


Deep Learning personalization of Internet is next big leap - AI Trends

#artificialintelligence

Deep learning is a subfield of machine learning, and it comprises several approaches to tackling the single most important goal of AI research: allowing computers to model our world well enough to exhibit something like what we humans call intelligence. On a conceptual level, deep learning approaches share a very basic trait. DL algorithms interpret the raw data through multiple processing layers. Each of these layers takes the output of the previous one as its input and creates a more abstract representation of it. As a result, the more data is fed into the right algorithm, the more general are the rules and features that it is able to infer in relation to a given scenario and, therefore, the more apt it becomes at handling new, similar situations.



Tesla hired a top AI expert to lead a critical aspect of Autopilot -- here's what we know

#artificialintelligence

Tesla has completely shaken up its Autopilot team, and its newest addition is Andrej Karpathy, the new director of artificial intelligence and Autopilot vision. Karpathy will replace David Nistér, Tesla's VP of Autopilot vision, who left the company quietly in March to join NVIDIA, according to his LinkedIn. Karpathy will play a critical role in advancing the company's Autopilot system to the point where a Tesla can drive itself from Los Angeles to New York before the year ends. Karpathy is considered a leading expert in computer vision. He received a PhD in machine learning and computer vision from Stanford University.



Uncle Sam Wants Your Deep Neural Networks

#artificialintelligence

Companies like Google and Facebook use the technology to do things like identify faces in online images, recognize commands spoken into smartphones and translate one language into another. But the possibilities extend well beyond smartphone apps and other online services. Earlier this year, Kaggle ran a $1 million contest to build algorithms capable of identifying signs of lung cancer in CT scans, helping to fuel a larger effort to apply neural networks to health care. Now, the hope is that neural networks can also help automated systems read body scans with greater accuracy, so checkpoint workers can spend less time pulling passengers aside and patting them down. "We started by trying to figure out what was a dog and what was a cat," said Goldbloom, referring to the growing community of companies, academics and other researchers working with neural networks.