Deep Learning
The Dark Secret at the Heart of AI
Last year, a strange self-driving car was released onto the quiet roads of Monmouth County, New Jersey. The experimental vehicle, developed by researchers at the chip maker Nvidia, didn't look different from other autonomous cars, but it was unlike anything demonstrated by Google, Tesla, or General Motors, and it showed the rising power of artificial intelligence. The car didn't follow a single instruction provided by an engineer or programmer. Instead, it relied entirely on an algorithm that had taught itself to drive by watching a human do it. Getting a car to drive this way was an impressive feat. But it's also a bit unsettling, since it isn't completely clear how the car makes its decisions. Information from the vehicle's sensors goes straight into a huge network of artificial neurons that process the data and then deliver the commands required to operate the steering wheel, the brakes, and other systems.
FastMask: Segment Multi-scale Object Candidates in One Shot
Hu, Hexiang, Lan, Shiyi, Jiang, Yuning, Cao, Zhimin, Sha, Fei
Objects appear to scale differently in natural images. This fact requires methods dealing with object-centric tasks (e.g. object proposal) to have robust performance over variances in object scales. In the paper, we present a novel segment proposal framework, namely FastMask, which takes advantage of hierarchical features in deep convolutional neural networks to segment multi-scale objects in one shot. Innovatively, we adapt the segment proposal network into three different functional components (body, neck and head). We further propose a weight-shared residual neck module as well as a scale-tolerant attentional head module for efficient one-shot inference. On the MS COCO benchmark, the proposed FastMask outperforms all state-of-the-art segment proposal methods in average recall while being 2 to 5 times faster. Moreover, with a slight trade-off in accuracy, FastMask can segment objects in near real time (~13 fps) on 800×600 resolution images, demonstrating its potential in practical applications. Our implementation is available at https://github.com/voidrank/FastMask.
Harmonic Networks: Deep Translation and Rotation Equivariance
Worrall, Daniel E., Garbin, Stephan J., Turmukhambetov, Daniyar, Brostow, Gabriel J.
Translating or rotating an input image should not affect the results of many computer vision tasks. Convolutional neural networks (CNNs) are already translation equivariant: input image translations produce proportionate feature map translations. This is not the case for rotations. Global rotation equivariance is typically sought through data augmentation, but patch-wise equivariance is more difficult. We present Harmonic Networks or H-Nets, a CNN exhibiting equivariance to patch-wise translation and 360-rotation. We achieve this by replacing regular CNN filters with circular harmonics, returning a maximal response and orientation for every receptive field patch. H-Nets use a rich, parameter-efficient and low computational complexity representation, and we show that deep feature maps within the network encode complicated rotational invariants. We demonstrate that our layers are general enough to be used in conjunction with the latest architectures and techniques, such as deep supervision and batch normalization. We also achieve state-of-the-art classification on rotated-MNIST, and competitive results on other benchmark challenges.
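The translation-equivariance property described above can be checked on a toy case. The following is a minimal sketch, assuming a 1-D signal with circular (wrap-around) boundaries; it is a simplification for illustration and not the H-Net layers from the paper. The helper names `circular_correlate` and `shift` are hypothetical.

```python
def circular_correlate(signal, kernel):
    """Cross-correlate a 1-D signal with a kernel, wrapping around at the ends."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


def shift(signal, k):
    """Circularly shift a 1-D signal to the right by k positions."""
    k %= len(signal)
    return list(signal[-k:]) + list(signal[:-k]) if k else list(signal)


signal = [1, 2, 3, 4, 5, 6]
kernel = [1, 0, -1]  # a simple difference filter, chosen only for illustration

# Equivariance: filtering a shifted input gives the shifted filter response.
shifted_then_filtered = circular_correlate(shift(signal, 2), kernel)
filtered_then_shifted = shift(circular_correlate(signal, kernel), 2)
print(shifted_then_filtered == filtered_then_shifted)
```

The same check with a rotation in place of the shift would fail for an ordinary 2-D filter, which is the gap the abstract says H-Nets are designed to close.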
TristouNet: Triplet Loss for Speaker Turn Embedding
Abstract: TristouNet is a neural network architecture based on Long Short-Term Memory recurrent networks, meant to project speech sequences into a fixed-dimensional Euclidean space. Thanks to the triplet loss paradigm used for training, the resulting sequence embeddings can be compared directly with the Euclidean distance, for speaker comparison purposes. Experiments on short (between 500 ms and 5 s) speech turn comparison and speaker change detection show that TristouNet brings significant improvements over the current state-of-the-art techniques for both tasks. Index terms: triplet loss, long short-term memory network, sequence embedding, speaker recognition. 1. Introduction: Given a speech sequence x and a claimed identity a, speaker verification aims at accepting or rejecting the identity claim. Speaker identification is the task of determining which speaker (from a predefined set of speakers a ∈ S) has uttered the sequence x.
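The triplet loss idea mentioned above can be illustrated on plain vectors. This is a minimal sketch of the common margin-based formulation, assuming embeddings have already been computed; the exact loss used in the TristouNet paper may differ, and the function names and the margin value of 0.2 are assumptions made for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Margin-based triplet loss (assumed formulation, not taken from the paper).

    The loss is zero once the anchor is closer to the positive (same speaker)
    than to the negative (different speaker) by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy 2-D embeddings, invented for the example.
anchor = [0.0, 0.0]
same_speaker = [0.1, 0.0]
other_speaker = [3.0, 4.0]

print(triplet_loss(anchor, same_speaker, other_speaker))  # well separated
print(triplet_loss(anchor, other_speaker, same_speaker))  # roles swapped
```

Once embeddings are trained this way, the speaker comparison described in the abstract reduces to thresholding `euclidean(x, y)` between two sequence embeddings.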
Multimodal Transfer: A Hierarchical Deep Convolutional Neural Network for Fast Artistic Style Transfer
Wang, Xin, Oxholm, Geoffrey, Zhang, Da, Wang, Yuan-Fang
Transferring artistic styles onto everyday photographs has become an extremely popular task in both academia and industry. Recently, offline training has replaced on-line iterative optimization, enabling nearly real-time stylization. When those stylization networks are applied directly to high-resolution images, however, the style of localized regions often appears less similar to the desired artistic style. This is because the transfer process fails to capture small, intricate textures and maintain correct texture scales of the artworks. Here we propose a multimodal convolutional neural network that takes into consideration faithful representations of both color and luminance channels, and performs stylization hierarchically with multiple losses of increasing scales. Compared to state-of-the-art networks, our network can also perform style transfer in nearly real-time by conducting much more sophisticated training offline. By properly handling style and texture cues at multiple scales using several modalities, we can transfer not just large-scale, obvious style cues but also subtle, exquisite ones. That is, our scheme can generate results that are visually pleasing and more similar to multiple desired artistic styles with color and texture cues at multiple scales.
Compare NVIDIA Pascal GPUs and Google TPU
The recent TPU paper by Google draws a clear conclusion: without accelerated computing, the scale-out of AI is simply not practical. Today's economy runs in the world's data centers, and data centers are changing dramatically. Not so long ago, they served up web pages, advertising and video content. Now, they recognize voices, detect images in video streams and connect us with information we need exactly when we need it. Increasingly, those capabilities are enabled by a form of artificial intelligence called deep learning.
The Race For AI: Google, Twitter, Intel, Apple In A Rush To Grab Artificial Intelligence Startups
Corporate giants like Google, IBM, Yahoo, Intel, Apple and Salesforce are competing in the race to acquire private AI companies, with Ford, Samsung, GE, and Uber emerging as new entrants. Over 200 private companies using AI algorithms across different verticals have been acquired since 2012, with over 30 acquisitions taking place in Q1'17 alone (as of 3/24/17). This quarter also saw one of the largest M&A deals: Ford's acquisition of Argo AI for $1B. In 2013, Google picked up deep learning and neural network startup DNNresearch from the computer science department at the University of Toronto. This acquisition reportedly helped Google make major upgrades to its image search feature.
What is A.I. and how will it affect our $200 billion Digital Ad Market?
Various companies already use machine learning for ad targeting through Real-Time Bidding (RTB). With deep learning, marketers can make stronger predictions of the next event in the consumer's purchase cycle. With AI layered on, deep learning gives marketers a platform to understand their customers' needs, serve them better, and sell products that consumers have 'yet to decide on'. This drive toward knowing the 'intent of the customer before they do' (yes, a bit freaky, but possible with the sheer amount of data) is the ultimate prize on the way to a hyper-personalized consumer experience. Imagine a fully automated media platform that uses machine learning to process the incoming data stream and create personalized digital ads for each unique customer, selecting not only the brands they will have the most affinity for, but also pre-selecting the color and even the size of clothing based on the customer's current 'trending choice'.
Flipboard on Flipboard
In 2011 Google realized they had a problem. They were getting serious about deep learning networks with computational demands that strained their resources. Google calculated they would have to have twice as many data centers as they already had if people used their deep learning speech recognition models for voice search for just three minutes a day. They needed more powerful and efficient processing chips. What kind of chip did they need?
12 Opensource Tools for Artificial Intelligence (AI) 7wData
Artificial Intelligence (AI) is trending because people are looking for some sort of technology that makes their lives easier and more valuable. Even smartphones are shifting their focus to Artificial Intelligence. Big companies like Google, Amazon, and Facebook are already working on it and contributing open-source AI tools. For example, Facebook came up with an open-source AI project called Torchnet to accelerate AI research, and in the same way, Google's open-source AI project is DeepMind Lab. A recent study at Stanford University (report) stated that AI will have a huge impact in the coming years. So, in this article we are going to show a variety of useful open-source artificial intelligence software that helps in building your AI projects.