Image Understanding


How to Get Beautiful Results with Neural Style Transfer

#artificialintelligence

Most of the Gatys et al. paper which introduced Neural Style Transfer is simple and straightforward to understand. However, one question that does not get addressed is why the Gram matrix is a natural way to represent style. At a high level, the Gram matrix measures the correlations between different feature maps in the same layer. A feature map is simply the post-activation output of a convolutional layer. For example, if a convolutional layer has 64 filters, it will output 64 feature maps.
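A minimal NumPy sketch of that computation (shapes here are illustrative, not from the paper): treating a layer's C feature maps as the rows of a (C, H·W) matrix, the Gram matrix is simply that matrix times its own transpose, so entry (i, j) is the correlation between feature maps i and j.

```python
import numpy as np

def gram_matrix(features):
    """features: array of shape (C, H*W) -- C feature maps, each
    flattened into a row. Returns a (C, C) matrix whose (i, j) entry
    is the inner product of feature maps i and j, i.e. their
    (unnormalized) correlation."""
    return features @ features.T

# Example: a layer with 64 filters over a 32x32 spatial grid.
fmap = np.random.rand(64, 32 * 32)
g = gram_matrix(fmap)
print(g.shape)  # (64, 64)
```

Because the spatial dimensions are summed out, the Gram matrix discards *where* features occur and keeps only *which features co-occur* — one intuition for why it captures style rather than content.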


Image classification in the wild

#artificialintelligence

As we have announced recently, Appsilon Data Science's AI for Good initiative is working together with biodiversity conservationists at the National Parks Agency in Gabon and in collaboration with experts from the University of Stirling. Part of our role in the project is to develop an image classification algorithm capable of classifying wildlife seen in images taken by camera traps located in the forests of Gabon. The project has received support from the Google for Education fund which allowed us to embark on this journey with the immense power of the latest computational resources at hand. Below are some interesting findings we made so far. Stay tuned for more news on the progress!


Google, MIT Partner on Visual Transfer Learning to Help Robots Learn to Grasp, Manipulate Objects

#artificialintelligence

A team from the Massachusetts Institute of Technology (MIT) and Google's artificial intelligence (AI) arm has found a way to use visual transfer learning to help robots grasp and manipulate objects more accurately. "We investigate whether existing pre-trained deep learning visual feature representations can improve the efficiency of learning robotic manipulation tasks, like grasping objects," write Google researchers Yen-Chen Lin and Andy Zeng. "By studying how we can intelligently transfer neural network weights between vision models and affordance-based manipulation models, we can evaluate how different visual feature representations benefit the exploration process and enable robots to quickly acquire manipulation skills using different grippers." The team initialized its affordance-based manipulation models with backbones based on the ResNet-50 architecture, pre-trained on different vision tasks, including a classification model from ImageNet and a segmentation model from COCO. With these different initializations, the robot was then tasked with learning to grasp a diverse set of objects through trial and error.
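The transfer step described above can be sketched in a few lines. This is a hedged toy illustration, not the authors' code: weights are plain NumPy arrays, and the layer names and shapes are made up. The point is only the pattern — copy the pre-trained visual backbone into the manipulation model, and initialize only the task-specific head from scratch.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_init(shapes):
    """Fresh random weights for the given {name: shape} layers."""
    return {name: rng.standard_normal(shape) for name, shape in shapes.items()}

# Toy layer shapes for a visual "backbone" and a task-specific "head".
backbone_shapes = {"conv1": (8, 3, 3, 3), "conv2": (16, 8, 3, 3)}
head_shapes = {"affordance_head": (1, 16, 1, 1)}

# Stand-in for weights pre-trained on a vision task
# (e.g. ImageNet classification or COCO segmentation).
pretrained_vision_weights = random_init(backbone_shapes)

# Build the manipulation model: backbone transferred, head randomized.
manipulation_model = dict(pretrained_vision_weights)
manipulation_model.update(random_init(head_shapes))

# The backbone weights are copied, not re-learned from scratch.
assert np.array_equal(manipulation_model["conv1"],
                      pretrained_vision_weights["conv1"])
```

Starting exploration from visual features that already encode object boundaries and categories is what lets the robot acquire grasping skills in far fewer trials than training from random initialization.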


Basic Theory Neural Style Transfer #2

#artificialintelligence

Timeline:
00:00 - intro & NST series overview
02:25 - what I want this series to be
03:30 - defining the task of NST
04:01 - 2 types of style transfer
04:43 - a glimpse of the image style transfer history
06:55 - explanation of the content representation
10:10 - explanation of the style representation
14:12 - putting it all together (animation)

The AI Epiphany is a channel dedicated to simplifying the field of AI using creative visualizations and, in general, a stronger focus on geometric and visual intuition rather than algebraic and numerical intuition.


Image Classification Model with Google AutoML [A How To Guide]

#artificialintelligence

In this tutorial, I'll show you how to create a single label classification model in Google AutoML. We'll be using a dataset of AI-generated faces from generated.photos. We'll be training our algorithm to determine whether a face is male or female. After that, we'll deploy our model to the cloud AND create the web browser version of the algorithm. First let's take a look at the data we'll be classifying (you can download it here).


(Nearly) Everything you need to know about computer vision in one repo

#artificialintelligence

This post was co-authored by JS Tan, Patrick Buehler, Anupam Sharma and Jun Ki Min. In recent years, we've seen extraordinary growth in Computer Vision, with applications in image understanding, search, mapping, semi-autonomous or autonomous vehicles, and many more. The ability of models to understand actions in a video, a task that was unthinkable just a few years ago, is now something that we can achieve with relatively high accuracy and in near real-time. However, the field is not particularly welcoming for newcomers. Without prior experience or guidance, building an accurate classifier can easily take weeks.


Image Classification: TensorFlow + MariaDB (Part 1)

#artificialintelligence

Cutting-edge companies are turning to artificial intelligence and machine learning to meet the challenges of the new digital business transformation era. According to Gartner: "Eighty-seven percent of senior business leaders say digitalization is a company priority and 79% of corporate strategists say it is reinventing their business, creating new revenue streams in new ways". But so far, digital change has been challenging. The complexity of the tools, architecture, and environment creates barriers to using machine learning. Using SQL-based relational data management to store and perform data exploration of images reduces the barriers and unlocks the benefits of machine learning.
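The idea of exploring an image dataset through plain SQL can be sketched as follows. The article uses MariaDB; here Python's built-in sqlite3 stands in so the example is self-contained, and the schema and rows are illustrative, not the article's.

```python
import sqlite3

# Relational storage for images: raw bytes plus queryable metadata.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE images (
        id     INTEGER PRIMARY KEY,
        label  TEXT,
        width  INTEGER,
        height INTEGER,
        pixels BLOB          -- raw image bytes
    )
""")
rows = [
    (1, "cat", 28, 28, b"\x00" * 784),
    (2, "dog", 28, 28, b"\xff" * 784),
    (3, "cat", 28, 28, b"\x7f" * 784),
]
conn.executemany("INSERT INTO images VALUES (?, ?, ?, ?, ?)", rows)

# Data exploration in one SQL query: class balance at a glance.
counts = dict(conn.execute(
    "SELECT label, COUNT(*) FROM images GROUP BY label"))
print(counts)  # {'cat': 2, 'dog': 1}
```

Queries like this (class balance, image dimensions, corrupt or missing records) replace ad hoc filesystem scripts, which is the barrier-lowering effect the article describes.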


r/MachineLearning - [R] Neuroevolution of Self-Interpretable Agents

#artificialintelligence

Inattentional blindness is the psychological phenomenon that causes one to miss things in plain sight. It is a consequence of selective attention in perception, which lets us remain focused on important parts of our world without distraction from irrelevant details. Motivated by selective attention, we study the properties of artificial agents that perceive the world through the lens of a self-attention bottleneck. By constraining access to only a small fraction of the visual input, we show that their policies are directly interpretable in pixel space. We find neuroevolution ideal for training self-attention architectures for vision-based reinforcement learning (RL) tasks, allowing us to incorporate modules that can include discrete, non-differentiable operations which are useful for our agent.
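The self-attention bottleneck can be sketched as follows: score every image patch with single-head self-attention, then let the policy see only the K most-attended patches. This is a hedged NumPy simplification (toy shapes, no learned policy), not the paper's architecture; note that the top-K selection is exactly the kind of discrete, non-differentiable operation that neuroevolution handles naturally.

```python
import numpy as np

rng = np.random.default_rng(0)

def top_k_patches(patches, W_q, W_k, k):
    """patches: (N, D) flattened image patches. Returns indices of the
    k patches receiving the most total attention."""
    q = patches @ W_q                                # queries (N, d)
    keys = patches @ W_k                             # keys    (N, d)
    scores = q @ keys.T / np.sqrt(q.shape[1])        # scaled dot-product
    # softmax over keys for each query
    a = np.exp(scores - scores.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)
    importance = a.sum(axis=0)                       # attention each patch receives
    return np.argsort(importance)[-k:]               # discrete top-k selection

patches = rng.standard_normal((16, 12))              # 16 patches, 12-dim each
W_q = rng.standard_normal((12, 4))
W_k = rng.standard_normal((12, 4))
idx = top_k_patches(patches, W_q, W_k, k=4)
print(idx.shape)  # (4,)
```

Because the agent's downstream policy only ever sees the selected patch indices, plotting those patches over the input frame directly shows what the agent is attending to, which is what makes the policy interpretable in pixel space.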


STREETS: A Novel Camera Network Dataset for Traffic Flow

Neural Information Processing Systems

In this paper, we introduce STREETS, a novel traffic flow dataset from publicly available web cameras in the suburbs of Chicago, IL. We seek to address the limitations of existing datasets in this area. Many such datasets lack a coherent traffic network graph to describe the relationship between sensors. The datasets that do provide a graph depict traffic flow in urban population centers or highway systems and use costly sensors like induction loops. These contexts differ from that of a suburban traffic body.


Unsupervised Meta-Learning for Few-Shot Image Classification

Neural Information Processing Systems

Few-shot or one-shot learning of classifiers requires a significant inductive bias towards the type of task to be learned. One way to acquire this is by meta-learning on tasks similar to the target task. In this paper, we propose UMTRA, an algorithm that performs unsupervised, model-agnostic meta-learning for classification tasks. The meta-learning step of UMTRA is performed on a flat collection of unlabeled images. While we assume that these images can be grouped into a diverse set of classes and are relevant to the target task, no explicit information about the classes or any labels is needed.
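One common way to build N-way tasks from such a flat unlabeled collection can be sketched as below: sample N distinct images, treat each as its own pseudo-class, and use an augmented copy of each as the query example. This is a hedged illustration of the unsupervised task-construction idea; the additive-noise augmentation and all shapes here are placeholders, not UMTRA's actual choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(unlabeled, n_way):
    """Build one synthetic N-way, 1-shot task from unlabeled data."""
    idx = rng.choice(len(unlabeled), size=n_way, replace=False)
    support = unlabeled[idx]                # one image per pseudo-class
    labels = np.arange(n_way)               # pseudo-labels 0..N-1
    # augmented copies serve as the query set for the same pseudo-classes
    query = support + 0.1 * rng.standard_normal(support.shape)
    return support, labels, query

images = rng.standard_normal((100, 8))      # 100 unlabeled "images" (toy)
support, labels, query = make_task(images, n_way=5)
print(support.shape, query.shape)  # (5, 8) (5, 8)
```

If the collection is diverse, randomly sampled images are likely to belong to different true classes, so these synthetic tasks approximate the labeled tasks that supervised meta-learning would use — without requiring any labels.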