Pattern Recognition


BeMyEye acquires Streetbee, a Russian crowdsourcing and image recognition provider

#artificialintelligence

London-headquartered BeMyEye has made another acquisition, its third in a little over three years. This time the retail execution monitoring service is purchasing Russian crowdsourcing and image recognition provider Streetbee. The acquisition will see BeMyEye launch "Perfect Shelf," which will use image recognition technology to lower the cost for consumer goods companies wanting to get "objective and actionable" in-store insights. These will typically include share of shelf and planogram compliance (the specific placement of products on a store shelf). More broadly, BeMyEye offers a platform to enable companies and brands to crowdsource various in-store data.


A Reminder That Machine Learning Is About Correlations Not Causation

#artificialintelligence

Lost amongst the hype and hyperbole surrounding machine learning today, especially deep learning, is the critical distinction between correlation and causation. Developers and data scientists increasingly treat their creations as silicon lifeforms "learning" concrete facts about the world, rather than what they truly are: piles of numbers detached from what they represent, mere statistical patterns encoded into software. We must recognize that those patterns are merely correlations amongst vast reams of data, rather than causative truths or natural laws governing our world. As machine learning has expanded beyond its roots in the worlds of computer science and statistics into nearly every conceivable field, the data scientists and programmers building those models are increasingly detached from an understanding of how and why the models they are creating work. To them, machine learning is akin to a black box in which you blindly feed different mixes of training data in one side, twirl some knobs and dials and repeat until you get results that seem to work well enough to throw into production.


A Data-Driven Approach for Discovery of Heat Load Patterns in District Heating

arXiv.org Machine Learning

Understanding the heat use of customers is crucial for effective district heating (DH) operations and management. Unfortunately, existing knowledge about customers and their heat load behaviors is scarce, and very few studies have focused on this aspect. The deployment of smart meters offers a unique opportunity for researchers and DH utilities to analyze large-scale data and discover both typical and atypical patterns in the network. Heat load pattern discovery is a challenging task in DH systems, since a comprehensive analysis needs to involve many customers. Most past studies have relied on analysis of a small number of buildings, which have not been shown to be representative examples. Therefore, the knowledge discovered in such studies is not enough to generalize to the entire network. In this work, we propose a data-driven approach that enables automatic discovery of heat load patterns in a complete district heating network. Our method clusters the buildings into different groups based on the characteristics of their load profiles, extracts the representative patterns for each of them, and detects abnormal profiles, i.e., the ones deviating from the expected behavior. We present the first comprehensive analysis of the heat load patterns by conducting a case study on all the buildings, in six customer categories, connected to two district heating networks in the south of Sweden. Our method has captured fifteen typical patterns among the heat load profiles of all buildings in our dataset. It shows that control strategies are not enough to explain the variability in the heat load behaviors. In conclusion, we demonstrate that the proposed approach has great potential to develop knowledge about customers and their heat use habits in practice by automatically analyzing their typical and atypical profiles at large scale.
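
The cluster-then-flag idea in this abstract can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, so plain k-means is swapped in here as an assumption; the four-value toy profiles, the farthest-point initialization, and the `THRESHOLD` value are all hypothetical choices made for illustration, not the authors' method or data.

```python
import math

def normalize(profile):
    """Scale a load profile so its values sum to 1 (compare shape, not magnitude)."""
    total = sum(profile)
    return [value / total for value in profile]

def distance(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(profile, centroids):
    """Index of the centroid closest to the given profile."""
    return min(range(len(centroids)), key=lambda j: distance(profile, centroids[j]))

def kmeans(profiles, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialization (an assumption)."""
    centroids = [profiles[0]]
    while len(centroids) < k:
        centroids.append(
            max(profiles, key=lambda p: min(distance(p, c) for c in centroids))
        )
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in profiles]
        for j in range(k):
            members = [p for p, label in zip(profiles, labels) if label == j]
            if members:
                # The centroid is the representative pattern of its group.
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    labels = [nearest(p, centroids) for p in profiles]
    return labels, centroids

# Hypothetical toy data: heat load in four periods (morning, midday, evening, night).
raw_profiles = [
    [8, 3, 3, 2], [9, 3, 2, 2], [8, 4, 3, 1],   # morning-peak buildings
    [2, 3, 8, 3], [2, 2, 9, 3], [1, 3, 8, 4],   # evening-peak buildings
    [4, 4, 4, 4],                               # flat profile, expected to look atypical
]
profiles = [normalize(p) for p in raw_profiles]
labels, centroids = kmeans(profiles, k=2)

# Flag profiles that sit far from their own group's representative pattern.
THRESHOLD = 0.2  # hypothetical cut-off on normalized distance
abnormal = [
    i for i, p in enumerate(profiles)
    if distance(p, centroids[labels[i]]) > THRESHOLD
]
print(labels, abnormal)
```

On this toy input the morning-peak and evening-peak buildings land in separate groups, and only the flat profile exceeds the distance threshold. A real analysis would need a principled choice of the number of clusters and of the abnormality criterion.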


An introduction to domain adaptation and transfer learning

arXiv.org Machine Learning

In machine learning, if the training data is an unbiased sample of an underlying distribution, then the learned classification function will make accurate predictions for new samples. However, if the training data is not an unbiased sample, then there will be differences between how the training data is distributed and how the test data is distributed. Standard classifiers cannot cope with changes in data distributions between training and test phases, and will not perform well. Domain adaptation and transfer learning are sub-fields within machine learning that are concerned with accounting for these types of changes. Here, we present an introduction to these fields, guided by the question: when and how can a classifier generalize from a source to a target domain? We will start with a brief introduction to risk minimization, and how transfer learning and domain adaptation expand upon this framework. Following that, we discuss three special cases of data set shift, namely prior, covariate and concept shift. For more complex domain shifts, there is a wide variety of approaches. These are categorized into: importance-weighting, subspace mapping, domain-invariant spaces, feature augmentation, minimax estimators and robust algorithms. A number of points will arise, which we will discuss in the last section. We conclude with the remark that many open questions will have to be addressed before transfer learners and domain-adaptive classifiers become practical.
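
Importance weighting, the first approach in the list above, can be illustrated under covariate shift with a small simulation. Everything concrete here is an assumption made for the sketch: the source and target inputs are unit-variance Gaussians with means 0 and 1, both densities are known exactly (in practice the ratio has to be estimated), and `classifier` and `true_label` are hypothetical threshold rules.

```python
import math
import random

SOURCE_MEAN, TARGET_MEAN = 0.0, 1.0  # assumed input distributions N(0, 1) and N(1, 1)

def normal_pdf(x, mean, std=1.0):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def importance_weight(x):
    """Ratio of target to source input density, w(x) = p_T(x) / p_S(x)."""
    return normal_pdf(x, TARGET_MEAN) / normal_pdf(x, SOURCE_MEAN)

def true_label(x):
    # The labeling rule is identical in both domains (covariate shift only).
    return 1 if x > 0.5 else 0

def classifier(x):
    # A fixed, slightly wrong classifier: it errs exactly when 0 < x <= 0.5.
    return 1 if x > 0.0 else 0

def loss(x):
    return 0.0 if classifier(x) == true_label(x) else 1.0

rng = random.Random(0)
source_samples = [rng.gauss(SOURCE_MEAN, 1.0) for _ in range(20000)]
n = len(source_samples)

# Plain empirical risk on source data estimates the source risk, not the target risk.
unweighted_risk = sum(loss(x) for x in source_samples) / n

# Reweighting each source loss by w(x) gives an estimate of the target risk.
weighted_risk = sum(importance_weight(x) * loss(x) for x in source_samples) / n

# Exact target risk: probability that a target input falls in the error region (0, 0.5].
true_target_risk = normal_cdf(0.5 - TARGET_MEAN) - normal_cdf(0.0 - TARGET_MEAN)

print(unweighted_risk, weighted_risk, true_target_risk)
```

The weighted average uses only source samples yet tracks the target risk, which is the point of the technique. When the two distributions overlap poorly the weights become very large for a few samples and the estimate gets noisy.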


HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization

arXiv.org Artificial Intelligence

This paper presents a new large-scale dataset for recognition and temporal localization of human actions collected from Web videos. We refer to it as HACS (Human Action Clips and Segments). We leverage consensus among visual classifiers to automatically mine candidate short clips from unlabeled videos, which are subsequently validated via manual verification. Annotated clips include both positive examples and hard negatives. The resulting dataset is dubbed HACS Clips. Through a separate process we also collect annotations defining action segment boundaries. This resulting dataset is called HACS Segments. Overall, HACS Clips consists of 1.55M annotated clips sampled from 504K untrimmed videos, and HACS Segments contains 139K action segments densely annotated in 50K untrimmed videos spanning 200 action categories. HACS Clips contains more labeled examples than any existing video benchmark. This renders our dataset an excellent source for spatiotemporal feature learning, as evidenced by our transfer learning experiments on three different target datasets where HACS Clips outperforms Kinetics and Sports1M as a pretraining benchmark, and yields the best published results to date. On HACS Segments, we evaluate state-of-the-art methods of action proposal generation and action localization, and highlight the new challenges posed by our dense and fine-grained temporal annotations.
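
Temporal localization results are commonly scored by the temporal intersection-over-union (tIoU) between a predicted segment and an annotated one. The abstract does not state which metric or threshold HACS Segments uses, so the sketch below only illustrates the general idea; the function names and the 0.5 threshold are assumptions.

```python
def temporal_iou(predicted, annotated):
    """Intersection-over-union of two (start, end) segments given in seconds."""
    pred_start, pred_end = predicted
    true_start, true_end = annotated
    intersection = max(0.0, min(pred_end, true_end) - max(pred_start, true_start))
    union = (pred_end - pred_start) + (true_end - true_start) - intersection
    return intersection / union if union > 0 else 0.0

def is_hit(predicted, annotated, threshold=0.5):
    """Treat a prediction as correct when its tIoU reaches the (assumed) threshold."""
    return temporal_iou(predicted, annotated) >= threshold

# Hypothetical example: prediction covers 2-6 s, annotation covers 4-8 s.
# They overlap for 2 s and jointly span 6 s.
overlap_score = temporal_iou((2.0, 6.0), (4.0, 8.0))
print(overlap_score, is_hit((2.0, 6.0), (4.0, 8.0)))
```

Dense, fine-grained segment boundaries make high tIoU harder to reach, because short segments leave little room for boundary error.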


ORIGAMI: A Heterogeneous Split Architecture for In-Memory Acceleration of Learning

arXiv.org Machine Learning

Memory bandwidth is a major bottleneck in processing machine learning (ML) algorithms. In-memory acceleration has the potential to address this problem; however, it must overcome two challenges. First, an in-memory accelerator should be general enough to support a large set of different ML algorithms. Second, it should be efficient enough to utilize bandwidth while meeting the limited power and area budgets of the logic layer of a 3D-stacked memory. We observe that previous work fails to address both challenges simultaneously. We propose ORIGAMI, a heterogeneous set of in-memory accelerators that supports the compute demands of different ML algorithms and also uses an off-the-shelf compute platform (e.g., FPGA, GPU, or TPU) to utilize bandwidth without violating strict area and power budgets. ORIGAMI offers a pattern-matching technique to identify similar computation patterns of ML algorithms and extracts a compute engine for each pattern. These compute engines constitute heterogeneous accelerators integrated on the logic layer of a 3D-stacked memory. A combination of these compute engines can execute any type of ML algorithm. To utilize available bandwidth without violating the area and power budgets of the logic layer, ORIGAMI comes with a computation-splitting compiler that divides an ML algorithm between in-memory accelerators and an out-of-the-memory platform in a balanced way and with minimum inter-communication. The combination of pattern matching and split execution offers a new design point for acceleration of ML algorithms. Evaluation results across 12 popular ML algorithms show that ORIGAMI outperforms a state-of-the-art accelerator with 3D-stacked memory in terms of performance and energy-delay product (EDP) by 1.5x and 29x (up to 1.6x and 31x), respectively. Furthermore, results are within a 1% margin of an ideal system that has unlimited compute resources on the logic layer of a 3D-stacked memory.
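
The energy-delay product is energy multiplied by execution time, so an EDP improvement combines an energy saving with a speedup. The sketch below shows that arithmetic. The baseline energy and delay are hypothetical numbers, and treating the reported averages (1.5x and 29x) as if they applied to a single run is an assumption; averages taken across 12 algorithms need not combine this way, so the implied energy figure is a rough reading rather than a result from the paper.

```python
def energy_delay_product(energy_joules, delay_seconds):
    """EDP = energy x delay; lower is better."""
    return energy_joules * delay_seconds

def implied_energy_ratio(edp_improvement, speedup):
    """Energy reduction factor implied by an EDP improvement and a speedup."""
    return edp_improvement / speedup

# Hypothetical baseline run: 10 J over 3 s.
baseline_energy, baseline_delay = 10.0, 3.0
baseline_edp = energy_delay_product(baseline_energy, baseline_delay)

# Apply the reported average factors as if they held for this one run (assumption).
speedup, edp_improvement = 1.5, 29.0
energy_ratio = implied_energy_ratio(edp_improvement, speedup)

improved_delay = baseline_delay / speedup
improved_energy = baseline_energy / energy_ratio
improved_edp = energy_delay_product(improved_energy, improved_delay)

print(round(energy_ratio, 2), round(baseline_edp / improved_edp, 2))
```

Under these assumptions, most of the EDP gain would come from lower energy rather than from the speedup.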


UAV-GESTURE: A Dataset for UAV Control and Gesture Recognition

arXiv.org Machine Learning

Current UAV-recorded datasets are mostly limited to action recognition and object tracking, whereas gesture signal datasets were mostly recorded in indoor spaces. Currently, there is no publicly available video dataset of UAV commanding signals recorded outdoors. Gesture signals can be effectively used with UAVs by leveraging the UAVs' visual sensors and operational simplicity. To fill this gap and enable research in wider application areas, we present a UAV gesture signals dataset recorded in an outdoor setting. We selected 13 gestures suitable for basic UAV navigation and command from general aircraft handling and helicopter handling signals. We provide 119 high-definition video clips consisting of 37151 frames. The overall baseline gesture recognition performance computed using a Pose-based Convolutional Neural Network (P-CNN) is 91.9%. All the frames are annotated with body joints and gesture classes in order to extend the dataset's applicability to a wider research area including gesture recognition, action recognition, human pose recognition and situation awareness.


ML Kit Android: Implementing Text Recognition -- Firebase

#artificialintelligence

With Firebase set up, we can start building our Text Recognition app. We need the Firebase ML Vision dependency, which we add to our app-level build.gradle. After capturing the image from the camera, we set the image into the ImageView. Our app is now ready to use. Run the app and click the camera icon to launch the camera on your Android device. Take a picture of some text, then click the tick icon and watch Firebase do the magic for you.


How Alan Turing Deciphered Shark Skin - Issue 68: Context

Nautilus

In 1952, well before developmental biologists spoke in terms of Hox genes and transcription factors, or even understood DNA's structure, Alan Turing had an idea. The famed mathematician who hastened the end of World War II by cracking the Enigma code turned his mind to the natural world and devised an elegant mathematical model of pattern formation. His theory outlined how endless varieties of stripes, spots, and scales could emerge from the interaction of two simple, hypothetical chemical agents, or "morphogens." Decades passed before biologists seriously considered that this mathematical theory could in fact explain myriad biological patterns. The development of mammalian hair, the feathers of birds, and even those ridges on the roof of your mouth all stem from Turing-like mechanisms.


LinkNet: Relational Embedding for Scene Graph

Neural Information Processing Systems

Objects and their relationships are critical content for image understanding. A scene graph provides a structured description that captures these properties of an image. However, reasoning about the relationships between objects is very challenging, and only a few recent works have attempted to solve the problem of generating a scene graph from an image. In this paper, we present a novel method that improves scene graph generation by explicitly modeling inter-dependency among all object instances. We design a simple and effective relational embedding module that enables our model to jointly represent connections among all related objects, rather than focus on an object in isolation. Our novel method significantly benefits two main parts of the scene graph generation task: object classification and relationship classification. Using it on top of a basic Faster R-CNN, our model achieves state-of-the-art results on the Visual Genome benchmark. We further push the performance by introducing a global context encoding module and a geometrical layout encoding module. We validate our final model, LinkNet, through extensive ablation studies, demonstrating its efficacy in scene graph generation.