 Pattern Recognition


Human-Guided Data Exploration

arXiv.org Machine Learning

The outcome of the explorative data analysis (EDA) phase is vital for successful data analysis. EDA is more effective when the user interacts with the system used to carry out the exploration. In the recently proposed paradigm of iterative data mining the user controls the exploration by inputting knowledge in the form of patterns observed during the process. The system then shows the user views of the data that are maximally informative given the user's current knowledge. Although this scheme is good at showing surprising views of the data to the user, there is a clear shortcoming: the user cannot steer the process. In many real cases we want to focus on investigating specific questions concerning the data. This paper presents the Human-Guided Data Exploration framework, generalising previous research. This framework allows the user to incorporate existing knowledge into the exploration process, focus on exploring a subset of the data, and compare different complex hypotheses concerning relations in the data. The framework utilises a computationally efficient constrained randomisation scheme. To showcase the framework, we developed a free open-source tool, which we used to carry out the empirical evaluation on real-world datasets. Our evaluation shows that the ability to focus on particular subsets and the ability to compare hypotheses are important additions to the interactive iterative data mining process.


Encoding Temporal Markov Dynamics in Graph for Visualizing and Mining Time Series

AAAI Conferences

Time series and signals are attracting more attention across statistics, machine learning and pattern recognition as they appear widely in industry, especially in sensor and IoT related research and applications, but few advances have been achieved in effective time series visual analytics and interaction due to their temporal dimensionality and complex dynamics. Inspired by recent efforts on using network metrics to characterize time series for classification, we present an approach to visualize time series as complex networks based on the first-order Markov process in their temporal ordering. In contrast to classical bar charts, line plots and other statistics-based graphs, our approach delivers a more intuitive visualization that better preserves both the temporal dependency and frequency structures. It provides a natural inverse operation to map the graph back to raw signals, making it possible to use graph statistics to characterize time series for better visual exploration and statistical analysis. Our experimental results suggest its effectiveness on various tasks such as pattern discovery and classification on both synthetic and real time series and sensor data.
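
As a rough illustration of the idea of encoding a series as a graph through a first-order Markov process, the sketch below bins the values into quantiles and counts transitions between consecutive bins. The function name `markov_transition_graph`, the quantile binning, and the `n_bins` parameter are assumptions made for this example; the abstract does not specify how the authors discretize the signal or build the network.

```python
import numpy as np

def markov_transition_graph(series, n_bins=4):
    """Return (labels, weights) for a first-order Markov graph of a series.

    labels[i] is the quantile bin of series[i]; weights[a, b] is the
    estimated probability of moving from bin a to bin b in one time step.
    Binning by quantiles is an assumption of this sketch.
    """
    x = np.asarray(series, dtype=float)
    # Interior quantile edges, so each bin holds roughly the same number of points.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    labels = np.searchsorted(edges, x, side="right")  # integers in 0..n_bins-1
    counts = np.zeros((n_bins, n_bins))
    # Count transitions between consecutive observations.
    for a, b in zip(labels[:-1], labels[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Normalise rows to probabilities; rows with no outgoing transitions stay zero.
    weights = np.divide(counts, row_sums,
                        out=np.zeros_like(counts), where=row_sums > 0)
    return labels, weights

labels, weights = markov_transition_graph([0, 1, 2, 3] * 5, n_bins=4)
```

The matrix `weights` can be read as the weighted adjacency matrix of a directed graph whose nodes are the bins, and `labels` records which node each time step belongs to.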


NegPSpan: efficient extraction of negative sequential patterns with embedding constraints

arXiv.org Machine Learning

Mining frequent sequential patterns consists of extracting recurrent behaviors, modeled as patterns, from a large sequence dataset. Such patterns indicate which events are frequently observed in sequences, i.e. what really happens. Sometimes, knowing that some specific event does not happen is more informative than extracting a lot of observed events. Negative sequential patterns (NSPs) formulate recurrent behaviors as patterns containing both observed events and absent events. Few approaches have been proposed to mine such NSPs. In addition, the syntax and semantics of NSPs differ between methods, which makes it difficult to compare them. This article provides a unified framework for the formulation of the syntax and the semantics of NSPs. Then, we introduce a new algorithm, NegPSpan, that extracts NSPs using a PrefixSpan depth-first scheme and enables maxgap constraints that other approaches do not take into account. The formal framework allows for highlighting the differences between the proposed approach and the methods from the literature, especially the state-of-the-art approach eNSP. Intensive experiments on synthetic and real datasets show that NegPSpan can extract meaningful NSPs and that it can process bigger datasets than eNSP thanks to significantly lower memory requirements and better computation times.
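
To make the notion of a negative sequential pattern concrete, the sketch below checks whether a single sequence supports a pattern such as "a, then c, with no b in between", optionally under a maxgap constraint. This is not the NegPSpan algorithm and not necessarily the semantics the article adopts: the function `supports`, the pattern encoding (`positives` plus one set of forbidden items per gap), the choice that one valid embedding is enough, and the reading of `maxgap` as a bound on the difference between positions are all assumptions made for illustration.

```python
def supports(sequence, positives, negatives, maxgap=None):
    """Check whether `sequence` contains the pattern at least once.

    positives: items that must occur in this order.
    negatives: negatives[k] is the set of items that must NOT occur
               strictly between positives[k] and positives[k + 1].
    maxgap:    if given, consecutive matched positions may differ by
               at most this many steps (assumed definition).
    """
    def extend(k, prev):
        if k == len(positives):
            return True  # every positive item has been placed
        start = 0 if prev is None else prev + 1
        stop = len(sequence)
        if prev is not None and maxgap is not None:
            stop = min(stop, prev + maxgap + 1)
        for i in range(start, stop):
            if sequence[i] != positives[k]:
                continue
            # Reject this position if a forbidden item lies in the gap.
            if prev is not None and any(
                item in negatives[k - 1] for item in sequence[prev + 1:i]
            ):
                continue
            if extend(k + 1, i):
                return True
        return False

    return extend(0, None)

# Pattern: "a", then "c", with no "b" in between.
no_b_between = supports(list("axc"), ["a", "c"], [{"b"}])
b_between = supports(list("abc"), ["a", "c"], [{"b"}])
```

Under these assumptions the first call succeeds and the second fails, because the only way to match "a" then "c" in the second sequence passes over a "b".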


2001: A Space Odyssey Predicted The Future--50 Years Ago

WIRED

The space race was in full swing. For the first time, a space probe had recently landed on another planet (Venus). And I was eagerly studying everything I could to do with space. Then on April 2, 1968 (May 15 in the UK), the movie 2001: A Space Odyssey was released--and I was keen to see it. So in the early summer of 1968 there I was, the first time I'd ever been in an actual cinema (yes, it was called that in the UK). I'd been dropped off for a matinee, and was pretty much the only person in the theater. And to this day, I remember sitting in a plush seat and eagerly waiting for the curtain to go up, and the movie to begin. It started with an impressive extraterrestrial sunrise. But then what was going on? Those were landscapes, and animals. I was confused, and frankly a little bored. But just when I was getting concerned, there was a bone thrown in the air that morphed into a spacecraft, and pretty soon there was a rousing waltz--and a big space station turning majestically on the screen. The next two hours had a big effect on me. It wasn't really the spacecraft (I'd seen plenty of them in books by then, and in fact made many of my own concept designs). But what was new and exciting for me in the movie was the whole atmosphere of a world full of technology--and the notion of what might be possible there, with all those bright screens doing things, and, yes, computers driving it all. It would be another year before I saw my first actual computer in real life. But those two hours in 1968 watching 2001 defined an image of what the computational future could be like, that I carried around for years. I think it was during the intermission to the movie that some seller of refreshments--perhaps charmed by a solitary kid so earnestly pondering the movie--gave me a "cinema program" about the movie. 
Half a century later I still have that program, complete with a food stain, and faded writing from my 8-year-old self, recording (with some misspelling) where and when I saw the movie. A lot has happened in the past 50 years, particularly in technology, and it's an interesting experience for me to watch 2001 again--and compare what it predicted with what's actually happened. Of course, some of what's actually been built over the past 50 years has been done by people like me, who were influenced in larger or smaller ways by 2001. When Wolfram Alpha was launched in 2009--showing some distinctly HAL-like characteristics--we paid a little homage to 2001 in our failure message (needless to say, one piece of notable feedback we got at the beginning was someone asking: "How did you know my name was Dave?!"). One very obvious prediction of 2001 that hasn't panned out, at least yet, is routine, luxurious space travel. But like many other things in the movie, it doesn't feel like what was predicted was off track; it's just that--50 years later--we still haven't got there yet. Well, they have lots of flat-screen displays, just like real computers today.


Image Recognition TensorFlow

#artificialintelligence

Our brains make vision seem easy. It doesn't take any effort for humans to tell apart a lion and a jaguar, read a sign, or recognize a human's face. But these are actually hard problems to solve with a computer: they only seem easy because our brains are incredibly good at understanding images. In the last few years, the field of machine learning has made tremendous progress on addressing these difficult problems. In particular, we've found that a kind of model called a deep convolutional neural network can achieve reasonable performance on hard visual recognition tasks -- matching or exceeding human performance in some domains.


When Subgraph Isomorphism is Really Hard, and Why This Matters for Graph Databases

Journal of Artificial Intelligence Research

The subgraph isomorphism problem involves deciding whether a copy of a pattern graph occurs inside a larger target graph. The non-induced version allows extra edges in the target, whilst the induced version does not. Although both variants are NP-complete, algorithms inspired by constraint programming can operate comfortably on many real-world problem instances with thousands of vertices. However, they cannot handle arbitrary instances of this size. We show how to generate "really hard" random instances for subgraph isomorphism problems, which are computationally challenging with a couple of hundred vertices in the target, and only twenty pattern vertices. For the non-induced version of the problem, these instances lie on a satisfiable / unsatisfiable phase transition, whose location we can predict; for the induced variant, much richer behaviour is observed, and constrainedness gives a better measure of difficulty than does proximity to a phase transition. These results have practical consequences: we explain why the widely researched "filter / verify" indexing technique used in graph databases is founded upon a misunderstanding of the empirical hardness of NP-complete problems, and cannot be beneficial when paired with any reasonable subgraph isomorphism algorithm.
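
The difference between the non-induced and induced variants can be shown with a brute-force check. The sketch below is only a definition-level illustration with exponential running time, not one of the constraint programming inspired algorithms the abstract refers to; the function name `has_subgraph` and the graph encoding (vertex count plus a list of undirected edges) are assumptions of this example.

```python
from itertools import permutations

def has_subgraph(pattern_n, pattern_edges, target_n, target_edges, induced=False):
    """Brute-force subgraph isomorphism on undirected graphs.

    Vertices are 0..n-1. Tries every injective mapping of pattern
    vertices to target vertices, so it is only usable on tiny graphs.
    """
    p_edges = {frozenset(e) for e in pattern_edges}
    t_edges = {frozenset(e) for e in target_edges}
    for image in permutations(range(target_n), pattern_n):
        ok = True
        for u in range(pattern_n):
            for v in range(u + 1, pattern_n):
                in_pattern = frozenset((u, v)) in p_edges
                in_target = frozenset((image[u], image[v])) in t_edges
                # Every pattern edge must be present in the target.
                if in_pattern and not in_target:
                    ok = False
                # Induced variant: extra target edges are not allowed either.
                if induced and in_target and not in_pattern:
                    ok = False
        if ok:
            return True
    return False

path = [(0, 1), (1, 2)]              # path on three vertices
triangle = [(0, 1), (1, 2), (0, 2)]  # complete graph on three vertices
non_induced = has_subgraph(3, path, 3, triangle)
induced = has_subgraph(3, path, 3, triangle, induced=True)
```

Here the path occurs in the triangle in the non-induced sense, since the extra edge is tolerated, but not in the induced sense.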


Artificial Intelligence and Robotics

arXiv.org Artificial Intelligence

The recent successes of AI have captured the wildest imagination of both the scientific communities and the general public. Robotics and AI amplify human potential, increase productivity and are moving from simple reasoning towards human-like cognitive abilities. Current AI technologies are used in a set of application areas, ranging from healthcare, manufacturing, transport and energy to financial services, banking, advertising, management consulting and government agencies. The global AI market was around 260 billion USD in 2016 and is estimated to exceed 3 trillion by 2024. To understand the impact of AI, it is important to draw lessons from its past successes and failures, and this white paper provides a comprehensive explanation of the evolution of AI, its current status and future directions.


STN-OCR: A single Neural Network for Text Detection and Text Recognition

#artificialintelligence

STN-OCR, a single semi-supervised Deep Neural Network (DNN), consists of a spatial transformer network -- which is used to detect text regions in images, and a text recognition network -- which recognizes the textual content of the identified text regions. STN-OCR is an end-to-end scene text recognition system, but it is not easy to train. This model is mostly able to detect text in differently arranged lines of text in images, while also recognizing the content of these words. The overview of the system is shown in Figure 1. Compared with most of the current text recognition systems, which extract all the information from the image at once, STN-OCR behaves more like a human.


Learning architectures based on quantum entanglement: a simple matrix product state algorithm for image recognition

arXiv.org Machine Learning

It is a fundamental, but still elusive question whether methods based on quantum mechanics, in particular on quantum entanglement, can be used for classical information processing and machine learning. Even a partial answer to this question would bring important insights to both machine learning and quantum mechanics. In this work, we implement simple numerical experiments, related to pattern/image classification, in which we represent the classifiers by quantum matrix product states (MPS). A classical machine learning algorithm is then applied to these quantum states. We explicitly show how quantum features (i.e., single-site and bipartite entanglement) can emerge in images represented this way; entanglement here characterizes the importance of data, and this information can be practically used to improve the learning procedures. Thanks to the low demands on the dimensions and number of the unitary matrices necessary to construct the MPS, we expect such numerical experiments could open new paths in classical machine learning, and at the same time shed light on generic quantum simulations/computations.


The Power of Pattern Learning for Industrial Operations

#artificialintelligence

The next industrial revolution is here. Whether you call it Industry 4.0 or Industrial IoT or Digital Transformation, the increased access to machine and operational data, the proliferation of two-way communication and the speed of data flow, combined with the lower cost of computing, connectivity and storage, have created the perfect environment to transform industrial operations. The time series data generated by these operations, if harnessed, can provide actionable insights to reduce downtime as well as improve throughput, operator safety and product quality. McKinsey & Company predicts that the next 20 percent productivity rise in operations will come from digital analytics, and machine learning-enabled pattern recognition is playing a significant role in enhancing production operations. Time series data generated in discrete and process manufacturing operations is very rich in information that can provide insights on the current and future health of the production equipment and lines.