Information Fusion
Metacognitive Learning Approach for Online Tool Condition Monitoring
Pratama, Mahardhika, Dimla, Eric, Lai, Chow Yin, Lughofer, Edwin
As manufacturing processes become increasingly automated, so should tool condition monitoring (TCM), as it is impractical to have human workers monitor the state of the tools continuously. Tool condition is crucial to ensuring good product quality: worn tools affect not only the surface quality but also the dimensional accuracy, which means a higher product reject rate. Therefore, there is an urgent need to identify tool failures on the fly, before they occur. While various versions of intelligent tool condition monitoring have been proposed, most of them suffer from the cognitive nature of traditional machine learning algorithms. They focus on the how-to-learn process without paying attention to two other crucial issues: what to learn and when to learn. The what-to-learn and when-to-learn components provide self-regulating mechanisms to select the training samples and to determine the time instants at which to train a model. A novel tool condition monitoring approach based on a psychologically plausible concept, namely the metacognitive scaffolding theory, is proposed and built upon a recently published algorithm, the recurrent classifier (rClass). The learning process consists of three phases (what to learn, how to learn, and when to learn) and makes use of a generalized recurrent network structure as a cognitive component. Experimental studies with real-world manufacturing data streams were conducted, in which rClass demonstrated the highest accuracy while retaining the lowest complexity among its counterparts.
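The division of labour between what to learn, how to learn, and when to learn can be illustrated with a minimal sketch. It is not rClass: the cognitive component here is a plain online logistic regression standing in for the recurrent network, and the three-way decision (skip, learn now, reserve for later), the thresholds `skip_margin` and `learn_margin`, and the synthetic two-feature stream are all assumptions made for illustration.

```python
import math
import random


class OnlineLogistic:
    """Stand-in cognitive component: online logistic regression (NOT rClass)."""

    def __init__(self, n_features, lr=0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        # "How to learn": one stochastic gradient step on the log loss.
        err = self.predict_proba(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def make_stream(n, seed):
    """Hypothetical data stream: label is 1 when the two features sum above 0."""
    rng = random.Random(seed)
    stream = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        stream.append((x, 1 if x[0] + x[1] > 0 else 0))
    return stream


def metacognitive_stream(model, stream, skip_margin=0.4, learn_margin=0.2):
    """Process (x, y) pairs one by one and count the decision taken for each.

    Both margins are hypothetical thresholds, not values from the paper.
    """
    counts = {"skipped": 0, "learned": 0, "reserved": 0}
    reserve = []
    for x, y in stream:
        p = model.predict_proba(x)
        correct = (p >= 0.5) == (y == 1)
        confidence = abs(p - 0.5)
        if correct and confidence >= skip_margin:
            # "What to learn": a confidently correct sample adds little, so skip it.
            counts["skipped"] += 1
        elif not correct or confidence < learn_margin:
            # "When to learn": misclassified or uncertain, so train right away.
            model.update(x, y)
            counts["learned"] += 1
        else:
            # Moderately confident: park the sample and revisit it later.
            reserve.append((x, y))
            counts["reserved"] += 1
    # Reserved samples are used once the stream has been consumed.
    for x, y in reserve:
        model.update(x, y)
    return counts


if __name__ == "__main__":
    model = OnlineLogistic(n_features=2)
    print(metacognitive_stream(model, make_stream(500, seed=0)))
```

The point of the sketch is only that the update rule (`update`) is separated from the decisions about which samples to use and when to use them; a real system would replace each piece with the mechanisms described in the paper.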
Content Intelligence: The New Frontier of Content Marketing Technology
We live in an age where science fiction ever more quickly becomes science fact. Big data and Artificial Intelligence (AI) are revolutionizing industries across the developed world, from retail to finance to domestic and international spying. These technologies are automating functions previously considered tasks only a human could do, and offering detailed, personalized predictions a human could never make. Now these tools are underpinning a new era of content marketing technology: content intelligence. Big data involves computationally analyzing extremely large data sets to reveal patterns, trends, and associations, especially those relating to human behavior and interactions. It is used in everything from predicting stock performance to seasonal buying behavior to helping the NSA know whether your post about "blowing up the joint" refers to your bomb-making or DJing skills. Every human who uses any form of digital communication generates data constantly, both about themselves and about humans in aggregate. Big data refers to the ability to find, sort, and make sense of this ocean of ones and zeroes. It encompasses structured, semi-structured, and unstructured information, both human-generated and from sensors, machines, and public records. Structured data generally means information residing in a fixed field within a record or file, such as that found in spreadsheets and relational databases. Information that's tagged to show some elements within the data, such as metadata in email or photos, is semi-structured data. Unstructured data, meanwhile, includes content such as untagged text, images, audio, video, and so on. Big data can also include demographic or psychographic information about consumers. Think product reviews and commentary, blogs, content on social media sites, and the digital exhaust streamed 24/7 from mobile devices, sensors, and technical devices. The definition of AI is more nebulous because what is considered AI is constantly changing.
Consistency of community detection in multi-layer networks using spectral and matrix factorization methods
We consider the problem of estimating a consensus community structure by combining information from multiple layers of a multi-layer network or multiple snapshots of a time-varying network. Numerous methods based on spectral clustering or low-rank matrix factorization have been proposed in the literature over the past decade for the more general problem of multi-view clustering. As a general theme, these "intermediate fusion" methods involve obtaining a low column rank matrix by optimizing an objective function and then using the columns of the matrix for clustering. However, the theoretical properties of these methods remain largely unexplored, and most researchers have relied on performance on synthetic and real data to assess the goodness of the procedures. In the absence of statistical guarantees on the objective functions, it is difficult to determine whether the algorithms optimizing the objective will return a good community structure. We apply some of these methods to consensus community detection in multi-layer networks and investigate the consistency properties of the global optimizer of the objective functions under the multi-layer stochastic blockmodel. We derive several new asymptotic results showing consistency of the intermediate fusion techniques, along with the spectral clustering of the mean adjacency matrix, under a high-dimensional setup where the number of nodes, the number of layers and the number of communities of the multi-layer graph grow. Our numerical study shows that, in comparison to the intermediate fusion techniques, late fusion methods, namely spectral clustering on the aggregate spectral kernel and the module allegiance matrix, under-perform in sparse networks, while the spectral clustering of the mean adjacency matrix under-performs in multi-layer networks that contain layers with both homophilic and heterophilic clusters.
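As a rough illustration of one method named above, spectral clustering of the mean adjacency matrix, the following sketch samples a small multi-layer stochastic blockmodel and recovers two communities. It is a simplification under stated assumptions: only two homophilic communities, the split is read off the sign of the second eigenvector rather than by running k-means on several eigenvectors, and the eigenvectors come from a hand-written power iteration so that the example needs only the standard library. The function names and the parameter values are hypothetical.

```python
import random


def sample_multilayer_sbm(labels, n_layers, p_in, p_out, rng):
    """Sample undirected layers that all share one community assignment."""
    n = len(labels)
    layers = []
    for _ in range(n_layers):
        a = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                p = p_in if labels[i] == labels[j] else p_out
                if rng.random() < p:
                    a[i][j] = a[j][i] = 1
        layers.append(a)
    return layers


def mean_adjacency(layers):
    """Entry-wise average of the layer adjacency matrices."""
    n = len(layers[0])
    return [[sum(a[i][j] for a in layers) / len(layers) for j in range(n)]
            for i in range(n)]


def top_eigenpair(m, rng, n_iter=300):
    """Power iteration: eigenpair with the largest absolute eigenvalue."""
    n = len(m)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(n_iter):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(v, mv)), v


def two_community_split(layers, rng):
    """Assign nodes to two groups by the sign of the second eigenvector."""
    a_bar = mean_adjacency(layers)
    n = len(a_bar)
    lam1, v1 = top_eigenpair(a_bar, rng)
    # Deflate the leading eigenpair so power iteration finds the next one.
    deflated = [[a_bar[i][j] - lam1 * v1[i] * v1[j] for j in range(n)]
                for i in range(n)]
    _, v2 = top_eigenpair(deflated, rng)
    return [1 if x > 0 else 0 for x in v2]


def accuracy_up_to_swap(truth, pred):
    """Fraction of nodes matched, allowing the two labels to be exchanged."""
    agree = sum(t == p for t, p in zip(truth, pred))
    return max(agree, len(truth) - agree) / len(truth)


if __name__ == "__main__":
    rng = random.Random(0)
    truth = [0] * 20 + [1] * 20
    layers = sample_multilayer_sbm(truth, n_layers=5, p_in=0.6, p_out=0.1, rng=rng)
    print(accuracy_up_to_swap(truth, two_community_split(layers, rng)))
```

Averaging works here because every layer is homophilic. If some layers were heterophilic (more edges between communities than within), their signal would partly cancel in the mean, which is consistent with the weakness the abstract reports for this method.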
Bayesian Hybrid Matrix Factorisation for Data Integration
We introduce a novel Bayesian hybrid matrix factorisation model (HMF) for data integration, based on combining multiple matrix factorisation methods, that can be used for in- and out-of-matrix prediction of missing values. The model is very general and can be used to integrate many datasets across different entity types, including repeated experiments, similarity matrices, and very sparse datasets. We apply our method to two biological applications, and extensively compare it to state-of-the-art machine learning and matrix factorisation models. For in-matrix predictions on drug sensitivity datasets we obtain consistently better performance than existing methods. This is especially the case when we increase the sparsity of the datasets. Furthermore, we perform out-of-matrix predictions on methylation and gene expression datasets, and obtain the best results on two of the three datasets, especially when the predictivity of datasets is high.
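To make "in-matrix prediction of missing values" concrete, here is a minimal sketch of ordinary (non-Bayesian) matrix factorisation fitted by stochastic gradient descent. It is not HMF: there are no priors, no hybrid combination of factorisation types, and only a single matrix. The toy rank-1 matrix, the function names, and the hyperparameters are assumptions chosen for illustration.

```python
import random


def factorise(observed, n_rows, n_cols, rank=1, lr=0.05, reg=0.001,
              epochs=500, seed=0):
    """Fit R ~ U V^T on observed (row, col, value) triples by SGD."""
    rng = random.Random(seed)
    u = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    v = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for i, j, r in observed:
            err = r - sum(a * b for a, b in zip(u[i], v[j]))
            for k in range(rank):
                ui, vj = u[i][k], v[j][k]
                # Gradient step on squared error with L2 regularisation.
                u[i][k] += lr * (err * vj - reg * ui)
                v[j][k] += lr * (err * ui - reg * vj)
    return u, v


def predict(u, v, i, j):
    """Predicted value for entry (i, j), observed or not."""
    return sum(a * b for a, b in zip(u[i], v[j]))


def toy_problem():
    """Hypothetical 4x3 rank-1 matrix with one entry held out as 'missing'."""
    row_factor = [1.0, 1.5, 2.0, 0.5]
    col_factor = [1.0, 0.5, 2.0]
    held_out = (1, 2)  # true value 1.5 * 2.0 = 3.0
    observed = [(i, j, row_factor[i] * col_factor[j])
                for i in range(4) for j in range(3) if (i, j) != held_out]
    truth = row_factor[held_out[0]] * col_factor[held_out[1]]
    return observed, held_out, truth


if __name__ == "__main__":
    observed, held_out, truth = toy_problem()
    u, v = factorise(observed, n_rows=4, n_cols=3)
    print("held-out truth:", truth, "prediction:", predict(u, v, *held_out))
```

The held-out entry is recoverable here only because the model rank matches the rank of the data; with a higher-rank model and a single matrix the missing value would not be pinned down, which is one motivation for integrating additional datasets as the abstract describes.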
The Key To Biological Data Integration
The advent of NGS technologies is focusing much attention on the data management issue. However, more than their volume, it is the diversity of biological data that constitutes the real bioinformatics bottleneck, one that cannot be solved through technological considerations alone, such as cloud infrastructures. A bioinformatics platform must indeed store, organize and give access to a wide span of data and results. First of all, there are the experimental data and their transformations: not only the sequence data, such as the reads, the assembly files and the resulting contigs, to name the most important ones, but also spectra or metabolic flux measurements. Through the interpretation of these data, biological entities are predicted and characterized: coding regions, regulatory signals, polypeptides, enzyme classes, peptide tags, and so on.
Putin, Merkel and Hollande Discuss Anti-Terrorism Data Exchange: Kremlin
MOSCOW (Reuters) - The leaders of Russia, Germany and France agreed in a phone call on Tuesday to speed up the exchange of data aimed at fighting terrorism, the Kremlin said. They spoke following Monday's deadly bomb attack on a metro train in Russia's second-largest city of St. Petersburg which killed 14 people and wounded 50. The Kremlin said the leaders also discussed the situation in Ukraine and the Easter ceasefire declared from April 1. A German government source said: "Merkel urged Putin to use his influence with the separatists (to keep to the April 1 ceasefire)." The Kremlin added that Putin, Merkel and Hollande have agreed to continue contacts on Ukraine.
Data Integration Tools – Market Study
This post is a brief review of leading Data Integration tools in the market. It draws heavily on the Gartner 2016 report and on peer reviews from my circle. The data integration tool market was worth approximately $2.8 billion at the end of 2015, an increase of 10.5% from the end of 2014 [2016 Gartner Report – Data Integration Tools].
Distilling Information Reliability and Source Trustworthiness from Digital Traces
Tabibian, Behzad, Valera, Isabel, Farajtabar, Mehrdad, Song, Le, Schölkopf, Bernhard, Gomez-Rodriguez, Manuel
Online knowledge repositories typically rely on their users or dedicated editors to evaluate the reliability of their content. These evaluations can be viewed as noisy measurements of both information reliability and information source trustworthiness. Can we leverage these noisy evaluations, often biased, to distill a robust, unbiased and interpretable measure of both notions? In this paper, we argue that the temporal traces left by these noisy evaluations give cues on the reliability of the information and the trustworthiness of the sources. Then, we propose a temporal point process modeling framework that links these temporal traces to robust, unbiased and interpretable notions of information reliability and source trustworthiness. Furthermore, we develop an efficient convex optimization procedure to learn the parameters of the model from historical traces. Experiments on real-world data gathered from Wikipedia and Stack Overflow show that our modeling framework accurately predicts evaluation events, provides an interpretable measure of information reliability and source trustworthiness, and yields interesting insights about real-world events.
Data Fusion, Data Privacy: What We Can Learn From Walmart's Flexible Data Architecture
We are living in a world that is increasingly dependent on and driven by data. Incorporating data into decision-making is how companies now know that they're fully informed. Data-driven technologies can revolutionize a company's operations at every level, including customer outreach. Having a full view of a customer is paramount for marketing, customer service, and many other realms. In every application domain, there is a need for data fusion: the ability to collect and integrate data from many different sources into a useful whole.