Information Fusion
Using Artificial Intelligence for IoT Integration: Bit Stew's Approach - RTInsights
Supervised and unsupervised learning approaches rapidly map data into a semantic model that can be used in an IT architecture. GE Digital's acquisition of Bit Stew Systems, a small startup with about 100 employees, should come as no surprise to those familiar with the challenges of industrial IoT projects. GE's Predix is a major industrial IoT platform that targets sectors such as manufacturing, aviation and energy, with use cases in predictive maintenance and in optimizing the performance of massive assets, such as multimillion-dollar gas pipelines, jet engines, or gas turbines. Bit Stew, based in Vancouver, Canada, has software that uses machine learning algorithms to filter and integrate data from industrial equipment, databases, and control systems, creating a semantic data model for use throughout an IT architecture, from cloud to edge. "Most of our customers maintain 30 connected systems to our platform, and are managing millions of connected devices," Franco Castaldini, vice president of marketing and product management at Bit Stew, told RTInsights.
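The snippet does not describe Bit Stew's actual algorithms. As a rough stand-in (an assumption, not Bit Stew's method), the sketch below maps raw field names from heterogeneous sources onto a hypothetical semantic model using string similarity from Python's standard `difflib` module. The attribute names, `map_fields`, and the 0.6 threshold are all illustrative; a production system would replace the similarity score with a trained (supervised) or clustering-based (unsupervised) matcher.

```python
from difflib import SequenceMatcher

# Hypothetical canonical attributes of a tiny "semantic model".
SEMANTIC_MODEL = ("pressure", "temperature", "flow_rate", "vibration")


def normalize(name: str) -> str:
    """Lower-case a raw field name and unify common separators to '_'."""
    return name.lower().replace("-", "_").replace(" ", "_")


def map_fields(raw_fields, model=SEMANTIC_MODEL, threshold=0.6):
    """Map each raw field to its most similar model attribute, or None.

    Similarity is difflib's ratio (0.0 to 1.0); this is a simple heuristic
    standing in for a learned matcher, not a machine-learning model.
    """
    mapping = {}
    for raw in raw_fields:
        norm = normalize(raw)
        best, score = None, 0.0
        for attr in model:
            s = SequenceMatcher(None, norm, attr).ratio()
            if s > score:
                best, score = attr, s
        # Leave low-confidence fields unmapped for later review.
        mapping[raw] = best if score >= threshold else None
    return mapping


if __name__ == "__main__":
    print(map_fields(["Pressure", "Flow-Rate", "Temperature_DegC", "zzz"]))
```

Fields that come back as `None` are where human review or a learned model would take over.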
Amazon.com: Entity Information Life Cycle for Big Data: Master Data Management and Information Integration (9780128005378): John R. Talburt, Yinle Zhou: Books
Dr. John R. Talburt is Professor of Information Science at the University of Arkansas at Little Rock (UALR), where he is the Coordinator for the Information Quality Graduate Program and the Executive Director of the UALR Center for Advanced Research in Entity Resolution and Information Quality (ERIQ). He is also the Chief Scientist for Black Oak Partners, LLC, an information quality solutions company. Prior to his appointment at UALR, he led research and development and product innovation at Acxiom Corporation, a global leader in information management and customer data integration. Professor Talburt holds several patents related to customer data integration, has written numerous articles on information quality and entity resolution, and is the author of Entity Resolution and Information Quality (Morgan Kaufmann, 2011). He also holds the IAIDQ Information Quality Certified Professional (IQCP) credential.
ETL, ELT and Data Hub: Where Hadoop is the right fit?
DMX-h, Syncsort's ETL/DI product for Hadoop, runs natively on Hadoop and integrates closely with the MapReduce paradigm to perform high-volume ETL batch operations such as large joins and aggregations, so users don't need to rip the data out of Hadoop, do the ETL, and put it back into Hadoop, as you referenced. DMX-h's ETL engine integrates through Syncsort's contribution to the Apache open source community, patch MAPREDUCE-2454, which added a feature to the Hadoop MapReduce framework that allows alternative implementations of the Sort phase. This is the same ETL engine Syncsort offers outside of Hadoop, with the same graphical UI, which makes the transition to ETL in Hadoop/MapReduce easy and seamless for existing ETL developers and architects and eliminates the need for Java/Pig expertise. The same lightweight DMX-h engine can extract data from disparate source systems (mainframe, RDBMS, files, etc.), pre-process, cleanse, validate, and load it into HDFS, and then implement efficient, high-speed MapReduce ETL in Hadoop.
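The kind of batch job described above, a join plus an aggregation expressed as map, sort, and reduce steps, can be simulated in memory to show where each phase sits. This is a plain-Python illustration with hypothetical `customers`/`orders` data, not DMX-h or the Hadoop API; the `shuffle_sort` function marks the Sort phase that MAPREDUCE-2454 made pluggable.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input tables: (customer_id, name) and (customer_id, amount).
customers = [(1, "ACME"), (2, "Globex")]
orders = [(1, 100.0), (1, 50.0), (2, 75.0), (3, 20.0)]


def map_phase(customers, orders):
    """Map: tag each record with its source and emit it under the join key."""
    for cust_id, name in customers:
        yield cust_id, ("C", name)
    for cust_id, amount in orders:
        yield cust_id, ("O", amount)


def shuffle_sort(pairs):
    """Sort: bring records with the same key together (the pluggable phase)."""
    return sorted(pairs, key=itemgetter(0))


def reduce_phase(sorted_pairs):
    """Reduce: inner-join customers to their orders and sum amounts per key."""
    for _key, group in groupby(sorted_pairs, key=itemgetter(0)):
        names, total, n_orders = [], 0.0, 0
        for _, (tag, value) in group:
            if tag == "C":
                names.append(value)
            else:
                total += value
                n_orders += 1
        # Inner join: emit only keys that appear in both inputs.
        if n_orders:
            for name in names:
                yield name, total


def run_job(customers, orders):
    """Chain the three phases, as a MapReduce framework would."""
    return list(reduce_phase(shuffle_sort(map_phase(customers, orders))))


if __name__ == "__main__":
    print(run_job(customers, orders))
```

Because the reducer only needs records grouped by key, any sort implementation that produces that grouping can be swapped in without touching the map or reduce logic.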
The Marketer's Guide To AI In Marketing And Advertising - AdExchanger
Artificial intelligence (AI) is surging in ad/mar tech land. Or resurging, depending on how good your memory is. IBM continues to push Watson, and, in the run-up to their respective conferences, Salesforce and Oracle talked up their own AI initiatives. Also, Google, Facebook, IBM, Microsoft and Amazon banded together to create best practices around AI technologies. And startups like Adgorithms, Boomtrain, Cognitiv, Kenshoo, Lattice Engines, Rocket Fuel and numerous others continue to extol the virtues of their AI-powered applications.
Amazon.com: Entity Information Life Cycle for Big Data: Master Data Management and Information Integration (9780128005378): John R. Talburt, Yinle Zhou: Books
The authors have done an excellent job tying together state-of-the-art academic concepts with state-of-the-practice business needs, showing clearly how these two (much-hyped) concepts can be used to provide value to our organizations. A must-read for anyone confused about how to apply these concepts ...
Whitepaper: O'Reilly Research on Integrating Data for Better Analytics
Companies are collecting more data than ever. But given how difficult it is to unify the many internal and external data streams they've built, more data doesn't necessarily translate into better analytics. The real challenge is to provide deep and broad access to "a single source of truth" in their data, something the typically slow ETL process for data warehousing cannot achieve. More than just fast access, analysts need the ability to explore data at a granular level.
Big Structure: At The Nexus of Knowledge Bases, the Semantic Web and Artificial Intelligence
In Part I of this two-part series, Fred Giasson and I looked back over a decade of working within the semantic Web and found it partially successful but really the wrong question moving forward. The inadequacies of the semantic Web to date reside in its lack of attention to practical data interoperability across organizational or community boundaries. An emphasis on linked data has created an illusion that questions of data integration are being effectively addressed. Linked data is hard to publish and not the only useful form for consuming data; linked data quality is often unreliable; the linking predicates for relating disparate data sources to one another may be inadequate or wrong; and, there are no reference groundings for relating data values across datasets. Neither the semantic Web nor linked data has developed the practices, tooling or experience to actually interoperate data across the Web.
Data Integration with High Dimensionality
We consider the problem of data integration. As a motivating example, consider determining which genes affect a disease. The genes, which we call predictor objects, can be measured in different experiments on the same individual. We address the question of finding which genes are predictors of disease by any of the experiments. Our formulation is more general. In a given data set, there are a fixed number of responses for each individual, which may include a mix of discrete, binary and continuous variables. There is also a class of predictor objects, which may differ within a subject depending on how the predictor object is measured, i.e., depend on the experiment. The goal is to select which predictor objects affect any of the responses, where the number of such informative predictor objects or features tends to infinity as sample size increases. There are marginal likelihoods for each way the predictor object is measured, i.e., for each experiment. We specify a pseudolikelihood combining the marginal likelihoods, and propose a pseudolikelihood information criterion. Under regularity conditions, we establish selection consistency for the pseudolikelihood information criterion with unbounded true model size, which includes a Bayesian information criterion with appropriate penalty term as a special case. Simulations indicate that data integration improves, sometimes dramatically, on using only one of the data sources.
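One way to write such a criterion, assuming the pseudolikelihood is an unweighted sum of the per-experiment marginal log-likelihoods (the paper's exact weighting and penalty may differ), uses E experiments, a candidate set S of predictor objects, the marginal likelihood L_e for experiment e, its maximizer \hat{\theta}_S, and a penalty \lambda_n that grows with sample size n:

```latex
\log \mathrm{PL}(\theta; S) = \sum_{e=1}^{E} \log L_e(\theta_e; S),
\qquad
\mathrm{PLIC}(S) = -2 \log \mathrm{PL}(\hat{\theta}_S; S) + \lambda_n \, |S|,
\qquad
\hat{S} = \arg\min_{S} \mathrm{PLIC}(S)
```

Taking a penalty of order log n per selected parameter gives a BIC-type criterion, in line with the abstract's remark that a Bayesian information criterion with an appropriate penalty term is a special case.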
Using machine learning and R to create smart reports & applications
Azure Machine Learning is an end-to-end solution that facilitates descriptive, predictive and prescriptive analysis. Working with Azure ML helps data scientists easily publish their code as a web service that is accessible from different platforms. Data scientists can publish their R code directly from RStudio into Azure ML and create a web service that can be called from any other application. Imagine that we have a simple function in RStudio that calculates the sum of two variables. We are going to create an API (web service) for this function that can be called from other platforms (e.g., mobile applications and web applications).
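The Azure ML publishing workflow from RStudio is not reproduced here. As a language-neutral illustration of the same idea (an assumption, not Azure ML's API), the sketch below wraps a sum function in a minimal JSON-over-HTTP service using Python's standard-library `wsgiref`. The payload keys `a` and `b`, the `result` field, and port 8000 are hypothetical choices.

```python
import json
from wsgiref.simple_server import make_server


def add(a, b):
    """The function being exposed: the sum of two numbers."""
    return a + b


def app(environ, start_response):
    """WSGI app: POST {"a": ..., "b": ...} and receive {"result": ...}."""
    headers = [("Content-Type", "application/json")]
    if environ.get("REQUEST_METHOD") != "POST":
        start_response("405 Method Not Allowed", headers)
        return [json.dumps({"error": "use POST"}).encode("utf-8")]
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    try:
        # An empty body is treated as {} and fails the key lookup below.
        payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
        result = add(float(payload["a"]), float(payload["b"]))
    except (KeyError, ValueError, TypeError) as exc:
        start_response("400 Bad Request", headers)
        return [json.dumps({"error": str(exc)}).encode("utf-8")]
    body = json.dumps({"result": result}).encode("utf-8")
    start_response("200 OK", headers + [("Content-Length", str(len(body)))])
    return [body]


if __name__ == "__main__":
    # Serve locally; any client that can send HTTP + JSON can call it.
    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

A mobile or web client would then POST `{"a": 2, "b": 3}` to the service and read back `{"result": 5.0}`.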