
 Information Fusion


mGPfusion: Predicting protein stability changes with Gaussian process kernel learning and data fusion

arXiv.org Machine Learning

Proteins are used in various applications by the pharmaceutical, food, fuel, and many other industries, and their usage is growing steadily (Kirk et al., 2002; Sanchez and Demain, 2010). Proteins have important advantages over chemical catalysts: they are derived from renewable resources, are biodegradable, and are often highly selective (Cherry and Fidantsef, 2003). Protein engineering is used to further improve the properties of proteins, for example to enhance their catalytic activity, modify their substrate specificity, or improve their thermostability (Rapley and Walker, 2000). Increasing stability is an important aspect of protein engineering, as proteins used in industry must remain stable under industrial process conditions, which often involve higher than ambient temperatures and non-aqueous solvents (Bommarius et al., 2011). The properties of a protein are modified by introducing alterations to its amino acid sequence. Mutations in general tend to be destabilising, and if too many destabilising mutations are introduced, the protein may not remain functional without compensatory stabilising mutations (Tokuriki and Tawfik, 2009). The stability of a protein can be defined as the difference in Gibbs free energy G between the folded and unfolded (or native and denatured) states of the protein.
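As a worked illustration of this definition, stability and the effect of a mutation can be written as below. The sign convention is an assumption here (datasets differ), and R (the gas constant), T (absolute temperature) and the unfolding equilibrium constant K_u are introduced only for this illustration.

```latex
\begin{aligned}
\Delta G &= G_{\mathrm{unfolded}} - G_{\mathrm{folded}} = -RT\ln K_u, \qquad K_u = \frac{[\mathrm{U}]}{[\mathrm{F}]},\\
\Delta\Delta G &= \Delta G_{\mathrm{mutant}} - \Delta G_{\mathrm{wild\ type}}.
\end{aligned}
```

Under this convention a positive ΔG means the folded state is favoured, and a negative ΔΔG marks a destabilising mutation.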


KSQL in Action: Real-Time Streaming ETL from Oracle Transactional Data

@machinelearnbot

In this post I'm going to show what streaming ETL looks like in practice. My first job out of university was building a data warehouse for a retailer in the UK. Back then, that meant writing COBOL jobs to load tables in DB2. We waited for all the shops to close, do their end-of-day system processing, and send their data back to the central mainframe. From there it was checked and loaded, and then reports were generated on it.


The five Ps of AI strategy for marketers

@machinelearnbot

Some predict that artificial intelligence will drive the next industrial revolution. What is certain is that over the next few years AI will become more important to marketers. But to unlock AI's huge potential you need an AI strategy. Here are five Ps to help you develop yours. If you're interested in marketing applications of AI, Econsultancy's Supercharged conference takes place in London on May 1, 2018 and is chock-full of case studies and advice on how to build out your data science capability. How can AI help your organisation?


Qubole Snowflake: Transforming Data with Apache Spark -- [2 of 3] Qubole

@machinelearnbot

Snowflake and Qubole have partnered to bring a new level of integrated product capabilities that make it easier and faster to build and deploy machine learning (ML) and artificial intelligence (AI) models in Apache Spark using data stored in Snowflake and other big data sources. In this second post of three, we cover how to perform advanced data preparation with Apache Spark to create refined data sets and write the results to Snowflake, thereby enabling new analytic use cases. The series covers the use cases directly served by the Qubole–Snowflake integration. The first post discussed how to get started with ML in Apache Spark using data stored in Snowflake. Posts two and three cover how data engineers can use Qubole to read and write data in Snowflake, including advanced data preparation such as data wrangling, data augmentation, and advanced ETL to refine existing Snowflake data sets.


Social Media Would Not Lie: Prediction of the 2016 Taiwan Election via Online Heterogeneous Data

arXiv.org Machine Learning

The prevalence of online media has attracted researchers from various domains to explore human behavior and make interesting predictions. In this research, we leverage heterogeneous social media data collected from various online platforms to predict Taiwan's 2016 presidential election. In contrast to most existing research, we take a "signal" view of heterogeneous information and adopt the Kalman filter to fuse multiple signals into daily vote predictions for the candidates. We also quantitatively consider events that influenced the election, based on the so-called event study model that originated in financial research. We obtained the following findings. First, public opinion in online media outperforms traditional polls in Taiwan election prediction in terms of both predictive power and timeliness, although offline polls can still help alleviate the sample bias of online opinion. Second, although online signals converge as election day approaches, the simple Facebook "Like" is consistently the strongest indicator of the election result. Third, most influential events have a strong connection to cross-strait relations, and the Chou Tzu-yu flag incident, followed by the apology video one day before the election, increased the vote share of Tsai Ing-wen by 3.66%. This research supports the predictive power of online media in politics and the advantages of information fusion. The combined use of the Kalman filter and the event study method contributes to the data-driven political analytics paradigm for both prediction and attribution purposes.
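The idea of fusing several daily signals with a Kalman filter can be illustrated with a minimal one-dimensional sketch. This is not the authors' model: it assumes the vote share follows a random walk and is observed each day by several independent noisy signals (for example hypothetical "Like"-share, search-share and poll series), each with a fixed, assumed noise variance. The function name `kalman_fuse` and all default values are hypothetical.

```python
from typing import List, Optional, Sequence


def kalman_fuse(
    daily_signals: Sequence[Sequence[Optional[float]]],
    signal_variances: Sequence[float],
    process_variance: float = 1e-4,
    initial_mean: float = 0.5,
    initial_variance: float = 1.0,
) -> List[float]:
    """Fuse several noisy daily signals into one filtered estimate per day.

    Assumed state model:       x_t = x_{t-1} + w_t,   w_t ~ N(0, process_variance)
    Assumed observation model: z_{t,k} = x_t + v_{t,k}, v_{t,k} ~ N(0, signal_variances[k])
    """
    mean, var = initial_mean, initial_variance
    estimates: List[float] = []
    for observations in daily_signals:
        # Predict: under a random walk the mean carries over and uncertainty grows.
        var += process_variance
        # Update: fold in each available signal as a separate scalar measurement.
        for z, r in zip(observations, signal_variances):
            if z is None:  # signal missing on this day, so skip its update
                continue
            gain = var / (var + r)
            mean += gain * (z - mean)
            var *= 1.0 - gain
        estimates.append(mean)
    return estimates


if __name__ == "__main__":
    # Hypothetical data: two signals over three days, the second one noisier.
    days = [[0.52, 0.60], [0.55, None], [0.54, 0.58]]
    print(kalman_fuse(days, signal_variances=[0.001, 0.01]))
```

Because each update weights a signal by its assumed noise variance, a precise signal pulls the estimate more strongly than a noisy one, and a day with a missing signal simply skips that update.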


Managing ETL Vendors

#artificialintelligence

Only a few short years ago, your local technology vendor (IBM, Oracle, SAP, Microsoft, etc.) was always looking to enhance your on-premises ETL services so you could be in "full control" of your database operations locally. You invested hundreds of thousands of dollars (if not much more!) to build out your local infrastructure to support your data platform of choice. Not to mention all the other costs for hardware, security, environment, expertise, and the list goes on. Today, one of the top projects of any technology leader or CXO is navigating their cloud strategy. Whether you are cloud mature, a hybrid, or just exploring your cloud options, there is a lot of activity around the cloud.


A Mixture of Views Network with Applications to the Classification of Breast Microcalcifications

arXiv.org Machine Learning

In this paper we examine data fusion methods for multi-view data classification. We present a decision concept that explicitly takes into account the input multi-view structure, where for each case there is a different subset of relevant views. This data fusion concept, which we dub Mixture of Views, is implemented by a special-purpose neural network architecture. It is demonstrated on the task of classifying breast microcalcifications as benign or malignant based on craniocaudal (CC) and mediolateral oblique (MLO) mammography views. The single-view decisions are combined into a global decision through a data-driven weighting that reflects the relevance of each view in a given case. The method is evaluated on a large multi-view dataset extracted from the standardized Digital Database for Screening Mammography (DDSM). The experimental results show that our method outperforms previously suggested fusion methods.
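The combination step can be sketched as a gated weighted average of per-view decisions. This is a simplified illustration, not the paper's network: it assumes each view already yields a malignancy probability and that the per-view relevance scores (which the paper derives in a data-driven way) are supplied as plain numbers and turned into weights with a softmax. `mixture_of_views` is a hypothetical helper.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_views(view_probs: Sequence[float], relevance_scores: Sequence[float]) -> float:
    """Combine per-view malignancy probabilities into one global probability.

    Each view's vote is weighted by the softmax of its (assumed, precomputed)
    relevance score, so more relevant views dominate the final decision.
    """
    if len(view_probs) != len(relevance_scores):
        raise ValueError("need exactly one relevance score per view")
    weights = softmax(relevance_scores)
    return sum(w * p for w, p in zip(weights, view_probs))


if __name__ == "__main__":
    # Hypothetical case: the CC view looks malignant and is judged more relevant.
    print(mixture_of_views([0.9, 0.3], [2.0, 0.0]))
```

For example, `mixture_of_views([0.9, 0.3], [2.0, 0.0])` gives the first view a weight of roughly 0.88 and returns about 0.83, while equal relevance scores reduce to a plain average of the views.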


A Beginner's Guide to Data Engineering – Part II

@machinelearnbot

In A Beginner's Guide to Data Engineering -- Part I, I explained that an organization's analytics capability is built in layers. From collecting raw data and building data warehouses to applying Machine Learning, we saw why data engineering plays a critical role in all of these areas. One of the most sought-after data engineering skills is the ability to design, build, and maintain data warehouses. I defined what data warehousing is and discussed its three common building blocks -- Extract, Transform, and Load, which is where the name ETL comes from. For those who are new to ETL processes, I introduced a few popular open source frameworks built by companies like LinkedIn, Pinterest, and Spotify, and highlighted Airbnb's own open-sourced tool, Airflow.
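The three ETL building blocks can be illustrated with a toy pipeline that uses only the Python standard library, not Airflow or any of the frameworks mentioned above. The CSV data, the `sales` table and the cleaning rules are hypothetical and chosen only to show one step of each kind.

```python
import csv
import io
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical raw export: one row has a missing amount, one has inconsistent casing.
RAW_CSV = """order_id,store,amount
1,London,19.99
2,Leeds,
3,london,5.01
"""


def extract(raw_text: str) -> List[Dict[str, str]]:
    """Extract: read raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows: List[Dict[str, str]]) -> List[Tuple[int, str, float]]:
    """Transform: drop incomplete rows and normalise types and store names."""
    cleaned = []
    for row in rows:
        if not row["amount"]:  # skip rows with a missing amount
            continue
        cleaned.append(
            (int(row["order_id"]), row["store"].strip().title(), round(float(row["amount"]), 2))
        )
    return cleaned


def load(conn: sqlite3.Connection, records: List[Tuple[int, str, float]]) -> None:
    """Load: write the cleaned records into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (order_id INTEGER PRIMARY KEY, store TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database stands in for a warehouse
    load(conn, transform(extract(RAW_CSV)))
    print(conn.execute("SELECT store, SUM(amount) FROM sales GROUP BY store").fetchall())
```

Keeping each stage as a separate function mirrors how orchestration tools schedule extract, transform and load as distinct, independently retryable tasks.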


Royal Philips Partners With Samsung on IoT, AI - M2M Magazine

#artificialintelligence

LAS VEGAS, NV – MARCH 8, 2018 – Royal Philips (NYSE: PHG, AEX: PHIA), a global leader in health technology, and Samsung Electronics Co. Ltd. today announced plans for a strategic partnership to connect Samsung's ARTIK Smart IoT Platform to the Philips HealthSuite Digital Platform. This collaboration will ultimately allow the Samsung ARTIK ecosystem of connected devices to safely access and share information with Philips' cloud platform. Healthcare application developers will be able to realize interoperable connected health solutions using integrated data sets and innovative HealthSuite services such as advanced health analytics. "This collaboration will enable healthcare application developers to focus on the development of innovative applications rather than on the technical integration of devices," said Dale Wiggins, General Manager Philips HealthSuite Digital Platform at Philips. "By strengthening our HealthSuite ecosystem with Samsung ARTIK, we will be taking another important step in breaking down the silos in today's healthcare domain to create a trusted and seamless care experience for both consumers and care professionals."


Getting Personal with Big Data in Insurance: Strategies for Mastering Massive Amounts of Data - Global IQX

#artificialintelligence

Telemetry, IoT, wearables, AI, chatbots and drones are tools that help group insurers better engage with customers and improve business processes. There is one thing that all of these technologies have in common: data. Personal data, to be precise. Exactly how insurers mine, manage and use the massive amounts of data now available from various internal and external sources may mean the difference between data mastery and data mystery for many carriers. In this blog, I'll outline a few things carriers can start to think about as they incorporate big data into their corporate strategies.