Collaborating Authors

Tackling Climate Change with Machine Learning

arXiv.org Artificial Intelligence

Climate change is one of the greatest challenges facing humanity, and we, as machine learning experts, may wonder how we can help. Here we describe how machine learning can be a powerful tool in reducing greenhouse gas emissions and helping society adapt to a changing climate. From smart grids to disaster management, we identify high impact problems where existing gaps can be filled by machine learning, in collaboration with other fields. Our recommendations encompass exciting research questions as well as promising business opportunities. We call on the machine learning community to join the global effort against climate change.


Diversifying Database Activity Monitoring with Bandits

arXiv.org Artificial Intelligence

Database activity monitoring (DAM) systems are commonly used by organizations to protect their data, knowledge, and intellectual property. To protect an organization's databases, DAM systems have two main roles: monitoring (documenting activity) and alerting on anomalous activity. Due to high-velocity streams and operating costs, such systems are restricted to examining only a sample of the activity. Current solutions use policies, manually crafted by experts, to decide which transactions to monitor and log. This limits the diversity of the data collected. Bandit algorithms, which use reward functions as the basis for optimization while adding diversity to the recommended set, have gained increased attention in recommendation systems for improving diversity. In this work, we redefine the data sampling problem as a special case of the multi-armed bandit (MAB) problem and present a novel algorithm that combines expert knowledge with random exploration. We analyze the effect of diversity on coverage and downstream event detection tasks using a simulated dataset. We find that adding diversity to the sampling using the bandit-based approach works well for this task, maximizing population coverage without decreasing the quality of the alerts issued about events.
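The idea of combining expert knowledge with random exploration can be illustrated with a generic epsilon-greedy selection. This is only a minimal sketch, not the algorithm from the paper: the function name `select_transactions`, the `expert_scores` mapping, and the `epsilon` and `budget` parameters are all hypothetical names chosen for illustration.

```python
import random


def select_transactions(expert_scores, budget, epsilon=0.2, rng=None):
    """Pick up to `budget` distinct items to monitor.

    expert_scores: dict mapping an item (e.g. a transaction class) to a
                   hypothetical expert-assigned priority (higher = more important).
    epsilon:       probability of exploring a random remaining item instead of
                   taking the best-ranked one (assumed, not from the paper).
    """
    rng = rng or random.Random()
    # Rank items by expert priority, best first.
    remaining = sorted(expert_scores, key=expert_scores.get, reverse=True)
    chosen = []
    while remaining and len(chosen) < budget:
        if rng.random() < epsilon:
            pick = rng.choice(remaining)  # exploration: adds diversity
        else:
            pick = remaining[0]           # exploitation: follow the expert policy
        remaining.remove(pick)
        chosen.append(pick)
    return chosen


if __name__ == "__main__":
    scores = {"admin_login": 0.9, "bulk_export": 0.7, "routine_read": 0.1}
    print(select_transactions(scores, budget=2, epsilon=0.3, rng=random.Random(0)))
```

With `epsilon=0` the selection reduces to the expert policy alone. Raising `epsilon` trades some adherence to that policy for broader coverage of the population.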



Big Data Meet Cyber-Physical Systems: A Panoramic Survey

arXiv.org Machine Learning

The world is witnessing an unprecedented growth of cyber-physical systems (CPS), which are foreseen to revolutionize our world by creating new services and applications in a variety of sectors such as environmental monitoring, mobile-health systems, intelligent transportation systems and so on. The information and communication technology (ICT) sector is experiencing a significant growth in data traffic, driven by the widespread usage of smartphones, tablets and video streaming, along with the significant growth of sensor deployments anticipated in the near future. This is expected to greatly increase the growth rate of raw sensed data. In this paper, we present the CPS taxonomy by providing a broad overview of data collection, storage, access, processing and analysis. Compared with other survey papers, this is the first panoramic survey on big data for CPS, where our objective is to provide a panoramic summary of different CPS aspects. Furthermore, CPS require cybersecurity to protect them against malicious attacks and unauthorized intrusion, which becomes a challenge with the enormous amount of data that is continuously being generated in the network. Thus, we also provide an overview of the different security solutions proposed for CPS big data storage, access and analytics. We also discuss big data meeting green challenges in the context of CPS.


Zhang

AAAI Conferences

Recent developments in SCADA (Supervisory Control and Data Acquisition) systems for physical infrastructure, such as high-pressure gas pipeline systems and electric grids, have generated enormous amounts of time series data. This data brings great opportunities for advanced knowledge discovery and data mining methods to identify system failures faster and earlier than operation experts. This paper presents our effort, in collaboration with a utility company, to solve a grand challenge: using advanced data mining methods to detect leaks on a high-pressure gas transmission system. Leak detection models based on unsupervised learning were developed by analyzing billions of data records to identify leaks of different sizes and impacts, with very low false positive rates. In particular, our solution was able to identify small leaks leading to rupture events. The model also identified small leaks not identifiable with current detection systems. Such high-fidelity early identification enables operation personnel to take preventive measures against possible catastrophic events. We then formulate several generic detection methods with models derived from time series anomaly detection methods. We show that our leak detection models are superior to the SCADA alarm system, a mass balance model and other generic time series anomaly detection models in terms of both detection accuracy and computation time.
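For readers unfamiliar with the generic time series anomaly detection baselines mentioned above, one common example is a rolling z-score detector. The sketch below is an assumption for illustration only and is not the paper's leak detection model: the function name `rolling_zscore_anomalies`, the `window` and `threshold` values, and the sample pressure readings are all hypothetical.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(series, window=5, threshold=3.0):
    """Return indices whose value deviates strongly from the preceding window.

    A point is flagged when it lies more than `threshold` sample standard
    deviations away from the mean of the previous `window` readings.
    """
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Skip perfectly flat windows to avoid division by zero.
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical pressure readings with one sudden drop at index 7.
    pressure = [100.0, 100.1, 99.9, 100.0, 100.1, 99.9, 100.0, 90.0, 100.0, 100.1]
    print(rolling_zscore_anomalies(pressure))
```

A detector like this reacts only to abrupt changes relative to recent history, so slow, small leaks of the kind the abstract describes would likely need a more specialized model.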