The increased digitalisation and monitoring of the energy system opens up numerous opportunities % and solutions which can help to decarbonise the energy system. Applications on low voltage (LV), localised networks, such as community energy markets and smart storage will facilitate decarbonisation, but they will require advanced control and management. Reliable forecasting will be a necessary component of many of these systems to anticipate key features and uncertainties. Despite this urgent need, there has not yet been an extensive investigation into the current state-of-the-art of low voltage level forecasts, other than at the smart meter level. This paper aims to provide a comprehensive overview of the landscape, current approaches, core applications, challenges and recommendations. Another aim of this paper is to facilitate the continued improvement and advancement in this area. To this end, the paper also surveys some of the most relevant and promising trends. It establishes an open, community-driven list of the known LV level open datasets to encourage further research and development.
Biclustering is a data mining technique which searches for local patterns in numeric tabular data with main application in bioinformatics. This technique has shown promise in multiple areas, including development of biomarkers for cancer, disease subtype identification, or gene-drug interactions among others. In this paper we introduce EBIC.JL - an implementation of one of the most accurate biclustering algorithms in Julia, a modern highly parallelizable programming language for data science. We show that the new version maintains comparable accuracy to its predecessor EBIC while converging faster for the majority of the problems. We hope that this open source software in a high-level programming language will foster research in this promising field of bioinformatics and expedite development of new biclustering methods for big data.
The emergence of Machine Learning (ML) methods as an effective and easy to use tool for modeling and prediction has opened up new horizons for users in all aspects of business, finance, health, and research [18, 7]. In stark contrast to more traditional statistical methods that require relevant expertise, ML methods constitute largely black-boxes and are based on minimal assumptions. This resulted in adoption of ML from an audience of users with possible expertise in the domain of application but little or no expertise in statistics or computer science. Despite these features, effective use of ML does require good knowledge of the method used. The choice of the method (algorithm), the tuning of its hyperparameters and the architecture requires experience and good understanding of those methods and the data. Naturally, trial and error might provide possible solutions; however it can also result in sub-optimal use of ML and wasted computational resources. The goal of this research is to find optimal values of hyperparameters for a given dataset and a modelling objective given that the relationship between the hyperparameters, model quality (model score) and training cost is not known in advance.
Buluc, Aydin, Kolda, Tamara G., Wild, Stefan M., Anitescu, Mihai, DeGennaro, Anthony, Jakeman, John, Kamath, Chandrika, Ramakrishnan, null, Kannan, null, Lopes, Miles E., Martinsson, Per-Gunnar, Myers, Kary, Nelson, Jelani, Restrepo, Juan M., Seshadhri, C., Vrabie, Draguna, Wohlberg, Brendt, Wright, Stephen J., Yang, Chao, Zwart, Peter
Randomized algorithms have propelled advances in artificial intelligence and represent a foundational research area in advancing AI for Science. Future advancements in DOE Office of Science priority areas such as climate science, astrophysics, fusion, advanced materials, combustion, and quantum computing all require randomized algorithms for surmounting challenges of complexity, robustness, and scalability. This report summarizes the outcomes of that workshop, "Randomized Algorithms for Scientific Computing (RASC)," held virtually across four days in December 2020 and January 2021.
Microservices are becoming the defacto design choice for software architecture. It involves partitioning the software components into finer modules such that the development can happen independently. It also provides natural benefits when deployed on the cloud since resources can be allocated dynamically to necessary components based on demand. Therefore, enterprises as part of their journey to cloud, are increasingly looking to refactor their monolith application into one or more candidate microservices; wherein each service contains a group of software entities (e.g., classes) that are responsible for a common functionality. Graphs are a natural choice to represent a software system. Each software entity can be represented as nodes and its dependencies with other entities as links. Therefore, this problem of refactoring can be viewed as a graph based clustering task. In this work, we propose a novel method to adapt the recent advancements in graph neural networks in the context of code to better understand the software and apply them in the clustering task. In that process, we also identify the outliers in the graph which can be directly mapped to top refactor candidates in the software. Our solution is able to improve state-of-the-art performance compared to works from both software engineering and existing graph representation based techniques.
Our work represents another step into the detection and prevention of these ever-more present political manipulation efforts. We, therefore, start by focusing on understanding what the state-of-the-art approaches lack -- since the problem remains, this is a fair assumption. We find concerning issues within the current literature and follow a diverging path. Notably, by placing emphasis on using data features that are less susceptible to malicious manipulation and also on looking for high-level approaches that avoid a granularity level that is biased towards easy-to-spot and low impact cases. We designed and implemented a framework -- Twitter Watch -- that performs structured Twitter data collection, applying it to the Portuguese Twittersphere. We investigate a data snapshot taken on May 2020, with around 5 million accounts and over 120 million tweets (this value has since increased to over 175 million). The analyzed time period stretches from August 2019 to May 2020, with a focus on the Portuguese elections of October 6th, 2019. However, the Covid-19 pandemic showed itself in our data, and we also delve into how it affected typical Twitter behavior. We performed three main approaches: content-oriented, metadata-oriented, and network interaction-oriented. We learn that Twitter's suspension patterns are not adequate to the type of political trolling found in the Portuguese Twittersphere -- identified by this work and by an independent peer - nor to fake news posting accounts. We also surmised that the different types of malicious accounts we independently gathered are very similar both in terms of content and interaction, through two distinct analysis, and are simultaneously very distinct from regular accounts.
Bharti, Kishor, Cervera-Lierta, Alba, Kyaw, Thi Ha, Haug, Tobias, Alperin-Lea, Sumner, Anand, Abhinav, Degroote, Matthias, Heimonen, Hermanni, Kottmann, Jakob S., Menke, Tim, Mok, Wai-Keong, Sim, Sukin, Kwek, Leong-Chuan, Aspuru-Guzik, Alán
A universal fault-tolerant quantum computer that can solve efficiently problems such as integer factorization and unstructured database search requires millions of qubits with low error rates and long coherence times. While the experimental advancement towards realizing such devices will potentially take decades of research, noisy intermediate-scale quantum (NISQ) computers already exist. These computers are composed of hundreds of noisy qubits, i.e. qubits that are not error-corrected, and therefore perform imperfect operations in a limited coherence time. In the search for quantum advantage with these devices, algorithms have been proposed for applications in various disciplines spanning physics, machine learning, quantum chemistry and combinatorial optimization. The goal of such algorithms is to leverage the limited available resources to perform classically challenging tasks. In this review, we provide a thorough summary of NISQ computational paradigms and algorithms. We discuss the key structure of these algorithms, their limitations, and advantages. We additionally provide a comprehensive overview of various benchmarking and software tools useful for programming and testing NISQ devices.
How can we assess the value of data objectively, systematically and quantitatively? Pricing data, or information goods in general, has been studied and practiced in dispersed areas and principles, such as economics, marketing, electronic commerce, data management, data mining and machine learning. In this article, we present a unified, interdisciplinary and comprehensive overview of this important direction. We examine various motivations behind data pricing, understand the economics of data pricing and review the development and evolution of pricing models according to a series of fundamental principles. We discuss both digital products and data products. We also consider a series of challenges and directions for future work.
Esports, despite its expanding interest, lacks fundamental sports analytics resources such as accessible data or proven and reproducible analytical frameworks. Even Counter-Strike: Global Offensive (CSGO), the second most popular esport, suffers from these problems. Thus, quantitative evaluation of CSGO players, a task important to teams, media, bettors and fans, is difficult. To address this, we introduce (1) a data model for CSGO with an open-source implementation; (2) a graph distance measure for defining distances in CSGO; and (3) a context-aware framework to value players' actions based on changes in their team's chances of winning. Using over 70 million in-game CSGO events, we demonstrate our framework's consistency and independence compared to existing valuation frameworks. We also provide use cases demonstrating high-impact play identification and uncertainty estimation.
We present a model for Bayesian filtering (BF) in discrete dynamic systems where multiple entities (inter)-act, i.e. where the system dynamics is naturally described by a Multiset rewriting system (MRS). Typically, BF in such situations is computationally expensive due to the high number of discrete states that need to be maintained explicitly. We devise a lifted state representation, based on a suitable decomposition of multiset states, such that some factors of the distribution are exchangeable and thus afford an efficient representation. Intuitively, this representation groups together similar entities whose properties follow an exchangeable joint distribution. Subsequently, we introduce a BF algorithm that works directly on lifted states, without resorting to the original, much larger ground representation. This algorithm directly lends itself to approximate versions by limiting the number of explicitly represented lifted states in the posterior. We show empirically that the lifted representation can lead to a factorial reduction in the representational complexity of the distribution, and in the approximate cases can lead to a lower variance of the estimate and a lower estimation error compared to the original, ground representation.