Performance Analysis
Adaptive XGBoost for Evolving Data Streams
Montiel, Jacob, Mitchell, Rory, Frank, Eibe, Pfahringer, Bernhard, Abdessalem, Talel, Bifet, Albert
Boosting is an ensemble method that combines base models in a sequential manner to achieve high predictive accuracy. A popular learning algorithm based on this ensemble method is eXtreme Gradient Boosting (XGB). We present an adaptation of XGB for classification of evolving data streams. In this setting, new data arrives over time and the relationship between the class and the features may change in the process, thus exhibiting concept drift. The proposed method creates new members of the ensemble from mini-batches of data as new data becomes available. The maximum ensemble size is fixed, but learning does not stop when this size is reached because the ensemble is updated on new data to ensure consistency with the current concept. We also explore the use of concept drift detection to trigger a mechanism to update the ensemble. We test our method on real and synthetic data with concept drift and compare it against batch-incremental and instance-incremental classification methods for data streams.
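The mini-batch scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: the base learner is a trivial majority-class model standing in for a boosted tree, members vote rather than being boosted, and the names `MiniBatchEnsemble`, `max_size` and `batch_size` are hypothetical. Dropping the oldest member once the ensemble is full is likewise an assumed update strategy.

```python
from collections import Counter, deque


class MajorityClassModel:
    """Stand-in base learner (assumption): predicts the most common label of its batch."""

    def __init__(self, labels):
        self.label = Counter(labels).most_common(1)[0][0]

    def predict(self, x):
        return self.label


class MiniBatchEnsemble:
    """Fixed-size ensemble that keeps learning after reaching its maximum size."""

    def __init__(self, max_size=3, batch_size=4):
        # deque(maxlen=...) silently discards the oldest member when full
        self.members = deque(maxlen=max_size)
        self.batch_size = batch_size
        self.buffer = []

    def partial_fit(self, x, y):
        # Buffer instances until a full mini-batch is available
        self.buffer.append((x, y))
        if len(self.buffer) == self.batch_size:
            labels = [label for _, label in self.buffer]
            self.members.append(MajorityClassModel(labels))
            self.buffer = []

    def predict(self, x):
        if not self.members:
            return None
        votes = Counter(member.predict(x) for member in self.members)
        return votes.most_common(1)[0][0]


# Toy stream with an abrupt concept drift: label 0 at first, then label 1
ensemble = MiniBatchEnsemble(max_size=3, batch_size=4)
stream = [(0.0, 0)] * 8 + [(0.0, 1)] * 12
for x, y in stream:
    ensemble.partial_fit(x, y)
prediction = ensemble.predict(0.0)
```

In this toy stream the members trained on the old concept are gradually pushed out, so the vote follows the new concept once enough post-drift mini-batches have arrived.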
Automatic Dialogic Instruction Detection for K-12 Online One-on-one Classes
Xu, Shiting, Ding, Wenbiao, Liu, Zitao
Online one-on-one classes are designed to provide a highly interactive and immersive learning experience, and they demand a large number of qualified online instructors. In this work, we develop six dialogic instructions to help teachers achieve the benefits of the one-on-one learning paradigm. Moreover, we use neural language models, i.e., long short-term memory (LSTM) networks, to detect these six instructions automatically. Experiments demonstrate that the LSTM approach achieves AUC scores from 0.840 to 0.979 across all six types of instructions on our real-world educational dataset.
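For readers unfamiliar with the AUC metric reported above, the following sketch computes it as the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical detector scores for one instruction type
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
auc = roc_auc(toy_labels, toy_scores)
```

Here three of the four positive-negative pairs are ordered correctly, so the toy AUC is 0.75.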
Statistical Equity: A Fairness Classification Objective
Mehrabi, Ninareh, Huang, Yuzhong, Morstatter, Fred
Machine learning systems have been shown to propagate the societal errors of the past. In light of this, a wealth of research focuses on designing solutions that are "fair." Even with this abundance of work, there is no singular definition of fairness, mainly because fairness is subjective and context dependent. We propose a new fairness definition, motivated by the principle of equity, that considers existing biases in the data and attempts to make equitable decisions that account for these previous historical biases. We formalize our definition of fairness, and motivate it with its appropriate contexts. Next, we operationalize it for equitable classification. We perform multiple automatic and human evaluations to show the effectiveness of our definition and demonstrate its utility for aspects of fairness, such as the feedback loop.
Simultaneous imputation and disease classification in incomplete medical datasets using Multigraph Geometric Matrix Completion (MGMC)
Vivar, Gerome, Kazi, Anees, Burwinkel, Hendrik, Zwergal, Andreas, Navab, Nassir, Ahmadi, Seyed-Ahmad
Large-scale population-based studies in medicine are a key resource towards better diagnosis, monitoring, and treatment of diseases. They also serve as enablers of clinical decision support systems, in particular Computer Aided Diagnosis (CADx) using machine learning (ML). Numerous ML approaches for CADx have been proposed in the literature. However, these approaches assume full data availability, which is not always feasible in clinical data. To account for missing data, incomplete data samples are either removed or imputed, which could lead to data bias and may negatively affect classification performance. As a solution, we propose end-to-end learning of imputation and disease prediction on incomplete medical datasets via Multigraph Geometric Matrix Completion (MGMC). MGMC uses multiple recurrent graph convolutional networks, where each graph represents an independent population model based on a key clinical meta-feature like age, sex, or cognitive function. Graph signal aggregation from local patient neighborhoods, combined with multigraph signal fusion via self-attention, has a regularizing effect on both matrix reconstruction and classification performance. Our proposed approach is able to impute class-relevant features as well as perform accurate classification on two publicly available medical datasets. We empirically show the superiority of our proposed approach in terms of classification and imputation performance when compared with state-of-the-art approaches. MGMC enables disease prediction in multimodal and incomplete medical datasets. These findings could serve as a baseline for future CADx approaches that utilize incomplete datasets.
AI and Machine Learning in Cyber Security
Zen monks have been using a tool called a 'koan' for hundreds of years to assist them in reaching enlightenment. These koans are like riddles or stories that can only be solved by letting go of one's narrowing beliefs and stories about how things should be. Zen students sit in silent meditation and observe how the koan is working on them, slowly transforming their way of looking at the world and revealing a tiny piece of the path to nirvana, that place of no suffering. "Zen is like a man hanging by his teeth in a tree over a precipice. His hands grasp no branch, his feet rest on no limb, and under the tree another man asks him, 'Why did Bodhidharma come to China from the West?' If the man in the tree does not answer, he misses the question, and if he answers, he falls and loses his life. Now what shall he do?" -- Zen Koan -- Case 5 of the Gateless Gate Collection.
How accurate are the results from self-testing for covid-19 at home?
IN THE UK, essential workers are now among those being sent home testing kits for coronavirus. This involves swabbing the inside of your own nose and the back of your throat, but how useful are the results? Studies from early in the outbreak in China have suggested that swabs taken by healthcare professionals may give a 30 per cent "false negative" rate, where infected people are told they don't have the virus (NEJM, doi.org/ggmzsp; medRxiv, doi.org/dvfr). This has prompted claims that self-testing will give even more false negatives and could raise the risk of infected people spreading the virus. No test is perfect: swabbing technique and analysis errors can lead to inaccurate results.
Crackovid: Optimizing Group Testing
Abraham, Louis, Bécigneul, Gary, Schölkopf, Bernhard
We study the problem usually referred to as group testing in the context of COVID-19. Given $n$ samples taken from patients, how should we select mixtures of samples to be tested, so as to maximize information and minimize the number of tests? We consider both adaptive and non-adaptive strategies, and take a Bayesian approach with a prior both for infection of patients and test errors. We start by proposing a mathematically principled objective, grounded in information theory. We then optimize non-adaptive strategies using genetic algorithms, and leverage the mathematical framework of adaptive sub-modularity to obtain theoretical guarantees for the greedy-adaptive method.
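To make the idea of "maximizing information" from a pooled test concrete, here is a deliberately simplified sketch. It is not the objective proposed in the paper: it assumes independent infections with a common prior `p`, a perfect test with no errors, and a single pool, and it simply picks the pool size whose outcome has the highest entropy. All function names are hypothetical.

```python
import math


def binary_entropy(q):
    """Entropy in bits of a yes/no outcome that is positive with probability q."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)


def pool_positive_prob(p, k):
    """Probability that a pool of k independent samples contains an infection."""
    return 1.0 - (1.0 - p) ** k


def most_informative_pool_size(p, max_k):
    """Pool size in 1..max_k whose (error-free) test outcome carries the most bits."""
    return max(
        range(1, max_k + 1),
        key=lambda k: binary_entropy(pool_positive_prob(p, k)),
    )


# With an assumed 5% prior, the best pool is the one that is positive about half the time
best_k = most_informative_pool_size(0.05, 50)
best_q = pool_positive_prob(0.05, best_k)
```

A single test yields at most one bit, reached when the pool is positive with probability one half; this is why low prevalence favours larger pools. Accounting for test errors and for several pools at once, as the abstract describes, requires a richer model than this sketch.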
Foundations and modelling of dynamic networks using Dynamic Graph Neural Networks: A survey
Skarding, Joakim, Gabrys, Bogdan, Musial, Katarzyna
Dynamic networks are used in a wide range of fields, including social network analysis, recommender systems and epidemiology. Representing complex networks as structures changing over time allows network models to leverage not only structural but also temporal patterns. However, as dynamic network literature stems from diverse fields and makes use of inconsistent terminology, it is challenging to navigate. Meanwhile, graph neural networks (GNNs) have gained a lot of attention in recent years for their ability to perform well on a range of network science tasks, such as link prediction and node classification. Despite the popularity of graph neural networks and the proven benefits of dynamic network models, there has been little focus on graph neural networks for dynamic networks. We aim to provide a review that demystifies dynamic networks, introduces dynamic graph neural networks (DGNNs) and appeals to researchers with a background in either network science or data science. We contribute: (i) a comprehensive dynamic network taxonomy, (ii) a survey of dynamic graph neural networks and (iii) an overview of how dynamic graph neural networks can be used for dynamic link prediction.
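As a rough illustration of the link prediction task mentioned above, the sketch below stores a dynamic network as a list of edge-set snapshots and scores unlinked node pairs by their number of common neighbours. This is a classical non-neural baseline chosen for brevity, not a DGNN and not a method from the survey; it also ignores the order of the snapshots, which is exactly the temporal information dynamic models aim to exploit. The snapshot data and function name are hypothetical.

```python
from itertools import combinations


def common_neighbor_scores(snapshots):
    """Score unlinked node pairs by shared neighbours in the union of all snapshots."""
    neighbors = {}
    for edges in snapshots:
        for u, v in edges:
            neighbors.setdefault(u, set()).add(v)
            neighbors.setdefault(v, set()).add(u)
    scores = {}
    for u, v in combinations(sorted(neighbors), 2):
        if v not in neighbors[u]:  # only pairs that were never linked
            shared = len(neighbors[u] & neighbors[v])
            if shared:
                scores[(u, v)] = shared
    return scores


# Three snapshots of a small undirected network (edges appear and disappear)
snapshots = [
    {("a", "b")},
    {("a", "b"), ("b", "c")},
    {("b", "c"), ("b", "d")},
]
scores = common_neighbor_scores(snapshots)
```

Every candidate pair here shares the single neighbour `b`, so the baseline cannot rank them; a model that uses snapshot order could, for instance, prefer pairs whose shared links are more recent.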
Coronavirus Update: Trump Exempted From Wearing Face Mask At White House
On April 3, the U.S. Centers for Disease Control and Prevention (CDC) recommended "wearing cloth face coverings in public settings where other social distancing measures are difficult to maintain (e.g., grocery stores and pharmacies) especially in areas of significant community-based transmission" of COVID-19. Despite a plethora of health experts telling it to do so since then, the White House only complied with this health guidance Monday. It sent an email to staffers ordering all of them to wear face masks inside the building. White House staffers can take off their masks while they're seated at their desks and are able to maintain six feet of distance from others. Incredibly, President Donald Trump is exempted from this order, aides told The Washington Post.
New U.S. plans reimagine fighting wildfires amid virus risks
In new plans that offer a national reimagining of how to fight wildfires amid the risk of the coronavirus spreading through crews, it's not clear how officials will get the testing and equipment needed to keep firefighters safe in what's expected to be a difficult fire season. A U.S. group instead put together broad guidelines to consider when sending crews to blazes, with agencies and firefighting groups in different parts of the country able to tailor them to fit their needs. The wildfire season has largely begun, and states in the American West that have suffered catastrophic blazes in recent years could see higher-than-normal levels of wildfire because of drought. "This plan is intended to provide a higher-level framework of considerations and not specific operational procedures," the National Multi-Agency Coordination Group, made up of representatives from federal agencies who worked with state and local officials, wrote in each of the regional plans. "It is not written in terms of 'how to' but instead provides considerations of 'what,' 'why,' and 'where.'"