Bayesian Inference of Spreading Processes on Networks

Machine Learning

Infectious diseases are studied to understand their spreading mechanisms, to evaluate control strategies and to predict the risk and course of future outbreaks. Because people only interact with a small number of individuals, and because the structure of these interactions matters for spreading processes, the pairwise relationships between individuals in a population can be usefully represented by a network. Although the underlying processes of transmission are different, the network approach can be used to study the spread of pathogens in a contact network or the spread of rumors in an online social network. We study simulated simple and complex epidemics on synthetic networks and on two empirical networks: a social/contact network in an Indian village and an online social network in the U.S. Our goal is to learn simultaneously about the spreading process parameters and the source node (the first infected node) of the epidemic, given a fixed and known network structure and observations of the states of nodes at several points in time. Our inference scheme is based on approximate Bayesian computation (ABC), an inference technique for complex models whose likelihood functions are either expensive to evaluate or analytically intractable. ABC enables us to adopt a Bayesian approach to the problem despite the posterior distribution being very complex. Our method is agnostic about the topology of the network and the nature of the spreading process. It generally performs well and, somewhat counter-intuitively, the inference problem appears to be easier on more heterogeneous network topologies, which enhances its future applicability to real-world settings, where few networks have homogeneous topologies.
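
The idea of inferring the source node and a spreading parameter jointly with ABC can be illustrated with a minimal sketch of plain rejection ABC, the simplest ABC variant and not necessarily the scheme used in the paper. Everything named here is an assumption made for illustration: the discrete-time SI model in `simulate_si`, the snapshot `distance`, the uniform priors, the tolerance `epsilon`, and the toy ring network.

```python
import random


def simulate_si(adj, source, beta, steps, rng):
    """Discrete-time SI epidemic (illustrative model, assumed here).

    At every step each infected node infects each susceptible neighbor
    independently with probability beta. Returns one snapshot of the
    infected set per step.
    """
    infected = {source}
    snapshots = []
    for _ in range(steps):
        newly = set()
        for u in sorted(infected):  # sorted for reproducible draws
            for v in adj[u]:
                if v not in infected and rng.random() < beta:
                    newly.add(v)
        infected |= newly
        snapshots.append(frozenset(infected))
    return snapshots


def distance(sim, obs):
    """Number of node states that differ, summed over all snapshots."""
    return sum(len(a ^ b) for a, b in zip(sim, obs))


def abc_rejection(adj, observed, steps, n_draws, epsilon, rng):
    """Keep prior draws whose simulated epidemic lies within epsilon of the data."""
    nodes = sorted(adj)
    accepted = []
    for _ in range(n_draws):
        beta = rng.random()         # uniform prior on [0, 1) (assumption)
        source = rng.choice(nodes)  # uniform prior over nodes (assumption)
        sim = simulate_si(adj, source, beta, steps, rng)
        if distance(sim, observed) <= epsilon:
            accepted.append((source, beta))
    return accepted


# Toy ring network with 8 nodes (hypothetical, not one of the paper's networks).
ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
rng = random.Random(0)
observed = simulate_si(ring, source=3, beta=0.6, steps=4, rng=rng)
posterior = abc_rejection(ring, observed, steps=4, n_draws=2000, epsilon=2, rng=rng)
```

The accepted `(source, beta)` pairs in `posterior` approximate the joint posterior; shrinking `epsilon` tightens the approximation at the cost of fewer accepted draws. No likelihood is evaluated anywhere, which is the property that makes ABC usable for such models.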

Qualitative Reasoning for Financial Assessments: A Prospectus

AI Magazine

Most high-performance expert systems rely primarily on an ability to represent surface knowledge about associations between observable evidence or data, on the one hand, and hypotheses or classifications of interest, on the other. Although the present generation of practical systems shows that this architectural style can be pushed quite far, the limitations of current systems motivate a search for representations that would allow expert systems to move beyond the prevalent "symptom-disease" style. One approach that appears promising is to couple a rule-based or associational system module with some other computational model of the phenomenon or domain of interest. According to this approach, the domain knowledge captured in the second model would be selected to complement the associational knowledge represented in the first module. Simulation models have been especially attractive choices for the complementary representation because of the causal relations embedded in them (Brown & Burton, 1975; Cuena, 1983).

ADEPOS: A Novel Approximate Computing Framework for Anomaly Detection Systems and its Implementation in 65nm CMOS

Machine Learning

To overcome the energy and bandwidth limitations of traditional IoT systems, edge computing or information extraction at the sensor node has become popular. This makes it important to create very low-energy information extraction or pattern recognition systems. In this paper, we present an approximate computing method to reduce the computation energy of a specific type of IoT system used for anomaly detection (e.g., in predictive maintenance or epileptic seizure detection). Termed Anomaly Detection Based Power Savings (ADEPOS), our proposed method uses low-precision computing and low-complexity neural networks at the beginning, when it is easy to distinguish healthy data. On the detection of anomalies, however, the complexity of the network and the computing precision are adaptively increased for accurate predictions. We show that ensemble approaches are well suited for adaptively changing network size. To validate our proposed scheme, a chip has been fabricated in a UMC 65nm process that includes an MSP430 microprocessor along with an on-chip switching-mode DC-DC converter for dynamic voltage and frequency scaling. Using the NASA bearing dataset for machine health monitoring, we show that ADEPOS achieves an 8.95X energy saving over the lifetime without losing any detection accuracy. The energy savings are obtained by reducing the execution time of the neural network on the microprocessor.
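
The adaptive idea described above, cheap evaluation while data look healthy and more ensemble members once an anomaly is suspected, can be sketched as follows. This is an illustration under stated assumptions rather than the ADEPOS implementation: the function name `adaptive_ensemble_score`, the doubling schedule, the threshold, and the toy distance-based detectors standing in for neural networks are all hypothetical, and precision scaling is not modeled.

```python
from statistics import mean


def adaptive_ensemble_score(x, members, threshold, start=1):
    """Evaluate ensemble members incrementally.

    Begins with `start` members and doubles the number in use only while
    the averaged anomaly score exceeds `threshold`. Returns a pair
    (is_anomaly, members_used).
    """
    k = start
    while True:
        score = mean(m(x) for m in members[:k])
        if score <= threshold or k >= len(members):
            return score > threshold, k
        k = min(2 * k, len(members))


# Hypothetical detectors: each scores a sample by its distance from a
# slightly different estimate of the "healthy" operating point.
centers = (0.0, 0.1, -0.1, 0.05)
members = [lambda x, c=c: abs(x - c) for c in centers]

healthy = adaptive_ensemble_score(0.2, members, threshold=1.0)
suspect = adaptive_ensemble_score(3.0, members, threshold=1.0)
```

In this sketch a healthy-looking sample is settled by a single member, while a suspicious one is re-scored by the full ensemble before it is flagged, so the average cost per sample stays low as long as anomalies are rare.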

OMG, I Have to Tweet that! A Study of Factors that Influence Tweet Rates

AAAI Conferences

Many studies have shown that social data such as tweets are a rich source of information about the real world, including, for example, insights into health trends. A key limitation when analyzing Twitter data, however, is that it depends on people self-reporting their own behaviors and observations. In this paper, we present a large-scale quantitative analysis of some of the factors that influence self-reporting bias. In our study, we compare a year of tweets about weather events to ground-truth knowledge about actual weather occurrences. For each weather event we calculate how extreme, how expected, and how big a change the event represents. We then calculate the extent to which these factors can explain the daily variations in tweet rates about weather events. We find that global models that combine basic weather information with the extremeness, expectation and change calculations account for over 40% of the variability in tweet rates. Location-specific models (one model per metropolitan area) account for an average of 70% of the variability in tweet rates.
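
Statements such as "accounts for 40% of the variability" are usually read as a coefficient of determination, R^2 = 1 - SS_res / SS_tot. The sketch below shows that computation for a single-feature least-squares fit. It is a simplification for illustration only: the study combines several factors, and the function names and the toy numbers here are made up and are not the study's data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(ys, predictions):
    """Share of the variance in ys that the predictions account for."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Toy numbers (invented): a hypothetical "extremeness" score per day and
# the tweet rate observed on that day.
extremeness = [0.0, 1.0, 2.0, 3.0, 4.0]
tweet_rate = [1.0, 2.5, 2.0, 4.5, 5.0]

slope, intercept = fit_line(extremeness, tweet_rate)
predicted = [slope * x + intercept for x in extremeness]
explained = r_squared(tweet_rate, predicted)
```

A value of `explained` near 1 means the feature tracks the tweet rate closely; a value near 0 means it does little better than always predicting the mean rate.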

Fooled by Twitter Data


Data scientists must always remember that data sets are not objective - they are selected, collected, filtered, structured and analyzed by human design. Naked and hidden biases in selecting, collecting, structuring and analyzing data present serious risks. For example, a recent Wall Street Journal article entitled "Tweets Provide New Way to Gauge TV Audiences" provides evidence of a disconnect between mainstream viewers and folks who use Twitter. The chart above shows the disconnect between the most popular and most tweeted shows - the most tweeted show is not a top ten show. While Twitter data can be useful for detecting trends and sentiments for certain areas (e.g., disease surveillance, natural disaster surveillance, product sentiments, financial trading, politics) in limited circumstances using scientific methods, it can also mislead and present a false view of reality.