Performance Analysis
Treatment Effect Estimation with Data-Driven Variable Decomposition
Kuang, Kun (Tsinghua University) | Cui, Peng (Tsinghua University) | Li, Bo (Tsinghua University) | Jiang, Meng (University of Illinois Urbana-Champaign) | Yang, Shiqiang (Tsinghua University) | Wang, Fei (Cornell University)
One fundamental problem in causal inference is treatment effect estimation in observational studies when variables are confounded. Control for confounding effects is generally handled by the propensity score. But this approach treats all observed variables as confounders and ignores the adjustment variables, which have no influence on treatment but are predictive of the outcome. Recently, it has been demonstrated that the adjustment variables are effective in reducing the variance of the estimated treatment effect. However, how to automatically separate the confounders and adjustment variables in observational studies is still an open problem, especially in scenarios with high-dimensional variables, which are common in the big data era. In this paper, we propose a Data-Driven Variable Decomposition (D$^2$VD) algorithm, which can 1) automatically separate confounders and adjustment variables with a data-driven approach, and 2) simultaneously estimate the treatment effect in observational studies with high-dimensional variables. Under standard assumptions, we show experimentally that the proposed D$^2$VD algorithm can automatically separate the variables precisely, and estimate the treatment effect more accurately and with tighter confidence intervals than state-of-the-art methods on both synthetic data and a real online advertising dataset.
A Leukocyte Detection Technique in Blood Smear Images Using Plant Growth Simulation Algorithm
Bhattacharjee, Deblina (Kyungpook National University) | Paul, Anand (Kyungpook National University)
For quite some time, the analysis of leukocyte images has drawn significant attention from the fields of medicine and computer vision alike, where various techniques have been used to automate the manual analysis and classification of such images. Analysing such samples manually to detect leukocytes is time-consuming and prone to error, as the cells have different morphological features. Therefore, in order to automate and optimize the process, the nature-inspired Plant Growth Simulation Algorithm (PGSA) is applied in this paper. We present an automated technique for detecting white blood cells embedded in obscured, stained and smeared images of blood samples. It is based on a random bionic algorithm and uses a fitness function that measures the similarity of a generated candidate solution to an actual leukocyte. As the proposed algorithm proceeds, the set of candidate solutions evolves, guaranteeing their fit with the actual leukocytes outlined in the edge map of the image. The experimental results on stained images indicate that the proposed method achieves higher precision and sensitivity than existing methods. Further, the proposed method reduces the feasible sets of candidate points in each iteration, thereby decreasing the run time required for load flow and objective function evaluation, and thus reaching the goal state in minimum time and within the desired constraints.
A Projection Based Conditional Dependence Measure with Applications to High-dimensional Undirected Graphical Models
Fan, Jianqing, Feng, Yang, Xia, Lucy
Measuring conditional dependence is an important topic in statistics with broad applications including graphical models. Under a factor model setting, a new conditional dependence measure based on projection is proposed. The corresponding conditional independence test is developed with the asymptotic null distribution unveiled where the number of factors could be high-dimensional. It is also shown that the new test has control over the asymptotic significance level and can be calculated efficiently. A generic method for building dependency graphs without Gaussian assumption using the new test is elaborated. Numerical results and real data analysis show the superiority of the new method.
metboost: Exploratory regression analysis with hierarchically clustered data
Miller, Patrick J., McArtor, Daniel B., Lubke, Gitta H.
As data collections become larger, exploratory regression analysis becomes more important but more challenging. When observations are hierarchically clustered, the problem is even more challenging because model selection with mixed effect models can produce misleading results when nonlinear effects are not included in the model (Bauer and Cai, 2009). A machine learning method called boosted decision trees (Friedman, 2001) is a good approach for exploratory regression analysis in real data sets because it can detect predictors with nonlinear and interaction effects while also accounting for missing data. We propose an extension to boosted decision trees called metboost for hierarchically clustered data. It works by constraining the structure of each tree to be the same across groups, but allowing the terminal node means to differ. This allows predictors and split points to lead to different predictions within each group, and approximates nonlinear group-specific effects. Importantly, metboost remains computationally feasible for thousands of observations and hundreds of predictors that may contain missing values. We apply the method to predict math performance for 15,240 students from 751 schools in data collected in the Educational Longitudinal Study 2002 (Ingels et al., 2007), allowing 76 predictors to have unique effects for each school. Compared to boosted decision trees, metboost improves prediction performance by 15%. Results of a large simulation study show that metboost has up to 70% improved variable selection performance and up to 30% improved prediction performance compared to boosted decision trees when group sizes are small.
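The shared-structure idea can be illustrated with a toy single-split tree. This is only a sketch and not the metboost implementation: the function name `fit_shared_stump`, the fixed split point, and the use of plain per-group means in each terminal node are all assumptions made for illustration.

```python
from collections import defaultdict


def fit_shared_stump(x, y, group, split):
    """Fit one split shared by all groups, with a separate leaf mean per group.

    Hypothetical helper: the split point is given rather than learned, and
    leaf values are simple within-group averages.
    """
    cells = defaultdict(lambda: [0.0, 0])  # (group, leaf) -> [sum of y, count]
    for xi, yi, gi in zip(x, y, group):
        leaf = "left" if xi <= split else "right"
        cell = cells[(gi, leaf)]
        cell[0] += yi
        cell[1] += 1
    # Same tree structure everywhere, but each group keeps its own leaf means.
    return {key: total / count for key, (total, count) in cells.items()}


x = [1, 2, 3, 4, 1, 2, 3, 4]
y = [1.0, 1.0, 5.0, 5.0, 2.0, 2.0, 3.0, 3.0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]
leaf_means = fit_shared_stump(x, y, group, split=2.5)
```

Here both groups are split at the same point, yet the predicted jump across the split is larger in group "a" than in group "b", which is the kind of group-specific effect the abstract describes.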
Joint Attention and Brain Functional Connectivity in Infants and Toddlers Cerebral Cortex
Initiating joint attention (IJA), the behavioral instigation of coordinated focus of 2 people on an object, emerges over the first 2 years of life and supports social-communicative functioning related to the healthy development of aspects of language, empathy, and theory of mind. Deficits in IJA provide strong early indicators for autism spectrum disorder, and therapies targeting joint attention have shown tremendous promise. However, the brain systems underlying IJA in early childhood are poorly understood, due in part to significant methodological challenges in imaging localized brain function that supports social behaviors during the first 2 years of life. Herein, we show that the functional organization of the brain is intimately related to the emergence of IJA using functional connectivity magnetic resonance imaging and dimensional behavioral assessments in a large semilongitudinal cohort of infants and toddlers. In particular, though functional connections spanning the brain are involved in IJA, the strongest brain-behavior associations cluster within connections between a small subset of functional brain networks; namely between the visual network and dorsal attention network and between the visual network and posterior cingulate aspects of the default mode network. These observations mark the earliest known description of how functional brain systems underlie a burgeoning fundamental social behavior, may help improve the design of targeted therapies for neurodevelopmental disorders, and, more generally, elucidate physiological mechanisms essential to healthy social behavior development. The emergence of joint attention (JA), the coordinated orienting of 2 people toward an object or event, occurs during the first 2 years of life, arguably the most dynamic and important period of early child development (Scaife and Bruner 1975). 
It is theorized that engaging in JA lays the foundation for prosocial cooperative behavior, from basic social-communicative functioning and language development (Premack 2004) to sophisticated forms of empathy (Mundy and Jarrold 2010) and theory of mind (Adolphs 2003). In fact, early exhibition of joint attention is strongly associated with later language ability (Morales et al. 2000; Mundy et al. 2007), and atypical development of the initiation of joint attention (IJA) is strongly indicative of autism spectrum disorder (ASD) (Bruinsma et al. 2004). The neural substrates underlying IJA in early childhood are poorly understood (Barak and Feng 2016), due in part to significant methodological challenges in imaging localized brain function that supports social behaviors in children during the first 2 years of life.
Cross validation Deep learning
It seems to me that the above definition of the k-fold cross-validation algorithm (from the Deep Learning book by Ian Goodfellow, Yoshua Bengio and Aaron Courville, 2016) is inconsistent with the common definition of cross-validation. In the above algorithm, the vector $e$ holds the loss computed for every individual example in the dataset $D$, and the mean of $e$ is the estimate of the generalization error. In the standard definition of cross-validation, by contrast, we calculate the test error for each fold and then average those per-fold errors.
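To make the comparison concrete, here is a minimal sketch that computes both estimates side by side. It is not the book's algorithm verbatim: the toy learner (predict the training mean), the squared loss, and the helper names `k_fold_indices` and `cv_estimates` are assumptions chosen only to keep the example small.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cv_estimates(ys, k):
    """Return (mean of per-example losses, mean of per-fold mean losses)."""
    n = len(ys)
    per_example = [0.0] * n  # plays the role of the vector e
    fold_means = []
    for fold in k_fold_indices(n, k):
        held_out = set(fold)
        train = [ys[i] for i in range(n) if i not in held_out]
        prediction = sum(train) / len(train)  # toy learner: training mean
        for i in fold:
            per_example[i] = (ys[i] - prediction) ** 2
        fold_means.append(sum(per_example[i] for i in fold) / len(fold))
    pooled = sum(per_example) / n
    averaged = sum(fold_means) / k
    return pooled, averaged


equal = cv_estimates([1, 2, 3, 4, 5, 6], k=3)       # folds of size 2, 2, 2
unequal = cv_estimates([1, 2, 3, 4, 5, 6, 10], k=3)  # folds of size 3, 2, 2
```

When every fold has the same size, the two estimates coincide up to floating-point rounding, because the mean of equally sized group means equals the overall mean. They differ only when fold sizes are unequal, since the per-fold average then weights examples in smaller folds more heavily.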
Path Assignment Techniques For Vehicle Tracking
Altendorfer, Richard, Wirkert, Sebastian
Many driver assistance systems such as Adaptive Cruise Control require the identification of the closest vehicle that is in the host vehicle's path. This entails an assignment of detected vehicles to the host vehicle path or neighboring paths. After reviewing approaches to the estimation of the host vehicle path and lane assignment techniques we introduce two methods that are motivated by the rationale to filter measured data as late in the processing stages as possible in order to avoid delays and other artifacts of intermediate filters. These filters generate discrete posterior probability distributions from which a path or "lane" index is extracted by a median estimator. The relative performance of those methods is illustrated by a ROC using experimental data and labeled ground truth data.
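As a rough illustration of extracting an index from a discrete posterior with a median estimator, the sketch below returns the smallest index whose cumulative probability reaches one half. The function name `median_index`, the zero-based lane numbering, and the example probabilities are assumptions; the abstract does not specify how the paper implements this step.

```python
def median_index(posterior):
    """Return the smallest index whose cumulative probability is at least 0.5.

    `posterior` is a sequence of non-negative weights over path ("lane")
    indices; it is normalized here in case it does not sum to one.
    """
    total = sum(posterior)
    if total <= 0:
        raise ValueError("posterior must have positive total mass")
    cumulative = 0.0
    for index, weight in enumerate(posterior):
        cumulative += weight / total
        if cumulative >= 0.5:
            return index
    return len(posterior) - 1  # guard against floating-point shortfall


# Hypothetical posterior over four paths, with most mass on index 2.
lane = median_index([0.1, 0.2, 0.6, 0.1])
```

Unlike taking the most probable index, the median depends on the ordering of the paths, which is natural when the indices correspond to lanes arranged from one side of the road to the other.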
State-of-the-Art Machine Learning Automation with HDT
The number of "feature values" is the total number of key-value pairs found, including the small unstable ones, regardless of whether they are classified as good or bad. Any article with a pv above the arbitrary value pv_threshold = 7.1 (see source code) is considered good. This corresponds to articles having about 1.3 times more traffic than average, since we use a log scale and the average pv is 6.81. The traffic for articles classified as good by the algorithm (pv = 8.23) is about 4.2 times the traffic that an average article receives. Also note that we correctly identify the vast majority of good articles, but this is because we work with small nodes. Finally, an article is marked as good if it triggers at least one node marked as good (that is, satisfying the criterion defined in the next subsection). Besides pv_threshold, the algorithm uses 12 parameters to identify a usable, stable node classified as good.
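The traffic ratios quoted above follow from differences on the log scale. The sketch below assumes that pv is a natural logarithm of page views, which the text does not state explicitly; the constant and function names are hypothetical.

```python
import math

AVERAGE_LOG_PV = 6.81  # average pv quoted in the text
PV_THRESHOLD = 7.1     # pv_threshold quoted in the text
GOOD_LOG_PV = 8.23     # pv of articles classified as good


def traffic_ratio(log_pv, baseline=AVERAGE_LOG_PV):
    """Ratio of raw traffic implied by a difference in log-scale pv.

    Assumes natural logarithms, so a difference d corresponds to a factor exp(d).
    """
    return math.exp(log_pv - baseline)


threshold_ratio = traffic_ratio(PV_THRESHOLD)
good_ratio = traffic_ratio(GOOD_LOG_PV)
print(round(threshold_ratio, 2), round(good_ratio, 2))
```

Under this assumption the threshold corresponds to roughly 1.3 times average traffic, matching the text, and the good articles come out at a little over 4.1 times, close to the quoted 4.2. The small gap may come from rounding or from a different averaging convention, which the text does not specify.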
A Gentle Guide to Machine Learning MonkeyLearn Blog
Machine Learning is a subfield within Artificial Intelligence that builds algorithms that allow computers to learn to perform tasks from data instead of being explicitly programmed. We can make machines learn to do things! The first time I heard that, it blew my mind. That means that we can program computers to learn things by themselves! The ability to learn is one of the most important aspects of intelligence. Translating that power to machines sounds like a huge step towards making them more intelligent. And in fact, Machine Learning is the area that is making most of the progress in Artificial Intelligence today. It is a trendy topic right now, and it is pushing forward the possibility of having more intelligent machines.