Government
Aligning where to see and what to tell: image caption with region-based attention and scene factorization
Jin, Junqi, Fu, Kun, Cui, Runpeng, Sha, Fei, Zhang, Changshui
Recent progress on automatic generation of image captions has shown that it is possible to describe the most salient information conveyed by images with accurate and meaningful sentences. In this paper, we propose an image caption system that exploits the parallel structures between images and sentences. In our model, the process of generating the next word, given the previously generated ones, is aligned with the visual perception experience where the attention shifting among the visual regions imposes a thread of visual ordering. This alignment characterizes the flow of "abstract meaning", encoding what is semantically shared by both the visual scene and the text description. Our system also makes another novel modeling contribution by introducing scene-specific contexts that capture higher-level semantic information encoded in an image. The contexts adapt language models for word generation to specific scene types. We benchmark our system and contrast it with published results on several popular datasets. We show that using either region-based attention or scene-specific contexts improves over systems without those components. Furthermore, combining these two modeling ingredients attains state-of-the-art performance.
Detectability thresholds and optimal algorithms for community structure in dynamic networks
Ghasemian, Amir, Zhang, Pan, Clauset, Aaron, Moore, Cristopher, Peel, Leto
We study the fundamental limits on learning latent community structure in dynamic networks. Specifically, we study dynamic stochastic block models where nodes change their community membership over time, but where edges are generated independently at each time step. In this setting (which is a special case of several existing models), we are able to derive the detectability threshold exactly, as a function of the rate of change and the strength of the communities. Below this threshold, we claim that no algorithm can identify the communities better than chance. We then give two algorithms that are optimal in the sense that they succeed all the way down to this limit. The first uses belief propagation (BP), which gives asymptotically optimal accuracy, and the second is a fast spectral clustering algorithm, based on linearizing the BP equations. We verify our analytic and algorithmic results via numerical simulation, and close with a brief discussion of extensions and open questions.
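The generative setting described above (memberships that change over time, edges drawn independently at each time step) can be sketched as follows. This is only an illustrative sketch, not the authors' code: the function name, the parameters eta, p_in and p_out, and the rule that a node which changes draws a fresh label uniformly at random are all assumptions made for the example.

```python
import numpy as np

def sample_dynamic_sbm(n, n_groups, n_steps, eta, p_in, p_out, seed=0):
    """Sample labels and adjacency matrices from a simple dynamic SBM.

    Assumptions (not taken from the abstract): with probability eta a node
    keeps its label, otherwise it draws a new label uniformly at random;
    same-community pairs connect with probability p_in, others with p_out.
    """
    rng = np.random.default_rng(seed)

    # Community labels evolve as an independent Markov chain per node.
    labels = np.empty((n_steps, n), dtype=int)
    labels[0] = rng.integers(n_groups, size=n)
    for t in range(1, n_steps):
        keep = rng.random(n) < eta
        fresh = rng.integers(n_groups, size=n)
        labels[t] = np.where(keep, labels[t - 1], fresh)

    # Edges are drawn independently at every time step.
    adj = np.zeros((n_steps, n, n), dtype=np.uint8)
    upper = np.triu_indices(n, k=1)
    for t in range(n_steps):
        same = labels[t][:, None] == labels[t][None, :]
        prob = np.where(same, p_in, p_out)
        snapshot = np.zeros((n, n), dtype=np.uint8)
        snapshot[upper] = rng.random(upper[0].size) < prob[upper]
        adj[t] = snapshot + snapshot.T  # undirected, no self-loops
    return labels, adj

labels, adj = sample_dynamic_sbm(n=30, n_groups=2, n_steps=4,
                                 eta=0.8, p_in=0.5, p_out=0.05)
```

Here eta plays the role of the rate of change and the gap between p_in and p_out plays the role of community strength, the two quantities the abstract names as determining the threshold.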
Optimal model-free prediction from multivariate time series
Runge, Jakob, Donner, Reik V., Kurths, Jürgen
Forecasting a time series from multivariate predictors constitutes a challenging problem, especially using model-free approaches. Most techniques, such as nearest-neighbor prediction, quickly suffer from the curse of dimensionality and overfitting for more than a few predictors, which has limited their application mostly to the univariate case. Therefore, selection strategies are needed that harness the available information as efficiently as possible. Since often the right combination of predictors matters, ideally all subsets of possible predictors should be tested for their predictive power, but the exponentially growing number of combinations makes such an approach computationally prohibitive. Here, a prediction scheme that overcomes this strong limitation is introduced. It uses a causal pre-selection step that drastically reduces the number of possible predictors to the most predictive set of causal drivers, making a globally optimal search scheme tractable. The information-theoretic optimality is derived and practical selection criteria are discussed. As demonstrated for multivariate nonlinear stochastic delay processes, the optimal scheme can even be less computationally expensive than commonly used sub-optimal schemes like forward selection. The method suggests a general framework to apply the optimal model-free approach to select variables and subsequently fit a model to further improve a prediction or learn statistical dependencies. The performance of this framework is illustrated on a climatological index of the El Niño Southern Oscillation.
A tree augmented naive Bayesian network experiment for breast cancer prediction
In order to investigate the breast cancer prediction problem on the aging population with the grades of DCIS, we conduct a tree augmented naive Bayesian network experiment trained and tested on a large clinical dataset including consecutive diagnostic mammography examinations, consequent biopsy outcomes and related cancer registry records in the population of women across all ages. Our tasks are to classify the conventional "Benign vs. Malignant" and the new "Benign/LG vs. IntG/HG/Invasive" based on mammography examination features and patient demographic information, specifically to predict the probability of malignancy, for the biopsy threshold setting and the biopsy decision making. The aggregated results of our tenfold cross-validation method recommend a biopsy threshold higher than 2% for the aging population. We also present the Receiver Operating Characteristic curves and the Precision-Recall curves obtained by aggregating the tenfold cross-validation results.
A hybrid algorithm for Bayesian network structure learning with application to multi-label learning
Gasse, Maxime, Aussem, Alex, Elghazel, Haytham
We present a novel hybrid algorithm for Bayesian network structure learning, called H2PC. It first reconstructs the skeleton of a Bayesian network and then performs a Bayesian-scoring greedy hill-climbing search to orient the edges. The algorithm is based on divide-and-conquer constraint-based subroutines to learn the local structure around a target variable. We conduct two series of experimental comparisons of H2PC against Max-Min Hill-Climbing (MMHC), which is currently the most powerful state-of-the-art algorithm for Bayesian network structure learning. First, we use eight well-known Bayesian network benchmarks with various data sizes to assess the quality of the learned structure returned by the algorithms. Our extensive experiments show that H2PC outperforms MMHC in terms of goodness of fit to new data and quality of the network structure with respect to the true dependence structure of the data. Second, we investigate H2PC's ability to solve the multi-label learning problem. We provide theoretical results to characterize and identify graphically the so-called minimal label powersets that appear as irreducible factors in the joint distribution under the faithfulness condition. The multi-label learning problem is then decomposed into a series of multi-class classification problems, where each multi-class variable encodes a label powerset. H2PC is shown to compare favorably to MMHC in terms of global classification accuracy over ten multi-label data sets covering different application domains. Overall, our experiments support the conclusions that local structural learning with H2PC in the form of local neighborhood induction is a theoretically well-motivated and empirically effective learning framework that is well suited to multi-label learning. The source code (in R) of H2PC as well as all data sets used for the empirical tests are publicly available.
How to Scale Up Kernel Methods to Be As Good As Deep Neural Nets
Lu, Zhiyun, May, Avner, Liu, Kuan, Garakani, Alireza Bagheri, Guo, Dong, Bellet, Aurélien, Fan, Linxi, Collins, Michael, Kingsbury, Brian, Picheny, Michael, Sha, Fei
Deep neural networks (DNNs) and other types of deep learning architecture have made significant advances [3, 4]. In both well-benchmarked tasks and real-world applications, such as automatic speech recognition [21, 34, 44] and image recognition [29, 48], deep learning architectures have achieved an unprecedented level of success and have generated major impact. Arguably, the most instrumental factors contributing to their success are: (1) learning from a huge amount of training data for highly complex models with millions to billions of parameters; (2) adopting simple but effective optimization methods such as stochastic gradient descent; (3) combatting overfitting with new schemes such as dropout [23]; and (4) computing with massive parallelism on GPUs. New techniques as well as "tricks of the trade" are frequently invented and added to the toolboxes for machine learning researchers and practitioners. In stark contrast, there have been many fewer publicly known successful applications of kernel methods (such as support vector machines) to problems at a scale comparable to the speech and image recognition problems tackled by DNNs. This is a surprising chasm, noting that kernel methods have been extensively studied both theoretically and empirically for their power of modeling highly nonlinear data [43]. Moreover, the connection between kernel methods and (infinite) neural networks has also been long noted [35, 51, 11]. Nonetheless, a common misconception is that it may be difficult, if not impossible, for kernel methods to catch up with deep learning methods in addressing large-scale learning problems.
Robust Structured Low-Rank Approximation on the Grassmannian
Hage, Clemens, Kleinsteuber, Martin
Over the past years Robust PCA has been established as a standard tool for reliable low-rank approximation of matrices in the presence of outliers. Recently, the Robust PCA approach via nuclear norm minimization has been extended to matrices with linear structures which appear in applications such as system identification and data series analysis. At the same time it has been shown how to control the rank of a structured approximation via matrix factorization approaches. The drawbacks of these methods either lie in the lack of robustness against outliers or in their static nature of repeated batch-processing. We present a Robust Structured Low-Rank Approximation method on the Grassmannian that on the one hand allows for fast re-initialization in an online setting due to subspace identification with manifolds, and that is robust against outliers due to a smooth approximation of the $\ell_p$-norm cost function on the other hand. The method is evaluated in online time series forecasting tasks on simulated and real-world data.
Bayesian Poisson Tensor Factorization for Inferring Multilateral Relations from Sparse Dyadic Event Counts
Schein, Aaron, Paisley, John, Blei, David M., Wallach, Hanna
We present a Bayesian tensor factorization model for inferring latent group structures from dynamic pairwise interaction patterns. For decades, political scientists have collected and analyzed records of the form "country $i$ took action $a$ toward country $j$ at time $t$"---known as dyadic events---in order to form and test theories of international relations. We represent these event data as a tensor of counts and develop Bayesian Poisson tensor factorization to infer a low-dimensional, interpretable representation of their salient patterns. We demonstrate that our model's predictive performance is better than that of standard non-negative tensor factorization methods. We also provide a comparison of our variational updates to their maximum likelihood counterparts. In doing so, we identify a better way to form point estimates of the latent factors than that typically used in Bayesian Poisson matrix factorization. Finally, we showcase our model as an exploratory analysis tool for political scientists. We show that the inferred latent factor matrices capture interpretable multilateral relations that both conform to and inform our knowledge of international affairs.
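The representation of dyadic events as a tensor of counts can be sketched as follows. This is an illustrative sketch only: the function names, the integer encoding of countries, actions and time steps, and the choice of a CP-style (sum of rank-one terms) form for the Poisson rate are assumptions made for the example and are not stated in the abstract.

```python
import numpy as np

def build_count_tensor(events, n_countries, n_actions, n_steps):
    """Count events of the form (sender i, receiver j, action a, time t).

    All four entries are assumed to be integer indices already.
    """
    counts = np.zeros((n_countries, n_countries, n_actions, n_steps),
                      dtype=np.int64)
    idx = np.asarray(events, dtype=np.int64)
    # np.add.at accumulates correctly when the same index repeats.
    np.add.at(counts, (idx[:, 0], idx[:, 1], idx[:, 2], idx[:, 3]), 1)
    return counts

def poisson_rate(senders, receivers, actions, times):
    """Assumed CP-style rate: sum over k of the four factor entries."""
    return np.einsum("ik,jk,ak,tk->ijat", senders, receivers, actions, times)

# Toy data: three events among two countries, three action types, two steps.
events = [(0, 1, 2, 0), (0, 1, 2, 0), (1, 0, 0, 1)]
counts = build_count_tensor(events, n_countries=2, n_actions=3, n_steps=2)

# Non-negative factor matrices of rank 2 (all ones here, for illustration).
rate = poisson_rate(np.ones((2, 2)), np.ones((2, 2)),
                    np.ones((3, 2)), np.ones((2, 2)))
```

Under this assumed form, each observed count would be modeled as a Poisson draw whose mean is the corresponding entry of rate, and each component k can be read as one latent pattern of senders, receivers, actions and time steps.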
Looking for Robots That Will Cooperate, Not Terminate - NYTimes.com
A robot that evoked a human form paused in front of a door leading to a simulated nuclear power plant accident and inexplicably stood motionless. Suddenly, from the grandstands overlooking the scene, a group of schoolchildren began to chant: "Go Robot! What has long been thought of as a brave new world in which mobile robots freely move about in factories, towns and cities is now approaching. Robots will advance from the dull, dirty and dangerous work that they do today to take on a range of tasks, from rescue work to elder care in close contact with humans. Just as software robots such as Apple's Siri and Microsoft's Cortana have rapidly become useful personal assistants, physical robots will occupy a place in the near future. That is the world imagined by government officials and technologists at the Defense Advanced Research Projects Agency, the American military organization that is charged with the mission of avoiding a Sputnik-style technology threat to national security. Last weekend at the sprawling Los Angeles County Fairgrounds, Darpa concluded the Robotics Challenge, a two-year-long effort to jump start this next generation of smart and presumably helpful robots by offering a cash prize for the designers of a machine that could work in concert with human controllers in a hazardous environment. The $3.5 million competition was won by a South Korean team from the Korean Advanced Institute of Science and Technology. The technology may still seem far-fetched, but betting against the agency that has had a remarkably far-reaching effect on the modern world -- from funding the work that led to both the personal computer and the Internet, to setting expectations that self-driving vehicles are only a matter of years away -- might be a mistake. Darpa officials have taken pains to assure anyone who would listen that it was not primarily interested in designing Terminators, or killer robots. 
The agency is an arm of the Pentagon, and its futuristic robots are an example of what is described as a "dual use" technology that will have both military and civilian uses. Darpa, which is also known for pioneering the Internet surveillance system that was exposed last year by Edward J. Snowden, has, under its current director, Arati Prabhakar, expanded its watchfulness over the potential effect of the technologies it helps foster. In introducing a workshop for discussion on the effect of robotics held at the end of the challenge competition on Sunday, Dr. Prabhakar described the agency as being committed to a broader mission: "We work together to build the future of robots that can help extend the capabilities that we have and build the technologies that will aid humanity in the future.
Celeste: Variational inference for a generative model of astronomical images
Regier, Jeffrey, Miller, Andrew, McAuliffe, Jon, Adams, Ryan, Hoffman, Matt, Lang, Dustin, Schlegel, David, Prabhat
We present a new, fully generative model of optical telescope image sets, along with a variational procedure for inference. Each pixel intensity is treated as a Poisson random variable, with a rate parameter dependent on latent properties of stars and galaxies. Key latent properties are themselves random, with scientific prior distributions constructed from large ancillary data sets. We check our approach on synthetic images. We also run it on images from a major sky survey, where it exceeds the performance of the current state-of-the-art method for locating celestial bodies and measuring their colors.
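The idea that each pixel intensity is a Poisson random variable whose rate depends on latent source properties can be sketched as follows. This is an illustrative toy, not the Celeste model itself: the single point source, the circular Gaussian point-spread function, the constant sky background, and all function and parameter names are assumptions made for the example.

```python
import numpy as np

def expected_pixel_rates(shape, source_xy, flux, psf_sigma, sky):
    """Poisson rate per pixel for one star on a flat sky background.

    Assumptions: the star is a point source blurred by a circular
    Gaussian PSF of width psf_sigma; sky is a constant background rate.
    """
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    x0, y0 = source_xy
    r2 = (xs - x0) ** 2 + (ys - y0) ** 2
    psf = np.exp(-r2 / (2.0 * psf_sigma ** 2)) / (2.0 * np.pi * psf_sigma ** 2)
    return sky + flux * psf

def sample_image(rates, seed=0):
    """Draw one photon-count image: each pixel is an independent Poisson."""
    rng = np.random.default_rng(seed)
    return rng.poisson(rates)

rates = expected_pixel_rates(shape=(21, 21), source_xy=(10, 10),
                             flux=1000.0, psf_sigma=1.5, sky=5.0)
image = sample_image(rates)
```

Inference in such a model runs in the opposite direction: given image, one would seek a posterior over latent quantities such as the position and flux, which is the role the abstract assigns to the variational procedure.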