In this paper we propose an approximated learning framework for large scale graphical models and derive message passing algorithms for learning their parameters efficiently. We first relate CRFs and structured SVMs and show that in the CRF's primal a variant of the log-partition function, known as soft-max, smoothly approximates the hinge loss function of structured SVMs. We then propose an intuitive approximation for structured prediction problems using Fenchel duality based on a local entropy approximation that computes the exact gradients of the approximated problem and is guaranteed to converge. Unlike existing approaches, this allow us to learn graphical models with cycles and very large number of parameters efficiently. We demonstrate the effectiveness of our approach in an image denoising task. This task was previously solved by sharing parameters across cliques. In contrast, our algorithm is able to efficiently learn large number of parameters resulting in orders of magnitude better prediction.
Conditional random fields (CRF) are quite successful on sequence labeling tasks such as natural language processing and biological sequence analysis. CRF models use linear potential functions to represent the relationship between input features and outputs. However, in many real-world applications such as protein structure prediction and handwriting recognition, the relationship between input features and outputs is highly complex and nonlinear, which cannot be accurately modeled by a linear function. To model the nonlinear relationship between input features and outputs we propose Conditional Neural Fields (CNF), a new conditional probabilistic graphical model for sequence labeling. Our CNF model extends CRF by adding one (or possibly several) middle layer between input features and outputs. The middle layer consists of a number of hidden parameterized gates, each acting as a local neural network node or feature extractor to capture the nonlinear relationship between input features and outputs. Therefore, conceptually this CNF model is much more expressive than the linear CRF model. To better control the complexity of the CNF model, we also present a hyperparameter optimization procedure within the evidence framework. Experiments on two widely-used benchmarks indicate that this CNF model performs significantly better than a number of popular methods. In particular, our CNF model is the best among about ten machine learning methods for protein secondary tructure prediction and also among a few of the best methods for handwriting recognition.
Inference is typically intractable in high-treewidth undirected graphical models, making maximum likelihood learning a challenge. One way to overcome this is to restrict parameters to a tractable set, most typically the set of tree-structured parameters. This paper explores an alternative notion of a tractable set, namely a set of "fast-mixing parameters" where Markov chain Monte Carlo (MCMC) inference can be guaranteed to quickly converge to the stationary distribution. While it is common in practice to approximate the likelihood gradient using samples obtained from MCMC, such procedures lack theoretical guarantees. This paper proves that for any exponential family with bounded sufficient statistics, (not just graphical models) when parameters are constrained to a fast-mixing set, gradient descent with gradients approximated by sampling will approximate the maximum likelihood solution inside the set with high-probability.
Traditional structured prediction models try to learn the conditional likelihood, i.e., p(y x), to capture the relationship between the structured output y and the input features x. For many models, computing the likelihood is intractable. These models are therefore hard to train, requiring the use of surrogate objectives or variational inference to approximate likelihood. In this paper, we propose conditional Glow (c-Glow), a conditional generative flow for structured output learning. C-Glow benefits from the ability of flow-based models to compute p(y x) exactly and efficiently. Learning with c-Glow does not require a surrogate objective or performing inference during training. Once trained, we can directly and efficiently generate conditional samples to do structured prediction. We evaluate this approach on different structured prediction tasks and find c-Glow's structured outputs comparable in quality with state-of-the-art deep structured prediction approaches.
Structured prediction is a powerful framework for coping with joint prediction of interacting outputs. A central difficulty in using this framework is that often the correct label dependence structure is unknown. At the same time, we would like to avoid an overly complex structure that will lead to intractable prediction. In this work we address the challenge of learning tree structured predictive models that achieve high accuracy while at the same time facilitate efficient (linear time) inference. We start by proving that this task is in general NP-hard, and then suggest an approximate alternative. Briefly, our CRANK approach relies on a novel Circuit-RANK regularizer that penalizes non-tree structures and that can be optimized using a CCCP procedure. We demonstrate the effectiveness of our approach on several domains and show that, despite the relative simplicity of the structure, prediction accuracy is competitive with a fully connected model that is computationally costly at prediction time.