Supervised Learning
Transition-Based Neural Word Segmentation Using Word-Level Features
Zhang, Meishan, Zhang, Yue, Fu, Guohong
Character-based and word-based methods are two different solutions for Chinese word segmentation, the former exploiting sequence labeling models over characters and the latter using word-level features. Neural models have been exploited for character-based Chinese word segmentation, giving high accuracies by making use of external character embeddings, yet requiring less feature engineering. In this paper, we study a neural model for word-based Chinese word segmentation, by replacing the manually-designed discrete features with neural features in a transition-based word segmentation framework. Experimental results demonstrate that word features lead to comparable performance to the best systems in the literature, and a further combination of discrete and neural features obtains top accuracies on several benchmarks.
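The transition-based framework can be illustrated with the separate/append action scheme that is common in transition-based segmenters; we assume that scheme here rather than quoting the paper. The sketch only shows how an action sequence builds words and how gold actions are derived from a segmented sentence. The scoring model (discrete or neural word-level features) that picks each action is omitted, and `apply_actions` and `oracle_actions` are hypothetical names.

```python
from typing import List


def apply_actions(chars: str, actions: List[str]) -> List[str]:
    """Build a word sequence by processing characters left to right.

    SEP starts a new word with the current character; APP appends the
    current character to the last word. The first action must be SEP.
    """
    if len(chars) != len(actions):
        raise ValueError("need exactly one action per character")
    words: List[str] = []
    for ch, act in zip(chars, actions):
        if act == "SEP":
            words.append(ch)
        elif act == "APP" and words:
            words[-1] += ch
        else:
            raise ValueError(f"invalid action {act!r} for character {ch!r}")
    return words


def oracle_actions(words: List[str]) -> List[str]:
    """Gold action sequence for a segmented sentence (training targets)."""
    return ["SEP" if i == 0 else "APP" for w in words for i in range(len(w))]


gold = ["我", "喜欢", "北京"]
acts = oracle_actions(gold)
print(acts)                                # ['SEP', 'SEP', 'APP', 'SEP', 'APP']
print(apply_actions("".join(gold), acts))  # ['我', '喜欢', '北京']
```

Because the partial output is a list of whole words at every step, a scorer can read word-level features (e.g. the last word or last two words) directly from it, which a per-character labeler cannot do as naturally.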
Search-Guided, Lightly-supervised Training of Structured Prediction Energy Networks
Rooshenas, Amirmohammad, Zhang, Dongxu, Sharma, Gopal, McCallum, Andrew
In structured output prediction tasks, labeling ground-truth training output is often expensive. However, for many tasks, even when the true output is unknown, we can evaluate predictions using a scalar reward function, which may be easily assembled from human knowledge or non-differentiable pipelines. But searching through the entire output space to find the best output with respect to this reward function is typically intractable. In this paper, we instead use efficient truncated randomized search in this reward function to train structured prediction energy networks (SPENs), which provide efficient test-time inference using gradient-based search on a smooth, learned representation of the score landscape, and have previously yielded state-of-the-art results in structured prediction. In particular, this truncated randomized search in the reward function yields previously unknown local improvements, providing effective supervision to SPENs, avoiding their traditional need for labeled training data.
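The idea of a truncated randomized search for local improvements can be sketched for binary output labelings. This is an assumption-laden toy, not the authors' procedure: proposals flip each label independently with probability `flip_prob`, and the search stops at the first proposal whose reward beats the current prediction or after `n_steps` tries. The function name and parameters are hypothetical; in training, the improved labeling would serve as a target that the SPEN's energy should prefer over its current prediction, a loss we do not show.

```python
import random
from typing import Callable, List, Optional

Labeling = List[int]


def truncated_random_search(
    y: Labeling,
    reward: Callable[[Labeling], float],
    n_steps: int = 20,
    flip_prob: float = 0.1,
    rng: Optional[random.Random] = None,
) -> Optional[Labeling]:
    """Look for a nearby labeling with strictly higher reward than `y`.

    The search is truncated after `n_steps` random proposals; it returns the
    first improvement found, or None if no proposal beats `y`.
    """
    rng = rng or random.Random(0)
    base = reward(y)
    for _ in range(n_steps):
        # Local perturbation: flip each binary label with probability flip_prob.
        cand = [1 - v if rng.random() < flip_prob else v for v in y]
        if reward(cand) > base:
            return cand
    return None


# Toy reward (hypothetical): number of positive labels.
pred = [0] * 10  # e.g. the network's current prediction
better = truncated_random_search(pred, reward=sum, n_steps=50, flip_prob=0.5)
```

Only reward evaluations are needed, so the reward can be non-differentiable, which matches the abstract's motivation.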
Conditional Graph Neural Processes: A Functional Autoencoder Approach
Nassar, Marcel, Wang, Xin, Tumer, Evren
We introduce a novel encoder-decoder architecture to embed functional processes into latent vector spaces. This embedding can then be decoded to sample the encoded functions over an arbitrary domain. This autoencoder generalizes the recently introduced Conditional Neural Process (CNP) model of random processes. Our architecture employs the latest advances in graph neural networks to process irregularly sampled functions. Thus, we refer to our model as a Conditional Graph Neural Process (CGNP). Graph neural networks can effectively exploit "local" structures of the metric spaces over which the functions/processes are defined. The contributions of this paper are twofold: (i) a novel graph-based encoder-decoder architecture for functional and process embeddings, and (ii) a demonstration of the importance of using the structure of metric spaces for this type of representation.
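The encoder side of this idea can be illustrated with a toy, untrained example; it is not the CGNP architecture. We assume a 1-D input space, build a k-nearest-neighbour graph over the sample locations, run one fixed (not learned) message-passing step that compares each point with its local neighbourhood, and mean-pool into a fixed-size vector. The names `knn_graph` and `encode` and the hand-picked features are hypothetical; a real model would learn these maps and add a decoder conditioned on the latent vector.

```python
from typing import List


def knn_graph(xs: List[float], k: int) -> List[List[int]]:
    """For each sample location, indices of its k nearest other locations."""
    nbrs = []
    for i, xi in enumerate(xs):
        order = sorted((abs(xi - xj), j) for j, xj in enumerate(xs) if j != i)
        nbrs.append([j for _, j in order[:k]])
    return nbrs


def encode(xs: List[float], ys: List[float], k: int = 2) -> List[float]:
    """Toy encoder of an irregularly sampled function.

    1. per-point features (x, y, y^2);
    2. one message-passing round: difference between each point and the
       mean of its k-NN neighbourhood (the "local" structure);
    3. mean pooling into a fixed-size vector, independent of the number
       and order of samples.
    """
    feats = [[x, y, y * y] for x, y in zip(xs, ys)]
    nbrs = knn_graph(xs, k)
    node_states = []
    for i, f in enumerate(feats):
        group = [f] + [feats[j] for j in nbrs[i]]
        local_mean = [sum(col) / len(group) for col in zip(*group)]
        node_states.append(f + [a - b for a, b in zip(f, local_mean)])
    return [sum(col) / len(node_states) for col in zip(*node_states)]


r = encode([0.0, 1.0, 3.0, 7.0], [0.0, 1.0, 9.0, 49.0])  # length-6 vector
```

Mean pooling is what lets the latent vector have the same size for any number of samples, as in CNPs; the neighbourhood step is where the metric structure of the input space enters.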
SqueezeFit: Label-aware dimensionality reduction by semidefinite programming
McWhirter, Culver, Mixon, Dustin G., Villar, Soledad
Given labeled points in a high-dimensional vector space, we seek a low-dimensional subspace such that projecting onto this subspace maintains some prescribed distance between points of differing labels. Intended applications include compressive classification. Taking inspiration from large margin nearest neighbor classification, this paper introduces a semidefinite relaxation of this problem. Unlike its predecessors, this relaxation is amenable to theoretical analysis, allowing us to provably recover a planted projection operator from the data.
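One natural way to write such a relaxation follows, as our reconstruction from the abstract rather than a quotation of the paper. For an orthogonal projection $P$, the projected squared distance $\|P(x_i - x_j)\|^2 = (x_i - x_j)^\top P (x_i - x_j)$ is linear in $P$. Replacing the nonconvex set of projections by its convex hull $\{M : 0 \preceq M \preceq I\}$ and the rank (subspace dimension) by the trace gives a semidefinite program. Here $\Delta$ denotes the prescribed distance and $\ell(x)$ the label of $x$; both symbols are our notation.

```latex
\begin{aligned}
\min_{M \in \mathbb{R}^{d \times d}} \quad & \operatorname{tr}(M) \\
\text{s.t.} \quad & (x_i - x_j)^\top M \,(x_i - x_j) \ \ge\ \Delta^2
  \quad \text{for all } i, j \text{ with } \ell(x_i) \ne \ell(x_j), \\
& 0 \preceq M \preceq I .
\end{aligned}
```

If the optimum $M$ happens to be a projection, it is feasible for the original problem, and its trace is the dimension of the recovered subspace.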
Lawyers in South Korean wartime labor case set deadline for response from Nippon Steel & Sumitomo Metal
Lawyers representing South Korean plaintiffs in a World War II labor court case against Japan's Nippon Steel & Sumitomo Metal Corp. have set a Dec. 24 deadline for the firm to show willingness to discuss a court verdict on compensation. If the firm fails to respond, the lawyers, who spoke after being denied a meeting with company officials for a second time on Tuesday, said they would start procedures to seize its South Korean assets. Tuesday's incident stemmed from a ruling by South Korea's Supreme Court late in October that Nippon Steel must pay 100 million won ($90,500) to each of four South Koreans for forced labor during the war. The Japanese government has denounced the verdict, saying all wartime reparations were dealt with in a 1965 treaty that normalized ties between the two nations. At the time of the ruling, Nippon Steel called it "extremely regrettable," but added that it would review the decision carefully in considering further steps.
Stochastic Graphlet Embedding
Abstract: Graph-based methods are known to be successful in many machine learning and pattern classification tasks. These methods consider semi-structured data as graphs where nodes correspond to primitives (parts, interest points, segments, etc.) and edges characterize the relationships between these primitives. However, these non-vectorial graph data cannot be straightforwardly plugged into off-the-shelf machine learning algorithms without a preliminary step of explicit or implicit graph vectorization and embedding. This embedding process should be resilient to intra-class graph variations while being highly discriminant. In this paper, we propose a novel high-order stochastic graphlet embedding (SGE) that maps graphs into vector spaces. Our main contribution includes a new stochastic search procedure that efficiently parses a given graph and extracts (samples) graphlets of arbitrarily high order. We consider these graphlets, with increasing orders, to model local primitives as well as their increasingly complex interactions. In order to build our graph representation, we measure the distribution of these graphlets in a given graph, using particular hash functions that efficiently assign sampled graphlets to isomorphic sets with a very low probability of collision. When combined with maximum margin classifiers, these graphlet-based representations have a positive impact on the performance of pattern comparison and recognition, as corroborated through extensive experiments using standard benchmark databases.
I. INTRODUCTION
In this paper, we consider the problem of graph-based classification: given a pattern (image, shape, handwritten character, document, etc.) Most of the early pattern classification methods were designed using numerical feature vectors resulting from statistical analysis [12], [29]. Other more successful extensions of these methods also integrate structural information (see for instance [27]).
These extensions were built upon the assumption that parts, in patterns, do not appear independently and structural relationships among these parts are crucial in order to achieve effective description and classification [20].
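The sample-then-hash pipeline described in the abstract can be sketched as follows, under loudly stated assumptions: graphlets are grown by adding random incident edges from a random start node, and the hash is a cheap isomorphism-invariant key (node count, edge count, sorted degree sequence). The paper's actual search procedure and hash functions are not reproduced here, and all function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Set, Tuple

Edge = Tuple[int, int]


def sample_graphlet(adj: Dict[int, Set[int]], n_edges: int, rng: random.Random) -> Set[Edge]:
    """Grow a connected edge set from a random node, one random incident edge at a time."""
    nodes = {rng.choice(sorted(adj))}
    edges: Set[Edge] = set()
    while len(edges) < n_edges:
        frontier = sorted({(min(u, v), max(u, v)) for u in nodes for v in adj[u]} - edges)
        if not frontier:
            break  # the connected component has no unused edges left
        e = rng.choice(frontier)
        edges.add(e)
        nodes.update(e)
    return edges


def graphlet_hash(edges: Set[Edge]) -> Tuple[int, int, Tuple[int, ...]]:
    """Isomorphism-invariant key: (#nodes, #edges, sorted degree sequence).

    Isomorphic graphlets always share a key; some non-isomorphic ones can
    collide, which is the price of such a cheap hash.
    """
    deg: Counter = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return (len(deg), len(edges), tuple(sorted(deg.values())))


def graphlet_histogram(adj: Dict[int, Set[int]], max_edges: int,
                       samples_per_order: int, seed: int = 0) -> Counter:
    """Graph embedding as counts of hashed graphlets of orders 1..max_edges."""
    rng = random.Random(seed)
    hist: Counter = Counter()
    for order in range(1, max_edges + 1):
        for _ in range(samples_per_order):
            g = sample_graphlet(adj, order, rng)
            if len(g) == order:  # skip samples that ran out of edges
                hist[graphlet_hash(g)] += 1
    return hist


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(graphlet_histogram(path, max_edges=3, samples_per_order=10))
```

To feed a maximum margin classifier, the hash keys would be mapped to fixed positions of a vector (and typically normalized per order).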
Leveraging Clinical Time-Series Data for Prediction: A Cautionary Tale
Sherman, Eli, Gurm, Hitinder, Balis, Ulysses, Owens, Scott, Wiens, Jenna
In healthcare, patient risk stratification models are often learned using time-series data extracted from electronic health records. When extracting data for a clinical prediction task, several formulations exist, depending on how one chooses the time of prediction and the prediction horizon. In this paper, we show how the formulation can greatly impact both model performance and clinical utility. Leveraging a publicly available ICU dataset, we consider two clinical prediction tasks: in-hospital mortality, and hypokalemia. Through these case studies, we demonstrate the necessity of evaluating models using an outcome-independent reference point, since choosing the time of prediction relative to the event can result in unrealistic performance. Further, an outcome-independent scheme outperforms an outcome-dependent scheme on both tasks (in-hospital mortality: AUROC .882 vs. .831; serum potassium: AUROC .829 vs. .740) when evaluated on test sets that mimic real-world use.
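The difference between the two reference points can be made concrete with a small sketch. The `Stay` record, the choice of discharge as the anchor for negatives in the outcome-dependent scheme, and the eligibility rule in the outcome-independent scheme are all our assumptions, not the paper's task definitions; times are hours since admission.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stay:
    """Hypothetical ICU stay; times are hours since admission."""
    discharge: float
    event: Optional[float]  # onset of the outcome, or None if it never occurs


def outcome_dependent_time(stay: Stay, horizon: float) -> float:
    """Prediction time anchored on the outcome: `horizon` hours before onset.

    Negatives need some other anchor; discharge is assumed here. At
    deployment the onset time is unknown, so this cannot be reproduced live.
    """
    anchor = stay.event if stay.event is not None else stay.discharge
    return anchor - horizon


def outcome_independent_time(stay: Stay, offset: float) -> Optional[float]:
    """Prediction time anchored on admission: a fixed `offset` hours in.

    Returns None when no prediction would be made at that time (patient
    already discharged, or the outcome has already occurred).
    """
    if offset >= stay.discharge or (stay.event is not None and stay.event <= offset):
        return None
    return offset


s = Stay(discharge=100.0, event=50.0)
print(outcome_dependent_time(s, horizon=12))   # 38.0
print(outcome_independent_time(s, offset=24))  # 24
```

In the outcome-dependent scheme every positive is observed exactly `horizon` hours before onset, a situation that never arises when the model runs on new patients; the outcome-independent scheme only uses information available at the chosen time.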
Partial Evaluation of Logic Programs in Vector Spaces
Sakama, Chiaki, Nguyen, Hien D., Sato, Taisuke, Inoue, Katsumi
In this paper, we introduce methods of encoding propositional logic programs in vector spaces. Interpretations are represented by vectors and programs are represented by matrices. The least model of a definite program is computed by multiplying an interpretation vector and a program matrix. To optimize computation in vector spaces, we provide a method of partial evaluation of programs using linear algebra. Partial evaluation is done by unfolding rules in a program, and it is realized in a vector space by multiplying program matrices. We perform experiments using randomly generated programs and show that partial evaluation has the potential to make computation efficient for very large programs.
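Least-model computation by matrix-vector products can be sketched as follows. The paper's exact matrix encoding (for example, how facts and several rules with the same head are represented) is not given in the abstract, so this sketch assumes one matrix row per rule with weight 1/|body| on each body atom and a threshold at 1; partial evaluation by multiplying program matrices is not shown. The name `least_model` is hypothetical.

```python
from fractions import Fraction
from typing import List, Sequence, Set, Tuple

Rule = Tuple[str, Sequence[str]]  # (head, body atoms); facts are passed separately


def least_model(atoms: List[str], rules: List[Rule], facts: Set[str]) -> Set[str]:
    """Least model of a definite program via repeated matrix-vector products."""
    idx = {a: i for i, a in enumerate(atoms)}
    n = len(atoms)
    # One row per rule: weight 1/|body| on each distinct body atom, so the
    # row-vector product reaches 1 exactly when the whole body is true.
    rows = []
    for _, body in rules:
        uniq = set(body)
        row = [Fraction(0)] * n
        for b in uniq:
            row[idx[b]] = Fraction(1, len(uniq))
        rows.append(row)
    v = [1 if a in facts else 0 for a in atoms]  # interpretation vector
    while True:
        new_v = v[:]
        for (head, _), row in zip(rules, rows):
            if sum(w * x for w, x in zip(row, v)) >= 1:  # threshold step
                new_v[idx[head]] = 1
        if new_v == v:  # fixpoint: no rule derives anything new
            return {a for a in atoms if v[idx[a]]}
        v = new_v


atoms = ["p", "q", "r", "s", "t", "u"]
rules = [("p", ("q", "r")), ("q", ("s",)), ("t", ("p", "u"))]
print(sorted(least_model(atoms, rules, facts={"r", "s"})))  # ['p', 'q', 'r', 's']
```

Exact fractions avoid rounding errors in the threshold test; using one row per rule (rather than per head atom) keeps two partially satisfied rules for the same head from adding up to a spurious 1.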
DONUT: CTC-based Query-by-Example Keyword Spotting
Lugosch, Loren, Myer, Samuel, Tomar, Vikrant Singh
Keyword spotting, or wakeword detection, is an essential feature for hands-free operation of modern voice-controlled devices. With such devices becoming ubiquitous, users might want to choose a personalized custom wakeword. In this work, we present DONUT, a CTC-based algorithm for online query-by-example keyword spotting that enables custom wakeword detection. The algorithm works by recording a small number of training examples from the user, generating a set of label sequence hypotheses from these training examples, and detecting the wakeword by aggregating the scores of all the hypotheses given a new audio recording. Our method combines the generalization and interpretability of CTC-based keyword spotting with the user-adaptation and convenience of a conventional query-by-example system. DONUT has low computational requirements and is well-suited for both learning and inference on embedded systems without requiring private user data to be uploaded to the cloud.
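Scoring a label-sequence hypothesis against new audio can be done with the standard CTC forward algorithm, sketched below on per-frame posteriors. The aggregation across hypotheses (a mean of log-probabilities) is our assumption, not necessarily the one DONUT uses, and `wakeword_score` is a hypothetical name. Plain probabilities are used for readability; a real system would work in log space to avoid underflow.

```python
import math
from typing import List, Sequence


def ctc_prob(probs: Sequence[Sequence[float]], label: Sequence[int], blank: int = 0) -> float:
    """CTC forward algorithm: P(label | frames), summed over all alignments.

    probs[t][c] is the posterior of symbol c at frame t (e.g. a softmax output).
    """
    ext = [blank]
    for c in label:
        ext += [c, blank]  # blanks between and around labels
    S, T = len(ext), len(probs)
    alpha = [0.0] * S
    alpha[0] = probs[0][blank]
    if S > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s]
            if s >= 1:
                a += alpha[s - 1]
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]
            new[s] = a * probs[t][ext[s]]
        alpha = new
    # Valid paths end on the last label or the trailing blank.
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)


def wakeword_score(probs: Sequence[Sequence[float]], hypotheses: List[List[int]]) -> float:
    """Hypothetical aggregation: mean log-probability over enrolled hypotheses."""
    logs = [math.log(max(ctc_prob(probs, h), 1e-300)) for h in hypotheses]
    return sum(logs) / len(logs)


frames = [[0.5, 0.5], [0.5, 0.5]]  # two frames over {blank, "a"}
print(ctc_prob(frames, [1]))       # 0.75: alignments aa, a-, -a
```

A detector built this way would compare the score with a threshold tuned on held-out audio; that threshold is not specified in the abstract.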