 Supervised Learning


Multi-Metric AutoRec for High Dimensional and Sparse User Behavior Data Prediction

arXiv.org Artificial Intelligence

User behavior data produced during interaction with massive numbers of items in the big data era are generally heterogeneous and sparse, leaving the recommender system (RS) a large diversity of underlying patterns to excavate. Deep neural network-based models have reached state-of-the-art performance on RS benchmarks owing to their fitting capabilities. However, prior works mainly focus on designing an intricate architecture with a fixed loss function and regularization. These single-metric models provide limited performance when facing heterogeneous and sparse user behavior data. Motivated by this finding, we propose a multi-metric AutoRec (MMA) based on the representative AutoRec. The idea of the proposed MMA is mainly two-fold: 1) apply different $L_p$-norms to the loss function and regularization to form different variant models in different metric spaces, and 2) aggregate these variant models. Thus, the proposed MMA enjoys the multi-metric orientation from a set of dispersed metric spaces, achieving a comprehensive representation of user data. Theoretical studies prove that the proposed MMA can attain performance improvement. Extensive experiments on five real-world datasets show that MMA can outperform seven other state-of-the-art models in predicting unobserved user behavior data.
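The two-step idea in this abstract (vary the norm in the loss and the regularizer, then aggregate the variants) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the mean-of-powers form of the loss, and the weighted-average aggregation are all assumptions chosen for illustration, and the AutoRec network itself is omitted.

```python
def lp_loss(pred, target, p):
    """Mean of |pred - target|^p over the observed entries (assumed form)."""
    return sum(abs(a - b) ** p for a, b in zip(pred, target)) / len(pred)


def lp_penalty(weights, q):
    """Regularizer sum(|w|^q); q=1 is lasso-like, q=2 is ridge-like."""
    return sum(abs(w) ** q for w in weights)


def objective(pred, target, weights, p, q, lam):
    """One variant's objective: L_p-style loss plus lam times an L_q-style penalty."""
    return lp_loss(pred, target, p) + lam * lp_penalty(weights, q)


def aggregate(predictions, mix):
    """Weighted average of the variants' predictions (hypothetical aggregation rule)."""
    total = sum(mix)
    n = len(predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(mix, predictions)) / total
        for i in range(n)
    ]
```

Each (p, q) pair would define one variant model; in this sketch `aggregate` simply blends the outputs of variants that have already been trained.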


P4E: Few-Shot Event Detection as Prompt-Guided Identification and Localization

arXiv.org Artificial Intelligence

We propose P4E, an identify-and-localize event detection framework that integrates the best of few-shot prompting and structured prediction. Our framework decomposes event detection into an identification task and a localization task. For the identification task, which we formulate as multi-label classification, we leverage cloze-based prompting to align our objective with the pre-training task of language models, allowing our model to quickly adapt to new event types. We then employ an event type-agnostic sequence labeling model to localize the event trigger conditioned on the identification output. This heterogeneous model design allows P4E to quickly learn new event types without sacrificing the ability to make structured predictions. Our experiments demonstrate the effectiveness of our proposed design, and P4E shows superior performance for few-shot event detection on benchmark datasets FewEvent and MAVEN and comparable performance to SOTA for fully-supervised event detection on ACE.


Active Learning for Regression by Inverse Distance Weighting

arXiv.org Artificial Intelligence

Active learning (AL) strategies are used in supervised learning to let the training algorithm "ask questions" [34], i.e., choose the feature vectors to query for the corresponding target value during the training phase, usually based on the model learned so far. The main aim of AL is to reduce the number of training samples required to train the model, or in other words, to get a model of the same prediction quality with a smaller dataset. This is particularly useful when obtaining the target value associated with a given combination of features is an expensive operation; for example, it may involve asking a human to "label" samples manually, running a costly and time-consuming laboratory experiment, or performing a complex computer simulation. AL methods are usually categorized into query synthesis (or population-based) methods, in which the feature vector to query can be chosen arbitrarily; pool-based sampling methods, in which the vector can only be chosen within a given finite set (or "pool") of unlabeled values; and selective-sampling methods, in which vectors are proposed in a streaming flow and the AL algorithm can only decide online whether to ask for the corresponding target or not [34]. Several approaches to AL are available in the literature; see, e.g., the survey papers [1,16,22,34,39]. Most of the literature focuses on classification problems [1,33], although AL has also been investigated for regression [9-13,25,27,38,41,42].
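As a rough illustration of pool-based sampling combined with inverse distance weighting, the sketch below pairs a textbook IDW interpolant (weights 1/d^2) with a pure-exploration query rule that picks the pool point farthest from any labeled sample. The query rule is a stand-in assumption, not the acquisition function of the paper, and the names `idw_predict` and `next_query` are hypothetical.

```python
import math


def idw_predict(x, X, y, eps=1e-12):
    """Inverse-distance-weighted prediction at x from labeled samples (X, y)."""
    num = 0.0
    den = 0.0
    for xi, yi in zip(X, y):
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        if d2 < eps:
            # x coincides with a labeled sample: return its target directly.
            return yi
        w = 1.0 / d2
        num += w * yi
        den += w
    return num / den


def next_query(pool, X):
    """Pick the unlabeled pool point whose nearest labeled sample is farthest away."""
    def nearest(x):
        return min(math.dist(x, xi) for xi in X)
    return max(pool, key=nearest)
```

In a loop, one would call `next_query`, obtain the target value for the returned point from the expensive oracle, append it to `X` and `y`, and repeat until the labeling budget is spent.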


Learning to Reuse Distractors to support Multiple Choice Question Generation in Education

arXiv.org Artificial Intelligence

Multiple choice questions (MCQs) are widely used in digital learning systems, as they allow for automating the assessment process. However, due to the increased digital literacy of students and the advent of social media platforms, MCQ tests are widely shared online, and teachers are continuously challenged to create new questions, which is an expensive and time-consuming task. A particularly sensitive aspect of MCQ creation is to devise relevant distractors, i.e., wrong answers that are not easily identifiable as being wrong. This paper studies how a large existing set of manually created answers and distractors for questions over a variety of domains, subjects, and languages can be leveraged to help teachers in creating new MCQs, by the smart reuse of existing distractors. We built several data-driven models based on context-aware question and distractor representations, and compared them with static feature-based models. The proposed models are evaluated with automated metrics and in a realistic user test with teachers. Both automatic and human evaluations indicate that context-aware models consistently outperform a static feature-based approach. For our best-performing context-aware model, on average 3 distractors out of the 10 shown to teachers were rated as high-quality distractors. We create a performance benchmark, and make it public, to enable comparison between different approaches and to introduce a more standardized evaluation of the task. The benchmark contains a test of 298 educational questions covering multiple subjects & languages and a 77k multilingual pool of distractor vocabulary for future research.


World Cup 2022: Netherlands and Argentina descend into chaos as new yellow card record set

BBC News

Historians will argue that other matches in World Cup history were dirtier. Think 'The Battle of Santiago' in 1962, in which Chile and Italy brawled throughout and which the BBC's David Coleman described as "the most stupid, appalling, disgusting and disgraceful exhibition of football in the history of the game".


Efficient Malware Analysis Using Metric Embeddings

arXiv.org Artificial Intelligence

In this paper, we explore the use of metric learning to embed Windows PE files in a low-dimensional vector space for downstream use in a variety of applications, including malware detection, family classification, and malware attribute tagging. Specifically, we enrich labeling on malicious and benign PE files using computationally expensive, disassembly-based malicious capabilities. Using these capabilities, we derive several different types of metric embeddings utilizing an embedding neural network trained via contrastive loss, Spearman rank correlation, and combinations thereof. We then examine performance on a variety of transfer tasks performed on the EMBER and SOREL datasets, demonstrating that for several tasks, low-dimensional, computationally efficient metric embeddings maintain performance with little decay, which offers the potential to quickly retrain for a variety of transfer tasks at significantly reduced storage overhead. We conclude with an examination of practical considerations for the use of our proposed embedding approach, such as robustness to adversarial evasion and introduction of task-specific auxiliary objectives to improve performance on mission critical tasks.
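For readers unfamiliar with contrastive loss, here is a minimal sketch of the common pairwise form: similar pairs are penalized by their squared distance, and dissimilar pairs are penalized only when they fall inside a margin. The abstract does not give the exact formulation used, so the margin value, the Euclidean distance, and the function name are assumptions.

```python
import math


def contrastive_loss(u, v, similar, margin=1.0):
    """Pairwise contrastive loss on two embedding vectors u and v.

    similar=True  -> pull together: d^2
    similar=False -> push apart until margin: max(0, margin - d)^2
    """
    d = math.dist(u, v)
    if similar:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

In training, this quantity would be averaged over many pairs of PE-file embeddings, with pairs marked similar or dissimilar according to whatever labeling the application provides.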


Triadic Temporal Exponential Random Graph Models (TTERGM)

arXiv.org Machine Learning

Temporal exponential random graph models (TERGM) are powerful statistical models that can be used to infer the temporal pattern of edge formation and elimination in complex networks (e.g., social networks). TERGMs can also be used in a generative capacity to predict longitudinal time series data in these evolving graphs. However, parameter estimation within this framework fails to capture many real-world properties of social networks, including: triadic relationships, small world characteristics, and social learning theories which could be used to constrain the probabilistic estimation of dyadic covariates. Here, we propose triadic temporal exponential random graph models (TTERGM) to fill this void, which includes these hierarchical network relationships within the graph model. We represent social network learning theory as an additional probability distribution that optimizes Markov chains in the graph vector space. The new parameters are then approximated via Monte Carlo maximum likelihood estimation. We show that our TTERGM model achieves improved fidelity and more accurate predictions compared to several benchmark methods on GitHub network data.


NASA's Artemis 1 spacecraft breaks a record set by Apollo 13 in 1970

Daily Mail - Science & tech

NASA's Artemis programme is already breaking records, less than two weeks after its very first spaceflight launched. The agency has confirmed its Artemis 1 Orion capsule smashed the record for the furthest distance travelled from Earth by any craft designed to carry humans. At 08:40 EST (13:40 GMT) on Saturday (November 26), Orion reached 248,655 miles from Earth, beating the record set by Apollo 13 in April 1970. Then, at 16:06 EST (21:06 GMT) on Saturday, it reached the farthest point in its orbit – a maximum distance of 268,553 miles. Artemis 1 is an uncrewed test flight for NASA's Artemis programme, comprising the Orion spacecraft, Space Launch System (SLS) rocket.


Searching for Discriminative Words in Multidimensional Continuous Feature Space

arXiv.org Artificial Intelligence

Word feature vectors have been proven to improve many NLP tasks. With recent advances in unsupervised learning of these feature vectors, it has become possible to train them on much more data, which has also resulted in better quality of the learned features. Since this training learns the joint probability of latent features of words, it has the advantage that it requires no prior knowledge about the goal task we want to solve. We aim to evaluate the universal applicability property of feature vectors, which has already been proven to hold for many standard NLP tasks such as part-of-speech tagging or syntactic parsing. In our case, we want to understand the topical focus of text documents and design an efficient representation suitable for discriminating between different topics. The discriminativeness can be evaluated adequately on a text categorisation task. We propose a novel method to extract discriminative keywords from documents. We utilise word feature vectors to better understand the relations between words and also the latent topics that are discussed in the text, not mentioned directly but inferred logically. We also present a simple way to calculate document feature vectors from the extracted discriminative words. We evaluate our method on the four most popular datasets for text categorisation. We show how different discriminative metrics influence the overall results. We demonstrate the effectiveness of our approach by achieving state-of-the-art results on the text categorisation task using just a small number of extracted keywords. We prove that word feature vectors can substantially improve the topical inference of documents' meaning. We conclude that distributed representations of words can be used to build higher levels of abstraction, as we demonstrate by building feature vectors of documents.
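The abstract mentions computing document feature vectors from extracted keywords without saying how. One simple possibility, shown here purely as an assumption, is to average the word vectors of the keywords; the function name and the choice to skip out-of-vocabulary words are illustrative.

```python
def document_vector(keywords, word_vectors):
    """Average the vectors of the keywords found in word_vectors.

    Returns None when no keyword has a known vector.
    """
    vecs = [word_vectors[w] for w in keywords if w in word_vectors]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
```

The resulting fixed-length vector could then be fed to any standard classifier for text categorisation.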


Lifting Weak Supervision To Structured Prediction

arXiv.org Artificial Intelligence

Weak supervision (WS) is a rich set of techniques that produce pseudolabels by aggregating easily obtained but potentially noisy label estimates from a variety of sources. WS is theoretically well understood for binary classification, where simple approaches enable consistent estimation of pseudolabel noise rates. Using this result, it has been shown that downstream models trained on the pseudolabels have generalization guarantees nearly identical to those trained on clean labels. While this is exciting, users often wish to use WS for structured prediction, where the output space consists of more than a binary or multi-class label set: e.g. rankings, graphs, manifolds, and more. Do the favorable theoretical properties of WS for binary classification lift to this setting? We answer this question in the affirmative for a wide range of scenarios. For labels taking values in a finite metric space, we introduce techniques new to weak supervision based on pseudo-Euclidean embeddings and tensor decompositions, providing a nearly-consistent noise rate estimator. For labels in constant-curvature Riemannian manifolds, we introduce new invariants that also yield consistent noise rate estimation. In both cases, when using the resulting pseudolabels in concert with a flexible downstream model, we obtain generalization guarantees nearly identical to those for models trained on clean data. Several of our results, which can be viewed as robustness guarantees in structured prediction with noisy labels, may be of independent interest. Empirical evaluation validates our claims and shows the merits of the proposed method.
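To make the notion of aggregating noisy label estimates concrete, the sketch below shows the simplest aggregator for the binary case: an unweighted majority vote over sources that emit +1, -1, or 0 (abstain). This is a baseline illustration only; it is not the noise-rate estimator or the structured-prediction technique described in the abstract, and the label encoding is an assumption.

```python
def majority_vote(votes):
    """Combine binary votes in {+1, -1, 0}; 0 means the source abstains.

    Returns +1 or -1 for a clear majority, and 0 on a tie or when all abstain.
    """
    s = sum(votes)
    if s > 0:
        return 1
    if s < 0:
        return -1
    return 0


def pseudolabels(vote_matrix):
    """Apply majority_vote to each example's row of source votes."""
    return [majority_vote(row) for row in vote_matrix]
```

More refined aggregators weight each source by an estimate of its accuracy, which is where consistent estimation of noise rates becomes important.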