Structured Energy Network as a Loss Function
Lee, Jay-Yoon
Belanger & McCallum (2016) and Gygli et al. (2017) have shown that energy networks can capture rich dependencies among structured outputs, but this expressivity comes at a high inference cost. This raises a question: can energy networks be used in a way that is as expressive as SPENs, as efficient at inference as feedforward approaches, and also easy to train? In this work, we propose Structured Energy As Loss (SEAL) to take advantage of the expressivity of energy networks without incurring the high inference cost.
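The energy-as-loss idea can be caricatured in a few lines. Everything below (the quadratic energy, the linear predictor, the matrix `A`) is invented for this sketch and is not SEAL's actual architecture: a fixed energy function scores (input, output) pairs, and a feedforward predictor is trained to emit low-energy outputs, so test-time inference needs no energy minimization.

```python
import numpy as np

# Toy sketch: a *fixed* energy function scores (input, output) pairs, and a
# feedforward predictor is trained to emit low-energy outputs. The quadratic
# energy and linear predictor are invented for illustration; SEAL's real
# energy is a learned network.
A = np.array([[1.0, 0.5], [0.0, 1.0]])       # hypothetical "ideal" map
def energy(x, y):
    return np.sum((y - A @ x) ** 2)          # low iff y is consistent with x

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 2))                  # feedforward predictor y = W x
xs = rng.normal(size=(100, 2))
lr = 0.05
for _ in range(200):
    for x in xs:
        grad_y = 2 * (W @ x - A @ x)         # dE/dy at the prediction
        W -= lr * np.outer(grad_y, x)        # chain rule through y = W x
# At test time, prediction is a single forward pass -- no energy descent.
```

After training, `W @ x` attains near-zero energy, so inference is one matrix multiply rather than iterative minimization over y.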
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > California (0.14)
- Asia > South Korea > Seoul > Seoul (0.04)
- (6 more...)
- Education (0.93)
- Energy > Power Industry (0.86)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
- North America > Canada (0.04)
- Europe > France > Hauts-de-France > Nord > Lille (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.72)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)
- North America > United States > Maryland > Prince George's County > College Park (0.14)
- North America > United States > Colorado > Boulder County > Boulder (0.14)
- Asia > Middle East > Jordan (0.05)
- (4 more...)
Linear Regression in p-adic metric spaces
Baker, Gregory D., McCallum, Scott, Pattinson, Dirk
Many real-world machine learning problems involve inherently hierarchical data, yet traditional approaches rely on Euclidean metrics that fail to capture the discrete, branching nature of hierarchical relationships. We present a theoretical foundation for machine learning in p-adic metric spaces, which naturally respect hierarchical structure. Our main result proves that an n-dimensional plane minimizing the p-adic sum of distances to points in a dataset must pass through at least n + 1 of those points -- a striking contrast to Euclidean regression that highlights how p-adic metrics better align with the discrete nature of hierarchical data. As a corollary, a polynomial of degree n constructed to minimize the p-adic sum of residuals will pass through at least n + 1 points. As a further corollary, a polynomial of degree n approximating a higher-degree polynomial at a finite number of points will yield a difference polynomial that has distinct rational roots. We demonstrate the practical significance of this result through two applications in natural language processing: analyzing hierarchical taxonomies and modeling grammatical morphology. These results suggest that p-adic metrics may be fundamental to properly handling hierarchical data structures in machine learning. In hierarchical data, interpolation between points often makes less sense than selecting actual observed points as representatives.
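To make the p-adic notion concrete (a minimal sketch; the function names are ours), the p-adic absolute value of a rational number is small exactly when the number is divisible by a high power of p, and it satisfies the ultrametric inequality |a + b|_p <= max(|a|_p, |b|_p):

```python
from fractions import Fraction

def vp(q, p):
    """p-adic valuation: the power of p dividing the rational q (q != 0)."""
    q = Fraction(q)
    n, d, v = q.numerator, q.denominator, 0
    while n % p == 0:
        n //= p; v += 1
    while d % p == 0:
        d //= p; v -= 1
    return v

def abs_p(q, p):
    """p-adic absolute value |q|_p = p**(-vp(q)); |0|_p = 0."""
    q = Fraction(q)
    return 0.0 if q == 0 else float(p) ** (-vp(q, p))

# 18 = 2 * 3**2, so |18|_3 = 1/9; a p in the denominator raises the value.
print(abs_p(18, 3), abs_p(Fraction(1, 3), 3))   # 0.1111111111111111 3.0
```

Under this metric a residual is "small" only when it is divisible by a high power of p, which is why a minimizing plane is pushed through data points exactly (zero residual) rather than between them.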
- Oceania > Australia (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- Europe > Germany > Saxony > Leipzig (0.04)
- Asia > Taiwan > Taiwan Province > Taipei (0.04)
- Asia > North Korea (0.14)
- (22 more...)
- Government > Military (1.00)
- Government > Regional Government > North America Government > United States Government (0.93)
- Law (0.69)
- Law Enforcement & Public Safety (0.68)
Answering Compositional Queries with Set-Theoretic Embeddings
Dasgupta, Shib, McCallum, Andrew, Rendle, Steffen, Zhang, Li
The need to compactly and robustly represent item-attribute relations arises in many important tasks, such as faceted browsing and recommendation systems. A popular machine learning approach for this task denotes that an item has an attribute by a high dot-product between vectors for the item and attribute -- a representation that is not only dense, but also tends to correct noisy and incomplete data. While this method works well for queries retrieving items by a single attribute (such as "movies that are comedies"), we find that vector embeddings do not so accurately support compositional queries (such as "movies that are comedies and British but not romances"). To address these set-theoretic compositions, this paper proposes to replace vectors with box embeddings, a region-based representation that can be thought of as learnable Venn diagrams. We introduce a new benchmark dataset for compositional queries, and present experiments and analysis providing insights into the behavior of both representations. We find that, while vector and box embeddings are equally suited to single-attribute queries, for compositional queries box embeddings provide substantial advantages over vectors, particularly at the moderate and larger retrieval set sizes that are most useful for users' search and browsing.
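The set-theoretic behavior can be sketched with hand-built axis-aligned boxes (real box embeddings are learned; the function names and coordinates below are our illustration): intersecting boxes models AND, and overlap volume acts as a soft membership score.

```python
import numpy as np

# A box is a (lo, hi) pair of corner vectors; think learnable Venn diagram.
def intersect(a, b):
    return np.maximum(a[0], b[0]), np.minimum(a[1], b[1])

def volume(box):
    # Side lengths clipped at 0 so disjoint boxes score 0, not negative.
    return float(np.prod(np.clip(box[1] - box[0], 0.0, None)))

comedy  = (np.array([0.0, 0.0]), np.array([2.0, 1.0]))
british = (np.array([1.0, 0.0]), np.array([3.0, 1.0]))
romance = (np.array([5.0, 5.0]), np.array([6.0, 6.0]))

# "comedies AND British": overlap region [1,2] x [0,1] has volume 1.
print(volume(intersect(comedy, british)))        # 1.0
# "comedies AND romances": the boxes are disjoint, so the score is 0.
print(volume(intersect(comedy, romance)))        # 0.0
```

Negation composes similarly: a score for "comedy AND NOT romance" can be taken as volume(comedy) minus volume(comedy intersect romance).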
- North America > United States > Massachusetts (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
HTMOT: Hierarchical Topic Modelling Over Time
Poumay, Judicael, Ittoo, Ashwin
Over the years, topic models have provided an efficient way of extracting insights from text. However, while many models have been proposed, none are able to model topic temporality and hierarchy jointly. Modelling time provides more precise topics by separating lexically close but temporally distinct topics, while modelling hierarchy provides a more detailed view of the content of a document corpus. In this study, we therefore propose a novel method, HTMOT, to perform Hierarchical Topic Modelling Over Time. We train HTMOT using a new, more efficient implementation of Gibbs sampling. Specifically, we show that applying time modelling only to deep sub-topics provides a way to extract specific stories or events, while high-level topics extract larger themes in the corpus. Our results show that our training procedure is fast and can extract accurate high-level topics and temporally precise sub-topics. We measured our model's performance using the Word Intrusion task and outlined some limitations of this evaluation method, especially for hierarchical models. As a case study, we focused on the various developments in the space industry in 2020.
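As a minimal reference point for the sampler (this is a textbook collapsed Gibbs sampler for flat LDA, not HTMOT's joint time-and-hierarchy sampler; the tiny corpus is invented), each token's topic is resampled from co-occurrence counts with the token itself held out:

```python
import numpy as np

# Collapsed Gibbs sampling for plain LDA on a toy corpus.
rng = np.random.default_rng(0)
docs = [[0, 0, 1, 1], [2, 2, 3, 3], [0, 1, 2, 3]]   # token ids per doc
V, K, alpha, beta = 4, 2, 0.1, 0.01

z = [[rng.integers(K) for _ in d] for d in docs]    # topic assignments
ndk = np.zeros((len(docs), K)); nkw = np.zeros((K, V)); nk = np.zeros(K)
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        k = z[d][i]; ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

for _ in range(200):                                # Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]                             # hold out current token
            ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
            k = rng.choice(K, p=p / p.sum())        # resample its topic
            z[d][i] = k; ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

phi = (nkw + beta) / (nk[:, None] + V * beta)       # topic-word distributions
```

HTMOT extends this inner loop so that a token also carries a path in the topic tree and, for deep sub-topics, a time component.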
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Asia > Middle East > Jordan (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- (4 more...)
Implicit Training of Energy Model for Structure Prediction
Shankar, Shiv, Piratla, Vihari
Most deep learning research has focused on developing new models and training procedures. The training objective, on the other hand, has usually been restricted to combinations of standard losses. When the objective aligns well with the evaluation metric, this is not a major issue. However, when dealing with complex structured outputs, the ideal objective can be hard to optimize, and the efficacy of the usual objectives as proxies for the true objective can be questionable. In this work, we argue that the existing inference-network-based structure prediction methods (Tu and Gimpel 2018; Tu, Pang, and Gimpel 2020) are indirectly learning to optimize a dynamic loss objective parameterized by the energy model. We then explore using an implicit-gradient-based technique to learn the corresponding dynamic objectives. Our experiments show that implicitly learning a dynamic loss landscape is an effective method for improving model performance in structure prediction.
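The "dynamic loss" view can be caricatured in one dimension (everything below, including the quadratic energy and the update rule for phi, is invented for illustration and is not the paper's algorithm): the predictor descends a loss whose shape the energy model keeps adapting.

```python
import numpy as np

# Predictor y_hat = theta * x; dynamic loss E(x, y) = phi * (y - 2*x)**2,
# where phi is the energy model's single parameter. The phi update is a
# made-up heuristic: sharpen the loss where the predictor still errs.
rng = np.random.default_rng(0)
theta, phi, lr = 0.0, 1.0, 0.01
xs = rng.normal(size=200)
for _ in range(3):                       # a few epochs of alternation
    for x in xs:
        err = theta * x - 2.0 * x        # residual of the prediction
        theta -= lr * 2 * phi * err * x  # descend the *current* loss
        phi += lr * err ** 2             # reshape the loss landscape
```

Here theta still converges to the true slope 2.0; the paper argues that inference-network training is implicitly performing this kind of bilevel optimization, which motivates learning the dynamic objective with implicit gradients.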
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
SSM-Net: feature learning for Music Structure Analysis using a Self-Similarity-Matrix based loss
Peeters, Geoffroy, Angulo, Florian
In this paper, we propose a new paradigm for learning audio features for Music Structure Analysis (MSA). We train a deep encoder to learn features such that the Self-Similarity Matrix (SSM) computed from those features approximates a ground-truth SSM; this is done by minimizing a loss between the two SSMs. Since this loss is differentiable w.r.t. the input features, we can train the encoder in a straightforward way. We demonstrate the effectiveness of this training paradigm using the Area Under the ROC Curve (AUC) on the RWC-Pop dataset.
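The training signal can be sketched in a few lines (an illustrative toy, not the paper's implementation; the cosine similarity, the MSE loss, and the block-diagonal target are our assumptions):

```python
import numpy as np

# Features for T time frames -> T x T self-similarity matrix (cosine),
# compared against a ground-truth SSM with a differentiable (MSE) loss.
def ssm(F):
    F = F / np.linalg.norm(F, axis=1, keepdims=True)   # unit-norm frames
    return F @ F.T

rng = np.random.default_rng(0)
features = rng.normal(size=(6, 4))            # 6 frames of 4-dim encoder output
target = np.kron(np.eye(2), np.ones((3, 3)))  # toy target: two 3-frame segments
loss = np.mean((ssm(features) - target) ** 2)
```

Because the loss is differentiable w.r.t. the features, its gradient can flow back into the encoder during training; here we only evaluate it on random features.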
- North America > United States (0.05)
- North America > Canada > British Columbia > Vancouver Island > Capital Regional District > Victoria (0.04)
- Europe > United Kingdom > England > East Sussex > Brighton (0.04)
- (7 more...)
- Media > Music (0.88)
- Leisure & Entertainment (0.88)