
Collaborating Authors: McCallum, Andrew


Entity Linking and Discovery via Arborescence-based Supervised Clustering

arXiv.org Artificial Intelligence

Previous work has shown promising results in performing entity linking by measuring not only the affinities between mentions and entities but also those amongst mentions. In this paper, we present novel training and inference procedures that fully utilize mention-to-mention affinities by building minimum arborescences (i.e., directed spanning trees) over mentions and entities across documents in order to make linking decisions. We also show that this method gracefully extends to entity discovery, enabling the clustering of mentions that do not have an associated entity in the knowledge base. We evaluate our approach on the Zero-Shot Entity Linking dataset and MedMentions, the largest publicly available biomedical dataset, and show significant improvements in performance for both entity linking and discovery compared to identically parameterized models. We further show significant efficiency improvements with only a small loss in accuracy over previous work, which uses more computationally expensive models.
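
As a rough illustration of the inference step (not the authors' code), the sketch below builds a minimum arborescence over an affinity graph of entities and mentions using networkx's Chu-Liu/Edmonds implementation, with a dummy root added so a spanning arborescence always exists; the affinity function, node names, and the rule for reading off links are illustrative assumptions.

    import networkx as nx

    def link_by_arborescence(entities, mentions, affinity):
        """affinity(u, v): higher means more similar; edges point root-to-leaf."""
        G = nx.DiGraph()
        G.add_node("ROOT")
        for e in entities:                  # dummy root guarantees an arborescence
            G.add_edge("ROOT", e, weight=0.0)
        for m in mentions:
            for e in entities:              # entity -> mention edges
                G.add_edge(e, m, weight=-affinity(e, m))
            for m2 in mentions:             # mention -> mention edges
                if m2 != m:
                    G.add_edge(m, m2, weight=-affinity(m, m2))
        T = nx.minimum_spanning_arborescence(G)  # Chu-Liu/Edmonds algorithm
        links = {}
        for m in mentions:                  # each mention links to the entity
            node = m                        # rooting its subtree
            while node not in entities:
                node = next(T.predecessors(node))
            links[m] = node
        return links

Because mention-to-mention edges compete with entity-to-mention edges for a place in the tree, a mention can be linked indirectly through another mention, which is how mention affinities influence the final decision.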


Word2Box: Learning Word Representation Using Box Embeddings

arXiv.org Artificial Intelligence

Learning vector representations for words is one of the most fundamental topics in NLP, capable of capturing syntactic and semantic relationships useful in a variety of downstream NLP tasks. Vector representations can be limiting, however, in that typical scoring functions such as dot product similarity intertwine the position and magnitude of a vector in space. Exciting innovations in representation learning have proposed alternative fundamental representations, such as distributions, hyperbolic vectors, or regions. Our model, Word2Box, takes a region-based approach to the problem of word representation, representing words as $n$-dimensional rectangles. These representations encode position and breadth independently and provide additional geometric operations, such as intersection and containment, which allow them to model co-occurrence patterns that vector representations struggle with. We demonstrate improved performance on various word similarity tasks, particularly on less common words, and perform a qualitative analysis exploring the additional unique expressivity provided by Word2Box.
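
A minimal numpy sketch of the box operations the abstract refers to, assuming boxes are stored as (min corner, max corner) pairs; Word2Box's actual parameterization and training objective are not shown.

    import numpy as np

    def intersect(a, b):
        """Boxes are (min_corner, max_corner) pairs; boxes are closed under this."""
        return np.maximum(a[0], b[0]), np.minimum(a[1], b[1])

    def volume(box):
        return np.prod(np.clip(box[1] - box[0], 0.0, None))  # 0 if disjoint

    def contains(a, b):
        """True if box b lies entirely inside box a."""
        return bool(np.all(a[0] <= b[0]) and np.all(b[1] <= a[1]))

    # Two 2-dimensional word boxes with partial overlap:
    bank = (np.array([0.0, 0.0]), np.array([2.0, 1.0]))
    river = (np.array([1.0, 0.5]), np.array([3.0, 2.0]))
    print(volume(intersect(bank, river)))   # overlap volume as a similarity score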


Case-based Reasoning for Natural Language Queries over Knowledge Bases

arXiv.org Artificial Intelligence

It is often challenging for a system to solve a new complex problem from scratch, but much easier if the system can access other similar problems and descriptions of their solutions -- a paradigm known as case-based reasoning (CBR). We propose a neuro-symbolic CBR approach for question answering over large knowledge bases (CBR-KBQA). While the idea of CBR is tempting, composing a solution from cases is nontrivial when individual cases contain only part of the logic needed for the full solution. To resolve this, CBR-KBQA consists of two modules: a non-parametric memory that stores cases (questions and logical forms) and a parametric model that generates logical forms by retrieving relevant cases from memory. Through experiments, we show that CBR-KBQA can effectively derive novel combinations of relations, not present in the case memory, that are required to answer compositional questions. On several KBQA datasets that test compositional generalization, CBR-KBQA achieves competitive performance. For example, on the challenging ComplexWebQuestions dataset, CBR-KBQA outperforms the current state of the art by 11% accuracy. Furthermore, we show that CBR-KBQA is capable of using new cases without any further training: just by incorporating a few human-labeled examples in the non-parametric case memory, CBR-KBQA is able to successfully generate queries containing unseen KB relations.
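
As a hedged sketch of the non-parametric memory described above (the encoder is a placeholder, not the paper's model), cases can be retrieved by embedding similarity and handed to a generator as extra context:

    import numpy as np

    class CaseMemory:
        """Stores (question, logical form) cases; adding cases needs no retraining."""
        def __init__(self, encode):
            self.encode = encode            # question -> unit-norm vector (assumed)
            self.keys, self.cases = [], []

        def add(self, question, logical_form):
            self.keys.append(self.encode(question))
            self.cases.append((question, logical_form))

        def retrieve(self, question, k=3):
            q = self.encode(question)
            sims = np.array([key @ q for key in self.keys])  # cosine similarity
            return [self.cases[i] for i in np.argsort(-sims)[:k]]

Because the memory is non-parametric, adding new human-labeled cases immediately changes what the generator sees at inference time, consistent with the abstract's claim that new cases help without any further training.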


Exact and Approximate Hierarchical Clustering Using A*

arXiv.org Machine Learning

Hierarchical clustering is a critical task in numerous domains. Many approaches are based on heuristics and the properties of the resulting clusterings are studied post hoc. However, in several applications, there is a natural cost function that can be used to characterize the quality of the clustering. In those cases, hierarchical clustering can be seen as a combinatorial optimization problem. To that end, we introduce a new approach based on A* search. We overcome the prohibitively large search space by combining A* with a novel \emph{trellis} data structure. This combination results in an exact algorithm that scales beyond previous state of the art, from a search space with $10^{12}$ trees to $10^{15}$ trees, and an approximate algorithm that improves over baselines, even in enormous search spaces that contain more than $10^{1000}$ trees. We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks. We describe how our method provides significantly improved theoretical bounds on the time and space complexity of A* for clustering.
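
The trellis can be pictured as a dynamic program over subsets: the optimal tree over a set of points is the best split into two parts plus the optimal subtrees of each part, memoized per subset. The toy sketch below (an illustrative reconstruction, not the paper's algorithm) computes this exact optimum for a user-supplied cluster cost; A* combined with the trellis is what makes it feasible to avoid enumerating all of this space.

    from functools import lru_cache
    from itertools import combinations

    def best_tree_cost(points, cost):
        """cost(S) maps a frozenset of points to that cluster's cost."""
        @lru_cache(maxsize=None)
        def best(S):
            if len(S) == 1:
                return 0.0
            items = sorted(S)
            first, rest = items[0], items[1:]
            split_costs = []
            # Enumerate bipartitions; fixing `first` on the left avoids duplicates.
            for r in range(len(rest)):
                for left_rest in combinations(rest, r):
                    L = frozenset((first,) + left_rest)
                    split_costs.append(best(L) + best(S - L))
            return cost(S) + min(split_costs)

        return best(frozenset(points))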


Probabilistic Box Embeddings for Uncertain Knowledge Graph Reasoning

arXiv.org Artificial Intelligence

Knowledge bases often consist of facts which are harvested from a variety of sources, many of which are noisy and some of which conflict, resulting in a level of uncertainty for each triple. Knowledge bases are also often incomplete, prompting the use of embedding methods to generalize from known facts; existing embedding methods, however, only model triple-level uncertainty, and their reasoning results lack global consistency. To address these shortcomings, we propose BEUrRE, a novel uncertain knowledge graph embedding method with calibrated probabilistic semantics. BEUrRE models each entity as a box (i.e., an axis-aligned hyperrectangle) and relations between two entities as affine transforms on the head and tail entity boxes. The geometry of the boxes allows for efficient calculation of intersections and volumes, endowing the model with calibrated probabilistic semantics and facilitating the incorporation of relational constraints. Extensive experiments on two benchmark datasets show that BEUrRE consistently outperforms baselines on confidence prediction and fact ranking due to its probabilistic calibration and ability to capture high-order dependencies among facts.
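
One plausible reading of this construction is sketched below (the exact scoring in BEUrRE may differ; the per-dimension affine parameters, dictionary keys, and conditional-probability normalization are assumptions for illustration): a relation transforms the head and tail boxes, and the triple's confidence is an intersection volume normalized by a box volume.

    import numpy as np

    def affine(box, scale, shift):          # per-dimension scale and shift
        lo, hi = box
        return lo * scale + shift, hi * scale + shift

    def volume(box):
        return np.prod(np.clip(box[1] - box[0], 0.0, None))

    def confidence(head_box, tail_box, rel):
        h = affine(head_box, rel["h_scale"], rel["h_shift"])
        t = affine(tail_box, rel["t_scale"], rel["t_shift"])
        inter = (np.maximum(h[0], t[0]), np.minimum(h[1], t[1]))
        # One possible probabilistic reading: P(relation holds | head entity).
        return volume(inter) / max(volume(h), 1e-12)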


Improving Local Identifiability in Probabilistic Box Embeddings

arXiv.org Artificial Intelligence

Geometric embeddings have recently received attention for their natural ability to represent transitive asymmetric relations via containment. Box embeddings, where objects are represented by n-dimensional hyperrectangles, are a particularly promising example of such an embedding, as they are closed under intersection and their volume can be calculated easily, allowing them to naturally represent calibrated probability distributions. These benefits also introduce a problem of local identifiability, however: whole neighborhoods of parameters result in equivalent loss, which impedes learning. Prior work addressed some of these issues by using an approximation to Gaussian convolution over the box parameters; this intersection operation, however, also increases the sparsity of the gradient. In this work, we model the box parameters with min and max Gumbel distributions, chosen such that the space of boxes remains closed under intersection. The calculation of the expected intersection volume involves all parameters, and we demonstrate experimentally that this drastically improves the ability of such models to learn.
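
A sketch of the computation described here, using a softplus-style approximation to the expected volume (constants and parameter names are recalled from the literature and may differ in detail from the paper): the intersection's corner parameters are logsumexps of the input corners, so every parameter receives gradient.

    import numpy as np

    GAMMA = 0.5772156649015329              # Euler-Mascheroni constant

    def gumbel_intersection(mn1, mx1, mn2, mx2, beta):
        """Location parameters of the intersection box's Gumbel corners."""
        lo = beta * np.logaddexp(mn1 / beta, mn2 / beta)      # max-Gumbel mins
        hi = -beta * np.logaddexp(-mx1 / beta, -mx2 / beta)   # min-Gumbel maxes
        return lo, hi

    def expected_volume(lo, hi, beta):
        # Softplus approximation: smooth (nonzero gradient) even when boxes
        # barely overlap, unlike the hard clipped volume.
        return np.prod(beta * np.logaddexp(0.0, (hi - lo) / beta - 2 * GAMMA))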


Energy-Based Reranking: Improving Neural Machine Translation Using Energy-Based Models

arXiv.org Machine Learning

The discrepancy between maximum likelihood estimation (MLE) and task measures such as BLEU score has been studied before for autoregressive neural machine translation (NMT) and has resulted in alternative training algorithms (Ranzato et al., 2016; Norouzi et al., 2016; Shen et al., 2016; Wu et al., 2018). However, MLE training remains the de facto approach for autoregressive NMT because of its computational efficiency and stability. Despite this mismatch between the training objective and task measure, we notice that the samples drawn from an MLE-trained NMT model support the desired distribution: there are samples with much higher BLEU scores than the beam-decoding output. To benefit from this observation, we train an energy-based model to mimic the behavior of the task measure (i.e., the energy-based model assigns lower energy to samples with higher BLEU scores), which results in a re-ranking algorithm based on the samples drawn from NMT: energy-based re-ranking (EBR). Our EBR consistently improves the performance of Transformer-based NMT: +3 BLEU points on Sinhala-English, +2.0 BLEU points on IWSLT'17 French-English, and +1.7 BLEU points on WMT'19 German-English tasks.
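
At inference time the re-ranking itself is simple. The sketch below assumes placeholder sample() and energy() callables (not a real NMT API), with the energy model trained so that lower energy tracks higher BLEU:

    def energy_rerank(source, sample, energy, num_samples=100):
        """Draw candidate translations and return the lowest-energy one."""
        candidates = [sample(source) for _ in range(num_samples)]
        return min(candidates, key=lambda hyp: energy(source, hyp))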


AutoKnow: Self-Driving Knowledge Collection for Products of Thousands of Types

arXiv.org Artificial Intelligence

Can one build a knowledge graph (KG) for all products in the world? Knowledge graphs have firmly established themselves as valuable sources of information for search and question answering, and it is natural to wonder if a KG can contain information about products offered at online retail sites. There have been several successful examples of generic KGs, but organizing information about products poses many additional challenges, including sparsity and noise of structured data for products, complexity of the domain with millions of product types and thousands of attributes, heterogeneity across a large number of categories, as well as a large and constantly growing number of products. We describe AutoKnow, our automatic (self-driving) system that addresses these challenges. The system includes a suite of novel techniques for taxonomy construction, product property identification, knowledge extraction, anomaly detection, and synonym discovery. AutoKnow is (a) automatic, requiring little human intervention, (b) multi-scalable, i.e., scalable in multiple dimensions (many domains, many products, and many attributes), and (c) integrative, exploiting rich customer behavior logs. AutoKnow has been operational in collecting product knowledge for over 11K product types.


Optimal Transport-based Alignment of Learned Character Representations for String Similarity

arXiv.org Machine Learning

String similarity models are vital for record linkage, entity resolution, and search. In this work, we present STANCE, a learned model for computing the similarity of two strings. Our approach encodes the characters of each string, aligns the encodings using Sinkhorn iteration (alignment is posed as an instance of optimal transport), and scores the alignment with a convolutional neural network. We evaluate STANCE's ability to detect whether two strings can refer to the same entity, a task we term alias detection. We construct five new alias detection datasets and make them publicly available. We show that STANCE or one of its variants outperforms both state-of-the-art and classic, parameter-free similarity models on four of the five datasets. We also demonstrate STANCE's ability to improve downstream tasks by applying it to an instance of cross-document coreference, showing that it leads to a 2.8-point improvement in B^3 F1 over the previous state-of-the-art approach.
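
The alignment step can be sketched in a few lines of numpy (an illustrative reconstruction, not the released STANCE code): given pairwise similarities between the two strings' character encodings, alternately normalize rows and columns to obtain a soft alignment matrix, which the CNN then scores.

    import numpy as np

    def sinkhorn_alignment(sim, n_iters=20, tau=0.1):
        """sim[i, j]: similarity of character i (string A) to j (string B)."""
        K = np.exp(sim / tau)               # positive kernel; tau is assumed
        for _ in range(n_iters):
            K = K / K.sum(axis=1, keepdims=True)   # normalize rows
            K = K / K.sum(axis=0, keepdims=True)   # normalize columns
        return K                            # approximately doubly stochastic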


Supervised Hierarchical Clustering with Exponential Linkage

arXiv.org Machine Learning

In supervised clustering, standard techniques for learning a pairwise dissimilarity function often suffer from a discrepancy between the training and clustering objectives, leading to poor cluster quality. Rectifying this discrepancy requires matching the procedure for training the dissimilarity function to the clustering algorithm. In this paper, we introduce a method for training the dissimilarity function in a way that is tightly coupled with hierarchical clustering, in particular single linkage. However, the appropriate clustering algorithm for a given dataset is often unknown. Thus we introduce an approach to supervised hierarchical clustering that smoothly interpolates between single, average, and complete linkage, and we give a training procedure that simultaneously learns a linkage function and a dissimilarity function. We accomplish this with a novel Exponential Linkage function that has a learnable parameter controlling the interpolation. In experiments on four datasets, our joint training procedure consistently matches or outperforms the next best training procedure/linkage function pair and yields up to an 8-point improvement in dendrogram purity over discrepant pairs.
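
One way to realize such an interpolating linkage (a sketch in the spirit of the abstract; the paper's exact parameterization may differ) is a softmax-weighted average of the pairwise dissimilarities between two clusters, with a temperature alpha:

    import numpy as np

    def exp_linkage(dists, alpha):
        """dists: pairwise dissimilarities between two clusters (flat array)."""
        z = alpha * dists
        w = np.exp(z - np.max(z))           # numerically stable softmax weights
        return float(np.sum(w * dists) / np.sum(w))

    pair = np.array([0.1, 0.4, 0.9])
    print(exp_linkage(pair, -50.0))  # -> ~0.1, single linkage
    print(exp_linkage(pair, 0.0))    # -> ~0.467, average linkage
    print(exp_linkage(pair, 50.0))   # -> ~0.9, complete linkage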