Nakashole, Ndapa
On Linearizing Structured Data in Encoder-Decoder Language Models: Insights from Text-to-SQL
Shao, Yutong, Nakashole, Ndapa
Structured data, prevalent in tables, databases, and knowledge graphs, poses a significant challenge for representation in language models. With the advent of large language models (LLMs), there has been a shift towards linearization-based methods, which process structured data as sequential token streams, diverging from approaches that explicitly model structure, often as a graph. Crucially, there remains a gap in our understanding of how these linearization-based methods handle structured data, which is inherently non-linear. This work investigates the linear handling of structured data in encoder-decoder language models, specifically T5. Our findings reveal the model's ability to mimic human-designed processes such as schema linking and syntax prediction, indicating a deep, meaningful learning of structure beyond simple token sequencing. We also uncover insights into the model's internal mechanisms, including the ego-centric nature of structure node encodings and the potential for model compression due to modality fusion redundancy. Overall, this work sheds light on the inner workings of linearization-based methods and could provide guidance for future research.
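As an illustration of the linearization setting studied in this paper, the sketch below flattens a toy relational schema and a natural-language question into a single token stream of the kind a T5-style encoder-decoder consumes. The table names, column names, and separator format are illustrative assumptions, not the paper's exact serialization.

```python
# A minimal sketch (not the paper's exact serialization) of linearizing a
# relational schema plus a question into one flat string for an
# encoder-decoder model such as T5. Schema contents are hypothetical.

def linearize_schema(question: str, schema: dict) -> str:
    """Flatten a question and a {table: [columns]} schema into a single string."""
    parts = [f"question: {question}"]
    for table, columns in schema.items():
        parts.append(f"table: {table} columns: {', '.join(columns)}")
    return " | ".join(parts)

if __name__ == "__main__":
    schema = {
        "singer": ["singer_id", "name", "country", "age"],
        "concert": ["concert_id", "singer_id", "year"],
    }
    print(linearize_schema("How many singers are from France?", schema))
    # -> question: How many singers are from France? | table: singer columns: singer_id, name, country, age | table: concert columns: concert_id, singer_id, year
```

The resulting string would be tokenized and fed to the encoder as an ordinary sequence, which is precisely the setting whose internal handling of structure the paper analyzes.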
Zero-shot Triplet Extraction by Template Infilling
Kim, Bosung, Iso, Hayate, Bhutani, Nikita, Hruschka, Estevam, Nakashole, Ndapa, Mitchell, Tom
The task of triplet extraction aims to extract pairs of entities and their corresponding relations from unstructured text. Most existing methods train an extraction model on training data involving specific target relations, and are incapable of extracting new relations that were not observed at training time. Generalizing the model to unseen relations typically requires fine-tuning on synthetic training data which is often noisy and unreliable. We show that by reducing triplet extraction to a template infilling task over a pre-trained language model (LM), we can equip the extraction model with zero-shot learning capabilities and eliminate the need for additional training data. We propose a novel framework, ZETT (ZEro-shot Triplet extraction by Template infilling), that aligns the task objective to the pre-training objective of generative transformers to generalize to unseen relations. Experiments on FewRel and Wiki-ZSL datasets demonstrate that ZETT shows consistent and stable performance, outperforming previous state-of-the-art methods, even when using automatically generated templates. Code is available at https://github.com/megagonlabs/zett/.
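For intuition about the template-infilling reduction, the sketch below casts a single relation template as a T5 span-infilling problem using the Hugging Face transformers library. The sentence, the template, and the choice of t5-base are assumptions for illustration only; ZETT's actual pipeline additionally scores and ranks candidate relation templates by sequence likelihood rather than generating from just one.

```python
# Illustrative sketch of zero-shot extraction via template infilling,
# assuming the transformers and sentencepiece packages are installed.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

sentence = "Marie Curie was born in Warsaw."
# Relation template with the head/tail entity slots replaced by T5 sentinel tokens.
template = "<extra_id_0> was born in <extra_id_1> ."

inputs = tok(f"{sentence} {template}", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
# The generated sequence contains the spans the model proposes for each sentinel,
# i.e., candidate head and tail entities for the "place of birth" relation.
print(tok.decode(out[0], skip_special_tokens=False))
```

Because the infilling format matches the generative LM's pre-training objective, no relation-specific training data is required, which is the source of the zero-shot capability described above.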
Never-Ending Learning
Mitchell, Tom M. (Carnegie Mellon University) | Cohen, William (Carnegie Mellon University) | Hruschka, Estevam (University of Sao Carlos) | Talukdar, Partha (Indian Institute of Science) | Betteridge, Justin (Carnegie Mellon University) | Carlson, Andrew (Google) | Mishra, Bhavana Dalvi (Carnegie Mellon University) | Gardner, Matthew (Carnegie Mellon University) | Kisiel, Bryan (Carnegie Mellon University) | Krishnamurthy, Jayant (Carnegie Mellon University) | Lao, Ni (Google) | Mazaitis, Kathryn (Carnegie Mellon University) | Mohamed, Thahir (Carnegie Mellon University) | Nakashole, Ndapa (Carnegie Mellon University) | Platanios, Emmanouil Antonios (Ohio State University) | Ritter, Alan (Carnegie Mellon University) | Samadi, Mehdi (Duolingo) | Settles, Burr (Carnegie Mellon University) | Wang, Richard (Carnegie Mellon University) | Wijaya, Derry (Carnegie Mellon University) | Gupta, Abhinav (Carnegie Mellon University) | Chen, Xinlei (Alpine Data Lab) | Saparov, Abulhair (Pittsburgh Supercomputing Center) | Greaves, Malcolm | Welling, Joel
Whereas people learn many different types of knowledge from diverse experiences over many years, most current machine learning systems acquire just a single function or data model from just a single data set. We propose a never-ending learning paradigm for machine learning, to better reflect the more ambitious and encompassing type of learning performed by humans. As a case study, we describe the Never-Ending Language Learner (NELL), which achieves some of the desired properties of a never-ending learner, and we discuss lessons learned. NELL has been learning to read the web 24 hours/day since January 2010, and so far has acquired a knowledge base with over 80 million confidence-weighted beliefs (e.g., servedWith(tea, biscuits)). NELL has also learned millions of features and parameters that enable it to read these beliefs from the web. Additionally, it has learned to reason over these beliefs to infer new beliefs, and is able to extend its ontology by synthesizing new relational predicates. NELL can be tracked online at http://rtw.ml.cmu.edu, and followed on Twitter at @CMUNELL.
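As a toy illustration of the confidence-weighted beliefs mentioned above, the snippet below stores a few triples with confidences and applies one hand-written inference rule. The predicates, confidence values, and rule are hypothetical and are not NELL's actual representation or reasoning machinery.

```python
# Illustrative only: a toy store of confidence-weighted beliefs in the spirit
# of NELL's knowledge base, plus one hand-written inference rule.
from dataclasses import dataclass

@dataclass(frozen=True)
class Belief:
    relation: str
    subject: str
    obj: str
    confidence: float

kb = {
    Belief("servedWith", "tea", "biscuits", 0.93),
    Belief("isA", "tea", "beverage", 0.99),
}

def infer_served_with_food(beliefs):
    """Toy rule: if X servedWith Y, infer Y isA food with discounted confidence."""
    new = set()
    for b in beliefs:
        if b.relation == "servedWith":
            new.add(Belief("isA", b.obj, "food", round(b.confidence * 0.8, 2)))
    return new

kb |= infer_served_with_food(kb)
for b in sorted(kb, key=lambda b: -b.confidence):
    print(f"{b.relation}({b.subject}, {b.obj})  conf={b.confidence}")
```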