Dialogue Term Extraction using Transfer Learning and Topological Data Analysis
Vukovic, Renato, Heck, Michael, Ruppik, Benjamin Matthias, van Niekerk, Carel, Zibrowius, Marcus, Gašić, Milica
arXiv.org Artificial Intelligence
Goal-oriented dialogue systems were originally designed as a natural language interface to a fixed dataset of entities that users might inquire about, further described by domains, slots, and values. As we move towards adaptable dialogue systems where knowledge about domains, slots, and values may change, there is an increasing need to automatically extract these terms from raw dialogues or related non-dialogue data on a large scale. In this paper, we take an important step in this direction by exploring different features that can enable systems to discover realizations of domains, slots, and values in dialogues in a purely data-driven fashion. The features that we examine stem from word embeddings, language modelling features, and topological features of the word embedding space. To examine the utility of each feature set, we train a seed model on the widely used MultiWOZ dataset. We then apply this model to a different corpus, the Schema-Guided Dialogue dataset. Our method outperforms the previously proposed approach that relies solely on word embeddings. We also demonstrate that each feature set is responsible for discovering different kinds of content. We believe our results warrant further research towards ontology induction, and continued harnessing of topological data analysis for dialogue and natural language processing research.
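The abstract does not say how the topological features of the embedding space are computed. As a rough illustration only, and not the authors' method, the sketch below computes one simple topological summary of a point cloud: the 0-dimensional persistence of its Vietoris-Rips filtration, which is known to coincide with the edge lengths of a Euclidean minimum spanning tree. The function name `zero_dim_persistence` and the toy 2-D "neighbourhood" are hypothetical; real word embeddings would have many more dimensions.

```python
import math


def zero_dim_persistence(points):
    """Return sorted death scales of 0-dimensional persistence classes.

    Assumption for illustration: every point is its own connected
    component at scale 0, and two components merge at the scale equal
    to the Euclidean distance of the edge joining them. The merge
    scales are exactly the edge lengths of a minimum spanning tree,
    computed here with Prim's algorithm on the complete graph.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from the tree to each point
    best[0] = 0.0
    deaths = []
    for step in range(n):
        # Pick the point outside the tree that is closest to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:
            deaths.append(best[u])  # one component dies at this scale
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return sorted(deaths)


# Hypothetical toy neighbourhood of a word vector (2-D for readability).
neighbourhood = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
deaths = zero_dim_persistence(neighbourhood)
total_persistence = sum(deaths)  # one possible scalar feature per word
```

A scalar such as `total_persistence`, or the full list of death scales, could in principle be concatenated with embedding and language-model features before tagging, but that combination is an assumption here rather than something the abstract specifies.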
Aug-22-2022
- Country:
  - South America > Paraguay
  - Oceania > Australia
    - New South Wales > Sydney (0.04)
  - North America
    - Dominican Republic (0.04)
    - United States
      - Utah (0.04)
      - Arizona (0.04)
      - Rhode Island > Providence County
        - Providence (0.04)
      - New York > New York County
        - New York City (0.04)
      - Minnesota > Hennepin County
        - Minneapolis (0.14)
      - Massachusetts > Middlesex County
        - Cambridge (0.04)
      - California
        - Santa Clara County > Palo Alto (0.04)
        - Los Angeles County > Los Angeles (0.04)
    - Canada
      - Ontario > Toronto (0.04)
      - British Columbia > Metro Vancouver Regional District
        - Vancouver (0.04)
  - Europe
  - Asia
    - China > Hong Kong (0.04)
    - Middle East > Jordan (0.04)
    - Japan (0.04)
    - Nepal > Bagmati Province
      - Kathmandu District > Kathmandu (0.04)
    - Malaysia > Kuala Lumpur
      - Kuala Lumpur (0.04)
- Genre:
  - Research Report > New Finding (0.34)
- Industry:
  - Consumer Products & Services > Hotels (1.00)
  - Leisure & Entertainment (0.93)
- Technology: