Co-training an Unsupervised Constituency Parser with Weak Supervision
Maveli, Nickil; Cohen, Shay B.
arXiv.org Artificial Intelligence
We introduce a method for unsupervised parsing that relies on bootstrapping classifiers to identify whether a node dominates a specific span in a sentence. There are two types of classifiers: an inside classifier that acts on a span, and an outside classifier that acts on everything outside of a given span. Through self-training and co-training with the two classifiers, we show that the interplay between them helps improve the accuracy of both and, as a result, lets them parse effectively. A seed bootstrapping technique prepares the data to train these classifiers. Our analyses further validate that such an approach, in conjunction with weak supervision using prior branching knowledge of a known language (left/right-branching) and minimal heuristics, injects strong inductive bias into the parser, achieving 63.1 F$_1$ on the English (PTB) test set. In addition, we show the effectiveness of our architecture by evaluating on treebanks for Chinese (CTB) and Japanese (KTB), where we achieve new state-of-the-art results. For code or data, please contact the authors.
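The inside/outside distinction in the abstract can be illustrated with a minimal Python sketch. It only shows how one sentence yields two complementary views of a candidate span: the tokens inside it and the context around it. The function names `span_views` and `all_spans`, the half-open `(start, end)` indexing, and the choice to enumerate only spans of two or more tokens are assumptions made for illustration. The abstract does not specify the features or classifiers, and this is not the authors' implementation.

```python
from typing import List, Tuple


def span_views(
    tokens: List[str], start: int, end: int
) -> Tuple[List[str], List[str], List[str]]:
    """Split a sentence into the inside view of tokens[start:end] and the
    left and right outside context (hypothetical helper for illustration)."""
    if not 0 <= start < end <= len(tokens):
        raise ValueError("span must satisfy 0 <= start < end <= len(tokens)")
    inside = tokens[start:end]    # what an inside classifier would see
    left_context = tokens[:start]  # outside view, left of the span
    right_context = tokens[end:]   # outside view, right of the span
    return inside, left_context, right_context


def all_spans(n: int) -> List[Tuple[int, int]]:
    """Enumerate candidate spans (start, end) covering at least two tokens.
    The two-token minimum is an assumption, not something the source states."""
    return [(i, j) for i in range(n) for j in range(i + 2, n + 1)]


if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    for start, end in all_spans(len(sentence)):
        inside, left, right = span_views(sentence, start, end)
        print(f"({start}, {end}) inside={inside} outside={left} ... {right}")
```

Because the two views of a span share no tokens, each classifier sees different evidence about the same span, which is the usual precondition for co-training.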
Oct-5-2021
- Country:
- Oceania > Australia
- New South Wales > Sydney (0.04)
- North America
- Canada (0.28)
- United States
- Maryland > Baltimore (0.04)
- Washington > King County
- Seattle (0.04)
- Pennsylvania > Philadelphia County
- Philadelphia (0.04)
- New York > New York County
- New York City (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Massachusetts
- Suffolk County > Boston (0.04)
- Middlesex County > Cambridge (0.04)
- Georgia > Fulton County
- Atlanta (0.04)
- California
- San Francisco County > San Francisco (0.04)
- San Diego County > San Diego (0.04)
- Europe
- United Kingdom > England
- Greater Manchester > Manchester (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- Italy > Tuscany
- Florence (0.04)
- Hungary > Budapest
- Budapest (0.04)
- Asia
- China > Hong Kong (0.04)
- South Korea (0.04)
- Middle East > Qatar
- Japan > Honshū
- Tōhoku (0.04)
- Genre:
- Research Report (0.50)
- Technology: