Generating Regular Expressions from Natural Language Specifications: Are We There Yet?

Zhong, Zexuan (University of Illinois at Urbana-Champaign) | Guo, Jiaqi (Xi’an Jiaotong University) | Yang, Wei (University of Illinois at Urbana-Champaign) | Xie, Tao (University of Illinois at Urbana-Champaign) | Lou, Jian-Guang (Microsoft Research Asia) | Liu, Ting (Xi’an Jiaotong University) | Zhang, Dongmei (Microsoft Research Asia)

AAAI Conferences 

Recent state-of-the-art approaches automatically generate regular expressions from natural language specifications. Given that these approaches use only synthetic data in both their training datasets and their validation/test datasets, a natural question arises: are these approaches effective in addressing various real-world situations? To explore this question, in this paper, we conduct a characteristic study comparing two synthetic datasets used by recent research with a real-world dataset collected from the Internet, and an experimental study applying a state-of-the-art approach to the real-world dataset. Our study results suggest that the synthetic datasets and the real-world dataset have distinct characteristics, and that the state-of-the-art approach (based on a model trained on a synthetic dataset) achieves extremely low effectiveness when evaluated on real-world data, much lower than its effectiveness when evaluated on the synthetic dataset. We also provide an initial analysis of some of these challenging cases and discuss future directions.
