Generating Regular Expressions from Natural Language Specifications: Are We There Yet?
Zhong, Zexuan (University of Illinois at Urbana-Champaign) | Guo, Jiaqi (Xi’an Jiaotong University) | Yang, Wei (University of Illinois at Urbana-Champaign) | Xie, Tao (University of Illinois at Urbana-Champaign) | Lou, Jian-Guang (Microsoft Research Asia) | Liu, Ting (Xi’an Jiaotong University) | Zhang, Dongmei (Microsoft Research Asia)
Recent state-of-the-art approaches automatically generate regular expressions from natural language specifications. Given that these approaches use only synthetic data in both training datasets and validation/test datasets, a natural question arises: are these approaches effective to address various real-world situations? To explore this question, in this paper, we conduct a characteristic study on comparing two synthetic datasets used by the recent research and a real-world dataset collected from the Internet, and conduct an experimental study on applying a state-of-the-art approach on the real-world dataset. Our study results suggest the existence of distinct characteristics between the synthetic datasets and the real-world dataset, and the state-of-the-art approach (based on a model trained from a synthetic dataset) achieves extremely low effectiveness when evaluated on real-world data, much lower than the effectiveness when evaluated on the synthetic dataset. We also provide initial analysis on some of those challenging cases and discuss future directions.
Apr-6-2018