Reviews: Learning Pipelines with Limited Data and Domain Knowledge: A Study in Parsing Physics Problems

Neural Information Processing Systems 

The main idea is to use PSL (probabilistic soft logic) as a framework that combines partial estimates from multiple feedforward algorithms with domain-specific logical rules in order to parse visual diagrams from physics texts. Specifically, the pipelines use feature extractors for lines, arcs, corners, text elements, and object elements (e.g., blocks in physics diagrams). These are combined with human-specified rules for groupings, high-level elements, and text/figure labeling schemes, and the inference engine then produces a parse into a formal logical language. Experiments illustrate that the learned system 1) is superior to a state-of-the-art diagram parsing scheme, 2) can utilize labelled as well as unlabelled data to achieve improved performance, 3) is robust to varying degrees of supervision in different parts of the pipeline, and 4) prevents error propagation through integrative modeling of the pipeline stages.

Quality, clarity, originality, and significance of the paper: The paper is well written, has extensive references to relevant literature, and includes adequate experimentation.
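
For readers unfamiliar with PSL, the way soft detector estimates and weighted rules interact can be sketched roughly as follows. This is a minimal illustration of the general PSL idea (Lukasiewicz relaxation of logic plus weighted hinge losses), not the authors' implementation. The rule, the predicate names (`line_a`, `line_b`, `adjacent`, `corner`), the weights, and the grid-search inference are all assumptions made for the example; real PSL systems solve a convex optimization problem over many rules at once.

```python
def luk_and(*values):
    """Lukasiewicz conjunction of soft truth values in [0, 1]."""
    return max(0.0, sum(values) - (len(values) - 1))


def distance_to_satisfaction(body, head):
    """Hinge penalty for a rule body -> head: zero when head >= body."""
    return max(0.0, body - head)


def infer_head(body, rule_weight, prior_weight, steps=100):
    """Pick the head truth value that minimizes the total weighted penalty.

    Two hypothetical potentials are used:
      * the rule body -> head, weighted by rule_weight
      * a negative prior (head should be false), weighted by prior_weight
    A grid search stands in for the convex solver a real PSL engine uses.
    """
    best_value, best_energy = 0.0, float("inf")
    for i in range(steps + 1):
        head = i / steps
        energy = (rule_weight * distance_to_satisfaction(body, head)
                  + prior_weight * head)
        if energy < best_energy:
            best_value, best_energy = head, energy
    return best_value


# Hypothetical soft outputs from low-level feature extractors.
line_a, line_b, adjacent = 0.9, 0.8, 1.0

# Hypothetical rule: Line(a) & Line(b) & Adjacent(a, b) -> Corner(a, b)
body = luk_and(line_a, line_b, adjacent)
corner = infer_head(body, rule_weight=5.0, prior_weight=1.0)
```

When the rule weight exceeds the prior weight, the inferred `corner` value follows the truth value of the rule body; when the prior dominates, it falls to zero. This shows, on a toy scale, how uncertain detector outputs and domain rules can be traded off jointly rather than stage by stage.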