Southern University of Science and Technology
On Multiset Selection With Size Constraints
Qian, Chao (University of Science and Technology of China) | Zhang, Yibo (University of Science and Technology of China) | Tang, Ke (Southern University of Science and Technology) | Yao, Xin (Southern University of Science and Technology)
This paper considers the multiset selection problem with size constraints, which arises in many real-world applications such as budget allocation. Previous studies required the objective function f to be submodular, while we relax this assumption by introducing the notion of the submodularity ratios (denoted by α_f and β_f). We propose an anytime randomized iterative approach POMS, which maximizes the given objective f and minimizes the multiset size simultaneously. We prove that POMS using a reasonable time achieves an approximation guarantee of max{1-1/e^(β_f), (α_f/2)(1-1/e^(α_f))}. Particularly, when f is submdoular, this bound is at least as good as that of the previous greedy-style algorithms. In addition, we give lower bounds on the submodularity ratio for the objectives of budget allocation. Experimental results on budget allocation as well as a more complex application, namely, generalized influence maximization, exhibit the superior performance of the proposed approach.
SqueezedText: A Real-Time Scene Text Recognition by Binary Convolutional Encoder-Decoder Network
Liu, Zichuan (Nanyang Technological University) | Li, Yixing (Arizona State University) | Ren, Fengbo (Arizona State University) | Goh, Wang Ling (Nanyang Technological University) | Yu, Hao (Southern University of Science and Technology)
A new approach for real-time scene text recognition is proposed in this paper. A novel binary convolutional encoder-decoder network (B-CEDNet) together with a bidirectional recurrent neural network (Bi-RNN). The B-CEDNet is engaged as a visual front-end to provide elaborated character detection, and a back-end Bi-RNN performs character-level sequential correction and classification based on learned contextual knowledge. The front-end B-CEDNet can process multiple regions containing characters using a one-off forward operation, and is trained under binary constraints with significant compression. Hence it leads to both remarkable inference run-time speedup as well as memory usage reduction. With the elaborated character detection, the back-end Bi-RNN merely processes a low dimension feature sequence with category and spatial information of extracted characters for sequence correction and classification. By training with over 1,000,000 synthetic scene text images, the B-CEDNet achieves a recall rate of 0.86, precision of 0.88 and F-score of 0.87 on ICDAR-03 and ICDAR-13. With the correction and classification by Bi-RNN, the proposed real-time scene text recognition achieves state-of-the-art accuracy while only consumes less than 1-ms inference run-time. The flow processing flow is realized on GPU with a small network size of 1.01 MB for B-CEDNet and 3.23 MB for Bi-RNN, which is much faster and smaller than the existing solutions.