The automatic digitizing of paper maps is a significant and challenging task for both academia and industry. As an important procedure of map digitizing, the semantic segmentation section mainly relies on manual visual interpretation with low efficiency. In this study, we select urban planning maps as a representative sample and investigate the feasibility of utilizing U-shape fully convolutional based architecture to perform end-to-end map semantic segmentation. The experimental results obtained from the test area in Shibuya district, Tokyo, demonstrate that our proposed method could achieve a very high Jaccard similarity coefficient of 93.63% and an overall accuracy of 99.36%. For implementation on GPGPU and cuDNN, the required processing time for the whole Shibuya district can be less than three minutes. The results indicate the proposed method can serve as a viable tool for urban planning map semantic segmentation task with high accuracy and efficiency.
Breast cancer is the most common cancer and is the leading cause of cancer death among women worldwide. Detection of breast cancer, while it is still small and confined to the breast, provides the best chance of effective treatment. Computer Aided Detection (CAD) systems that detect cancer from mammograms will help in reducing the human errors that lead to missing breast carcinoma. Literature is rich of scientific papers for methods of CAD design, yet with no complete system architecture to deploy those methods. On the other hand, commercial CADs are developed and deployed only to vendors' mammography machines with no availability to public access. This paper presents a complete CAD; it is complete since it combines, on a hand, the rigor of algorithm design and assessment (method), and, on the other hand, the implementation and deployment of a system architecture for public accessibility (system). (1) We develop a novel algorithm for image enhancement so that mammograms acquired from any digital mammography machine look qualitatively of the same clarity to radiologists' inspection; and is quantitatively standardized for the detection algorithms. (2) We develop novel algorithms for masses and microcalcifications detection with accuracy superior to both literature results and the majority of approved commercial systems. (3) We design, implement, and deploy a system architecture that is computationally effective to allow for deploying these algorithms to cloud for public access.
Xia, Rongkai (Sun Yat-sen University) | Pan, Yan (Sun Yat-sen University) | Lai, Hanjiang (Sun Yat-sen University) | Liu, Cong (Sun Yat-sen University) | Yan, Shuicheng (National University of Singapore)
Hashing is a popular approximate nearest neighbor search approach for large-scale image retrieval. Supervised hashing, which incorporates similarity/dissimilarity information on entity pairs to improve the quality of hashing function learning, has recently received increasing attention. However, in the existing supervised hashing methods for images, an input image is usually encoded by a vector of hand-crafted visual features. Such hand-crafted feature vectors do not necessarily preserve the accurate semantic similarities of images pairs, which may often degrade the performance of hashing function learning. In this paper, we propose a supervised hashing method for image retrieval, in which we automatically learn a good image representation tailored to hashing as well as a set of hash functions. The proposed method has two stages. In the first stage, given the pairwise similarity matrix $S$ over training images, we propose a scalable coordinate descent method to decompose $S$ into a product of $HH^T$ where $H$ is a matrix with each of its rows being the approximate hash code associated to a training image. In the second stage, we propose to simultaneously learn a good feature representation for the input images as well as a set of hash functions, via a deep convolutional network tailored to the learned hash codes in $H$ and optionally the discrete class labels of the images. Extensive empirical evaluations on three benchmark datasets with different kinds of images show that the proposed method has superior performance gains over several state-of-the-art supervised and unsupervised hashing methods.
--Powerful generative adversarial networks (GAN) have been developed to automatically synthesize realistic images from text. However, most existing tasks are limited to generating simple images such as flowers from captions. In this work, we extend this problem to the less addressed domain of face generation from fine-grained textual descriptions of face, e.g., "A person has curly hair, oval face, and mustache" . We are motivated by the potential of automated face generation to impact and assist critical tasks such as criminal face reconstruction. Since current datasets for the task are either very small or do not contain captions, we generate captions for images in the CelebA dataset by creating an algorithm to automatically convert a list of attributes to a set of captions. We then model the highly multi-modal problem of text to face generation as learning the conditional distribution of faces (conditioned on text) in same latent space. We utilize the current state-of-the-art GAN (DC-GAN with GAN-CLS loss) for learning conditional multi-modality. The presence of more fine-grained details and variable length of the captions makes the problem easier for a user but more difficult to handle compared to the other text-to-image tasks. We flipped the labels for real and fake images and added noise in discriminator . Generated images for diverse textual descriptions show promising results. In the end, we show how the widely used inceptions score is not a good metric to evaluate the performance of generative models used for synthesizing faces from text. I NTRODUCTION Photographic text-to-face synthesis is a mainstream problem with potential applications in image editing, video games, or for accessibility.
Multi-label image classification is a fundamental but challenging task towards general visual understanding. Existing methods found the region-level cues (e.g., features from RoIs) can facilitate multi-label classification. Nevertheless, such methods usually require laborious object-level annotations (i.e., object labels and bounding boxes) for effective learning of the object-level visual features. In this paper, we propose a novel and efficient deep framework to boost multi-label classification by distilling knowledge from weakly-supervised detection task without bounding box annotations. Specifically, given the image-level annotations, (1) we first develop a weakly-supervised detection (WSD) model, and then (2) construct an end-to-end multi-label image classification framework augmented by a knowledge distillation module that guides the classification model by the WSD model according to the class-level predictions for the whole image and the object-level visual features for object RoIs. The WSD model is the teacher model and the classification model is the student model. After this cross-task knowledge distillation, the performance of the classification model is significantly improved and the efficiency is maintained since the WSD model can be safely discarded in the test phase. Extensive experiments on two large-scale datasets (MS-COCO and NUS-WIDE) show that our framework achieves superior performances over the state-of-the-art methods on both performance and efficiency.