The need for large amounts of annotated images has become a grand challenge in training deep neural network models for various visual detection and recognition tasks. This paper presents a novel image synthesis technique that generates large numbers of annotated scene text images for training accurate and robust scene text detection and recognition models. The proposed technique consists of three innovative designs. First, it realizes "semantic coherent" synthesis by embedding texts at semantically sensible regions within the background image, where the semantic coherence is achieved by leveraging the semantic annotations of objects and image regions created in prior semantic segmentation research. Second, it exploits visual saliency to determine the embedding locations within each semantically sensible region, which is consistent with the fact that texts in scenes are often placed on homogeneous regions for better visibility. Third, it designs an adaptive text appearance model that determines the color and brightness of embedded texts by learning adaptively from the features of real scene text images. The proposed technique has been evaluated on five public datasets, and the experiments show its superior performance in training accurate and robust scene text detection and recognition models.
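To make the second and third designs more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a grayscale background image and a boolean mask marking one semantically sensible region, and it swaps in two simple stand-ins: local intensity variance as a proxy for visual saliency, and a dark-on-light / light-on-dark contrast rule in place of the learned appearance model. The function names `window_sums` and `pick_text_box` are hypothetical.

```python
import numpy as np

def window_sums(a, h, w):
    """Sum of every h-by-w window of a 2-D array, via an integral image."""
    # ii[i, j] holds the sum of a[:i, :j]; the zero padding gives ii[0, :] = ii[:, 0] = 0.
    ii = np.pad(a, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    return ii[h:, w:] - ii[:-h, w:] - ii[h:, :-w] + ii[:-h, :-w]

def pick_text_box(gray, region_mask, h, w):
    """Pick the most homogeneous h-by-w box that lies fully inside region_mask.

    Assumption: low intensity variance stands in for low visual saliency.
    Returns (top, left, text_brightness), or None if no box fits in the region.
    """
    gray = gray.astype(np.float64)
    n = h * w
    s1 = window_sums(gray, h, w)        # per-window sum of intensities
    s2 = window_sums(gray ** 2, h, w)   # per-window sum of squared intensities
    var = s2 / n - (s1 / n) ** 2        # per-window variance

    # A window is admissible only if every pixel in it belongs to the region.
    inside = window_sums(region_mask.astype(np.float64), h, w) == n
    if not inside.any():
        return None

    var = np.where(inside, var, np.inf)
    top, left = np.unravel_index(np.argmin(var), var.shape)

    # Assumption: a plain contrast rule replaces the learned appearance model.
    mean = s1[top, left] / n
    text_brightness = 0.0 if mean > 127.5 else 255.0
    return int(top), int(left), text_brightness
```

Calling `pick_text_box(gray, mask, 32, 128)` on a background and one of its region masks would return the top-left corner of a candidate text box plus a brightness for the text. A fuller pipeline would repeat this per region and render the text with a font, perspective, and color drawn from real scene text statistics.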
This is the second story in our continuing series covering the basics of artificial intelligence. While it isn't necessary to read the first article, which covers neural networks, doing so may add to your understanding of the topics covered in this one. Teaching a computer how to 'see' is no small feat. You can slap a camera on a PC, but that won't give it sight. In order for a machine to actually view the world like people or animals do, it relies on computer vision and image recognition.
TELLING a yellow taxi and a pair of binoculars apart is so easy most people could do it standing on their head. Not so for an artificial intelligence: flip the cab upside down and it sees binoculars. This is just one of dozens of examples that show AI is a lot worse at identifying objects by sight than many people realise.
In the article on Artificial Intelligence, Wikipedia states that: "Artificial Intelligence (AI) is intelligence demonstrated by machines, unlike the intelligence of humans and animals, which involves consciousness and emotionality." Machine Learning (ML), a subset of Artificial Intelligence (AI), learns from data rather than relying solely on explicitly programmed rules.