Without a doubt, one of the most exciting potential uses for AI (Artificial Intelligence), and in particular deep learning, is in healthcare. Traditionally, diagnosis of killer illnesses such as cancer and heart disease has relied on examinations of x-rays and scans to spot early warning signs of developing problems. Image recognition is of course one of the tasks at which deep learning excels – from Facebook's facial recognition to Google's image search, practical examples of it in use are becoming more common by the day. Although being able to tag pictures of our friends without typing their names, or find amusing images of cats when we want them, may seem like trivial use cases, the same technology is quickly advancing to a point where more far-reaching implications are being realized. In China, lung cancer is the leading cause of death, claiming over 600,000 lives each year, largely due to high levels of air pollution.
How can we find a general way to choose the most suitable samples for training a classifier, even with very limited prior information? Active learning, which can be regarded as an iterative optimization procedure, plays a key role in constructing a refined training set that improves classification performance in a variety of applications, such as text analysis, image recognition, and social network modeling. Although combining the representativeness and informativeness of samples has proven promising for active sampling, state-of-the-art methods perform well only under certain data structures. Can we then find a way to fuse the two active sampling criteria without any assumption about the data? This paper proposes a general active learning framework that effectively fuses the two criteria. Inspired by a two-sample discrepancy problem, triple measures are carefully designed to guarantee that the query samples not only possess the representativeness of the unlabeled data but also reveal the diversity of the labeled data. Any appropriate similarity measure can be employed to construct the triple measures. Meanwhile, an uncertainty measure is leveraged to generate the informativeness criterion, which can be carried out in different ways. Rooted in this framework, a practical active learning algorithm is proposed, which uses a radial basis function together with the estimated probabilities to construct the triple measures, and a modified Best-versus-Second-Best strategy to construct the uncertainty measure. Experimental results on benchmark datasets demonstrate that our algorithm consistently achieves superior performance over state-of-the-art active learning algorithms.
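The informativeness criterion mentioned above can be illustrated with the plain Best-versus-Second-Best (BvSB) margin. The following is a minimal sketch only: it shows the standard BvSB rule, not the modified variant the abstract refers to, and the function names `bvsb_margin` and `select_queries` are hypothetical, introduced here for illustration.

```python
from typing import List, Sequence


def bvsb_margin(probs: Sequence[float]) -> float:
    """Gap between the two largest class probabilities.

    A small gap means the classifier can barely separate its two best
    guesses, so the sample is considered informative.
    """
    if len(probs) < 2:
        raise ValueError("need at least two class probabilities")
    best, second_best = sorted(probs, reverse=True)[:2]
    return best - second_best


def select_queries(prob_rows: List[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k unlabeled samples with the smallest margin."""
    order = sorted(range(len(prob_rows)),
                   key=lambda i: bvsb_margin(prob_rows[i]))
    return order[:k]


if __name__ == "__main__":
    # Hypothetical estimated class probabilities for three unlabeled samples.
    probs = [
        [0.90, 0.05, 0.05],  # confident prediction, large margin
        [0.40, 0.30, 0.30],  # fairly uncertain
        [0.50, 0.45, 0.05],  # top two classes nearly tied
    ]
    print(select_queries(probs, k=2))  # prints [2, 1]
```

In a full active learning loop, a score like this would be combined with a representativeness term before choosing which samples to label; how the two are fused is specific to the proposed framework and is not reproduced here.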
The graph-based semi-supervised label propagation algorithm has delivered impressive classification results. However, the estimated soft labels typically contain mixed signs and noise, which cause inaccurate predictions due to the lack of suitable constraints. Moreover, available methods typically calculate the weights and estimate the labels in the original input space, which typically contains noise and corruption. Thus, the encoded similarities and manifold smoothness may be inaccurate for label estimation. In this paper, we present effective schemes for resolving these issues and propose a novel and robust semi-supervised classification algorithm, namely, the triple-matrix-recovery-based robust auto-weighted label propagation framework (ALP-TMR). Our ALP-TMR introduces a triple matrix recovery mechanism to remove noise or mixed signs from the estimated soft labels and improve the robustness to noise and outliers in the steps of assigning weights and predicting the labels simultaneously. Our method can jointly recover the underlying clean data, clean labels and clean weighting spaces by decomposing the original data, predicted soft labels or weights into a clean part plus an error part by fitting noise. In addition, ALP-TMR integrates the auto-weighting process by minimizing reconstruction errors over the recovered clean data and clean soft labels, which can encode the weights more accurately to improve both data representation and classification. By classifying samples in the recovered clean label and weight spaces, one can potentially improve the label prediction results. The results of extensive experiments demonstrated the satisfactory performance of our ALP-TMR.
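For context, the baseline that ALP-TMR builds on can be sketched as classic graph-based label propagation with clamped labels. This is not ALP-TMR itself: there is no matrix recovery or auto-weighting here, and the Gaussian weights, the `sigma` parameter, and the function names `rbf_weights` and `propagate` are assumptions made for illustration.

```python
import math
from typing import List, Optional, Sequence


def rbf_weights(points: Sequence[Sequence[float]],
                sigma: float = 1.0) -> List[List[float]]:
    """Pairwise Gaussian similarities with a zero diagonal."""
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dist_sq = sum((a - b) ** 2
                              for a, b in zip(points[i], points[j]))
                weights[i][j] = math.exp(-dist_sq / (2.0 * sigma ** 2))
    return weights


def propagate(weights: List[List[float]],
              labels: Sequence[Optional[int]],
              n_classes: int,
              iters: int = 100) -> List[int]:
    """Spread labels over the graph; labels[i] is None for unlabeled nodes."""
    n = len(weights)
    soft = [[0.0] * n_classes for _ in range(n)]
    for i, y in enumerate(labels):
        if y is not None:
            soft[i][y] = 1.0
    for _ in range(iters):
        updated = []
        for i in range(n):
            total = sum(weights[i])
            if labels[i] is not None or total == 0.0:
                # Clamp labeled nodes; leave isolated nodes unchanged.
                updated.append(soft[i][:])
                continue
            # Weighted average of the neighbors' current soft labels.
            updated.append([
                sum(weights[i][j] * soft[j][c] for j in range(n)) / total
                for c in range(n_classes)
            ])
        soft = updated
    return [max(range(n_classes), key=lambda c: soft[i][c])
            for i in range(n)]


if __name__ == "__main__":
    # Two well-separated clusters, one labeled point in each (toy data).
    points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
    labels = [0, None, 1, None]
    print(propagate(rbf_weights(points), labels, n_classes=2))
```

Because the weights and soft labels are computed directly from the raw inputs, any noise in `points` flows straight into the predictions, which is the weakness the abstract sets out to address.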
The Hanguang 800 is being implemented across many application scenarios within Aliyun, ranging from video classification to smart city applications. For example, the company's popular Pailitao platform applies visual image search to e-commerce, allowing customers to search for items by taking a photo of the query object. Using AI-based image recognition and indexing powered by the new Hanguang 800, Aliyun can increase image processing efficiency by 12 times compared to GPUs. With regard to smart city tech, Aliyun says it previously used 40 traditional GPUs to process videos of central Hangzhou with a latency of 300ms. Now the task requires only four Hanguang 800 chips, with a lower latency of 150ms.