Accuracy
Multi-Class Classification of Blood Cells -- End to End Computer Vision based diagnosis case study
The diagnosis of blood-based diseases often involves identifying and characterizing patient blood samples. Automated methods to detect and classify blood cell subtypes have important medical applications. Automated medical image processing and analysis offers a powerful tool for medical diagnosis. In this work we tackle the problem of white blood cell classification based on the morphological characteristics of their outer contour, color. The work we would explore a set of preprocessing and segmentation (Color-based segmentation, Morphological processing, contouring) algorithms along with a set of features extraction methods (Corner detection algorithms and Histogram of Gradients (HOG)), dimentionality reduction algorithms (Principal Component Analysis (PCA)) that are able to recognize and classify through various Unsupervised (k-nearest neighbors) and Supervised (Support Vector Machine, Decision Trees, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Naรฏve Bayes) algorithms different categories of white blood cells to Eosinophil, Lymphocyte, Monocyte, and Neutrophil. We even take a step forwards to explore various Deep Convolutional Neural network architecture (Sqeezent, MobilenetV1, MobilenetV2, InceptionNet etc.) without preprocessing/segmentation and with preprocessing. We would like to explore many algorithms to identify the robust algorithm with least time complexity and low resource requirement. The outcome of this work can be a cue to selection of algorithms as per requirement for automated blood cell classification.
ParK: Sound and Efficient Kernel Ridge Regression by Feature Space Partitions
Carratino, Luigi, Vigogna, Stefano, Calandriello, Daniele, Rosasco, Lorenzo
We introduce ParK, a new large-scale solver for kernel ridge regression. Our approach combines partitioning with random projections and iterative optimization to reduce space and time complexity while provably maintaining the same statistical accuracy. In particular, constructing suitable partitions directly in the feature space rather than in the input space, we promote orthogonality between the local estimators, thus ensuring that key quantities such as local effective dimension and bias remain under control. We characterize the statistical-computational tradeoff of our model, and demonstrate the effectiveness of our method by numerical experiments on large-scale datasets.
Aligned Contrastive Predictive Coding
Chorowski, Jan, Ciesielski, Grzegorz, Dzikowski, Jarosลaw, ลaลcucki, Adrian, Marxer, Ricard, Opala, Mateusz, Pusz, Piotr, Rychlikowski, Paweล, Stypuลkowski, Michaล
We investigate the possibility of forcing a self-supervised model trained using a contrastive predictive loss, to extract slowly varying latent representations. Rather than producing individual predictions for each of the future representations, the model emits a sequence of predictions shorter than the sequence of upcoming representations to which they will be aligned. In this way, the prediction network solves a simpler task of predicting the next symbols, but not their exact timing, while the encoding network is trained to produce piece-wise constant latent codes. We evaluate the model on a speech coding task and demonstrate that the proposed Aligned Contrastive Predictive Figure 1: ACPC architecture. The encoder maps chunks of input Coding (ACPC) leads to higher linear phone prediction accuracy data into a latent space and the autoregressive model predicts and lower ABX error rates, while being slightly faster to K upcoming latent vectors. They are aligned using DTW to the train due to the reduced number of prediction heads.
Statistical Analysis of Perspective Scores on Hate Speech Detection
Mansourifar, Hadi, Alsagheer, Dana, Shi, Weidong, Ni, Lan, Huang, Yan
Hate speech detection has become a hot topic in recent years due to the exponential growth of offensive language in social media. It has proven that, state-of-the-art hate speech classifiers are efficient only when tested on the data with the same feature distribution as training data. As a consequence, model architecture plays the second role to improve the current results. In such a diverse data distribution relying on low level features is the main cause of deficiency due to natural bias in data. That's why we need to use high level features to avoid a biased judgement. In this paper, we statistically analyze the Perspective Scores and their impact on hate speech detection. We show that, different hate speech datasets are very similar when it comes to extract their Perspective Scores. Eventually, we prove that, over-sampling the Perspective Scores of a hate speech dataset can significantly improve the generalization performance when it comes to be tested on other hate speech datasets.
Open-set Label Noise Can Improve Robustness Against Inherent Label Noise
Wei, Hongxin, Tao, Lue, Xie, Renchunzi, An, Bo
Learning with noisy labels is a practically challenging problem in weakly supervised learning. In the existing literature, open-set noises are always considered to be poisonous for generalization, similar to closed-set noises. In this paper, we empirically show that open-set noisy labels can be non-toxic and even benefit the robustness against inherent noisy labels. Inspired by the observations, we propose a simple yet effective regularization by introducing Open-set samples with Dynamic Noisy Labels (ODNL) into training. With ODNL, the extra capacity of the neural network can be largely consumed in a way that does not interfere with learning patterns from clean data. Through the lens of SGD noise, we show that the noises induced by our method are random-direction, conflict-free and biased, which may help the model converge to a flat minimum with superior stability and enforce the model to produce conservative predictions on Out-of-Distribution instances. Extensive experimental results on benchmark datasets with various types of noisy labels demonstrate that the proposed method not only enhances the performance of many existing robust algorithms but also achieves significant improvement on Out-of-Distribution detection tasks even in the label noise setting.
Anticipatory Detection of Compulsive Body-focused Repetitive Behaviors with Wearables
Searle, Benjamin Lucas, Spathis, Dimitris, Constantinides, Marios, Quercia, Daniele, Mascolo, Cecilia
Body-focused repetitive behaviors (BFRBs), like face-touching or skin-picking, are hand-driven behaviors which can damage one's appearance, if not identified early and treated. Technology for automatic detection is still under-explored, with few previous works being limited to wearables with single modalities (e.g., motion). Here, we propose a multi-sensory approach combining motion, orientation, and heart rate sensors to detect BFRBs. We conducted a feasibility study in which participants (N=10) were exposed to BFRBs-inducing tasks, and analyzed 380 mins of signals under an extensive evaluation of sensing modalities, cross-validation methods, and observation windows. Our models achieved an AUC > 0.90 in distinguishing BFRBs, which were more evident in observation windows 5 mins prior to the behavior as opposed to 1-min ones. In a follow-up qualitative survey, we found that not only the timing of detection matters but also models need to be context-aware, when designing just-in-time interventions to prevent BFRBs.
30 Basic Machine Learning Questions Answered
Machine Learning is the path to a better and advanced future. A Machine Learning Developer is the most demanding job in 2021 and it is going to increase by 20โ30% in the upcoming 3โ5 years. Machine Learning by the core is all statistics and programming concepts. The language that is mostly used by Machine learning developers for coding is python because of its simplicity. In this blog, you will some of the most asked machine learning questions that every machine learning enthusiast has to answer one day.
Naive Bayes for Data Science -- With Python
There are many solutions proposed for classification purposes. Most of them share one common approach. Calculate the probability that a given sample belongs to a specific class. After that it is more subjective to decide if the given probability is an indication of class membership which is derived by cut-off threshold. This threshold is mainly determined by the utility function or risk-aversion policies.
Plant Disease Detection Using Image Processing and Machine Learning
Kulkarni, Pranesh, Karwande, Atharva, Kolhe, Tejas, Kamble, Soham, Joshi, Akshay, Wyawahare, Medha
One of the important and tedious task in agricultural practices is the detection of the disease on crops. It requires huge time as well as skilled labor. This paper proposes a smart and efficient technique for detection of crop disease which uses computer vision and machine learning techniques. The proposed system is able to detect 20 different diseases of 5 common plants with 93% accuracy.
Solution for Large-scale Long-tailed Recognition with Noisy Labels
Xian, Yuqiao, Zhuang, Jia-Xin, Yu, Fufu
This is a technical report for CVPR 2021 AliProducts Challenge. AliProducts Challenge is a competition proposed for studying the large-scale and fine-grained commodity image recognition problem encountered by worldleading ecommerce companies. The large-scale product recognition simultaneously meets the challenge of noisy annotations, imbalanced (long-tailed) data distribution and fine-grained classification. In our solution, we adopt stateof-the-art model architectures of both CNNs and Transformer, including ResNeSt, EfficientNetV2, and DeiT. We found that iterative data cleaning, classifier weight normalization, high-resolution finetuning, and test time augmentation are key components to improve the performance of training with the noisy and imbalanced dataset. Finally, we obtain 6.4365% mean class error rate in the leaderboard with our ensemble model.