Accuracy
Machine Learning Constructives and Local Searches for the Travelling Salesman Problem
Vitali, Tommaso, Mele, Umberto Junior, Gambardella, Luca Maria, Montemanni, Roberto
The Travelling Salesman Problem (TSP) is one of the most investigated problems in the Combinatorial Optimization (CO) field. This is partly because it belongs to the class of NP-hard problems, which makes it particularly challenging. Moreover, the many practical problems that can be reduced to it make it even more attractive; for example, Ratnesh et al. [10] present TSP models for use in microchip manufacturing. At the same time, the full potential of Machine Learning (ML) and Deep Learning (DL) techniques is becoming increasingly recognized in the CO field [2]. Mele et al. [17] recently introduced ML-Constructive, a promising constructive approach that computes fast solutions in two separate phases.
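The abstract does not describe ML-Constructive in enough detail to reproduce it. As a reference point for the two families named in the title, the sketch below pairs a classical, non-learned nearest-neighbour constructive with a 2-opt local search. It assumes cities are given as 2-D Euclidean coordinates. The function names are illustrative and are not taken from the paper.

```python
import math
import random

# Classical baselines only: this is NOT the ML-Constructive method from the paper.


def tour_length(tour, pts):
    """Total length of the closed tour that visits pts in the given order."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


def nearest_neighbor(pts, start=0):
    """Constructive phase: repeatedly move to the closest unvisited city."""
    unvisited = set(range(len(pts))) - {start}
    tour = [start]
    while unvisited:
        last = pts[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, pts[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def two_opt(tour, pts):
    """Local search: reverse a segment whenever doing so shortens the tour."""
    tour = tour[:]
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two edges share city tour[0]
                a, b = pts[tour[i]], pts[tour[i + 1]]
                c, d = pts[tour[j]], pts[tour[(j + 1) % n]]
                # Gain from replacing edges (a, b), (c, d) with (a, c), (b, d)
                delta = (math.dist(a, c) + math.dist(b, d)
                         - math.dist(a, b) - math.dist(c, d))
                if delta < -1e-12:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour


# Hypothetical random instance
rng = random.Random(0)
cities = [(rng.random(), rng.random()) for _ in range(30)]
nn_tour = nearest_neighbor(cities)
opt_tour = two_opt(nn_tour, cities)
print(f"nearest neighbour: {tour_length(nn_tour, cities):.3f}")
print(f"after 2-opt:       {tour_length(opt_tour, cities):.3f}")
```

The constructive phase produces a feasible tour quickly. 2-opt then keeps reversing segments until no single reversal shortens the tour, which means it stops at a local optimum.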
BezierSeg: Parametric Shape Representation for Fast Object Segmentation in Medical Images
Chen, Haichou, Deng, Yishu, Li, Bin, Li, Zeqin, Chen, Haohua, Jing, Bingzhong, Li, Chaofeng
Delineating the lesion area is an important task in image-based diagnosis. Pixel-wise classification is a popular approach to segmenting the region of interest. However, at fuzzy boundaries such methods usually produce glitches, discontinuities, or disconnected regions, which is inconsistent with the fact that lesions are solid and smooth. To overcome these undesirable artifacts, we propose the BezierSeg model, which outputs Bézier curves encompassing the region of interest. Modelling the contour directly with analytic equations ensures that the segmentation is connected and continuous and that its boundary is smooth. It also offers sub-pixel accuracy. The Bézier contour can be resampled without loss of accuracy and overlaid on images of any resolution. Moreover, a doctor can conveniently adjust the curve's control points to refine the result. Our experiments show that the proposed method runs in real time and achieves accuracy competitive with pixel-wise segmentation models.
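The sketch below shows why an analytic contour can be resampled at any resolution. It rests on some assumptions, because the abstract does not specify BezierSeg's parameterization. The contour is assumed to be a closed chain of k cubic Bézier segments, with control points stored in normalized [0, 1] image coordinates. Sampling it densely and then scaling by the image size yields a connected, smooth outline for any resolution. The control points are hypothetical and approximate a circle.

```python
import numpy as np

# Assumed representation (not given in the abstract): a closed contour of k cubic
# Bezier segments, control points as [anchor, handle, handle] * k in [0, 1] coords.


def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate one cubic Bezier segment at the parameter values t (1-D array)."""
    t = np.asarray(t, dtype=float)[:, None]  # column vector broadcasts over (x, y)
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)


def sample_closed_contour(ctrl, samples_per_segment):
    """Densely sample the closed contour; returns an array of shape [k * m, 2]."""
    ctrl = np.asarray(ctrl, dtype=float)
    if len(ctrl) % 3 != 0:
        raise ValueError("expected 3 control points per segment")
    k = len(ctrl) // 3
    # endpoint=False: each segment's end point is the next segment's first anchor
    t = np.linspace(0.0, 1.0, samples_per_segment, endpoint=False)
    pieces = []
    for s in range(k):
        p0, p1, p2 = ctrl[3 * s], ctrl[3 * s + 1], ctrl[3 * s + 2]
        p3 = ctrl[(3 * s + 3) % len(ctrl)]  # wrap around so the curve is closed
        pieces.append(cubic_bezier(p0, p1, p2, p3, t))
    return np.vstack(pieces)


# Hypothetical control points approximating a circle (centre 0.5, radius 0.25)
h = 0.25 * 0.5523  # usual handle length for a 4-segment circle approximation
ctrl = [
    (0.75, 0.5), (0.75, 0.5 + h), (0.5 + h, 0.75),
    (0.5, 0.75), (0.5 - h, 0.75), (0.25, 0.5 + h),
    (0.25, 0.5), (0.25, 0.5 - h), (0.5 - h, 0.25),
    (0.5, 0.25), (0.5 + h, 0.25), (0.75, 0.5 - h),
]
pts = sample_closed_contour(ctrl, 50)
# The same curve overlaid on images of different (hypothetical) resolutions
for width, height in [(256, 256), (1024, 768)]:
    contour_px = pts * np.array([width, height])
    print(width, height, contour_px.shape)
```

Refining the result, as the abstract describes, amounts to editing entries of `ctrl` and resampling.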
Deep learning and liver disease
Many medical imaging techniques have played a pivotal role in the early detection, diagnosis, and treatment of diseases. Examples include computed tomography (CT), magnetic resonance imaging (MRI), ultrasound, positron emission tomography (PET), mammography, and X-ray. AI has made significant progress, allowing machines to automatically represent and explain complicated data. It is widely applied in medicine, especially in domains that require imaging data analysis. According to Vivanti et al., deep learning models based on longitudinal liver CT studies could automatically detect new liver tumours with a true positive rate of 86%, compared with a stand-alone detection rate of only 72%. The method achieved a precision of 87%, an improvement of 39% over a traditional SVM model. CNN models that use ultrasound images to detect liver lesions have also been developed. According to Liu et al., a CNN model based on liver ultrasound images can effectively extract the liver capsule and accurately diagnose liver cirrhosis, reaching a diagnostic AUC of 0.968.
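The figures quoted above (true positive rate, precision, AUC) are standard evaluation metrics. The sketch below shows how each one is computed. The labels, scores, and 0.5 decision threshold are made up for illustration and have no connection to the data in the cited studies.

```python
# Toy, made-up labels and scores; NOT data from the cited liver studies.


def tpr_and_precision(y_true, y_pred):
    """True positive rate (recall) and precision for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return tpr, precision


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as one half (the Mann-Whitney U formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold
print(tpr_and_precision(y_true, y_pred))  # TPR 2/3, precision 1.0
print(roc_auc(y_true, scores))            # 8/9
```

TPR and precision depend on the chosen threshold. AUC summarizes the ranking over all thresholds.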
What facial recognition and the racist pseudoscience of phrenology have in common
'Phrenology' has an old-fashioned ring to it. It sounds like it belongs in a history book, filed somewhere between bloodletting and velocipedes. We'd like to think that judging people's worth based on the size and shape of their skull is a practice that's well behind us. However, phrenology is once again rearing its lumpy head. In recent years, machine-learning algorithms have promised governments and private companies the power to glean all sorts of information from people's appearance.
Bayesian analysis of the prevalence bias: learning and predicting from imbalanced data
Folgoc, Loic Le, Baltatzis, Vasileios, Alansary, Amir, Desai, Sujal, Devaraj, Anand, Ellis, Sam, Manzanera, Octavio E. Martinez, Kanavati, Fahdi, Nair, Arjun, Schnabel, Julia, Glocker, Ben
Datasets are rarely a realistic approximation of the target population. For example, prevalence may be misrepresented, or image quality may be above clinical standards. This mismatch is known as sampling bias. Sampling biases are a major hindrance for machine learning models because they cause significant gaps between model performance in the lab and in the real world. Our work addresses prevalence bias: the discrepancy between the prevalence of a pathology and its sampling rate in the training dataset. This bias is introduced when the data are collected or when the practitioner rebalances the training batches. This paper lays the theoretical and computational framework for training models, and for prediction, in the presence of prevalence bias. Concretely, a bias-corrected loss function, as well as bias-corrected predictive rules, are derived under the principles of Bayesian risk minimization. The loss exhibits a direct connection to the information gain. It offers a principled alternative to heuristic training losses and complements test-time procedures based on selecting an operating point from summary curves. It integrates seamlessly into the current paradigm of (deep) learning using stochastic backpropagation, and naturally with Bayesian models.
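To show why rebalanced training batches distort predictions, the sketch below applies the classical prior-shift (label-shift) correction via Bayes' rule at test time. It assumes that p(x | y) is the same in training and deployment. This is a well-known standard adjustment and not necessarily the bias-corrected predictive rule derived in the paper, which also covers the training loss. The priors and posteriors are hypothetical.

```python
import numpy as np

# Standard prior-shift correction; may differ from the paper's derived rules.


def correct_for_prevalence(probs, train_prior, target_prior):
    """Move classifier posteriors from the training prevalence to a target one.

    Assumes p(x | y) is unchanged between training and deployment, so
        p_target(y | x) ∝ p_train(y | x) * target_prior[y] / train_prior[y]
    """
    probs = np.asarray(probs, dtype=float)
    weights = np.asarray(target_prior, dtype=float) / np.asarray(train_prior, dtype=float)
    adjusted = probs * weights
    return adjusted / adjusted.sum(axis=-1, keepdims=True)  # renormalize each row


# Hypothetical scenario: batches rebalanced to 50/50, true prevalence 5%
probs = np.array([[0.3, 0.7],    # columns: [healthy, pathology]
                  [0.9, 0.1]])
corrected = correct_for_prevalence(probs, train_prior=[0.5, 0.5],
                                   target_prior=[0.95, 0.05])
print(corrected)
```

In the first row, a 0.7 pathology score learned on balanced batches drops to about 0.11 once the 5% prevalence is taken into account.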
3 ways to evaluate and improve machine learning models
When solving machine learning problems, simply training a model with a problem-specific learning algorithm does not guarantee that the resulting model fully captures the underlying concept hidden in the training data. Nor does it guarantee that the optimal parameter values were chosen for training. If a model's performance is never tested, an underperforming model could be deployed to the production system and produce incorrect predictions. Choosing one model from the many available options based on intuition alone is risky. Generating different metrics makes it possible to assess the model's efficacy. These metrics reveal how well the model fits the data on which it was trained.
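The excerpt does not name the three evaluation methods. The sketch below illustrates k-fold cross-validation, one common way to compare candidate models on data they were not trained on instead of choosing by intuition. It uses plain Python. The two toy models, the data, and all function names are hypothetical.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k shuffled, roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def cross_val_accuracy(fit, X, y, k=5, seed=0):
    """Mean held-out accuracy; `fit(X, y)` must return a predict function."""
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def fit_majority(X, y):
    """Baseline model: always predict the most frequent training label."""
    label = max(set(y), key=y.count)
    return lambda x: label


def fit_1nn(X, y):
    """1-nearest-neighbour classifier on a single numeric feature."""
    def predict(x):
        j = min(range(len(X)), key=lambda i: abs(X[i] - x))
        return y[j]
    return predict


# Hypothetical toy data: two well-separated groups of five points
X = [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
for name, fit in [("majority", fit_majority), ("1-NN", fit_1nn)]:
    print(name, cross_val_accuracy(fit, X, y))
```

Every point is held out exactly once. The resulting score therefore estimates performance on unseen data, which a fit on the training set alone cannot do.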
Foundations of data imbalance and solutions for a data democracy
Kulkarni, Ajay, Chong, Deri, Batarseh, Feras A.
Dealing with imbalanced data is a prevalent problem when performing classification. Often, this problem contributes to bias when making decisions or implementing policies. Thus, it is vital to understand the factors that cause imbalance in the data (class imbalance). Such hidden biases and imbalances can lead to data tyranny and pose a major challenge to a data democracy. This chapter resolves two essential statistical elements: the degree of class imbalance and the complexity of the concept. Solving these issues helps build the foundations of a data democracy. Further, statistical measures appropriate to these scenarios are discussed and applied to a real-life dataset (car insurance claims). Finally, popular data-level methods such as Random Oversampling, Random Undersampling, SMOTE, Tomek Link, and others are implemented in Python, and their performance is compared.
Keywords: Imbalanced Data, Degree of Class Imbalance, Complexity of the Concept, Statistical Assessment Metrics, Undersampling and Oversampling
1. Motivation & Introduction
In the real world, data are collected from various sources such as social networks, websites, logs, and databases. When dealing with data from different sources, it is crucial to check the quality of the data [1]. Data of questionable quality can introduce different types of biases at various stages of the data science lifecycle. These biases can sometimes affect the association between variables and, in many cases, could represent the opposite of the actual behavior [2].
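The chapter abstract mentions the degree of class imbalance and the Random Oversampling and Random Undersampling methods. The sketch below shows one way to implement them in plain Python. It measures imbalance as the majority-to-minority count ratio, which may not be the chapter's exact measure. This is not the chapter's own code, and the function names and data are hypothetical.

```python
import random
from collections import Counter


def imbalance_ratio(y):
    """Majority-class count divided by minority-class count (assumed measure)."""
    counts = Counter(y)
    return max(counts.values()) / min(counts.values())


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of smaller classes up to the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, c in counts.items():
        members = [x for x, t in zip(X, y) if t == label]
        extra = [rng.choice(members) for _ in range(target - c)]  # with replacement
        X_out += extra
        y_out += [label] * len(extra)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, down to the smallest class size."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        members = [x for x, t in zip(X, y) if t == label]
        X_out += rng.sample(members, target)  # without replacement
        y_out += [label] * target
    return X_out, y_out


# Hypothetical 90/10 dataset
X = list(range(100))
y = [0] * 90 + [1] * 10
print(imbalance_ratio(y))
print(Counter(random_oversample(X, y)[1]), Counter(random_undersample(X, y)[1]))
```

Oversampling keeps all the original data but repeats minority samples, which can encourage overfitting. Undersampling avoids repeated samples but discards most of the majority class.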
Fraud Prevention At Banks With AI And ML
Cybersecurity is of the utmost concern for financial institutions (FIs) of all types, from community credit unions to multibillion-dollar international banking conglomerates, and also for everyday consumers. More than 2 million fraud reports were filed with the Federal Trade Commission in 2020, with total losses of more than $3 billion. One survey found that 47 percent of businesses around the world reported being victimized by digital crime within the past two years, with losses totaling $42 billion. Fraudsters are also growing more advanced in their tactics, using sophisticated technologies like artificial intelligence (AI) and machine learning (ML) to deploy millions of attacks simultaneously. The overwhelming volume of attacks has put organizations on the back foot, scrambling to find countermeasures to the account takeovers (ATOs), phishing attacks, and other schemes they face by the thousands every day.
Did the Model Change? Efficiently Assessing Machine Learning API Shifts
Chen, Lingjiao, Cai, Tracy, Zaharia, Matei, Zou, James
Machine learning (ML) prediction APIs are increasingly widely used. An ML API can change over time due to model updates or retraining. This presents a key challenge in the usage of the API because it is often not clear to the user if and how the ML model has changed. Model shifts can affect downstream application performance and also create oversight issues (e.g. if consistency is desired). In this paper, we initiate a systematic investigation of ML API shifts. We first quantify the performance shifts from 2020 to 2021 of popular ML APIs from Google, Microsoft, Amazon, and others on a variety of datasets. We identified significant model shifts in 12 out of 36 cases we investigated. Interestingly, we found several datasets where the API's predictions became significantly worse over time. This motivated us to formulate the API shift assessment problem at a more fine-grained level as estimating how the API model's confusion matrix changes over time when the data distribution is constant. Monitoring confusion matrix shifts using standard random sampling can require a large number of samples, which is expensive as each API call costs a fee. We propose a principled adaptive sampling algorithm, MASA, to efficiently estimate confusion matrix shifts. MASA can accurately estimate the confusion matrix shifts in commercial ML APIs using up to 90% fewer samples compared to random sampling. This work establishes ML API shifts as an important problem to study and provides a cost-effective approach to monitor such shifts.
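The sketch below makes the estimation target concrete. It estimates the confusion-matrix shift between an old and a new API version from a uniformly random, labeled sample. This is the standard random-sampling baseline that the abstract describes as expensive, not MASA. The assumption is that old-version predictions were already recorded, so each sampled item costs one call to the new API. The data and function names are hypothetical.

```python
import numpy as np

# Uniform random-sampling baseline only; this is NOT the adaptive MASA algorithm.


def confusion_matrix(y_true, y_pred, n_classes):
    """Joint frequencies: entry [i, j] estimates P(true = i, predicted = j)."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm / len(y_true)


def estimate_shift(y_true, pred_old, pred_new, n_classes, budget, seed=0):
    """Estimate C_new - C_old from `budget` items sampled uniformly at random."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(y_true), size=budget, replace=False)
    yt = [y_true[i] for i in idx]
    old = confusion_matrix(yt, [pred_old[i] for i in idx], n_classes)
    new = confusion_matrix(yt, [pred_new[i] for i in idx], n_classes)
    return new - old


# Hypothetical data: the new API version flips about 10% of previously correct labels
rng = np.random.default_rng(1)
n = 10_000
y_true = rng.integers(0, 2, size=n)
pred_old = y_true.copy()
pred_new = np.where(rng.random(n) < 0.1, 1 - y_true, y_true)
print(estimate_shift(y_true, pred_old, pred_new, n_classes=2, budget=500))
```

The estimate's variance shrinks only as `budget` grows, so precise monitoring requires many paid API calls. That cost is what motivates an adaptive sampler.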
Underwater Acoustic Networks for Security Risk Assessment in Public Drinking Water Reservoirs
Stork, Jörg, Wenzel, Philip, Landwein, Severin, Algorri, Maria-Elena, Zaefferer, Martin, Kusch, Wolfgang, Staubach, Martin, Bartz-Beielstein, Thomas, Köhn, Hartmut, Dejager, Hermann, Wolf, Christian
We have built a novel system for the surveillance of drinking water reservoirs using underwater sensor networks. We implement an innovative AI-based approach to detect, classify and localize underwater events. In this paper, we describe the technology and cognitive AI architecture of the system based on one of the sensor networks, the hydrophone network. We discuss the challenges of installing and using the hydrophone network in a water reservoir where traffic, visitors, and variable water conditions create a complex, varying environment. Our AI solution uses an autoencoder for unsupervised learning of latent encodings for classification and anomaly detection, and time delay estimates for sound localization. Finally, we present the results of experiments carried out in a laboratory pool and the water reservoir and discuss the system's potential.
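The paper says localization relies on time delay estimates but does not give the estimator. The sketch below uses plain cross-correlation between two hydrophone signals, the simplest time-delay estimator. GCC-PHAT weighting is a common refinement for reverberant environments. The sample rate, signals, and the approximate 1480 m/s speed of sound in water are assumptions for illustration.

```python
import numpy as np


def estimate_delay(x, y, fs):
    """Delay of y relative to x, in seconds, from the cross-correlation peak.

    Plain (unweighted) cross-correlation; a positive value means y lags x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    corr = np.correlate(y, x, mode="full")
    lag = int(np.argmax(corr)) - (len(x) - 1)  # index len(x) - 1 is zero lag
    return lag / fs


# Hypothetical setup: the same broadband noise reaches hydrophone 2 later
fs = 48_000.0                       # assumed sample rate in Hz
rng = np.random.default_rng(0)
source = rng.standard_normal(4800)  # 0.1 s of noise
true_delay = 120                    # samples
h1 = source
h2 = np.concatenate([np.zeros(true_delay), source[:-true_delay]])

tau = estimate_delay(h1, h2, fs)
print(f"delay: {tau * 1e3:.2f} ms")
# Path-length difference, assuming roughly 1480 m/s sound speed in water
print(f"path difference: {tau * 1480.0:.2f} m")
```

Delays from several hydrophone pairs constrain the source position, because each pairwise delay places it on a hyperbola (a hyperboloid in 3-D).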