 Regression


AI Identifies Live Cancer Cells In Less Than 35 Minutes With 95% Accuracy

#artificialintelligence

The ability to analyze single cells is one of the holy grails of precision medicine. Yuri Belotti, PhD, Doorgesh Sharma Jokhun, PhD, and Professor Chwee Teck (C.T.) Lim at National University of Singapore have developed a novel protocol for single-cell classification based on intracellular pH. Their paper, entitled "Machine learning based approach to pH imaging and classification of single cancer cells," was published in APL Bioengineering. The pH in the human body varies between 4.7 and 8.0. Cancer growth, metastasis, and other diseases including Alzheimer's have been linked to deviations from normal intracellular acidity.


Learning Tensor Representations for Meta-Learning

arXiv.org Machine Learning

We introduce a tensor-based model of shared representation for meta-learning from a diverse set of tasks. Prior works on learning linear representations for meta-learning assume that there is a common shared representation across different tasks, and do not consider the additional task-specific observable side information. In this work, we model the meta-parameter through an order-$3$ tensor, which can adapt to the observed features of the task. We propose two methods to estimate the underlying tensor. The first method solves a tensor regression problem and works under natural assumptions on the data generating process. The second method uses the method of moments under additional distributional assumptions and has an improved sample complexity in terms of the number of tasks. We also focus on the meta-test phase, and consider estimating task-specific parameters on a new task. Substituting the estimated tensor from the first step allows us to estimate the task-specific parameters with very few samples of the new task, thereby showing the benefits of learning tensor representations for meta-learning. Finally, through simulation and several real-world datasets, we evaluate our methods and show that they improve over previous linear models of shared representations for meta-learning.


Machine Learning Model Development and Model Operations: Principles and Practices - KDnuggets

#artificialintelligence

The use of Machine Learning (ML) has increased substantially in enterprise data analytics scenarios to extract valuable insights from business data. Hence, it is very important to have an ecosystem to build, test, deploy, and maintain enterprise-grade machine learning models in production environments. ML model development involves acquiring data from multiple trusted sources, processing the data to make it suitable for building the model, choosing an algorithm, building the model, computing performance metrics, and choosing the best-performing model. Model maintenance plays a critical role once the model is deployed into production. Maintaining a machine learning model includes keeping it up to date and relevant as the source data changes, since there is a risk of the model becoming outdated over time.


PETS-SWINF: A regression method that considers images with metadata based Neural Network for pawpularity prediction on 2021 Kaggle Competition "PetFinder.my"

arXiv.org Artificial Intelligence

Millions of stray animals suffer on the streets or are euthanized in shelters every day around the world. To help stray animals get adopted, scoring their pawpularity (cuteness) is very important, but evaluating pawpularity is very labor-intensive. Consequently, there has been a surge of interest in developing an algorithm that scores the pawpularity of animals. However, the Kaggle dataset contains not only images but also metadata describing the images. Most methods focus on the most advanced image regression methods of recent years, but there is no good method for dealing with the metadata of images. To address these challenges, the paper proposes an image regression model called PETS-SWINF that considers the metadata of the images. Our results on the dataset of the Kaggle competition "PetFinder.my" show that PETS-SWINF has an advantage over image-only models: the RMSE loss of the proposed model on the test dataset is 17.71876, compared with 17.76449 without metadata. The advantage of the proposed method is that PETS-SWINF can consider both low-order and high-order features of metadata, and adaptively adjust the weights of the image model and the metadata model. The performance is promising, as our leaderboard score is currently ranked 15 out of 3545 teams (Gold medal) in the 2021 Kaggle competition on the challenge "PetFinder.my".


Logistic regression as a neural network - DataScienceCentral.com

#artificialintelligence

As a teacher of Data Science (Data Science for Internet of Things course at the University of Oxford), I am always fascinated by cross connections between concepts. To recap, Logistic regression is a binary classification method. It can be modelled as a function that can take in any number of inputs and constrain the output to be between 0 and 1. This means we can think of Logistic Regression as a one-layer neural network. For a binary output, if the true label is y (y = 0 or y = 1) and y_hat is the predicted output, then y_hat represents the probability that y = 1, given inputs w and x. Therefore, the probability that y = 0 given inputs w and x is (1 - y_hat), as shown below.
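The one-layer view described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the article's own code: the function names, the bias term `b`, and the example weights and inputs are assumptions made for the example.

```python
import math


def sigmoid(z: float) -> float:
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """One-layer 'network': weighted sum of the inputs, then a sigmoid.

    The bias b is an assumption; the text above only mentions w and x.
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)


def likelihood(y, y_hat):
    """P(y | x; w): equals y_hat when y = 1 and (1 - y_hat) when y = 0."""
    return y_hat ** y * (1.0 - y_hat) ** (1 - y)


# Hypothetical weights and inputs, chosen only for illustration.
w = [0.5, -0.25]
b = 0.1
x = [2.0, 4.0]

y_hat = predict_proba(w, b, x)   # estimated probability that y = 1
p_y0 = likelihood(0, y_hat)      # probability that y = 0, i.e. 1 - y_hat
```

Writing the two cases as a single expression, `y_hat ** y * (1 - y_hat) ** (1 - y)`, is a common way to cover both labels at once; taking its logarithm gives the usual cross-entropy loss.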


Enhancement of Healthcare Data Performance Metrics using Neural Network Machine Learning Algorithms

arXiv.org Artificial Intelligence

Patients are often encouraged to make use of wearable devices for remote collection and monitoring of health data. This adoption of wearables results in a significant increase in the volume of data collected and transmitted. The battery life of the devices is then quickly diminished due to their high processing requirements. Given the importance attached to medical data, it is imperative that all transmitted data adhere to strict integrity and availability requirements. Reducing the volume of healthcare data for network transmission may improve sensor battery life without compromising accuracy. There is a trade-off between efficiency and accuracy which can be controlled by adjusting the sampling and transmission rates. This paper demonstrates that machine learning can be used to analyse complex health data metrics such as the accuracy and efficiency of data transmission to overcome the trade-off problem. The study uses time series nonlinear autoregressive neural network algorithms to enhance both data metrics by taking fewer samples to transmit. The algorithms were tested with a standard heart rate dataset to compare their accuracy and efficiency. The results showed that the Levenberg-Marquardt algorithm was the best performer, with an efficiency of 3.33 and an accuracy of 79.17%, which is similar to the other algorithms' accuracy but demonstrates improved efficiency. This suggests that, compared with existing methods, machine learning can improve efficiency without sacrificing one metric for the other.


Six online courses to learn regression in 2022

#artificialintelligence

Regression analysis is a useful mechanism for estimating the relationship between a dependent variable and one or more independent variables. It is widely used in forecasting and has become an important machine learning tool. It is crucial for someone starting in machine learning to understand how regression analysis works. Let us look at a few resources available online to get started with regression analysis. MachineHack, a popular platform for data scientists and AI practitioners, provides courses on regression in the form of bootcamps. Bootcamps are pocket courses for all who aspire to become data scientists, data engineers and machine learning developers.


Machine Learning for Multi-Output Regression: When should a holistic multivariate approach be preferred over separate univariate ones?

arXiv.org Machine Learning

The hope of such multivariate analyses is that the consideration of possible dependencies between the outcomes may lead to procedures with better power (in case of inference) or accuracy (in case of prediction) compared to separate univariate analyses. While the need for the development and use of valid and distributionally robust or nonparametric multivariate methods has been recognized and addressed in inferential statistics (Dobler et al., 2020; Friedrich et al., 2019; Konietschke et al., 2015; Smaga, 2017; Vallejo and Ato, 2012; Zimmermann et al., 2020), there are no exhaustive studies that exploit the potential of multivariate regression methods for prediction. Focussing on tree-based ensemble methods such as the Random Forest, it is the aim of this manuscript to close this gap. In particular, we want to answer our research-motivating question: When should a holistic multivariate regression approach be preferred over separate univariate predictions? Corresponding Author Email address: lena.schmid@tu-dortmund.de (Lena Schmid)


A Kernel-Expanded Stochastic Neural Network

arXiv.org Machine Learning

The deep neural network suffers from many fundamental issues in machine learning. For example, it often gets trapped in a local minimum in training, and its prediction uncertainty is hard to assess. To address these issues, we propose the so-called kernel-expanded stochastic neural network (K-StoNet) model, which incorporates support vector regression (SVR) as the first hidden layer and reformulates the neural network as a latent variable model. The former maps the input vector into an infinite dimensional feature space via a radial basis function (RBF) kernel, ensuring absence of local minima on its training loss surface. The latter breaks the high-dimensional nonconvex neural network training problem into a series of low-dimensional convex optimization problems, and allows its prediction uncertainty to be assessed easily. The K-StoNet can be easily trained using the imputation-regularized optimization (IRO) algorithm. Compared to traditional deep neural networks, K-StoNet possesses a theoretical guarantee to asymptotically converge to the global optimum and allows the prediction uncertainty to be assessed easily. The performance of the new model in training, prediction and uncertainty quantification is illustrated by simulated and real data examples.


Synthesising Electronic Health Records: Cystic Fibrosis Patient Group

arXiv.org Artificial Intelligence

Class imbalance can often degrade predictive performance of supervised learning algorithms. Balanced classes can be obtained by oversampling exact copies, with noise, or interpolation between nearest neighbours (as in traditional SMOTE methods). Oversampling tabular data using augmentation, as is typical in computer vision tasks, can be achieved with deep generative models. Deep generative models are effective data synthesisers due to their ability to capture complex underlying distributions. Synthetic data in healthcare can enhance interoperability between healthcare providers by ensuring patient privacy. Equipped with large synthetic datasets which do well to represent small patient groups, machine learning in healthcare can address the current challenges of bias and generalisability. This paper evaluates synthetic data generators' ability to synthesise patient electronic health records. We test the utility of synthetic data for patient outcome classification, observing increased predictive performance when augmenting imbalanced datasets with synthetic data.
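The interpolation-based oversampling mentioned above (the idea behind SMOTE) can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's method and not the reference SMOTE implementation: the function names, the parameter `k`, and the toy minority-class points are all hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def interpolate_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each on the line segment between a
    randomly chosen minority sample and one of its k nearest neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: squared_distance(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([bi + t * (ni - bi) for bi, ni in zip(base, neighbour)])
    return synthetic


# Toy minority-class samples (hypothetical values, two features each).
minority = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.5], [3.0, 3.0]]
new_points = interpolate_oversample(minority, n_new=5)
```

Because every synthetic point lies between two real minority samples, the new data stays inside the region the minority class already occupies. Deep generative models, as discussed in the abstract, instead aim to learn the underlying distribution itself.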