Regression


Continual Learning with Deep Artificial Neurons

arXiv.org Artificial Intelligence

Neurons in real brains are enormously complex computational units. Among other things, they're responsible for transforming inbound electro-chemical vectors into outbound action potentials, updating the strengths of intermediate synapses, regulating their own internal states, and modulating the behavior of other nearby neurons. One could argue that these cells are the only things exhibiting any semblance of real intelligence. It is odd, therefore, that the machine learning community has, for so long, relied upon the assumption that this complexity can be reduced to a simple sum-and-fire operation. We ask: might there be some benefit to substantially increasing the computational power of individual neurons in artificial systems? To answer this question, we introduce Deep Artificial Neurons (DANs), which are themselves realized as deep neural networks. Conceptually, we embed DANs inside each node of a traditional neural network, and we connect these neurons at multiple synaptic sites, thereby vectorizing the connections between pairs of cells. We demonstrate that it is possible to meta-learn a single parameter vector, which we dub a neuronal phenotype, shared by all DANs in the network, which facilitates a meta-objective during deployment. Here, we isolate continual learning as our meta-objective, and we show that a suitable neuronal phenotype can endow a single network with an innate ability to update its synapses with minimal forgetting, using standard backpropagation, without experience replay or separate wake/sleep phases. We demonstrate this ability on sequential non-linear regression tasks.


Learn to build an end to end data science project - KDnuggets

#artificialintelligence

A data scientist is the best programmer among statisticians and the best statistician among programmers. Every data scientist needs an efficient strategy for solving data science problems. Data science positions vary across the country, so we can try to predict the salary of a position from its job title, company, geography, and similar attributes. Here I have built a project where any user can plug in this information and get back a range of salaries, so anyone trying to negotiate has a pretty cool tool to use. This stage is significant because it helps clarify the customer's target.


Regression Analysis for Statistics & Machine Learning in R

#artificialintelligence

It is a practical, hands-on course: we will spend some time on the theoretical concepts behind both statistical and machine learning regression analysis, but the majority of the course will focus on implementing different techniques on real data and interpreting the results. After each video you will have learned a new concept or technique that you can apply to your own projects.


Regression Trees for Cumulative Incidence Functions

arXiv.org Machine Learning

A subject being followed over time may experience several types of events related, for example, to disease morbidity and mortality. For example, in a Phase III trial of concomitant versus sequential chemotherapy and thoracic radiotherapy for patients with inoperable non-small cell lung cancer (NSCLC) conducted by the Radiation Therapy Oncology Group (RTOG), patients were followed up to 5 years, the occurrence of either disease progression or death being of particular interest. Such "competing risks" data are commonly encountered in cancer and other biomedical followup studies, in addition to the potential complication of right-censoring on the event time(s) of interest. Two quantities are often used when analyzing competing risks data: the cause-specific hazard function (CSH) and the cumulative incidence function (CIF). For a given event, the former describes the instantaneous risk of this event at time t, given that no events have yet occurred; the latter describes the probability of occurrence, or absolute risk, of that event across time and can be derived directly from the subdistribution hazard function (Fine and Gray, 1999).
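The relationship between the two quantities named above can be written out explicitly. The block below is a standard textbook formulation rather than a quotation from the paper; the notation ($T$ for the event time, $\varepsilon$ for the event type, $\lambda_k$, $\tilde{\lambda}_k$, $S$, $F_k$) is an assumption introduced here for illustration.

```latex
% Cause-specific hazard (CSH) for event type k
\lambda_k(t) = \lim_{\Delta t \to 0}
  \frac{P(t \le T < t + \Delta t,\ \varepsilon = k \mid T \ge t)}{\Delta t}

% Overall (event-free) survival, combining all K competing events
S(t) = \exp\!\left( - \sum_{j=1}^{K} \int_0^t \lambda_j(u)\, du \right)

% Cumulative incidence function (CIF) for event type k
F_k(t) = P(T \le t,\ \varepsilon = k) = \int_0^t S(u)\, \lambda_k(u)\, du

% Subdistribution hazard and its one-to-one link to the CIF
\tilde{\lambda}_k(t) = -\frac{d}{dt} \log\bigl(1 - F_k(t)\bigr),
\qquad
F_k(t) = 1 - \exp\!\left( - \int_0^t \tilde{\lambda}_k(u)\, du \right)
```

Note that the CIF for one event depends on the cause-specific hazards of all events through $S(u)$, whereas it depends on the subdistribution hazard of that event alone, which is why the latter gives a direct route to the CIF.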


Dependency-based Anomaly Detection: Framework, Methods and Benchmark

arXiv.org Artificial Intelligence

Anomaly detection is an important research problem because anomalies often contain critical insights for understanding the unusual behavior in data. One type of anomaly detection approach is dependency-based, which identifies anomalies by examining the violations of the normal dependency among variables. These methods can discover subtle and meaningful anomalies with better interpretation. Existing dependency-based methods adopt different implementations and show different strengths and weaknesses. However, the theoretical fundamentals and the general process behind them have not been well studied. This paper proposes a general framework, DepAD, to provide a unified process for dependency-based anomaly detection. DepAD decomposes unsupervised anomaly detection tasks into feature selection and prediction problems. Utilizing off-the-shelf techniques, the DepAD framework can have various instantiations to suit different application domains. Comprehensive experiments have been conducted over one hundred instantiated DepAD methods with 32 real-world datasets to evaluate the performance of representative techniques in DepAD. To show the effectiveness of DepAD, we compare two DepAD methods with nine state-of-the-art anomaly detection methods, and the results show that DepAD methods outperform comparison methods in most cases. Through the DepAD framework, this paper gives guidance and inspiration for future research of dependency-based anomaly detection and provides a benchmark for its evaluation.


TensorFlow - Hands-on Machine Learning with TensorFlow

#artificialintelligence

Learn how to build Machine Learning projects in this TensorFlow course created by The Click Reader. In this course, you will learn about scalars as well as tensors and how to create them using TensorFlow. You will also learn how to perform various kinds of tensor operations for manipulating and changing tensor values. You will complete a total of three Machine Learning projects while working through this full TensorFlow course:

1. Linear Regression from Scratch: You will learn how to create a Linear Regression model from scratch using TensorFlow. You will prepare the data, build the model architecture, and train the model using a custom-made loss function as well as an optimizer.
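The "from scratch" workflow described for the first project (prepare data, define a model, write a custom loss, write a custom optimizer step) can be sketched in a few lines. The version below is plain Python rather than TensorFlow, so that it runs without any installation; it is not the course's code, and the function names (`predict`, `mse_loss`, `mse_gradients`, `sgd_train`), the toy data, and the hyperparameters are all assumptions made for illustration.

```python
# Minimal sketch: linear regression trained by gradient descent,
# with a hand-written loss and a hand-written optimizer step.
# Plain Python stand-in; a TensorFlow version would use tensors instead of lists.

def predict(w, b, x):
    """Linear model: y_hat = w * x + b."""
    return w * x + b


def mse_loss(w, b, xs, ys):
    """Custom loss: mean squared error over the dataset."""
    n = len(xs)
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / n


def mse_gradients(w, b, xs, ys):
    """Analytic gradients of the MSE loss with respect to w and b."""
    n = len(xs)
    grad_w = sum(2.0 * (predict(w, b, x) - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2.0 * (predict(w, b, x) - y) for x, y in zip(xs, ys)) / n
    return grad_w, grad_b


def sgd_train(xs, ys, learning_rate=0.05, epochs=2000):
    """Custom optimizer: full-batch gradient descent from w = b = 0."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w, grad_b = mse_gradients(w, b, xs, ys)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (hypothetical, for illustration only).
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.0, 3.0, 5.0, 7.0]

w, b = sgd_train(train_x, train_y)
print(f"w={w:.4f}, b={b:.4f}, loss={mse_loss(w, b, train_x, train_y):.6f}")
```

In TensorFlow, the hand-derived gradients would typically be replaced by automatic differentiation, but the structure of the training loop stays the same.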


A decision-making tool to fine-tune abnormal levels in the complete blood count tests

arXiv.org Machine Learning

The complete blood count (CBC) performed by automated hematology analyzers is one of the most ordered laboratory tests. It is a first-line tool for assessing a patient's general health status, or diagnosing and monitoring disease progression. When the analysis does not fit an expected setting, technologists manually review a blood smear using a microscope. In 2005, the International Consensus Group for Hematology Review published a set of criteria for reviewing CBCs. Local adjustments are commonly needed to account for laboratory resources and population characteristics. Our objective is to provide a decision support tool to identify which CBC variables are associated with higher risks of abnormal smear and at which cutoff values. We propose a cost-sensitive Lasso-penalized additive logistic regression combined with stability selection. Using simulated and real CBC data, we demonstrate that our tool correctly identifies the true cutoff values, provided that there is enough available data in their neighbourhood.


Getting Started with Data Science in Python

#artificialintelligence

In statistics, ordinary least squares (OLS) is a type of linear least squares method for estimating the unknown parameters in a linear regression model. OLS chooses the parameters of a linear function of a set of explanatory variables by the principle of least squares: minimizing the sum of the squares of the differences between the observed dependent variable (values of the variable being observed) in the given dataset and those predicted by the linear function. The sklearn (scikit-learn) module for Python is used for predictive data analysis. You can learn more from its website and docs. The first step is to make the necessary imports and load the dataset.
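To make the least-squares principle concrete, here is a minimal sketch of OLS for a single explanatory variable, using the closed-form solution. It deliberately avoids sklearn so that it is self-contained; the function names (`ols_fit`, `sum_squared_residuals`) and the toy dataset are assumptions for illustration, not part of the original article.

```python
# Minimal sketch: closed-form OLS for one explanatory variable.
# Fits y = slope * x + intercept by minimizing the sum of squared residuals.

def ols_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line through the data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if s_xx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """The quantity OLS minimizes: sum of (observed - predicted)^2."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Toy dataset (hypothetical): a slightly noisy upward trend.
data_x = [1.0, 2.0, 3.0, 4.0, 5.0]
data_y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = ols_fit(data_x, data_y)
print("slope:", slope, "intercept:", intercept)
print("SSR at the OLS solution:", sum_squared_residuals(data_x, data_y, slope, intercept))
```

Any other choice of slope and intercept gives a sum of squared residuals at least as large as the one printed here, which is what "least squares" means. With several explanatory variables, the same idea is usually solved with matrix methods, which is what library implementations provide.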


Supervised PCA: A Multiobjective Approach

arXiv.org Machine Learning

Methods for supervised principal component analysis (SPCA) aim to incorporate label information into principal component analysis (PCA), so that the extracted features are more useful for a prediction task of interest. Prior work on SPCA has focused primarily on optimizing prediction error, and has neglected the value of maximizing variance explained by the extracted features. We propose a new method for SPCA that addresses both of these objectives jointly, and demonstrate empirically that our approach dominates existing approaches, i.e., outperforms them with respect to both prediction error and variation explained. Our approach accommodates arbitrary supervised learning losses and, through a statistical reformulation, provides a novel low-rank extension of generalized linear models.


Differentially Private Synthetic Data: Applied Evaluations and Enhancements

arXiv.org Artificial Intelligence

Machine learning practitioners frequently seek to leverage the most informative available data, without violating the data owner's privacy, when building predictive models. Differentially private data synthesis protects personal details from exposure, and allows for the training of differentially private machine learning models on privately generated datasets. But how can we effectively assess the efficacy of differentially private synthetic data? In this paper, we survey four differentially private generative adversarial networks for data synthesis. We evaluate each of them at scale on five standard tabular datasets, and in two applied industry scenarios. Our results suggest some synthesizers are more applicable for different privacy budgets, and we further demonstrate complicating domain-based tradeoffs in selecting an approach. We offer experimental learning on applied machine learning scenarios with private internal data to researchers and practitioners alike. In addition, we propose QUAIL, an ensemble-based modeling approach to generating synthetic data. We examine QUAIL's tradeoffs, and note circumstances in which it outperforms baseline differentially private supervised learning models under the same budget constraint. Maintaining an individual's privacy is a major concern when collecting sensitive information from groups or organizations. A formalization of privacy, known as differential privacy, has become the gold standard with which to protect information from malicious agents (Dwork et al., TAMC 2008).