

 Regression


Regression analysis using Python

#artificialintelligence

Purchasing a new or used automobile can be a tough process if you do not know what you are doing. By educating yourself about car shopping before you visit the dealership, you can make things easier for yourself. The following advice can help make your next shopping trip more enjoyable. Always bring a mechanic along when shopping for a new vehicle. Car dealers are notorious for selling lemons, and you do not want to be their next victim.


Vehicle Fuel Optimization Under Real-World Driving Conditions: An Explainable Artificial Intelligence Approach

arXiv.org Artificial Intelligence

Fuel optimization of diesel and petrol vehicles within industrial fleets is critical for mitigating costs and reducing emissions. This objective is achievable by acting on fuel-related factors, such as driving behaviour style. In this study, we developed an Explainable Boosting Machine (EBM) model to predict the fuel consumption of different types of industrial vehicles, using real-world data collected from 2020 to 2021. This Machine Learning model also explains the relationship between the input factors and fuel consumption, quantifying the individual contribution of each one. The explanations provided by the model are compared with domain knowledge to check whether they are aligned. The results show that 70% of the categories associated with the fuel factors are similar to those reported in previous literature. Using the EBM algorithm, we estimate that optimizing driving behaviour reduces fuel consumption by 12% to 15% in a large fleet (more than 1000 vehicles).
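An EBM's prediction is a sum of per-feature contributions, which is what makes it explainable. The sketch below illustrates that idea in a deliberately simplified form, assuming a regression target and leaving out the pairwise interaction terms and bagging that full EBM implementations (such as `ExplainableBoostingRegressor` in the `interpret` package) use. `fit_additive_booster`, `explain` and `predict_additive` are hypothetical helpers, not the authors' code, and the data are synthetic. The learner cycles through the features, nudging each feature's piecewise-constant shape function toward the current residuals with a small learning rate.

```python
import numpy as np

def fit_additive_booster(X, y, n_bins=16, n_rounds=200, learning_rate=0.1):
    """Simplified EBM-style regressor (hypothetical; no interaction terms).

    Learns one piecewise-constant shape function per feature by cycling
    through the features and fitting each one to the current residuals.
    """
    n_samples, n_features = X.shape
    # Interior quantile cut points for each feature.
    qs = np.linspace(0, 1, n_bins + 1)[1:-1]
    edges = [np.unique(np.quantile(X[:, j], qs)) for j in range(n_features)]
    bins = np.column_stack(
        [np.searchsorted(edges[j], X[:, j], side="right") for j in range(n_features)]
    )
    intercept = float(y.mean())
    shapes = [np.zeros(len(e) + 1) for e in edges]
    pred = np.full(n_samples, intercept)
    for _ in range(n_rounds):
        for j in range(n_features):  # round-robin over features
            residual = y - pred
            size = len(shapes[j])
            sums = np.bincount(bins[:, j], weights=residual, minlength=size)
            counts = np.bincount(bins[:, j], minlength=size)
            # Mean residual per bin is the least-squares step for this feature.
            step = learning_rate * np.divide(
                sums, counts, out=np.zeros(size), where=counts > 0
            )
            shapes[j] += step
            pred += step[bins[:, j]]
    return intercept, edges, shapes

def explain(model, X):
    """Return the intercept and an (n_samples, n_features) contribution matrix."""
    intercept, edges, shapes = model
    contrib = np.column_stack(
        [shapes[j][np.searchsorted(edges[j], X[:, j], side="right")]
         for j in range(X.shape[1])]
    )
    return intercept, contrib

def predict_additive(model, X):
    intercept, contrib = explain(model, X)
    return intercept + contrib.sum(axis=1)

# Synthetic demo: feature 0 is linear, feature 1 is nonlinear, feature 2 is noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 3))
y = 2 * X[:, 0] + np.sin(3 * X[:, 1]) + rng.normal(0, 0.1, size=2000)
model = fit_additive_booster(X, y)
intercept, contrib = explain(model, X)
importance = np.abs(contrib).mean(axis=0)  # mean |contribution| per feature
```

Because every prediction is exactly `intercept + contrib.sum(axis=1)`, the columns of `contrib` give the per-feature breakdown; on this synthetic data the irrelevant third feature should receive near-zero contributions.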


PySpark for Data Science - Intermediate ($89.99 to FREE)

#artificialintelligence

This module of the PySpark tutorials explains intermediate concepts, such as using SparkSession in later versions of Spark and SparkConf and SparkContext in earlier versions. It also explains how the Spark environment is set up, covers broadcast variables and accumulators, and introduces other optimization techniques such as parallelism, Tungsten, and the Catalyst optimizer. You will also learn about compression codecs such as Snappy and Zlib. We will also discuss big data ecosystem concepts such as HDFS and block storage, and the components of Spark, such as Spark Core, MLlib, GraphX, SparkR, Streaming, and SQL, and study the basics of the Python language as it is used with Apache Spark (the combination known as PySpark). We will learn the following in this course:

- Regression
  - Linear Regression
  - Output Column
  - Test Data
  - Prediction
  - Generalized Linear Regression
  - Forest Regression
- Classification
  - Binomial Logistic Regression
  - Multinomial Logistic Regression
  - Decision Tree
  - Random Forest
- Clustering
  - K-Means Model

PySpark is a big data solution that supports real-time streaming from the Python programming language and provides an efficient way to perform many kinds of computation.
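The first regression topics in that list (linear regression, test data, prediction) can be illustrated without a Spark cluster. The following minimal sketch uses plain NumPy least squares rather than PySpark; the synthetic data and the helper names `fit_linear_regression` and `predict_linear` are hypothetical. In PySpark the analogous step would typically assemble features with `pyspark.ml.feature.VectorAssembler` and fit `pyspark.ml.regression.LinearRegression`, whose `predictionCol` parameter names the output column of predictions.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Ordinary least squares; returns [intercept, coef_1, ..., coef_p]."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = 1.5 + 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.1, size=200)

# Hold out the last 50 rows as test data.
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

coef = fit_linear_regression(X_train, y_train)
y_pred = predict_linear(coef, X_test)  # plays the role of the prediction column
rmse = float(np.sqrt(np.mean((y_pred - y_test) ** 2)))
```

Evaluating on rows the model never saw during fitting is what the "Test Data" step is for; a training-set error alone would overstate accuracy.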


Neural Variational Learning for Grounded Language Acquisition

arXiv.org Artificial Intelligence

We propose a learning system in which language is grounded in visual percepts without specific pre-defined categories of terms. We present a unified generative method to acquire a shared semantic/visual embedding that enables the learning of language about a wide range of real-world objects. We evaluate the efficacy of this learning by predicting the semantics of objects and comparing the performance with neural and non-neural inputs. We show that this generative approach exhibits promising results in language grounding without pre-specifying visual categories in low-resource settings. Our experiments demonstrate that this approach is generalizable to multilingual, highly varied datasets.


Statistical Estimation from Dependent Data

arXiv.org Machine Learning

We consider a general statistical estimation problem in which binary labels across different observations are not independent conditioned on their feature vectors, but dependent. This captures settings where, e.g., the observations are collected on a spatial domain, a temporal domain, or a social network, which induce dependencies. We model these dependencies in the language of Markov Random Fields and, importantly, allow them to be substantial, i.e., we do not assume that the Markov Random Field capturing these dependencies is in high temperature. As our main contribution, we provide algorithms and statistically efficient estimation rates for this model, giving several instantiations of our bounds in logistic regression, sparse logistic regression, and neural network settings with dependent data. Our estimation guarantees follow from novel results for estimating the parameters (i.e., external fields and interaction strengths) of Ising models from a single sample. We evaluate our estimation approach on real networked data, showing that it outperforms standard regression approaches that ignore dependencies across three text classification datasets: Cora, Citeseer and Pubmed.
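To make "estimating Ising parameters from a single sample" concrete, here is a minimal sketch of maximum pseudo-likelihood estimation (MPLE), a standard single-sample technique; whether it matches the paper's exact algorithm is an assumption. It assumes labels y_i in {-1, +1} with conditional law P(y_i = +1 | y_-i, x_i) = sigmoid(2 * (theta . x_i + beta * sum_j A_ij y_j)), a known symmetric interaction matrix A with zero diagonal, and no sparsity or neural-network component. The ring graph, the parameter values and the helpers `gibbs_sample` and `fit_mple` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gibbs_sample(X, A, theta, beta, n_sweeps=50, rng=None):
    """Draw one labelling y in {-1, +1}^n from the assumed Ising model."""
    if rng is None:
        rng = np.random.default_rng()
    n = len(X)
    y = rng.choice([-1.0, 1.0], size=n)
    external = X @ theta  # external field from each node's own features
    for _ in range(n_sweeps):
        for i in range(n):
            local = external[i] + beta * (A[i] @ y)
            # P(y_i = +1 | rest) = sigmoid(2 * local field)
            y[i] = 1.0 if rng.random() < sigmoid(2.0 * local) else -1.0
    return y

def fit_mple(X, A, y, n_iters=2000, lr=0.5):
    """Maximum pseudo-likelihood estimate of (theta, beta) from ONE sample y."""
    m = A @ y                    # weighted sum of neighbours' labels
    Z = np.column_stack([X, m])  # MPLE = logistic regression on [x_i, m_i]
    params = np.zeros(Z.shape[1])
    for _ in range(n_iters):
        margins = 2.0 * y * (Z @ params)
        # Gradient of mean_i log sigmoid(2 * y_i * z_i . params)
        grad = (2.0 * y * sigmoid(-margins)) @ Z / len(y)
        params += lr * grad      # gradient ascent on a concave objective
    return params[:-1], params[-1]

# Synthetic network: a ring where each node links to 2 neighbours on each side.
n = 600
A = np.zeros((n, n))
for i in range(n):
    for d in (1, 2):
        A[i, (i + d) % n] = A[i, (i - d) % n] = 0.25  # row weights sum to 1
rng = np.random.default_rng(0)
X = rng.normal(size=(n, 2))
y = gibbs_sample(X, A, theta=np.array([1.0, -0.5]), beta=1.0, rng=rng)
theta_hat, beta_hat = fit_mple(X, A, y)
```

Because the neighbour term m_i simply becomes an extra feature, this form of MPLE reduces to an ordinary logistic regression; dropping that column gives a regression that ignores dependencies, the kind of baseline the abstract compares against.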


Robust Variable Selection and Estimation Via Adaptive Elastic Net S-Estimators for Linear Regression

arXiv.org Machine Learning

Heavy-tailed error distributions and predictors with anomalous values are ubiquitous in high-dimensional regression problems and can seriously jeopardize the validity of statistical analyses if not properly addressed. For more reliable estimation under these adverse conditions, we propose a new robust regularized estimator for simultaneous variable selection and coefficient estimation. This estimator, called adaptive PENSE, possesses the oracle property without prior knowledge of the scale of the residuals and without any moment conditions on the error distribution. The proposed estimator gives reliable results even under very heavy-tailed error distributions and aberrant contamination in the predictors or residuals. Importantly, even in these challenging settings, variable selection by adaptive PENSE remains stable. Numerical studies on simulated and real data sets show superior finite-sample performance across a wide range of settings compared with other robust regularized estimators when samples are contaminated, and performance competitive with classical regularized estimators on clean samples.
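The adaptive elastic-net idea behind adaptive PENSE can be written schematically as follows. This is a hedged sketch, assuming the estimator minimises a robust M-scale of the residuals, $\hat\sigma_M$, plus an elastic-net penalty whose coordinates are reweighted by a preliminary robust estimate $\tilde\beta$; the symbols $\hat\sigma_M$, $\tilde\beta$, $\alpha$, $\zeta$ and $\lambda$ are notation assumed here, not taken from the abstract.

```latex
\hat\beta = \operatorname*{arg\,min}_{\mu,\,\beta}\;
  \hat\sigma_M^{2}\!\left(y - \mu\mathbf{1} - X\beta\right)
  + \lambda \sum_{j=1}^{p} \lvert\tilde\beta_j\rvert^{-\zeta}
  \left(\frac{1-\alpha}{2}\,\beta_j^{2} + \alpha\,\lvert\beta_j\rvert\right),
\qquad 0 \le \alpha \le 1,\ \zeta > 0 .
```

Under the usual adaptive-penalty convention, predictors with a small preliminary coefficient receive a large weight and are shrunk harder, which is what drives the stable variable selection the abstract describes, while the robust scale $\hat\sigma_M$ replaces the squared-error loss that heavy-tailed errors would otherwise dominate.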


What is Machine Learning? A Primer for the Epidemiologist

#artificialintelligence

Machine learning is a branch of computer science that has the potential to transform epidemiologic sciences. Amid a growing focus on "Big Data," it offers epidemiologists new tools to tackle problems for which classical methods are not well-suited. In order to critically evaluate the value of integrating machine learning algorithms and existing methods, however, it is essential to address language and technical barriers between the two fields that can make it difficult for epidemiologists to read and assess machine learning studies. Here, we provide an overview of the concepts and terminology used in machine learning literature, which encompasses a diverse set of tools with goals ranging from prediction to classification to clustering. We provide a brief introduction to 5 common machine learning algorithms and 4 ensemble-based approaches. We then summarize epidemiologic applications of machine learning techniques in the published literature. We recommend approaches to incorporate machine learning in epidemiologic research and discuss opportunities and challenges for integrating machine learning and existing epidemiologic research methods. Machine learning is a branch of computer science that broadly aims to enable computers to "learn" without being directly programmed (1). It has origins in the artificial intelligence movement of the 1950s and emphasizes practical objectives and applications, particularly prediction and optimization. Computers "learn" in machine learning by improving their performance at tasks through "experience" (2, p. xv). In practice, "experience" usually means fitting to data; hence, there is not a clear boundary between machine learning and statistical approaches. 
Indeed, whether a given methodology is considered "machine learning" or "statistical" often reflects its history as much as genuine differences, and many algorithms (e.g., least absolute shrinkage and selection operator (LASSO), stepwise regression) may or may not be considered machine learning depending on who you ask. Still, despite methodological similarities, machine learning is philosophically and practically distinguishable. At the risk of (considerable) oversimplification, machine learning generally emphasizes predictive accuracy over hypothesis-driven inference, usually focusing on large, high-dimensional (i.e., having many covariates) data sets (3, 4). Regardless of the precise distinction between approaches, in practice, machine learning offers epidemiologists important tools. In particular, a growing focus on "Big Data" emphasizes problems and data sets at which machine learning algorithms excel and with which more commonly used statistical approaches struggle. This primer provides a basic introduction to machine learning with the aim of providing readers a foundation for critically reading studies based on these methods and a jumping-off point for those interested in using machine learning techniques in epidemiologic research.
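LASSO, named above as a method that sits on the boundary between statistics and machine learning, shows the prediction-oriented, high-dimensional emphasis in miniature: it shrinks most coefficients to exactly zero. Below is a minimal sketch of LASSO fitted by cyclic coordinate descent, assuming centred covariates and outcome (so no intercept) and the objective (1/2n)||y - Xb||^2 + lam * ||b||_1; the function names and the synthetic data are hypothetical.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma, setting small values exactly to zero."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iters=200):
    """Minimise (1/2n)||y - X b||^2 + lam * ||b||_1 one coordinate at a time.

    Assumes the columns of X and the vector y are already centred.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()  # residual y - X b (b starts at zero)
    for _ in range(n_iters):
        for j in range(p):
            r += X[:, j] * b[j]  # remove covariate j's current contribution
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]  # add back the updated contribution
    return b

rng = np.random.default_rng(1)
n, p = 200, 50                   # many covariates relative to the signal
X = rng.normal(size=(n, p))
true_b = np.zeros(p)
true_b[:3] = [3.0, -2.0, 1.5]    # only 3 covariates matter
y = X @ true_b + rng.normal(0, 0.5, size=n)
X -= X.mean(axis=0)
y -= y.mean()
b_hat = lasso_coordinate_descent(X, y, lam=0.1)
```

Larger values of `lam` zero out more coefficients; once `lam` exceeds max_j |x_j . y| / n, every coefficient is exactly zero. In practice `lam` is usually chosen by cross-validated prediction error, which reflects the predictive-accuracy emphasis described above.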


Logistic Regression predict function

#artificialintelligence

In the previous tutorial, we wrote an optimization function that outputs the learned w and b parameters. Now we can use w and b to predict the labels for our dataset X, so in this tutorial we will implement the predict() function. First, let's look at its inputs and outputs: If we run our new function on the previous values, "predict(w, b, X)", we should receive the following results: From our results, we could say that we predicted two cats and one dog. But because the input was not real images, just simple random test numbers, our predictions don't mean anything.
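The tutorial's actual inputs, outputs and results are not reproduced here, so the following is a minimal NumPy sketch of `predict(w, b, X)` under assumed conventions: `w` has shape `(n_features, 1)`, `b` is a scalar, each column of `X` is one example, and a probability above 0.5 maps to label 1 (for example, "cat"). The values in the usage lines are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, X):
    """Predict 0/1 labels with learned logistic regression parameters.

    w: weights, shape (n_features, 1)
    b: scalar bias
    X: data, shape (n_features, m), one example per column
    Returns an array of shape (1, m) holding 0.0 or 1.0.
    """
    w = w.reshape(X.shape[0], 1)
    A = sigmoid(w.T @ X + b)        # predicted probabilities, shape (1, m)
    return (A > 0.5).astype(float)  # threshold at 0.5

w = np.array([[1.0], [-1.0]])
b = 0.0
X = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 1.0]])
print(predict(w, b, X))  # → [[1. 0. 0.]]
```

The third example sits exactly on the decision boundary (probability 0.5), so the strict `> 0.5` rule labels it 0; using `>=` would flip that one case.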


2 Hours of ML a day -- Series

#artificialintelligence

I am starting this blog mainly to stay accountable and disciplined, and to share my journey (both ups and downs) in my learning process. I tend to slack off after a hectic work day and end up watching shows on OTT platforms. I start learning, keep it up properly for 2 days, and then end up slacking for 10–15 days before starting again. I am in a vicious cycle of wanting to learn and not being able to follow through. I have a basic understanding of ML concepts like decision trees, linear regression, logistic regression, etc. By basic, I mean knowing the algorithm without any in-depth knowledge.


Explainable AI Enabled Inspection of Business Process Prediction Models

arXiv.org Artificial Intelligence

Modern data analytics underpinned by machine learning techniques has become a key enabler of the automation of data-led decision making. As an important branch of state-of-the-art data analytics, business process prediction faces a challenge: the underlying 'black-box' prediction models provide no explanation of their reasoning and decisions. With the development of interpretable machine learning techniques, explanations can be generated for a black-box model, making it possible for (human) users to access the reasoning behind machine-learned predictions. In this paper, we present an approach that uses model explanations to investigate the reasoning applied by machine-learned predictions and to detect potential issues with the underlying methods, thus enhancing trust in business process prediction models. A novel contribution of our approach is the proposal of model inspection that leverages both the explanations generated by interpretable machine learning mechanisms and the contextual or domain knowledge extracted from event logs that record historical process execution. Findings drawn from this work are expected to serve as a key input to developing model reliability metrics and evaluation in the context of business process predictions.
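Explanations from interpretable machine learning are often feature attributions. As one hedged illustration, not necessarily the mechanism used in the paper, the sketch below computes permutation feature importance for any black-box `predict_fn`: the average increase in mean squared error when one feature's values are shuffled. `black_box` is a hypothetical stand-in for a trained process-prediction model, and the data are synthetic.

```python
import numpy as np

def permutation_importance(predict_fn, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when each feature column is shuffled."""
    rng = np.random.default_rng(seed)
    base = np.mean((predict_fn(X) - y) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break feature j's link to y
            scores[j] += np.mean((predict_fn(Xp) - y) ** 2) - base
    return scores / n_repeats

def black_box(X):
    """Hypothetical opaque model; it ignores column 2 entirely."""
    return np.tanh(2.0 * X[:, 0]) + 0.5 * X[:, 1] ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = black_box(X) + rng.normal(0, 0.1, size=500)
importance = permutation_importance(black_box, X, y)
```

A feature the model truly ignores scores exactly zero here. Checking such scores against domain knowledge, for example attributes that event logs suggest should or should not matter, is similar in spirit to the model inspection the abstract describes.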