Regression


Identifying biases in legal data: An algorithmic fairness perspective

arXiv.org Machine Learning

As artificial intelligence enters the legal space, it is essential to recognize biases in legal data and ensure that they are not replicated and reinforced with legal technology [7, 13, 18]. Furthermore, understanding biases in legal data and developing discrimination-free technology could help the legal space to become fairer and more widely accessible. We typically find two types of biases in legal data: First, representation biases, i.e., certain social groups are over- or underrepresented in a data set. Second, sentencing disparities, i.e., the outcome of legal proceedings for similar cases varies across social groups. Representation biases may reflect disparities in policing (arrest rates) or in offense rates.
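The two bias types described above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the record fields (`group`, `offense`, `sentence_months`), the toy records, the reference population shares, and the choice to treat "similar cases" as cases with the same offense label. It is not the method used in the paper.

```python
from collections import defaultdict


def representation_ratio(records, population_share):
    """Share of each group in the data divided by its (assumed) population share.

    A ratio above 1 suggests overrepresentation, below 1 underrepresentation.
    """
    counts = defaultdict(int)
    for record in records:
        counts[record["group"]] += 1
    total = len(records)
    return {
        group: (counts[group] / total) / share
        for group, share in population_share.items()
    }


def mean_sentence_by_group(records, offense):
    """Average sentence per group, restricted to one offense type.

    Using the same offense label as a stand-in for 'similar cases' is a
    simplifying assumption of this sketch.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        if record["offense"] == offense:
            totals[record["group"]] += record["sentence_months"]
            counts[record["group"]] += 1
    return {group: totals[group] / counts[group] for group in counts}


# Hypothetical toy records, not real legal data.
records = [
    {"group": "A", "offense": "theft", "sentence_months": 6},
    {"group": "A", "offense": "theft", "sentence_months": 8},
    {"group": "A", "offense": "fraud", "sentence_months": 12},
    {"group": "B", "offense": "theft", "sentence_months": 4},
]
population_share = {"A": 0.5, "B": 0.5}  # assumed reference shares

ratios = representation_ratio(records, population_share)
theft_means = mean_sentence_by_group(records, "theft")
print(ratios)
print(theft_means)
```

In this toy data, group A is overrepresented relative to the assumed population shares and also receives a longer average sentence for the same offense label. A real analysis would need a far more careful definition of case similarity.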


Functional additive regression on shape and form manifolds of planar curves

arXiv.org Machine Learning

Defining shape and form as equivalence classes under translation, rotation and -- for shapes -- also scale, we extend generalized additive regression to models for the shape/form of planar curves or landmark configurations. The model respects the resulting quotient geometry of the response, employing the squared geodesic distance as loss function and a geodesic response function mapping the additive predictor to the shape/form space. For fitting the model, we propose a Riemannian $L_2$-Boosting algorithm well-suited for a potentially large number of possibly parameter-intensive model terms, which also yields automated model selection. We provide novel intuitively interpretable visualizations for (even non-linear) covariate effects in the shape/form space via suitable tensor-based factorizations. The usefulness of the proposed framework is illustrated in an analysis of 1) astragalus shapes of wild and domesticated sheep and 2) cell forms generated in a biophysical model, as well as 3) in a realistic simulation study with response shapes and forms motivated from a dataset on bottle outlines.


Predicting vehicles parking behaviour in shared premises for aggregated EV electricity demand response programs

arXiv.org Artificial Intelligence

Global electric car sales in 2020 continued to exceed expectations, climbing to over 3 million and reaching a market share of over 4%. However, uncertainty of generation caused by higher penetration of renewable energies and the advent of Electric Vehicles (EVs) with their additional electricity demand could strain the power system, at both distribution and transmission levels. Demand response aggregation and load control will enable greater grid stability and greater penetration of renewable energies into the grid. The present work fits this context by supporting charging optimization for EVs in parking premises, assuming an incumbent high penetration of EVs in the system. We propose a methodology to predict the parking duration in shared parking premises, with the objective of estimating the energy requirement of a specific parking lot, evaluating the optimal EV charging schedule, and integrating the scheduling into a smart controller. We formalize the prediction problem as a supervised machine learning task to predict the duration of the parking event before the car leaves the slot. This predicted duration feeds the energy management system that will allocate the power over the duration, reducing the overall peak electricity demand. We structure our experiments around two research questions that aim to discover the accuracy of the proposed machine learning approach and the most relevant features for the prediction models. We experiment with different algorithms and feature combinations on 4 datasets from 2 different campus facilities in Italy and Brazil. Using both contextual and time-of-day features, the overall results of the models show a higher accuracy compared to a statistical analysis based on frequency, indicating a viable route for the development of accurate predictors for shared parking premises energy management systems.


Sharp global convergence guarantees for iterative nonconvex optimization: A Gaussian process perspective

arXiv.org Machine Learning

We consider a general class of regression models with normally distributed covariates, and the associated nonconvex problem of fitting these models from data. We develop a general recipe for analyzing the convergence of iterative algorithms for this task from a random initialization. In particular, provided each iteration can be written as the solution to a convex optimization problem satisfying some natural conditions, we leverage Gaussian comparison theorems to derive a deterministic sequence that provides sharp upper and lower bounds on the error of the algorithm with sample-splitting. Crucially, this deterministic sequence accurately captures both the convergence rate of the algorithm and the eventual error floor in the finite-sample regime, and is distinct from the commonly used "population" sequence that results from taking the infinite-sample limit. We apply our general framework to derive several concrete consequences for parameter estimation in popular statistical models including phase retrieval and mixtures of regressions. Provided the sample size scales near-linearly in the dimension, we show sharp global convergence rates for both higher-order algorithms based on alternating updates and first-order algorithms based on subgradient descent. These corollaries, in turn, yield multiple consequences, including: (a) Proof that higher-order algorithms can converge significantly faster than their first-order counterparts (and sometimes super-linearly), even if the two share the same population update and (b) Intricacies in super-linear convergence behavior for higher-order algorithms, which can be nonstandard (e.g., with exponent 3/2) and sensitive to the noise level in the problem. We complement these results with extensive numerical experiments, which show excellent agreement with our theoretical predictions.


Modeling Regime Shifts in Multiple Time Series

arXiv.org Machine Learning

We investigate the problem of discovering and modeling regime shifts in an ecosystem comprising multiple time series known as co-evolving time series. Regime shifts refer to the changing behaviors exhibited by series at different time intervals. Learning these changing behaviors is a key step toward time series forecasting. While advances have been made, existing methods suffer from one or more of the following shortcomings: (1) failure to take relationships between time series into consideration for discovering regimes in multiple time series; (2) lack of an effective approach that models time-dependent behaviors exhibited by series; (3) difficulties in handling data discontinuities which may be informative. Most of the existing methods are unable to handle all of these three issues in a unified framework. This, therefore, motivates our effort to devise a principled approach for modeling interactions and time-dependency in co-evolving time series. Specifically, we model an ecosystem of multiple time series by summarizing the heavy ensemble of time series into a lighter and more meaningful structure called a "mapping grid". By using the mapping grid, our model first learns time series behavioral dependencies through a dynamic network representation, then learns the regime transition mechanism via a full time-dependent Cox regression model. The originality of our approach lies in modeling interactions between time series in regime identification and in modeling time-dependent regime transition probabilities, usually assumed to be static in existing work.


A Practical Guide to Linear Regression

#artificialintelligence

I use the Kaggle public dataset "Insurance Premium Prediction" in this exercise. The data includes the independent variables age, sex, bmi, children, smoker, and region, and the target variable expenses. First, let's load the data and take a preliminary look at it using df.info(). EDA is essential both to investigate the data quality and to reveal hidden correlations among variables. In this exercise, I cover three techniques relevant to linear regression.


Mathematics Hidden Behind Linear Regression

#artificialintelligence

This is about the mathematics used in the linear regression (with gradient descent) algorithm. This was a part of my IB HL Mathematics Exploration. Linear regression is a statistical tool that produces a line of best fit for a given dataset analytically. To produce the regression line manually, one needs to compute the mean-squared error and optimize the cost function; both are explained in detail later in the document. The main problem arises when the dataset is so large that doing the computation by hand becomes impractical. A computer can then perform the task much more quickly with just a few simple lines of code in any language. The linear regression algorithm uses a dataset (pairs of input and output values) to generate a line of best fit for that dataset. To start, the algorithm generates a hypothesis in the form??
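As a rough illustration of the idea, here is a minimal sketch of fitting a line by batch gradient descent on the mean-squared error. It assumes the usual straight-line hypothesis h(x) = theta0 + theta1 * x. The function names, learning rate, iteration count, and toy data are all choices made for this sketch, not taken from the article.

```python
def mean_squared_error(xs, ys, theta0, theta1):
    """Cost function: average squared gap between predictions and targets."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / m


def fit_line(xs, ys, learning_rate=0.05, iterations=5000):
    """Fit y ~ theta0 + theta1 * x by batch gradient descent on the MSE."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iterations):
        # Prediction errors under the current parameters.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to theta0 and theta1.
        grad0 = (2.0 / m) * sum(errors)
        grad1 = (2.0 / m) * sum(e * x for e, x in zip(errors, xs))
        # Step against the gradient.
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1


# Toy data lying exactly on y = 1 + 2x (hypothetical values).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

theta0, theta1 = fit_line(xs, ys)
final_cost = mean_squared_error(xs, ys, theta0, theta1)
print(theta0, theta1, final_cost)
```

The learning rate has to be small enough for the updates to converge. For inputs on a much larger scale, the features would typically be rescaled first or the learning rate reduced.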


Do You Know? What is MULTIVARIATE REGRESSION?

#artificialintelligence

Multivariate regression is a more powerful version of linear regression that employs multiple features or variables. Example: In linear regression, we only take into account the size of the house to determine the price of the house. The total number of features is four, thus n = 4. In this case, X(3) is a 4-dimensional vector for the four input features of the third house. This means X(3) = [1534, 3, 2, 30, 315]. Here, 3 is the index of the training example, used as notation for the third row. What should be the form of our hypothesis function?
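One common answer to that question is the linear form h(x) = theta_0 + theta_1 * x_1 + ... + theta_n * x_n, sketched below. The parameter values are hypothetical placeholders rather than fitted coefficients, and the sketch assumes that the first four entries listed for the third house are its feature values.

```python
def hypothesis(theta, features):
    """Linear hypothesis h(x) = theta_0 + theta_1*x_1 + ... + theta_n*x_n."""
    # Prepend x_0 = 1 so that theta_0 acts as the intercept term.
    x = [1.0] + list(features)
    if len(theta) != len(x):
        raise ValueError("theta must have one more entry than there are features")
    return sum(t * xi for t, xi in zip(theta, x))


# Assumption: the first four entries of the third example are its n = 4 features.
x3 = [1534, 3, 2, 30]

# Hypothetical parameters (theta_0 .. theta_4), chosen only for illustration.
theta = [80.0, 0.1, 10.0, 5.0, -1.0]

prediction = hypothesis(theta, x3)
print(prediction)
```

Writing the hypothesis as a single sum over a vector with x_0 = 1 keeps the code identical for any number of features, which is the main practical difference from the single-feature case.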


Linear regression in Machine learning

#artificialintelligence

Let's understand this concept with a simple example. You want to apply to a US university to pursue a Master's degree. Whether you get an admit letter from a particular university depends on the following factors: GRE score, TOEFL score, number of research papers published, SOP, and letters of recommendation. In data science terms, these are your independent variables, and your chance of being admitted to a university is your dependent variable, which you predict based on your independent variables. In simple terms, the thing you need to predict is the dependent variable, and the factors on which your prediction is based are your independent variables.


Machine Learning Made Simple

#artificialintelligence

Registration Link - https://bit.ly/3Aios5K
14 Days. 10 Speakers. All-Inclusive Program. Career Tips. Free of Charge.
Have you ever dreamt of becoming a data science rockstar and launching a career in Silicon Valley? We know the fastest pathway and can’t wait to share it with you. 💁 ⚡ Register to the first edition of our well-packed ML marathon right now.
During the 14 days of comprehensive online webinars you will:
📌 find out insider tips from the leading experts about how to quickly start a successful data science career in Silicon Valley;
📌 level up your theoretical knowledge and learn breakthrough approaches to the creation of turnkey ML solutions without coding;
📌 boost your practical skills and master the ways to solve real-world challenges with ML;
📌 discover how to create TinyML models and embed them into the edge devices;
📌 get an overview of the current industry landscape, latest ML trends, and tools.
🎁 All participants will have a chance to take part in a special competition by Neuton.AI. Build a predictive model with a preassigned dataset and compare its accuracy with Neuton’s model. The creator of the most accurate model will be awarded with a free 3-month premium subscription to the Neuton.AI Platform.
Duration: 1.5 hours daily
Time: 7:00 PM IST - 8:30 PM IST (+5.30 GMT)
Join our marathon today to skyrocket your data science career tomorrow! 🚀
Program:
Block 1: Career Prospects 👨‍💻
9/27/2021 Machine Learning in a Nutshell by Soham Sharma
Bringing Silicon Valley to Student by bridging gap between colleges and real-world by Gurumurthy Yeleswarapu, Siliconvalley4u
9/28/2021 How to take up data career. Your Ticket to the BIG Data Science World: Enter the Largest International Community of DS and business experts, AI Guild by Dr. Chris Armbruster
Block 2: Actionable AutoML Tools 🛠️
9/29/2021 Master Data Science without a Single Line of Code, Leveraging Neuton.AI [Live Demo Included] by Alex Miller & Danil Zherebtsov
Block 3: Theory & Practice 💻
9/30/2021 The Fundamentals of Linear Regression (Theory) by Pallab Nath
10/1/2021 The Fundamentals of Linear Regression (Practice) by Pallab Nath
10/2/2021 Introduction to Support Vector Machines (Theory) by Dr. Promit Ray
10/3/2021 Introduction to Support Vector Machines (Practice) by Dr. Promit Ray
10/4/2021 The Art of Logistic Regression (Theory) by Namita Konnur
10/5/2021 The Art of Logistic Regression (Practice) by Namita Konnur
10/6/2021 KNN | Tips and Tricks (Theory) by Vivek Nair
10/7/2021 KNN | Tips and Tricks (Practice) by Vivek Nair
10/8/2021 In-Depth: Decision Tree + Random Forest (Theory) by Suram Saraswati Anugna
10/9/2021 In-Depth: Decision Tree + Random Forest (Practice) by Suram Saraswati Anugna
Block 4: Industry Trends 💡
10/10/2021 TinyML: AI Intelligence for Edge Devices [Case Included] by Danil Zherebtsov