 Regression


Artificial neural networks and time series of counts: A class of nonlinear INGARCH models

arXiv.org Machine Learning

Time series of counts are frequently analyzed using generalized integer-valued autoregressive models with conditional heteroskedasticity (INGARCH). These models employ response functions to map a vector of past observations and past conditional expectations to the conditional expectation of the present observation. In this paper, it is shown how INGARCH models can be combined with artificial neural network (ANN) response functions to obtain a class of nonlinear INGARCH models. The ANN framework allows for the interpretation of many existing INGARCH models as a degenerate version of a corresponding neural model. Details on maximum likelihood estimation, marginal effects and confidence intervals are given. The empirical analysis of time series of bounded and unbounded counts reveals that the neural INGARCH models are able to outperform reasonable degenerate competitor models in terms of the information loss.
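As a rough illustration of the response-function idea, the sketch below computes conditional means recursively from the previous count and the previous conditional mean, once with a linear response and once with a small one-hidden-layer network. The function names, parameter values, logistic hidden units, and softplus output are assumptions made for illustration; the abstract does not specify the paper's architecture or estimation details.

```python
import math


def linear_response(x_prev, lam_prev, omega=1.0, alpha=0.3, beta=0.4):
    """Linear INGARCH(1,1)-style response: omega + alpha * X_{t-1} + beta * lambda_{t-1}."""
    return omega + alpha * x_prev + beta * lam_prev


def make_neural_response(hidden_weights, output_weights, output_bias):
    """Build a one-hidden-layer response function (illustrative architecture only).

    hidden_weights: list of (bias, weight_on_x_prev, weight_on_lam_prev) per hidden unit.
    """
    def response(x_prev, lam_prev):
        # Logistic hidden units applied to the same two inputs as the linear model.
        hidden = [
            1.0 / (1.0 + math.exp(-(b + wx * x_prev + wl * lam_prev)))
            for (b, wx, wl) in hidden_weights
        ]
        pre_activation = output_bias + sum(v * h for v, h in zip(output_weights, hidden))
        # Softplus keeps the conditional mean strictly positive.
        return math.log1p(math.exp(pre_activation))
    return response


def conditional_means(counts, response, lam0):
    """Return lambda_1..lambda_n with lambda_1 = lam0 and lambda_t = response(X_{t-1}, lambda_{t-1})."""
    lams = [lam0]
    for x_prev in counts[:-1]:
        lams.append(response(x_prev, lams[-1]))
    return lams


def poisson_log_likelihood(counts, lams):
    """Conditional Poisson log-likelihood, the quantity maximized in ML estimation."""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x, lam in zip(counts, lams))
```

Swapping `linear_response` for a function built by `make_neural_response` changes only the mapping from past values to the conditional mean; the recursion and the likelihood stay the same.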


Federated Causal Inference in Heterogeneous Observational Data

arXiv.org Artificial Intelligence

We are interested in estimating the effect of a treatment applied to individuals at multiple sites, where data is stored locally for each site. Due to privacy constraints, individual-level data cannot be shared across sites; the sites may also have heterogeneous populations and treatment assignment mechanisms. Motivated by these considerations, we develop federated methods to draw inference on the average treatment effects of combined data across sites. Our methods first compute summary statistics locally using propensity scores and then aggregate these statistics across sites to obtain point and variance estimators of average treatment effects. We show that these estimators are consistent and asymptotically normal. To achieve these asymptotic properties, we find that the aggregation schemes need to account for the heterogeneity in treatment assignments and in outcomes across sites. We demonstrate the validity of our federated methods through a comparative study of two large medical claims databases.
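The two-step pattern described above (local summary statistics, then aggregation) can be sketched as follows. This is a minimal sketch under strong assumptions: propensity scores are taken as already known at each site, the local estimator is a plain inverse-propensity-weighted difference, and the aggregation is standard inverse-variance weighting from meta-analysis. It is not the paper's aggregation scheme, which the abstract says must additionally account for heterogeneity across sites. All function names are hypothetical.

```python
def local_ipw_summary(outcomes, treatments, propensities):
    """Compute a site's IPW estimate of the average treatment effect and its variance.

    Only the returned summary (estimate, variance, sample size) leaves the site;
    individual-level data stays local.
    """
    n = len(outcomes)
    # Per-individual IPW contribution: T*Y/e - (1-T)*Y/(1-e).
    terms = [
        t * y / e - (1 - t) * y / (1 - e)
        for y, t, e in zip(outcomes, treatments, propensities)
    ]
    estimate = sum(terms) / n
    # Sample variance of the contributions, divided by n, estimates Var(estimate).
    variance = sum((s - estimate) ** 2 for s in terms) / (n - 1) / n
    return estimate, variance, n


def aggregate_inverse_variance(summaries):
    """Combine site summaries with inverse-variance weights (assumed scheme)."""
    weights = [1.0 / variance for _, variance, _ in summaries]
    total_weight = sum(weights)
    estimate = sum(w * est for w, (est, _, _) in zip(weights, summaries)) / total_weight
    return estimate, 1.0 / total_weight
```

Inverse-variance weighting implicitly assumes every site estimates the same effect, so it is best read as a baseline against which heterogeneity-aware schemes can be compared.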


AI Today Podcast: AI Glossary Series - Regression and Linear Regression - AI & Data Today

#artificialintelligence

Regression is a statistical and mathematical technique for finding the relationship between two or more variables. In this episode of the AI Today podcast, hosts Kathleen Walch and Ron Schmelzer define the terms Regression and Linear Regression, explain how they relate to AI, and discuss why it's important to know about them.
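For readers who want a concrete picture of linear regression with one input variable, here is a minimal sketch of an ordinary least-squares line fit. The function name and the toy data are illustrative assumptions and do not come from the podcast.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the ratio of the x-y co-variation to the variation in x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data generated from y = 2x + 1, so the fit recovers those coefficients.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```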


How is Machine Learning an Emerging Technology?

#artificialintelligence

Machine learning (ML) is a field of computer science that involves creating and using algorithms and statistical models to allow computers to automatically learn from data without being explicitly programmed. In other words, machine learning involves training a computer program to make predictions and to support decisions based on patterns and insights learned from large datasets. Machine learning has a wide range of applications, from image recognition and natural language processing to fraud detection and personalized marketing. Some common machine-learning techniques include linear regression, logistic regression, decision trees, random forests, support vector machines, and neural networks. Thus, enrolling in a Machine Learning course can be instrumental to your career growth.


Almost Linear Constant-Factor Sketching for $\ell_1$ and Logistic Regression

arXiv.org Artificial Intelligence

We improve upon previous oblivious sketching and turnstile streaming results for $\ell_1$ and logistic regression, giving a much smaller sketching dimension achieving $O(1)$-approximation and yielding an efficient optimization problem in the sketch space. Namely, we achieve for any constant $c>0$ a sketching dimension of $\tilde{O}(d^{1+c})$ for $\ell_1$ regression and $\tilde{O}(\mu d^{1+c})$ for logistic regression, where $\mu$ is a standard measure that captures the complexity of compressing the data. For $\ell_1$-regression our sketching dimension is near-linear and improves previous work which either required $\Omega(\log d)$-approximation with this sketching dimension, or required a larger $\operatorname{poly}(d)$ number of rows. Similarly, for logistic regression previous work had worse $\operatorname{poly}(\mu d)$ factors in its sketching dimension. We also give a tradeoff that yields a $1+\varepsilon$ approximation in input sparsity time by increasing the total size to $(d\log(n)/\varepsilon)^{O(1/\varepsilon)}$ for $\ell_1$ and to $(\mu d\log(n)/\varepsilon)^{O(1/\varepsilon)}$ for logistic regression. Finally, we show that our sketch can be extended to approximate a regularized version of logistic regression where the data-dependent regularizer corresponds to the variance of the individual logistic losses.


Thread Counting in Plain Weave for Old Paintings Using Semi-Supervised Regression Deep Learning Models

arXiv.org Artificial Intelligence

In this work, the authors develop regression approaches based on deep learning to perform thread density estimation for plain weave canvas analysis. Previous approaches were based on Fourier analysis, which is quite robust in some scenarios but fails in others; on machine learning tools, which involve pre-labeling of the painting at hand; or on the segmentation of thread crossing points, which provides good estimates in all scenarios with no need for pre-labeling. The segmentation approach is time-consuming, as the densities are estimated only after the crossing points have been located. In this novel proposal, we avoid this step by computing the density of threads directly from the image with a regression deep learning model. We also incorporate some improvements in the initial preprocessing of the input image that have an impact on the final error. Several models are proposed and analyzed to retain the best one. Furthermore, we reduce the density estimation error by introducing a semi-supervised approach. The performance of our novel algorithm is analyzed with works by Ribera, Velázquez, and Poussin, where we compare our results to those of previous approaches. Finally, the method is put into practice to support the change of authorship of a masterpiece at the Museo del Prado.


Maximum Covariance Unfolding Regression: A Novel Covariate-based Manifold Learning Approach for Point Cloud Data

arXiv.org Artificial Intelligence

Point cloud data are widely used in manufacturing applications for process inspection, modeling, monitoring and optimization. State-of-the-art tensor regression techniques have been used effectively to analyze structured point cloud data, where the measurements on a uniform grid can be formed into a tensor. However, these techniques are not capable of handling unstructured point cloud data that are often in the form of manifolds. In this paper, we propose a nonlinear dimension reduction approach named Maximum Covariance Unfolding Regression that is able to learn the low-dimensional (LD) manifold of point clouds with the highest correlation with explanatory covariates. This LD manifold is then used for regression modeling and process optimization based on process variables. The performance of the proposed method is subsequently evaluated and compared with benchmark methods through simulations and a case study of steel bracket manufacturing.


Impact, Attention, Influence: Early Assessment of Autonomous Driving Datasets

arXiv.org Artificial Intelligence

Autonomous Driving (AD), the area of robotics with the greatest potential impact on society, has gained a lot of momentum in the last decade. As a result, the number of datasets in AD has increased rapidly. Creators and users of datasets can benefit from a better understanding of developments in the field. While scientometric analysis has been conducted in other fields, it rarely revolves around datasets. Thus, the impact, attention, and influence of datasets on autonomous driving remain rarely investigated. In this work, we provide a scientometric analysis of over 200 datasets in AD. We perform a rigorous evaluation of relations between available metadata and citation counts based on linear regression. Subsequently, we propose an Influence Score to assess a dataset early on, without the need for a track record of citations, which is only available with a certain delay.


The Complete Visual Guide to Machine Learning and Data Science - CouponED

#artificialintelligence

In Part 1 we'll introduce the machine learning workflow and common techniques for cleaning and preparing raw data for analysis. We'll explore univariate analysis with frequency tables, histograms, kernel densities, and profiling metrics, then dive into multivariate profiling tools like heat maps, violin and box plots, scatter plots, and correlation. Topics include variable types, empty values, range and count calculations, and left/right censoring, as well as histograms, frequency tables, mean, median, mode, variance, and skewness. Throughout the course, we'll introduce real-world scenarios to solidify key concepts and simulate actual data science and business intelligence cases. You'll use profiling metrics to clean up product inventory data for a local grocery, explore Olympic athlete demographics with histograms and kernel densities, visualize traffic accident frequency with heat maps, and more. In Part 2 we'll introduce the supervised learning landscape, review the classification workflow, and address key topics like dependent vs. independent variables, feature engineering, data splitting and overfitting.


Sasaki Metric for Spline Models of Manifold-Valued Trajectories

arXiv.org Artificial Intelligence

We propose a generic spatiotemporal framework to analyze manifold-valued measurements, which allows for employing an intrinsic and computationally efficient Riemannian hierarchical model. Particularly, utilizing regression, we represent discrete trajectories in a Riemannian manifold by composite Bézier splines, propose a natural metric induced by the Sasaki metric to compare the trajectories, and estimate average trajectories as group-wise trends. We evaluate our framework in comparison to state-of-the-art methods within qualitative and quantitative experiments on hurricane tracks. Notably, our results demonstrate the superiority of spline-based approaches for an intensity classification of the tracks.