AITopics

2209.01169

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)
Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Chapman, James T. E., Desai, Ajit

Macroeconomic Predictions using Payments Data and Machine Learning

arXiv.org Artificial IntelligenceSep-2-2022

Predicting the economy's short-term dynamics -- a vital input to economic agents' decision-making process -- often uses lagged indicators in linear models. This is typically sufficient during normal times but could prove inadequate during crisis periods. This paper aims to demonstrate that non-traditional and timely data such as retail and wholesale payments, with the aid of nonlinear machine learning approaches, can provide policymakers with sophisticated models to accurately estimate key macroeconomic indicators in near real-time. Moreover, we provide a set of econometric tools to mitigate overfitting and interpretability challenges in machine learning models to improve their effectiveness for policy use. Our models with payments data, nonlinear methods, and tailored cross-validation approaches help improve macroeconomic nowcasting accuracy up to 40\% -- with higher gains during the COVID-19 period. We observe that the contribution of payments data for economic predictions is small and linear during low and normal growth periods. However, the payments data contribution is large, asymmetrical, and nonlinear during strong negative or positive growth periods.

payment, payment data, prediction, (15 more...)

2209.00948

Country:

Europe > Denmark (0.14)
North America > Canada > Ontario > National Capital Region > Ottawa (0.14)
North America > United States > New York (0.04)
(6 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Banking & Finance > Economy (1.00)
Banking & Finance > Trading (0.67)
Government > Regional Government > North America Government (0.46)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

#artificialintelligenceSep-1-2022, 20:00:29 GMT

Linear Model

Important Note: Original article http://ecdicus.com/linear-model/ Linear model is the most widely used model in machine learning. It refers to a model that uses linear combinations of sample features to make predictions. Where $\textbf{w} [w_1,…, w_D] T$ is the D-dimensional weight vector, and b is the bias. The linear regression introduced in the previous chapter is a typical linear model, and $f(\textbf{x};\textbf{w})$ is directly used to predict the output target $y f(\textbf{x};\textbf{w})$. In the classification problem, since the output target y is some discrete labels, and the value range of $f(\textbf{x};\textbf{w})$ is a real number,$f(\textbf{x};\textbf{w})$ cannot be directly used for prediction, and a non-linear decision function needs to be introduced g(.) to predict the output target Where, f(x;w) is also called Discriminant Function.

frac, mathcal, regression, (16 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.54)

#artificialintelligenceAug-31-2022, 15:06:56 GMT

Dealing with Outliers Using Three Robust Linear Regression Models

Roughly 10% of data was identified as outliers and all the observations introduced were correctly classified as outliers. Them, we can quickly visualize the inliers compared to outliers to see the remaining 26 observations flagged as outliers. Figure 5 shows that the observations located farthest from the hypothetical best-fit line of the original data are considered outliers. The last of the robust regression algorithms available in scikit-learn is the Theil-Sen regression. It is a non-parametric regression method, which means that it makes no assumption about the underlying data distribution. In short, it involves fitting multiple regression models on subsets of the training data and then aggregating the coefficients at the last step.

coefficient, outlier, regression, (10 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

arXiv.org Machine LearningAug-31-2022

Bayesian Variable Selection in a Million Dimensions

Jankowiak, Martin

Bayesian variable selection is a powerful tool for data analysis, as it offers a principled method for variable selection that accounts for prior information and uncertainty. However, wider adoption of Bayesian variable selection has been hampered by computational challenges, especially in difficult regimes with a large number of covariates P or non-conjugate likelihoods. To scale to the large P regime we introduce an efficient MCMC scheme whose cost per iteration is sublinear in P. In addition we show how this scheme can be extended to generalized linear models for count data, which are prevalent in biology, ecology, economics, and beyond. In particular we design efficient algorithms for variable selection in binomial and negative binomial regression, which includes logistic regression as a special case. In experiments we demonstrate the effectiveness of our methods, including on cancer and maize genomic data.

artificial intelligence, covariate, machine learning, (14 more...)

arXiv.org Machine Learning

2208.0118

Country:

North America > United States > Arizona (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.66)

Industry:

Health & Medicine > Therapeutic Area (0.68)
Health & Medicine > Pharmaceuticals & Biotechnology (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Arias-Castro, Ery, Qiao, Wanli

Embedding Functional Data: Multidimensional Scaling and Manifold Learning

We adapt concepts, methodology, and theory originally developed in the areas of multidimensional scaling and dimensionality reduction for multivariate data to the functional setting. We focus on classical scaling and Isomap -- prototypical methods that have played important roles in these area -- and showcase their use in the context of functional data analysis. In the process, we highlight the crucial role that the ambient metric plays.

divergence, metric, proposition 2, (16 more...)

2208.1454

Country:

North America > United States > Virginia > Fairfax County > Fairfax (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(3 more...)

Genre: Research Report (0.81)

Industry:

Education (0.64)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)
Health & Medicine > Diagnostic Medicine > Imaging (0.45)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

Rodrigues, É. O., Pinheiro, V. H. A., Liatsis, P., Conci, A.

Machine learning in the prediction of cardiac epicardial and mediastinal fat volumes

We propose a methodology to predict the cardiac epicardial and mediastinal fat volumes in computed tomography images using regression algorithms. The obtained results indicate that it is feasible to predict these fats with a high degree of correlation, thus alleviating the requirement for manual or automatic segmentation of both fat volumes. Instead, segmenting just one of them suffices, while the volume of the other may be predicted fairly precisely. The correlation coefficient obtained by the Rotation Forest algorithm using MLP Regressor for predicting the mediastinal fat based on the epicardial fat was 0.9876, with a relative absolute error of 14.4% and a root relative squared error of 15.7%. The best correlation coefficient obtained in the prediction of the epicardial fat based on the mediastinal was 0.9683 with a relative absolute error of 19.6% and a relative squared error of 24.9%. Moreover, we analysed the feasibility of using linear regressors, which provide an intuitive interpretation of the underlying approximations. In this case, the obtained correlation coefficient was 0.9534 for predicting the mediastinal fat based on the epicardial, with a relative absolute error of 31.6% and a root relative squared error of 30.1%. On the prediction of the epicardial fat based on the mediastinal fat, the correlation coefficient was 0.8531, with a relative absolute error of 50.43% and a root relative squared error of 52.06%. In summary, it is possible to speed up general medical analyses and some segmentation and quantification methods that are currently employed in the state-of-the-art by using this prediction approach, which consequently reduces costs and therefore enables preventive treatments that may lead to a reduction of health problems.

algorithm, mediastinal fat, pixel, (15 more...)

doi: 10.1016/j.compbiomed.2017.02.010

2208.14374

Country:

South America > Brazil > Santa Catarina > Florianópolis (0.04)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
South America > Brazil > Rio de Janeiro > Niterói (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.52)

Saha, Rajarshi, Pilanci, Mert, Goldsmith, Andrea J.

Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms

High-dimensional models often have a large memory footprint and must be quantized after training before being deployed on resource-constrained edge devices for inference tasks. In this work, we develop an information-theoretic framework for the problem of quantizing a linear regressor learned from training data $(\mathbf{X}, \mathbf{y})$, for some underlying statistical relationship $\mathbf{y} = \mathbf{X}\boldsymbol{\theta} + \mathbf{v}$. The learned model, which is an estimate of the latent parameter $\boldsymbol{\theta} \in \mathbb{R}^d$, is constrained to be representable using only $Bd$ bits, where $B \in (0, \infty)$ is a pre-specified budget and $d$ is the dimension. We derive an information-theoretic lower bound for the minimax risk under this setting and propose a matching upper bound using randomized embedding-based algorithms which is tight up to constant factors. The lower and upper bounds together characterize the minimum threshold bit-budget required to achieve a performance risk comparable to the unquantized setting. We also propose randomized Hadamard embeddings that are computationally efficient and are optimal up to a mild logarithmic factor of the lower bound. Our model quantization strategy can be generalized and we show its efficacy by extending the method and upper-bounds to two-layer ReLU neural networks for non-linear regression. Numerical simulations show the improved performance of our proposed scheme as well as its closeness to the lower bound.

matrix, minimax optimal quantization, quantization, (13 more...)

2202.11277

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(4 more...)

Genre: Research Report > New Finding (0.48)

Industry: Telecommunications (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.87)

Online Active Regression

Chen, Cheng, Li, Yi, Sun, Yiming

Active regression considers a linear regression problem where the learner receives a large number of data points but can only observe a small number of labels. Since online algorithms can deal with incremental training data and take advantage of low computational cost, we consider an online extension of the active regression problem: the learner receives data points one by one and immediately decides whether it should collect the corresponding labels. The goal is to efficiently maintain the regression of received data points with a small budget of label queries. We propose novel algorithms for this problem under $\ell_p$ loss where $p\in[1,2]$. To achieve a $(1+\epsilon)$-approximate solution, our proposed algorithms only require $\tilde{\mathcal{O}}(\epsilon^{-1} d \log(n\kappa))$ queries of labels, where $n$ is the number of data points and $\kappa$ is a quantity, called the condition number, of the data points. The numerical results verify our theoretical results and show that our methods have comparable performance with offline active regression algorithms.

algorithm, lewis weight, probability, (15 more...)

2207.05945

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > North Carolina > Durham County > Durham (0.04)
(5 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.55)

Discriminative Learning of Similarity and Group Equivariant Representations

Trivedi, Shubhendu

artificial intelligence, metric learning problem, neural information processing system, (15 more...)

One of the most fundamental problems in machine learning is to compare examples: Given a pair of objects we want to return a value which indicates degree of (dis)similarity. Similarity is often task specific, and pre-defined distances can perform poorly, leading to work in metric learning. However, being able to learn a similarity-sensitive distance function also presupposes access to a rich, discriminative representation for the objects at hand. In this dissertation we present contributions towards both ends. In the first part of the thesis, assuming good representations for the data, we present a formulation for metric learning that makes a more direct attempt to optimize for the k-NN accuracy as compared to prior work. We also present extensions of this formulation to metric learning for kNN regression, asymmetric similarity learning and discriminative learning of Hamming distance. In the second part, we consider a situation where we are on a limited computational budget i.e. optimizing over a space of possible metrics would be infeasible, but access to a label aware distance metric is still desirable. We present a simple, and computationally inexpensive approach for estimating a well motivated metric that relies only on gradient estimates, discussing theoretical and experimental results. In the final part, we address representational issues, considering group equivariant convolutional neural networks (GCNNs). Equivariance to symmetry transformations is explicitly encoded in GCNNs; a classical CNN being the simplest example. In particular, we present a SO(3)-equivariant neural network architecture for spherical data, that operates entirely in Fourier space, while also providing a formalism for the design of fully Fourier neural networks that are equivariant to the action of any continuous compact group.

1808.10078

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(12 more...)

Genre:

Research Report > New Finding (1.00)
Instructional Material > Course Syllabus & Notes (1.00)
Summary/Review (0.87)

Industry:

Health & Medicine (0.67)
Education > Educational Setting > Higher Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)