AITopics

2111.05628

Country:

Europe > United Kingdom > England (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > United Kingdom > Scotland (0.04)
(2 more...)

Genre:

Research Report (0.50)
Overview (0.48)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

arXiv.org Machine LearningNov-9-2021

Generalized Kernel Ridge Regression for Causal Inference with Missing-at-Random Sample Selection

Singh, Rahul

I propose kernel ridge regression estimators for nonparametric dose response curves and semiparametric treatment effects in the setting where an analyst has access to a selected sample rather than a random sample; only for select observations, the outcome is observed. I assume selection is as good as random conditional on treatment and a sufficiently rich set of observed covariates, where the covariates are allowed to cause treatment or be caused by treatment -- an extension of missingness-at-random (MAR). I propose estimators of means, increments, and distributions of counterfactual outcomes with closed form solutions in terms of kernel matrix operations, allowing treatment and covariates to be discrete or continuous, and low, high, or infinite dimensional. For the continuous treatment case, I prove uniform consistency with finite sample rates. For the discrete treatment case, I prove root-n consistency, Gaussian approximation, and semiparametric efficiency.

counterfactual distribution, sample selection, selection, (14 more...)

2111.05277

Country:

Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

arXiv.org Artificial IntelligenceNov-9-2021

Internationalizing AI: Evolution and Impact of Distance Factors

Tang, Xuli, Li, Xin, Ma, Feicheng

International collaboration has become imperative in the field of AI. However, few studies exist concerning how distance factors have affected the international collaboration in AI research. In this study, we investigate this problem by using 1,294,644 AI related collaborative papers harvested from the Microsoft Academic Graph (MAG) dataset. A framework including 13 indicators to quantify the distance factors between countries from 5 perspectives (i.e., geographic distance, economic distance, cultural distance, academic distance, and industrial distance) is proposed. The relationships were conducted by the methods of descriptive analysis and regression analysis. The results show that international collaboration in the field of AI today is not prevalent (only 15.7%). All the separations in international collaborations have increased over years, except for the cultural distance in masculinity/felinity dimension and the industrial distance. The geographic distance, economic distance and academic distances have shown significantly negative relationships with the degree of international collaborations in the field of AI. The industrial distance has a significant positive relationship with the degree of international collaboration in the field of AI. Also, the results demonstrate that the participation of the United States and China have promoted the international collaboration in the field of AI. This study provides a comprehensive understanding of internationalizing AI research in geographic, economic, cultural, academic, and industrial aspects.

collaboration, cultural distance, international collaboration, (17 more...)

2112.01231

Country:

Asia > Japan (0.05)
Asia > China > Hubei Province > Wuhan (0.05)
Europe > United Kingdom (0.04)
(13 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Banking & Finance (1.00)
Education (0.67)
Health & Medicine > Epidemiology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)

Teuling, Niek Den, Pauws, Steffen, Heuvel, Edwin van den

Clustering of longitudinal data: A tutorial on a variety of approaches

arXiv.org Machine LearningNov-9-2021

During the past two decades, methods for identifying groups with different trends in longitudinal data have become of increasing interest across many areas of research. To support researchers, we summarize the guidance from the literature regarding longitudinal clustering. Moreover, we present a selection of methods for longitudinal clustering, including group-based trajectory modeling (GBTM), growth mixture modeling (GMM), and longitudinal k-means (KML). The methods are introduced at a basic level, and strengths, limitations, and model extensions are listed. Following the recent developments in data collection, attention is given to the applicability of these methods to intensive longitudinal data (ILD). We demonstrate the application of the methods on a synthetic dataset using packages available in R.

longitudinal data, modeling, trajectory, (16 more...)

2111.05469

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > Austria > Vienna (0.14)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
(11 more...)

Genre:

Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Consumer Health (1.00)
Health & Medicine > Epidemiology (0.67)
(2 more...)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
(3 more...)

Musco, Cameron, Musco, Christopher, Woodruff, David P., Yasuda, Taisuke

Active Sampling for Linear Regression Beyond the $\ell_2$ Norm

arXiv.org Machine LearningNov-8-2021

We study active sampling algorithms for linear regression, which aim to query only a small number of entries of a target vector $b\in\mathbb{R}^n$ and output a near minimizer to $\min_{x\in\mathbb{R}^d}\|Ax-b\|$, where $A\in\mathbb{R}^{n \times d}$ is a design matrix and $\|\cdot\|$ is some loss function. For $\ell_p$ norm regression for any $0

algorithm, probability, sensitivity, (17 more...)

2111.04888

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(12 more...)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

arXiv.org Artificial IntelligenceNov-7-2021

Modelling and Optimisation of Resource Usage in an IoT Enabled Smart Campus

Sutjarittham, Thanchanok

University campuses are essentially a microcosm of a city. They comprise diverse facilities such as residences, sport centres, lecture theatres, parking spaces, and public transport stops. Universities are under constant pressure to improve efficiencies while offering a better experience to various stakeholders including students, staff, and visitors. Nonetheless, anecdotal evidence indicates that campus assets are not being utilised efficiently, often due to the lack of data collection and analysis, thereby limiting the ability to make informed decisions on the allocation and management of resources. Advances in the Internet of Things (IoT) technologies that can sense and communicate data from the physical world, coupled with data analytics and Artificial intelligence (AI) that can predict usage patterns, have opened up new opportunities for organisations to lower cost and improve user experience. This thesis explores this opportunity via theory and experimentation using UNSW Sydney as a living laboratory.

arrival departure figure 4, real-time information, ultrasonic distance sensor, (15 more...)

2111.04085

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > Texas (0.13)
(34 more...)

Genre:

Research Report > New Finding (1.00)
Instructional Material > Course Syllabus & Notes (1.00)
Overview (0.93)

Industry:

Transportation > Passenger (1.00)
Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
(5 more...)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Modeling & Simulation (1.00)
(15 more...)

Awais, Ch Muhammad, Bekkouch, Imad Eddine Ibrahim

What augmentations are sensitive to hyper-parameters and why?

arXiv.org Artificial IntelligenceNov-6-2021

We apply augmentations to our dataset to enhance the quality of our predictions and make our final models more resilient to noisy data and domain drifts. Yet the question remains, how are these augmentations going to perform with different hyper-parameters? In this study we evaluate the sensitivity of augmentations with regards to the model's hyper parameters along with their consistency and influence by performing a Local Surrogate (LIME) interpretation on the impact of hyper-parameters when different augmentations are applied to a machine learning model. We have utilized Linear regression coefficients for weighing each augmentation. Our research has proved that there are some augmentations which are highly sensitive to hyper-parameters and others which are more resilient and reliable.

augmentation, dataset, khan, (16 more...)

2111.03861

Country:

Europe > United Kingdom > England > Tyne and Wear > Newcastle (0.04)
Europe > Russia (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
Asia > Russia (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.57)

arXiv.org Machine LearningNov-4-2021

Consistent Estimation for PCA and Sparse Regression with Oblivious Outliers

d'Orsi, Tommaso, Liu, Chih-Hung, Nasser, Rajai, Novikov, Gleb, Steurer, David, Tiegel, Stefan

We develop machinery to design efficiently computable and consistent estimators, achieving estimation error approaching zero as the number of observations grows, when facing an oblivious adversary that may corrupt responses in all but an $\alpha$ fraction of the samples. As concrete examples, we investigate two problems: sparse regression and principal component analysis (PCA). For sparse regression, we achieve consistency for optimal sample size $n\gtrsim (k\log d)/\alpha^2$ and optimal error rate $O(\sqrt{(k\log d)/(n\cdot \alpha^2)})$ where $n$ is the number of observations, $d$ is the number of dimensions and $k$ is the sparsity of the parameter vector, allowing the fraction of inliers to be inverse-polynomial in the number of samples. Prior to this work, no estimator was known to be consistent when the fraction of inliers $\alpha$ is $o(1/\log \log n)$, even for (non-spherical) Gaussian design matrices. Results holding under weak design assumptions and in the presence of such general noise have only been shown in dense setting (i.e., general linear regression) very recently by d'Orsi et al. [dNS21]. In the context of PCA, we attain optimal error guarantees under broad spikiness assumptions on the parameter matrix (usually used in matrix completion). Previous works could obtain non-trivial guarantees only under the assumptions that the measurement noise corresponding to the inliers is polynomially small in $n$ (e.g., Gaussian with variance $1/n^2$). To devise our estimators, we equip the Huber loss with non-smooth regularizers such as the $\ell_1$ norm or the nuclear norm, and extend d'Orsi et al.'s approach [dNS21] in a novel way to analyze the loss function. Our machinery appears to be easily applicable to a wide range of estimation problems.

estimator, matrix, theorem 2, (15 more...)

2111.02966

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Arizona > Maricopa County > Phoenix (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)

arXiv.org Machine LearningNov-3-2021

Why Stable Learning Works? A Theory of Covariate Shift Generalization

Xu, Renzhe, Cui, Peng, Shen, Zheyan, Zhang, Xingxuan, Zhang, Tong

Covariate shift generalization, a typical case in out-of-distribution (OOD) generalization, requires a good performance on the unknown testing distribution, which varies from the accessible training distribution in the form of covariate shift. Recently, stable learning algorithms have shown empirical effectiveness to deal with covariate shift generalization on several learning models involving regression algorithms and deep neural networks. However, the theoretical explanations for such effectiveness are still missing. In this paper, we take a step further towards the theoretical analysis of stable learning algorithms by explaining them as feature selection processes. We first specify a set of variables, named minimal stable variable set, that is minimal and optimal to deal with covariate shift generalization for common loss functions, including the mean squared loss and binary cross entropy loss. Then we prove that under ideal conditions, stable learning algorithms could identify the variables in this set. Further analysis on asymptotic properties and error propagation are also provided. These theories shed light on why stable learning works for covariate shift generalization.

markov boundary, optimal predictor, stable learning algorithm, (10 more...)

2111.02355

Country:

Asia > China > Hong Kong (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Chowdhury, Shovan, Lin, Yuxiao, Liaw, Boryann, Kerby, Leslie

Evaluation of Tree Based Regression over Multiple Linear Regression for Non-normally Distributed Data in Battery Performance

arXiv.org Artificial IntelligenceNov-3-2021

Battery performance datasets are typically non-normal and multicollinear. Extrapolating such datasets for model predictions needs attention to such characteristics. This study explores the impact of data normality in building machine learning models. In this work, tree-based regression models and multiple linear regressions models are each built from a highly skewed non-normal dataset with multicollinearity and compared. Several techniques are necessary, such as data transformation, to achieve a good multiple linear regression model with this dataset; the most useful techniques are discussed. With these techniques, the best multiple linear regression model achieved an R^2 = 81.23% and exhibited no multicollinearity effect for the dataset used in this study. Tree-based models perform better on this dataset, as they are non-parametric, capable of handling complex relationships among variables and not affected by multicollinearity. We show that bagging, in the use of Random Forests, reduces overfitting. Our best tree-based model achieved accuracy of R^2 = 97.73%. This study explains why tree-based regressions promise as a machine learning model for non-normally distributed, multicollinear data.

dataset, regression, transformation, (16 more...)

2111.02513

Country:

North America > United States > Idaho > Bonneville County > Idaho Falls (0.04)
North America > United States > Idaho > Bannock County > Pocatello (0.04)
North America > United States > Michigan > Wayne County > Livonia (0.04)
Asia > India > NCT > New Delhi (0.04)

Genre: Research Report > New Finding (0.88)

Industry:

Energy > Energy Storage (0.84)
Government > Regional Government > North America Government > United States Government (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)