AITopics

2202.02491

Country:

North America > United States > Pennsylvania > Northampton County > Bethlehem (0.04)
North America > United States > New York (0.04)
North America > United States > Maryland > Prince George's County > Adelphi (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Government > Military > Army (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Rajbahadur, Gopi Krishnan, Wang, Shaowei, Kamei, Yasutaka, Hassan, Ahmed E.

The impact of feature importance methods on the interpretation of defect classifiers

arXiv.org Artificial Intelligence, Feb-4-2022

Abstract: Classifier specific (CS) and classifier agnostic (CA) feature importance methods are widely used (often interchangeably) by prior studies to derive feature importance ranks from a defect classifier. However, different feature importance methods are likely to compute different feature importance ranks even for the same dataset and classifier. Hence such interchangeable use of feature importance methods can lead to conclusion instabilities unless there is a strong agreement among different methods. Therefore, in this paper, we evaluate the agreement between the feature importance ranks associated with the studied classifiers through a case study of 18 software projects and six commonly used classifiers. We find that the feature importance ranks computed by CA and CS methods do not always strongly agree with each other. Such findings raise concerns about the stability of conclusions across replicated studies. We further observe that the commonly used defect datasets are rife with feature interactions and that these feature interactions impact the computed feature importance ranks of the CS methods (not the CA methods). We demonstrate that removing these feature interactions, even with simple methods like CFS, improves agreement between the computed feature importance ranks of CA and CS methods. In light of our findings, we provide guidelines for stakeholders and practitioners when performing model interpretation and directions for future research, e.g., future research is needed to investigate the impact of advanced feature interaction removal methods on computed feature importance ranks of different CS methods.

Defect classifiers are widely used by many large software corporations [...]. Defect classifiers are commonly interpreted to uncover insights to improve software quality. Therefore it is pivotal that these generated insights are reliable. [...] a feature importance method to compute a ranking of feature [...] feature importance ranks reflect the order in which the studied features contribute to the predictive capability of the studied classifier [14].

We note, however, that a CS method is not always readily available for a given classifier. [...] and deep neural networks do not have a widely accepted CS [...] the feature importance ranks of different classifiers is [...]. Such CA methods measure the contribution of each feature towards a classifier's predictions. These measure the contribution of each feature by effecting changes to that particular feature in the dataset and observing its impact on the outcome. The primary advantage of CA methods is that they [...]
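The contrast between the two families of methods can be illustrated with a toy sketch. Everything in it is an assumption made for illustration rather than part of the study: the "classifier" is a hypothetical linear scorer with hand-picked weights, the absolute weight stands in for a CS method, and permutation importance (shuffle one feature, observe the change in accuracy) stands in for a CA method. Agreement between the two rankings is measured with the Spearman rank correlation.

```python
import random

def predict(weights, row):
    """Hypothetical 'trained' linear classifier: class 1 if the weighted sum is positive."""
    return 1 if sum(w * x for w, x in zip(weights, row)) > 0 else 0

def accuracy(weights, X, y):
    return sum(predict(weights, r) == t for r, t in zip(X, y)) / len(y)

def permutation_importance(weights, X, y, rng):
    """CA-style importance: accuracy drop after shuffling one feature column."""
    base = accuracy(weights, X, y)
    drops = []
    for j in range(len(weights)):
        col = [r[j] for r in X]
        rng.shuffle(col)
        X_perm = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, col)]
        drops.append(base - accuracy(weights, X_perm, y))
    return drops

def ranks(scores):
    """Rank 1 = most important; ties are broken by feature index."""
    order = sorted(range(len(scores)), key=lambda j: (-scores[j], j))
    out = [0] * len(scores)
    for rank, j in enumerate(order, start=1):
        out[j] = rank
    return out

def spearman(r1, r2):
    """Spearman rank correlation for two tie-free rankings of equal length."""
    n = len(r1)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * d2 / (n * (n * n - 1))

rng = random.Random(0)
weights = [3.0, -1.5, 0.5, 0.0]  # assumed coefficients, not taken from the paper
X = [[rng.gauss(0, 1) for _ in weights] for _ in range(2000)]
y = [predict(weights, r) for r in X]  # labels produced by the same scorer

cs_ranks = ranks([abs(w) for w in weights])                    # classifier specific
ca_ranks = ranks(permutation_importance(weights, X, y, rng))   # classifier agnostic
agreement = spearman(cs_ranks, ca_ranks)
print("CS ranks:", cs_ranks, "CA ranks:", ca_ranks, "Spearman:", agreement)
```

The features in this toy setup are independent and on the same scale, so it says nothing about how the two rankings behave on real defect datasets with feature interactions; it only shows how such an agreement check can be wired together.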

classifier, feature importance rank, importance rank, (14 more...)

doi: 10.1109/TSE.2021.3056941

2202.02389

Country:

Asia > Japan > Kyūshū & Okinawa > Kyūshū (0.04)
North America > Canada > Manitoba (0.04)
South America > Brazil > São Paulo (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology (0.92)
Education (0.67)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
(4 more...)

Miao, Guanhong, Ding, A. Adam, Wu, Samuel S.

Linear Model with Local Differential Privacy

arXiv.org Machine Learning, Feb-4-2022

Scientific collaborations benefit from collaborative learning of distributed sources, but remain difficult to achieve when data are sensitive. In recent years, privacy preserving techniques have been widely studied to analyze distributed data across different agencies while protecting sensitive information. Secure multiparty computation has been widely studied for privacy protection, offering a high privacy level at an intense computation cost. Other security techniques sacrifice partial data utility to reduce disclosure risk. A major challenge is to balance data utility and disclosure risk while maintaining high computation efficiency. In this paper, a matrix masking technique is applied to encrypt data such that the schemes are secure against malicious adversaries while achieving local differential privacy. The proposed schemes are designed for linear models and can be implemented for both vertical and horizontal partitioning scenarios. Moreover, cross validation is studied to prevent overfitting and select optimal parameters without additional communication cost. Simulation results demonstrate the efficiency of the proposed schemes in analyzing datasets with millions of records and high-dimensional data (n << p).

matrix, privacy, regression, (16 more...)

2202.02448

Country:

North America > United States > Florida > Alachua County > Gainesville (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(4 more...)

Genre: Research Report > Experimental Study (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.70)

Chamroukhi, Faïcel, Pham, Nhat Thien, Hoang, Van Hà, McLachlan, Geoffrey J.

Functional Mixtures-of-Experts

arXiv.org Machine Learning, Feb-4-2022

We consider the statistical analysis of heterogeneous data for clustering and prediction purposes, in situations where the observations include functions, typically time series. We extend the modeling with Mixtures-of-Experts (ME), as a framework of choice in modeling heterogeneity in data for prediction and clustering with vectorial observations, to this functional data analysis context. We first present a new family of functional ME (FME) models, in which the predictors are potentially noisy observations, from entire functions, and the data generating process of the pair predictor and the real response, is governed by a hidden discrete variable representing an unknown partition, leading to complex situations to which the standard ME framework is not adapted. Second, we provide sparse and interpretable functional representations of the FME models, thanks to Lasso-like regularizations, notably on the derivatives of the underlying functional parameters of the model, projected onto a set of continuous basis functions. We develop dedicated expectation-maximization algorithms for Lasso-like regularized maximum-likelihood parameter estimation strategies, to encourage sparse and interpretable solutions. The proposed FME models and the developed EM-Lasso algorithms are studied in simulated scenarios and in applications to two real data sets, and the obtained results demonstrate their performance in accurately capturing complex nonlinear relationships between the response and the functional predictor, and in clustering.

algorithm, ifme model, network parameter, (14 more...)

2202.02249

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
Oceania > Australia > Queensland > Brisbane (0.04)
(3 more...)

Genre: Research Report > New Finding (0.88)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)

Chen, You-Lin, Minorics, Lenon, Janzing, Dominik

Correcting Confounding via Random Selection of Background Variables

arXiv.org Machine Learning, Feb-4-2022

We propose a method to distinguish causal influence from hidden confounding in the following scenario: given a target variable Y, potential causal drivers X, and a large number of background features, we propose a novel criterion for identifying causal relationships based on the stability of regression coefficients of X on Y with respect to selecting different background features. To this end, we propose a statistic V measuring the coefficient's variability. We prove, subject to a symmetry assumption for the background influence, that V converges to zero if and only if X contains no causal drivers. In experiments with simulated data, the method outperforms state-of-the-art algorithms. Further, we report encouraging results for real-world data. Our approach aligns with the general belief that causal insights admit better generalization of statistical associations across environments, and justifies similar existing heuristic approaches from the literature.

assumption, coefficient, regression coefficient, (14 more...)

2202.0215

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York (0.04)
North America > United States > California (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Education (0.93)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

#artificialintelligence, Feb-3-2022, 11:01:07 GMT

Implementing Logistic Regression from Scratch using Python

This article was published as a part of the Data Science Blogathon. For this article, we will be using sklearn's make_classification dataset with four features. This is the vectorised form of the gradient descent expression, which we will be using in our code. Now that we are done with every part, we will put everything together in a single class. You can fiddle around with the hyper-parameters and see the behaviour of the cost function. Now, let's see how our logistic regression fares in comparison to sklearn's logistic regression.
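The excerpt does not reproduce the article's code, so the following is only a minimal sketch of the idea it describes: batch gradient descent on the mean log-loss, where the gradient is X^T (sigmoid(Xw + b) - y) / m. To stay dependency-free it uses plain lists and a synthetic four-feature dataset in place of sklearn's make_classification; the names `fit_logistic`, `true_w`, and the learning rate and epoch count are assumptions chosen for illustration.

```python
import math
import random

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    m, n = len(X), len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        # errors = sigmoid(Xw + b) - y, one entry per sample
        errors = [sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - t
                  for row, t in zip(X, y)]
        # gradient of the mean log-loss: X^T errors / m
        grad_w = [sum(e * row[j] for e, row in zip(errors, X)) / m for j in range(n)]
        grad_b = sum(errors) / m
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    """Class 1 when the predicted probability is at least 0.5."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) >= 0.5 else 0
            for row in X]

# Synthetic stand-in for a four-feature classification dataset (assumed setup).
rng = random.Random(42)
true_w = [2.0, -1.0, 0.5, 0.0]
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(400)]
y = [1 if sum(tw * x for tw, x in zip(true_w, row)) + rng.gauss(0, 0.5) > 0 else 0
     for row in X]

w, b = fit_logistic(X, y)
acc = sum(p == t for p, t in zip(predict(X, w, b), y)) / len(y)
print("weights:", [round(v, 2) for v in w], "bias:", round(b, 2), "train accuracy:", acc)
```

With NumPy the two list comprehensions inside the loop collapse into matrix expressions, which is what "vectorised" refers to; changing `lr` or `epochs` is the simplest way to see how the hyper-parameters affect convergence.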

cost function, implementing logistic regression, python, (1 more...)

#artificialintelligence

Genre:

Research Report > New Finding (0.94)
Research Report > Experimental Study (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.94)

arXiv.org Artificial Intelligence, Feb-3-2022

Review of automated time series forecasting pipelines

Meisenbacher, Stefan, Turowski, Marian, Phipps, Kaleb, Rätz, Martin, Müller, Dirk, Hagenmeyer, Veit, Mikut, Ralf

Time series forecasting is fundamental for various use cases in different domains such as energy systems and economics. Creating a forecasting model for a specific use case requires an iterative and complex design process. The typical design process includes the five sections (1) data pre-processing, (2) feature engineering, (3) hyperparameter optimization, (4) forecasting method selection, and (5) forecast ensembling, which are commonly organized in a pipeline structure. One promising approach to handle the ever-growing demand for time series forecasts is automating this design process. The present paper, thus, analyzes the existing literature on automated time series forecasting pipelines to investigate how to automate the design process of forecasting models. Thereby, we consider both Automated Machine Learning (AutoML) and automated statistical forecasting methods in a single forecasting pipeline. For this purpose, we firstly present and compare the proposed automation methods for each pipeline section. Secondly, we analyze the automation methods regarding their interaction, combination, and coverage of the five pipeline sections. For both, we discuss the literature, identify problems, give recommendations, and suggest future research. This review reveals that the majority of papers only cover two or three of the five pipeline sections. We conclude that future research has to holistically consider the automation of the forecasting pipeline to enable the large-scale application of time series forecasting.

forecasting method, forecasting pipeline, time series, (12 more...)

doi: 10.1002/widm.1475

2202.01712

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)
South America > Uruguay > Maldonado > Maldonado (0.04)
(18 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Health & Medicine (0.93)
Energy > Power Industry (0.46)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
(4 more...)

Saengkyongam, Sorawit, Henckel, Leonard, Pfister, Niklas, Peters, Jonas

Exploiting Independent Instruments: Identification and Distribution Generalization

arXiv.org Machine Learning, Feb-3-2022

When estimating the causal function between a vector of covariates X and a response Y in the presence of unobserved confounding, standard regression procedures such as ordinary least squares (OLS) are biased, even asymptotically. Instrumental variable approaches (Wright, 1928; Imbens and Angrist, 1994; Newey, 2013) exploit the existence of exogenous heterogeneity in the form of an instrumental variable (IV) Z and estimate, under suitable conditions, the causal function consistently. Importantly, the errors in Y and the hidden confounders U should be uncorrelated with the instruments Z. Usually, this has to be argued for with background knowledge. When the data generating process is modeled by a structural causal model (SCM) (Pearl, 2009; Bongers et al., 2021) (so that the distribution is Markov with respect to the induced graph), then the above condition is satisfied if Y and U are d-separated from Z in the graph obtained by removing the edge from X to Y. Furthermore, in this case the errors in Y and U are even independent of Z. Using the fact that the errors and instruments are not only uncorrelated but also independent comes with several benefits. For example, even in settings where the causal function can be identified by classical approaches based on uncorrelatedness, the independence can be exploited to construct estimators that achieve the semiparametric efficiency bound, at least when the error distribution comes from a known, parametric family (Hansen et al., 2010). Furthermore, the independence constraint is stronger than uncorrelatedness and therefore yields stronger identifiability results, which has been reported in the field of econometrics (e.g., Imbens and Newey, 2009; Chesher, 2003).
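The bias of OLS under hidden confounding, and the way a classical instrument removes it, can be seen in a small simulation. This is a sketch under assumed conditions and not the independence-based estimator the paper develops: the causal function is linear with a hypothetical slope `beta`, there is a single scalar instrument, and the IV estimate is the simple ratio cov(Z, Y) / cov(Z, X), which relies only on uncorrelatedness.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance (divides by n, which cancels in the ratios below)."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

rng = random.Random(1)
n = 20000
beta = 1.5  # assumed true causal effect of X on Y

Z = [rng.gauss(0, 1) for _ in range(n)]  # instrument, independent of U
U = [rng.gauss(0, 1) for _ in range(n)]  # hidden confounder
X = [z + u + rng.gauss(0, 1) for z, u in zip(Z, U)]
Y = [beta * x + 2.0 * u + rng.gauss(0, 1) for x, u in zip(X, U)]

# OLS slope: absorbs the confounding path X <- U -> Y.
ols = cov(X, Y) / cov(X, X)

# Ratio (Wald-type) IV estimator: uses only the variation in X driven by Z.
iv = cov(Z, Y) / cov(Z, X)

print("true beta:", beta, "OLS:", round(ols, 3), "IV:", round(iv, 3))
```

In this setup the population OLS slope is beta + 2 cov(X, U) / var(X) = 1.5 + 2/3, so the OLS estimate stays away from 1.5 no matter how large `n` is, while the IV ratio converges to `beta`.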

causal function, estimator, independence condition, (14 more...)

2202.01864

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Malden (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Ayme, Alexis, Boyer, Claire, Dieuleveut, Aymeric, Scornet, Erwan

Minimax rate of consistency for linear models with missing values

arXiv.org Machine Learning, Feb-3-2022

Missing values become increasingly common as the size of datasets increases. These missing values can occur for a variety of reasons, such as sensor failures, refusals to answer poll questions, or aggregations of data coming from different sources (with different methods of data collection). There may be different processes of missing value generation on the same dataset, which makes the task of data cleaning difficult or impossible without creating large biases. In his leading work, Rubin [1976] distinguishes three missing-value scenarios: Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR), depending on the links between the observed variables, the missing ones, and the missing pattern. In the linear regression framework, most of the literature focuses on parameter estimation [Little, 1992, Jones, 1996], sometimes using a sparse prior leading to the Lasso estimator [Loh and Wainwright, 2012] or the Dantzig selector [Rosenbaum and Tsybakov, 2010]. Note that the robust estimation literature [Dalalyan and Thompson, 2019, Chen and Caramanis, 2013] could also be used to handle missing values, as the latter can be reinterpreted as a multiplicative noise in linear models.

assumption, predictor, theorem 3, (16 more...)

2202.01463

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > California (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Emmanuel, Ibe Chukwuemeka, Mitrofanova, Ekaterina

Fairness of Machine Learning Algorithms in Demography

arXiv.org Artificial Intelligence, Feb-2-2022

The paper is devoted to the study of model fairness and process fairness on a Russian demographic dataset by making predictions of divorce of the 1st marriage, religiosity, 1st employment and completion of education. Our goal was to make classifiers more equitable by reducing their reliance on sensitive features while increasing or at least maintaining their accuracy. We took inspiration from "dropout" techniques in neural-based approaches and suggested a model that uses "feature drop-out" to address process fairness. To evaluate a classifier's fairness and decide which sensitive features to eliminate, we used "LIME Explanations". Feature dropout results in a pool of classifiers whose ensemble has been shown to be less reliant on sensitive features and to have improved or unchanged accuracy. Our empirical study was performed on four families of classifiers (Logistic Regression, Random Forest, Bagging, and Adaboost) and carried out on a real-life dataset (Russian demographic data derived from the Generations and Gender Survey), and it showed that all of the models became less dependent on sensitive features (such as gender, breakup of the 1st partnership, 1st partnership, etc.) and showed improvements or no impact on accuracy.

classifier, fairness, sensitive feature, (13 more...)

2202.01013

Country:

Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Asia > Russia (0.04)
Oceania > Australia > Western Australia > Perth (0.04)
(5 more...)

Genre: Research Report > New Finding (0.48)

Industry: Law > Civil Rights & Constitutional Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)