AITopics

2209.13011

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.83)

Industry:

Media (0.49)
Leisure & Entertainment (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

arXiv.org Artificial Intelligence · Sep-16-2022

Truthful Generalized Linear Models

Qiu, Yuan, Liu, Jinyan, Wang, Di

In this paper we study the estimation of Generalized Linear Models (GLMs) in the case where the agents (individuals) are strategic or self-interested and are concerned about their privacy when reporting data. Compared with the classical setting, here we aim to design mechanisms that both incentivize most agents to truthfully report their data and preserve the privacy of individuals' reports, while their outputs should also be close to the underlying parameter. In the first part of the paper, we consider the case where the covariates are sub-Gaussian and the responses are heavy-tailed, having only finite fourth moments. First, motivated by the stationary condition of the maximizer of the likelihood function, we derive a novel private and closed-form estimator. Based on the estimator, we propose a mechanism which, via an appropriate design of the computation and payment scheme, has the following properties for several canonical models such as linear regression, logistic regression and Poisson regression: (1) the mechanism is $o(1)$-jointly differentially private (with probability at least $1-o(1)$); (2) it is an $o(\frac{1}{n})$-approximate Bayes Nash equilibrium for a $(1-o(1))$-fraction of agents to truthfully report their data, where $n$ is the number of agents; (3) the output achieves an error of $o(1)$ with respect to the underlying parameter; (4) it is individually rational for a $(1-o(1))$-fraction of agents in the mechanism; (5) the payment budget required from the analyst to run the mechanism is $o(1)$. In the second part, we consider the linear regression model under a more general setting where both covariates and responses are heavy-tailed and have only finite fourth moments. By using an $\ell_4$-norm shrinkage operator, we propose a private estimator and payment scheme which have properties similar to those in the sub-Gaussian case.

agent, artificial intelligence, machine learning, (18 more...)

2209.07815

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

#artificialintelligence · Sep-15-2022, 18:56:09 GMT

INTRODUCTION TO SUPERVISED LEARNING

Machine learning is a set of tools that, broadly speaking, allow us to "teach" computers how to perform tasks by providing examples of how they should be done. For example, suppose we wish to write a program to distinguish between valid email messages and unwanted spam. We could try to write a set of simple rules, for example, flagging messages that contain certain features (such as the word "viagra" or obviously-fake headers). However, writing rules that accurately distinguish valid text from spam is quite difficult to do well, resulting either in many missed spam messages or, worse, many lost emails. To make matters harder, spammers actively adjust the way they send spam in order to trick these strategies (e.g., writing "vi@gr@"). Writing effective rules -- and keeping them up to date -- quickly becomes an insurmountable task. Fortunately, machine learning has provided a solution. Modern spam filters are "learned" from examples: we provide the learning algorithm with example emails which we have manually labelled as "ham" (valid email) or "spam" (unwanted email), and the algorithms learn to distinguish between them automatically.
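The contrast between hand-written rules and a filter learned from labelled examples can be made concrete with a minimal sketch. Everything here is an illustrative assumption rather than anything from the article: the function names, the four training messages, and the choice of a word-count naive Bayes classifier with add-one smoothing.

```python
import math
from collections import Counter

def rule_based_is_spam(message):
    """Hand-written rule: flag a message that contains a blocked keyword."""
    blocked = {"viagra"}  # hypothetical keyword list
    return any(word in message.lower() for word in blocked)

def train_naive_bayes(examples):
    """Count word frequencies per label from (text, label) pairs."""
    word_counts = {"ham": Counter(), "spam": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def predict(model, message):
    """Return the label with the highest smoothed log-probability."""
    word_counts, label_counts = model
    vocab = set(word_counts["ham"]) | set(word_counts["spam"])
    total = sum(label_counts.values())
    scores = {}
    for label in ("ham", "spam"):
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)
        for word in message.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Tiny made-up training set, labelled by hand as in the text.
TRAINING = [
    ("cheap pills buy now", "spam"),
    ("buy vi@gr@ cheap now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
MODEL = train_naive_bayes(TRAINING)
```

In this toy setup the keyword rule misses the obfuscated spelling, while the learned filter picks it up only because a labelled example containing it was supplied; a realistic filter would need far more data and richer features.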

algorithm, classification, training data, (9 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

#artificialintelligence · Sep-15-2022, 18:55:11 GMT

How to Convince Your Boss to Trust Your ML/DL Models

Some company managers or stakeholders are pessimistic about machine learning model predictions. It is therefore the data scientist's responsibility to convince them that a model's predictions are credible and understandable to humans. We need to focus not only on creating powerful machine learning/deep learning models, but also on making those models interpretable by humans. Interpretability helps in many ways: it helps us understand how a model makes a decision, justify model predictions, gain insights, build trust in the model, and improve the model. There are two types of ML model interpretation -- global and local. Good examples of inherently explainable models are linear regression and decision trees.
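As a rough illustration of why linear regression counts as inherently explainable, the sketch below fits a one-feature model in closed form and reads off both kinds of interpretation. The data, the function names, and the way the local explanation is defined (contribution relative to the average input) are assumptions made for this example, not taken from the article.

```python
def fit_simple_linear_regression(xs, ys):
    """Closed-form least squares for y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return intercept, slope

def local_contribution(slope, x, mean_x):
    """Local view: how far this input moves the prediction from the average one."""
    return slope * (x - mean_x)

# Made-up data lying exactly on y = 1 + 2x.
XS = [1.0, 2.0, 3.0, 4.0]
YS = [3.0, 5.0, 7.0, 9.0]
INTERCEPT, SLOPE = fit_simple_linear_regression(XS, YS)

# Global interpretation: every extra unit of x adds SLOPE to the prediction.
# Local interpretation: for x = 4, the shift relative to the mean input 2.5.
SHIFT_AT_4 = local_contribution(SLOPE, 4.0, sum(XS) / len(XS))
```

The slope is a global explanation because it holds for every prediction, whereas the contribution computed for a single input is a local one.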

lime, prediction, surrogate model, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.42)

#artificialintelligence · Sep-15-2022, 14:39:02 GMT

Pyspark MLlib

Originally published on Towards AI. In the previous sections, we discussed RDDs, DataFrames, and PySpark concepts.

classification, dataframe, pyspark mllib, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.36)

#artificialintelligence · Sep-14-2022, 02:21:36 GMT

The Importance of Being Ernest, Ekundayo, or Eswari: An Interpretable Machine Learning Approach to Name-Based Ethnicity Classification · Issue 4.3, Summer 2022

A new method for estimating race/ethnicity and associated disparities where administrative records lack self-reported race/ethnicity.

ethnicia, ethnicity, textrm, (15 more...)

Country:

North America > United States > Georgia (0.46)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.40)
North America > United States > Texas (0.14)
(13 more...)

Genre: Research Report > New Finding (0.94)

Industry:

Health & Medicine (1.00)
Government > Voting & Elections (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.67)

arXiv.org Machine Learning · Sep-14-2022

Learning Value-at-Risk and Expected Shortfall

Barrera, D, Crépey, S, Gobet, E, Nguyen, Hoang-Dung, Saadeddine, B

We propose a non-asymptotic convergence analysis of a two-step approach to learn a conditional value-at-risk (VaR) and expected shortfall (ES) in a nonparametric setting using Rademacher and Vapnik-Chervonenkis bounds. Our approach for the VaR is extended to the problem of learning at once multiple VaRs corresponding to different quantile levels. This results in efficient learning schemes based on neural network quantile and least-squares regressions. An a posteriori Monte Carlo (non-nested) procedure is introduced to estimate distances to the ground-truth VaR and ES without access to the latter. This is illustrated using numerical experiments in a Gaussian toy-model and a financial case-study where the objective is to learn a dynamic initial margin.
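For readers unfamiliar with the two risk measures, the following is a minimal unconditional sketch, not the paper's neural-network procedure. It assumes losses are signed so that larger values are worse, uses a confidence level of 0.975 chosen for illustration, and mirrors the two-step idea only loosely: the VaR is a quantile (the minimizer of the expected pinball loss), and the ES is then an average built on top of that VaR. All names are hypothetical.

```python
import random

def pinball_loss(y, q, alpha):
    """Quantile (pinball) loss; its expectation is minimized at the alpha-quantile."""
    diff = y - q
    return alpha * diff if diff >= 0 else (alpha - 1) * diff

def empirical_var(losses, alpha):
    """Step 1: value-at-risk as the empirical alpha-quantile of the losses."""
    ordered = sorted(losses)
    index = min(int(alpha * len(ordered)), len(ordered) - 1)
    return ordered[index]

def empirical_es(losses, alpha, var):
    """Step 2: expected shortfall as VaR plus the scaled mean excess over VaR."""
    mean_excess = sum(max(y - var, 0.0) for y in losses) / len(losses)
    return var + mean_excess / (1.0 - alpha)

# Gaussian toy sample (standard normal losses), fixed seed for reproducibility.
rng = random.Random(0)
LOSSES = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
ALPHA = 0.975
VAR = empirical_var(LOSSES, ALPHA)
ES = empirical_es(LOSSES, ALPHA, VAR)
```

For a standard normal loss the exact values at level 0.975 are roughly 1.96 for the VaR and 2.34 for the ES, so the estimates should land close to those figures. In the conditional setting of the abstract, the quantile and the average are replaced by regressions on the covariates.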

artificial intelligence, machine learning, var, (17 more...)

2209.06476

Country:

Europe > France (0.04)
South America > Colombia > Bogotá D.C. > Bogotá (0.04)

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.46)

Industry: Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

S, Sachin Kumar, HV, Prithvi, Nandini, C

Data Science Approach to predict the winning Fantasy Cricket Team Dream 11 Fantasy Sports

arXiv.org Artificial Intelligence · Sep-14-2022

The evolution of digital technology and the increasing popularity of sports inspired innovators to take the experience of sports-minded users to a whole new level by introducing Fantasy Sports Platforms (FSPs). The application of data science and analytics is ubiquitous in the modern world. Data science and analytics open doors to a deeper understanding and help in the decision-making process. We firmly believed that we could adopt data science to predict the winning fantasy cricket team on the FSP Dream 11. We built a predictive model that predicts the performance of players in a prospective game. We used a combination of Greedy and Knapsack algorithms to prescribe the combination of 11 players for a fantasy cricket team that has the most significant statistical odds of finishing as the strongest team, thereby giving us a higher chance of winning the pot of bets on the Dream 11 FSP. We used the PyCaret Python library to help us understand and adopt the best regressor algorithm for our problem statement to make precise predictions. Further, we used the Plotly Python library to give us visual insights into the team's and players' performances by accounting for the statistical and subjective factors of a prospective game. The interactive plots help us to bolster the recommendations of our predictive model. You either win big, win small, or lose your bet based on the performance of the players selected for your fantasy team in the prospective game, and our model increases the probability of you winning big.
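The abstract does not spell out how the Greedy and Knapsack algorithms are combined, so the sketch below is only one plausible reading: choose a fixed number of players within a credit budget so that total predicted points are maximal. The player tuples, costs, budget, and function names are all hypothetical, and any further selection rules a real platform imposes are omitted.

```python
def greedy_team(players, team_size, budget):
    """Greedy heuristic: take players in order of predicted points per credit."""
    chosen, spent, total = [], 0, 0.0
    ranked = sorted(players, key=lambda p: p[2] / p[1], reverse=True)
    for name, cost, points in ranked:
        if len(chosen) < team_size and spent + cost <= budget:
            chosen.append(name)
            spent += cost
            total += points
    return total, chosen

def knapsack_team(players, team_size, budget):
    """0/1 knapsack by dynamic programming over (players used, credits spent)."""
    best = {(0, 0): (0.0, [])}
    for name, cost, points in players:
        # Iterate over a snapshot so each player is added at most once.
        for (count, spent), (total, names) in list(best.items()):
            key = (count + 1, spent + cost)
            if key[0] > team_size or key[1] > budget:
                continue
            candidate = (total + points, names + [name])
            if key not in best or candidate[0] > best[key][0]:
                best[key] = candidate
    complete = [value for (count, _), value in best.items() if count == team_size]
    return max(complete, key=lambda value: value[0]) if complete else (0.0, [])

# Hypothetical (name, credit cost, predicted points) tuples with integer costs.
PLAYERS = [("A", 6, 60), ("B", 5, 45), ("C", 5, 45), ("D", 4, 20)]
GREEDY_RESULT = greedy_team(PLAYERS, team_size=2, budget=10)
KNAPSACK_RESULT = knapsack_team(PLAYERS, team_size=2, budget=10)
```

On this made-up pool the greedy pass commits to the best points-per-credit player first and ends with a weaker pair than the exhaustive knapsack search, which is one reason the two approaches might be used together.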

artificial intelligence, data mining, machine learning, (19 more...)

2209.06999

Country:

Asia > India > Karnataka > Bengaluru (0.04)
North America > United States > New York (0.04)
North America > United States > California > Los Angeles County > Culver City (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Sports > Cricket (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

de Franca, Fabricio Olivetti, Kronberger, Gabriel

Prediction Intervals and Confidence Regions for Symbolic Regression Models based on Likelihood Profiles

arXiv.org Artificial Intelligence · Sep-14-2022

Symbolic regression is a nonlinear regression method which is commonly performed by an evolutionary computation method such as genetic programming. Quantification of the uncertainty of regression models is important for the interpretation of models and for decision making. The linear approximation and so-called likelihood profiles are well-known possibilities for the calculation of confidence and prediction intervals for nonlinear regression models. These simple and effective techniques have been completely ignored so far in the genetic programming literature. In this work we describe the calculation of likelihood profiles in detail and also provide some illustrative examples with models created with three different symbolic regression algorithms on two different datasets. The examples highlight the importance of likelihood profiles for understanding the limitations of symbolic regression models and for helping the user make an informed post-prediction decision.

artificial intelligence, machine learning, prediction interval and confidence region, (2 more...)

2209.06454

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Krantsevich, Nikolay, He, Jingyu, Hahn, P. Richard

Stochastic Tree Ensembles for Estimating Heterogeneous Effects

arXiv.org Artificial Intelligence · Sep-14-2022

Determining subgroups that respond especially well (or poorly) to specific interventions (medical or policy) requires new supervised learning methods tailored specifically for causal inference. Bayesian Causal Forest (BCF) is a recent method that has been documented to perform well on data generating processes with strong confounding of the sort that is plausible in many applications. This paper develops a novel algorithm for fitting the BCF model, which is more efficient than the previously available Gibbs sampler. The new algorithm can be used to initialize independent chains of the existing Gibbs sampler leading to better posterior exploration and coverage of the associated interval estimates in simulation studies. The new algorithm is compared to related approaches via simulation studies as well as an empirical analysis.

artificial intelligence, machine learning, treatment effect, (17 more...)

2209.06998

Country:

North America > United States > Arizona > Maricopa County > Tempe (0.04)
Asia > China > Hong Kong (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Portugal (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.93)
Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)