AITopics | bootstrapped sample

bootstrapped sample

Topology of Out-of-Distribution Examples in Deep Neural Networks

Datta, Esha, Hennig, Johanna, Domschot, Eva, Mattes, Connor, Smith, Michael R.

arXiv.org Artificial Intelligence, Jan-21-2025

As deep neural networks (DNNs) become increasingly common, concerns about their robustness do as well. A longstanding problem for deployed DNNs is their behavior in the face of unfamiliar inputs; specifically, these models tend to be overconfident and incorrect when encountering out-of-distribution (OOD) examples. In this work, we present a topological approach to characterizing OOD examples using latent layer embeddings from DNNs. Our goal is to identify topological features, referred to as landmarks, that indicate OOD examples. We conduct extensive experiments on benchmark datasets and a realistic DNN model, revealing a key insight for OOD detection. Well-trained DNNs have been shown to induce a topological simplification on training data for simple models and datasets; we show that this property holds for realistic, large-scale test and training data, but does not hold for OOD examples. More specifically, we find that the average lifetime (or persistence) of OOD examples is statistically longer than that of training or test examples. This indicates that DNNs struggle to induce topological simplification on unfamiliar inputs. Our empirical results provide novel evidence of topological simplification in realistic DNNs and lay the groundwork for topologically-informed OOD detection strategies.
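To make "average lifetime" concrete, here is a minimal sketch under stated assumptions, not the authors' pipeline. It computes only 0-dimensional persistence of a Vietoris-Rips filtration on a small point cloud, where every connected component is born at scale 0 and dies at the length of the minimum-spanning-tree edge that merges it into another component. The function names `h0_lifetimes` and `average_lifetime` are hypothetical, and the abstract does not say which homology dimensions or filtration the paper uses.

```python
import itertools
import math


def h0_lifetimes(points):
    """Finite 0-dimensional lifetimes of a Vietoris-Rips filtration.

    Each point starts as its own component (born at scale 0). A component
    dies when an edge first merges it into another one, so the lifetimes
    are the edge lengths picked by Kruskal's minimum-spanning-tree algorithm.
    The one component that never dies (infinite lifetime) is omitted.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(len(points)), 2)
    )
    lifetimes = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            lifetimes.append(length)
    return lifetimes


def average_lifetime(points):
    """Mean finite lifetime; assumes at least two points."""
    lifetimes = h0_lifetimes(points)
    return sum(lifetimes) / len(lifetimes)


# Toy embedding: three points on a line.
toy_points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
toy_average = average_lifetime(toy_points)
```

In the setting the abstract describes, one would compare this kind of statistic between latent embeddings of in-distribution and OOD inputs; the toy points here are made up purely for illustration.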

artificial intelligence, homology, machine learning, (18 more...)

arXiv:2501.12522

Country:

North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.48)
Research Report > New Finding (0.30)

Industry:

Energy (0.94)
Government > Regional Government > North America Government > United States Government (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Out of Bag (OOB) Evaluation in Random Forests

#artificialintelligence, Oct-9-2022, 04:30:10 GMT

Out of Bag (OOB) evaluation is an important yet underrated topic in ensemble learning. People learn a lot about random forests and other bagging algorithms, but they often skip or overlook this concept. I missed it myself while learning about ensemble models and failed an interview in which the last question was "How are the Out of Bag data utilized while training a random forest model?" (hence this blog, written as a lesson). Cannot recall random forests? A random forest is a supervised learning method built from independent base learners: multiple decision trees, each trained on a bootstrapped sample of the original dataset. The bootstrapped samples are created by random sampling with replacement from the dataset d, which has n features, where each sample is smaller than d and uses at most n features.
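The split between in-bag and out-of-bag rows can be sketched in a few lines. This is a minimal illustration, assuming the conventional setup in which each bootstrapped sample has as many rows as the dataset (the post describes smaller samples); `bootstrap_split` is a hypothetical helper, not part of any library. With n draws from n rows, the chance that a given row is never drawn is (1 - 1/n)^n, which approaches 1/e, about 0.368, so roughly a third of the rows are left out of each tree and can be used to evaluate it.

```python
import random


def bootstrap_split(n_rows, rng):
    """Return (in_bag, out_of_bag) row indices for one bootstrapped sample."""
    # Draw n_rows indices with replacement; duplicates are expected.
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    # Rows that were never drawn are "out of bag" for this tree.
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_rows) if i not in drawn]
    return in_bag, out_of_bag


rng = random.Random(0)  # fixed seed so the sketch is repeatable
in_bag, out_of_bag = bootstrap_split(1000, rng)
oob_fraction = len(out_of_bag) / 1000
print(f"out-of-bag fraction: {oob_fraction:.3f}")
```

In a random forest, each tree would be scored only on its own out-of-bag rows, and those per-row predictions are then aggregated into an OOB error estimate.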

base learner, decision tree, evaluation, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Analyzing Bagging Methods for Language Models

Islam, Pranab, Khosla, Shaan, Lok, Arthur, Saxena, Mudit

arXiv.org Artificial Intelligence, Jul-19-2022

Modern language models leverage increasingly large numbers of parameters to achieve performance on natural language understanding tasks. Ensembling these models in specific configurations for downstream tasks shows even further performance improvements. In this paper, we perform an analysis of bagging language models and compare single language models to bagged ensembles that are roughly equivalent in terms of final model size. We explore an array of model bagging configurations for natural language understanding tasks with final ensemble sizes ranging from 300M parameters to 1.5B parameters and determine that our ensembling methods are at best roughly equivalent to single LM baselines. We note other positive effects of bagging and pruning in specific scenarios according to findings in our experiments, such as variance reduction and minor performance improvements.

large language model, machine learning, natural language, (16 more...)

arXiv:2207.09099

Country: North America > United States > New York (0.05)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.37)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.36)

Classifying variety of customer's online engagement for churn prediction with mixed-penalty logistic regression

Šimović, Petra Posedel, Horvatic, Davor, Sun, Edward W.

arXiv.org Machine Learning, May-17-2021

Using big data to analyze consumer behavior can provide effective decision-making tools for preventing customer attrition (churn) in customer relationship management (CRM). Focusing on a CRM dataset with several different categories of factors that impact customer heterogeneity (i.e., usage of self-care service channels, duration of service, and responsiveness to marketing actions), we provide new predictive analytics of customer churn rate based on a machine learning method that enhances the classification of logistic regression by adding a mixed penalty term. The proposed penalized logistic regression can prevent overfitting when dealing with big data and minimize the loss function when balancing the cost from the median (absolute value) and mean (squared value) regularization. We show the analytical properties of the proposed method and its computational advantage in this research. In addition, we investigate the performance of the proposed method with a CRM data set (that has a large number of features) under different settings by efficiently eliminating the disturbance of (1) least important features and (2) sensitivity from the minority (churn) class. Our empirical results confirm the expected performance of the proposed method in full compliance with the common classification criteria (i.e., accuracy, precision, and recall) for evaluating machine learning methods.
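The abstract does not give the exact form of the mixed penalty, so the following is only a sketch under an assumption: an elastic-net-style term that blends the absolute-value (L1) and squared-value (L2) penalties with a mixing weight. The names `penalized_logistic_loss`, `lam`, and `alpha` are hypothetical, and the paper's actual objective may differ.

```python
import math


def penalized_logistic_loss(weights, bias, rows, labels, lam, alpha):
    """Mean logistic loss plus an ASSUMED elastic-net-style mixed penalty.

    lam   : overall penalty strength (hypothetical parameter)
    alpha : share given to the absolute-value (L1) term; the remaining
            1 - alpha goes to the squared-value (L2) term
    """
    total = 0.0
    for features, label in zip(rows, labels):
        z = bias + sum(w * x for w, x in zip(weights, features))
        # Negative log-likelihood for a label in {0, 1}: log(1 + e^z) - label * z,
        # with log(1 + e^z) written in a numerically stable form.
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - label * z
    data_loss = total / len(rows)
    l1 = sum(abs(w) for w in weights)   # absolute-value penalty
    l2 = sum(w * w for w in weights)    # squared-value penalty
    return data_loss + lam * (alpha * l1 + (1.0 - alpha) * l2)


# Toy data, made up for illustration: two features, binary churn label.
toy_rows = [(0.5, 1.0), (1.5, -0.5), (-1.0, 0.2), (0.3, 0.8)]
toy_labels = [1, 0, 0, 1]
loss_at_zero = penalized_logistic_loss([0.0, 0.0], 0.0, toy_rows, toy_labels, 0.1, 0.5)
```

The L1 share tends to push the weights of unimportant features to exactly zero, while the L2 share shrinks all weights smoothly, which is one common reading of "balancing" the two costs.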

artificial intelligence, customer, machine learning, (18 more...)

arXiv:2105.07671

Country:

Europe > Croatia > Zagreb County > Zagreb (0.04)
North America > United States > Massachusetts (0.04)
Europe > France (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment > Gambling (0.68)
Leisure & Entertainment > Sports (0.68)
Banking & Finance (0.67)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

New Amazon Data Scientist Interview Practice Problems for 2021

#artificialintelligence, Nov-28-2020, 08:20:45 GMT

Bagging, also known as bootstrap aggregating, is the process in which multiple models of the same learning algorithm are trained on bootstrapped samples of the original dataset. Then, like the random forest example above, a vote is taken on all of the models' outputs. Boosting is a related ensemble technique in which the individual models are built sequentially, each one iterating on the previous one. Specifically, any data points that are misclassified by the previous model are emphasized in the following model. This is done to improve the overall accuracy of the ensemble.
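The bagging half of that answer can be sketched as follows. This is a toy illustration, assuming a one-dimensional threshold classifier ("stump") as the base learner; `fit_stump`, `bagged_stumps`, and `predict` are hypothetical names, and boosting is not shown.

```python
import random
from collections import Counter


def fit_stump(points):
    """Fit a 1-D threshold classifier that predicts 1 when x >= threshold."""
    best_threshold, best_errors = None, None
    for threshold, _ in points:
        # Count training mistakes made by this candidate threshold.
        errors = sum((x >= threshold) != bool(y) for x, y in points)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagged_stumps(points, n_models, rng):
    """Train n_models stumps, each on its own bootstrapped sample."""
    models = []
    for _ in range(n_models):
        # Bootstrapped sample: same size as the data, drawn with replacement.
        sample = [rng.choice(points) for _ in points]
        models.append(fit_stump(sample))
    return models


def predict(models, x):
    """Majority vote over the individual stump predictions."""
    votes = Counter(int(x >= threshold) for threshold in models)
    return votes.most_common(1)[0][0]


rng = random.Random(42)
# Toy data, made up for illustration: the label is 1 when x >= 5.
data = [(x, int(x >= 5)) for x in range(10)]
models = bagged_stumps(data, n_models=25, rng=rng)
```

An odd number of models avoids tied votes in the binary case. A boosting variant would instead train the stumps one after another and reweight the misclassified points at each step.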

bootstrapped sample, data scientist interview practice problem

Genre: Instructional Material > Course Syllabus & Notes (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.67)

Ensemble Methods for Decision Trees

#artificialintelligence, Feb-10-2020, 07:13:17 GMT

Decision Trees are popular Machine Learning algorithms used for both regression and classification tasks. Their popularity mainly arises from their interpretability and representability, as they mimic the way the human brain makes decisions. However, to be interpretable, they pay a price in terms of prediction accuracy. To overcome this drawback, some techniques have been developed with the goal of creating strong and robust models starting from 'poor' models. Those techniques are known as 'ensemble' methods and, in this article, I'm going to talk about three of them: Bagging, Random Forest and Boosting.

dataset, decision tree, predictor, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.98)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.65)

How to build Ensemble Models in machine learning? (with code in R)

@machinelearnbot, Dec-21-2017, 02:01:31 GMT

Over the last 12 months, I have been participating in a number of machine learning hackathons on Analytics Vidhya and Kaggle competitions. After each competition, I always make sure to go through the winner's solution. The winner's solution usually provides me with critical insights, which have helped me immensely in future competitions. Most of the winners rely on an ensemble of well-tuned individual models along with feature engineering. If you are starting with machine learning, I would advise you to lay emphasis on these two areas, as I have found them equally important to do well in a machine learning competition.

artificial intelligence, machine learning, prediction, (16 more...)

Genre: Contests & Prizes (0.56)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.50)

Random Forest – The Bayesian Quest

#artificialintelligence, Oct-28-2016, 21:25:58 GMT

In the first part of this series we set the context for the Random Forest algorithm by introducing the tree-based algorithm for classification problems. In this post we will look at some of the limitations of the tree-based model and how they were overcome, paving the way to a powerful model: Random Forest. Two major methods that were employed to overcome those pitfalls are Bootstrapping and Bagging. We will discuss them first before delving into random forest. When we discussed the tree-based model, we saw that such models are very intuitive, i.e., they are easy to interpret.

artificial intelligence, bootstrapped sample, machine learning, (17 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Social Information Improves Location Prediction in the Wild

Li, Jai (University of Illinois at Chicago) | Brugere, Ivan (University of Illinois at Chicago) | Ziebart, Brian (University of Illinois at Chicago) | Berger-Wolf, Tanya (University of Illinois at Chicago) | Crofoot, Margaret (University of California-Davis) | Farine, Damien (University of California-Davis)

AAAI Conferences, Mar-1-2015

How can knowing the location of my friends be used to more accurately predict my location? This paper explores socially-aware location prediction under a particularly challenging setting where the underlying interactions and social network are unknown and must be inferred over continuous spatiotemporal data. Our method samples inferred network topology using a linear regression model to predict future individual locations. We present an in-depth empirical study comparing different network models and network sampling regimes under a bootstrapped sampling baseline. Furthermore, our qualitative analysis demonstrates the value of social information in population mobility modeling under our application’s challenges.

information, neighborhood, prediction, (15 more...)

Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence

Country:

North America > United States > California > Yolo County > Davis (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Africa > Kenya (0.04)

Genre: Research Report (0.46)

Industry: Information Technology > Services (0.36)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.55)
