AITopics

Industry: Information Technology > Security & Privacy (0.73)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

arXiv.org Artificial Intelligence, Oct-29-2021

Diagnosing Web Data of ICTs to Provide Focused Assistance in Agricultural Adoptions

Singh, Ashwin, Subramanian, Mallika, Agarwal, Anmol, Priyadarshi, Pratyush, Gupta, Shrey, Garimella, Kiran, Kumar, Sanjeev, Kumar, Ritesh, Garg, Lokesh, Arya, Erica, Kumaraguru, Ponnurangam

The past decade has witnessed a rapid increase in technology ownership across rural areas of India, signifying the potential for information and communication technology (ICT) initiatives to empower rural households. In our work, we focus on the web infrastructure of one such ICT initiative, Digital Green, which started in 2008. Following a participatory approach to content production, Digital Green disseminates instructional agricultural videos to smallholder farmers via human mediators to improve the adoption of farming practices. Its web-based data tracker, CoCo, captures data related to these processes, storing the attendance and adoption logs of over 2.3 million farmers across three continents and twelve countries. Using this data, we model the components of the Digital Green ecosystem across five states in India: the past attendance-adoption behaviours of farmers, the content of the videos screened to them, and their demographic features. We use statistical tests to identify the factors that distinguish farmers with higher adoption rates, to understand why they adopt more than others. Our research finds that farmers with higher adoption rates adopt videos of shorter duration and belong to smaller villages. The co-attendance and co-adoption networks of farmers indicate that they benefit greatly from past adopters of a video from their village and group when it comes to adopting practices from the same video. Following our analysis, we model the adoption of practices from a video as a prediction problem, to identify and assist farmers who might face challenges in adoption in each of the five states. We experiment with different model architectures and achieve macro-F1 scores ranging from 79% to 89% using a Random Forest classifier. Finally, we measure the importance of different features using SHAP values and provide implications for improving the adoption rates of nearly a million farmers across five states in India.
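The abstract reports macro-F1 rather than plain accuracy. As a rough illustration of why that matters when adopters and non-adopters are imbalanced, here is a minimal pure-Python sketch of macro-F1: per-class F1 scores averaged with equal weight. The labels are made-up toy values, not data from the paper, and the authors' own evaluation code is not shown in the abstract.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical labels: 1 = adopted the practice, 0 = did not adopt.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.3f} macro_f1={macro_f1(y_true, y_pred):.3f}")
# prints accuracy=0.833 macro_f1=0.778
```

Here accuracy is 0.833 but macro-F1 is 0.778, because the minority class's F1 (0.667) counts as much as the majority class's (0.889).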

adoption, farmer, video

2111.00052

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > India > Madhya Pradesh (0.07)
Asia > India > Andhra Pradesh (0.07)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.46)

Industry: Food & Agriculture > Agriculture (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.34)

#artificialintelligence, Oct-25-2021, 08:55:49 GMT

Introduction to Boosted Trees

Welcome to my new article series: Boosting algorithms in machine learning! This is Part 1 of the series. Here, I'll give you a short introduction to boosting, its objective, some key definitions, and a list of boosting algorithms that we intend to cover in the next posts. You should be familiar with elementary tree-based machine learning models such as decision trees and random forests. A good working knowledge of Python and its Scikit-learn library is also recommended.
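The core idea of boosting (fit weak learners one after another, each concentrating on the examples its predecessors got wrong) fits in a few lines. The sketch below assumes AdaBoost with one-dimensional decision stumps and labels in {-1, +1}, as one classic example of a boosting algorithm; the function names are illustrative and are not Scikit-learn's API.

```python
import math


def fit_stump(xs, ys, weights):
    """Return (weighted error, threshold, polarity) of the best 1-D stump.

    A stump predicts `polarity` when x > threshold and -polarity otherwise.
    """
    values = sorted(set(xs))
    # A threshold below every value yields a constant stump.
    thresholds = [values[0] - 1] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for thr in thresholds:
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if (polarity if x > thr else -polarity) != y)
            if best is None or err < best[0]:
                best = (err, thr, polarity)
    return best


def stump_predict(thr, polarity, x):
    return polarity if x > thr else -polarity


def adaboost(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, thr, pol = fit_stump(xs, ys, weights)
        err = max(err, 1e-10)          # guard against log(0) for a perfect stump
        if err >= 0.5:                 # learner no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, thr, pol))
        # Up-weight misclassified points, down-weight correct ones, renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(thr, pol, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    score = sum(alpha * stump_predict(thr, pol, x) for alpha, thr, pol in ensemble)
    return 1 if score >= 0 else -1


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]   # no single stump can fit this pattern
model = adaboost(xs, ys, rounds=3)
print([predict(model, x) for x in xs])  # prints [1, 1, -1, -1, 1, 1]
```

Each stump alone misclassifies part of the data, but the weighted vote of three stumps fits it exactly, which is the point of boosting weak learners.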

algorithm, decision tree, random forest

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.92)

Grislain, Nicolas, Gonzalvez, Joan

DP-XGBoost: Private Machine Learning at Scale

arXiv.org Artificial Intelligence, Oct-25-2021

The big-data revolution announced ten years ago does not seem to have fully happened at the expected scale. One of the main obstacles to this has been the lack of data circulation, and one of the many reasons people and organizations did not share as much as expected is the privacy risk associated with data-sharing operations. There have been many works on practical systems to compute statistical queries with Differential Privacy (DP). There have also been practical implementations of systems to train Neural Networks with DP, but relatively little effort has been dedicated to designing scalable classical Machine Learning (ML) models providing DP guarantees. In this work we describe and implement a DP fork of a battle-tested ML model: XGBoost. Our approach beats previous attempts at the task by a large margin in terms of accuracy achieved for a given privacy budget. It is also the only DP implementation of boosted trees that scales to big data and can run in distributed environments such as Kubernetes, Dask or Apache Spark.
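For readers new to DP: the textbook way to answer a numeric statistical query with epsilon-differential privacy is the Laplace mechanism, which adds Laplace noise with scale sensitivity / epsilon to the true answer. The pure-Python sketch below applies it to a counting query. It illustrates the general DP idea only; it is not the mechanism DP-XGBoost uses inside its trees (the abstract does not describe that), and `dp_count` and the records are hypothetical.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) by inverse-CDF sampling."""
    u = rng.random() - 0.5          # uniform on [-0.5, 0.5)
    while u == -0.5:                # avoid log(0) at the closed end
        u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))


def dp_count(records, predicate, epsilon, rng):
    """Counting query released with epsilon-differential privacy."""
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0  # adding or removing one record changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(42)
ages = [23, 35, 41, 29, 52, 67, 38]   # hypothetical records
print(dp_count(ages, lambda a: a >= 40, epsilon=1.0, rng=rng))
```

Smaller epsilon means larger noise and stronger privacy; in standard DP terminology, the "privacy budget" is the total epsilon spent across all queries or training steps.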

algorithm, differential privacy, mechanism

2110.1277

Country: North America > United States (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.68)

#artificialintelligence, Oct-24-2021, 20:46:10 GMT

BetaBoosting

At this point, we all know of XGBoost due to the massive success it has had in numerous Data Science competitions held on platforms like Kaggle. Along with its success, we have seen several variations such as CatBoost and LightGBM. All of these implementations are based on the Gradient Boosting algorithm developed by Friedman¹, which iteratively builds an ensemble of weak learners (usually decision trees), where each subsequent learner is trained on the errors of the ensemble built so far. The Elements of Statistical Learning² gives general pseudo-code for the algorithm. However, that pseudo-code is not complete! A core mechanism that allows boosting to work is a shrinkage parameter, commonly called the 'learning rate', that scales down each learner's contribution at each boosting round.
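The shrinkage mechanism can be made concrete with a minimal pure-Python sketch of gradient boosting for squared-error regression: each round fits a regression stump to the current residuals (the negative gradient of (y - F)^2 / 2), and the learning rate nu scales that stump's contribution, F_m(x) = F_{m-1}(x) + nu * h_m(x). This is plain gradient boosting with a constant learning rate, not BetaBoosting itself, whose schedule this excerpt does not describe; the function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump on one feature (needs >= 2 distinct xs)."""
    values = sorted(set(xs))
    best = None
    for a, b in zip(values, values[1:]):
        thr = (a + b) / 2
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lmean, rmean)
    _, thr, lmean, rmean = best
    return lambda x: lmean if x <= thr else rmean


def gradient_boost(xs, ys, n_rounds, learning_rate):
    f0 = sum(ys) / len(ys)            # constant model minimizing squared error
    preds = [f0] * len(xs)
    learners = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is just the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        h = fit_stump(xs, residuals)
        learners.append(h)
        # Shrinkage: each learner contributes only learning_rate * h(x).
        preds = [p + learning_rate * h(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + learning_rate * sum(h(x) for h in learners)


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = list(range(10))
ys = [x * x for x in xs]
for lr in (1.0, 0.1):
    model = gradient_boost(xs, ys, n_rounds=50, learning_rate=lr)
    print(f"learning_rate={lr}: training MSE={mse(model, xs, ys):.3f}")
```

With a smaller learning rate, each round removes only part of the fitted residual, so more rounds are needed to reach the same training error; that slower, more regularized fit is the usual motivation for shrinkage.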

betaboosting, guardrail, learner

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.55)

#artificialintelligence, Oct-22-2021, 06:15:10 GMT

How and why to build your own gradient boosted-tree package

In order to make accurate and fast travel-time predictions, Lyft built a gradient boosted tree (GBT) package from the ground up. It is slower to train than off-the-shelf packages, but can be customized to treat space and time more efficiently and yield less volatile predictions. Machine learning runs at the core of what we do at Lyft. Examples include predicting travel time between two locations, modeling the probability of a ride being canceled, forecasting supply and demand, and many more. These models enable us to match riders and drivers more efficiently, incentivize drivers to be where they can get more rides, and improve the ride experience.

decision tree, geoboost, travel time

Country: North America > United States > New York (0.05)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (0.97)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.71)

arXiv.org Artificial Intelligence, Oct-21-2021

SecureBoost+ : A High Performance Gradient Boosting Tree Framework for Large Scale Vertical Federated Learning

Chen, Weijing, Ma, Guoqiang, Fan, Tao, Kang, Yan, Xu, Qian, Yang, Qiang

Gradient boosting decision tree (GBDT) is a widely used ensemble algorithm in industry. Its vertical federated learning version, SecureBoost, is one of the most popular algorithms used in cross-silo privacy-preserving modeling. As privacy-preserving computation has thrived in recent years, demand for large-scale and high-performance federated learning has grown dramatically in real-world applications. In this paper, to fulfill these requirements, we propose SecureBoost+, a novel framework that improves on the prior work SecureBoost. SecureBoost+ integrates several ciphertext calculation optimizations and engineering optimizations. The experimental results demonstrate that SecureBoost+ achieves significant performance improvements over SecureBoost on large and high-dimensional data sets, making effective and efficient large-scale vertical federated learning possible.
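SecureBoost-style vertical federated learning rests on additively homomorphic encryption (Paillier is the usual choice): the party holding the labels encrypts per-sample gradients, and a party holding only features sums those ciphertexts per candidate split without ever seeing a plaintext gradient. The sketch below illustrates that single step with a toy Paillier cryptosystem in pure Python. Assumptions: tiny hard-coded primes (completely insecure), made-up gradients and feature values, fixed-point encoding with a hypothetical `SCALE`, and hypothetical names (`ToyPaillier`, `encrypted_left_sum`). The ciphertext optimizations specific to SecureBoost+ are not described in the abstract and are not reproduced here.

```python
import math
import random

SCALE = 10_000  # fixed-point scale for encoding real-valued gradients


class ToyPaillier:
    """Textbook Paillier with g = n + 1. Tiny primes: illustration only, NOT secure."""

    def __init__(self, p=999_983, q=1_000_003, seed=0):
        self.n = p * q
        self.n2 = self.n * self.n
        self.lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
        self.mu = pow(self.lam, -1, self.n)                       # Python 3.8+
        self.rng = random.Random(seed)

    def encrypt(self, m):
        # Uses only the public key n.
        m %= self.n                        # negative values wrap around modulo n
        r = self.rng.randrange(1, self.n)
        while math.gcd(r, self.n) != 1:
            r = self.rng.randrange(1, self.n)
        # (n + 1)^m mod n^2 == 1 + m*n mod n^2
        return (1 + m * self.n) * pow(r, self.n, self.n2) % self.n2

    def decrypt(self, c):
        # Uses the private key (lam, mu); only the label holder can do this.
        x = pow(c, self.lam, self.n2)
        m = ((x - 1) // self.n) * self.mu % self.n
        return m if m <= self.n // 2 else m - self.n  # map back to a signed value

    def add(self, c1, c2):
        """Multiplying ciphertexts adds the underlying plaintexts."""
        return c1 * c2 % self.n2


def encrypted_left_sum(enc_grads, feature, threshold, scheme):
    """Feature holder: sum encrypted gradients of samples with feature <= threshold."""
    total = scheme.encrypt(0)
    for c, x in zip(enc_grads, feature):
        if x <= threshold:
            total = scheme.add(total, c)
    return total


# Label holder: encrypts hypothetical per-sample gradients.
gradients = [0.25, -0.5, 0.125, -0.75, 0.5, -0.125]
scheme = ToyPaillier()
enc_grads = [scheme.encrypt(round(g * SCALE)) for g in gradients]

# Feature holder: aggregates ciphertexts only.
feature = [3.0, 7.5, 1.2, 9.9, 4.4, 6.0]
enc_sum = encrypted_left_sum(enc_grads, feature, threshold=5.0, scheme=scheme)

# Label holder decrypts only the aggregate, which it can use to score the split.
print(scheme.decrypt(enc_sum) / SCALE)  # prints 0.875
```

In a full GBDT split search the same aggregation would be done for hessians and for every candidate bin; this sketch shows a single gradient sum.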

algorithm, computation, secureboost

2110.10927

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Sami, Shoaib Meraj, Bhuiyan, Mohammed Imamul Hassan

Power Transformer Fault Diagnosis with Intrinsic Time-scale Decomposition and XGBoost Classifier

arXiv.org Machine Learning, Oct-21-2021

An intrinsic time-scale decomposition (ITD) based method for power transformer fault diagnosis is proposed. Dissolved gas analysis (DGA) parameters are ranked according to their skewness, and then ITD-based feature extraction is performed. An optimal set of proper rotation component (PRC) features is determined by an XGBoost classifier. For classification, an XGBoost classifier is then applied to the optimal PRC feature set. The classification performance of the proposed method is studied using publicly available DGA data from 376 power transformers. The proposed method achieves more than 95% accuracy with high sensitivity and F1-score, outperforming conventional methods and some recent machine-learning-based fault diagnosis approaches. Moreover, it gives a better Cohen's kappa and F1-score than the recently introduced EMD-based hierarchical technique for fault diagnosis in power transformers.
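The first step, ranking DGA parameters by skewness, is simple to sketch. The pure-Python example below assumes the population (Fisher-Pearson) skewness g1 = m3 / m2^(3/2) and sorts features from most positive to most negative skewness; the abstract does not say which skewness estimator or sort order the authors use, and the gas names and readings are hypothetical.

```python
def skewness(values):
    """Population (Fisher-Pearson) skewness m3 / m2**1.5; undefined for constant data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def rank_by_skewness(features):
    """Return feature names sorted from most positive to most negative skewness."""
    return sorted(features, key=lambda name: skewness(features[name]), reverse=True)


# Hypothetical gas-concentration readings (not from the paper's dataset).
readings = {
    "H2": [1, 2, 3],
    "CH4": [1, 1, 1, 10],
    "C2H2": [0, 9, 9, 9],
}
print(rank_by_skewness(readings))  # prints ['CH4', 'H2', 'C2H2']
```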

fault diagnosis, transformer, xgboost classifier


2110.11467

Country: Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)

Genre: Research Report (0.51)

Industry: Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Gómez-Méndez, Irving, Joly, Emilien

Regression with Missing Data, a Comparison Study of Techniques Based on Random Forests

arXiv.org Machine Learning, Oct-18-2021

Random forests and recursive trees are widely used in applied statistics and computer science. The popularity of recursive trees rests on several factors: their easy interpretability, the fact that they can be used for both regression and classification tasks, the small number of hyper-parameters to be tuned and, finally, their non-parametric nature, which allows them to infer arbitrarily complex relations between the input and the output space. A random forest combines several randomized trees, improving prediction accuracy at the cost of a slight loss in interpretability. This technique is easily parallelizable, which has made it one of the most popular tools for handling high-dimensional data sets. It has been successfully applied to various practical problems, including chemoinformatics, ecology, 3D object recognition, bioinformatics and econometrics. Biau and Scornet (2016) present a detailed list of applications as well as a review of random forests. In the present work, we focus on the ability of random forests to deal with missing values.
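One way tree-based models can handle missing values, used for example by XGBoost's sparsity-aware split finding and close in spirit to the "missing incorporated in attributes" (MIA) idea, is to let each split learn which side missing values should go to. The sketch below shows that idea for a single regression split in pure Python. It is not claimed to be one of the specific techniques compared in the paper, and `best_split_with_missing` and the data are hypothetical.

```python
def sse(ys):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split_with_missing(xs, ys):
    """Best threshold on one feature, also choosing where missing (None) values go."""
    observed = sorted({x for x in xs if x is not None})
    missing_ys = [y for x, y in zip(xs, ys) if x is None]
    best = None
    for a, b in zip(observed, observed[1:]):
        thr = (a + b) / 2
        left = [y for x, y in zip(xs, ys) if x is not None and x <= thr]
        right = [y for x, y in zip(xs, ys) if x is not None and x > thr]
        # Try routing all missing values left, then right; keep the lower error.
        for direction in ("left", "right"):
            if direction == "left":
                cost = sse(left + missing_ys) + sse(right)
            else:
                cost = sse(left) + sse(right + missing_ys)
            if best is None or cost < best[0]:
                best = (cost, thr, direction)
    return best  # (total squared error, threshold, side for missing values)


xs = [1, 2, 3, 10, 11, 12, None, None]
ys = [0, 0, 0, 5, 5, 5, 5, 5]
print(best_split_with_missing(xs, ys))  # prints (0.0, 6.5, 'right')
```

Here the rows with a missing feature value share the target of the high-x rows, so the learned rule routes them right without any imputation step.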

algorithm, missing-data mechanism, random forest


2110.09333

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

arXiv.org Artificial Intelligence, Oct-13-2021

E-Commerce Dispute Resolution Prediction

Tsurel, David, Doron, Michael, Nus, Alexander, Dagan, Arnon, Guy, Ido, Shahaf, Dafna

E-Commerce marketplaces support millions of daily transactions, and some disagreements between buyers and sellers are unavoidable. Resolving disputes in an accurate, fast, and fair manner is of great importance for maintaining a trustworthy platform. Simple cases can be automated, but intricate cases are not sufficiently addressed by hard-coded rules, and therefore most disputes are currently resolved by people. In this work we take a first step towards automatically assisting human agents in dispute resolution at scale. We construct a large dataset of disputes from the eBay online marketplace, and identify several interesting behavioral and linguistic patterns. We then train classifiers to predict dispute outcomes with high accuracy. We explore the model and the dataset, reporting interesting correlations, important features, and insights.

classifier, dataset, dispute

doi: 10.1145/3340531.3411906

2110.1573

Country:

Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)
North America > United States > Ohio (0.04)
Europe > Ireland (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Law > Litigation (1.00)
Law > Alternative Dispute Resolution (0.86)
Information Technology > Services > e-Commerce Services (0.62)

Technology:

Information Technology > e-Commerce (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)