AITopics

doi: 10.1007/s10489-023-04735-w

2402.03386

Country:

Europe > Spain > Castile and León > León Province > León (0.04)
North America > United States > New York > New York County > New York City (0.04)
South America > Peru (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry:

Energy (0.68)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Uddin, Majbah, Liu, Yuandong, Lim, Hyeonsup

Exploring the Effects of Population and Employment Characteristics on Truck Flows: An Analysis of NextGen NHTS Origin-Destination Data

Truck transportation remains the dominant mode of US freight transportation because of its advantages, such as the flexibility of accessing pickup and drop-off points and faster delivery. Because of the massive freight volume transported by trucks, understanding the effects of population and employment characteristics on truck flows is critical for better transportation planning and investment decisions. The US Federal Highway Administration published a truck travel origin-destination data set as part of the Next Generation National Household Travel Survey program. This data set contains the total number of truck trips in 2020 within and between 583 predefined zones encompassing metropolitan and nonmetropolitan statistical areas within each state and Washington, DC. In this study, origin-destination-level truck trip flow data was augmented to include zone-level population and employment characteristics from the US Census Bureau. Census population and County Business Patterns data were included. The final data set was used to train a machine learning algorithm-based model, Extreme Gradient Boosting (XGBoost), where the target variable is the number of total truck trips. Shapley Additive ExPlanation (SHAP) was adopted to explain the model results. Results showed that the distance between the zones was the most important variable and had a nonlinear relationship with truck flows.

destination zone, establishment, truck flow, (11 more...)

doi: 10.1061/9780784484883.044

2402.04019

Country:

North America > United States > District of Columbia > Washington (0.25)
North America > Cuba > Holguín Province > Holguín (0.06)
North America > United States > Tennessee > Anderson County > Oak Ridge (0.05)
(2 more...)

Genre: Research Report > New Finding (0.90)

Industry:

Transportation (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.52)

van der Laan, Lars, Carone, Marco, Luedtke, Alex

Combining T-learning and DR-learning: a framework for oracle-efficient estimation of causal contrasts

We introduce efficient plug-in (EP) learning, a novel framework for the estimation of heterogeneous causal contrasts, such as the conditional average treatment effect and conditional relative risk. The EP-learning framework enjoys the same oracle-efficiency as Neyman-orthogonal learning strategies, such as DR-learning and R-learning, while addressing some of their primary drawbacks, including that (i) their practical applicability can be hindered by loss function non-convexity; and (ii) they may suffer from poor performance and instability due to inverse probability weighting and pseudo-outcomes that violate bounds. To avoid these drawbacks, EP-learner constructs an efficient plug-in estimator of the population risk function for the causal contrast, thereby inheriting the stability and robustness properties of plug-in estimation strategies like T-learning. Under reasonable conditions, EP-learners based on empirical risk minimization are oracle-efficient, exhibiting asymptotic equivalence to the minimizer of an oracle-efficient one-step debiased estimator of the population risk function. In simulation experiments, we illustrate that EP-learners of the conditional average treatment effect and conditional relative risk outperform state-of-the-art competitors, including T-learner, R-learner, and DR-learner. Open-source implementations of the proposed methods are available in our R package hte3.

ep-learner, estimator, mean squared error, (16 more...)

2402.01972

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > North Carolina > Wake County > Raleigh (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > Strength High (0.67)

Industry: Health & Medicine (0.45)

Technology:

Information Technology > Data Science > Data Mining (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

Curth, Alicia, Jeffares, Alan, van der Schaar, Mihaela

Why do Random Forests Work? Understanding Tree Ensembles as Self-Regularizing Adaptive Smoothers

Despite their remarkable effectiveness and broad application, the drivers of success underlying ensembles of trees (especially random forests and gradient boosting) are still not fully understood. In this paper, we highlight how interpreting tree ensembles as adaptive and self-regularizing smoothers can provide new intuition and deeper insight to this topic. We use this perspective to show that, when studied as smoothers - whose predictions can be understood as simple weighted averages of the training labels - randomized tree ensembles not only make predictions that are quantifiably more smooth than the predictions of the individual trees they consist of, but also further regulate their smoothness at test-time based on the dissimilarity between testing and training inputs. First, we use this insight to revisit, refine and reconcile two recent explanations of forest success by providing a new way of quantifying the conjectured behaviors of tree ensembles objectively by measuring the effective degree of smoothing they imply. Then, we move beyond existing explanations for the mechanisms by which tree ensembles improve upon individual trees and challenge the popular wisdom that the superior performance of forests should be understood as a consequence of variance reduction alone. We argue that the current high-level dichotomy into bias-and variance-reduction prevalent in statistics is insufficient to understand tree ensembles - because the prevailing definition of bias does not capture differences in the expressivity of the hypothesis classes formed by trees and forests. Instead, we show that forests can improve upon trees by three distinct mechanisms that are usually implicitly entangled. In particular, we demonstrate that the smoothing effect of ensembling can reduce variance in predictions due to noise in outcome generation, reduce variability in the quality of the learned function given fixed input data and reduce potential bias in learnable functions by enriching the available hypothesis space.

ensemble, prediction, randomness, (16 more...)

2402.01502

Country:

North America > United States > California (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Rieder, Wolf, Raschke, Philip, Cory, Thomas

Beyond the Request: Harnessing HTTP Response Headers for Cross-Browser Web Tracker Classification in an Imbalanced Setting

The World Wide Web's connectivity is greatly attributed to the HTTP protocol, with HTTP messages offering informative header fields that appeal to disciplines like web security and privacy, especially concerning web tracking. Despite existing research employing HTTP/S request messages to identify web trackers, HTTP/S response headers are often overlooked. This study endeavors to design effective machine learning classifiers for web tracker detection using HTTP/S response headers. Data from the Chrome, Firefox, and Brave browsers, obtained through the traffic monitoring browser extension T.EX, serves as our data set. Eleven supervised models were trained on Chrome data and tested across all browsers. The results demonstrated high accuracy, F1-score, precision, recall, and minimal log-loss error for Chrome and Firefox, but subpar performance on Brave, potentially due to its distinct data distribution and feature set. The research suggests that these classifiers are viable for detecting web trackers in Chrome and Firefox. However, real-world application testing remains pending, and the distinction between tracker types and broader label sources could be explored in future studies.

classifier, dataset, header, (15 more...)

2402.0124

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > Germany > Berlin (0.04)
(6 more...)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(4 more...)

Rahman, Md Saydur, Tumpa, Farhana Akter, Islam, Md Shazid, Arabi, Abul Al, Hossain, Md Sanzid Bin, Haque, Md Saad Ul

Comparative Evaluation of Weather Forecasting using Machine Learning Models

Gaining a deeper understanding of weather and being able to predict its future conduct have always been considered important endeavors for the growth of our society. This research paper explores the advancements in understanding and predicting nature's behavior, particularly in the context of weather forecasting, through the application of machine learning algorithms. By leveraging the power of machine learning, data mining, and data analysis techniques, significant progress has been made in this field. This study focuses on analyzing the contributions of various machine learning algorithms in predicting precipitation and temperature patterns using a 20-year dataset from a single weather station in Dhaka city. Algorithms such as Gradient Boosting, AdaBoosting, Artificial Neural Network, Stacking Random Forest, Stacking Neural Network, and Stacking KNN are evaluated and compared based on their performance metrics, including Confusion matrix measurements. The findings highlight remarkable achievements and provide valuable insights into their performances and features correlation.

algorithm, precipitation, prediction, (15 more...)

2402.01206

Country:

Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.26)
Oceania > Australia > New South Wales > Wollongong (0.04)
North America > United States > Texas > Brazos County > College Station (0.04)
North America > United States > California > Riverside County > Riverside (0.04)

Genre: Research Report > New Finding (0.47)

Industry: Energy > Power Industry (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)

Fernandez-Lozano, Carlos, Hervella, Pablo, Mato-Abad, Virginia, Rodriguez-Yanez, Manuel, Suarez-Garaboa, Sonia, Lopez-Dequidt, Iria, Estany-Gestal, Ana, Sobrino, Tomas, Campos, Francisco, Castillo, Jose, Rodriguez-Yanez, Santiago, Iglesias-Rey, Ramon

Random Forest-Based Prediction of Stroke Outcome

arXiv.org Artificial IntelligenceFeb-1-2024

We research into the clinical, biochemical and neuroimaging factors associated with the outcome of stroke patients to generate a predictive model using machine learning techniques for prediction of mortality and morbidity 3 months after admission. The dataset consisted of patients with ischemic stroke (IS) and non-traumatic intracerebral hemorrhage (ICH) admitted to Stroke Unit of a European Tertiary Hospital prospectively registered. We identified the main variables for machine learning Random Forest (RF), generating a predictive model that can estimate patient mortality/morbidity. In conclusion, machine learning algorithms RF can be effectively used in stroke patients for long-term outcome prediction of mortality and morbidity.

admission, nihss, prediction, (16 more...)

doi: 10.1038/s41598-021-89434-7

2402.00638

Country:

Europe > Austria > Vienna (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.05)
Europe > Spain > Galicia > A Coruña Province > Santiago de Compostela (0.05)
(7 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.69)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Hematology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

arXiv.org Artificial IntelligenceJan-30-2024

Dynamic Model Switching for Improved Accuracy in Machine Learning

Hasani, Syed Tahir Abbas

In the dynamic landscape of machine learning, where datasets vary widely in size and complexity, selecting the most effective model poses a significant challenge. Rather than fixating on a single model, our research propels the field forward with a novel emphasis on dynamic model switching. This paradigm shift allows us to harness the inherent strengths of different models based on the evolving size of the dataset. Consider the scenario where CatBoost demonstrates exceptional efficacy in handling smaller datasets, providing nuanced insights and accurate predictions. However, as datasets grow in size and intricacy, XGBoost, with its scalability and robustness, becomes the preferred choice. Our approach introduces an adaptive ensemble that intuitively transitions between CatBoost and XGBoost. This seamless switching is not arbitrary; instead, it's guided by a user-defined accuracy threshold, ensuring a meticulous balance between model sophistication and data requirements. The user sets a benchmark, say 80% accuracy, prompting the system to dynamically shift to the new model only if it guarantees improved performance. This dynamic model-switching mechanism aligns with the evolving nature of data in real-world scenarios. It offers practitioners a flexible and efficient solution, catering to diverse dataset sizes and optimising predictive accuracy at every juncture. Our research, therefore, stands at the forefront of innovation, redefining how machine learning models adapt and excel in the face of varying dataset dynamics.

dataset, mechanism, scenario, (14 more...)

2404.18932

Country: Asia > India (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.76)

Pfeifer, Bastian, Sirocchi, Christel, Bloice, Marcus D., Kreuzthaler, Markus, Urschler, Martin

Federated unsupervised random forest for privacy-preserving patient stratification

arXiv.org Artificial IntelligenceJan-29-2024

In the realm of precision medicine, effective patient stratification and disease subtyping demand innovative methodologies tailored for multi-omics data. Clustering techniques applied to multi-omics data have become instrumental in identifying distinct subgroups of patients, enabling a finer-grained understanding of disease variability. This work establishes a powerful framework for advancing precision medicine through unsupervised random-forest-based clustering and federated computing. We introduce a novel multi-omics clustering approach utilizing unsupervised random-forests. The unsupervised nature of the random forest enables the determination of cluster-specific feature importance, unraveling key molecular contributors to distinct patient groups. Moreover, our methodology is designed for federated execution, a crucial aspect in the medical domain where privacy concerns are paramount. We have validated our approach on machine learning benchmark data sets as well as on cancer data from The Cancer Genome Atlas (TCGA). Our method is competitive with the state-of-the-art in terms of disease subtyping, but at the same time substantially improves the cluster interpretability. Experiments indicate that local clustering performance can be improved through federated computing.

affinity matrix, experiment, unsupervised random forest, (14 more...)

2401.16094

Country:

Europe > Italy (0.04)
Europe > Austria > Styria > Graz (0.04)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

arXiv.org Artificial IntelligenceJan-26-2024

Empirical and Experimental Insights into Machine Learning-Based Defect Classification in Semiconductor Wafers

Taha, Kamal

This survey paper offers a comprehensive review of methodologies utilizing machine learning (ML) classification techniques for identifying wafer defects in semiconductor manufacturing. Despite the growing body of research demonstrating the effectiveness of ML in wafer defect identification, there is a noticeable absence of comprehensive reviews on this subject. This survey attempts to fill this void by amalgamating available literature and providing an in-depth analysis of the advantages, limitations, and potential applications of various ML classification algorithms in the realm of wafer defect detection. An innovative taxonomy of methodologies that we present provides a detailed classification of algorithms into more refined categories and techniques. This taxonomy follows a three-tier structure, starting from broad methodology categories and ending with specific techniques. It aids researchers in comprehending the complex relationships between different algorithms and their techniques. We employ a rigorous empirical and experimental evaluation to rank these varying techniques. For the empirical evaluation, we assess techniques based on a set of five criteria. The experimental evaluation ranks the algorithms employing the same techniques, sub-categories, and categories. Also the paper illuminates the future prospects of ML classification techniques for wafer defect identification, underscoring potential advancements and opportunities for further research in this field

classification, dataset, defect, (15 more...)

2310.10705

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > Texas (0.04)
North America > United States > New York > Saratoga County > Saratoga Springs (0.04)
(6 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Semiconductors & Electronics (1.00)
Information Technology > Hardware (0.89)
Health & Medicine (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)