Goto

Collaborating Authors

 random forest regression model


Efficient Milling Quality Prediction with Explainable Machine Learning

arXiv.org Artificial Intelligence

This paper presents an explainable machine learning (ML) approach for predicting surface roughness in milling. Utilizing a dataset from milling aluminum alloy 2017A, the study employs random forest regression models and feature importance techniques. The key contributions include developing ML models that accurately predict various roughness values and identifying redundant sensors, particularly those for measuring normal cutting force. Our experiments show that removing certain sensors can reduce costs without sacrificing predictive accuracy, highlighting the potential of explainable machine learning to improve cost-effectiveness in machining.


Solving Boltzmann Optimization Problems with Deep Learning

arXiv.org Artificial Intelligence

Decades of exponential scaling in high performance computing (HPC) efficiency is coming to an end. Transistor based logic in complementary metal-oxide semiconductor (CMOS) technology is approaching physical limits beyond which further miniaturization will be impossible. Future HPC efficiency gains will necessarily rely on new technologies and paradigms of compute. The Ising model shows particular promise as a future framework for highly energy efficient computation. Ising systems are able to operate at energies approaching thermodynamic limits for energy consumption of computation. Ising systems can function as both logic and memory. Thus, they have the potential to significantly reduce energy costs inherent to CMOS computing by eliminating costly data movement. The challenge in creating Ising-based hardware is in optimizing useful circuits that produce correct results on fundamentally nondeterministic hardware. The contribution of this paper is a novel machine learning approach, a combination of deep neural networks and random forests, for efficiently solving optimization problems that minimize sources of error in the Ising model. In addition, we provide a process to express a Boltzmann probability optimization problem as a supervised machine learning problem.


Learning the Factors Controlling Mineralization for Geologic Carbon Sequestration

arXiv.org Artificial Intelligence

We perform a set of flow and reactive transport simulations within three-dimensional fracture networks to learn the factors controlling mineral reactions. CO$_2$ mineralization requires CO$_2$-laden water, dissolution of a mineral that then leads to precipitation of a CO$_2$-bearing mineral. Our discrete fracture networks (DFN) are partially filled with quartz that gradually dissolves until it reaches a quasi-steady state. At the end of the simulation, we measure the quartz remaining in each fracture within the domain. We observe that a small backbone of fracture exists, where the quartz is fully dissolved which leads to increased flow and transport. However, depending on the DFN topology and the rate of dissolution, we observe a large variability of these changes, which indicates an interplay between the fracture network structure and the impact of geochemical dissolution. In this work, we developed a machine learning framework to extract the important features that support mineralization in the form of dissolution. In addition, we use structural and topological features of the fracture network to predict the remaining quartz volume in quasi-steady state conditions. As a first step to characterizing carbon mineralization, we study dissolution with this framework. We studied a variety of reaction and fracture parameters and their impact on the dissolution of quartz in fracture networks. We found that the dissolution reaction rate constant of quartz and the distance to the flowing backbone in the fracture network are the two most important features that control the amount of quartz left in the system. For the first time, we use a combination of a finite-volume reservoir model and graph-based approach to study reactive transport in a complex fracture network to determine the key features that control dissolution.


A quantitative study of NLP approaches to question difficulty estimation

arXiv.org Artificial Intelligence

Recent years witnessed an increase in the amount of research on the task of Question Difficulty Estimation from Text (QDET) with Natural Language Processing (NLP) techniques, with the goal of targeting the limitations of traditional approaches to question calibration. However, almost the entirety of previous research focused on single silos, without performing quantitative comparisons between different models or across datasets from different educational domains. In this work, we aim at filling this gap, by quantitatively analyzing several approaches proposed in previous research, and comparing their performance on three publicly available real world datasets containing questions of different types from different educational domains. Specifically, we consider reading comprehension Multiple Choice Questions (MCQs), science MCQs, and math questions. We find that Transformer based models are the best performing across different educational domains, with DistilBERT performing almost as well as BERT, and that they outperform other approaches even on smaller datasets. As for the other models, the hybrid ones often outperform the ones based on a single type of features, the ones based on linguistic features perform well on reading comprehension questions, while frequency based features (TF-IDF) and word embeddings (word2vec) perform better in domain knowledge assessment.


Physical Systems Modeled Without Physical Laws

arXiv.org Artificial Intelligence

Physics-based simulations typically operate with a combination of complex differentiable equations and many scientific and geometric inputs. Our work involves gathering data from those simulations and seeing how well tree-based machine learning methods can emulate desired outputs without "knowing" the complex backing involved in the simulations. The selected physics-based simulations included Navier-Stokes, stress analysis, and electromagnetic field lines to benchmark performance as numerical and statistical algorithms. We specifically focus on predicting specific spatial-temporal data between two simulation outputs and increasing spatial resolution to generalize the physics predictions to finer test grids without the computational costs of repeating the numerical calculation.


A Supervised Learning Approach to Rankability

arXiv.org Machine Learning

The rankability of data is a recently proposed problem that considers the ability of a dataset, represented as a graph, to produce a meaningful ranking of the items it contains. To study this concept, a number of rankability measures have recently been proposed, based on comparisons to a complete dominance graph via combinatorial and linear algebraic methods. In this paper, we review these measures and highlight some questions to which they give rise before going on to propose new methods to assess rankability, which are amenable to efficient estimation. Finally, we compare these measures by applying them to both synthetic and real-life sports data.


Random Forest Regression

#artificialintelligence

A few weeks ago, I wrote an article demonstrating random forest classification models. In this article, we will demonstrate the regression case of random forest using sklearn's RandomForrestRegressor() model. Similarly to my last article, I will begin this article by highlighting some definitions and terms relating to and comprising the backbone of the random forest machine learning. The goal of this article is to describe the random forest model, and demonstrate how it can be applied using the sklearn package. Our goal will not be to solve for the most optimal solution as this is just a basic guide.


Machine Learning Basics: Random Forest Regression

#artificialintelligence

Previously, I had explained the various Regression models such as Linear, Polynomial, Support Vector and Decision Tree Regression. In this article, we will go through the code for the application of Random Forest Regression which is an extension to the Decision Tree Regression implemented previously. The Decision Tree is an easily understood and interpreted algorithm and hence a single tree may not be enough for the model to learn the features from it. On the other hand, Random Forest is also a "Tree"-based algorithm that uses the qualities features of multiple Decision Trees for making decisions. Therefore, it can be referred to as a'Forest' of trees and hence the name "Random Forest".