Regression
An Evidential Neural Network Model for Regression Based on Random Fuzzy Numbers
We introduce a distance-based neural network model for regression, in which prediction uncertainty is quantified by a belief function on the real line. The model interprets the distances of the input vector to prototypes as pieces of evidence represented by Gaussian random fuzzy numbers (GRFNs) and combined by the generalized product intersection rule, an operator that extends Dempster's rule to random fuzzy sets. The network output is a GRFN that can be summarized by three numbers characterizing the most plausible predicted value, variability around this value, and epistemic uncertainty. Experiments with real datasets demonstrate the very good performance of the method as compared to state-of-the-art evidential and statistical learning algorithms.
How should we proxy for race/ethnicity? Comparing Bayesian improved surname geocoding to machine learning methods
Political science research often requires constructing a race/ethnicity proxy variable for datasets that do not contain it, like voter registration files, lists of electoral candidates, or political donation records. Constructing such a proxy is an important step for conducting ecological inference in voting rights litigation (Barreto et al. [2019], Imai and Khanna [2016]), redistricting (DeLuca and Curiel [2022], Kenny et al. [2021]), and substantive research on the role of race/ethnicity in politics (Enos [2016], Enos et al. [2019], Grumbach and Sahn [2020]). The most common method for proxying race/ethnicity is Bayesian Improved Surname Geocoding (BISG), which uses Bayes' rule to compute a probability distribution over race/ethnicity categories conditional on a voter's surname and where they live (Elliott et al. [2008, 2009]). BISG has attained widespread popularity due to its parsimony, computational efficiency, and superior performance when compared to existing alternatives, namely spatial interpolation of Census racial-ethnic composition from Census geographies (Imai and Khanna [2016], Clark et al. [2021], Shah and Davis [2017]). While BISG performs well compared to the small suite of existing alternatives, it has not yet been benchmarked against machine learning (ML) models, which can produce race/ethnicity predictions from more flexible and potentially more accurate models. In this paper I present the results of such a benchmark. I train a range of machine learning models using voter registration data from Florida, Georgia, North Carolina, and a portion of California where voters self-report their race/ethnicity upon registration. The registries in these states contain over 26 million labelled observations, which equates to greater than a five percent non-representative sample of the United States electorate. I then compare BISG against predictions from these models made out-of-state.
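The Bayes' rule step behind BISG can be sketched as follows. This is a minimal illustration, not the implementation benchmarked in the paper: it assumes surname and location are conditionally independent given race/ethnicity, and the function name, group labels, and probabilities are all hypothetical placeholders rather than Census figures.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Combine surname and geography evidence with Bayes' rule.

    Assumption: surname and location are conditionally independent
    given race/ethnicity, so the posterior is proportional to
    P(race | surname) * P(location | race).
    """
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    return {race: value / total for race, value in unnormalized.items()}


# Made-up illustrative inputs (hypothetical groups and probabilities).
p_race_given_surname = {"group_a": 0.6, "group_b": 0.3, "group_c": 0.1}
p_geo_given_race = {"group_a": 0.001, "group_b": 0.004, "group_c": 0.002}

posterior = bisg_posterior(p_race_given_surname, p_geo_given_race)
for race, probability in posterior.items():
    print(race, round(probability, 3))
```

With these made-up inputs the geography term shifts most of the probability mass from group_a to group_b, which is the kind of update the surname-plus-location conditioning is meant to produce.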
Knowledge mining of unstructured information: application to cyber-domain
Takko, Tuomas, Bhattacharya, Kunal, Lehto, Martti, Jalasvirta, Pertti, Cederberg, Aapo, Kaski, Kimmo
Information on cyber-related crimes, incidents, and conflicts is abundantly available in numerous open online sources. However, processing the large volumes and streams of data is a challenging task for the analysts and experts, and entails the need for newer methods and techniques. In this article we present and implement a novel knowledge graph and knowledge mining framework for extracting the relevant information from free-form text about incidents in the cyberdomain. The framework includes a machine learning based pipeline for generating graphs of organizations, countries, industries, products and attackers with a non-technical cyber-ontology. The extracted knowledge graph is utilized to estimate the incidence of cyberattacks on a given graph configuration. We use publicly available collections of real cyber-incident reports to test the efficacy of our methods. The knowledge extraction is found to be sufficiently accurate, and the graph-based threat estimation demonstrates a level of correlation with the actual records of attacks. In practical use, an analyst utilizing the presented framework can infer additional information from the current cyber-landscape in terms of risk to various entities and propagation of the risk heuristic between industries and countries.
Intelligent decision-making method of TBM operating parameters based on multiple constraints and objective optimization
Liu, Bin, Wang, Jiwen, Wang, Ruirui, Wang, Yaxu, Zhao, Guangzu
The decision-making of TBM operating parameters has an important guiding significance for safe and efficient TBM construction, and it has been one of the research hotspots in the field of TBM tunneling. For this purpose, this paper introduces rock-breaking rules into a machine learning method, and a high-accuracy rock-machine mapping dual-driven by physical rules and data mining is established. These dual-driven mappings are subsequently used as the objective function and constraints to build a decision-making method for TBM operating parameters. By searching for the revolutions per minute and penetration corresponding to the extremum of the objective function subject to the constraints, the optimal operating parameters can be obtained. This method is verified in the field at the Second Water Source Channel of Hangzhou, China, where the average penetration rate increased by 11.3% and the total cost decreased by 10.0%, which proves the practicability and effectiveness of the developed decision-making model.
[100%OFF] Logistic Regression In R Studio
In this section we will learn what Machine Learning means and what the different terms associated with machine learning mean. You will see some examples so that you understand what machine learning actually is. The section also covers the steps involved in building a machine learning model: not just linear models, but any machine learning model.
Beyond Linear Regression
Linear regression is among the entry-level Machine Learning (ML) models. It is fair to call it the "Hello world" program of data science. Finding the linear regression coefficients β_1, …, β_p means finding the "best" linear combination of variables that approximates the response, that is, the coefficients that minimize the mean squared error (MSE). It's possible to endow the regression coefficients with some extra properties by considering the MSE plus an additional penalty term.
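As a small illustration of "MSE plus a penalty", the sketch below assumes a squared (L2, ridge-style) penalty, a single feature, and no intercept. The excerpt does not say which penalty is meant, so this choice, the function names, and the toy data are assumptions. In this setting the minimizer has the closed form β = Σxy / (Σx² + nλ).

```python
def penalized_mse(beta, xs, ys, lam):
    """MSE of the one-feature model y ≈ beta * x, plus lam * beta**2."""
    n = len(xs)
    mse = sum((y - beta * x) ** 2 for x, y in zip(xs, ys)) / n
    return mse + lam * beta ** 2


def ridge_slope(xs, ys, lam):
    """Closed-form minimizer of penalized_mse for a single feature."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + n * lam)


# Toy data lying exactly on y = 2x (made up for illustration).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

plain = ridge_slope(xs, ys, lam=0.0)   # no penalty: ordinary least squares
shrunk = ridge_slope(xs, ys, lam=1.0)  # penalty pulls the slope toward 0
print(plain, shrunk)
```

The extra property gained here is shrinkage: the penalized coefficient is smaller in magnitude than the unpenalized one.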
Top 10 Machine Learning Algorithms Explained
Linear Regression: Linear regression is a statistical technique in which the value of the dependent variable is predicted from the independent variables. A relationship is formed by fitting a line to the dependent and independent variables, and that line is called the regression line. It is represented by Y = a*X + b, where Y is the dependent variable (for example, weight), X is the independent variable (e.g., height), b is the intercept, and a is the slope. Logistic Regression: In logistic regression, we have a lot of data whose classification is done by building an equation. This method is used to predict a discrete dependent variable from a set of independent variables. Its goal is to find the best-fit set of parameters. In this classifier, each feature is multiplied by a weight, and then all the products are added.
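The regression line Y = a*X + b can be fitted by ordinary least squares, as in the minimal sketch below. The function name and the toy data are assumptions made for illustration; the excerpt does not prescribe a fitting procedure.

```python
def fit_line(xs, ys):
    """Least-squares slope a and intercept b for the line Y = a*X + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx           # slope
    b = mean_y - a * mean_x  # intercept
    return a, b


# Toy data lying exactly on Y = 2*X + 1 (made up for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

a, b = fit_line(xs, ys)
predicted = a * 5.0 + b  # predict Y for a new X
print(a, b, predicted)
```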
Logistic Regression
The outcome of a Linear Regression can take any value, discrete or continuous, and is not confined to the range 0 to 1. Linear Regression can give values larger than 1 or less than 0, which is not desirable for a classification problem. Logistic Regression, on the other hand, as we have seen above, squeezes the output between 0 and 1, which is more desirable for a classification problem. Linear Regression is based on linear algebra, whereas Logistic Regression uses probability.
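The "squeezing" is done by the logistic (sigmoid) function applied to the same weighted sum a linear model would output. The sketch below is illustrative only: the weights, bias, and feature values are made up, and the function names are not from the article.

```python
import math


def sigmoid(z):
    """Map any real number into the interval [0, 1] (numerically stable)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def linear_score(features, weights, bias):
    """Unbounded output of a linear model: weighted sum plus bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias


# Hypothetical weights and inputs, chosen so the linear score exceeds 1.
weights = [1.5, -0.5]
bias = 0.25
features = [4.0, 1.0]

score = linear_score(features, weights, bias)  # can be any real number
probability = sigmoid(score)                   # always between 0 and 1
print(score, probability)
```

Thresholding `probability` at 0.5 is one common way to turn it into a class label, though the threshold is a design choice rather than part of the model.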
Understanding Neural Networks -- Part 1/3: Intuition of Forward Propagation
Basically, it's just a type of ML algorithm that was built to emulate connections in a brain. It can be used for classification and regression tasks. Today, we're going to go over a classification task. The big thing about NNs is that they are "universal function approximators," meaning they can approximate any function (duh). Compare this with linear regression which ONLY can approximate linear functions. The first layer is called the input layer and has as many neurons as we have features in our data.
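To make forward propagation concrete, here is a minimal sketch of one pass through a tiny network with three input features, two hidden neurons, and one output neuron. The layer sizes, sigmoid activations, and all weights are assumptions picked for illustration, not values from the article.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One fully connected layer: weighted sum per neuron, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def forward(features, hidden_w, hidden_b, output_w, output_b):
    """Propagate features through the hidden layer, then the output layer."""
    hidden = dense(features, hidden_w, hidden_b)
    return dense(hidden, output_w, output_b)[0]


# Input layer: one value per feature (three features here).
features = [0.5, -1.2, 3.0]
# Hypothetical weights: one row per neuron, one column per input.
hidden_w = [[0.1, -0.2, 0.05], [0.4, 0.3, -0.1]]
hidden_b = [0.0, 0.1]
output_w = [[0.7, -0.5]]
output_b = [0.2]

probability = forward(features, hidden_w, hidden_b, output_w, output_b)
print(probability)
```

For a classification task, the single output can be read as the predicted probability of the positive class.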
Robust Rayleigh Regression Method for SAR Image Processing in Presence of Outliers
Palm, B. G., Bayer, F. M., Machado, R., Pettersson, M. I., Vu, V. T., Cintra, R. J.
The presence of outliers (anomalous values) in synthetic aperture radar (SAR) data and the misspecification in statistical image models may result in inaccurate inferences. To avoid such issues, the Rayleigh regression model based on a robust estimation process is proposed as a more realistic approach to model this type of data. This paper aims at obtaining Rayleigh regression model parameter estimators robust to the presence of outliers. The proposed approach considered the weighted maximum likelihood method and was submitted to numerical experiments using simulated and measured SAR images. Monte Carlo simulations were employed for the numerical assessment of the proposed robust estimator performance in finite signal lengths, their sensitivity to outliers, and the breakdown point. For instance, the non-robust estimators show a relative bias value $65$-fold larger than the results provided by the robust approach in corrupted signals. In terms of sensitivity analysis and breakdown point, the robust scheme resulted in a reduction of about $96\%$ and $10\%$, respectively, in the mean absolute value of both measures, in comparison to the non-robust estimators. Moreover, two SAR data sets were used to compare the ground type and anomaly detection results of the proposed robust scheme with competing methods in the literature.