AITopics

doi: 10.1007/978-3-031-17801-6_6

2208.00647

Country:

North America > United States > New Jersey > Mercer County > Princeton (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
Europe > France > Hauts-de-France > Oise > Compiègne (0.04)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.38)

arXiv.org Artificial Intelligence, Aug-1-2022

How should we proxy for race/ethnicity? Comparing Bayesian improved surname geocoding to machine learning methods

Decter-Frain, Ari

Political science research often requires constructing a race/ethnicity proxy variable for datasets that do not contain it, such as voter registration files, lists of electoral candidates, or political donation records. Constructing such a proxy is an important step for conducting ecological inference in voting rights litigation (Barreto et al. [2019], Imai and Khanna [2016]), redistricting (DeLuca and Curiel [2022], Kenny et al. [2021]), and substantive research on the role of race/ethnicity in politics (Enos [2016], Enos et al. [2019], Grumbach and Sahn [2020]). The most common method for proxying race/ethnicity is Bayesian Improved Surname Geocoding (BISG), which uses Bayes' rule to compute a probability distribution over race/ethnicity categories conditional on a voter's surname and where they live (Elliott et al. [2008, 2009]). BISG has attained widespread popularity due to its parsimony, computational efficiency, and superior performance when compared to existing alternatives, namely spatial interpolation of Census racial-ethnic composition from Census geographies (Imai and Khanna [2016], Clark et al. [2021], Shah and Davis [2017]). While BISG performs well compared to the small suite of existing alternatives, it has not yet been benchmarked against machine learning (ML) models, which can produce race/ethnicity predictions from more flexible and potentially more accurate models. In this paper I present the results of such a benchmark. I train a range of machine learning models using voter registration data from Florida, Georgia, North Carolina, and a portion of California, where voters self-report their race/ethnicity upon registration. The registries in these states contain over 26 million labelled observations, which amounts to a non-representative sample of more than five percent of the United States electorate. I then compare BISG against predictions from these models made out-of-state.
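The Bayes' rule update behind BISG can be illustrated with a minimal sketch. It assumes the standard formulation in which surname and location are treated as conditionally independent given race/ethnicity, so that P(race | surname, location) is proportional to P(race | surname) × P(location | race). The function name `bisg_posterior` and every probability below are made up for illustration; they are not values from the paper or from Census tables.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Return P(race | surname, geography) via Bayes' rule.

    Assumes surname and geography are conditionally independent given
    race/ethnicity (the usual BISG simplification).
    """
    # Numerator of Bayes' rule for each category.
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    # Normalizing constant: the sum of the numerators over all categories.
    total = sum(unnormalized.values())
    return {race: value / total for race, value in unnormalized.items()}


# Hypothetical inputs for a single voter (illustrative numbers only).
p_race_given_surname = {"white": 0.60, "black": 0.30, "hispanic": 0.10}
p_geo_given_race = {"white": 0.001, "black": 0.004, "hispanic": 0.002}

posterior = bisg_posterior(p_race_given_surname, p_geo_given_race)
for race, prob in posterior.items():
    print(f"{race}: {prob:.2f}")
```

With these toy inputs, the surname alone favors the first category, but the hypothetical location is four times as likely under the second one, so the posterior shifts toward it.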

bisg, race ethnicity, rmse

2206.14583

Country:

North America > United States > Georgia (0.55)
North America > United States > North Carolina (0.25)
Oceania > Australia > Victoria > Melbourne (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Government > Voting & Elections (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Takko, Tuomas, Bhattacharya, Kunal, Lehto, Martti, Jalasvirta, Pertti, Cederberg, Aapo, Kaski, Kimmo

Knowledge mining of unstructured information: application to cyber-domain

arXiv.org Artificial Intelligence, Aug-1-2022

Information on cyber-related crimes, incidents, and conflicts is abundantly available in numerous open online sources. However, processing the large volumes and streams of data is a challenging task for the analysts and experts, and entails the need for newer methods and techniques. In this article we present and implement a novel knowledge graph and knowledge mining framework for extracting the relevant information from free-form text about incidents in the cyberdomain. The framework includes a machine learning based pipeline for generating graphs of organizations, countries, industries, products and attackers with a non-technical cyber-ontology. The extracted knowledge graph is utilized to estimate the incidence of cyberattacks on a given graph configuration. We use publicly available collections of real cyber-incident reports to test the efficacy of our methods. The knowledge extraction is found to be sufficiently accurate, and the graph-based threat estimation demonstrates a level of correlation with the actual records of attacks. In practical use, an analyst utilizing the presented framework can infer additional information from the current cyber-landscape in terms of risk to various entities and propagation of the risk heuristic between industries and countries.

information, knowledge graph, unstructured information

2109.03848

Country:

North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
Europe > United Kingdom (0.04)
Europe > Norway (0.04)

Genre: Research Report > New Finding (0.94)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.93)
Government > Military > Cyberwarfare (0.91)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.51)

arXiv.org Artificial Intelligence, Jul-31-2022

Intelligent decision-making method of TBM operating parameters based on multiple constraints and objective optimization

Liu, Bin, Wang, Jiwen, Wang, Ruirui, Wang, Yaxu, Zhao, Guangzu

The choice of tunnel boring machine (TBM) operating parameters is important for guiding safe and efficient TBM construction, and it has been one of the research hotspots in the field of TBM tunneling. To this end, this paper introduces rock-breaking rules into a machine learning method and establishes a high-accuracy rock-machine mapping that is dual-driven by physical rules and data mining. These dual-driven mappings are subsequently used as the objective function and constraints to build a decision-making method for TBM operating parameters. By searching for the revolutions per minute and penetration corresponding to the extremum of the objective function subject to the constraints, the optimal operating parameters can be obtained. The method is verified in the field at the Second Water Source Channel of Hangzhou, China, where the average penetration rate increased by 11.3% and the total cost decreased by 10.0%, which demonstrates the practicability and effectiveness of the developed decision-making model.

artificial intelligence, constraint, machine learning

2208.00404

Country:

Asia > China > Shandong Province (0.28)
Asia > China > Zhejiang Province > Hangzhou (0.25)

Genre: Research Report > New Finding (0.93)

Industry: Energy > Oil & Gas > Upstream (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

#artificialintelligence, Jul-30-2022, 21:25:58 GMT

[100%OFF] Logistic Regression In R Studio

In this section we will learn what machine learning means and what the different terms associated with it mean. You will see some examples so that you understand what machine learning actually is. The section also covers the steps involved in building a machine learning model: not just linear models, but any machine learning model.

business problem, learning, machine learning

Genre:

Instructional Material > Course Syllabus & Notes (1.00)
Research Report (0.90)

Industry: Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

#artificialintelligence, Jul-30-2022, 07:30:50 GMT

Beyond Linear Regression

Linear regression is among the primary, entry-level Machine Learning (ML) models. It would not be wrong to call it the "Hello world" program of data scientists. Finding the linear regression coefficients β_1, …, β_p means finding the "best" linear combination of variables that approximates the response. Said differently, it means finding the coefficients that minimize the mean squared error (MSE). It's possible to endow the regression coefficients with some extra properties by minimizing the MSE plus an additional penalty term.
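As a rough illustration of "MSE plus a penalty term", the sketch below fits a single coefficient with no intercept and uses a squared (ridge-style) penalty, which is only one possible choice of penalty. The function name `ridge_slope` and the data are hypothetical. Setting the derivative of mean((y − βx)²) + λβ² to zero gives the closed form β = Σxy / (Σx² + nλ).

```python
def ridge_slope(xs, ys, penalty):
    """Minimize mean((y - beta * x)^2) + penalty * beta^2 over beta.

    One feature, no intercept; penalty = 0 recovers ordinary least squares.
    """
    n = len(xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Closed-form minimizer obtained by setting the derivative to zero.
    return sum_xy / (sum_xx + n * penalty)


# Hypothetical data lying exactly on the line y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

beta_ols = ridge_slope(xs, ys, penalty=0.0)    # unpenalized fit
beta_ridge = ridge_slope(xs, ys, penalty=1.0)  # penalized fit
print(beta_ols, beta_ridge)
```

The penalized coefficient is pulled toward zero relative to the unpenalized one, which is the "extra property" this particular penalty buys; other penalties encourage other properties, such as sparsity.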

categorical variable, coefficient, group lasso

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

#artificialintelligence, Jul-29-2022, 21:00:25 GMT

Top 10 Machine Learning Algorithms Explained

Linear Regression: In statistics, linear regression predicts the value of a dependent variable from one or more independent variables. The relationship is modeled by fitting a line to the dependent and independent variables; this line is called the regression line and is represented by Y = a*X + b, where Y is the dependent variable (for example, weight), X is the independent variable (e.g., height), b is the intercept, and a is the slope. Logistic Regression: In logistic regression, the data are classified by fitting an equation. This method is used to predict a discrete dependent variable from a set of independent variables. Its goal is to find the best-fit set of parameters. In this classifier, each feature is multiplied by a weight, and the products are summed.
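A minimal sketch of fitting the regression line by ordinary least squares for one independent variable is given here. The helper name `fit_line` and the data points are hypothetical; the slope is the covariance of X and Y divided by the variance of X, and the intercept follows from the means.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a single independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # Intercept: forces the line through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical points lying exactly on the line y = 2x + 1.
x_values = [1.0, 2.0, 3.0, 4.0]
y_values = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(x_values, y_values)
print(slope, intercept)
```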

algorithm, decision tree, regression

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.99)

#artificialintelligence, Jul-29-2022, 13:57:23 GMT

Logistic Regression

The output of a Linear Regression is unbounded: it can take any real value and is not limited to the range 0 to 1. Linear Regression can give values larger than 1 or less than 0, which is not desirable for a classification problem. Logistic Regression, on the other hand, as we have seen above, squeezes the output between 0 and 1, which is more desirable for a classification problem. Linear Regression is based on linear algebra, whereas Logistic Regression uses probability.
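The "squeezing" is usually done with the logistic (sigmoid) function applied to a weighted sum of the features. The sketch below shows the idea; the weights, bias, and feature values are hypothetical, and `predict_probability` is an illustrative name rather than any library's API.

```python
import math


def sigmoid(z):
    """Logistic function; mathematically maps any real z into (0, 1)."""
    # Two branches avoid overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def predict_probability(features, weights, bias):
    """Weighted sum of the features, then squashed by the sigmoid."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)


# Hypothetical parameters and one hypothetical observation.
weights = [2.0, -1.0]
bias = 0.5
features = [3.0, 1.0]

linear_score = bias + sum(w * x for w, x in zip(weights, features))
probability = predict_probability(features, weights, bias)
print(linear_score, probability)
```

Here the raw linear score is 5.5, well outside the range 0 to 1, while the value after the sigmoid can be read as a probability.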

linear regression, logistic regression, regression

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

#artificialintelligence, Jul-29-2022, 00:04:08 GMT

Understanding Neural Networks -- Part 1/3: Intuition of Forward Propagation

Basically, a neural network (NN) is just a type of ML algorithm that was built to emulate connections in a brain. It can be used for classification and regression tasks. Today, we're going to go over a classification task. The big thing about NNs is that they are "universal function approximators," meaning they can approximate any function (duh). Compare this with linear regression, which can ONLY approximate linear functions. The first layer is called the input layer and has as many neurons as we have features in our data.
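To make forward propagation concrete, here is a minimal sketch of a tiny network with two input features, one hidden layer of two neurons, and a single output neuron. The layer sizes, weights, biases, and the choice of a sigmoid activation are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Logistic activation (fine for the small values used here)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases):
    """Each neuron takes a weighted sum of its inputs, adds a bias, applies sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


def forward(features, layers):
    """Pass the features through each layer in turn and return the final activations."""
    activations = features  # input layer: one value per feature
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


# Hypothetical parameters: (weights, biases) per layer, one weight row per neuron.
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),  # hidden layer, 2 neurons
    ([[1.0, -1.0]], [0.0]),                     # output layer, 1 neuron
]

output = forward([1.0, 1.0], layers)
print(output)
```

The single output lies between 0 and 1, so for a binary classification task it can be thresholded (for example at 0.5) to pick a class.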

equation, neuron, vector

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.38)

arXiv.org Artificial Intelligence, Jul-29-2022

Robust Rayleigh Regression Method for SAR Image Processing in Presence of Outliers

Palm, B. G., Bayer, F. M., Machado, R., Pettersson, M. I., Vu, V. T., Cintra, R. J.

The presence of outliers (anomalous values) in synthetic aperture radar (SAR) data and misspecification of statistical image models may result in inaccurate inferences. To avoid such issues, the Rayleigh regression model based on a robust estimation process is proposed as a more realistic approach to model this type of data. This paper aims at obtaining Rayleigh regression model parameter estimators that are robust to the presence of outliers. The proposed approach uses the weighted maximum likelihood method and was evaluated in numerical experiments using simulated and measured SAR images. Monte Carlo simulations were employed for the numerical assessment of the proposed robust estimator's performance in finite signal lengths, its sensitivity to outliers, and its breakdown point. For instance, the non-robust estimators show a relative bias value 65-fold larger than the results provided by the robust approach in corrupted signals. In terms of sensitivity analysis and breakdown point, the robust scheme resulted in a reduction of about 96% and 10%, respectively, in the mean absolute value of both measures, in comparison to the non-robust estimators. Moreover, two SAR data sets were used to compare the ground type and anomaly detection results of the proposed robust scheme with competing methods in the literature.
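The idea of down-weighting outliers in a likelihood can be sketched in a much simpler setting than the paper's: estimating a single Rayleigh scale parameter σ from i.i.d. samples, with no regression structure. Maximizing the weighted log-likelihood Σ w_i log f(x_i; σ) gives σ² = Σ w_i x_i² / (2 Σ w_i). The weights and data below are hand-picked and hypothetical; the abstract does not specify the paper's weighting scheme, so this is not the authors' estimator.

```python
import math


def weighted_rayleigh_scale(samples, weights):
    """Weighted maximum likelihood estimate of the Rayleigh scale sigma.

    Closed form for i.i.d. samples: sigma^2 = sum(w * x^2) / (2 * sum(w)).
    Equal weights give the ordinary maximum likelihood estimate.
    """
    weighted_sq = sum(w * x * x for w, x in zip(weights, samples))
    return math.sqrt(weighted_sq / (2.0 * sum(weights)))


# Hypothetical signal; the last value plays the role of an outlier.
samples = [1.0, 1.2, 0.8, 1.1, 10.0]

# Ordinary estimate: every sample counts equally.
plain = weighted_rayleigh_scale(samples, [1.0] * len(samples))

# Hand-picked weights (an assumption) that switch the outlier off.
robust = weighted_rayleigh_scale(samples, [1.0, 1.0, 1.0, 1.0, 0.0])

print(plain, robust)
```

A single corrupted sample inflates the unweighted estimate several times over, while the weighted estimate stays close to the scale of the remaining samples.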

outlier, rayleigh regression model, regression model

doi: 10.1109/TGRS.2021.3105694

2208.00097

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Sweden (0.04)
South America > Brazil > Pernambuco (0.04)

Genre: Research Report (0.83)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.89)