AITopics

2312.01555

Country:

Asia > Middle East > Saudi Arabia (0.14)
Europe > United Kingdom (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
(24 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Transportation (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)
(14 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Odeh, Omar Husni, Arram, Anas, Njoum, Murad

Classification of Spam URLs Using Machine Learning Approaches

arXiv.org Artificial IntelligenceDec-3-2023

The Internet is used by billions of users every day because it offers fast and free communication tools and platforms. Nevertheless, with this significant increase in usage, huge amounts of spam are generated every second, which wastes internet resources and, more importantly, users' time. This study investigates the use of machine learning models to classify URLs as spam or nonspam. We first extract the features from the URL as it has only one feature, and then we compare the performance of several models, including k nearest neighbors, bagging, random forest, logistic regression, and others. Experimental results demonstrate that bagging outperformed other models and achieved the highest accuracy of 98.64%. In addition, bagging outperformed the current state-of-the-art approaches which emphasize its effectiveness in addressing spam-related challenges on the Internet. This suggests that bagging is a promising approach for URL spam classification.

algorithm, classifier, dataset, (13 more...)

2310.05953

Country:

North America > United States (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Promising Solution (0.86)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.69)
(2 more...)

Diakonikolas, Ilias, Kane, Daniel M., Pensia, Ankit, Pittas, Thanasis

Near-Optimal Algorithms for Gaussians with Huber Contamination: Mean Estimation and Linear Regression

arXiv.org Machine LearningDec-3-2023

We study the fundamental problems of Gaussian mean estimation and linear regression with Gaussian covariates in the presence of Huber contamination. Our main contribution is the design of the first sample near-optimal and almost linear-time algorithms with optimal error guarantees for both of these problems. Specifically, for Gaussian robust mean estimation on $\mathbb{R}^d$ with contamination parameter $\epsilon \in (0, \epsilon_0)$ for a small absolute constant $\epsilon_0$, we give an algorithm with sample complexity $n = \tilde{O}(d/\epsilon^2)$ and almost linear runtime that approximates the target mean within $\ell_2$-error $O(\epsilon)$. This improves on prior work that achieved this error guarantee with polynomially suboptimal sample and time complexity. For robust linear regression, we give the first algorithm with sample complexity $n = \tilde{O}(d/\epsilon^2)$ and almost linear runtime that approximates the target regressor within $\ell_2$-error $O(\epsilon)$. This is the first polynomial sample and time algorithm achieving the optimal error guarantee, answering an open question in the literature. At the technical level, we develop a methodology that yields almost-linear time algorithms for multi-directional filtering that may be of broader interest.

artificial intelligence, machine learning, mean estimation and linear regression, (3 more...)

2312.01547

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.80)

del Val, David, Berrendero, José R., Suárez, Alberto

Relation between PLS and OLS regression in terms of the eigenvalue distribution of the regressor covariance matrix

arXiv.org Machine LearningDec-3-2023

Partial least squares (PLS) is a dimensionality reduction technique introduced in the field of chemometrics and successfully employed in many other areas. The PLS components are obtained by maximizing the covariance between linear combinations of the regressors and of the target variables. In this work, we focus on its application to scalar regression problems. PLS regression consists in finding the least squares predictor that is a linear combination of a subset of the PLS components. Alternatively, PLS regression can be formulated as a least squares problem restricted to a Krylov subspace. This equivalent formulation is employed to analyze the distance between ${\hat{\boldsymbol\beta}\;}_{\mathrm{PLS}}^{\scriptscriptstyle {(L)}}$, the PLS estimator of the vector of coefficients of the linear regression model based on $L$ PLS components, and $\hat{\boldsymbol \beta}_{\mathrm{OLS}}$, the one obtained by ordinary least squares (OLS), as a function of $L$. Specifically, ${\hat{\boldsymbol\beta}\;}_{\mathrm{PLS}}^{\scriptscriptstyle {(L)}}$ is the vector of coefficients in the aforementioned Krylov subspace that is closest to $\hat{\boldsymbol \beta}_{\mathrm{OLS}}$ in terms of the Mahalanobis distance with respect to the covariance matrix of the OLS estimate. We provide a bound on this distance that depends only on the distribution of the eigenvalues of the regressor covariance matrix. Numerical examples on synthetic and real-world data are used to illustrate how the distance between ${\hat{\boldsymbol\beta}\;}_{\mathrm{PLS}}^{\scriptscriptstyle {(L)}}$ and $\hat{\boldsymbol \beta}_{\mathrm{OLS}}$ depends on the number of clusters in which the eigenvalues of the regressor covariance matrix are grouped.

artificial intelligence, eigenvalue, machine learning, (19 more...)

2312.01379

Country:

North America > United States > California (0.05)
Europe > Spain > Galicia > Madrid (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(4 more...)

Genre:

Workflow (0.46)
Research Report (0.40)

Industry:

Health & Medicine (0.67)
Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)

Duval, Francis, Boucher, Jean-Philippe, Pigeon, Mathieu

Telematics Combined Actuarial Neural Networks for Cross-Sectional and Longitudinal Claim Count Data

arXiv.org Machine LearningDec-3-2023

We present novel cross-sectional and longitudinal claim count models for vehicle insurance built upon the Combined Actuarial Neural Network (CANN) framework proposed by Mario W\"uthrich and Michael Merz. The CANN approach combines a classical actuarial model, such as a generalized linear model, with a neural network. This blending of models results in a two-component model comprising a classical regression model and a neural network part. The CANN model leverages the strengths of both components, providing a solid foundation and interpretability from the classical model while harnessing the flexibility and capacity to capture intricate relationships and interactions offered by the neural network. In our proposed models, we use well-known log-linear claim count regression models for the classical regression part and a multilayer perceptron (MLP) for the neural network part. The MLP part is used to process telematics car driving data given as a vector characterizing the driving behavior of each insured driver. In addition to the Poisson and negative binomial distributions for cross-sectional data, we propose a procedure for training our CANN model with a multivariate negative binomial (MVNB) specification. By doing so, we introduce a longitudinal model that accounts for the dependence between contracts from the same insured. Our results reveal that the CANN models exhibit superior performance compared to log-linear models that rely on manually engineered telematics features.

artificial intelligence, contract, machine learning, (19 more...)

2308.01729

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > United States > California (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.87)

Industry:

Transportation > Ground > Road (1.00)
Banking & Finance > Insurance (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

arXiv.org Artificial IntelligenceDec-2-2023

Interpretable AI-Driven Discovery of Terrain-Precipitation Relationships for Enhanced Climate Insights

Xu, Hao, Chen, Yuntian, Zeng, Zhenzhong, Li, Nina, Li, Jian, Zhang, Dongxiao

Despite the remarkable strides made by AI-driven models in modern precipitation forecasting, these black-box models cannot inherently deepen the comprehension of underlying mechanisms. To address this limitation, we propose an AI-driven knowledge discovery framework known as genetic algorithm-geographic weighted regression (GA-GWR). Our approach seeks to unveil the explicit equations that govern the intricate relationship between precipitation patterns and terrain characteristics in regions marked by complex terrain. Through this AI-driven knowledge discovery, we uncover previously undisclosed explicit equations that shed light on the connection between terrain features and precipitation patterns. These equations demonstrate remarkable accuracy when applied to precipitation data, outperforming conventional empirical models. Notably, our research reveals that the parameters within these equations are dynamic, adapting to evolving climate patterns. Ultimately, the unveiled equations have practical applications, particularly in fine-scale downscaling for precipitation predictions using low-resolution future climate data. This capability offers invaluable insights into the anticipated changes in precipitation patterns across diverse terrains under future climate scenarios, which enhances our ability to address the challenges posed by contemporary climate science.

artificial intelligence, machine learning, precipitation, (19 more...)

2309.154

Country:

Europe (0.28)
Asia > China > Guangdong Province (0.14)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.66)

Oryngozha, Nazzere, Shamoi, Pakizar, Igali, Ayan

Detection and Analysis of Stress-Related Posts in Reddit Acamedic Communities

arXiv.org Artificial IntelligenceDec-2-2023

Nowadays, the significance of monitoring stress levels and recognizing early signs of mental illness cannot be overstated. Automatic stress detection in text can proactively help manage stress and protect mental well-being. In today's digital era, social media platforms reflect the psychological well-being and stress levels within various communities. This study focuses on detecting and analyzing stress-related posts in Reddit academic communities. Due to online education and remote work, these communities have become central for academic discussions and support. We classify text as stressed or not using natural language processing and machine learning classifiers, with Dreaddit as our training dataset, which contains labeled data from Reddit. Next, we collect and analyze posts from various academic subreddits. We identified that the most effective individual feature for stress detection is the Bag of Words, paired with the Logistic Regression classifier, achieving a 77.78% accuracy rate and an F1 score of 0.79 on the DReaddit dataset. This combination also performs best in stress detection on human-annotated datasets, with a 72% accuracy rate. Our key findings reveal that posts and comments in professors Reddit communities are the most stressful, compared to other academic levels, including bachelor, graduate, and Ph.D. students. This research contributes to our understanding of the stress levels within academic communities. It can help academic institutions and online communities develop measures and interventions to address this issue effectively.

dataset, stress detection, student, (16 more...)

doi: 10.1109/ACCESS.2024.3357662

2312.0105

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > Canada (0.04)
Europe > United Kingdom > England (0.04)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)
Education > Educational Setting (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

arXiv.org Machine LearningDec-1-2023

Supervised Machine Learning and Physics based Machine Learning approach for prediction of peak temperature distribution in Additive Friction Stir Deposition of Aluminium Alloy

Mishra, Akshansh

Additive friction stir deposition (AFSD) is a novel solid-state additive manufacturing technique that circumvents issues of porosity, cracking, and properties anisotropy that plague traditional powder bed fusion and directed energy deposition approaches. However, correlations between process parameters, thermal profiles, and resulting microstructure in AFSD remain poorly understood. This hinders process optimization for properties. This work employs a framework combining supervised machine learning (SML) and physics-informed neural networks (PINNs) to predict peak temperature distribution in AFSD from process parameters. Eight regression algorithms were implemented for SML modeling, while four PINNs leveraged governing equations for transport, wave propagation, heat transfer, and quantum mechanics. Across multiple statistical measures, ensemble techniques like gradient boosting proved superior for SML, with lowest MSE of 165.78. The integrated ML approach was also applied to classify deposition quality from process factors, with logistic regression delivering robust accuracy. By fusing data-driven learning and fundamental physics, this dual methodology provides comprehensive insights into tailoring microstructure through thermal management in AFSD. The work demonstrates the power of bridging statistical and physics-based modeling for elucidating AM process-property relationships.

algorithm, prediction, regression, (13 more...)

2309.06838

Country: Europe > Italy > Lombardy > Milan (0.04)

Genre: Research Report (1.00)

Industry: Materials > Metals & Mining > Aluminum (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

Svoboda, Radek, Basterrech, Sebastian, Kozal, Jędrzej, Platoš, Jan, Woźniak, Michał

A Natural Gas Consumption Forecasting System for Continual Learning Scenarios based on Hoeffding Trees with Change Point Detection Mechanism

arXiv.org Artificial IntelligenceNov-30-2023

Forecasting natural gas consumption, considering seasonality and trends, is crucial in planning its supply and consumption and optimizing the cost of obtaining it, mainly by industrial entities. However, in times of threats to its supply, it is also a critical element that guarantees the supply of this raw material to meet individual consumers' needs, ensuring society's energy security. This article introduces a novel multistep ahead forecasting of natural gas consumption with change point detection integration for model collection selection with continual learning capabilities using data stream processing. The performance of the forecasting models based on the proposed approach is evaluated in a complex real-world use case of natural gas consumption forecasting. We employed Hoeffding tree predictors as forecasting models and the Pruned Exact Linear Time (PELT) algorithm for the change point detection procedure. The change point detection integration enables selecting a different model collection for successive time frames. Thus, three model collection selection procedures (with and without an error feedback loop) are defined and evaluated for forecasting scenarios with various densities of detected change points. These models were compared with change point agnostic baseline approaches. Our experiments show that fewer change points result in a lower forecasting error regardless of the model collection selection procedure employed. Also, simpler model collection selection procedures omitting forecasting error feedback leads to more robust forecasting models suitable for continual learning tasks.

change point, data mining, machine learning, (19 more...)

2309.0372

Country:

Europe > Czechia (0.46)
Europe > Poland (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Denmark > Capital Region > Kongens Lyngby (0.14)

Genre:

Research Report > New Finding (1.00)
Overview (0.93)

Industry: Energy > Oil & Gas > Downstream (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Berlanger, Nick, van Ophoven, Noah, Verdonck, Tim, Wilms, Ines

Tree-based Forecasting of Day-ahead Solar Power Generation from Granular Meteorological Features

arXiv.org Artificial IntelligenceNov-30-2023

Accurate forecasts for day-ahead photovoltaic (PV) power generation are crucial to support a high PV penetration rate in the local electricity grid and to assure stability in the grid. We use state-of-the-art tree-based machine learning methods to produce such forecasts and, unlike previous studies, we hereby account for (i) the effects various meteorological as well as astronomical features have on PV power production, and this (ii) at coarse as well as granular spatial locations. To this end, we use data from Belgium and forecast day-ahead PV power production at an hourly resolution. The insights from our study can assist utilities, decision-makers, and other stakeholders in optimizing grid operations, economic dispatch, and in facilitating the integration of distributed PV power into the electricity grid.

configuration, forecast, model configuration, (15 more...)

2312.0009

Country:

Europe > Austria > Vienna (0.14)
Europe > Sweden > Stockholm > Stockholm (0.04)
Europe > Portugal > Coimbra > Coimbra (0.04)
(5 more...)

Genre: Research Report (0.64)

Industry:

Energy > Renewable > Solar (1.00)
Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)