AITopics

doi: 10.1016/j.eswa.2023.122478

2402.00654

Country:

North America > United States > Tennessee > Knox County > Knoxville (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Tennessee > Anderson County > Oak Ridge (0.04)
(12 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Transportation > Freight & Logistics Services (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Health & Medicine (0.95)
(3 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(3 more...)

Gultepe, Eren, Wang, Sen, Blomquist, Byron, Fernando, Harindra J. S., Kreidl, O. Patrick, Delene, David J., Gultepe, Ismail

Generative Nowcasting of Marine Fog Visibility in the Grand Banks area and Sable Island in Canada

arXiv.org Artificial Intelligence, Feb-9-2024

This study presents the application of generative deep learning techniques to evaluate marine fog visibility nowcasting using the FATIMA (Fog and turbulence interactions in the marine atmosphere) campaign observations collected during July 2022 in the North Atlantic in the Grand Banks area and vicinity of Sable Island (SI), northeast of Canada. The measurements were collected using the Vaisala Forward Scatter Sensor model FD70 and Weather Transmitter model WXT50, and Gill R3A ultrasonic anemometer mounted on the Research Vessel Atlantic Condor. To perform nowcasting, the time series of fog visibility (Vis), wind speed, dew point depression, and relative humidity with respect to water were preprocessed to have lagged time step features. Generative nowcasting of Vis time series for lead times of 30 and 60 minutes was performed using conditional generative adversarial networks (cGAN) regression at visibility thresholds of Vis < 1 km and < 10 km. Extreme gradient boosting (XGBoost) was used as a baseline method for comparison against cGAN. At the 30 min lead time, Vis was best predicted with cGAN at Vis < 1 km (RMSE = 0.151 km) and with XGBoost at Vis < 10 km (RMSE = 2.821 km). At the 60 min lead time, Vis was best predicted with XGBoost at Vis < 1 km (RMSE = 0.167 km) and Vis < 10 km (RMSE = 3.508 km), but the cGAN RMSE was similar to XGBoost. Although nowcasting Vis at 30 min is quite difficult, the ability of the cGAN model to track the variation in Vis at 1 km suggests that there is potential for generative analysis of marine fog visibility using observational meteorological parameters.
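The "lagged time step features" mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' preprocessing: the function name `make_lagged_rows`, the window length, the lead expressed in time steps, and the sample readings are all assumptions made for illustration, and only a single variable is used where the study combines several.

```python
def make_lagged_rows(series, n_lags, lead):
    """Build (features, target) pairs from a univariate series.

    features: the n_lags most recent values up to time t, oldest first
    target:   the value observed `lead` steps after time t
    """
    rows = []
    # t must have n_lags values behind it and `lead` values ahead of it.
    for t in range(n_lags - 1, len(series) - lead):
        features = series[t - n_lags + 1 : t + 1]
        target = series[t + lead]
        rows.append((features, target))
    return rows


# Hypothetical visibility readings in km, one per time step (not campaign data).
vis_km = [9.0, 7.5, 4.0, 1.2, 0.8, 0.6, 0.9, 2.5]
rows = make_lagged_rows(vis_km, n_lags=3, lead=2)
for features, target in rows:
    print(features, "->", target)
```

Each row pairs a short history window with the value to be predicted, which is the tabular shape a regressor such as XGBoost expects.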

min lead time, prediction, visibility, (13 more...)

2402.068

Country:

North America > United States > North Dakota > Grand Forks County > Grand Forks (0.14)
North America > United States > Florida > Duval County > Jacksonville (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
(11 more...)

Genre: Research Report > New Finding (0.46)

Industry: Transportation > Air (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

arXiv.org Artificial Intelligence, Feb-8-2024

Comparison of machine learning and statistical approaches for digital elevation model (DEM) correction: interim results

Okolie, Chukwuma, Adeleke, Adedayo, Smit, Julian, Mills, Jon, Maduako, Iyke, Ogbeta, Caleb

Several methods have been proposed for correcting the elevation bias in digital elevation models (DEMs), for example, linear regression. Nowadays, supervised machine learning enables the modelling of complex relationships between variables, and has been deployed by researchers in a variety of fields. In the existing literature, several studies have adopted either machine learning or statistical approaches in the task of DEM correction. However, to our knowledge, none of these studies have compared the performance of both approaches, especially with regard to open-access global DEMs. Our previous work has already shown the potential of machine learning approaches, specifically gradient boosted decision trees (GBDTs), for DEM correction. In this study, we share some results from the comparison of three recent implementations of gradient boosted decision trees (XGBoost, LightGBM and CatBoost) versus multiple linear regression (MLR) for enhancing the vertical accuracy of 30 m Copernicus and AW3D global DEMs in Cape Town, South Africa.

correction, landscape, university, (12 more...)

2402.06688

Country:

Africa > South Africa > Western Cape > Cape Town (0.28)
North America > United States > Oregon (0.05)
Europe > United Kingdom (0.05)
(3 more...)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.79)

Surve, Tanmay, Pradhan, Romila

Example-based Explanations for Random Forests using Machine Unlearning

arXiv.org Artificial Intelligence, Feb-7-2024

Tree-based machine learning models, such as decision trees and random forests, have been hugely successful in classification tasks primarily because of their predictive power in supervised learning tasks and ease of interpretation. Despite their popularity and power, these models have been found to produce unexpected or discriminatory outcomes. Given their overwhelming success for most tasks, it is of interest to identify sources of their unexpected and discriminatory behavior. However, there has not been much work on understanding and debugging tree-based classifiers in the context of fairness. We introduce FairDebugger, a system that utilizes recent advances in machine unlearning research to identify training data subsets responsible for instances of fairness violations in the outcomes of a random forest classifier. FairDebugger generates top-$k$ explanations (in the form of coherent training data subsets) for model unfairness. Toward this goal, FairDebugger first utilizes machine unlearning to estimate the change in the tree structures of the random forest when parts of the underlying training data are removed, and then leverages the Apriori algorithm from frequent itemset mining to reduce the subset search space. We empirically evaluate our approach on three real-world datasets, and demonstrate that the explanations generated by FairDebugger are consistent with insights from prior studies on these datasets.

contribution, explanation, subset, (15 more...)

2402.05007

Country:

North America > United States > California (0.14)
North America > United States > New York > New York County > New York City (0.05)
North America > Canada (0.04)

Genre: Research Report (0.82)

Industry:

Law > Civil Rights & Constitutional Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
(2 more...)

arXiv.org Artificial Intelligence, Feb-7-2024

Exposing propaganda: an analysis of stylistic cues comparing human annotations and machine classification

Faye, Géraud, Icard, Benjamin, Casanova, Morgane, Chanson, Julien, Maine, François, Bancilhon, François, Gadek, Guillaume, Gravier, Guillaume, Égré, Paul

This paper investigates the language of propaganda and its stylistic features. It presents the PPN dataset, standing for Propagandist Pseudo-News, a multisource, multilingual, multimodal dataset composed of news articles extracted from websites identified as propaganda sources by expert agencies. A limited sample from this set was randomly mixed with papers from the regular French press, and their URLs masked, to conduct a human annotation experiment using 11 distinct labels. The results show that human annotators were able to reliably discriminate between the two types of press across each of the labels. We propose different NLP techniques to identify the cues used by the annotators, and to compare them with machine classification. They include the analyzer VAGO to measure discourse vagueness and subjectivity, a TF-IDF to serve as a baseline, and four different classifiers: two RoBERTa-based models, CATS using syntax, and one XGBoost combining syntactic and semantic features.

corpus, dataset, propaganda, (16 more...)

2402.0378

Country:

Asia > Russia (0.68)
Europe > France (0.15)
Europe > Ukraine > Kyiv Oblast > Kyiv (0.14)
(6 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Media > News (0.95)
Government > Regional Government > Europe Government > Russia Government (0.46)
Government > Regional Government > Asia Government > Russia Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.89)

Sharma, Hemlata, Harsora, Hitesh, Ogunleye, Bayode

An Optimal House Price Prediction Algorithm: XGBoost

arXiv.org Artificial Intelligence, Feb-6-2024

Accurate prediction of house prices is a fundamental requirement for various sectors including real estate and mortgage lending. It is widely recognized that a property's value is not solely determined by its physical attributes but is significantly influenced by its surrounding neighbourhood. Meeting the diverse housing needs of individuals while balancing budget constraints is a primary concern for real estate developers. To this end, we addressed the house price prediction problem as a regression task and thus employed various machine learning techniques capable of expressing the significance of independent variables. We made use of the housing dataset of Ames City in Iowa, USA to compare support vector regressor, random forest regressor, XGBoost, multilayer perceptron and multiple linear regression algorithms for house price prediction. Afterwards, we identified the key factors that influence housing costs. Our results show that XGBoost is the best performing model for house price prediction.

prediction, price prediction, regression, (13 more...)

doi: 10.3390/analytics3010003

2402.04082

Country:

North America > United States > Iowa (0.24)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
(24 more...)

Genre: Research Report > New Finding (1.00)

Industry: Banking & Finance > Real Estate (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

da Cunha, Arthur, Larsen, Kasper Green, Ritzert, Martin

Boosting, Voting Classifiers and Randomized Sample Compression Schemes

In boosting, we aim to leverage multiple weak learners to produce a strong learner. At the center of this paradigm lies the concept of building the strong learner as a voting classifier, which outputs a weighted majority vote of the weak learners. While many successful boosting algorithms, such as the iconic AdaBoost, produce voting classifiers, their theoretical performance has long remained sub-optimal: the best known bounds on the number of training examples necessary for a voting classifier to obtain a given accuracy have so far always contained at least two logarithmic factors above what is known to be achievable by general weak-to-strong learners. In this work, we break this barrier by proposing a randomized boosting algorithm that outputs voting classifiers whose generalization error contains a single logarithmic dependency on the sample size. We obtain this result by building a general framework that extends sample compression methods to support randomized learning algorithms based on sub-sampling.
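A voting classifier in the sense described above can be sketched in a few lines. This shows only the prediction rule (the sign of a weighted sum of weak-learner votes), not the paper's randomized boosting algorithm; the threshold stumps and the weights in `alphas` are hypothetical values chosen for illustration rather than learned from data.

```python
def weighted_vote(weak_learners, weights, x):
    """Return +1 or -1: the sign of the weighted sum of weak-learner votes."""
    score = sum(w * h(x) for h, w in zip(weak_learners, weights))
    # Ties (score == 0) are broken in favour of +1 by convention here.
    return 1 if score >= 0 else -1


# Three hypothetical threshold classifiers on a scalar input.
stumps = [
    lambda x: 1 if x > 0.0 else -1,
    lambda x: 1 if x > 2.0 else -1,
    lambda x: 1 if x > 4.0 else -1,
]
# Hypothetical vote weights; a boosting algorithm would learn these.
alphas = [0.5, 1.0, 0.7]

print(weighted_vote(stumps, alphas, 3.0))  # votes +1, +1, -1: score 0.8
print(weighted_vote(stumps, alphas, 1.0))  # votes +1, -1, -1: score -1.2
```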

algorithm 1, compression scheme, voting classifier, (14 more...)

2402.02976

Country:

Europe > Germany > Lower Saxony > Gottingen (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Austria > Styria > Graz (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.46)

A Survey of Privacy Threats and Defense in Vertical Federated Learning: From Model Life Cycle Perspective

Yu, Lei, Han, Meng, Li, Yiming, Lin, Changting, Zhang, Yao, Zhang, Mingyang, Liu, Yan, Weng, Haiqin, Jeon, Yuseok, Chow, Ka-Ho, Patterson, Stacy

Vertical Federated Learning (VFL) is a federated learning paradigm where multiple participants, who share the same set of samples but hold different features, jointly train machine learning models. Although VFL enables collaborative machine learning without sharing raw data, it is still susceptible to various privacy threats. In this paper, we conduct the first comprehensive survey of the state-of-the-art in privacy attacks and defenses in VFL. We provide taxonomies for both attacks and defenses, based on their characterizations, and discuss open challenges and future research directions. Specifically, our discussion is structured around the model's life cycle, by delving into the privacy threats encountered during different stages of machine learning and their corresponding countermeasures. This survey not only serves as a resource for the research community but also offers clear guidance and actionable insights for practitioners to safeguard data privacy throughout the model's life cycle.

federated learning, gradient, learning, (14 more...)

2402.03688

Country:

Europe > Austria > Vienna (0.14)
Asia > South Korea > Ulsan > Ulsan (0.04)
Asia > China > Hong Kong (0.04)
(15 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.68)

Industry:

Information Technology > Security & Privacy (1.00)
Transportation > Ground > Road (0.45)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.93)
(2 more...)

Machine Learning Resistant Amorphous Silicon Physically Unclonable Functions (PUFs)

Kilic, Velat, Macfarlane, Neil, Stround, Jasper, Metais, Samuel, Alemohammad, Milad, Cooper, A. Brinton, Foster, Amy C., Foster, Mark A.

Many crypto protocols rely heavily on the security of keys that are stored in device memory and are susceptible to malware attacks. Physically unclonable functions (PUFs) have been proposed as an alternative [1]; their response to an external stimulus (the challenge) is determined by their microscopic structure, which is difficult to clone. PUF operation consists of two phases: enrollment and deployment. During the enrollment process, the manufacturer creates a challenge-response pair (CRP) library by probing the device with unique binary challenges and measuring/generating the corresponding digitized response. The CRP data set is then stored for the deployment phase, where the PUF device can be authenticated by probing it with a subset of challenges in the CRP data set and comparing the responses. To be a strong security primitive, a PUF must exhibit behavior that is i) deterministic, ii) unpredictable, and iii) unique.
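The enrollment and deployment phases can be sketched in software, with the caveat that a real PUF is a physical device. In this sketch the class `SimulatedPUF` is a hypothetical stand-in in which a hidden seed and a hash function play the role of the microscopic structure; the helper names, challenge size, and response size are likewise assumptions. Real responses are typically noisy and would be compared with an error tolerance rather than by exact equality.

```python
import hashlib
import random


class SimulatedPUF:
    """Software stand-in for a physical PUF (assumption for illustration)."""

    def __init__(self, seed: bytes):
        self._seed = seed  # plays the role of the device's hidden structure

    def respond(self, challenge: bytes) -> bytes:
        # Deterministic for a given device, different across devices.
        return hashlib.sha256(self._seed + challenge).digest()[:4]


def enroll(device, n_challenges, rng):
    """Enrollment: probe the device and store a challenge-response library."""
    library = {}
    for _ in range(n_challenges):
        challenge = rng.getrandbits(64).to_bytes(8, "big")
        library[challenge] = device.respond(challenge)
    return library


def authenticate(device, library, n_probe, rng):
    """Deployment: replay a random subset of challenges and compare responses."""
    probes = rng.sample(sorted(library), n_probe)
    return all(device.respond(c) == library[c] for c in probes)


rng = random.Random(0)
genuine = SimulatedPUF(b"device-A")
clone = SimulatedPUF(b"device-B")
crp_library = enroll(genuine, n_challenges=16, rng=rng)
genuine_ok = authenticate(genuine, crp_library, n_probe=4, rng=rng)
clone_ok = authenticate(clone, crp_library, n_probe=4, rng=rng)
print(genuine_ok, clone_ok)
```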

bit level, information, puf, (16 more...)

2402.02846

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.69)

Mohamed, Ahmed P., Lee, Byunghyun, Zhang, Yaguang, Hollingsworth, Max, Anderson, C. Robert, Krogmeier, James V., Love, David J.

Simulation-Enhanced Data Augmentation for Machine Learning Pathloss Prediction

Machine learning (ML) offers a promising solution to pathloss prediction. However, its effectiveness can be degraded by the limited availability of data. To alleviate these challenges, this paper introduces a novel simulation-enhanced data augmentation method for ML pathloss prediction. Our method integrates synthetic data generated from a cellular coverage simulator and independently collected real-world datasets. These datasets were collected through an extensive measurement campaign in different environments, including farms, hilly terrains, and residential areas. This comprehensive data collection provides vital ground truth for model training. A set of channel features was engineered, including geographical attributes derived from LiDAR datasets. These features were then used to train our prediction model, incorporating the highly efficient and robust gradient boosting ML algorithm, CatBoost. The integration of synthetic data, as demonstrated in our study, significantly improves the generalizability of the model in different environments, achieving a remarkable improvement of approximately 12 dB in terms of mean absolute error for the best-case scenario. Moreover, our analysis reveals that even a small fraction of measurements added to the simulation training set, with proper data balance, can significantly enhance the model's performance.

dataset, prediction, synthetic data, (15 more...)

2402.01969

Country:

North America > United States > Colorado > Boulder County > Boulder (0.14)
North America > United States > Virginia > Montgomery County > Blacksburg (0.04)
North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
(2 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Telecommunications (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.34)