AITopics | synthetic point

Collaborating Authors

synthetic point

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Simplicial SMOTE: Oversampling Solution to the Imbalanced Learning Problem

Kachan, Oleg, Savchenko, Andrey, Gusev, Gleb

arXiv.org Artificial IntelligenceMar-5-2025

SMOTE (Synthetic Minority Oversampling Technique) is the established geometric approach to random oversampling to balance classes in the imbalanced learning problem, followed by many extensions. Its idea is to introduce synthetic data points of the minor class, with each new point being the convex combination of an existing data point and one of its k-nearest neighbors. In this paper, by viewing SMOTE as sampling from the edges of a geometric neighborhood graph and borrowing tools from the topological data analysis, we propose a novel technique, Simplicial SMOTE, that samples from the simplices of a geometric neighborhood simplicial complex. A new synthetic point is defined by the barycentric coordinates w.r.t. a simplex spanned by an arbitrary number of data points being sufficiently close rather than a pair. Such a replacement of the geometric data model results in better coverage of the underlying data distribution compared to existing geometric sampling methods and allows the generation of synthetic points of the minority class closer to the majority class on the decision boundary. We experimentally demonstrate that our Simplicial SMOTE outperforms several popular geometric sampling methods, including the original SMOTE. Moreover, we show that simplicial sampling can be easily integrated into existing SMOTE extensions. We generalize and evaluate simplicial extensions of the classic Borderline SMOTE, Safe-level SMOTE, and ADASYN algorithms, all of which outperform their graph-based counterparts.

simplicial smote, smote, synthetic point, (15 more...)

arXiv.org Artificial Intelligence

2503.03418

Country:

North America > Canada > Ontario > Toronto (0.05)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Asia > Russia (0.04)
(3 more...)

Genre: Research Report > Promising Solution (0.34)

Industry:

Education > Focused Education > Special Education (0.61)
Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.68)

Add feedback

A Little Human Data Goes A Long Way

Ashok, Dhananjay, May, Jonathan

arXiv.org Artificial IntelligenceOct-16-2024

Faced with an expensive human annotation process, creators of NLP systems increasingly turn to synthetic data generation. While this method shows promise, the extent to which synthetic data can replace human annotation is poorly understood. We investigate the use of synthetic data in Fact Verification (FV) and Question Answering (QA) by studying the effects of incrementally replacing human generated data with synthetic points on eight diverse datasets. Strikingly, replacing up to 90% of the training data only marginally decreases performance, but replacing the final 10% leads to severe declines. We find that models trained on purely synthetic data can be reliably improved by including as few as 125 human generated data points. We show that matching the performance gain of just a little additional human data (only 200 points) requires an order of magnitude more synthetic data and estimate price ratios at which human annotation would be a more cost-effective solution. Our results suggest that even when human annotation at scale is infeasible, there is great value to having a small proportion of the dataset being human generated.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.13098

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California (0.14)
Asia > Thailand > Bangkok > Bangkok (0.04)
(7 more...)

Genre: Research Report > New Finding (1.00)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Add feedback