Dissimilar Batch Decompositions of Random Datasets

Apr-9-2025–arXiv.org Machine Learning

Ghurumuruhan Ganesan, IISER Bhopal

Abstract: For better learning, large datasets are often split into small batches and fed sequentially to the predictive model. In this paper, we study such batch decompositions from a probabilistic perspective. We assume that data points (possibly corrupted) are drawn independently from a given space and define a concept of similarity between two data points. We then consider decompositions that restrict the amount of similarity within each batch and obtain high-probability bounds for the minimum size. We demonstrate an inherent tradeoff between relaxing the similarity constraint and the overall size, and also use martingale methods to obtain bounds for the maximum size of data subsets with a given similarity.
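To make the idea of a similarity-restricted batch decomposition concrete, here is a minimal sketch in Python. It is not the paper's construction: the similarity rule (two real numbers within a distance `threshold`), the first-fit greedy strategy, and the names `similar`, `greedy_batches`, and `max_similar` are all assumptions introduced for illustration. The greedy procedure only yields some valid decomposition, so its batch count is an upper bound on the minimum size, not the minimum itself.

```python
def similar(x, y, threshold):
    """Hypothetical similarity rule: two real-valued points are similar
    if they differ by at most `threshold` (an assumption, not the paper's)."""
    return abs(x - y) <= threshold


def greedy_batches(points, threshold, max_similar=0):
    """First-fit greedy decomposition (illustrative only).

    Every point ends up in a batch where it is similar to at most
    `max_similar` other members. With max_similar=0, each batch contains
    no similar pair at all.
    """
    batches = []  # each batch is a list of [point, similar_count] entries
    for p in points:
        for batch in batches:
            neighbours = [e for e in batch if similar(p, e[0], threshold)]
            # The new point must respect the limit, and so must each
            # existing member once its own count is incremented.
            if len(neighbours) <= max_similar and all(
                e[1] < max_similar for e in neighbours
            ):
                for e in neighbours:
                    e[1] += 1
                batch.append([p, len(neighbours)])
                break
        else:
            # No existing batch can take the point: open a new one.
            batches.append([[p, 0]])
    return [[e[0] for e in batch] for batch in batches]


if __name__ == "__main__":
    data = [0.0, 0.05, 0.5, 0.55, 1.0]
    strict = greedy_batches(data, threshold=0.1, max_similar=0)
    relaxed = greedy_batches(data, threshold=0.1, max_similar=1)
    print(len(strict), strict)
    print(len(relaxed), relaxed)
```

Raising `max_similar` relaxes the constraint and can only make it easier for a point to join an existing batch under this greedy rule, which gives a toy view of the tradeoff between the similarity constraint and the number of batches mentioned in the abstract.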

artificial intelligence, decomposition, machine learning, (16 more...)

Country:
- Europe > United Kingdom
  - England > Oxfordshire > Oxford (0.04)
- Asia > India
  - Madhya Pradesh > Bhopal (0.25)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Representation & Reasoning (0.66)
