arxiv preprint
Towards Federated Foundation Models: Scalable Dataset Pipelines for Group-Structured Learning Zachary Charles
We introduce Dataset Grouper, a library to create large-scale group-structured (e.g., federated) datasets, enabling federated learning simulation at the scale of foundation models. This library facilitates the creation of group-structured versions of existing datasets based on user-specified partitions, and directly leads to a variety of useful heterogeneous datasets that can be plugged into existing software frameworks. Dataset Grouper offers three key advantages. First, it scales to settings where even a single group's dataset is too large to fit in memory. Second, it provides flexibility, both in choosing the base (non-partitioned) dataset and in defining partitions.
Country:
- North America > United States > Virginia (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Technology:
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Country:
- North America > United States (0.14)
- Africa > Malawi (0.05)
- Europe > France > Île-de-France > Paris > Paris (0.04)
Technology:
Country:
- North America > United States > New York > New York County > New York City (0.14)
- North America > United States > California > Los Angeles County > Long Beach (0.14)
- Europe > Austria (0.04)
- (21 more...)
Technology:
Country:
- Asia > Singapore (0.04)
- Europe > Austria (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- (4 more...)
Technology:
Country:
- North America > United States > California > Los Angeles County > Long Beach (0.14)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (9 more...)
Technology:
Industry:
- Transportation (0.93)
- Government > Regional Government (0.46)
Technology:
Country:
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > Netherlands (0.04)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
- (4 more...)
Technology:
Country:
- Asia > China > Shanghai > Shanghai (0.04)
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- (4 more...)
Technology:
Country:
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Asia > Middle East > Jordan (0.04)
Industry:
- Health & Medicine > Surgery (1.00)
- Health & Medicine > Health Care Technology (1.00)
Technology:
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Country:
- North America > United States > California > Los Angeles County > Long Beach (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- (18 more...)
Technology:
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Data Science (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)