foundation model


Towards Federated Foundation Models: Scalable Dataset Pipelines for Group-Structured Learning
Zachary Charles

Neural Information Processing Systems

We introduce Dataset Grouper, a library to create large-scale group-structured (e.g., federated) datasets, enabling federated learning simulation at the scale of foundation models. This library facilitates the creation of group-structured versions of existing datasets based on user-specified partitions, and directly leads to a variety of useful heterogeneous datasets that can be plugged into existing software frameworks. Dataset Grouper offers three key advantages. First, it scales to settings where even a single group's dataset is too large to fit in memory. Second, it provides flexibility, both in choosing the base (non-partitioned) dataset and in defining partitions.
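The core idea, partitioning a flat dataset into per-group shards via a user-specified partition function, can be sketched in plain Python. This is a minimal illustration only, not Dataset Grouper's actual API; the function and variable names here are hypothetical. The streaming variant shows why a generator-based design avoids materializing any single group's data in memory.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical sketch of group-structured partitioning (NOT Dataset Grouper's API).
# A partition function maps each example to a group id.

def partition_by_group(
    examples: Iterable[dict],
    group_fn: Callable[[dict], str],
) -> Dict[str, List[dict]]:
    """Eagerly bucket a small dataset by group id (fine for toy data)."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for ex in examples:
        groups[group_fn(ex)].append(ex)
    return dict(groups)

def iter_group(
    examples: Iterable[dict],
    group_fn: Callable[[dict], str],
    group_id: str,
) -> Iterator[dict]:
    """Streaming variant: yield one group's examples lazily, so even a
    group too large to fit in memory can be consumed example-by-example."""
    for ex in examples:
        if group_fn(ex) == group_id:
            yield ex

# Toy usage: partition text examples by author, mimicking a federated split.
data = [
    {"author": "a", "text": "hello"},
    {"author": "b", "text": "world"},
    {"author": "a", "text": "again"},
]
groups = partition_by_group(data, lambda ex: ex["author"])
```

Because the partition is just a user-supplied callable, the same mechanism supports any base dataset and any grouping criterion (author, client, topic, and so on).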


Segment Anything in 3D with NeRFs

Neural Information Processing Systems

We refer to the proposed solution as SA3D, for Segment Anything in 3D. The user only needs to provide a manual segmentation prompt (e.g., rough points) for the target object in a single view; SAM then uses this prompt to generate the object's 2D mask in that view.


Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models
Zhimin Chen

Neural Information Processing Systems

Foundation models have achieved remarkable results in 2D and language tasks like image segmentation, object detection, and visual-language understanding. However, their potential to enrich 3D scene representation learning is largely untapped due to the domain gap. In this work, we propose an innovative methodology called Bridge3D to address this gap by pre-training 3D models using features, semantic masks, and captions sourced from foundation models. Specifically, our method employs semantic masks from foundation models to guide the masking and reconstruction process for the masked autoencoder, enabling more focused attention on foreground representations.
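Mask-guided masking of this kind can be illustrated with a short sketch. This is not Bridge3D's code; it is a hypothetical weighted-sampling scheme in which tokens flagged as foreground by a foundation-model semantic mask are masked more often than background tokens, so the masked autoencoder must reconstruct, and therefore attend to, foreground structure.

```python
import numpy as np

# Hypothetical sketch of semantic-mask-guided token masking (not Bridge3D's code).
# is_foreground comes from a foundation-model semantic mask over the tokens.

def guided_mask(is_foreground: np.ndarray, mask_ratio: float = 0.6,
                fg_weight: float = 4.0, seed: int = 0) -> np.ndarray:
    """Return a boolean mask over tokens; foreground tokens are fg_weight
    times more likely to be selected for masking than background tokens."""
    rng = np.random.default_rng(seed)
    n = is_foreground.shape[0]
    weights = np.where(is_foreground, fg_weight, 1.0)
    probs = weights / weights.sum()
    n_masked = int(round(mask_ratio * n))
    idx = rng.choice(n, size=n_masked, replace=False, p=probs)
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    return mask

# 100 tokens, the first 30 foreground: foreground is masked
# disproportionately often relative to background.
fg = np.zeros(100, dtype=bool)
fg[:30] = True
m = guided_mask(fg)
```

Setting `fg_weight` to 1.0 recovers uniform random masking, so the scheme strictly generalizes the standard masked-autoencoder setup.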







Learning to see the physical world: an interview with Jiajun Wu

AIHub

What is your research area? My research topic, at a high level, hasn't changed much since my dissertation. It has always been the problem of physical scene understanding - building machines that see, reason about, and interact with the physical world. Besides learning algorithms, what are the levels of abstraction needed by AI systems in their representations, and where do they come from? I aim to answer these fundamental questions, drawing inspiration from nature, i.e., the physical world itself, and from human cognition.


Limitations

Neural Information Processing Systems

While our study identifies clear separations between model hypothesis classes, our best models still have not reached the consistency ceiling of the neural and behavioral benchmarks we have compared against. All models were simultaneously trained across all eight scenarios of the Physion Dynamics Training Set, constituting around 16,000 total training scenarios (2,000 scenes per scenario) [Bear et al., 2021]. Each C-SWM [Kipf et al., 2020] model was trained on [...]. For each stimulus, we compute the proportion of "hit" responses [...]. The Correlation to Average Human Response is the Pearson's correlation between the model probability-hit vector and the human proportion-hit vector, across stimuli per scenario. OCP Accuracy of humans and models is the average accuracy, across stimuli per scenario. To give the final values of the two quantities, we then compute the weighted mean and s.e.m. of the above per [...]. Note that these values are therefore different for each condition, but always the same across all models. All neural predictivities are reported on held-out conditions and their timepoints.
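The two per-scenario quantities described above can be computed as follows. This is an illustrative sketch with made-up toy data; the variable names (`model_hit`, `human_hit`, `labels`) and the 0.5 decision threshold are assumptions, not taken from the paper's code.

```python
import numpy as np

# Toy data for one scenario (hypothetical values, for illustration only):
#   model_hit[i] = model's probability of responding "hit" on stimulus i
#   human_hit[i] = proportion of human subjects responding "hit" on stimulus i
#   labels[i]    = ground-truth outcome (1 = hit, 0 = no hit)
model_hit = np.array([0.9, 0.2, 0.7, 0.4, 0.8])
human_hit = np.array([0.8, 0.3, 0.6, 0.5, 0.9])
labels    = np.array([1,   0,   1,   0,   1])

# Correlation to Average Human Response: Pearson's r between the model
# probability-hit vector and the human proportion-hit vector, across stimuli.
corr = np.corrcoef(model_hit, human_hit)[0, 1]

# Accuracy: threshold the hit probabilities (0.5 assumed here) and
# compare the resulting binary responses against the ground-truth labels.
model_acc = ((model_hit > 0.5).astype(int) == labels).mean()
human_acc = ((human_hit > 0.5).astype(int) == labels).mean()
```

The final reported numbers would then aggregate these per-scenario values via a weighted mean and s.e.m., as the text describes.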