South America
Unlocking the Potential of Global Human Expertise
Meyerson, Elliot, Francon, Olivier, Sargent, Darren, Hodjat, Babak, Miikkulainen, Risto
Solving societal problems on a global scale requires the collection and processing of ideas and methods from diverse sets of international experts. As the number and diversity of human experts increase, so does the likelihood that elements in this collective knowledge can be combined and refined to discover novel and better solutions. However, it is difficult to identify, combine, and refine complementary information in an increasingly large and diverse knowledge base. This paper argues that artificial intelligence (AI) can play a crucial role in this process. An evolutionary AI framework, termed RHEA, fills this role by distilling knowledge from diverse models created by human experts into equivalent neural networks, which are then recombined and refined in a population-based search. The framework was implemented in a formal synthetic domain, demonstrating that it is transparent and systematic. It was then applied to the results of the XPRIZE Pandemic Response Challenge, in which over 100 teams of experts across 23 countries submitted models based on diverse methodologies to predict COVID-19 cases and suggest non-pharmaceutical intervention policies for 235 nations, states, and regions across the globe. Building upon this expert knowledge, by recombining and refining the 169 resulting policy suggestion models, RHEA discovered a broader and more effective set of policies than either AI or human experts alone, as evaluated based on real-world data. The results thus suggest that AI can play a crucial role in realizing the potential of human expertise in global problem-solving.
NIMBA: Towards Robust and Principled Processing of Point Clouds With SSMs
Köprücü, Nursena, Okpekpe, Destiny, Orvieto, Antonio
Transformers have become dominant in large-scale deep learning tasks across various domains, including text, 2D and 3D vision. However, the quadratic complexity of their attention mechanism limits their efficiency as the sequence length increases, particularly in high-resolution 3D data such as point clouds. Recently, state space models (SSMs) like Mamba have emerged as promising alternatives, offering linear complexity, scalability, and high performance in long-sequence tasks. The key challenge in the application of SSMs in this domain lies in reconciling the non-sequential structure of point clouds with the inherently directional (or bi-directional) order-dependent processing of recurrent models like Mamba. To achieve this, previous research proposed reorganizing point clouds along multiple directions or predetermined paths in 3D space, concatenating the results to produce a single 1D sequence capturing different views. In our work, we introduce a method to convert point clouds into 1D sequences that maintain 3D spatial structure with no need for data replication, allowing Mamba sequential processing to be applied effectively in an almost permutation-invariant manner. In contrast to other works, we found that our method does not require positional embeddings and allows for shorter sequence lengths while still achieving state-of-the-art results in ModelNet40 and ScanObjectNN datasets and surpassing Transformer-based models in both accuracy and efficiency.
TurtleBench: A Visual Programming Benchmark in Turtle Geometry
Rismanchian, Sina, Razeghi, Yasaman, Singh, Sameer, Doroudi, Shayan
Humans have the ability to reason about geometric patterns in images and scenes from a young age. However, developing large multimodal models (LMMs) capable of similar reasoning remains a challenge, highlighting the need for robust evaluation methods to assess these capabilities. We introduce TurtleBench, a benchmark designed to evaluate LMMs' capacity to interpret geometric patterns -- given visual examples, textual instructions, or both -- and generate precise code outputs. Inspired by turtle geometry, a notion used to teach children foundational coding and geometric concepts, TurtleBench features tasks with patterned shapes that have underlying algorithmic logic. Our evaluation reveals that leading LMMs struggle significantly with these tasks, with GPT-4o achieving only 19\% accuracy on the simplest tasks and few-shot prompting only marginally improves their performance ($<2\%$). TurtleBench highlights the gap between human and AI performance in intuitive and visual geometrical understanding, setting the stage for future research in this area. TurtleBench stands as one of the few benchmarks to evaluate the integration of visual understanding and code generation capabilities in LMMs, setting the stage for future research. Code and Dataset for this paper is provided here: https://github.com/sinaris76/TurtleBench
Space for Improvement: Navigating the Design Space for Federated Learning in Satellite Constellations
Kim, Grace, Powell, Luca, Svoboda, Filip, Lane, Nicholas
Space has emerged as an exciting new application area for machine learning, with several missions equipping deep learning capabilities on-board spacecraft. Pre-processing satellite data through on-board training is necessary to address the satellite downlink deficit, as not enough transmission opportunities are available to match the high rates of data generation. To scale this effort across entire constellations, collaborated training in orbit has been enabled through federated learning (FL). While current explorations of FL in this context have successfully adapted FL algorithms for scenario-specific constraints, these theoretical FL implementations face several limitations that prevent progress towards real-world deployment. To address this gap, we provide a holistic exploration of the FL in space domain on several fronts. 1) We develop a method for space-ification of existing FL algorithms, evaluated on 2) FLySTacK, our novel satellite constellation design and hardware aware testing platform where we perform rigorous algorithm evaluations. Finally we introduce 3) AutoFLSat, a generalized, hierarchical, autonomous FL algorithm for space that provides a 12.5% to 37.5% reduction in model training time than leading alternatives.
Self-Healing Machine Learning: A Framework for Autonomous Adaptation in Real-World Environments
Rauba, Paulius, Seedat, Nabeel, Kacprzyk, Krzysztof, van der Schaar, Mihaela
Real-world machine learning systems often encounter model performance degradation due to distributional shifts in the underlying data generating process (DGP). Existing approaches to addressing shifts, such as concept drift adaptation, are limited by their reason-agnostic nature. By choosing from a pre-defined set of actions, such methods implicitly assume that the causes of model degradation are irrelevant to what actions should be taken, limiting their ability to select appropriate adaptations. In this paper, we propose an alternative paradigm to overcome these limitations, called self-healing machine learning (SHML). Contrary to previous approaches, SHML autonomously diagnoses the reason for degradation and proposes diagnosis-based corrective actions. We formalize SHML as an optimization problem over a space of adaptation actions to minimize the expected risk under the shifted DGP. We introduce a theoretical framework for self-healing systems and build an agentic self-healing solution H-LLM which uses large language models to perform self-diagnosis by reasoning about the structure underlying the DGP, and self-adaptation by proposing and evaluating corrective actions. Empirically, we analyze different components of H-LLM to understand why and when it works, demonstrating the potential of self-healing ML.
Enhancing Brain Tumor Classification Using TrAdaBoost and Multi-Classifier Deep Learning Approaches
Mohammadi, Mahin, Jamshidi, Saman
Brain tumors pose a serious health threat due to their rapid growth and potential for metastasis. While medical imaging has advanced significantly, accurately identifying and characterizing these tumors remains a challenge. This study addresses this challenge by leveraging the innovative TrAdaBoost methodology to enhance the Brain Tumor Segmentation (BraTS2020) dataset, aiming to improve the efficiency and accuracy of brain tumor classification. Our approach combines state-of-the-art deep learning algorithms, including the Vision Transformer (ViT), Capsule Neural Network (CapsNet), and convolutional neural networks (CNNs) such as ResNet-152 and VGG16. By integrating these models within a multi-classifier framework, we harness the strengths of each approach to achieve more robust and reliable tumor classification. A novel decision template is employed to synergistically combine outputs from different algorithms, further enhancing classification accuracy. To augment the training process, we incorporate a secondary dataset, "Brain Tumor MRI Dataset," as a source domain, providing additional data for model training and improving generalization capabilities. Our findings demonstrate a high accuracy rate in classifying tumor versus non-tumor images, signifying the effectiveness of our approach in the medical imaging domain. This study highlights the potential of advanced machine learning techniques to contribute significantly to the early and accurate diagnosis of brain tumors, ultimately improving patient outcomes.
ResiDual Transformer Alignment with Spectral Decomposition
Basile, Lorenzo, Maiorca, Valentino, Bortolussi, Luca, Rodolà, Emanuele, Locatello, Francesco
When examined through the lens of their residual streams, a puzzling property emerges in transformer networks: residual contributions (e.g., attention heads) sometimes specialize in specific tasks or input attributes. In this paper, we analyze this phenomenon in vision transformers, focusing on the spectral geometry of residuals, and explore its implications for modality alignment in vision-language models. First, we link it to the intrinsically low-dimensional structure of visual head representations, zooming into their principal components and showing that they encode specialized roles across a wide variety of input data distributions. Then, we analyze the effect of head specialization in multimodal models, focusing on how improved alignment between text and specialized heads impacts zero-shot classification performance. This specialization-performance link consistently holds across diverse pre-training data, network sizes, and objectives, demonstrating a powerful new mechanism for boosting zero-shot classification through targeted alignment. Ultimately, we translate these insights into actionable terms by introducing ResiDual, a technique for spectral alignment of the residual stream. Much like panning for gold, it lets the noise from irrelevant unit principal components (i.e., attributes) wash away to amplify task-relevant ones. Remarkably, this dual perspective on modality alignment yields fine-tuning level performances on different data distributions while modeling an extremely interpretable and parameter-efficient transformation, as we extensively show on more than 50 (pre-trained network, dataset) pairs.
SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation
Hong, Yining, Liu, Beide, Wu, Maxine, Zhai, Yuanhao, Chang, Kai-Wei, Li, Linjie, Lin, Kevin, Lin, Chung-Ching, Wang, Jianfeng, Yang, Zhengyuan, Wu, Yingnian, Wang, Lijuan
Human beings are endowed with a complementary learning system, which bridges the slow learning of general world dynamics with fast storage of episodic memory from a new experience. Previous video generation models, however, primarily focus on slow learning by pre-training on vast amounts of data, overlooking the fast learning phase crucial for episodic memory storage. This oversight leads to inconsistencies across temporally distant frames when generating longer videos, as these frames fall beyond the model's context window. To this end, we introduce SlowFast-VGen, a novel dual-speed learning system for action-driven long video generation. Our approach incorporates a masked conditional video diffusion model for the slow learning of world dynamics, alongside an inference-time fast learning strategy based on a temporal LoRA module. Specifically, the fast learning process updates its temporal LoRA parameters based on local inputs and outputs, thereby efficiently storing episodic memory in its parameters. We further propose a slow-fast learning loop algorithm that seamlessly integrates the inner fast learning loop into the outer slow learning loop, enabling the recall of prior multi-episode experiences for context-aware skill learning. To facilitate the slow learning of an approximate world model, we collect a large-scale dataset of 200k videos with language action annotations, covering a wide range of scenarios. Extensive experiments show that SlowFast-VGen outperforms baselines across various metrics for action-driven video generation, achieving an FVD score of 514 compared to 782, and maintaining consistency in longer videos, with an average of 0.37 scene cuts versus 0.89. The slow-fast learning loop algorithm significantly enhances performances on long-horizon planning tasks as well. Project Website: https://slowfast-vgen.github.io
Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies
Wang, Luping, Chen, Sheng, Jiang, Linnan, Pan, Shu, Cai, Runze, Yang, Sen, Yang, Fei
The large models, as predicted by scaling raw forecasts, have made groundbreaking progress in many fields, particularly in natural language generation tasks, where they have approached or even surpassed human levels. However, the unprecedented scale of their parameters brings significant computational and storage costs. These large models require substantial computational resources and GPU memory to operate. When adapting large models to specific downstream tasks, their massive parameter scale poses a significant challenge in fine-tuning on hardware platforms with limited computational power and GPU memory. To address this issue, Parameter-Efficient Fine-Tuning (PEFT) offers a practical solution by efficiently adjusting the parameters of large pre-trained models to suit various downstream tasks. Specifically, PEFT adjusts the parameters of pre-trained large models to adapt to specific tasks or domains, minimizing the introduction of additional parameters and the computational resources required. This review mainly introduces the preliminary knowledge of PEFT, the core ideas and principles of various PEFT algorithms, the applications of PEFT, and potential future research directions. By reading this review, we believe that interested parties can quickly grasp the PEFT methodology, thereby accelerating its development and innovation.
Using Deep Neural Networks to Quantify Parking Dwell Time
Ribas, Marcelo Eduardo Marques, Mendes, Heloisa Benedet, de Oliveira, Luiz Eduardo Soares, Zanlorensi, Luiz Antonio, de Almeida, Paulo Ricardo Lisboa
In smart cities, it is common practice to define a maximum length of stay for a given parking space to increase the space's rotativity and discourage the usage of individual transportation solutions. However, automatically determining individual car dwell times from images faces challenges, such as images collected from low-resolution cameras, lighting variations, and weather effects. In this work, we propose a method that combines two deep neural networks to compute the dwell time of each car in a parking lot. The proposed method first defines the parking space status between occupied and empty using a deep classification network. Then, it uses a Siamese network to check if the parked car is the same as the previous image. Using an experimental protocol that focuses on a cross-dataset scenario, we show that if a perfect classifier is used, the proposed system generates 75% of perfect dwell time predictions, where the predicted value matched exactly the time the car stayed parked. Nevertheless, our experiments show a drop in prediction quality when a real-world classifier is used to predict the parking space statuses, reaching 49% of perfect predictions, showing that the proposed Siamese network is promising but impacted by the quality of the classifier used at the beginning of the pipeline.