Do, Minh
Towards Scalable Foundation Model for Multi-modal and Hyperspectral Geospatial Data
Si, Haozhe, Wan, Yuxuan, Do, Minh, Vasisht, Deepak, Zhao, Han, Hamann, Hendrik F.
Geospatial raster data, such as that collected by satellite-based imaging systems at different times and spectral bands, hold immense potential for enabling a wide range of high-impact applications. This potential stems from the rich information that is spatially and temporally contextualized across multiple channels and sensing modalities. Recent work has adapted existing self-supervised learning approaches for such geospatial data. However, they fall short of scalable model architectures, leading to inflexibility and computational inefficiencies when faced with an increasing number of channels and modalities. To address these limitations, we introduce Low-rank Efficient Spatial-Spectral Vision Transformer with three key innovations: i) the LESS Attention Block that approximates high-dimensional spatial-spectral attention through Kronecker's product of the low-dimensional spatial and spectral attention components; ii) the Continuous Positional-Channel Embedding Layer that preserves both the continuity and physical characteristics of each spatial-spectral patch; and iii) the Perception Field Mask that exploits local spatial dependencies by constraining attention to neighboring patches. To evaluate the proposed innovations, we construct GFM-Bench, which serves as a comprehensive benchmark for such geospatial raster data. We pretrain LESS ViT using a Hyperspectral Masked Autoencoder framework with integrated positional and channel masking strategies. Experimental results demonstrate that our proposed method achieves competitive performance against state-of-the-art multi-modal geospatial foundation models while outperforming them on cross-satellite generalization tasks with higher computational efficiency. The flexibility and extensibility of our framework make it a promising direction for future geospatial data analysis tasks that involve a wide range of modalities and channels.
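The Kronecker-product idea behind the LESS Attention Block can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not specify the block's internals, so the shapes, variable names, and the `kron_attention` helper below are assumptions made for illustration. The sketch only shows that an attention matrix of size (N·C) × (N·C), built as the Kronecker product of an N × N spatial attention matrix and a C × C spectral attention matrix, can be applied without ever materializing the large matrix.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def kron_attention(a_spatial, a_spectral, values):
    """Apply kron(a_spatial, a_spectral) to values without forming it.

    a_spatial:  (N, N) row-stochastic attention over spatial patches
    a_spectral: (C, C) row-stochastic attention over channels
    values:     (N, C, D) one D-dimensional value per spatial-spectral patch
    """
    # Mix along the spatial axis, then along the spectral axis.
    out = np.einsum("nm,mcd->ncd", a_spatial, values)
    out = np.einsum("ck,nkd->ncd", a_spectral, out)
    return out


# Hypothetical toy sizes: 4 spatial patches, 3 channels, 2 feature dims.
rng = np.random.default_rng(0)
N, C, D = 4, 3, 2
a_spatial = softmax(rng.normal(size=(N, N)))
a_spectral = softmax(rng.normal(size=(C, C)))
values = rng.normal(size=(N, C, D))

# Dense reference: build the full (N*C, N*C) attention matrix explicitly.
full = np.kron(a_spatial, a_spectral)
dense_out = (full @ values.reshape(N * C, D)).reshape(N, C, D)

# Factorized version: never forms the (N*C, N*C) matrix.
fact_out = kron_attention(a_spatial, a_spectral, values)
```

In this toy setting the dense and factorized outputs agree, and the Kronecker product of two row-stochastic matrices is itself row-stochastic, so the combined matrix is still a valid attention map. The factorized form stores N² + C² attention weights rather than (N·C)².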
Transforming the Hybrid Cloud for Emerging AI Workloads
Chen, Deming, Youssef, Alaa, Pendse, Ruchi, Schleife, André, Clark, Bryan K., Hamann, Hendrik, He, Jingrui, Laino, Teodoro, Varshney, Lav, Wang, Yuxiong, Sil, Avirup, Jabbarvand, Reyhaneh, Xu, Tianyin, Kindratenko, Volodymyr, Costa, Carlos, Adve, Sarita, Mendis, Charith, Zhang, Minjia, Núñez-Corrales, Santiago, Ganti, Raghu, Srivatsa, Mudhakar, Kim, Nam Sung, Torrellas, Josep, Huang, Jian, Seelam, Seetharami, Nahrstedt, Klara, Abdelzaher, Tarek, Eilam, Tamar, Zhao, Huimin, Manica, Matteo, Iyer, Ravishankar, Hirzel, Martin, Adve, Vikram, Marinov, Darko, Franke, Hubertus, Tong, Hanghang, Ainsworth, Elizabeth, Zhao, Han, Vasisht, Deepak, Do, Minh, Oliveira, Fabio, Pacifici, Giovanni, Puri, Ruchir, Nagpurkar, Priya
This white paper, developed through close collaboration between IBM Research and UIUC researchers within the IIDAI Institute, envisions transforming hybrid cloud systems to meet the growing complexity of AI workloads through innovative, full-stack co-design approaches, emphasizing usability, manageability, affordability, adaptability, efficiency, and scalability. By integrating cutting-edge technologies such as generative and agentic AI, cross-layer automation and optimization, unified control plane, and composable and adaptive system architecture, the proposed framework addresses critical challenges in energy efficiency, performance, and cost-effectiveness. Incorporating quantum computing as it matures will enable quantum-accelerated simulations for materials science, climate modeling, and other high-impact domains. Collaborative efforts between academia and industry are central to this vision, driving advancements in foundation models for material design and climate solutions, scalable multimodal data processing, and enhanced physics-based AI emulators for applications like weather forecasting and carbon sequestration. Research priorities include advancing AI agentic systems, LLM as an Abstraction (LLMaaA), AI model optimization and unified abstractions across heterogeneous infrastructure, end-to-end edge-cloud transformation, efficient programming model, middleware and platform, secure infrastructure, application-adaptive cloud systems, and new quantum-classical collaborative workflows. These ideas and solutions encompass both theoretical and practical research questions, requiring coordinated input and support from the research community. This joint initiative aims to establish hybrid clouds as secure, efficient, and sustainable platforms, fostering breakthroughs in AI-driven applications and scientific discovery across academia, industry, and society.
A Survey of IMU Based Cross-Modal Transfer Learning in Human Activity Recognition
Kamboj, Abhi, Do, Minh
Despite living in a multi-sensory world, most AI models are limited to textual and visual understanding of human motion and behavior. Inertial measurement sensors provide a signal for AI to understand motion; however, in practice they have been understudied due to numerous difficulties and the uninterpretability of the data to humans. In fact, full situational awareness of human motion could best be understood through a combination of sensors. In this survey we investigate how knowledge can be transferred and utilized amongst modalities for Human Activity/Action Recognition (HAR), i.e. cross-modality transfer learning. We motivate the importance and potential of IMU data and its applicability in cross-modality learning as well as the importance of studying the HAR problem. We categorize HAR-related tasks by time and abstractness and then compare various types of multimodal HAR datasets. We also distinguish and expound on many related but inconsistently used terms in the literature, such as transfer learning, domain adaptation, representation learning, sensor fusion, and multimodal learning, and describe how cross-modal learning fits with all these concepts. We then review the literature in IMU-based cross-modal transfer for HAR. The two main approaches for cross-modal transfer are instance-based transfer, where instances of one modality are mapped to another (e.g.
Planning for Compilation of a Quantum Algorithm for Graph Coloring
Do, Minh, Wang, Zhihui, O'Gorman, Bryan, Venturelli, Davide, Rieffel, Eleanor, Frank, Jeremy
The problem of compiling general quantum algorithms for implementation on near-term quantum processors has been introduced to the AI community. Previous work demonstrated that temporal planning is an attractive approach for part of this compilation task, specifically, the routing of circuits that implement the Quantum Alternating Operator Ansatz (QAOA) applied to the MaxCut problem on a quantum processor architecture. In this paper, we extend the earlier work to route circuits that implement QAOA for Graph Coloring problems. QAOA for coloring requires execution of more, and more complex, operations on the chip, which makes routing a more challenging problem. We evaluate the approach on state-of-the-art hardware architectures from leading quantum computing companies. Additionally, we apply a planning approach to qubit initialization. Our empirical evaluation shows that temporal planning compares well to reasonable analytic upper bounds, and that solving qubit initialization with a classical planner generally helps temporal planners in finding shorter-makespan compilations for QAOA for Graph Coloring. These advances suggest that temporal planning can be an effective approach for more complex quantum computing algorithms and architectures.
Comparing and Integrating Constraint Programming and Temporal Planning for Quantum Circuit Compilation
Booth, Kyle E. C. (University of Toronto) | Do, Minh (Stinger Ghaffarian Technologies Inc.) | Beck, J. Christopher (University of Toronto) | Rieffel, Eleanor (NASA Ames Research Center) | Venturelli, Davide (NASA Ames Research Center) | Frank, Jeremy (NASA Ames Research Center)
Recently, the makespan-minimization problem of compiling a general class of quantum algorithms into near-term quantum processors has been introduced to the AI community. The research demonstrated that temporal planning is a strong approach for a class of quantum circuit compilation (QCC) problems. In this paper, we explore the use of constraint programming (CP) as an alternative and complementary approach to temporal planning. We extend previous work by introducing two new problem variations that incorporate important characteristics identified by the quantum computing community. We apply temporal planning and CP to the baseline and extended QCC problems as both stand-alone and hybrid approaches. Our hybrid methods use solutions found by temporal planning to warm start CP, leveraging the ability of the former to find satisficing solutions to problems with a high degree of task optionality, an area that CP typically struggles with. The CP model, benefiting from inferred bounds on planning horizon length and task counts provided by the warm start, is then used to find higher quality solutions. Our empirical evaluation indicates that while stand-alone CP is only competitive for the smallest problems, CP in our hybridization with temporal planning out-performs stand-alone temporal planning in the majority of problem classes.
Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts
Yeh, Raymond, Xiong, Jinjun, Hwu, Wen-Mei, Do, Minh, Schwing, Alexander
Textual grounding is an important but challenging task for human-computer interaction, robotics and knowledge mining. Existing algorithms generally formulate the task as selection from a set of bounding box proposals obtained from deep-net-based systems. In this work, we demonstrate that we can cast the problem of textual grounding into a unified framework that permits efficient search over all possible bounding boxes. Hence, the method is able to consider significantly more proposals and doesn't rely on a successful first stage hypothesizing bounding box proposals. Beyond this, we demonstrate that the trained parameters of our model can be used as word-embeddings which capture spatial-image relationships and provide interpretability. Lastly, at the time of submission, our approach outperformed the current state-of-the-art methods on the Flickr 30k Entities and the ReferItGame dataset by 3.08% and 7.77% respectively.
Compiling quantum circuits to realistic hardware architectures using temporal planners
Venturelli, Davide, Do, Minh, Rieffel, Eleanor, Frank, Jeremy
To run quantum algorithms on emerging gate-model quantum hardware, quantum circuits must be compiled to take into account constraints on the hardware. For near-term hardware, with only limited means to mitigate decoherence, it is critical to minimize the duration of the circuit. We investigate the application of temporal planners to the problem of compiling quantum circuits to newly emerging quantum hardware. While our approach is general, we focus on compiling to superconducting hardware architectures with nearest neighbor constraints. Our initial experiments focus on compiling Quantum Alternating Operator Ansatz (QAOA) circuits whose high number of commuting gates allows great flexibility in the order in which the gates can be applied. That freedom makes it more challenging to find optimal compilations but also means there is a greater potential win from more optimized compilation than for less flexible circuits. We map this quantum circuit compilation problem to a temporal planning problem, and generate a test suite of compilation problems for QAOA circuits of various sizes to a realistic hardware architecture. We report compilation results from several state-of-the-art temporal planners on this test set. This early empirical evaluation demonstrates that temporal planning is a viable approach to quantum circuit compilation.
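To make concrete why gate order matters for circuit duration when gates commute, here is a minimal list-scheduling sketch. It is not the temporal-planning encoding used in the paper: it assumes unit-duration two-qubit gates that all commute, that every gate already acts on adjacent qubits (so no swap gates are needed), and the `makespan` helper is a hypothetical name introduced for illustration.

```python
def makespan(gate_order, duration=1):
    """Total circuit duration when gates are started as early as possible.

    gate_order: list of (qubit, qubit) pairs, processed in the given order.
    A gate can start only once both of its qubits are free.
    """
    free_at = {}  # qubit -> time at which it becomes available
    end = 0
    for q1, q2 in gate_order:
        start = max(free_at.get(q1, 0), free_at.get(q2, 0))
        finish = start + duration
        free_at[q1] = free_at[q2] = finish
        end = max(end, finish)
    return end


# Three commuting gates on a 4-qubit line, applied in two different orders.
chain_order = [(0, 1), (1, 2), (2, 3)]
interleaved_order = [(0, 1), (2, 3), (1, 2)]

print(makespan(chain_order))        # 3: each gate waits for the previous one
print(makespan(interleaved_order))  # 2: (0, 1) and (2, 3) run in parallel
```

Because the gates commute, both orders implement the same circuit, yet the second finishes one time step sooner. A compiler searching over such orderings, and over swap insertions that this sketch omits, is solving the kind of makespan-minimization problem the abstract describes.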
Scheduling Ocean Color Observations for a GEO-Stationary Satellite
Frank, Jeremy (NASA Ames Research Center) | Do, Minh (NASA Ames Research Center) | Tran, Tony (University of Toronto)
The GEO-Stationary Coastal and Air Pollution Events (GEO-CAPE) mission plans to put a visible spectrum imaging instrument on a satellite in geo-stationary orbit to perform ocean color remote sensing. Two different instrument designs, Filter Radiometer (FR) and COastal Ecosystems Dynamic Imager (COEDI), with different shapes for the imaged area and different image acquisition times, are being evaluated. Scheduling observations for either instrument requires optimizing science objectives in the presence of predicted cloud cover and available daylight. We model this scheduling problem as both Mixed Integer Linear Program (MILP) and Constraint Programming (CP) problems, and compare these two formulations for FR and COEDI using real cloudiness data collected at different times throughout the year. Our results show that MILP is the more suitable technique, and the schedule quality metric shows FR is the preferred design. We have reported our results to the GEO-CAPE mission team to assist them in making an informed decision for the next step in formulating this mission.
Explorations of Quantum-Classical Approaches to Scheduling a Mars Lander Activity Problem
Tran, Tony T. (University of Toronto) | Wang, Zhihui (National Aeronautics and Space Administration) | Do, Minh (National Aeronautics and Space Administration) | Rieffel, Eleanor G. (National Aeronautics and Space Administration) | Frank, Jeremy (National Aeronautics and Space Administration) | O'Gorman, Bryan (National Aeronautics and Space Administration) | Venturelli, Davide (National Aeronautics and Space Administration) | Beck, J. Christopher (University of Toronto)
An effective approach to solving problems involving mixed (continuous and discrete) variables and constraints, such as hybrid systems, is to decompose them into subproblems and integrate dedicated solvers geared toward those subproblems. Here, we introduce a new framework based on a tree search algorithm to solve hybrid discrete-continuous problems that incorporates: (1) a quantum annealer that samples from the configuration space for the discrete portion and provides information about the quality of the samples, and (2) a classical computer that makes use of information from the quantum annealer to prune and focus the search as well as check a continuous constraint. We consider four variants of our algorithm, each with progressively more guidance from the results provided by the quantum annealer. We empirically test our algorithm and compare the variants on a simplified Mars Lander task scheduling problem. Variants with more guidance from the quantum annealer have better performance.