Not enough data to create a plot.
Try a different view from the menu above.
Sun, Min
Solving the Best Subset Selection Problem via Suboptimal Algorithms
Singh, Vikram, Sun, Min
Best subset selection in linear regression is well known to be nonconvex and computationally challenging to solve, as the number of possible subsets grows rapidly with increasing dimensionality of the problem. As a result, finding the global optimal solution via an exact optimization method for a problem with dimensions of 1000s may take an impractical amount of CPU time. This suggests the importance of finding suboptimal procedures that can provide good approximate solutions using much less computational effort than exact methods. In this work, we introduce a new procedure and compare it with other popular suboptimal algorithms to solve the best subset selection problem. Extensive computational experiments using synthetic and real data have been performed. The results provide insights into the performance of these methods in different data settings. The new procedure is observed to be a competitive suboptimal algorithm for solving the best subset selection problem for high-dimensional data.
POp-GS: Next Best View in 3D-Gaussian Splatting with P-Optimality
Wilson, Joey, Almeida, Marcelino, Mahajan, Sachit, Labrie, Martin, Ghaffari, Maani, Ghasemalizadeh, Omid, Sun, Min, Kuo, Cheng-Hao, Sen, Arnab
In this paper, we present a novel algorithm for quantifying uncertainty and information gained within 3D Gaussian Splatting (3D-GS) through P-Optimality. While 3D-GS has proven to be a useful world model with high-quality rasterizations, it does not natively quantify uncertainty. Quantifying uncertainty in parameters of 3D-GS is necessary to understand the information gained from acquiring new images as in active perception, or identify redundant images which can be removed from memory due to resource constraints in online 3D-GS SLAM. We propose to quantify uncertainty and information gain in 3D-GS by reformulating the problem through the lens of optimal experimental design, which is a classical solution to measuring information gain. By restructuring information quantification of 3D-GS through optimal experimental design, we arrive at multiple solutions, of which T-Optimality and D-Optimality perform the best quantitatively and qualitatively as measured on two popular datasets. Additionally, we propose a block diagonal approximation of the 3D-GS uncertainty, which provides a measure of correlation for computing more accurate information gain, at the expense of a greater computation cost.
GAC-Net_Geometric and attention-based Network for Depth Completion
Zhu, Kuang, Gan, Xingli, Sun, Min
Depth completion is a key task in autonomous driving, aiming to complete sparse LiDAR depth measurements into high-quality dense depth maps through image guidance. However, existing methods usually treat depth maps as an additional channel of color images, or directly perform convolution on sparse data, failing to fully exploit the 3D geometric information in depth maps, especially with limited performance in complex boundaries and sparse areas. To address these issues, this paper proposes a depth completion network combining channel attention mechanism and 3D global feature perception (CGA-Net). The main innovations include: 1) Utilizing PointNet++ to extract global 3D geometric features from sparse depth maps, enhancing the scene perception ability of low-line LiDAR data; 2) Designing a channel-attention-based multimodal feature fusion module to efficiently integrate sparse depth, RGB images, and 3D geometric features; 3) Combining residual learning with CSPN++ to optimize the depth refinement stage, further improving the completion quality in edge areas and complex scenes. Experiments on the KITTI depth completion dataset show that CGA-Net can significantly improve the prediction accuracy of dense depth maps, achieving a new state-of-the-art (SOTA), and demonstrating strong robustness to sparse and complex scenes.
Modeling Uncertainty in 3D Gaussian Splatting through Continuous Semantic Splatting
Wilson, Joey, Almeida, Marcelino, Sun, Min, Mahajan, Sachit, Ghaffari, Maani, Ewen, Parker, Ghasemalizadeh, Omid, Kuo, Cheng-Hao, Sen, Arnie
In this paper, we present a novel algorithm for probabilistically updating and rasterizing semantic maps within 3D Gaussian Splatting (3D-GS). Although previous methods have introduced algorithms which learn to rasterize features in 3D-GS for enhanced scene understanding, 3D-GS can fail without warning which presents a challenge for safety-critical robotic applications. To address this gap, we propose a method which advances the literature of continuous semantic mapping from voxels to ellipsoids, combining the precise structure of 3D-GS with the ability to quantify uncertainty of probabilistic robotic maps. Given a set of images, our algorithm performs a probabilistic semantic update directly on the 3D ellipsoids to obtain an expectation and variance through the use of conjugate priors. We also propose a probabilistic rasterization which returns per-pixel segmentation predictions with quantifiable uncertainty. We compare our method with similar probabilistic voxel-based methods to verify our extension to 3D ellipsoids, and perform ablation studies on uncertainty quantification and temporal smoothing.
Configurable Embodied Data Generation for Class-Agnostic RGB-D Video Segmentation
Opipari, Anthony, Krishnan, Aravindhan K, Gayaka, Shreekant, Sun, Min, Kuo, Cheng-Hao, Sen, Arnie, Jenkins, Odest Chadwicke
This paper presents a method for generating large-scale datasets to improve class-agnostic video segmentation across robots with different form factors. Specifically, we consider the question of whether video segmentation models trained on generic segmentation data could be more effective for particular robot platforms if robot embodiment is factored into the data generation process. To answer this question, a pipeline is formulated for using 3D reconstructions (e.g. from HM3DSem) to generate segmented videos that are configurable based on a robot's embodiment (e.g. sensor type, sensor placement, and illumination source). A resulting massive RGB-D video panoptic segmentation dataset (MVPd) is introduced for extensive benchmarking with foundation and video segmentation models, as well as to support embodiment-focused research in video segmentation. Our experimental findings demonstrate that using MVPd for finetuning can lead to performance improvements when transferring foundation models to certain robot embodiments, such as specific camera placements. These experiments also show that using 3D modalities (depth images and camera pose) can lead to improvements in video segmentation accuracy and consistency. The project webpage is available at https://topipari.com/projects/MVPd
Enhancing Instructional Quality: Leveraging Computer-Assisted Textual Analysis to Generate In-Depth Insights from Educational Artifacts
Tian, Zewei, Sun, Min, Liu, Alex, Sarkar, Shawon, Liu, Jing
To meet the shifts in post-pandemic learning needs and the demand of artificial intelligence (AI) advancement on workforce development, the education system seeks new instructional and learning strategies that are personalized, effective, safe, and scalable [8]. Throughout the years, richer and more complex educational data have been generated by the advancement of instructional practices, providing vast potential for analyses but at the same time posing challenges to the approaches that process such data. Conventional quantitative methods are limited by the capacity of calculation and the efficiency of models, hence preventing efforts to improve teaching and learning outcomes. AI/ML approaches are able to effectively process the existing and forthcoming complex data with scalability and precision [5], presenting an unprecedented opportunity to promote the research and instructional practices in education. These characteristics of new data and methods provide timely and actionable insights into the dynamics of the instructional environment. Furthermore, in recent years, this trend has been accelerated by the rapid adoption of generative AI tools, such as ChatGPT and Bard, which synergizes the capabilities of both text analysis and generation. A new field of research has emerged, in which researchers integrate the cutting-edge AI/ML techniques with educational domain knowledge of curriculum, teaching, and learning and to explore crucial questions for instructional improvement.
ReCLIP: Refine Contrastive Language Image Pre-Training with Source Free Domain Adaptation
Hu, Xuefeng, Zhang, Ke, Xia, Lu, Chen, Albert, Luo, Jiajia, Sun, Yuyin, Wang, Ken, Qiao, Nan, Zeng, Xiao, Sun, Min, Kuo, Cheng-Hao, Nevatia, Ram
Large-scale Pre-Training Vision-Language Model such as CLIP has demonstrated outstanding performance in zero-shot classification, e.g. achieving 76.3% top-1 accuracy on ImageNet without seeing any example, which leads to potential benefits to many tasks that have no labeled data. However, while applying CLIP to a downstream target domain, the presence of visual and text domain gaps and cross-modality misalignment can greatly impact the model performance. To address such challenges, we propose ReCLIP, the first source-free domain adaptation method for vision-language models, which does not require any source data or target labeled data. ReCLIP first learns a projection space to mitigate the misaligned visual-text embeddings and learns pseudo labels, and then deploys cross-modality self-training with the pseudo labels, to update visual and text encoders, refine labels and reduce domain gaps and misalignments iteratively. With extensive experiments, we demonstrate ReCLIP reduces the average error rate of CLIP from 30.17% to 25.06% on 22 image classification benchmarks. Code available at https://github.com/michiganleon/ReCLIP_WACV.
From Voices to Validity: Leveraging Large Language Models (LLMs) for Textual Analysis of Policy Stakeholder Interviews
Liu, Alex, Sun, Min
Obtaining stakeholders' diverse experiences and opinions about current policy in a timely manner is crucial for policymakers to identify strengths and gaps in resource allocation, thereby supporting effective policy design and implementation. However, manually coding even moderately sized interview texts or open-ended survey responses from stakeholders can often be labor-intensive and time-consuming. This study explores the integration of Large Language Models (LLMs)--like GPT-4--with human expertise to enhance text analysis of stakeholder interviews regarding K-12 education policy within one U.S. state. Employing a mixed-methods approach, human experts developed a codebook and coding processes as informed by domain knowledge and unsupervised topic modeling results. They then designed prompts to guide GPT-4 analysis and iteratively evaluate different prompts' performances. This combined human-computer method enabled nuanced thematic and sentiment analysis. Results reveal that while GPT-4 thematic coding aligned with human coding by 77.89% at specific themes, expanding to broader themes increased congruence to 96.02%, surpassing traditional Natural Language Processing (NLP) methods by over 25%. Additionally, GPT-4 is more closely matched to expert sentiment analysis than lexicon-based methods. Findings from quantitative measures and qualitative reviews underscore the complementary roles of human domain expertise and automated analysis as LLMs offer new perspectives and coding consistency. The human-computer interactive approach enhances efficiency, validity, and interpretability of educational policy research.
Tabletop Transparent Scene Reconstruction via Epipolar-Guided Optical Flow with Monocular Depth Completion Prior
Chen, Xiaotong, Zhou, Zheming, Deng, Zhuo, Ghasemalizadeh, Omid, Sun, Min, Kuo, Cheng-Hao, Sen, Arnie
Reconstructing transparent objects using affordable RGB-D cameras is a persistent challenge in robotic perception due to inconsistent appearances across views in the RGB domain and inaccurate depth readings in each single-view. We introduce a two-stage pipeline for reconstructing transparent objects tailored for mobile platforms. In the first stage, off-the-shelf monocular object segmentation and depth completion networks are leveraged to predict the depth of transparent objects, furnishing single-view shape prior. Subsequently, we propose Epipolar-guided Optical Flow (EOF) to fuse several single-view shape priors from the first stage to a cross-view consistent 3D reconstruction given camera poses estimated from opaque part of the scene. Our key innovation lies in EOF which employs boundary-sensitive sampling and epipolar-line constraints into optical flow to accurately establish 2D correspondences across multiple views on transparent objects. Quantitative evaluations demonstrate that our pipeline significantly outperforms baseline methods in 3D reconstruction quality, paving the way for more adept robotic perception and interaction with transparent objects.
The Study of Highway for Lifelong Multi-Agent Path Finding
Li, Ming-Feng, Sun, Min
In modern fulfillment warehouses, agents traverse the map to complete endless tasks that arrive on the fly, which is formulated as a lifelong Multi-Agent Path Finding (lifelong MAPF) problem. The goal of tackling this challenging problem is to find the path for each agent in a finite runtime while maximizing the throughput. However, existing methods encounter exponential growth of runtime and undesirable phenomena of deadlocks and rerouting as the map size or agent density grows. To address these challenges in lifelong MAPF, we explore the idea of highways mainly studied for one-shot MAPF (i.e., finding paths at once beforehand), which reduces the complexity of the problem by encouraging agents to move in the same direction. We utilize two methods to incorporate the highway idea into the lifelong MAPF framework and discuss the properties that minimize the existing problems of deadlocks and rerouting. The experimental results demonstrate that the runtime is considerably reduced and the decay of throughput is gradually insignificant as the map size enlarges under the settings of the highway. Furthermore, when the density of agents increases, the phenomena of deadlocks and rerouting are significantly reduced by leveraging the highway.