Adaptive Contextual Perception: How To Generalize To New Backgrounds and Ambiguous Objects

Neural Information Processing Systems

Biological vision systems make adaptive use of context to recognize objects in new settings with novel contexts, as well as occluded or blurry objects in familiar settings. In this paper, we investigate how vision models adaptively use context for out-of-distribution (OOD) generalization and leverage our analysis results to improve model OOD generalization. First, we formulate two distinct OOD settings in which context is either beneficial (Object-Disambiguation) or irrelevant (Background-Invariance), reflecting the diverse contextual challenges faced in biological vision. We then analyze model performance in these two OOD settings and demonstrate that models that excel in one setting tend to struggle in the other. Notably, prior works on learning causal features improve in one setting but hurt in the other.


Small Ensemble-based Data Assimilation: A Machine Learning-Enhanced Data Assimilation Method with Limited Ensemble Size

Li, Zhilin, Yao, Zhou, Li, Xianglong, Liu, Zeng, Lu, Zhaokuan, Xu, Shanlin, Kim, Seungnam, Wang, Guangyao

arXiv.org Artificial Intelligence

Ensemble-based data assimilation (DA) methods have become increasingly popular due to their inherent ability to address nonlinear dynamic problems. However, these methods often face a trade-off between analysis accuracy and computational efficiency, as the larger ensemble sizes required for higher accuracy also lead to greater computational cost. In this study, we propose a novel machine learning-based data assimilation approach that combines the traditional ensemble Kalman filter (EnKF) with a fully connected neural network (FCNN). Specifically, our method uses a relatively small ensemble size to generate preliminary yet suboptimal analysis states via EnKF. An FCNN is then employed to learn and predict correction terms for these states, thereby mitigating the performance degradation induced by the limited ensemble size. We evaluate the performance of the proposed EnKF-FCNN method through numerical experiments involving Lorenz systems and nonlinear ocean wave field simulations. The results consistently demonstrate that the new method achieves higher accuracy than the traditional EnKF with the same ensemble size, while incurring negligible additional computational cost. Moreover, the EnKF-FCNN method is adaptable to diverse applications through coupling with different models and the use of alternative ensemble-based DA methods.
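
As a rough sketch of the core idea, a stochastic EnKF analysis step followed by a learned additive correction might look as follows; the observation operator, network width, and variable names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
import torch.nn as nn

def enkf_analysis(X, y, H, R):
    """One stochastic EnKF analysis step.
    X: (n_state, N) forecast ensemble; y: (n_obs,) observation;
    H: (n_obs, n_state) observation operator; R: (n_obs, n_obs) obs covariance."""
    N = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)          # ensemble anomalies
    P = A @ A.T / (N - 1)                          # sample forecast covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain
    Y = y[:, None] + np.random.multivariate_normal(
        np.zeros(len(y)), R, size=N).T             # perturbed observations
    return X + K @ (Y - H @ X)                     # analysis ensemble

class CorrectionFCNN(nn.Module):
    """Hypothetical correction network: maps the small-ensemble analysis
    state to a corrected state, trained offline against a more accurate
    reference (e.g. a large-ensemble analysis)."""
    def __init__(self, n_state, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_state, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_state))

    def forward(self, x_analysis):
        return x_analysis + self.net(x_analysis)   # additive correction term
```

A small ensemble keeps the EnKF step cheap, and the network call adds only a forward pass per assimilation cycle, which is consistent with the paper's claim of negligible additional cost.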


Topology Optimization of Leg Structures for Construction Robots Based on Variable Density Method

Liu, Xiao, Yang, Xianlong, Wang, Weijun, Feng, Wei

arXiv.org Artificial Intelligence

In complex terrain construction environments, robots face high demands for both payload capacity and mobility flexibility. As the key load-bearing components, robotic leg structures are particularly important targets for optimization. This study therefore focuses on optimizing the leg structures of construction robots, proposing a topology optimization strategy based on the SIMP (Solid Isotropic Microstructures with Penalization) variable density method together with a structural re-design approach. The design performance is comprehensively validated through finite element analysis using ANSYS. First, static and modal analyses are conducted to evaluate the rationality of the initial design. Then, topology optimization using the SIMP-based variable density method is applied to the femur section, which accounts for the largest proportion of the leg's weight. Based on the iterative optimization results, the femur undergoes a secondary structural reconstruction. After optimization, the mass of the femur is reduced by 19.45%, and the overall leg mass decreases by 7.92%, achieving the goal of lightweight design. Finally, static and modal analyses are conducted on the reconstructed leg. The results demonstrate that the optimized leg still meets the structural performance requirements, validating the feasibility of the lightweight design. This research provides robust theoretical and technical support for lightweight construction robot design and lays a foundation for efficient operation in complex construction environments.
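
For readers unfamiliar with the SIMP variable density method, a minimal sketch of its two core ingredients follows: the penalized stiffness interpolation and the classic optimality-criteria density update. The paper performs the optimization within ANSYS; the constants, function names, and the update rule below are illustrative assumptions, not the authors' workflow.

```python
import numpy as np

def simp_stiffness(rho, E0=210e9, Emin=1e-9, p=3.0):
    """SIMP interpolation: penalized Young's modulus per element.
    The penalty exponent p > 1 makes intermediate densities structurally
    inefficient, driving the design toward solid (rho=1) or void (rho=0)."""
    return Emin + rho**p * (E0 - Emin)

def oc_update(rho, dc, dv, vol_frac, move=0.2):
    """Classic optimality-criteria density update, often paired with SIMP.
    rho: element densities; dc: compliance sensitivities (negative);
    dv: volume sensitivities (positive); vol_frac: target volume fraction.
    Bisects the Lagrange multiplier until the volume constraint is met."""
    l1, l2 = 1e-9, 1e9
    while (l2 - l1) / (l1 + l2) > 1e-3:
        lmid = 0.5 * (l1 + l2)
        rho_new = np.clip(rho * np.sqrt(-dc / (dv * lmid)),
                          np.maximum(0.0, rho - move),   # move limit, lower
                          np.minimum(1.0, rho + move))   # move limit, upper
        if rho_new.mean() > vol_frac:
            l1 = lmid
        else:
            l2 = lmid
    return rho_new
```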


A Large Vision-Language Model based Environment Perception System for Visually Impaired People

Chen, Zezhou, Liu, Zhaoxiang, Wang, Kai, Wang, Kohou, Lian, Shiguo

arXiv.org Artificial Intelligence

It is challenging for visually impaired people to perceive their surrounding environment due to the complexity of natural scenes, and their personal and social activities are thus highly limited. This paper introduces a Large Vision-Language Model (LVLM) based environment perception system that helps them better understand their surroundings: a wearable device captures the current scene they face, and they retrieve the analysis results through the device. Visually impaired users can acquire a global description of the scene by long-pressing the screen to activate the LVLM output, retrieve the categories of the objects in the scene produced by a segmentation model by tapping or swiping the screen, and get a detailed description of the objects they are interested in by double-tapping the screen. To help visually impaired people perceive the world more accurately, this paper proposes incorporating the segmentation result of the RGB image as external knowledge into the input of the LVLM to reduce the LVLM's hallucination. Technical experiments on POPE, MME, and LLaVA-QA90 show that the system provides a more accurate description of the scene than Qwen-VL-Chat, and exploratory experiments show that the system helps visually impaired people perceive their surrounding environment effectively.
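
The described use of segmentation output as external knowledge amounts to a prompt-construction step, which can be sketched as below. Here `segment` and `lvlm_chat` are hypothetical stand-ins for the system's segmentation and vision-language models, and the prompt wording is an assumption.

```python
from typing import Callable, List

def describe_scene(image,
                   segment: Callable[[object], List[str]],
                   lvlm_chat: Callable[..., str]) -> str:
    """Build an LVLM prompt that embeds segmentation labels as external
    knowledge, constraining the description to objects actually detected
    and thereby curbing hallucinated objects."""
    labels = sorted(set(segment(image)))        # e.g. ["car", "person", "tree"]
    knowledge = "Objects detected by a segmentation model: " + ", ".join(labels)
    prompt = (knowledge + "\n"
              "Describe this scene for a visually impaired user, "
              "mentioning only objects consistent with the list above.")
    return lvlm_chat(image=image, text=prompt)
```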


TablePilot: Recommending Human-Preferred Tabular Data Analysis with Large Language Models

Yi, Deyin, Liu, Yihao, Cao, Lang, Zhou, Mengyu, Dong, Haoyu, Han, Shi, Zhang, Dongmei

arXiv.org Artificial Intelligence

Tabular data analysis is crucial in many scenarios, yet efficiently identifying the most relevant data analysis queries and results for a new table remains a significant challenge. The complexity of tabular data, diverse analytical operations, and the demand for high-quality analysis make the process tedious. To address these challenges, we aim to recommend query-code-result triplets tailored for new tables in tabular data analysis workflows. In this paper, we present TablePilot, a pioneering tabular data analysis framework leveraging large language models to autonomously generate comprehensive and superior analytical results without relying on user profiles or prior interactions. The framework incorporates key designs in analysis preparation and analysis optimization to enhance accuracy. Additionally, we propose Rec-Align, a novel method to further improve recommendation quality and better align with human preferences. Experiments on DART, a dataset specifically designed for comprehensive tabular data analysis recommendation, demonstrate the effectiveness of our framework. Based on GPT-4o, the tuned TablePilot achieves 77.0% top-5 recommendation recall. Human evaluations further highlight its effectiveness in optimizing tabular data analysis workflows.
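
The recommended query-code-result triplets might be represented roughly as follows; the dataclass and the ranking step are illustrative assumptions meant only to make the recommendation target concrete, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class AnalysisTriplet:
    query: str    # natural-language analysis question about the table
    code: str     # e.g. pandas code that answers the query
    result: str   # serialized execution result of the code

def top_k(candidates: List[AnalysisTriplet],
          score: Callable[[AnalysisTriplet], float],
          k: int = 5) -> List[AnalysisTriplet]:
    # Rank candidate triplets (e.g. by a preference-aligned scoring
    # model, in the spirit of Rec-Align) and keep the top k to surface.
    return sorted(candidates, key=score, reverse=True)[:k]
```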


FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant

Huang, Zhengchao, Xia, Bin, Lin, Zicheng, Mou, Zhun, Yang, Wenming

arXiv.org Artificial Intelligence

The rapid advancement of deepfake technologies has sparked widespread public concern, particularly as face forgery poses a serious threat to public information security. However, the unknown and diverse forgery techniques, varied facial features, and complex environmental factors pose significant challenges for face forgery analysis. Existing datasets lack descriptions of these aspects, making it difficult for models to distinguish between real and forged faces using only visual information amid various confounding factors. In addition, existing methods do not yield user-friendly and explainable results, complicating the understanding of the model's decision-making process. To address these challenges, we introduce a novel Open-World Face Forgery Analysis VQA (OW-FFA-VQA) task and the corresponding benchmark. To tackle this task, we first establish a dataset featuring a diverse collection of real and forged face images with essential descriptions and reliable forgery reasoning. Based on this dataset, we introduce FFAA: Face Forgery Analysis Assistant, consisting of a fine-tuned Multimodal Large Language Model (MLLM) and a Multi-answer Intelligent Decision System (MIDS). By integrating hypothetical prompts with MIDS, the impact of fuzzy classification boundaries is effectively mitigated, enhancing the model's robustness. Extensive experiments demonstrate that our method not only provides user-friendly explainable results but also significantly boosts accuracy and robustness compared to previous methods.
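
The interplay of hypothetical prompts and a multi-answer decision step can be sketched as below. Here `mllm` is a hypothetical callable, the prompt wording is an assumption, and the real MIDS is a learned decision module rather than this simple voting rule.

```python
from typing import Callable

# Query the MLLM under opposite hypotheses; answers from samples near
# the fuzzy real/fake boundary tend to flip across hypotheses, so
# aggregating over them stabilizes the final decision.
HYPOTHESES = [
    "Assume this face may be real. Analyze it and answer real or fake.",
    "Assume this face may be forged. Analyze it and answer real or fake.",
    "Analyze this face with no prior assumption and answer real or fake.",
]

def classify_face(image, mllm: Callable[..., str]) -> str:
    answers = [mllm(image=image, prompt=h) for h in HYPOTHESES]
    votes = sum(1 for a in answers if "fake" in a.lower())
    return "fake" if votes >= 2 else "real"   # majority over hypotheses
```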


TimeCSL: Unsupervised Contrastive Learning of General Shapelets for Explorable Time Series Analysis

Liang, Zhiyu, Liang, Chen, Liang, Zheng, Wang, Hongzhi, Zheng, Bo

arXiv.org Artificial Intelligence

Unsupervised (a.k.a. self-supervised) representation learning (URL) has emerged as a new paradigm for time series analysis because it can learn generalizable time series representations that benefit many downstream tasks without requiring labels, which are usually difficult to obtain. Observing that existing approaches have limitations in the design of the representation encoder and the learning objective, we previously proposed Contrastive Shapelet Learning (CSL), the first URL method that learns a general-purpose shapelet-based representation through unsupervised contrastive learning, and showed its superior performance in several analysis tasks, such as time series classification, clustering, and anomaly detection. In this paper, we develop TimeCSL, an end-to-end system that makes full use of the general and interpretable shapelets learned by CSL to achieve explorable time series analysis in a unified pipeline. We introduce the system components, demonstrate how users interact with TimeCSL to solve different analysis tasks in the unified pipeline, and show how users gain insight into their time series by exploring the learned shapelets and representation.
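
A shapelet-based representation of the kind CSL learns can be sketched as follows: each feature is the minimum distance between one shapelet and any window of the series. The shapelets are assumed given here; the contrastive learning that produces them is beyond the scope of this sketch.

```python
import numpy as np

def shapelet_features(series: np.ndarray,
                      shapelets: list) -> np.ndarray:
    """Map a univariate series to one feature per shapelet: the minimum
    Euclidean distance between the shapelet and any sliding window of
    the series. Small values mean the shapelet's shape occurs somewhere
    in the series, which is what makes the features interpretable."""
    feats = []
    for s in shapelets:                     # each s: 1-D np.ndarray
        m = len(s)
        dists = [np.linalg.norm(series[i:i + m] - s)
                 for i in range(len(series) - m + 1)]
        feats.append(min(dists))
    return np.array(feats)
```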


Evaluation of General Large Language Models in Contextually Assessing Semantic Concepts Extracted from Adult Critical Care Electronic Health Record Notes

Liu, Darren, Ding, Cheng, Bold, Delgersuren, Bouvier, Monique, Lu, Jiaying, Shickel, Benjamin, Jabaley, Craig S., Zhang, Wenhui, Park, Soojin, Young, Michael J., Wainwright, Mark S., Clermont, Gilles, Rashidi, Parisa, Rosenthal, Eric S., Dimisko, Laurie, Xiao, Ran, Yoon, Joo Heung, Yang, Carl, Hu, Xiao

arXiv.org Artificial Intelligence

The field of healthcare has increasingly turned its focus toward Large Language Models (LLMs) due to their remarkable performance, yet their performance in actual clinical applications has been underexplored. Traditional evaluations based on question-answering tasks do not fully capture these nuanced contexts. This gap highlights the need for more in-depth and practical assessments of LLMs in real-world healthcare settings. Objective: We sought to evaluate the performance of LLMs in the complex clinical context of adult critical care medicine using systematic and comprehensible analytic methods, including clinician annotation and adjudication. Methods: We investigated the performance of three general LLMs in understanding and processing real-world clinical notes. Concepts from 150 clinical notes were identified by MetaMap and then labeled by 9 clinicians. Each LLM's proficiency was evaluated by identifying the temporality and negation of these concepts using different prompts for an in-depth analysis. Results: GPT-4 showed overall superior performance compared to the other LLMs. In contrast, both GPT-3.5 and text-davinci-003 exhibited enhanced performance when appropriate prompting strategies were employed. The GPT family of models demonstrated considerable efficiency, evidenced by their cost-effectiveness and time-saving capabilities. Conclusion: A comprehensive qualitative performance evaluation framework for LLMs is developed and operationalized, going beyond singular performance aspects. With expert annotations, this methodology not only validates LLMs' capabilities in processing complex medical data but also establishes a benchmark for future LLM evaluations across specialized domains.
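
A prompt probing the temporality and negation of an extracted concept might look like the sketch below; the wording and label sets are illustrative assumptions, not the study's actual prompts.

```python
def build_prompt(note: str, concept: str) -> str:
    """Assemble a prompt asking an LLM to judge the temporality and
    negation of a MetaMap-extracted concept in a clinical note."""
    return (
        f"Clinical note:\n{note}\n\n"
        f"For the concept \"{concept}\" in this note, answer two questions:\n"
        "1. Temporality: present, past, or hypothetical?\n"
        "2. Negation: is the concept negated (yes/no)?\n"
        "Answer in the format: temporality=<...>; negation=<...>"
    )
```

Constraining the output format this way makes the answers easy to score against the clinician labels, which is the kind of systematic comparison the study performs.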


Data Science–A Systematic Treatment

Communications of the ACM

There is a data-driven revolution under way in science and society, disrupting every form of enterprise. We are collecting and storing data more rapidly than ever before. The value of data as a central asset in an organization is now well established and generally accepted. The Economist called data "the world's most valuable resource" [40]. The World Economic Forum's briefing paper, A New Paradigm for Business of Data, states "At the heart of digital economy and society is the explosion of insight, intelligence and information--data" [5]. The field of data science is expected to enable data to be leveraged for making better decisions and achieving more meaningful outcomes. Although the term data science has some history, in its current incarnation as a modern field of study, it has already had significant economic impact. A 2015 Organisation for Economic Co-operation and Development (OECD) report identified "data-driven innovation" (DDI) as having a central driving role in 21st-century economies, defining DDI as "the use of data and analytics to improve and foster new products, processes, organisational methods and markets." Data science deployments are still what might be called first generation, but their impact is already being felt in many areas: global sustainability [11], power and energy systems [25], biological and biomedical systems [38], health sciences and health informatics [12], finance and insurance [8], smart cities [33], digital humanities [28], and more. The last decade has established the terms "big data," "data analytics," and "data science" in our lexicon, both as buzzwords and as important fields of study. Interest in the topic, as evidenced by Google Trends (see Figure 1), has exploded over the same period. An increasing number of countries have released policy statements related to data science. In academia, data-science programs and research institutes have been established with significant speed, while many industrial organizations have created data-science units.