AITopics | Li, Yanyan

Collaborating Authors

Li, Yanyan

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

GSCE: A Prompt Framework with Enhanced Reasoning for Reliable LLM-driven Drone Control

Wang, Wenhao, Li, Yanyan, Jiao, Long, Yuan, Jiawei

arXiv.org Artificial IntelligenceFeb-17-2025

The integration of Large Language Models (LLMs) into robotic control, including drones, has the potential to revolutionize autonomous systems. Research studies have demonstrated that LLMs can be leveraged to support robotic operations. However, when facing tasks with complex reasoning, concerns and challenges are raised about the reliability of solutions produced by LLMs. In this paper, we propose a prompt framework with enhanced reasoning to enable reliable LLM-driven control for drones. Our framework consists of novel technical components designed using Guidelines, Skill APIs, Constraints, and Examples, namely GSCE. GSCE is featured by its reliable and constraint-compliant code generation. We performed thorough experiments using GSCE for the control of drones with a wide level of task complexities. Our experiment results demonstrate that GSCE can significantly improve task success rates and completeness compared to baseline approaches, highlighting its potential for reliable LLM-driven autonomous drone systems.

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2502.12531

Country: North America > United States (0.68)

Genre: Research Report > New Finding (1.00)

Industry:

Transportation (0.68)
Information Technology > Robotics & Automation (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Large Language Models show both individual and collective creativity comparable to humans

Sun, Luning, Yuan, Yuzhuo, Yao, Yuan, Li, Yanyan, Zhang, Hao, Xie, Xing, Wang, Xiting, Luo, Fang, Stillwell, David

arXiv.org Artificial IntelligenceDec-4-2024

Artificial intelligence has, so far, largely automated routine tasks, but what does it mean for the future of work if Large Language Models (LLMs) show creativity comparable to humans? To measure the creativity of LLMs holistically, the current study uses 13 creative tasks spanning three domains. We benchmark the LLMs against individual humans, and also take a novel approach by comparing them to the collective creativity of groups of humans. We find that the best LLMs (Claude and GPT-4) rank in the 52nd percentile against humans, and overall LLMs excel in divergent thinking and problem solving but lag in creative writing. When questioned 10 times, an LLM's collective creativity is equivalent to 8-10 humans. When more responses are requested, two additional responses of LLMs equal one extra human. Ultimately, LLMs, when optimally applied, may compete with a small group of humans in the future of work.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2412.03151

Country:

North America > United States (0.45)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.45)

Industry: Banking & Finance (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

Add feedback

Robust Gaussian Splatting SLAM by Leveraging Loop Closure

Zhu, Zunjie, Fang, Youxu, Li, Xin, Yan, Chengang, Xu, Feng, Yuen, Chau, Li, Yanyan

arXiv.org Artificial IntelligenceSep-30-2024

3D Gaussian Splatting algorithms excel in novel view rendering applications and have been adapted to extend the capabilities of traditional SLAM systems. However, current Gaussian Splatting SLAM methods, designed mainly for hand-held RGB or RGB-D sensors, struggle with tracking drifts when used with rotating RGB-D camera setups. In this paper, we propose a robust Gaussian Splatting SLAM architecture that utilizes inputs from rotating multiple RGB-D cameras to achieve accurate localization and photorealistic rendering performance. The carefully designed Gaussian Splatting Loop Closure module effectively addresses the issue of accumulated tracking and mapping errors found in conventional Gaussian Splatting SLAM systems. First, each Gaussian is associated with an anchor frame and categorized as historical or novel based on its timestamp. By rendering different types of Gaussians at the same viewpoint, the proposed loop detection strategy considers both co-visibility relationships and distinct rendering outcomes. Furthermore, a loop closure optimization approach is proposed to remove camera pose drift and maintain the high quality of 3D Gaussian models. The approach uses a lightweight pose graph optimization algorithm to correct pose drift and updates Gaussians based on the optimized poses. Additionally, a bundle adjustment scheme further refines camera poses using photometric and geometric constraints, ultimately enhancing the global consistency of scenarios. Quantitative and qualitative evaluations on both synthetic and real-world datasets demonstrate that our method outperforms state-of-the-art methods in camera pose estimation and novel view rendering tasks. The code will be open-sourced for the community.

artificial intelligence, gaussian, optimization problem, (16 more...)

arXiv.org Artificial Intelligence

2409.20111

Country:

Europe (0.46)
Asia > China (0.28)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Add feedback

Open-Structure: a Structural Benchmark Dataset for SLAM Algorithms

Li, Yanyan, Guo, Zhao, Yang, Ze, Sun, Yanbiao, Zhao, Liang, Tombari, Federico

arXiv.org Artificial IntelligenceOct-16-2023

This paper introduces a new benchmark dataset, Open-Structure, for evaluating visual odometry and SLAM methods, which directly equips point and line measurements, correspondences, structural associations, and co-visibility factor graphs instead of providing raw images. Based on the proposed benchmark dataset, these 2D or 3D data can be directly input to different stages of SLAM pipelines to avoid the impact of the data preprocessing modules in ablation experiments. First, we propose a dataset generator for real-world and simulated scenarios. In real-world scenes, it maintains the same observations and occlusions as actual feature extraction results. Those generated simulation sequences enhance the dataset's diversity by introducing various carefully designed trajectories and observations. Second, a SLAM baseline is proposed using our dataset to evaluate widely used modules in camera pose tracking, parametrization, and optimization modules. By evaluating these state-of-the-art algorithms across different scenarios, we discern each module's strengths and weaknesses within the camera tracking and optimization process. Our dataset and baseline are available at \url{https://github.com/yanyan-li/Open-Structure}.

artificial intelligence, dataset, sequence, (16 more...)

arXiv.org Artificial Intelligence

2310.10931

Country:

Asia > China (0.28)
Europe (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Vision (0.69)

Add feedback

Towards Table-to-Text Generation with Pretrained Language Model: A Table Structure Understanding and Text Deliberating Approach

Chen, Miao, Lu, Xinjiang, Xu, Tong, Li, Yanyan, Zhou, Jingbo, Dou, Dejing, Xiong, Hui

arXiv.org Artificial IntelligenceJan-5-2023

Although remarkable progress on the neural table-to-text methods has been made, the generalization issues hinder the applicability of these models due to the limited source tables. Large-scale pretrained language models sound like a promising solution to tackle such issues. However, how to effectively bridge the gap between the structured table and the text input by fully leveraging table information to fuel the pretrained model is still not well explored. Besides, another challenge of integrating the deliberation mechanism into the text-to-text pretrained model for solving the table-to-text task remains seldom studied. In this paper, to implement the table-to-text generation with pretrained language model, we propose a table structure understanding and text deliberating approach, namely TASD. Specifically, we devise a three-layered multi-head attention network to realize the table-structure-aware text generation model with the help of the pretrained language model. Furthermore, a multi-pass decoder framework is adopted to enhance the capability of polishing generated text for table descriptions. The empirical studies, as well as human evaluation, on two public datasets, validate that our approach can generate faithful and fluent descriptive texts for different types of tables.

artificial intelligence, natural language, text deliberating approach, (2 more...)

arXiv.org Artificial Intelligence

2301.02071

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

E-Graph: Minimal Solution for Rigid Rotation with Extensibility Graphs

Li, Yanyan, Tombari, Federico

arXiv.org Artificial IntelligenceJul-20-2022

Minimal solutions for relative rotation and translation estimation tasks have been explored in different scenarios, typically relying on the so-called co-visibility graph. However, how to build direct rotation relationships between two frames without overlap is still an open topic, which, if solved, could greatly improve the accuracy of visual odometry. In this paper, a new minimal solution is proposed to solve relative rotation estimation between two images without overlapping areas by exploiting a new graph structure, which we call Extensibility Graph (E-Graph). Differently from a co-visibility graph, high-level landmarks, including vanishing directions and plane normals, are stored in our E-Graph, which are geometrically extensible. Based on E-Graph, the rotation estimation problem becomes simpler and more elegant, as it can deal with pure rotational motion and requires fewer assumptions, e.g. Manhattan/Atlanta World, planar/vertical motion. Finally, we embed our rotation estimation strategy into a complete camera tracking and mapping system which obtains 6-DoF camera poses and a dense 3D mesh model. Extensive experiments on public benchmarks demonstrate that the proposed method achieves state-of-the-art tracking performance.

artificial intelligence, e-graph, estimation, (18 more...)

arXiv.org Artificial Intelligence

2207.10008

Country: Europe > Switzerland (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Robots (0.49)
Information Technology > Artificial Intelligence > Vision (0.31)

Add feedback