AITopics | Tian, Guangjian

Tian, Guangjian

SpatialCoT: Advancing Spatial Reasoning through Coordinate Alignment and Chain-of-Thought for Embodied Task Planning

Liu, Yuecheng, Chi, Dafeng, Wu, Shiguang, Zhang, Zhanguang, Hu, Yaochen, Zhang, Lingfeng, Zhang, Yingxue, Wu, Shuang, Cao, Tongtong, Huang, Guowei, Huang, Helong, Tian, Guangjian, Qiu, Weichao, Quan, Xingyue, Hao, Jianye, Zhuang, Yuzheng

arXiv.org Artificial Intelligence, Jan-22-2025

Spatial reasoning is an essential problem in embodied AI research. Efforts to enhance spatial reasoning abilities through supplementary spatial data and fine-tuning have proven limited and ineffective when addressing complex embodied tasks, largely due to their dependence on language-based outputs. While some approaches have introduced a point-based action space to mitigate this issue, they fall short in managing more intricate tasks within complex environments. This deficiency arises from their failure to fully exploit the inherent thinking and reasoning capabilities that are fundamental strengths of Vision-Language Models (VLMs). To address these limitations, we propose a novel approach named SpatialCoT, specifically designed to bolster the spatial reasoning capabilities of VLMs. Our approach comprises two stages: spatial coordinate bi-directional alignment, which aligns vision-language inputs with spatial coordinates, and chain-of-thought spatial grounding, which harnesses the reasoning capabilities of language models for advanced spatial reasoning. We evaluate SpatialCoT on challenging navigation and manipulation tasks, both in simulation and real-world settings. Experimental results demonstrate that our method significantly outperforms previous state-of-the-art approaches in both tasks.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2501.10074

Genre:

Research Report > Promising Solution (0.68)
Overview > Innovation (0.54)
Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning

Mower, Christopher E., Wan, Yuhui, Yu, Hongzhan, Grosnit, Antoine, Gonzalez-Billandon, Jonas, Zimmer, Matthieu, Wang, Jinlong, Zhang, Xinyu, Zhao, Yao, Zhai, Anbang, Liu, Puze, Palenicek, Daniel, Tateo, Davide, Cadena, Cesar, Hutter, Marco, Peters, Jan, Tian, Guangjian, Zhuang, Yuzheng, Shao, Kun, Quan, Xingyue, Hao, Jianye, Wang, Jun, Bou-Ammar, Haitham

arXiv.org Artificial Intelligence, Jul-12-2024

We present a framework for intuitive robot programming by non-experts, leveraging natural language prompts and contextual information from the Robot Operating System (ROS). Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface. Key features of the framework include: integration of ROS with an AI agent connected to a plethora of open-source and commercial LLMs, automatic extraction of a behavior from the LLM output and execution of ROS actions/services, support for three behavior modes (sequence, behavior tree, state machine), imitation learning for adding new robot actions to the library of possible actions, and LLM reflection via human and environment feedback. Extensive experiments validate the framework, showcasing robustness, scalability, and versatility in diverse scenarios, including long-horizon tasks, tabletop rearrangements, and remote supervisory control. To facilitate the adoption of our framework and support the reproduction of our results, we have made our code open-source. You can access it at: https://github.com/huawei-noah/HEBO/tree/master/ROSLLM.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2406.19741

Country:

Asia > China (0.68)
Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Japan > Honshū (0.14)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Fast Gumbel-Max Sketch and its Applications

Zhang, Yuanming, Wang, Pinghui, Qi, Yiyan, Cheng, Kuankuan, Zhao, Junzhou, Tian, Guangjian, Guan, Xiaohong

arXiv.org Artificial Intelligence, Feb-10-2023

The well-known Gumbel-Max Trick for sampling elements from a categorical distribution (or more generally a non-negative vector) and its variants have been widely used in areas such as machine learning and information retrieval. To sample a random element $i$ in proportion to its positive weight $v_i$, the Gumbel-Max Trick first computes a Gumbel random variable $g_i$ for each positive-weight element $i$, and then samples the element $i$ with the largest value of $g_i+\ln v_i$. Recently, applications including similarity estimation and weighted cardinality estimation require generating $k$ independent Gumbel-Max variables from high-dimensional vectors. However, the traditional Gumbel-Max Trick is computationally expensive for a large $k$ (e.g., hundreds or even thousands). To solve this problem, we propose a novel algorithm, FastGM, which reduces the time complexity from $O(kn^+)$ to $O(k \ln k + n^+)$, where $n^+$ is the number of positive elements in the vector of interest. FastGM terminates the computation of Gumbel random variables for many elements, especially those with small weights. We perform experiments on a variety of real-world datasets, and the experimental results demonstrate that FastGM is orders of magnitude faster than state-of-the-art methods without sacrificing accuracy or incurring additional expenses.
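The sampling rule described in the abstract can be sketched in a few lines of Python. This is only the traditional Gumbel-Max Trick repeated $k$ times, which is the $O(kn^+)$ baseline, not the FastGM algorithm itself. The function names `gumbel_max_sample` and `gumbel_max_sketch` and the `seed` parameter are illustrative assumptions, not names from the paper.

```python
import math
import random
from collections import Counter


def gumbel_max_sample(weights, rng):
    """Return index i with probability weights[i] / sum(weights).

    Hypothetical helper: draws one Gumbel variable g_i per positive
    weight v_i and keeps the index maximizing g_i + ln(v_i).
    """
    best_index, best_score = None, -math.inf
    for i, v in enumerate(weights):
        if v <= 0:
            continue  # only positive-weight elements take part
        u = rng.random()
        while u == 0.0:  # random() is in [0, 1); avoid log(0)
            u = rng.random()
        g = -math.log(-math.log(u))  # standard Gumbel(0, 1) variable
        score = g + math.log(v)
        if score > best_score:
            best_index, best_score = i, score
    return best_index


def gumbel_max_sketch(weights, k, seed=0):
    """Naive baseline: k independent draws, costing O(k * n_plus) time."""
    rng = random.Random(seed)
    return [gumbel_max_sample(weights, rng) for _ in range(k)]


if __name__ == "__main__":
    draws = gumbel_max_sketch([1.0, 0.0, 3.0], k=20000, seed=1)
    freq = Counter(draws)
    # Index 2 carries weight 3 out of a total of 4, so it should be
    # drawn roughly three times as often as index 0.
    print({i: n / len(draws) for i, n in sorted(freq.items())})
```

Every draw touches all $n^+$ positive elements, which is where the $O(kn^+)$ cost comes from and what FastGM is designed to avoid.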

data mining, information retrieval, machine learning, (22 more...)

arXiv.org Artificial Intelligence

2302.05176

Country: Asia > China (0.95)

Genre:

Research Report > New Finding (0.48)
Research Report > Promising Solution (0.48)

Industry: Information Technology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Synergetic Learning of Heterogeneous Temporal Sequences for Multi-Horizon Probabilistic Forecasting

Li, Longyuan, Zhang, Jihai, Yan, Junchi, Jin, Yaohui, Zhang, Yunhao, Duan, Yanjie, Tian, Guangjian

arXiv.org Artificial Intelligence, Jan-31-2021

Time series are ubiquitous across applications such as transportation, finance, and healthcare. They are often influenced by external factors, especially in the form of asynchronous events, which makes forecasting difficult. However, existing models are mainly designed for either synchronous time series or asynchronous event sequences, and can hardly provide a unified way to capture the relation between them. We propose the Variational Synergetic Multi-Horizon Network (VSMHN), a novel deep conditional generative model. To learn complex correlations across heterogeneous sequences, a tailored encoder is devised to combine the advances in deep point process models and variational recurrent neural networks. In addition, an aligned time coding and an auxiliary transition scheme are carefully devised for batched training on unaligned sequences. Our model can be trained effectively using stochastic variational inference and generates probabilistic predictions with Monte-Carlo simulation. Furthermore, our model produces accurate, sharp and more realistic probabilistic forecasts. We also show that modeling asynchronous event sequences is crucial for multi-horizon time-series forecasting.

deep learning, neural network, sequence, (15 more...)

arXiv.org Artificial Intelligence

2102.00431

Country: Asia > China (0.14)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)

Do RNN and LSTM have Long Memory?

Zhao, Jingyu, Huang, Feiqing, Lv, Jia, Duan, Yanjie, Qin, Zhen, Li, Guodong, Tian, Guangjian

arXiv.org Machine Learning, Jun-10-2020

The LSTM network was proposed to overcome the difficulty in learning long-term dependence, and has made significant advancements in applications. With its success and drawbacks in mind, this paper raises the question: do RNN and LSTM have long memory? We answer it partially by proving that RNN and LSTM do not have long memory from a statistical perspective. A new definition for long memory networks is further introduced, and it requires the model weights to decay at a polynomial rate. To verify our theory, we convert RNN and LSTM into long memory networks by making a minimal modification, and their superiority is illustrated in modeling long-term dependence of various datasets.
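To make the contrast behind "decay at a polynomial rate" concrete, the following sketch compares geometrically decaying weights with polynomially decaying ones. It is a numerical illustration only, not the paper's formal definition. The function names and the values `rho=0.9` and `d=0.6` are arbitrary assumptions chosen for the example.

```python
def geometric_weight(k, rho=0.9):
    """Weight on lag k under geometric (exponential) decay: rho ** k."""
    return rho ** k


def polynomial_weight(k, d=0.6):
    """Weight on lag k under polynomial decay: k ** (-d)."""
    return k ** (-d)


def partial_sum(weight_fn, n_lags):
    """Sum of the weights over lags 1..n_lags."""
    return sum(weight_fn(k) for k in range(1, n_lags + 1))


if __name__ == "__main__":
    for k in (1, 10, 100, 1000):
        print(k, geometric_weight(k), polynomial_weight(k))
    # The geometric sum stays bounded (its limit is rho / (1 - rho)),
    # while the polynomial sum with d < 1 keeps growing with n_lags.
    print(partial_sum(geometric_weight, 10000))
    print(partial_sum(polynomial_weight, 10000))
```

Under geometric decay the influence of an observation 100 steps back is already negligible, whereas under polynomial decay it remains noticeable, which is the intuition behind calling such weights long memory.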

banking & finance, deep learning, neural network, (17 more...)

arXiv.org Machine Learning

2006.0386

Country:

North America > Canada > Nova Scotia (0.14)
Europe > Austria > Vienna (0.14)

Genre: Research Report (1.00)

Industry: Banking & Finance > Trading (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Compact Autoregressive Network

Wang, Di, Huang, Feiqing, Zhao, Jingyu, Li, Guodong, Tian, Guangjian

arXiv.org Machine Learning, Sep-6-2019

Recurrent neural networks (RNN) and their variants, such as Long Short-Term Memory (Hochreiter and Schmidhuber, 1997) and Gated Recurrent Unit (Cho et al., 2014), are commonly used as the default architecture for, or even as a synonym of, sequence modeling by deep learning practitioners (Goodfellow et al., 2016). Meanwhile, especially for high-dimensional time series, we may also consider autoregressive modeling or multi-task learning, $\hat{y}_t = f(y_{t-1}, y_{t-2}, \ldots, y_{t-P})$, (1) where the output $\hat{y}_t$ and each input $y_{t-i}$ are $N$-dimensional, and the lag $P$ can be very large to accommodate sequential dependence. Some non-recurrent feed-forward networks with convolutional or other architectures have been proposed recently for sequence modeling, and have been shown to achieve state-of-the-art accuracy. For example, some autoregressive networks, such as PixelCNN (Van den Oord et al., 2016b) and WaveNet (Van den Oord et al., 2016a) for image and audio sequence modeling, are compelling alternatives to recurrent networks. This paper focuses on the autoregressive model (1) with a large number of sequences.
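As a minimal sketch of the autoregressive model (1), the code below takes $f$ to be linear, with one $N \times N$ matrix per lag. The linear form is an assumption made here for illustration and is not the network proposed in the paper. The names `ar_forecast` and `parameter_count` are hypothetical.

```python
def ar_forecast(history, lag_matrices):
    """One-step forecast y_hat_t = sum over i of A_i @ y_{t-i}.

    history: list of N-dimensional observations, oldest first.
    lag_matrices: [A_1, ..., A_P], each an N x N nested list
    (an assumed linear special case of f in model (1)).
    """
    num_lags = len(lag_matrices)
    if len(history) < num_lags:
        raise ValueError("history must contain at least P observations")
    dim = len(history[-1])
    y_hat = [0.0] * dim
    for i, matrix in enumerate(lag_matrices, start=1):
        y_lag = history[-i]  # y_{t-i}
        for row in range(dim):
            y_hat[row] += sum(matrix[row][col] * y_lag[col] for col in range(dim))
    return y_hat


def parameter_count(dim, num_lags):
    """Number of weights in the unconstrained linear model: N * N * P."""
    return dim * dim * num_lags


if __name__ == "__main__":
    history = [[1.0, 2.0], [3.0, 4.0]]  # y_{t-2}, then y_{t-1}
    a1 = [[0.5, 0.0], [0.0, 0.5]]
    a2 = [[0.0, 1.0], [1.0, 0.0]]
    print(ar_forecast(history, [a1, a2]))
    print(parameter_count(100, 50))
```

Even in this linear case the weight count grows as $N^2 P$, which shows why a large number of sequences combined with a long lag quickly becomes expensive.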

autoregressive network, deep learning, neural network, (21 more...)

arXiv.org Machine Learning

1909.0383

Country: Asia > China (0.14)

Genre: Research Report (0.82)

Industry: Banking & Finance > Trading (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
