Collaborating Authors: Fu, Rao


ScratchEval: Are GPT-4o Smarter than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges

arXiv.org Artificial Intelligence

Recent advancements in large multimodal models (LMMs) have showcased impressive code generation capabilities, primarily evaluated through image-to-code benchmarks. However, these benchmarks are limited to specific visual programming scenarios in which logical reasoning and multimodal understanding are assessed separately. To fill this gap, we propose ScratchEval, a novel benchmark designed to evaluate the visual programming reasoning ability of LMMs. ScratchEval is based on Scratch, a block-based visual programming language widely used in children's programming education. By integrating visual elements and embedded programming logic, ScratchEval requires the model to process both visual information and code structure, thereby comprehensively evaluating its ability to understand programming intent. Our evaluation approach goes beyond traditional image-to-code mapping and focuses on unified logical thinking and problem-solving abilities, providing a more comprehensive and challenging framework for evaluating the visual programming ability of LMMs. ScratchEval not only fills a gap in existing evaluation methods but also provides new insights for the future development of LMMs in visual programming. Our benchmark can be accessed at https://github.com/HKBUNLP/ScratchEval.


Scene-LLM: Extending Language Model for 3D Visual Understanding and Reasoning

arXiv.org Artificial Intelligence

This paper introduces Scene-LLM, a 3D visual-language model that enhances embodied agents' abilities in interactive 3D indoor environments by integrating the reasoning strengths of Large Language Models (LLMs). Scene-LLM adopts a hybrid 3D visual feature representation that incorporates dense spatial information and supports scene state updates. The model employs a projection layer to efficiently project these features into the pre-trained textual embedding space, enabling effective interpretation of 3D visual information. Unique to our approach is the integration of both scene-level and egocentric 3D information. This combination is pivotal for interactive planning, where scene-level data supports global planning and egocentric data is important for localization. Notably, we use egocentric 3D frame features for feature alignment, an efficient technique that enhances the model's ability to align features of small objects within the scene. Our experiments with Scene-LLM demonstrate its strong capabilities in dense captioning, question answering, and interactive planning. We believe Scene-LLM advances the field of 3D visual understanding and reasoning, offering new possibilities for sophisticated agent interactions in indoor settings.
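
The projection step described above is conceptually simple: 3D visual features are mapped into the LLM's token embedding space so they can be consumed as soft tokens. The following is a minimal PyTorch sketch; all dimensions, names, and the single-linear-layer design are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class FeatureProjector(nn.Module):
    """Minimal sketch: map 3D visual features into an LLM's token
    embedding space so they can be consumed as soft tokens.
    Dimensions are illustrative, not the paper's."""
    def __init__(self, visual_dim: int = 1024, llm_embed_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(visual_dim, llm_embed_dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_3d_tokens, visual_dim) hybrid scene features
        return self.proj(feats)  # (batch, num_3d_tokens, llm_embed_dim)

# Usage: project scene features, then prepend them to the text token embeddings.
projector = FeatureProjector()
scene_feats = torch.randn(1, 256, 1024)   # e.g., 256 voxel/frame features
soft_tokens = projector(scene_feats)      # ready to concatenate with text tokens
```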


Deep Generative Modeling for Financial Time Series with Application in VaR: A Comparative Review

arXiv.org Artificial Intelligence

In the financial services industry, forecasting the risk factor distribution conditional on the history and the current market environment is the key to market risk modeling in general and value-at-risk (VaR) modeling in particular. As one of the most widely adopted VaR models in commercial banks, historical simulation (HS) uses the empirical distribution of daily returns in a historical window as the forecast distribution of risk factor returns for the next day. The objectives of financial time series generation are to generate synthetic data paths with good variety and with distribution and dynamics similar to the original historical data. In this paper, we apply multiple existing deep generative methods (e.g., CGAN, CWGAN, Diffusion, and Signature WGAN) for conditional time series generation, and propose and test two new methods for conditional multi-step time series generation, namely Encoder-Decoder CGAN and Conditional TimeVAE. Furthermore, we introduce a comprehensive framework with a set of KPIs to measure the quality of the generated time series for financial modeling. The KPIs cover distribution distance, autocorrelation, and backtesting. All models (HS, parametric, and neural networks) are tested on both historical USD yield curve data and additional data simulated from GARCH and CIR processes. The study shows that the top-performing models are the HS, GARCH, and CWGAN models. Future research directions in this area are also discussed.
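
The historical simulation model described above is directly computable: the next-day VaR forecast is an empirical quantile of returns in a rolling window. Below is a minimal sketch; the window length, confidence level, and sign convention are illustrative defaults, not the paper's configuration.

```python
import numpy as np

def historical_simulation_var(returns: np.ndarray,
                              window: int = 250,
                              alpha: float = 0.99) -> float:
    """One-day VaR by historical simulation (HS): the empirical
    alpha-quantile of the last `window` daily returns serves as the
    loss threshold for the next day. Defaults are illustrative."""
    hist = returns[-window:]              # rolling historical window
    # Report VaR as a positive loss: negate the lower-tail quantile.
    return -np.quantile(hist, 1.0 - alpha)

# Usage with simulated daily returns (placeholder data, not the paper's):
rng = np.random.default_rng(0)
daily_returns = rng.normal(0.0, 0.01, size=1000)
print(f"99% 1-day VaR: {historical_simulation_var(daily_returns):.4%}")
```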


AnyHome: Open-Vocabulary Generation of Structured and Textured 3D Homes

arXiv.org Artificial Intelligence

We introduce AnyHome, a framework that translates open-vocabulary descriptions, ranging from simple labels to elaborate paragraphs, into well-structured and textured 3D indoor scenes at house scale. Inspired by cognition theories, AnyHome employs an amodal structured representation to capture 3D spatial cues from textual narratives and then uses egocentric inpainting to enrich these scenes. To this end, we begin by using specially designed template prompts for Large Language Models (LLMs), which enable precise control over the textual input. We then utilize intermediate representations to maintain the consistency of the spatial structure, ensuring that the 3D scenes align closely with the textual description. Next, we apply a Score Distillation Sampling process to refine the placement of objects. Lastly, an egocentric inpainting process is incorporated to enhance the realism and appearance of the scenes. AnyHome stands out due to its hierarchical structured representation combined with the versatility of open-vocabulary text interpretation. This allows for extensive customization of indoor scenes at various levels of granularity. We demonstrate that AnyHome can reliably generate a range of diverse indoor scenes, characterized by their detailed spatial structures and textures, all corresponding to the free-form textual inputs.
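
For reference, the canonical Score Distillation Sampling gradient (as introduced in DreamFusion) is shown below; AnyHome applies an SDS process to refine object placement, and its exact objective may differ from this standard form. Here $x = g(\theta)$ is the rendered scene, $\hat{\epsilon}_{\phi}$ is the diffusion model's noise prediction conditioned on text $y$, and $w(t)$ is a timestep weighting:

```latex
% Canonical SDS gradient (DreamFusion form), shown for context only.
\nabla_{\theta}\,\mathcal{L}_{\mathrm{SDS}}(\theta)
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_{\phi}(x_t;\,y,\,t) - \epsilon\bigr)
      \,\frac{\partial x}{\partial \theta}
    \right]
```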


Optimal Virtual Tube Planning and Control for Swarm Robotics

arXiv.org Artificial Intelligence

This paper presents a novel method for efficiently solving a trajectory planning problem for swarm robotics in cluttered environments. Recent research has demonstrated high success rates in real-time local trajectory planning for swarm robotics in cluttered environments, but optimizing trajectories for each robot remains computationally expensive, with a computational complexity ranging from $O\left(k\left(n_t,\varepsilon\right)n_t^2\right)$ to $O\left(k\left(n_t,\varepsilon\right)n_t^3\right)$, where $n_t$ is the number of parameters in the parameterized trajectory, $\varepsilon$ is the precision, and $k\left(n_t,\varepsilon\right)$ is the number of iterations with respect to $n_t$ and $\varepsilon$. Furthermore, it is difficult to move the swarm as a group. To address this issue, we define and then construct the optimal virtual tube, which contains infinitely many optimal trajectories. Under certain conditions, any optimal trajectory in the optimal virtual tube can be expressed as a convex combination of a finite number of optimal trajectories, with a computational complexity of $O\left(n_t\right)$. We then propose a hierarchical approach that combines an energy-minimizing planning method for the optimal virtual tube with distributed model predictive control. Simulations and experiments validate the proposed approach and demonstrate its effectiveness in comparison with other methods.
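
The key property claimed above, that any optimal trajectory in the optimal virtual tube is a convex combination of a finite set of precomputed optimal trajectories, reduces online planning to a weighted sum over trajectory parameters. A minimal sketch follows; the array shapes and the fixed basis size m are illustrative assumptions.

```python
import numpy as np

def convex_combination_trajectory(basis: np.ndarray,
                                  weights: np.ndarray) -> np.ndarray:
    """Sketch of the property above: a new optimal trajectory in the
    virtual tube as a convex combination of m precomputed optimal
    trajectories. `basis` has shape (m, n_t), holding the n_t parameters
    of each precomputed trajectory; the cost is O(m * n_t), i.e. O(n_t)
    for fixed m. Shapes and names are illustrative assumptions."""
    assert np.all(weights >= 0) and np.isclose(weights.sum(), 1.0), \
        "weights must lie on the simplex (convex combination)"
    return weights @ basis   # (n_t,) parameter vector of the new trajectory

# Usage: blend three precomputed trajectories for an intermediate start point.
basis = np.random.default_rng(1).normal(size=(3, 20))   # m=3, n_t=20
new_traj = convex_combination_trajectory(basis, np.array([0.5, 0.3, 0.2]))
```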


Simplifying Low-Light Image Enhancement Networks with Relative Loss Functions

arXiv.org Artificial Intelligence

Image enhancement is a common technique used to mitigate issues such as severe noise, low brightness, low contrast, and color deviation in low-light images. However, providing an optimal high-light image as a reference for low-light image enhancement tasks is impossible, which makes the learning process more difficult than in other image processing tasks. As a result, although several low-light image enhancement methods have been proposed, most of them are either too complex or insufficient to address all the issues in low-light images. In this paper, to ease learning in low-light image enhancement, we introduce FLW-Net (Fast and LightWeight Network) and two relative loss functions. Specifically, we first identify two challenges that limit the simplification of network structures in this task: the need for a large receptive field to obtain global contrast, and the lack of an absolute reference. Then, we propose an efficient global feature information extraction component and two loss functions based on relative information to overcome these challenges. Finally, we conduct comparative experiments to demonstrate the effectiveness of the proposed method, and the results confirm that it can significantly reduce the complexity of supervised low-light image enhancement networks while improving enhancement quality. The code is available at \url{https://github.com/hitzhangyu/FLW-Net}.
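
The abstract does not define the two relative loss functions, so the sketch below is only an illustrative guess at what a loss "based on relative information" can look like: it supervises each image's deviation from its own mean rather than absolute pixel values, sidestepping the missing absolute reference. The actual loss definitions are in the paper and the linked repository.

```python
import torch
import torch.nn.functional as F

def relative_contrast_loss(pred: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
    """Illustrative guess at a 'relative' loss: instead of matching absolute
    pixel values against an imperfect reference, match each pixel's offset
    from its own image mean, so only relative structure is supervised.
    NOT the paper's actual loss; see the linked repository for the real ones."""
    pred_rel = pred - pred.mean(dim=(-2, -1), keepdim=True)  # remove global level
    ref_rel = ref - ref.mean(dim=(-2, -1), keepdim=True)
    return F.l1_loss(pred_rel, ref_rel)
```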


CLIP-Sculptor: Zero-Shot Generation of High-Fidelity and Diverse Shapes from Natural Language

arXiv.org Artificial Intelligence

Recent works have demonstrated that natural language can be used to generate and edit 3D shapes. However, these methods generate shapes with limited fidelity and diversity. We introduce CLIP-Sculptor, a method that addresses these constraints by producing high-fidelity and diverse 3D shapes without the need for (text, shape) pairs during training. CLIP-Sculptor achieves this through a multi-resolution approach that first generates in a low-dimensional latent space and then upscales to a higher resolution for improved shape fidelity. For improved shape diversity, we use a discrete latent space modeled by a transformer conditioned on CLIP's image-text embedding space. We also present a novel variant of classifier-free guidance, which improves the accuracy-diversity trade-off. Finally, we perform extensive experiments demonstrating that CLIP-Sculptor outperforms state-of-the-art baselines. The code is available at https://ivl.cs.brown.edu/#/projects/clip-sculptor.
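
For context, the baseline classifier-free guidance rule that CLIP-Sculptor's variant builds on can be sketched as follows for a transformer over a discrete latent space. The paper's modified rule is not reproduced here, and the guidance scale is an illustrative default.

```python
import torch

def classifier_free_guidance(cond_logits: torch.Tensor,
                             uncond_logits: torch.Tensor,
                             scale: float = 3.0) -> torch.Tensor:
    """Standard classifier-free guidance applied to the logits of a
    transformer over a discrete latent space: extrapolate from the
    unconditional prediction toward the conditional one. CLIP-Sculptor
    proposes a *variant* of this rule to improve the accuracy-diversity
    trade-off; this is only the well-known baseline."""
    return uncond_logits + scale * (cond_logits - uncond_logits)
```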


ShapeCrafter: A Recursive Text-Conditioned 3D Shape Generation Model

arXiv.org Artificial Intelligence

We present ShapeCrafter, a neural network for recursive text-conditioned 3D shape generation. Existing methods for generating text-conditioned 3D shapes consume an entire text prompt to generate a 3D shape in a single step. However, humans tend to describe shapes recursively: we may start with an initial description and progressively add details based on intermediate results. To capture this recursive process, we introduce a method to generate a 3D shape distribution, conditioned on an initial phrase, that gradually evolves as more phrases are added. Since existing datasets are insufficient for training this approach, we present Text2Shape++, a large dataset of 369K shape-text pairs that supports recursive shape generation. To capture local details that are often used to refine shape descriptions, we build on top of vector-quantized deep implicit functions that generate a distribution of high-quality shapes. Results show that our method can generate shapes consistent with text descriptions, and that shapes evolve gradually as more phrases are added. Our method supports shape editing and extrapolation, and can enable new applications in human-machine collaboration for creative design.


Practical Distributed Control for VTOL UAVs to Pass a Virtual Tube

arXiv.org Artificial Intelligence

Unmanned Aerial Vehicles (UAVs) are now becoming increasingly accessible to amateur and commercial users alike. An air traffic management (ATM) system is needed to help ensure that this newest entrant into the skies does not collide with others. In an ATM system, airspace can be composed of airways, intersections, and nodes. In this paper, for simplicity, we focus on distributed coordination of the motions of Vertical TakeOff and Landing (VTOL) UAVs passing through an airway. This is formulated as a virtual tube passing problem, which comprises passing through a virtual tube, inter-agent collision avoidance, and keeping within the virtual tube. Lyapunov-like functions are carefully designed, and a formal analysis based on the invariant set theorem shows that all UAVs can pass through the virtual tube without getting trapped, avoid collisions, and keep within the virtual tube. Moreover, under the proposed distributed control, a VTOL UAV can move away from another VTOL UAV or return to the virtual tube as quickly as possible whenever it enters the safety area of another UAV or collides with the boundary of the virtual tube while passing through it. Simulations and experiments are carried out to show the effectiveness of the proposed method and to compare it with other methods.
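
To make the three control objectives concrete (passing, inter-agent avoidance, tube keeping), here is a toy potential-field-style controller in the same spirit; it only mirrors the structure of the problem and carries none of the paper's Lyapunov-based guarantees. All gains, margins, and names are illustrative assumptions.

```python
import numpy as np

def vtol_tube_command(p_i: np.ndarray, neighbors: list[np.ndarray],
                      tube_axis: np.ndarray, tube_center: np.ndarray,
                      tube_radius: float, d_safe: float = 1.0,
                      v_pass: float = 1.0, k_rep: float = 2.0) -> np.ndarray:
    """Illustrative potential-field sketch of the three objectives above:
    pass along the tube, avoid neighbors, stay inside the tube. The paper
    instead designs Lyapunov-like functions with formal invariant-set
    guarantees; this toy controller only mirrors the structure.
    `tube_axis` is assumed to be a unit vector."""
    cmd = v_pass * tube_axis                      # 1) progress along the airway
    for p_j in neighbors:                         # 2) inter-agent avoidance
        d = p_i - p_j
        dist = np.linalg.norm(d)
        if dist < d_safe:
            cmd += k_rep * (d_safe - dist) * d / (dist + 1e-9)
    radial = p_i - tube_center                    # 3) keep within the tube
    radial -= radial.dot(tube_axis) * tube_axis   # component normal to the axis
    r = np.linalg.norm(radial)
    if r > 0.8 * tube_radius:                     # soft boundary margin
        cmd -= k_rep * (r - 0.8 * tube_radius) * radial / (r + 1e-9)
    return cmd
```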


Query-aware Tip Generation for Vertical Search

arXiv.org Artificial Intelligence

As a concise form of user reviews, tips have unique advantages in explaining search results, assisting users' decision making, and further improving the user experience in vertical search scenarios. Existing work on tip generation does not take the query into consideration, which limits the impact of tips in search scenarios. To address this issue, this paper proposes a query-aware tip generation framework that integrates query information into the encoding and subsequent decoding processes. Two specific adaptations, based on the Transformer and the Recurrent Neural Network (RNN), are proposed. For the Transformer, the query impact is incorporated into the self-attention computation of both the encoder and the decoder. For the RNN, the query-aware encoder adopts a selective network to distill query-relevant information from the review, while the query-aware decoder integrates the query information into the attention computation during decoding. The framework consistently outperforms competing methods on both public and real-world industrial datasets. Last but not least, online deployment experiments on Dianping demonstrate the advantage of the proposed framework for tip generation as well as its online business value.
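
As a rough illustration of "incorporating the query into the self-attention computation", the sketch below biases standard self-attention scores with each review token's affinity to a pooled query vector, so query-relevant tokens attract more attention. This is an assumption-laden simplification, not the paper's actual formulation; all names and shapes are hypothetical.

```python
import torch
import torch.nn.functional as F

def query_aware_attention(h: torch.Tensor, q_vec: torch.Tensor,
                          W_qk: torch.Tensor) -> torch.Tensor:
    """Illustrative query-aware self-attention sketch.
    h: (n, d) review token states; q_vec: (d,) pooled query embedding;
    W_qk: (d, d) learned interaction matrix (hypothetical)."""
    scores = h @ h.T / h.shape[-1] ** 0.5      # standard self-attention scores
    query_bias = (h @ W_qk) @ q_vec            # (n,) per-token query affinity
    scores = scores + query_bias.unsqueeze(0)  # bias attention toward query-relevant tokens
    return F.softmax(scores, dim=-1) @ h       # query-aware context vectors

# Usage with toy dimensions:
n, d = 8, 16
out = query_aware_attention(torch.randn(n, d), torch.randn(d),
                            torch.randn(d, d))
```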