Li, Yifeng
Can We Afford The Perfect Prompt? Balancing Cost and Accuracy with the Economical Prompting Index
McDonald, Tyler, Colosimo, Anthony, Li, Yifeng, Emami, Ali
As prompt engineering research rapidly evolves, evaluations beyond accuracy are crucial for developing cost-effective techniques. We present the Economical Prompting Index (EPI), a novel metric that combines accuracy scores with token consumption, adjusted by a user-specified cost concern level to reflect different resource constraints. Our study examines 6 advanced prompting techniques, including Chain-of-Thought, Self-Consistency, and Tree of Thoughts, across 10 widely-used language models and 4 diverse datasets. We demonstrate that approaches such as Self-Consistency often provide statistically insignificant gains while becoming cost-prohibitive. For example, on high-performing models like Claude 3.5 Sonnet, the EPI of simpler techniques like Chain-of-Thought (0.72) surpasses more complex methods like Self-Consistency (0.64) at slight cost concern levels. Our findings suggest a reevaluation of complex prompting strategies in resource-constrained scenarios, potentially reshaping future research priorities and improving cost-effectiveness for end-users.
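The abstract does not give EPI's exact formula, so the following is a minimal illustrative sketch, assuming only the stated ingredients: accuracy discounted by token consumption, weighted by a user-specified cost concern level. The function name, the linear penalty form, and all numbers are hypothetical, not the paper's definition.

```python
def economical_prompting_index(accuracy: float, tokens_used: float,
                               max_tokens: float, cost_concern: float) -> float:
    """Illustrative EPI-style score: accuracy discounted by token consumption.

    NOTE: the paper defines the exact functional form; this sketch only assumes
    that higher token usage and higher cost concern lower the score.
    """
    token_penalty = cost_concern * (tokens_used / max_tokens)
    return accuracy * (1.0 - token_penalty)

# Example: Chain-of-Thought vs. Self-Consistency at a slight cost concern level.
cot = economical_prompting_index(accuracy=0.90, tokens_used=400,
                                 max_tokens=4000, cost_concern=0.5)
sc = economical_prompting_index(accuracy=0.92, tokens_used=3200,
                                max_tokens=4000, cost_concern=0.5)
print(f"CoT EPI={cot:.2f}, Self-Consistency EPI={sc:.2f}")  # cheaper CoT wins
```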
GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation
Cheang, Chi-Lam, Chen, Guangzeng, Jing, Ya, Kong, Tao, Li, Hang, Li, Yifeng, Liu, Yuxiao, Wu, Hongtao, Xu, Jiafeng, Yang, Yichu, Zhang, Hanbo, Zhu, Minzhao
GR-2 is a generative video-language-action model for robot manipulation. It is first pre-trained on a vast corpus of Internet videos to capture the dynamics of the world. This large-scale pre-training, involving 38 million video clips and over 50 billion tokens, equips GR-2 with the ability to generalize across a wide range of robotic tasks and environments during subsequent policy learning. Following this, GR-2 is fine-tuned for both video generation and action prediction using robot trajectories. It exhibits impressive multi-task learning capabilities, achieving an average success rate of 97.7% across more than 100 tasks. Moreover, GR-2 demonstrates exceptional generalization to new, previously unseen scenarios, including novel backgrounds, environments, objects, and tasks.
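As a rough illustration of the two-stage recipe described above (generative video pre-training, then fine-tuning with a joint video-generation and action-prediction objective), here is a toy PyTorch sketch. Every module, dimension, and loss choice below is a placeholder, not GR-2's actual architecture.

```python
import torch
import torch.nn as nn

class ToyGVA(nn.Module):
    """Toy stand-in for a generative video-language-action model."""
    def __init__(self, d: int = 64, n_actions: int = 7):
        super().__init__()
        self.backbone = nn.GRU(d, d, batch_first=True)  # stands in for the video transformer
        self.video_head = nn.Linear(d, d)               # predicts the next frame embedding
        self.action_head = nn.Linear(d, n_actions)      # predicts the robot action

    def forward(self, frames):
        h, _ = self.backbone(frames)
        return self.video_head(h), self.action_head(h)

model = ToyGVA()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stage 1: pre-train on (synthetic stand-ins for) web videos: predict the next frame.
frames = torch.randn(8, 16, 64)
pred_video, _ = model(frames[:, :-1])
loss = nn.functional.mse_loss(pred_video, frames[:, 1:])
loss.backward(); opt.step(); opt.zero_grad()

# Stage 2: fine-tune on robot trajectories with a joint video + action objective.
traj, actions = torch.randn(8, 16, 64), torch.randn(8, 15, 7)
pred_video, pred_act = model(traj[:, :-1])
loss = (nn.functional.mse_loss(pred_video, traj[:, 1:])
        + nn.functional.mse_loss(pred_act, actions))
loss.backward(); opt.step(); opt.zero_grad()
```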
Picturing Ambiguity: A Visual Twist on the Winograd Schema Challenge
Park, Brendan, Janecek, Madeline, Ezzati-Jivan, Naser, Li, Yifeng, Emami, Ali
Large Language Models (LLMs) have demonstrated remarkable success in tasks like the Winograd Schema Challenge (WSC), showcasing advanced textual common-sense reasoning. However, applying this reasoning to multimodal domains, where understanding text and images together is essential, remains a substantial challenge. To address this, we introduce WinoVis, a novel dataset specifically designed to probe text-to-image models on pronoun disambiguation within multimodal contexts. Utilizing GPT-4 for prompt generation and Diffusion Attentive Attribution Maps (DAAM) for heatmap analysis, we propose a novel evaluation framework that isolates the models' ability in pronoun disambiguation from other visual processing challenges. Evaluation of successive model versions reveals that, despite incremental advancements, Stable Diffusion 2.0 achieves a precision of 56.7% on WinoVis, only marginally surpassing random guessing. Further error analysis identifies important areas for future research aimed at advancing text-to-image models in their ability to interpret and interact with the complex visual world.
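A minimal sketch of the decision rule such a heatmap-based evaluation implies: given the pronoun's attention heatmap and masks for the two candidate referents, the model is scored as resolving the pronoun to whichever referent region receives more attention mass. This is an assumption-laden illustration, not the authors' code; DAAM itself is not invoked here.

```python
import numpy as np

def resolve_pronoun(pronoun_heatmap: np.ndarray,
                    mask_a: np.ndarray, mask_b: np.ndarray) -> str:
    """Pick the candidate referent whose region collects more attention mass."""
    score_a = (pronoun_heatmap * mask_a).sum()
    score_b = (pronoun_heatmap * mask_b).sum()
    return "referent_a" if score_a >= score_b else "referent_b"

# Toy example: attention concentrated on the left half of a 64x64 canvas.
heat = np.zeros((64, 64)); heat[:, :32] = 1.0
left, right = np.zeros_like(heat), np.zeros_like(heat)
left[:, :32] = 1; right[:, 32:] = 1
print(resolve_pronoun(heat, left, right))  # -> referent_a
```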
Towards Unified Interactive Visual Grounding in The Wild
Xu, Jie, Zhang, Hanbo, Si, Qingyi, Li, Yifeng, Lan, Xuguang, Kong, Tao
Interactive visual grounding in Human-Robot Interaction (HRI) is challenging yet practical due to the inevitable ambiguity of natural language. It requires robots to disambiguate user input through active information gathering. Previous approaches often rely on predefined templates to ask disambiguation questions, which degrades performance in realistic interactive scenarios. In this paper, we propose TiO, an end-to-end system for interactive visual grounding in human-robot interaction. Benefiting from a unified formulation of visual dialogue and grounding, our method can be trained jointly on extensive public data and shows superior generality in diverse and challenging open-world scenarios. In experiments, we validate TiO on the GuessWhat?! and InViG benchmarks, setting new state-of-the-art performance by a clear margin. Moreover, we conduct HRI experiments on 150 carefully selected challenging scenes as well as on real-robot platforms. Results show that our method generalizes well to diverse visual and language inputs with a high success rate. Code and demos are available at https://github.com/jxu124/TiO.
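To make the interaction pattern concrete, here is a hypothetical sketch of the ask-or-ground loop such a unified system implies. The DummyAgent class and its step API are placeholders invented for illustration, not TiO's interface; see the linked repository for the real implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class StepOutput:
    text: str
    box: Optional[Tuple[int, int, int, int]] = None  # None -> output is a question

class DummyAgent:
    """Stand-in for a unified dialogue-and-grounding model."""
    def step(self, dialogue: List[str]) -> StepOutput:
        if len(dialogue) < 3:  # still ambiguous: actively gather information
            return StepOutput("Do you mean the cup on the left or the right?")
        return StepOutput("grounded", box=(40, 60, 120, 180))

agent, dialogue = DummyAgent(), ["pick up the cup"]
while True:
    out = agent.step(dialogue)
    if out.box is not None:          # model is confident enough to ground
        print("grounding box:", out.box); break
    dialogue += [out.text, "the left one"]  # simulated user answer
```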
Generative Category-Level Shape and Pose Estimation with Semantic Primitives
Li, Guanglin, Li, Yifeng, Ye, Zhichao, Zhang, Qihang, Kong, Tao, Cui, Zhaopeng, Zhang, Guofeng
Empowering autonomous agents with 3D understanding of everyday objects is a grand challenge in robotics. When exploring an unknown environment, existing methods for object pose estimation remain unsatisfactory due to the diversity of object shapes. In this paper, we propose a novel framework for category-level object shape and pose estimation from a single RGB-D image. To handle intra-category variation, we adopt a semantic primitive representation that encodes diverse shapes into a unified latent space, which is the key to establishing reliable correspondences between observed point clouds and estimated shapes. Then, using a SIM(3)-invariant shape descriptor, we gracefully decouple the shape and pose of an object, thus supporting latent shape optimization for target objects in arbitrary poses. Extensive experiments show that the proposed method achieves state-of-the-art pose estimation performance and better generalization on a real-world dataset. Code and video are available at https://zju3dv.github.io/gCasp.
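A toy sketch of the latent shape optimization described above: decode a latent code into a point set and minimize a similarity-invariant descriptor discrepancy against the observation. The decoder and descriptor below are trivial stand-ins, assumed for illustration, for the paper's semantic-primitive decoder and SIM(3)-invariant shape descriptor.

```python
import torch

def decode(z: torch.Tensor) -> torch.Tensor:
    return torch.tanh(z.view(-1, 3))          # latent -> N x 3 point set

def sim3_invariant_descriptor(pts: torch.Tensor) -> torch.Tensor:
    centered = pts - pts.mean(dim=0)                           # translation invariance
    normed = centered / (centered.norm(dim=1).mean() + 1e-8)   # scale invariance
    d = torch.pdist(normed)                   # pairwise distances: rotation invariant
    return torch.stack([d.mean(), d.std(), d.max()])

obs = torch.randn(32, 3)                      # observed (partial) point cloud
target = sim3_invariant_descriptor(obs).detach()
z = (0.5 * torch.randn(96)).requires_grad_()  # latent shape code to optimize
opt = torch.optim.Adam([z], lr=0.05)
for _ in range(100):
    loss = (sim3_invariant_descriptor(decode(z)) - target).pow(2).sum()
    opt.zero_grad(); loss.backward(); opt.step()
```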
Tri-Attention: Explicit Context-Aware Attention Mechanism for Natural Language Processing
Yu, Rui, Li, Yifeng, Lu, Wenpeng, Cao, Longbing
In natural language processing (NLP), the context of a word or sentence plays an essential role. Contextual information, such as the semantic representation of a passage or dialogue history, is essential both to a conversation and to a precise understanding of the present phrase or sentence. However, despite their great success in modeling sequence alignment, standard attention mechanisms typically generate weights from queries and keys alone, ignoring context, forming a Bi-Attention framework. Bi-Attention does not explicitly model the interactions among the contexts, queries, and keys of target sequences, missing important contextual information and yielding weaker attention. Accordingly, we propose a novel and general triple-attention (Tri-Attention) framework that expands standard Bi-Attention and explicitly interacts query, key, and context by incorporating context as a third dimension when calculating relevance scores. Four variants of Tri-Attention are obtained by expanding the two-dimensional vector-based additive, dot-product, scaled dot-product, and bilinear operations of Bi-Attention to tensor operations. Extensive experiments on three NLP tasks demonstrate that Tri-Attention outperforms about 30 state-of-the-art non-attention, standard Bi-Attention, and contextual Bi-Attention approaches, as well as pretrained neural language models.
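A minimal sketch of what the scaled dot-product variant might look like, following the abstract's description of contracting query, key, and context in the relevance score; the paper's exact parameterization may differ.

```python
import torch

def tri_attention(Q: torch.Tensor, K: torch.Tensor, V: torch.Tensor,
                  c: torch.Tensor) -> torch.Tensor:
    """Scaled triple dot-product: context weights each dimension of the
    query-key interaction, instead of scores depending on Q and K alone."""
    d = Q.shape[-1]
    scores = torch.einsum("nd,md,d->nm", Q, K, c) / d ** 0.5
    return torch.softmax(scores, dim=-1) @ V

Q, K, V = torch.randn(4, 8), torch.randn(6, 8), torch.randn(6, 8)
context = torch.randn(8)          # e.g., a pooled passage or dialogue-history vector
out = tri_attention(Q, K, V, context)  # shape (4, 8)
```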
Deep Evolutionary Learning for Molecular Design
Li, Yifeng, Ooi, Hsu Kiang, Tchagang, Alain
In this paper, we propose a deep evolutionary learning (DEL) process that integrates a fragment-based deep generative model with multi-objective evolutionary computation for molecular design. Our approach enables (1) evolutionary operations in the latent space of the generative model, rather than the structural space, to generate novel, promising molecular structures for the next evolutionary generation, and (2) generative model fine-tuning using newly generated high-quality samples. DEL thus implements a data-model co-evolution concept that improves both the sample population and generative model learning. Experiments on two public datasets indicate that the sample populations obtained by DEL exhibit improved property distributions and dominate samples generated by multi-objective Bayesian optimization algorithms.
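A toy sketch of the DEL loop under stated assumptions: evolution happens in the latent space, decoded samples are scored on multiple objectives, and the selected elites would be used to fine-tune the generative model. The identity decoder, toy objectives, and scalarized selection below are placeholders (the paper operates on molecules and uses multi-objective selection).

```python
import numpy as np

rng = np.random.default_rng(0)

def decode(z: np.ndarray) -> np.ndarray:
    return z  # stand-in for the fragment-based generative model's decoder

def objectives(x: np.ndarray) -> np.ndarray:
    # Two toy "molecular properties" to maximize.
    return np.stack([-np.abs(x).sum(1), -(x ** 2).sum(1)], axis=1)

pop = rng.normal(size=(32, 16))  # latent population
for _ in range(10):
    # Crossover + mutation directly in latent space, not structural space.
    parents = pop[rng.permutation(32)]
    children = 0.5 * (pop + parents) + 0.1 * rng.normal(size=pop.shape)
    union = np.concatenate([pop, children])
    scores = objectives(decode(union))
    # Simple scalarized selection; the paper uses multi-objective selection.
    rank = scores.sum(axis=1).argsort()[::-1]
    pop = union[rank[:32]]
    # ...here DEL would also fine-tune the generative model on the elites.
```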
Exploring Deep Anomaly Detection Methods Based on Capsule Net
Li, Xiaoyan, Kiringa, Iluju, Yeap, Tet, Zhu, Xiaodan, Li, Yifeng
In this paper, we develop and explore deep anomaly detection techniques based on the capsule network (CapsNet) for image data. Able to encode the intrinsic spatial relationships between parts and wholes, CapsNet has been applied as both a classifier and a deep autoencoder. This inspires us to design a prediction-probability-based and a reconstruction-error-based normality score function for evaluating the "outlierness" of unseen images. Our results on three datasets demonstrate that the prediction-probability-based method performs consistently well, while the reconstruction-error-based approach is relatively sensitive to the similarity between labeled and unlabeled images. Furthermore, both CapsNet-based methods outperform principled benchmark methods in many cases.
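The two normality scores, in schematic form. The CapsNet-specific pieces (capsule lengths as class probabilities, the class-conditional reconstruction) are represented by plain arrays here; only the scoring logic is illustrated.

```python
import numpy as np

def prediction_probability_score(class_probs: np.ndarray) -> float:
    """Confident prediction on some known class -> likely normal."""
    return float(class_probs.max())

def reconstruction_error_score(image: np.ndarray,
                               reconstruction: np.ndarray) -> float:
    """Low reconstruction error -> likely normal (higher score = more normal)."""
    return float(-np.mean((image - reconstruction) ** 2))

probs = np.array([0.05, 0.90, 0.05])   # e.g., derived from capsule lengths
img, recon = np.ones((28, 28)), np.full((28, 28), 0.95)
print(prediction_probability_score(probs))
print(reconstruction_error_score(img, recon))
```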
Defending Adversarial Attacks by Correcting Logits
Li, Yifeng, Xie, Lingxi, Zhang, Ya, Zhang, Rui, Wang, Yanfeng, Tian, Qi
Generating and eliminating adversarial examples has been an intriguing topic in the field of deep learning. While previous research has verified that adversarial attacks are often fragile and can be defended against via image-level processing, it remains unclear how high-level features are perturbed by such attacks. We investigate this issue from a new perspective that relies purely on logits, the class scores before softmax, to detect and defend against adversarial attacks. Our defender is a two-layer network trained on a mixed set of clean and perturbed logits, with the goal of recovering the original prediction. Against a wide range of adversarial attacks, our simple approach shows promising results with relatively high defense accuracy, and the defender can transfer across attackers with similar properties. More importantly, our defender works in scenarios where image data are unavailable, and it offers high interpretability, especially at the semantic level.
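A minimal PyTorch sketch of the defender as described: a two-layer network mapping (possibly perturbed) logits back to the original prediction. The layer sizes, noise model, and training data below are synthetic placeholders, not the paper's setup.

```python
import torch
import torch.nn as nn

n_classes = 10
defender = nn.Sequential(                       # the two-layer defender network
    nn.Linear(n_classes, 64), nn.ReLU(), nn.Linear(64, n_classes)
)
opt = torch.optim.Adam(defender.parameters(), lr=1e-3)

# Synthetic "mixed set of clean and perturbed logits" with original labels.
clean = torch.randn(256, n_classes)
labels = clean.argmax(dim=1)                    # original predictions to recover
perturbed = clean + 0.8 * torch.randn_like(clean)  # stand-in for attacked logits
logits = torch.cat([clean, perturbed])
targets = torch.cat([labels, labels])

for _ in range(200):
    loss = nn.functional.cross_entropy(defender(logits), targets)
    opt.zero_grad(); loss.backward(); opt.step()

# At test time, predictions come from the corrected logits; no images needed.
corrected = defender(perturbed).argmax(dim=1)
print((corrected == labels).float().mean())     # defense accuracy on toy data
```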