AITopics | Guo, Yuxuan

Luban: Building Open-Ended Creative Agents via Autonomous Embodied Verification

Guo, Yuxuan, Peng, Shaohui, Guo, Jiaming, Huang, Di, Zhang, Xishan, Zhang, Rui, Hao, Yifan, Li, Ling, Tian, Zikang, Gao, Mingju, Li, Yutai, Gan, Yiming, Liang, Shuai, Zhang, Zihao, Du, Zidong, Guo, Qi, Hu, Xing, Chen, Yunji

arXiv.org Artificial Intelligence, May-24-2024

Building open agents has always been the ultimate goal in AI research, and creative agents are especially enticing. Existing LLM agents excel at long-horizon tasks with well-defined goals (e.g., "mine diamonds" in Minecraft). However, they struggle with creative tasks that have open goals and abstract criteria, because they cannot bridge the gap between the two and therefore lack feedback for self-improvement while solving the task. In this work, we introduce autonomous embodied verification techniques for agents to fill the gap, laying the groundwork for creative tasks. Specifically, we propose the Luban agent, which targets creative building tasks in Minecraft and is equipped with two-level autonomous embodied verification inspired by human design practices: (1) visual verification of speculated 3D structures, which come from agent-synthesized CAD modeling programs; (2) pragmatic verification of the creation by generating and verifying environment-relevant functionality programs based on the abstract criteria. Extensive multi-dimensional human studies and Elo ratings show that Luban completes diverse creative building tasks in our proposed benchmark and outperforms other baselines (33% to 100%) in both visualization and pragmatism. Additional demos on a real-world robotic arm show the creation potential of Luban in the physical world.

large language model, machine learning, natural language

2405.15414

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Games > Computer Games (0.73)
Leisure & Entertainment > Games > Chess (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Robots (0.87)

Multi-Modal Discussion Transformer: Integrating Text, Images and Graph Transformers to Detect Hate Speech on Social Media

Hebert, Liam, Sahu, Gaurav, Guo, Yuxuan, Sreenivas, Nanda Kishore, Golab, Lukasz, Cohen, Robin

arXiv.org Artificial Intelligence, Jan-7-2024

We present the Multi-Modal Discussion Transformer (mDT), a novel method for detecting hate speech in online social networks such as Reddit discussions. In contrast to traditional comment-only methods, our approach to labelling a comment as hate speech involves a holistic analysis of text and images grounded in the discussion context. This is done by leveraging graph transformers to capture the contextual relationships in the discussion surrounding a comment and grounding the interwoven fusion layers that combine text and image embeddings instead of processing modalities separately. To evaluate our work, we present a new dataset, HatefulDiscussions, comprising complete multi-modal discussions from multiple online communities on Reddit. We compare the performance of our model to baselines that only process individual comments and conduct extensive ablation studies.

artificial intelligence, proceedings, social media

2307.09312

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Media > News (0.55)
Information Technology > Services (0.34)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

Emergent Communication for Rules Reasoning

Guo, Yuxuan, Hao, Yifan, Zhang, Rui, Zhou, Enshuai, Du, Zidong, Zhang, Xishan, Song, Xinkai, Wen, Yuanbo, Zhao, Yongwei, Zhou, Xuehai, Guo, Jiaming, Yi, Qi, Peng, Shaohui, Huang, Di, Chen, Ruizhi, Guo, Qi, Chen, Yunji

arXiv.org Artificial Intelligence, Nov-8-2023

Research on emergent communication between deep-learning-based agents has received extensive attention due to its inspiration for linguistics and artificial intelligence. However, previous attempts have centered on communication emerging under perception-oriented environmental settings, which force agents to describe low-level perceptual features within image or symbol contexts. In this work, inspired by the classic human reasoning test (namely Raven's Progressive Matrices), we propose the Reasoning Game, a cognition-oriented environment that encourages agents to reason about and communicate high-level rules, rather than perceived low-level contexts. Moreover, we propose 1) an unbiased dataset (namely rule-RAVEN) as a benchmark to avoid overfitting, and 2) a two-stage curriculum agent training method as a baseline for more stable convergence in the Reasoning Game, where contexts and semantics are bilaterally drifting. Experimental results show that, in the Reasoning Game, a semantically stable and compositional language emerges to solve reasoning problems. The emerged language helps agents apply the extracted rules to generalize to unseen context attributes, and to transfer between different context attributes or even tasks.

deep learning, emergent communication, machine learning

2311.04474

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)
