AITopics | Sun, Jingdong

Sun, Jingdong

HA-VLN: A Benchmark for Human-Aware Navigation in Discrete-Continuous Environments with Dynamic Multi-Human Interactions, Real-World Validation, and an Open Leaderboard

Dong, Yifei, Wu, Fengyi, He, Qi, Li, Heng, Li, Minghan, Cheng, Zebang, Zhou, Yuxuan, Sun, Jingdong, Dai, Qi, Cheng, Zhi-Qi, Hauptmann, Alexander G

arXiv.org Artificial Intelligence, Mar-18-2025

Vision-and-Language Navigation (VLN) systems often focus on either discrete (panoramic) or continuous (free-motion) paradigms alone, overlooking the complexities of human-populated, dynamic environments. We introduce a unified Human-Aware VLN (HA-VLN) benchmark that merges these paradigms under explicit social-awareness constraints. Our contributions include: 1. A standardized task definition that balances discrete-continuous navigation with personal-space requirements; 2. An enhanced human motion dataset (HAPS 2.0) and upgraded simulators capturing realistic multi-human interactions, outdoor contexts, and refined motion-language alignment; 3. Extensive benchmarking on 16,844 human-centric instructions, revealing how multi-human dynamics and partial observability pose substantial challenges for leading VLN agents; 4. Real-world robot tests validating sim-to-real transfer in crowded indoor spaces; and 5. A public leaderboard supporting transparent comparisons across discrete and continuous tasks. Empirical results show improved navigation success and fewer collisions when social context is integrated, underscoring the need for human-centric design. By releasing all datasets, simulators, agent code, and evaluation tools, we aim to advance safer, more capable, and socially responsible VLN research.

evolutionary algorithm, large language model, machine learning, (19 more...)

arXiv:2503.14229

Genre:

Workflow (0.67)
Research Report > New Finding (0.34)

Industry:

Leisure & Entertainment (1.00)
Media > Television (0.45)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(2 more...)

MetaDesigner: Advancing Artistic Typography through AI-Driven, User-Centric, and Multilingual WordArt Synthesis

He, Jun-Yan, Cheng, Zhi-Qi, Li, Chenyang, Sun, Jingdong, He, Qi, Xiang, Wangmeng, Chen, Hanyuan, Lan, Jin-Peng, Lin, Xianhui, Zhu, Kang, Luo, Bin, Geng, Yifeng, Xie, Xuansong, Hauptmann, Alexander G.

arXiv.org Artificial Intelligence, Jul-4-2024

MetaDesigner revolutionizes artistic typography synthesis by leveraging the strengths of Large Language Models (LLMs) to drive a design paradigm centered around user engagement. At the core of this framework lies a multi-agent system comprising the Pipeline, Glyph, and Texture agents, which collectively enable the creation of customized WordArt, ranging from semantic enhancements to the imposition of complex textures. MetaDesigner incorporates a comprehensive feedback mechanism that harnesses insights from multimodal models and user evaluations to refine and enhance the design process iteratively. Through this feedback loop, the system adeptly tunes hyperparameters to align with user-defined stylistic and thematic preferences, generating WordArt that not only meets but exceeds user expectations of visual appeal and contextual relevance. Empirical validations highlight MetaDesigner's capability to effectively serve diverse WordArt applications, consistently producing aesthetically appealing and context-sensitive results.

large language model, machine learning, natural language, (18 more...)

arXiv:2406.19859

Country: North America > United States > Michigan (0.14)

Genre: Research Report (0.82)

Industry:

Education (0.67)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

WordArt Designer API: User-Driven Artistic Typography Synthesis with Large Language Models on ModelScope

He, Jun-Yan, Cheng, Zhi-Qi, Li, Chenyang, Sun, Jingdong, Xiang, Wangmeng, Hu, Yusen, Lin, Xianhui, Kang, Xiaoyang, Jin, Zengke, Luo, Bin, Geng, Yifeng, Xie, Xuansong, Zhou, Jingren

arXiv.org Artificial Intelligence, Jan-12-2024

This paper introduces the WordArt Designer API, a novel framework for user-driven artistic typography synthesis utilizing Large Language Models (LLMs) on ModelScope. We address the challenge of simplifying artistic typography for non-professionals by offering a dynamic, adaptive, and computationally efficient alternative to traditional rigid templates. Our approach leverages the power of LLMs to understand and interpret user input, facilitating a more intuitive design process. We demonstrate through various case studies how users can articulate their aesthetic preferences and functional requirements, which the system then translates into unique and creative typographic designs. Our evaluations indicate significant improvements in user satisfaction, design flexibility, and creative expression over existing systems. The WordArt Designer API not only democratizes the art of typography but also opens up new possibilities for personalized digital communication and design.

artificial intelligence, large language model, natural language, (9 more...)

arXiv:2401.01699

Country: North America > United States (0.15)

Genre: Research Report (0.50)

Industry:

Education (0.70)
Information Technology > Security & Privacy (0.48)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

WordArt Designer: User-Driven Artistic Typography Synthesis using Large Language Models

He, Jun-Yan, Cheng, Zhi-Qi, Li, Chenyang, Sun, Jingdong, Xiang, Wangmeng, Lin, Xianhui, Kang, Xiaoyang, Jin, Zengke, Hu, Yusen, Luo, Bin, Geng, Yifeng, Xie, Xuansong, Zhou, Jingren

arXiv.org Artificial Intelligence, Nov-26-2023

This paper introduces WordArt Designer, a user-driven framework for artistic typography synthesis built on Large Language Models (LLMs). The system incorporates four key modules: the LLM Engine, SemTypo, StyTypo, and TexTypo. 1) The LLM Engine, powered by an LLM (e.g., GPT-3.5), interprets user inputs and generates actionable prompts for the other modules, thereby transforming abstract concepts into tangible designs. 2) The SemTypo module optimizes font designs using semantic concepts, striking a balance between artistic transformation and readability. 3) Building on the semantic layout provided by the SemTypo module, the StyTypo module creates smooth, refined images. 4) The TexTypo module further enhances the design's aesthetics through texture rendering, enabling the generation of inventive textured fonts. Notably, WordArt Designer highlights the fusion of generative AI with artistic typography. Experience its capabilities on ModelScope: https://www.modelscope.cn/studios/WordArt/WordArt.

large language model, machine learning, natural language, (15 more...)

arXiv:2310.18332

Country: North America > United States (0.68)

Genre: Research Report (0.50)

Industry:

Government (1.00)
Education (0.93)
Media (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Debunking Free Fusion Myth: Online Multi-view Anomaly Detection with Disentangled Product-of-Experts Modeling

Wang, Hao, Cheng, Zhi-Qi, Sun, Jingdong, Yang, Xin, Wu, Xiao, Chen, Hongyang, Yang, Yan

arXiv.org Artificial Intelligence, Oct-31-2023

Multi-view or even multi-modal data is appealing yet challenging for real-world applications. Detecting anomalies in multi-view data is a prominent recent research topic. However, most existing methods 1) are only suitable for two views or type-specific anomalies, 2) suffer from the issue of fusion disentanglement, and 3) do not support online detection after model deployment. To address these challenges, our main ideas in this paper are three-fold: multi-view learning, disentangled representation learning, and generative modeling. To this end, we propose dPoE, a novel multi-view variational autoencoder model that involves (1) a Product-of-Experts (PoE) layer for handling multi-view data, (2) a Total Correlation (TC) discriminator for disentangling view-common and view-specific representations, and (3) a joint loss function that ties all components together. In addition, we devise theoretical information bounds to control both view-common and view-specific representations. Extensive experiments on six real-world datasets demonstrate that the proposed dPoE outperforms baselines.
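To make the Product-of-Experts idea concrete, here is a minimal sketch of how per-view Gaussian posteriors are commonly fused in multi-view variational autoencoders. It assumes each view contributes a one-dimensional Gaussian expert and optionally adds a standard-normal prior expert; the abstract does not spell out the form dPoE uses, and the function name `product_of_gaussian_experts` is hypothetical, so this is an illustration of the general technique rather than the authors' implementation.

```python
def product_of_gaussian_experts(means, variances, include_prior=True):
    """Fuse 1-D Gaussian experts by multiplying their densities.

    The product of Gaussians is again Gaussian: precisions (1/variance)
    add up, and the fused mean is the precision-weighted average of the
    expert means.
    """
    if not means or len(means) != len(variances):
        raise ValueError("need one variance per mean, and at least one expert")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    precisions = [1.0 / v for v in variances]
    weighted_means = [m / v for m, v in zip(means, variances)]

    if include_prior:
        # Assumption: a standard-normal prior N(0, 1) acts as one more expert.
        precisions.append(1.0)
        weighted_means.append(0.0)

    total_precision = sum(precisions)
    fused_mean = sum(weighted_means) / total_precision
    fused_variance = 1.0 / total_precision
    return fused_mean, fused_variance


# Two views that disagree: the more confident view (smaller variance)
# pulls the fused mean toward its own estimate.
mean, var = product_of_gaussian_experts([1.0, 3.0], [0.5, 2.0], include_prior=False)
print(mean, var)
```

Because precisions add, a missing view can simply be left out of the product, which is one reason this fusion rule is popular when the number of views varies.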

artificial intelligence, disentangled product-of-expert modeling, machine learning, (2 more...)

arXiv:2310.18728

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (0.85)
Information Technology > Artificial Intelligence > Machine Learning (0.53)

KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration

Bao, Xu, Cheng, Zhi-Qi, He, Jun-Yan, Li, Chenyang, Xiang, Wangmeng, Sun, Jingdong, Liu, Hanbing, Liu, Wei, Luo, Bin, Geng, Yifeng, Xie, Xuansong

arXiv.org Artificial Intelligence, Sep-23-2023

Accurate facial landmark detection is critical for facial analysis tasks, yet prevailing heatmap and coordinate regression methods grapple with prohibitive computational costs and quantization errors. Through comprehensive theoretical analysis and experimentation, we identify and elucidate the limitations of existing techniques. To overcome these challenges, we pioneer the application of True-Range Multilateration, originally devised for GPS localization, to facial landmark detection. We propose KeyPoint Positioning System (KeyPosS), the first framework to deduce exact landmark coordinates by triangulating distances between points of interest and anchor points predicted by a fully convolutional network. A key advantage of KeyPosS is its plug-and-play nature, enabling flexible integration into diverse decoding pipelines. Extensive experiments on four datasets demonstrate state-of-the-art performance, with KeyPosS outperforming existing methods in low-resolution settings while adding minimal computational overhead. By spearheading the integration of Multilateration with facial analysis, KeyPosS marks a paradigm shift in facial landmark detection. The code is available at https://github.com/zhiqic/KeyPosS.
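True-range multilateration recovers a position from its distances to several known anchor points. The sketch below shows the generic 2-D technique with a linearized least-squares solve; it is not taken from the KeyPosS repository, and the function name `multilaterate`, the anchor layout, and the example values are all assumptions made for illustration.

```python
from math import hypot


def multilaterate(anchors, distances):
    """Estimate a 2-D point from its distances to known anchor points.

    Subtracting the first range equation from the others cancels the
    quadratic terms, leaving a linear system that is solved here by
    least squares through its 2x2 normal equations.
    """
    if len(anchors) != len(distances) or len(anchors) < 3:
        raise ValueError("need at least three anchors, one distance each")

    (x0, y0), d0 = anchors[0], distances[0]
    saa = sab = sbb = sac = sbc = 0.0
    for (xi, yi), di in zip(anchors[1:], distances[1:]):
        # Row of the linear system: a * x + b * y = c
        a = 2.0 * (xi - x0)
        b = 2.0 * (yi - y0)
        c = d0**2 - di**2 + xi**2 + yi**2 - x0**2 - y0**2
        saa += a * a
        sab += a * b
        sbb += b * b
        sac += a * c
        sbc += b * c

    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is ambiguous")
    x = (sac * sbb - sab * sbc) / det
    y = (saa * sbc - sab * sac) / det
    return x, y


# Hypothetical example: four anchors on a coarse grid and a sub-cell target.
anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
target = (1.5, 2.25)
distances = [hypot(target[0] - ax, target[1] - ay) for ax, ay in anchors]
print(multilaterate(anchors, distances))
```

With exact distances the estimate matches the target up to floating-point error, even though the anchors sit on a coarse grid. With noisy distances the least-squares solve returns the best linear fit instead, which is why using more than three anchors helps.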

artificial intelligence, keyposs, machine learning, (14 more...)

arXiv:2305.16437

Country: North America > United States (0.68)

Genre: Research Report > Promising Solution (0.46)

Industry:

Information Technology (0.93)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules

Cheng, Zhi-Qi, Dai, Qi, Li, Siyao, Sun, Jingdong, Mitamura, Teruko, Hauptmann, Alexander G.

arXiv.org Artificial Intelligence, Apr-4-2023

Charts are a powerful tool for visually conveying complex data, but their comprehension poses a challenge due to the diverse chart types and intricate components. Existing chart comprehension methods suffer from either heuristic rules or an over-reliance on OCR systems, resulting in suboptimal performance. To address these issues, we present ChartReader, a unified framework that seamlessly integrates chart derendering and comprehension tasks. Our approach includes a transformer-based chart component detection module and an extended pre-trained vision-language model for chart-to-X tasks. By learning the rules of charts automatically from annotated datasets, our approach eliminates the need for manual rule-making, reducing effort and enhancing accuracy. We also introduce a data variable replacement technique and extend the input and position embeddings of the pre-trained model for cross-task training. We evaluate ChartReader on Chart-to-Table, ChartQA, and Chart-to-Text tasks, demonstrating its superiority over existing methods. Our proposed framework can significantly reduce the manual effort involved in chart analysis, providing a step towards a universal chart understanding model. Moreover, our approach offers opportunities for plug-and-play integration with mainstream LLMs such as T5 and TaPas, extending their capability to chart comprehension tasks. The code is available at https://github.com/zhiqic/ChartReader.

artificial intelligence, expert system, machine learning, (18 more...)

arXiv:2304.02173

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Industry: Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.85)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.71)
(2 more...)
