Wang, Junjie
TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training
Liang, Wanchao, Liu, Tianyu, Wright, Less, Constable, Will, Gu, Andrew, Huang, Chien-Chin, Zhang, Iris, Feng, Wei, Huang, Howard, Wang, Junjie, Purandare, Sanket, Nadathur, Gokul, Idreos, Stratos
The development of large language models (LLMs) has been instrumental in advancing state-of-the-art natural language processing applications. Training LLMs with billions of parameters and trillions of tokens requires sophisticated distributed systems that enable composing and comparing several state-of-the-art techniques in order to efficiently scale across thousands of accelerators. However, existing solutions are complex, scattered across multiple libraries/repositories, lack interoperability, and are cumbersome to maintain. Thus, curating and empirically comparing training recipes require non-trivial engineering effort. This paper introduces TorchTitan, an open-source, PyTorch-native distributed training system that unifies state-of-the-art techniques, streamlining integration and reducing overhead. TorchTitan enables 3D parallelism in a modular manner with elastic scaling, providing comprehensive logging, checkpointing, and debugging tools for production-ready training. It also incorporates hardware-software co-designed solutions, leveraging features like Float8 training and SymmetricMemory. As a flexible test bed, TorchTitan facilitates custom recipe curation and comparison, allowing us to develop optimized training recipes for Llama 3.1 and provide guidance on selecting techniques for maximum efficiency based on our experiences. We thoroughly assess TorchTitan on the Llama 3.1 family of LLMs, spanning 8 billion to 405 billion parameters, and showcase its exceptional performance, modular composability, and elastic scalability. By stacking training optimizations, we demonstrate accelerations of 65.08% with 1D parallelism at the 128-GPU scale (Llama 3.1 8B), an additional 12.59% with 2D parallelism at the 256-GPU scale (Llama 3.1 70B), and an additional 30% with 3D parallelism at the 512-GPU scale (Llama 3.1 405B) on NVIDIA H100 GPUs over optimized baselines.
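The composition described above builds on PyTorch's native distributed primitives. Below is a minimal, illustrative sketch of stacking data parallelism and tensor parallelism over a 2D device mesh with stock PyTorch APIs; the toy module, mesh sizes, and flags are assumptions for illustration, not TorchTitan's actual code.

```python
# Hypothetical sketch: composing 2D (data + tensor) parallelism on a device mesh
# with stock PyTorch distributed primitives. Module and dimension names are illustrative.
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.tensor.parallel import (
    ColwiseParallel, RowwiseParallel, parallelize_module,
)

class FeedForward(nn.Module):
    def __init__(self, dim=4096, hidden=16384):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)
        self.w2 = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.w2(torch.relu(self.w1(x)))

def build_2d_parallel_model(dp_size: int, tp_size: int) -> nn.Module:
    # Assumes the job was launched with torchrun, which sets LOCAL_RANK.
    dist.init_process_group("nccl")
    torch.cuda.set_device(int(os.environ.get("LOCAL_RANK", 0)))
    mesh = init_device_mesh("cuda", (dp_size, tp_size), mesh_dim_names=("dp", "tp"))
    model = FeedForward().cuda()
    # Tensor parallelism: shard w1 column-wise and w2 row-wise along the "tp" dim.
    parallelize_module(model, mesh["tp"],
                       {"w1": ColwiseParallel(), "w2": RowwiseParallel()})
    # Data parallelism: shard parameters and gradients along the "dp" dim with FSDP.
    return FSDP(model, device_mesh=mesh["dp"], use_orig_params=True)
```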
CodePurify: Defend Backdoor Attacks on Neural Code Models via Entropy-based Purification
Mu, Fangwen, Wang, Junjie, Yu, Zhuohao, Shi, Lin, Wang, Song, Li, Mingyang, Wang, Qing
Neural code models have found widespread success in tasks pertaining to code intelligence, yet they are vulnerable to backdoor attacks, where an adversary can manipulate the victim model's behavior by inserting triggers into the source code. Recent studies indicate that advanced backdoor attacks can achieve nearly 100% attack success rates on many software engineering tasks. However, effective defense techniques against such attacks remain insufficiently explored. In this study, we propose CodePurify, a novel defense against backdoor attacks on code models through entropy-based purification. Entropy-based purification precisely detects and eliminates possible triggers in the source code while preserving its semantic information. Within this process, CodePurify first develops a confidence-driven entropy-based measurement to determine whether a code snippet is poisoned and, if so, locates the triggers. Subsequently, it purifies the code by substituting the triggers with benign tokens using a masked language model. We extensively evaluate CodePurify against four advanced backdoor attacks across three representative tasks and two popular code models. The results show that CodePurify significantly outperforms four commonly used defense baselines, improving average defense performance by at least 40%, 40%, and 12% across the three tasks, respectively. These findings highlight the potential of CodePurify to serve as a robust defense against backdoor attacks on neural code models.
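As an illustration of the general idea (not the paper's exact measurement), the sketch below masks each token, scores it by the entropy of a masked language model's prediction at that position, flags entropy outliers as trigger candidates, and replaces them with the MLM's top prediction. The model name and threshold are assumptions.

```python
# Illustrative sketch of entropy-based token screening and MLM-based substitution.
# Model choice, outlier rule, and threshold are assumptions for illustration.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base-mlm")
model = AutoModelForMaskedLM.from_pretrained("microsoft/codebert-base-mlm").eval()

def purify(code: str, z_threshold: float = 2.0) -> str:
    ids = tokenizer(code, return_tensors="pt", truncation=True)["input_ids"][0]
    entropies, top_pred = [], []
    for pos in range(1, len(ids) - 1):          # skip special tokens
        masked = ids.clone()
        masked[pos] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(input_ids=masked.unsqueeze(0)).logits[0, pos]
        log_probs = torch.log_softmax(logits, dim=-1)
        entropies.append(-(log_probs.exp() * log_probs).sum().item())
        top_pred.append(int(logits.argmax()))
    ent = torch.tensor(entropies)
    z = (ent - ent.mean()) / (ent.std() + 1e-6)
    cleaned = ids.clone()
    for i, pos in enumerate(range(1, len(ids) - 1)):
        if z[i].abs() > z_threshold:            # entropy outlier -> suspected trigger token
            cleaned[pos] = top_pred[i]          # substitute with the MLM's top prediction
    return tokenizer.decode(cleaned, skip_special_tokens=True)
```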
Multi-path Exploration and Feedback Adjustment for Text-to-Image Person Retrieval
Kang, Bin, Chen, Bin, Wang, Junjie, Xu, Yong
Text-based person retrieval aims to identify specific persons using textual descriptions as queries. Existing advanced methods typically depend on vision-language pre-trained (VLP) models to facilitate effective cross-modal alignment. However, the inherent constraints of VLP models, which include global alignment biases and insufficient self-feedback regulation, impede optimal retrieval performance. In this paper, we propose MeFa, a Multi-Pathway Exploration, Feedback, and Adjustment framework, which deeply explores intrinsic intra- and inter-modal feedback to make targeted adjustments, thereby achieving more precise person-text associations. Specifically, we first design an intra-modal reasoning pathway that generates hard negative samples for cross-modal data, leveraging feedback from these samples to refine intra-modal reasoning, thereby enhancing sensitivity to subtle discrepancies. Subsequently, we introduce a cross-modal refinement pathway that utilizes both global information and inter-modal feedback to refine local information, thus enhancing its global semantic representation. Finally, the discriminative clue correction pathway incorporates fine-grained features of secondary similarity as discriminative clues to further mitigate retrieval failures caused by disparities in these features. Experimental results on three public benchmarks demonstrate that MeFa achieves superior person retrieval performance without necessitating additional data or complex structures.
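The intra-modal pathway above relies on hard negatives for cross-modal data. The following sketch shows the generic building block of in-batch hard-negative mining for an image-text contrastive loss; it illustrates that idea only and is not MeFa's specific pathway design (the margin value is an assumption).

```python
# Generic sketch of in-batch hard-negative mining for an image-text contrastive loss.
import torch
import torch.nn.functional as F

def contrastive_loss_with_hard_negatives(img_emb, txt_emb, temperature=0.07):
    # img_emb, txt_emb: (B, D) L2-normalized embeddings of matched image-text pairs.
    logits = img_emb @ txt_emb.t() / temperature        # (B, B) similarity matrix
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    # Standard InfoNCE in both retrieval directions.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    # Hard negatives: for each image, the most similar non-matching text in the batch.
    neg_logits = logits.clone()
    neg_logits[targets, targets] = float("-inf")
    hard_neg = neg_logits.max(dim=1).values
    pos = logits[targets, targets]
    # Margin term pushing each positive above its hardest in-batch negative.
    margin_loss = F.relu(0.2 + hard_neg - pos).mean()
    return 0.5 * (loss_i2t + loss_t2i) + margin_loss
```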
ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models
Zhang, Yuxiang, Chen, Jing, Wang, Junjie, Liu, Yaxin, Yang, Cheng, Shi, Chufan, Zhu, Xinyu, Lin, Zihao, Wan, Hanwen, Yang, Yujiu, Sakai, Tetsuya, Feng, Tian, Yamana, Hayato
Tool-augmented large language models (LLMs) are rapidly being integrated into real-world applications. Due to the lack of benchmarks, the community has yet to fully understand the hallucination issues within these models. To address this challenge, we introduce a comprehensive diagnostic benchmark, ToolBH. Specifically, we assess the LLM's hallucinations through two perspectives: depth and breadth. In terms of depth, we propose a multi-level diagnostic process, including (1) solvability detection, (2) solution planning, and (3) missing-tool analysis. For breadth, we consider three scenarios based on the characteristics of the toolset: missing necessary tools, potential tools, and limited-functionality tools. Furthermore, we develop seven tasks and collect 700 evaluation samples through multiple rounds of manual annotation. The results show the significant challenges presented by the ToolBH benchmark. The advanced models Gemini-1.5-Pro and GPT-4o achieve total scores of only 45.3 and 37.0, respectively, on a scale of 100. In this benchmark, larger model parameters do not guarantee better performance; the training data and response strategies also play a crucial role in tool-enhanced LLM scenarios. Our diagnostic analysis indicates that the primary reason for model errors lies in assessing task solvability. Additionally, open-weight models suffer from performance drops with verbose replies, whereas proprietary models excel with longer reasoning.
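The abstract does not give the benchmark's schema; as a purely hypothetical sketch, a diagnostic sample covering the depth levels and breadth scenarios might be represented as follows (field names and the scoring rule are assumptions, not ToolBH's actual format).

```python
# Hypothetical sketch of a diagnostic sample for a tool-hallucination benchmark.
from dataclasses import dataclass, field

@dataclass
class ToolDiagnosticSample:
    query: str                          # user request to be solved with tools
    toolset: list[str]                  # tool names available in this scenario
    scenario: str                       # "missing_necessary" | "potential" | "limited_functionality"
    solvable: bool                      # ground truth for solvability detection
    reference_plan: list[str] = field(default_factory=list)   # gold solution steps
    missing_tools: list[str] = field(default_factory=list)    # gold missing-tool analysis

def score_solvability(pred_solvable: bool, sample: ToolDiagnosticSample) -> float:
    # Level-1 check: did the model correctly judge whether the task is solvable?
    return 1.0 if pred_solvable == sample.solvable else 0.0
```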
Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs
Wang, Junjie, Chen, Mingyang, Hu, Binbin, Yang, Dan, Liu, Ziqi, Shen, Yue, Wei, Peng, Zhang, Zhiqiang, Gu, Jinjie, Zhou, Jun, Pan, Jeff Z., Zhang, Wen, Chen, Huajun
Improving the performance of large language models (LLMs) in complex question-answering (QA) scenarios has always been a research focal point. Recent studies have attempted to enhance LLMs' performance by combining step-wise planning with external retrieval. While effective for advanced models like GPT-3.5, smaller LLMs face challenges in decomposing complex questions, necessitating supervised fine-tuning. Previous work has relied on manual annotation and knowledge distillation from teacher LLMs, which are time-consuming and not accurate enough. In this paper, we introduce a novel framework for enhancing LLMs' planning capabilities by using planning data derived from knowledge graphs (KGs). LLMs fine-tuned with this data have improved planning capabilities, better equipping them to handle complex QA tasks that involve retrieval. Evaluations on multiple datasets, including our newly proposed benchmark, highlight the effectiveness of our framework and the benefits of KG-derived planning data.
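As a rough sketch of how planning supervision could be derived from a knowledge graph (using a toy triple store and a hypothetical prompt format, not the paper's pipeline), a multi-hop path can be unrolled into step-wise retrieval instructions:

```python
# Illustrative sketch: turn a KG path into a step-wise planning example for fine-tuning.
import networkx as nx

def kg_path_to_planning_example(kg: nx.DiGraph, start: str, answer: str) -> dict:
    path = nx.shortest_path(kg, start, answer)
    steps = []
    for head, tail in zip(path, path[1:]):
        relation = kg.edges[head, tail]["relation"]
        steps.append(f"Retrieve: which entity is linked to '{head}' via '{relation}'?")
    question = f"Starting from {start}, what entity do you reach by following the described relations?"
    return {"question": question, "plan": steps, "answer": answer}

# Toy usage
kg = nx.DiGraph()
kg.add_edge("Marie Curie", "University of Paris", relation="educated_at")
kg.add_edge("University of Paris", "Paris", relation="located_in")
print(kg_path_to_planning_example(kg, "Marie Curie", "Paris"))
```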
PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents
Wang, Junjie, Zhang, Yin, Ji, Yatai, Zhang, Yuxiang, Jiang, Chunyang, Wang, Yubo, Zhu, Kang, Wang, Zekun, Wang, Tiezhen, Huang, Wenhao, Fu, Jie, Chen, Bei, Lin, Qunshu, Liu, Minghao, Zhang, Ge, Chen, Wenhu
Recent advancements in Large Multimodal Models (LMMs) have leveraged extensive multimodal datasets to enhance capabilities in complex knowledge-driven tasks. However, persistent challenges in perceptual and reasoning errors limit their efficacy, particularly in interpreting intricate visual data and deducing multimodal relationships. Addressing these issues, we introduce a novel dataset format, PIN (Paired and INterleaved multimodal documents), designed to significantly improve both the depth and breadth of multimodal training. The PIN format is built on three foundational principles: knowledge intensity, scalability, and support for diverse training modalities. This innovative format combines markdown files and comprehensive images to enrich training data with a dense knowledge structure and versatile training strategies. We present PIN-14M, an open-source dataset comprising 14 million samples derived from a diverse range of Chinese and English sources, tailored to include complex web and scientific content. This dataset is constructed meticulously to ensure data quality and ethical integrity, aiming to facilitate advanced training strategies and improve model robustness against common multimodal training pitfalls. Our initial results, forming the basis of this technical report, suggest significant potential for the PIN format in refining LMM performance, with plans for future expansions and detailed evaluations of its impact on model capabilities.
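To make the paired-and-interleaved idea concrete, here is a hypothetical record in that spirit: markdown text whose image references are paired with the corresponding image files. The field names are assumptions rather than PIN's actual schema.

```python
# Hypothetical record in the spirit of a paired-and-interleaved multimodal document.
import json

record = {
    "doc_id": "example-0001",
    "source": "web",                      # e.g. web page or scientific article
    "markdown": (
        "# Photosynthesis\n"
        "Plants convert light into chemical energy.\n"
        "![reaction diagram](images/example-0001_fig1.png)\n"
    ),
    "images": [
        {"path": "images/example-0001_fig1.png",
         "caption": "Overall reaction of photosynthesis"}
    ],
}

with open("pin_style_sample.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```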
HoLLMwood: Unleashing the Creativity of Large Language Models in Screenwriting via Role Playing
Chen, Jing, Zhu, Xinyu, Yang, Cheng, Shi, Chufan, Xi, Yadong, Zhang, Yuxiang, Wang, Junjie, Pu, Jiashu, Zhang, Rongsheng, Yang, Yujiu, Feng, Tian
Generative AI has demonstrated unprecedented creativity in the field of computer vision, yet such phenomena have not been observed in natural language processing. In particular, large language models (LLMs) can hardly produce written works at the level of human experts due to the extremely high complexity of literature writing. In this paper, we present HoLLMwood, an automated framework for unleashing the creativity of LLMs and exploring their potential in screenwriting, which is a highly demanding task. Mimicking the human creative process, we assign LLMs to different roles involved in the real-world scenario. In addition to the common practice of treating LLMs as Writer, we also apply LLMs as Editor, who is responsible for providing feedback and revision advice to Writer. Besides, to enrich the characters and deepen the plots, we introduce a role-playing mechanism and adopt LLMs as Actors that can communicate and interact with each other. Evaluations on automatically generated screenplays show that HoLLMwood substantially outperforms strong baselines in terms of coherence, relevance, interestingness and overall quality.
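A minimal sketch of the Writer/Editor interaction pattern is shown below; the generate stub stands in for any chat-completion call, and the prompts are illustrative rather than HoLLMwood's actual templates.

```python
# Minimal sketch of a writer-editor revision loop with role prompts (illustrative only).
def generate(system_prompt: str, user_prompt: str) -> str:
    # Stand-in for a real LLM call (e.g. an OpenAI-compatible chat endpoint).
    return f"[{system_prompt}] response to: {user_prompt[:60]}..."

def write_with_feedback(premise: str, rounds: int = 2) -> str:
    draft = generate("You are a screenwriter (Writer).",
                     f"Write a short screenplay scene based on: {premise}")
    for _ in range(rounds):
        feedback = generate("You are a script editor (Editor).",
                            f"Critique this scene and give concrete revision advice:\n{draft}")
        draft = generate("You are a screenwriter (Writer).",
                         f"Revise the scene according to the feedback.\nScene:\n{draft}\nFeedback:\n{feedback}")
    return draft
```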
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation
Shi, Chufan, Yang, Cheng, Liu, Yaxin, Shui, Bo, Wang, Junjie, Jing, Mohan, Xu, Linran, Zhu, Xinyu, Li, Siheng, Zhang, Yuxiang, Liu, Gongye, Nie, Xiaomei, Cai, Deng, Yang, Yujiu
We introduce a new benchmark, ChartMimic, aimed at assessing the visually-grounded code generation capabilities of large multimodal models (LMMs). ChartMimic utilizes information-intensive visual charts and textual instructions as inputs, requiring LMMs to generate the corresponding code for chart rendering. ChartMimic includes 1,000 human-curated (figure, instruction, code) triplets, which represent the authentic chart use cases found in scientific papers across various domains (e.g., Physics, Computer Science, Economics). These charts span 18 regular types and 4 advanced types, covering 191 subcategories. Furthermore, we propose multi-level evaluation metrics to provide an automatic and thorough assessment of the output code and the rendered charts. Unlike existing code generation benchmarks, ChartMimic places emphasis on evaluating LMMs' capacity to harmonize a blend of cognitive capabilities, encompassing visual understanding, code generation, and cross-modal reasoning. The evaluation of 3 proprietary models and 11 open-weight models highlights the substantial challenges posed by ChartMimic. Even the advanced GPT-4V and Claude-3-opus achieve average scores of only 73.2 and 53.7, respectively, indicating significant room for improvement. We anticipate that ChartMimic will inspire the development of LMMs, advancing the pursuit of artificial general intelligence.
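One possible low-level check in the spirit of execution-based, multi-level evaluation is sketched below: run the generated matplotlib code, verify it renders, and crudely compare the saved figure to the reference image. ChartMimic's actual metrics are richer; this only illustrates the idea, and the similarity measure here is an assumption.

```python
# Rough sketch of an execution-based chart-to-code check (illustrative only).
import matplotlib
matplotlib.use("Agg")                      # render off-screen
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

def execute_and_compare(generated_code: str, reference_png: str) -> dict:
    result = {"executable": False, "pixel_similarity": 0.0}
    try:
        exec(generated_code, {"plt": plt, "np": np})   # run trusted/generated plotting code
        plt.savefig("candidate.png")
        plt.close("all")
        result["executable"] = True
    except Exception:
        return result
    cand = np.asarray(Image.open("candidate.png").convert("L").resize((256, 256)), dtype=float)
    ref = np.asarray(Image.open(reference_png).convert("L").resize((256, 256)), dtype=float)
    # Crude grayscale similarity in [0, 1]; a real benchmark would use finer-grained metrics.
    result["pixel_similarity"] = 1.0 - np.abs(cand - ref).mean() / 255.0
    return result
```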
OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision
Wang, Junjie, Chen, Bin, Kang, Bin, Li, Yulin, Chen, YiChi, Xian, Weizhi, Chang, Huifeng
Open-Vocabulary Detection (OVD) aims to detect objects from novel categories beyond the base categories on which the detector is trained. However, existing open-vocabulary detectors trained on known category data tend to assign higher confidence to trained categories and confuse novel categories with background. To resolve this, we propose OV-DQUO, an Open-Vocabulary DETR with Denoising text Query training and open-world Unknown Objects supervision. Specifically, we introduce a wildcard matching method that enables the detector to learn from pairs of unknown objects recognized by the open-world detector and text embeddings with general semantics, mitigating the confidence bias between base and novel categories. Additionally, we propose a denoising text query training strategy that synthesizes additional noisy query-box pairs from open-world unknown objects to train the detector through contrastive learning, enhancing its ability to distinguish novel objects from the background. We conducted extensive experiments on the challenging OV-COCO and OV-LVIS benchmarks, achieving new state-of-the-art results of 45.6 AP50 and 39.3 mAP on novel categories, respectively, without the need for additional training data.
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
Zhuang, Alex, Zhang, Ge, Zheng, Tianyu, Du, Xinrun, Wang, Junjie, Ren, Weiming, Huang, Stephen W., Fu, Jie, Yue, Xiang, Chen, Wenhu
Structured data sources, such as tables, graphs, and databases, are ubiquitous knowledge sources. Despite the demonstrated capabilities of large language models (LLMs) on plain text, their proficiency in interpreting and utilizing structured data remains limited. Our investigation reveals a notable deficiency in LLMs' ability to process structured data, e.g., ChatGPT lags behind the state-of-the-art (SoTA) model by an average of 35%. To augment the Structured Knowledge Grounding (SKG) capabilities in LLMs, we have developed a comprehensive instruction tuning dataset comprising 1.1 million examples. Utilizing this dataset, we train a series of models, referred to as StructLM, based on the Mistral and CodeLlama model families, ranging from 7B to 34B parameters. Our StructLM series surpasses task-specific models on 16 out of 18 evaluated datasets and establishes new SoTA performance on 8 SKG tasks. Furthermore, StructLM demonstrates strong generalization across 6 novel held-out SKG tasks, outperforming TableLlama by an average of 35% and Flan-UL2 20B by an average of 10%. Contrary to expectations, we observe that scaling model size offers marginal benefits, with StructLM-34B showing only slight improvements over StructLM-7B. This suggests that structured knowledge grounding is still a challenging task and requires more innovative design to push to a new level.
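A minimal sketch, under assumed formatting, of turning a table into an instruction-tuning example for structured knowledge grounding is shown below; StructLM's actual serialization and prompt templates may differ.

```python
# Illustrative sketch: linearize a table into an instruction-tuning example.
def table_to_instruction_example(headers, rows, question, answer):
    # Flatten the table row by row into a plain-text linearization.
    header_line = " | ".join(headers)
    row_lines = [" | ".join(str(c) for c in row) for row in rows]
    table_text = "\n".join([header_line] + row_lines)
    prompt = (
        "Use the table to answer the question.\n"
        f"Table:\n{table_text}\n"
        f"Question: {question}"
    )
    return {"instruction": prompt, "output": answer}

# Toy usage
example = table_to_instruction_example(
    headers=["Country", "Capital"],
    rows=[["France", "Paris"], ["Japan", "Tokyo"]],
    question="What is the capital of Japan?",
    answer="Tokyo",
)
print(example["instruction"])
```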