AITopics | Wang, Yuqi

Collaborating Authors

Wang, Yuqi


Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k

Peng, Xiangyu, Zheng, Zangwei, Shen, Chenhui, Young, Tom, Guo, Xinying, Wang, Binluo, Xu, Hang, Liu, Hongxin, Jiang, Mingyan, Li, Wenjun, Wang, Yuhui, Ye, Anbang, Ren, Gang, Ma, Qianran, Liang, Wanying, Lian, Xiang, Wu, Xiwen, Zhong, Yuting, Li, Zhuangyan, Gong, Chaoyu, Lei, Guojun, Cheng, Leijun, Zhang, Limin, Li, Minghao, Zhang, Ruijie, Hu, Silan, Huang, Shijie, Wang, Xiaokang, Zhao, Yuanheng, Wang, Yuqi, Wei, Ziang, You, Yang

arXiv.org Artificial Intelligence, Mar-12-2025

Video generation models have made remarkable progress in the past year. The quality of AI-generated video continues to improve, but at the cost of larger models, more training data, and greater demand for training compute. In this report, we present Open-Sora 2.0, a commercial-level video generation model trained for only $200k. With this model, we demonstrate that the cost of training a top-performing video generation model is highly controllable. We detail all techniques that contribute to this efficiency breakthrough, including data curation, model architecture, training strategy, and system optimization. According to human evaluation results and VBench scores, Open-Sora 2.0 is comparable to the world's leading video generation models, including the open-source HunyuanVideo and the closed-source Runway Gen-3 Alpha. By making Open-Sora 2.0 fully open-source, we aim to democratize access to advanced video generation technology, fostering broader innovation and creativity in content creation. All resources are publicly available at: https://github.com/hpcaitech/Open-Sora.

artificial intelligence, machine learning, natural language

arXiv: 2503.09642

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)


LLM360 K2: Building a 65B 360-Open-Source Large Language Model from Scratch

Liu, Zhengzhong, Tan, Bowen, Wang, Hongyi, Neiswanger, Willie, Tao, Tianhua, Li, Haonan, Koto, Fajri, Wang, Yuqi, Sun, Suqi, Pangarkar, Omkar, Fan, Richard, Gu, Yi, Miller, Victor, Ma, Liqun, Tang, Liping, Ranjan, Nikhil, Zhuang, Yonghao, He, Guowei, Wang, Renxi, Deng, Mingkai, Algayres, Robin, Li, Yuanzhi, Shen, Zhiqiang, Nakov, Preslav, Xing, Eric

arXiv.org Artificial Intelligence, Jan-16-2025

We detail the training of the LLM360 K2-65B model, scaling up our 360-degree OPEN SOURCE approach to the largest and most powerful models in the LLM360 project. While open-source LLMs continue to advance, the answer to "How are the largest LLMs trained?" remains unclear within the community. The implementation details for such high-capacity models are often withheld for business reasons associated with their high cost. This lack of transparency prevents LLM researchers from leveraging valuable insights from prior experience, e.g., "What are the best practices for addressing loss spikes?" The LLM360 K2 project addresses this gap by providing full transparency and access to resources accumulated during the training of LLMs at the largest scale. This report highlights key elements of the K2 project, including our first model, K2 DIAMOND, a 65-billion-parameter LLM that surpasses LLaMA-65B and rivals LLaMA2-70B while requiring fewer FLOPs and tokens. We detail the implementation steps and present a longitudinal analysis of K2 DIAMOND's capabilities throughout its training process. We also outline ongoing projects such as TXT360, setting the stage for future models in the series. By offering previously unavailable resources, the K2 project also embodies the 360-degree OPEN SOURCE principles of transparency, reproducibility, and accessibility, which we believe are vital in the era of resource-intensive AI research.

large language model, machine learning, natural language

arXiv: 2501.07124

Country: North America > United States > California (0.27)

Genre: Research Report > New Finding (0.45)

Industry:

Materials > Chemicals (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Domain-specific Guided Summarization for Mental Health Posts

Qian, Lu, Wang, Yuqi, Wang, Zimu, Zhang, Haiyang, Wang, Wei, Yu, Ting, Nguyen, Anh

arXiv.org Artificial Intelligence, Nov-3-2024

In domain-specific contexts, particularly mental health, abstractive summarization requires techniques that can handle specialized content to generate domain-relevant and faithful summaries. In response, we introduce a guided summarizer equipped with a dual encoder and an adapted decoder that uses novel domain-specific guidance signals, i.e., mental health terminology and contextually rich sentences from the source document, to align closely with the content and context of the guidance, thereby generating a domain-relevant summary. Additionally, we present a post-editing correction model that rectifies errors in the generated summary, improving its detailed consistency with the original content. Evaluation on the MentSum dataset shows that our model outperforms existing baseline models on both ROUGE and FactCC scores. Although the experiments are designed specifically for mental health posts, the methodology we have developed is broadly applicable, highlighting its versatility and effectiveness in producing high-quality domain-specific summaries.

computational linguistic, machine learning, natural language

arXiv: 2411.01485

Country:

Europe (0.93)
Asia > China (0.46)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)


DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model

Wang, Yuqi, Cheng, Ke, He, Jiawei, Wang, Qitai, Dai, Hengchen, Chen, Yuntao, Xia, Fei, Zhang, Zhaoxiang

arXiv.org Artificial Intelligence, Oct-14-2024

Driving world models have gained increasing attention due to their ability to model complex physical dynamics. However, their strong modeling capability has yet to be fully exploited because of the limited video diversity in current driving datasets. We introduce DrivingDojo, the first dataset tailor-made for training interactive world models with complex driving dynamics. Our dataset features video clips with a complete set of driving maneuvers, diverse multi-agent interplay, and rich open-world driving knowledge, laying a stepping stone for future world model development. We further define an action instruction following (AIF) benchmark for world models and demonstrate the superiority of the proposed dataset for generating action-controlled future predictions.

artificial intelligence, dataset, world model

arXiv: 2410.10738

Country: Asia > China (0.46)

Genre: Research Report (1.00)

Industry:

Transportation > Ground > Road (1.00)
Information Technology (1.00)
Automobiles & Trucks (0.90)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)


Revealing COVID-19's Social Dynamics: Diachronic Semantic Analysis of Vaccine and Symptom Discourse on Twitter

Wang, Zeqiang, Wu, Jiageng, Wang, Yuqi, Wang, Wei, Yang, Jie, Johnson, Jon, Sastry, Nishanth, De, Suparna

arXiv.org Artificial Intelligence, Oct-10-2024

Social media is recognized as an important source for deriving insights into public opinion dynamics and social impacts due to the vast textual data generated daily and the 'unconstrained' behavior of people interacting on these platforms. However, such analyses prove challenging due to the semantic shift phenomenon, where word meanings evolve over time. This paper proposes an unsupervised dynamic word embedding method to capture longitudinal semantic shifts in social media data without predefined anchor words. The method leverages word co-occurrence statistics and dynamic updating to adapt embeddings over time, addressing the challenges of data sparseness, imbalanced distributions, and synergistic semantic effects. Evaluated on a large COVID-19 Twitter dataset, the method reveals semantic evolution patterns of vaccine- and symptom-related entities across different pandemic stages, and their potential correlations with real-world statistics. Our key contributions include the dynamic embedding technique, an empirical analysis of COVID-19 semantic shifts, and a discussion of how to enhance semantic shift modeling for computational social science research. This study makes it possible to capture longitudinal semantic dynamics on social media in order to understand public discourse and collective phenomena.
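The abstract describes embeddings built from word co-occurrence statistics across time slices. As a rough illustration of that general idea only, the Python sketch below builds a positive pointwise mutual information (PPMI) vector for a word in each of two time slices over a shared context vocabulary and scores the shift as 1 minus their cosine similarity. This is a simple static, count-based stand-in, not the paper's unsupervised dynamic embedding method; the window size, the function names, and the toy tweets are all hypothetical.

```python
import math
from collections import Counter

def cooccurrence(sentences, window=2):
    """Count ordered (word, context) pairs within a symmetric window."""
    pairs = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs[(word, tokens[j])] += 1
    return pairs

def ppmi_vector(word, pairs, vocab):
    """PPMI weights of `word` against each context in a shared vocabulary."""
    total = sum(pairs.values())
    word_totals, ctx_totals = Counter(), Counter()
    for (w, c), n in pairs.items():
        word_totals[w] += n
        ctx_totals[c] += n
    vec = []
    for ctx in vocab:
        n = pairs.get((word, ctx), 0)
        if n == 0:
            vec.append(0.0)  # unseen pair: PPMI is taken as 0
        else:
            pmi = math.log(n * total / (word_totals[word] * ctx_totals[ctx]))
            vec.append(max(pmi, 0.0))  # clip negative PMI to 0
    return vec

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0  # an all-zero vector counts as dissimilar

def semantic_shift(word, slice_a, slice_b, window=2):
    """1 - cosine similarity of the word's PPMI vectors in two time slices."""
    pairs_a, pairs_b = cooccurrence(slice_a, window), cooccurrence(slice_b, window)
    vocab = sorted({c for _, c in pairs_a} | {c for _, c in pairs_b})
    return 1.0 - cosine(ppmi_vector(word, pairs_a, vocab),
                        ppmi_vector(word, pairs_b, vocab))

# Hypothetical tokenized tweets from an early and a later pandemic stage.
early = [["vaccine", "trial", "results"], ["vaccine", "trial", "approval"]]
late = [["vaccine", "booster", "appointment"], ["vaccine", "booster", "dose"]]
print(semantic_shift("vaccine", early, late))  # 1.0: no shared contexts
```

In these toy slices "vaccine" shares no context words across the two periods, so the score is 1.0; comparing a slice with itself gives a score of about 0.0. A dynamic method would additionally carry information from one slice to the next instead of treating each slice independently.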

artificial intelligence, natural language, time slice

arXiv: 2410.08352

Country:

Asia (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)


Unleashing the Potential of Large Language Models for Predictive Tabular Tasks in Data Science

Yang, Yazheng, Wang, Yuqi, Sen, Sankalok, Li, Lei, Liu, Qi

arXiv.org Artificial Intelligence, Jul-13-2024

In data science, classification, regression, and imputation of missing values are commonly encountered predictive tasks on tabular data. This research applies Large Language Models (LLMs) to these predictive tasks. Despite their proficiency in comprehending natural language, LLMs fall short in dealing with structured tabular data. This limitation stems from their lack of exposure to the intricacies of tabular data during foundational training. Our research aims to close this gap by compiling a comprehensive corpus of tables annotated with instructions and conducting large-scale training of Llama-2 on this enriched dataset. Furthermore, we investigate applying the trained model to zero-shot prediction, few-shot prediction, and in-context learning scenarios. Through extensive experiments, our methodology shows significant improvements over existing benchmarks. These advancements highlight the efficacy of tailoring LLM training to table-related problems in data science, establishing a new benchmark for using LLMs to enhance tabular intelligence.
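One common way to present tabular records to an LLM is to serialize each row as text. The Python sketch below builds a zero-shot or few-shot prompt from records stored as dictionaries. The "column is value" template, the `Record:`/`Answer:` layout, the function names, and the loan-approval columns are hypothetical illustrations, not the paper's actual instruction format.

```python
def describe(row, target):
    """Serialize every non-target column as 'column is value' (hypothetical template)."""
    return "; ".join(f"{col} is {val}" for col, val in row.items() if col != target)

def build_prompt(instruction, query_row, target, shots=()):
    """Build a zero-shot (no shots) or few-shot prompt for one tabular record."""
    lines = [instruction]
    for example in shots:
        # Each labeled example shows its features followed by the known answer.
        lines.append(f"Record: {describe(example, target)}")
        lines.append(f"Answer: {target} = {example[target]}")
    # The query record ends with an open answer slot for the model to complete.
    lines.append(f"Record: {describe(query_row, target)}")
    lines.append(f"Answer: {target} =")
    return "\n".join(lines)

# Hypothetical loan-approval records.
shots = [{"income": 52000, "debt_ratio": 0.31, "approved": "yes"},
         {"income": 21000, "debt_ratio": 0.68, "approved": "no"}]
query = {"income": 47000, "debt_ratio": 0.35, "approved": None}
print(build_prompt("Predict whether the loan is approved.", query, "approved", shots))
```

Passing no `shots` yields a zero-shot prompt; the target column is always excluded from the serialized features so the query's unknown label never leaks into the prompt.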

large language model, machine learning, tabular data

arXiv: 2403.20208

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Industry:

Banking & Finance (1.00)
Information Technology (0.67)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Benchmarking and Boosting Radiology Report Generation for 3D High-Resolution Medical Images

Liu, Che, Wan, Zhongwei, Wang, Yuqi, Shen, Hui, Wang, Haozhe, Zheng, Kangyu, Zhang, Mi, Arcucci, Rossella

arXiv.org Artificial Intelligence, Jun-12-2024

Automatic radiology report generation can substantially ease the labor-intensive process of report writing by radiologists, especially for 3D radiographs such as CT scans, which are crucial for broad clinical diagnostics yet underexplored compared to 2D radiographs. Existing methods often handle 3D volumes either slice by slice or with aggressive downsampling due to current GPU memory limitations, which loses the inherent 3D structure and critical details. To overcome these issues, we introduce a novel framework, based on large language models (LLMs), that efficiently and effectively generates radiology reports for high-resolution (HR) 3D volumes. Specifically, our framework uses low-resolution (LR) visual tokens as queries to mine information from HR tokens, preserving detailed HR information while reducing computational cost by processing only the HR-informed LR visual queries. Further benefiting the field, we curate and release BIMCV-RG, a new dataset with 5,328 HR 3D volumes and paired reports, establishing the first benchmarks for report generation from 3D HR medical images.
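Using LR tokens as queries over HR tokens can be read as a form of cross-attention. The NumPy sketch below shows a single-head version under that assumption: each LR token attends over all HR tokens and adds the attended HR features back as a residual, so downstream layers only see the smaller set of LR tokens. The projection matrices, the residual connection, the token counts, and the average pooling used to produce LR tokens are hypothetical choices, not the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def lr_queries_attend_hr(lr_tokens, hr_tokens, w_q, w_k, w_v):
    """Single-head cross-attention in which LR tokens query HR tokens.

    lr_tokens: (n_lr, d); hr_tokens: (n_hr, d); w_q, w_k, w_v: (d, d).
    Returns (n_lr, d): LR tokens enriched with HR information.
    """
    q = lr_tokens @ w_q                      # queries come from the LR tokens
    k = hr_tokens @ w_k                      # keys and values come from HR tokens
    v = hr_tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n_lr, n_hr) scaled dot products
    weights = softmax(scores, axis=-1)       # each row sums to 1 over HR tokens
    return lr_tokens + weights @ v           # residual keeps the LR content

# Hypothetical sizes: 512 HR tokens pooled 8:1 into 64 LR tokens, width 16.
rng = np.random.default_rng(0)
d = 16
hr = rng.normal(size=(512, d))
lr = hr.reshape(64, 8, d).mean(axis=1)       # stand-in for LR downsampling
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
out = lr_queries_attend_hr(lr, hr, w_q, w_k, w_v)
print(out.shape)  # (64, 16)
```

The design point illustrated is that the attention cost grows with n_lr × n_hr, while everything after this step (for example, an LLM consuming visual tokens) sees only n_lr tokens rather than all n_hr.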

large language model, machine learning, natural language

arXiv: 2406.07146

Country:

North America > United States (0.14)
Europe > Germany (0.14)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models

Li, Lei, Wang, Yuqi, Xu, Runxin, Wang, Peiyi, Feng, Xiachong, Kong, Lingpeng, Liu, Qi

arXiv.org Artificial Intelligence, Jun-2-2024

Large vision-language models (LVLMs) excel across diverse tasks involving concrete images from natural scenes. However, their ability to interpret abstract figures, such as geometric shapes and scientific plots, remains limited due to a scarcity of training datasets in scientific domains. To fill this gap, we introduce Multimodal ArXiv, consisting of ArXivCap and ArXivQA, for enhancing LVLMs' scientific comprehension. ArXivCap is a figure-caption dataset comprising 6.4M images and 3.9M captions, sourced from 572K arXiv papers spanning various scientific domains. Drawing from ArXivCap, we introduce ArXivQA, a question-answering dataset generated by prompting GPT-4V with scientific figures. ArXivQA greatly enhances open-source LVLMs' mathematical reasoning capabilities, achieving a 10.4% absolute accuracy gain on a multimodal mathematical reasoning benchmark. Furthermore, using ArXivCap, we devise four vision-to-text tasks for benchmarking LVLMs. Evaluation results with state-of-the-art LVLMs underscore their struggle with the nuanced semantics of academic figures, while domain-specific training yields substantial performance gains. Our error analysis uncovers misinterpretations of visual context, recognition errors, and overly simplified captions produced by current LVLMs, shedding light on future improvements.

large language model, machine learning, natural language

arXiv: 2403.00231

Country:

North America (0.46)
Asia (0.28)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)


Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models

Padlewski, Piotr, Bain, Max, Henderson, Matthew, Zhu, Zhongkai, Relan, Nishant, Pham, Hai, Ong, Donovan, Aleksiev, Kaloyan, Ormazabal, Aitor, Phua, Samuel, Yeo, Ethan, Lamprecht, Eugenie, Liu, Qi, Wang, Yuqi, Chen, Eric, Fu, Deyu, Li, Lei, Zheng, Che, d'Autume, Cyprien de Masson, Yogatama, Dani, Artetxe, Mikel, Tay, Yi

arXiv.org Artificial Intelligence, May-3-2024

We introduce Vibe-Eval: a new open benchmark and framework for evaluating multimodal chat models. Vibe-Eval consists of 269 visual understanding prompts, including 100 of hard difficulty, complete with gold-standard responses authored by experts. Vibe-Eval is open-ended and challenging, with dual objectives: (i) vibe checking multimodal chat models for day-to-day tasks and (ii) rigorously testing and probing the capabilities of present frontier models. Notably, more than 50% of the questions in our hard set are answered incorrectly by all frontier models. We explore the nuances of designing, evaluating, and ranking models on ultra-challenging prompts. We also discuss trade-offs between human and automatic evaluation, and show that automatic model evaluation using Reka Core roughly correlates with human judgment. We offer free API access for lightweight evaluation and plan to conduct formal human evaluations for public models that perform well on Vibe-Eval's automatic scores. We release the evaluation code and data; see https://github.com/reka-ai/reka-vibe-eval

large language model, machine learning, natural language

arXiv: 2405.02287

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)


Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models

Reka Team, Ormazabal, Aitor, Zheng, Che, d'Autume, Cyprien de Masson, Yogatama, Dani, Fu, Deyu, Ong, Donovan, Chen, Eric, Lamprecht, Eugenie, Pham, Hai, Ong, Isaac, Aleksiev, Kaloyan, Li, Lei, Henderson, Matthew, Bain, Max, Artetxe, Mikel, Relan, Nishant, Padlewski, Piotr, Liu, Qi, Chen, Ren, Phua, Samuel, Yang, Yazheng, Tay, Yi, Wang, Yuqi, Zhu, Zhongkai, Xie, Zhihui

arXiv.org Artificial Intelligence, Apr-18-2024

We introduce Reka Core, Flash, and Edge, a series of powerful multimodal language models trained from scratch by Reka. Reka models can process and reason over text, image, video, and audio inputs. This technical report discusses details of training some of these models and provides comprehensive evaluation results. We show that Reka Edge and Reka Flash are not only state-of-the-art but also outperform many much larger models, delivering outsized value for their respective compute class. Meanwhile, our most capable and largest model, Reka Core, approaches the best frontier models on both automatic evaluations and blind human evaluations. On image question answering benchmarks (e.g., MMMU, VQAv2), Core performs competitively with GPT4-V. On multimodal chat, Core ranks as the second most preferred model under a blind third-party human evaluation setup, outperforming other models such as Claude 3 Opus. On text benchmarks, Core not only performs competitively with other frontier models on a set of well-established benchmarks (e.g., MMLU, GSM8K) but also outperforms GPT4-0613 on human evaluation. On video question answering (Perception-Test), Core outperforms Gemini Ultra. Models are shipped in production at http://chat.reka.ai. A showcase of non-cherry-picked qualitative examples can also be found at http://showcase.reka.ai.

large language model, machine learning, natural language

arXiv: 2404.12387

Country: Asia > Middle East > UAE (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
