AITopics | Zhou, Xing

Collaborating Authors

Zhou, Xing

Open-Sora Plan: Open-Source Large Video Generation Model

Lin, Bin, Ge, Yunyang, Cheng, Xinhua, Li, Zongjian, Zhu, Bin, Wang, Shaodong, He, Xianyi, Ye, Yang, Yuan, Shenghai, Chen, Liuhan, Jia, Tanghui, Zhang, Junwu, Tang, Zhenyu, Pang, Yatian, She, Bin, Yan, Cen, Hu, Zhiheng, Dong, Xiaoyi, Chen, Lin, Pan, Zhang, Zhou, Xing, Dong, Shaoling, Tian, Yonghong, Yuan, Li

arXiv.org Artificial Intelligence | Nov-28-2024

We introduce Open-Sora Plan, an open-source project that aims to contribute a large generation model for producing high-resolution, long-duration videos from various user inputs. The project comprises multiple components covering the entire video generation process, including a Wavelet-Flow Variational Autoencoder, a Joint Image-Video Skiparse Denoiser, and various condition controllers. In addition, many auxiliary strategies for efficient training and inference are designed, and a multi-dimensional data curation pipeline is proposed for obtaining high-quality data. Benefiting from these efficiency-oriented designs, Open-Sora Plan achieves impressive video generation results in both qualitative and quantitative evaluations. We hope our careful design and practical experience can inspire the video generation research community. All our code and model weights are publicly available at https://github.com/PKU-YuanGroup/Open-Sora-Plan.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2412.00131

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

ADSNet: Cross-Domain LTV Prediction with an Adaptive Siamese Network in Advertising

Wang, Ruize, Xu, Hui, Cheng, Ying, He, Qi, Zhou, Xing, Feng, Rui, Xu, Wei, Huang, Lei, Jiang, Jie

arXiv.org Artificial Intelligence | Jun-15-2024

Advertising platforms have evolved to estimate Lifetime Value (LTV) in order to better align with advertisers' true performance metric. However, the sparsity of real-world LTV data presents a significant challenge to LTV prediction models (i.e., pLTV), severely limiting their capabilities. Therefore, we propose to utilize external data, in addition to the internal data of the advertising platform, to expand the number of purchase samples and enhance the platform's LTV prediction model. To tackle the data distribution shift between internal and external platforms, we introduce an Adaptive Difference Siamese Network (ADSNet), which employs cross-domain transfer learning to prevent negative transfer. Specifically, ADSNet is designed to learn information that is beneficial to the target domain. We introduce a gain evaluation strategy to calculate information gain, which helps the model learn information useful to the target domain and lets it reject noisy samples, thus avoiding negative transfer. We also design a Domain Adaptation Module as a bridge that connects different domains, reduces the distribution distance between them, and enhances the consistency of the representation space distribution. We conduct extensive offline experiments and online A/B tests on a real advertising platform. Our proposed ADSNet method outperforms other methods, improving GINI by 2%. The ablation study highlights the importance of the gain evaluation strategy in rejecting negative-gain samples and improving model performance. Additionally, ADSNet significantly improves long-tail prediction. The online A/B tests confirm ADSNet's efficacy, increasing online LTV by 3.47% and GMV by 3.89%.

data mining, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2406.10517

Country:

Europe (0.48)
Asia > China (0.29)
North America > United States > New York (0.14)

Genre: Research Report (1.00)

Industry:

Marketing (0.47)
Information Technology > Services (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
(2 more...)

LLMBind: A Unified Modality-Task Integration Framework

Zhu, Bin, Ning, Munan, Jin, Peng, Lin, Bin, Huang, Jinfa, Song, Qi, Zhang, Junwu, Tang, Zhenyu, Pan, Mingjun, Zhou, Xing, Yuan, Li

arXiv.org Artificial Intelligence | Apr-18-2024

In the multi-modal domain, the dependence of various models on specific input formats leads to user confusion and hinders progress. To address this challenge, we introduce LLMBind, a novel framework designed to unify a diverse array of multi-modal tasks. By harnessing a Mixture-of-Experts (MoE) Large Language Model (LLM), LLMBind processes multi-modal inputs and generates task-specific tokens, enabling the invocation of corresponding models to accomplish tasks. This approach allows LLMBind to interpret inputs and generate outputs across various modalities, including image, text, video, and audio. Furthermore, we have constructed an interaction dataset comprising 400k instructions, which unlocks the ability of LLMBind to perform interactive visual generation and editing tasks. Extensive experiments demonstrate that LLMBind achieves superior performance across diverse tasks and outperforms existing models in user evaluations conducted in real-world scenarios. Moreover, the adaptability of LLMBind allows for seamless integration with the latest models and extension to new modality tasks, highlighting its potential to serve as a unified AI agent for modeling universal modalities.
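The idea of an LLM emitting task-specific tokens that trigger downstream models can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the tag names (`image_gen`, `audio_gen`), the `<tag>...</tag>` syntax, and the stand-in handlers are hypothetical and are not taken from the LLMBind paper or its code.

```python
import re

# Hypothetical task tokens mapped to stand-in handlers. A real system would
# call an image, video, or audio model here; these only return strings.
HANDLERS = {
    "image_gen": lambda prompt: f"[image generated for: {prompt}]",
    "audio_gen": lambda prompt: f"[audio generated for: {prompt}]",
}

# Matches <task>prompt</task>; the backreference \1 requires matching tags.
TOKEN_PATTERN = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def dispatch(llm_output):
    """Find task tokens in the LLM output and invoke the matching handler."""
    results = []
    for task, prompt in TOKEN_PATTERN.findall(llm_output):
        handler = HANDLERS.get(task)
        if handler is None:
            # Unknown task token: skip it rather than fail.
            continue
        results.append(handler(prompt.strip()))
    return results


if __name__ == "__main__":
    text = "Sure, here it is. <image_gen> a red fox in snow </image_gen>"
    print(dispatch(text))
```

Keeping the token-to-handler mapping in a plain dictionary means a new modality can be supported by registering one more entry, which loosely mirrors the extensibility the abstract describes.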

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2402.14891

Country:

Oceania > Australia (0.17)
Asia > China (0.14)
Europe > Netherlands (0.14)

Genre: Research Report (0.82)

Industry: Media (0.69)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

VISION2UI: A Real-World Dataset with Layout for Code Generation from UI Designs

Gui, Yi, Li, Zhen, Wan, Yao, Shi, Yemin, Zhang, Hongyu, Su, Yi, Dong, Shaoling, Zhou, Xing, Jiang, Wenbin

arXiv.org Artificial Intelligence | Apr-9-2024

Automatically generating UI code from webpage design visions can significantly alleviate the burden on developers, enabling beginner developers or designers to generate Web pages directly from design diagrams. Prior research has generated UI code from rudimentary design visions or sketches by designing deep neural networks. Inspired by the groundbreaking advancements achieved by Multimodal Large Language Models (MLLMs), the automatic generation of UI code from high-fidelity design images is now emerging as a viable possibility. Nevertheless, our investigation reveals that existing MLLMs are hampered by the scarcity of authentic, high-quality, and large-scale datasets, leading to unsatisfactory performance in automated UI code generation. To bridge this gap, we present a novel dataset, termed VISION2UI, extracted from real-world scenarios, augmented with comprehensive layout information, and tailored specifically for finetuning MLLMs in UI code generation. Specifically, this dataset is derived through a series of operations, encompassing collecting, cleaning, and filtering of the open-source Common Crawl dataset. To uphold its quality, a neural scorer trained on labeled samples is used to refine the data, retaining higher-quality instances. Ultimately, this process yields a dataset comprising 2,000 parallel samples (with many more to come) encompassing design visions and UI code. The dataset is available at https://huggingface.co/datasets/xcodemind/vision2ui.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2404.06369

Country:

Asia > China (0.28)
North America > United States > Hawaii (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Efficient Large Scale Language Modeling with Mixtures of Experts

Artetxe, Mikel, Bhosale, Shruti, Goyal, Naman, Mihaylov, Todor, Ott, Myle, Shleifer, Sam, Lin, Xi Victoria, Du, Jingfei, Iyer, Srinivasan, Pasunuru, Ramakanth, Anantharaman, Giri, Li, Xian, Chen, Shuohui, Akin, Halil, Baines, Mandeep, Martin, Louis, Zhou, Xing, Koura, Punit Singh, O'Horo, Brian, Wang, Jeff, Zettlemoyer, Luke, Diab, Mona, Kozareva, Zornitsa, Stoyanov, Ves

arXiv.org Artificial Intelligence | Dec-20-2021

Mixture of Experts layers (MoEs) enable efficient scaling of language models through conditional computation. This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings: in- and out-of-domain language modeling, zero- and few-shot priming, and full fine-tuning. With the exception of fine-tuning, we find MoEs to be substantially more compute efficient. At more modest training budgets, MoEs can match the performance of dense models using roughly 4 times less compute. This gap narrows at scale, but our largest MoE model (1.1T parameters) consistently outperforms a compute-equivalent dense model (6.7B parameters). Overall, this performance gap varies greatly across tasks and domains, suggesting that MoE and dense models generalize differently in ways that are worthy of future study. We make our code and models publicly available for research use.
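Conditional computation in an MoE layer can be sketched in a few lines: a gate scores every expert, but only the top-k experts are actually evaluated for a given input. The sketch below is a generic top-k gating illustration in plain Python, not the routing scheme used in the paper; the names `moe_layer`, `make_expert`, and the toy scaling experts are assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical expert: scales its input elementwise (stands in for an FFN)."""
    return lambda x: [scale * v for v in x]


def moe_layer(x, gate_weights, experts, k=2):
    """Route x to the k highest-scoring experts and mix their outputs.

    gate_weights holds one weight row per expert; the gate score is a dot
    product with x. Only the selected experts are evaluated, which is the
    source of the compute savings relative to running every expert.
    """
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over the chosen experts
    out = [0.0] * len(x)
    for i in top:
        y = experts[i](x)
        weight = probs[i] / norm
        out = [o + weight * v for o, v in zip(out, y)]
    return out, top


if __name__ == "__main__":
    experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
    gate_weights = [[0.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [2.0, 0.0]]
    output, chosen = moe_layer([1.0, 0.0], gate_weights, experts, k=2)
    print(chosen, output)
```

With four experts and k=2, each input pays for two expert evaluations while the layer holds the parameters of all four, which is the sense in which parameter count and per-token compute are decoupled.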

artificial intelligence, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2112.10684

Country:

Europe (1.00)
North America > United States > Minnesota (0.14)
North America > United States > Louisiana (0.14)
North America > United States > California (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.87)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)
