DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Inference scaling empowers LLMs with unprecedented reasoning ability, with reinforcement learning as the core technique for eliciting complex reasoning. However, key technical details of state-of-the-art reasoning LLMs are concealed (for example, in the OpenAI o1 blog post and the DeepSeek R1 technical report), so the community still struggles to reproduce their RL training results.
NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints
Compositional training has been the de facto paradigm in existing Multimodal Large Language Models (MLLMs), where pre-trained vision encoders are connected with pre-trained LLMs through continuous multimodal pre-training. However, the multimodal scaling property of this paradigm remains difficult to explore because the components are trained separately. In this paper, we focus on the native training of MLLMs in an end-to-end manner and systematically study its design space and scaling property under a practical setting, i.e., under data constraints. Through a careful study of various design choices in MLLMs, we obtain the optimal meta-architecture that best balances performance and training cost. We then further explore the scaling properties of the native MLLM and show a positively correlated scaling relationship between visual encoders and LLMs. Based on these findings, we propose a native MLLM called NaViL, combined with a simple and cost-effective recipe. Experimental results on 14 multimodal benchmarks confirm the competitive performance of NaViL against existing MLLMs. In addition, our findings and results provide in-depth insights for the future study of native MLLMs.
How Chinese short dramas became AI content machines
The viral short dramas are increasingly being created entirely with AI, with hundreds of new shows spun up each day. In a dimly lit bedroom, a frightened young woman is thrown onto a bed by a tall, muscular man. He grabs her hand, and flame-like vines crawl across her body, fusing with her flesh. A dragon-shaped tattoo appears across her chest. "Two months," the man says. "Give me an heir, or I will eat you."
Graph Stochastic Neural Networks for Semi-supervised Learning: Supplemental Material
Let θ and φ denote the optimal parameters after model training. The detailed statistics of the three datasets used in this paper are listed in Table 1. When evaluating performance in the standard experimental scenario and in the label-scarce scenario, we compare with six state-of-the-art baselines for graph-based semi-supervised learning. Three of them are deterministic GNN-based models: GCN [1], Graph Attention Networks (GAT) [2], and GraphSAGE [3].