moonlight
Group-Aware Reinforcement Learning for Output Diversity in Large Language Models
Anschel, Oron, Shoshan, Alon, Botach, Adam, Hakimi, Shunit Haviv, Gendler, Asaf, Baruch, Emanuel Ben, Bhonker, Nadav, Kviatkovsky, Igor, Aggarwal, Manoj, Medioni, Gerard
Large Language Models (LLMs) often suffer from mode collapse, repeatedly generating the same few completions even when many valid answers exist, which limits their diversity across a wide range of tasks. We introduce Group-Aware Policy Optimization (GAPO), a simple extension of the recent and popular Group Relative Policy Optimization (GRPO) that computes rewards over the group as a whole. GAPO enables learning from group-level properties such as diversity and coverage. We demonstrate GAPO with a frequency-aware reward function that encourages uniform sampling over valid LLM completions, and show that GAPO-trained models produce responses that are both valid and more diverse. Beyond this setup, GAPO generalizes to open-ended prompts and improves response diversity without compromising accuracy on standard LLM benchmarks (GSM8K, MATH, HumanEval, MMLU-Pro). Our code will be made publicly available.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Spain (0.04)
- Asia > Japan (0.04)
- (3 more...)
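The frequency-aware group reward described in the GAPO abstract above can be sketched as follows. This is an illustrative reading, not the paper's exact formulation: the validity checker, the repeat penalty, and the GRPO-style group normalization are all assumptions.

```python
from collections import Counter

def gapo_group_advantages(completions, is_valid):
    """Group-level, frequency-aware reward sketch: a valid completion
    that is rare within the sampled group earns a higher reward than a
    repeated one, pushing the policy toward uniform coverage of valid
    answers. Invalid completions receive zero reward."""
    n = len(completions)
    counts = Counter(completions)
    rewards = [
        (1.0 - counts[c] / n) if is_valid(c) else 0.0
        for c in completions
    ]
    # GRPO-style normalization: advantage = (r - mean) / std over the group
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    return [(r - mean) / (std or 1.0) for r in rewards]
```

On a group where "a" is sampled twice and "b" once, the unique completion "b" receives the larger advantage, which is exactly the pressure toward uniform sampling over valid completions that the abstract describes.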
Supplementary Materials: FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models
We first present details on the attribute taxonomy and statistics in Section A, then introduce additional details on dataset construction in Section B. Finally, we discuss the limitations and future work of the project in Section D. We visualize the rough distribution of visual attributes and subjects on the left, along with the attribute alignment accuracy measured via human validation. Due to space limitations, only 15 sub-subjects are listed for each major subject. The result shows that Image 4 exhibits inconsistencies, with the reasons provided.
MARS-M: When Variance Reduction Meets Matrices
Liu, Yifeng, Yuan, Angela, Gu, Quanquan
Matrix-based preconditioned optimizers, such as Muon, have recently been shown to be more efficient than scalar-based optimizers for training large-scale neural networks, including large language models (LLMs). On the other hand, recent benchmarks on optimizers for LLM pre-training have demonstrated that variance-reduction techniques such as MARS can achieve substantial speedups over standard optimizers that do not employ variance reduction. In this paper, to achieve the best of both worlds, we introduce MARS-M, a new optimizer that integrates the variance reduction technique in MARS with Muon. Under standard regularity conditions, we prove that MARS-M converges to a first-order stationary point at a rate of $\tilde{\mathcal{O}}(T^{-1/3})$, which improves upon the $\tilde{\mathcal{O}}(T^{-1/4})$ rate attained by Muon. Our empirical results on language modeling and computer vision tasks demonstrate that MARS-M consistently yields lower losses and improved performance across various downstream benchmarks. The implementation of MARS-M is available at https://github.com/AGI-Arena/MARS/tree/main/MARS_M.
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- Asia > Middle East > Jordan (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (7 more...)
- Research Report (0.64)
- Workflow (0.46)
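A minimal sketch of the combination the MARS-M abstract describes: a MARS-style variance-reduced gradient correction feeding a Muon-style orthogonalized momentum update. The constants, the clipping step, and the Newton-Schulz coefficients are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

def newton_schulz_orth(M, steps=5):
    """Approximately orthogonalize a matrix via a quintic Newton-Schulz
    iteration, as used in Muon-style optimizers (coefficients assumed)."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = M / (np.linalg.norm(M) + 1e-7)
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X

def mars_m_step(W, grad, grad_prev, momentum,
                lr=0.02, beta=0.95, gamma=0.025, wd=0.1):
    """One sketched MARS-M step: MARS variance-reduced gradient
    correction, momentum accumulation, Muon orthogonalized update,
    and decoupled weight decay."""
    corr = grad + gamma * (beta / (1.0 - beta)) * (grad - grad_prev)
    corr = corr / max(1.0, np.linalg.norm(corr))   # clip corrected gradient
    momentum = beta * momentum + (1.0 - beta) * corr
    update = newton_schulz_orth(momentum)
    W = W * (1.0 - lr * wd) - lr * update
    return W, momentum
```

The variance-reduction term reuses the previous gradient to cancel stochastic noise before the matrix-level orthogonalization is applied, which is the "best of both worlds" pairing the abstract claims.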
Supplementary Materials: FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models
- Law (0.68)
- Media > Photography (0.46)
Muon is Scalable for LLM Training
Liu, Jingyuan, Su, Jianlin, Yao, Xingcheng, Jiang, Zhejun, Lai, Guokun, Du, Yulun, Qin, Yidao, Xu, Weixin, Lu, Enzhe, Yan, Junjie, Chen, Yanru, Zheng, Huabin, Liu, Yibo, Liu, Shaowei, Yin, Bohong, He, Weiran, Zhu, Han, Wang, Yuzhi, Wang, Jianzhou, Dong, Mengnan, Zhang, Zheng, Kang, Yongsheng, Zhang, Hao, Xu, Xinran, Zhang, Yutao, Wu, Yuxin, Zhou, Xinyu, Yang, Zhilin
Recently, the Muon optimizer, based on matrix orthogonalization, has demonstrated strong results in training small-scale language models, but its scalability to larger models had not been proven. We identify two crucial techniques for scaling up Muon: (1) adding weight decay and (2) carefully adjusting the per-parameter update scale. These techniques allow Muon to work out-of-the-box on large-scale training without the need for hyper-parameter tuning. Scaling law experiments indicate that Muon achieves roughly $2\times$ the computational efficiency of AdamW under compute-optimal training. Based on these improvements, we introduce Moonlight, a 3B/16B-parameter Mixture-of-Experts (MoE) model trained with 5.7T tokens using Muon. Our model improves the current Pareto frontier, achieving better performance with far fewer training FLOPs than prior models. We open-source our distributed Muon implementation, which is memory-optimal and communication-efficient. We also release the pretrained, instruction-tuned, and intermediate checkpoints to support future research.
- Asia > Middle East > Jordan (0.05)
- North America > United States > California > San Diego County > San Diego (0.04)
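The two scaling techniques named in the Moonlight abstract, weight decay and a per-parameter update scale, can be sketched as below. Scaling the orthogonalized update by a factor proportional to $\sqrt{\max(A, B)}$ for an $A \times B$ weight matrix, so its RMS roughly matches AdamW's, follows common descriptions of the recipe; the exact constants here are assumptions.

```python
import math
import numpy as np

def scaled_muon_update(W, orth_update, lr=1e-3, wd=0.1, rms_target=0.2):
    """Apply a Muon update with the two fixes from the abstract:
    (1) decoupled weight decay, and (2) a per-parameter scale
    proportional to sqrt(max(fan_out, fan_in)) so the update RMS
    roughly matches AdamW's. `orth_update` is the already
    orthogonalized momentum matrix. All constants are illustrative."""
    a, b = W.shape
    scale = rms_target * math.sqrt(max(a, b))
    return W * (1.0 - lr * wd) - lr * scale * orth_update
```

Because the scale depends only on the matrix shape, the same base learning rate can be shared across layers of different sizes, which is what lets Muon run "out-of-the-box" without per-layer hyper-parameter tuning.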
ArtAug: Enhancing Text-to-Image Generation through Synthesis-Understanding Interaction
Duan, Zhongjie, Zhao, Qianyi, Chen, Cen, Chen, Daoyuan, Zhou, Wenmeng, Li, Yaliang, Chen, Yingda
The emergence of diffusion models has significantly advanced image synthesis. Recent studies of model interaction and self-corrective reasoning approaches in large language models offer new insights for enhancing text-to-image models. Inspired by these studies, we propose ArtAug, a novel method for enhancing text-to-image models. To the best of our knowledge, ArtAug is the first method to improve image synthesis models via interactions with understanding models. In these interactions, we leverage human preferences implicitly learned by image understanding models to provide fine-grained suggestions for image synthesis models. The interactions can modify the image content to make it more aesthetically pleasing, for example by adjusting exposure, changing shooting angles, and adding atmospheric effects. The enhancements brought by the interaction are iteratively fused into the synthesis model itself through an additional enhancement module, enabling the synthesis model to directly produce aesthetically pleasing images without any extra computational cost. In our experiments, we train the ArtAug enhancement module on existing text-to-image models. Various evaluation metrics consistently demonstrate that ArtAug enhances the generative capabilities of text-to-image models without incurring additional computational costs. The source code and models will be released publicly.
Chinese Traditional Poetry Generating System Based on Deep Learning
Chinese traditional poetry is an important intangible cultural heritage of China and an artistic carrier of thought, culture, spirit, and emotion. However, because of the strict rules of ancient poetry, writing poetry by machine is very difficult. This paper proposes an automatic generation method for Chinese traditional poetry based on deep learning: keywords are extracted from each poem and matched with the preceding text so that the poem conforms to the theme, and when a user inputs a paragraph of text, the machine infers the theme and generates the poem sentence by sentence. The classic word2vec model serves as the preprocessing step, transforming Chinese characters, which the computer cannot interpret directly, into matrices for processing. A Bi-directional Long Short-Term Memory network generates Chinese characters one by one, keeping their meaning as accurate as possible, while TF-IDF and TextRank are used to extract keywords. An attention-based encoder-decoder model strengthens important long-distance information, so the system grasps the key points without losing important information. For emotion judgment, a Long Short-Term Memory network is used. The final results show that the system produces good poetry outputs according to the user's input text.
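The keyword-extraction step in the pipeline above can be illustrated with a minimal TF-IDF scorer. This is a sketch only: the paper also uses TextRank, and the tokenization and smoothing choices here are assumptions.

```python
import math
from collections import Counter

def tfidf_keywords(doc_tokens, corpus, k=3):
    """Rank the tokens of one document by TF-IDF against a corpus of
    tokenized documents and return the top-k as theme keywords
    (illustrative sketch of the keyword-extraction stage)."""
    n = len(corpus)
    df = Counter()                 # document frequency per token
    for doc in corpus:
        df.update(set(doc))
    tf = Counter(doc_tokens)       # term frequency in this document
    scores = {
        w: (tf[w] / len(doc_tokens)) * math.log((n + 1) / (df[w] + 1))
        for w in tf
    }
    return [w for w, _ in sorted(scores.items(), key=lambda x: -x[1])[:k]]
```

A token that is frequent in the input but rare across the corpus scores highest, which is how the generator keeps each poem line anchored to the user's theme.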
European Space Agency reveals ambitious plans to build sat-nav around the moon
The European Space Agency (ESA) has launched an ambitious new project to build a sat-nav and communication satellite network in orbit around the moon. This new infrastructure could one day turn our natural satellite into the 'eighth continent' as humanity spreads its wings and builds cities on the lunar surface. ESA says the project, known as Moonlight, will support the Lunar Gateway space station, multiple agencies working on moon missions and human exploration. In what will be the world's first commercial service of its kind, a number of British firms have won contracts to investigate how it might work, worth over £2 million. 'We are entering a new phase - the systematic exploration of our "eighth continent", the Moon,' ESA's David Parker told BBC News.
- Europe (0.95)
- North America > United States > Florida > Brevard County (0.15)
- Government > Space Agency (1.00)
- Government > Regional Government > North America Government > United States Government (0.33)
'The Women's Balcony,' 'Moonlight' and more critics' picks, March 3-9
Arrival Amy Adams stars in this elegant, involving science-fiction drama that is simultaneously old and new, revisiting many alien-invasion conventions but with unexpected intelligence, visual style and heart. Elle Paul Verhoeven's brilliantly booby-trapped thriller starring Isabelle Huppert is a gripping whodunit, a tour de force of psychological suspense and a wickedly droll comedy of manners. The Founder Michael Keaton gives a performance of ratty, reptilian brilliance as Ray Kroc, the American salesman who turned a California burger stand into the global fast-food behemoth that is McDonald's, in John Lee Hancock's shrewd and satisfyingly fat-free biopic. I Am Not Your Negro As directed by the gifted Raoul Peck, this documentary on James Baldwin uses the entire spectrum of movie effects, not only spoken language but also sound, music, editing and all manner of visuals, to create a cinematic essay that is powerful and painfully relevant. La La Land Starring a well-paired Ryan Gosling and Emma Stone, writer-director Damien Chazelle's tuneful tribute to classic movie musicals is often stronger in concept than execution, but it's lovely and transporting all the same.
- North America > United States > California (0.26)
- Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.06)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
- Health & Medicine (1.00)