simplified
Scaling, Simplification, and Adaptation: Lessons from Pretraining on Machine-Translated Text
Velasco, Dan John, Roque, Matthew Theodore
Most languages lack sufficient data for large-scale monolingual pretraining, creating a "data wall." Multilingual pretraining helps but is limited by language imbalance and the "curse of multilinguality." An alternative is to translate high-resource text with machine translation (MT), which raises three questions: (1) How does MT-derived data scale with model capacity? (2) Can source-side transformations (e.g., simplifying English with an LLM) improve generalization to native text? (3) How well do models pretrained on MT-derived data adapt when continually trained on limited native text? We investigate these questions by translating English into Indonesian and Tamil--two typologically distant, lower-resource languages--and pretraining GPT-2 models (124M-774M) on native or MT-derived corpora from raw and LLM-simplified English. We evaluate cross-entropy loss on native text, along with accuracy on syntactic probes and downstream tasks. Our results show that (1) MT-pretrained models benefit from scaling; (2) source-side simplification harms generalization to native text; and (3) adapting MT-pretrained models on native text often yields better performance than native-only models, even with less native data. However, tasks requiring cultural nuance (e.g., toxicity detection) demand more exposure to native data.
Relative Drawing Identification Complexity is Invariant to Modality in Vision-Language Models
Freitas, Diogo, Håvardstun, Brigt, Ferri, Cèsar, Garigliotti, Darío, Telle, Jan Arne, Hernández-Orallo, José
Large language models have become multimodal, and many of them are said to integrate their modalities using common representations. If this were true, a drawing of a car as an image, for instance, should map to a similar area in the latent space as a textual description of the strokes that form the drawing. To explore this in a black-box access regime to these models, we propose the use of machine teaching, a theory that studies the minimal set of examples a teacher needs to choose so that the learner captures the concept. In this paper, we evaluate the complexity of teaching vision-language models a subset of objects in the Quick, Draw! dataset using two presentations: raw images as bitmaps and trace coordinates in TikZ format. The results indicate that image-based representations generally require fewer segments and achieve higher accuracy than coordinate-based representations. But, surprisingly, the teaching size usually ranks concepts similarly across both modalities, even when controlling for (a human proxy of) concept priors, suggesting that the simplicity of concepts may be an inherent property that transcends modality representations.
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning
Son, Guijin, Hong, Jiwoo, Ko, Hyunwoo, Thorne, James
Scaling pre-training compute has proven effective for achieving multilinguality, but does the same hold for test-time scaling? In this work, we introduce MCLM, a multilingual math benchmark featuring competition-level problems in 55 languages. We test three test-time scaling methods, Outcome Reward Modeling (ORM), Process Reward Modeling (PRM), and Budget Forcing (BF), on both Qwen2.5-1.5B Math and MR1-1.5B, a multilingual LLM we trained for extended reasoning. Our experiments show that using Qwen2.5-1.5B Math with ORM achieves a score of 35.8 on MCLM, while BF on MR1-1.5B attains 35.2. Although "thinking LLMs" have recently garnered significant attention, we find that their performance is comparable to traditional scaling methods like best-of-N once constrained to similar levels of inference FLOPs. Moreover, while BF yields a 20-point improvement on English AIME, it provides only a 1.94-point average gain across other languages, a pattern consistent across the other test-time scaling methods we studied, highlighting that test-time scaling may not generalize as effectively to multilingual tasks. To foster further research, we release MCLM, MR1-1.5B, and evaluation results.
How Neural Networks Actually Work -- Python Implementation Part 2 (Simplified)
For each layer, we will initialize the parameters and then perform the required computation. No computation happens at the input layer (layer 0), so we go straight to the hidden layer (layer 1). This is 4 by 3. We need (n¹, 1) for the bias, which is 4 by 1. To make the data easy to print, we are not yet using the data defined at the beginning of the article (we will do that later in this article). We will instead use the following subset.
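As a rough sketch of the layer 1 step described above, the snippet below initializes a 4 by 3 weight matrix and a 4 by 1 bias, then computes the hidden-layer output. Several things here are assumptions rather than details from the article: the 4 by 3 shape is read as the weight matrix (n¹ by n⁰), the input `X` is made-up data with 3 features and 2 examples, the weights are small random values, the bias starts at zero, and the activation is a sigmoid.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

n0, n1 = 3, 4  # assumed: 3 input features (layer 0), 4 hidden neurons (layer 1)

# Hypothetical input: one column per training example, shape (n0, m) with m = 2.
X = np.array([[0.5, 1.0],
              [-1.2, 0.3],
              [0.8, -0.7]])

# Layer 1 parameters: weights are (n1, n0) = 4 by 3, bias is (n1, 1) = 4 by 1.
W1 = rng.standard_normal((n1, n0)) * 0.01  # assumed small random initialization
b1 = np.zeros((n1, 1))                     # assumed zero initialization

# Layer 1 computation: linear step, then an assumed sigmoid activation.
Z1 = W1 @ X + b1                 # b1 broadcasts across the m example columns
A1 = 1.0 / (1.0 + np.exp(-Z1))   # element-wise sigmoid

print("W1:", W1.shape, "b1:", b1.shape, "Z1:", Z1.shape, "A1:", A1.shape)
```

The bias has a single column so that NumPy broadcasting adds the same bias vector to every example, which is why its shape is (n¹, 1) regardless of how many examples are passed in.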
How Neural Networks Actually Work -- Python Implementation (Simplified)
A neural network (NN) is a black box for many people. We know that it works, but we don't understand how it works. This article will demystify it by working through some examples that show how a neural network really works. If some terms in this article are unclear, I have already written two more articles that cover the basics: article 1 and article 2. In this first example, let us consider a simple case where we have a dataset of 3 features, a target variable, and just one training example (in reality we would never have this kind of data, but it makes a very good start). Fact 1: The structure of the data influences the architecture chosen for modelling.
Top 10 AI-Generated Images by DALL-E 2 - Simplified
OpenAI, a San Francisco artificial intelligence company closely affiliated with Microsoft, launched an AI system and neural network known as DALL-E in January 2021. Its name plays on the surrealist artist Salvador Dalí and Pixar's famous movie WALL-E, and DALL-E creates images from text. In this blog, we'll let you in on everything you should know about DALL-E and its variation DALL-E 2, and share ten of the most creative AI-generated images from DALL-E 2. Picture of a dog wearing a beret and a turtleneck generated by the DALL-E 2 image generation software. Now, you may be wondering what DALL-E is all about. It's an AI tool that takes a description of an object or a scene and automatically produces an image depicting that scene or object. DALL-E also allows you to edit the AI-generated images you've created with simple tools and text modifications.
Convolutional Neural Networks: Simplified
If you are getting into Artificial Intelligence, chances are that you've heard of Convolutional Neural Networks (CNNs) and were overwhelmed by them, or maybe you just want to know what they are. In this article, I will try to explain CNNs in layman's terms. Let us use an analogy to understand Convolutional Neural Networks. Imagine you have been given the pieces of a jigsaw puzzle and told to identify what it depicts, and there is no cover image to help you. Say you find a few pieces that, when put together, form eyes.
Simplified: Off-Policy vs On-Policy in Reinforcement Learning
Early on when learning Reinforcement Learning, you may encounter a distinction between algorithms: some are on-policy and some are off-policy. You may read many explanations and still ask the question: what the hell is the difference? Let's try to clarify this concept once and for all. I believe that the best way to do this is by example. So let's set up a simple environment.
Simplified: Artificial Intelligence for IT
The promise of AI is real: less time troubleshooting issues and more time on strategic work and innovation that secures the business and drives it forward. Download includes English, Simplified Chinese, Japanese, Korean, German, and French editions: https://www.juniper.net/documentation...