alchemist
Alchemist: Turning Public Text-to-Image Data into Generative Gold
Pre-training equips text-to-image (T2I) models with broad world knowledge, but this alone is often insufficient to achieve high aesthetic quality and alignment. Consequently, supervised fine-tuning (SFT) is crucial for further refinement. However, its effectiveness highly depends on the quality of the fine-tuning dataset. Existing public SFT datasets frequently target narrow domains (e.g., anime or specific art styles), and the creation of high-quality, general-purpose SFT datasets remains a significant challenge. Current curation methods are often costly and struggle to identify truly impactful samples.
Alchemist: Turning Public Text-to-Image Data into Generative Gold
Pre-training equips text-to-image (T2I) models with broad world knowledge, but this alone is often insufficient to achieve high aesthetic quality and alignment. Consequently, supervised fine-tuning (SFT) is crucial for further refinement. However, its effectiveness highly depends on the quality of the fine-tuning dataset.Existing public SFT datasets frequently target narrow domains (e.g., anime or specific art styles), and the creation of high-quality, general-purpose SFT datasets remains a significant challenge.Current curation methods are often costly and struggle to identify truly impactful samples.This challenge is further complicated by the scarcity of public general-purpose datasets, as leading models often rely on large, proprietary, and poorly documented internal data, hindering broader research progress.This paper introduces a novel methodology for creating general-purpose SFT datasets by leveraging a pre-trained generative model as an estimator of high-impact training samples. We apply this methodology to construct and release Alchemist, a compact (3,350 samples) yet highly effective SFT dataset. Experiments demonstrate that Alchemist substantially improves the generative quality of five public T2I models while preserving diversity and style. Additionally, we release the fine-tuned models' weights to the public.
OmniPlay: Benchmarking Omni-Modal Models on Omni-Modal Game Playing
Bie, Fuqing, Huang, Shiyu, Tao, Xijia, Fang, Zhiqin, Pan, Leyi, Chen, Junzhe, Ren, Min, Xiang, Liuyu, He, Zhaofeng
While generalist foundation models like Gemini and GPT-4o demonstrate impressive multi-modal competence, existing evaluations fail to test their intelligence in dynamic, interactive worlds. Static benchmarks lack agency, while interactive benchmarks suffer from a severe modal bottleneck, typically ignoring crucial auditory and temporal cues. To bridge this evaluation chasm, we introduce OmniPlay, a diagnostic benchmark designed not just to evaluate, but to probe the fusion and reasoning capabilities of agentic models across the full sensory spectrum. Built on a core philosophy of modality interdependence, OmniPlay comprises a suite of five game environments that systematically create scenarios of both synergy and conflict, forcing agents to perform genuine cross-modal reasoning. Our comprehensive evaluation of six leading omni-modal models reveals a critical dichotomy: they exhibit superhuman performance on high-fidelity memory tasks but suffer from systemic failures in challenges requiring robust reasoning and strategic planning. We demonstrate that this fragility stems from brittle fusion mechanisms, which lead to catastrophic performance degradation under modality conflict and uncover a counter-intuitive "less is more" paradox, where removing sensory information can paradoxically improve performance. Our findings suggest that the path toward robust AGI requires a research focus beyond scaling to explicitly address synergistic fusion. Our platform is available for anonymous review at https://github.com/fuqingbie/omni-game-benchmark.
The ALCHEmist: Automated Labeling 500x CHEaper than LLM Data Annotators
Large pretrained models can be used as annotators, helping replace or augment crowdworkers and enabling distilling generalist models into smaller specialist models. Unfortunately, this comes at a cost: employing top-of-the-line models often requires paying thousands of dollars for API calls, while the resulting datasets are static and challenging to audit. To address these challenges, we propose a simple alternative: rather than directly querying labels from pretrained models, we task models to generate programs that can produce labels. These programs can be stored and applied locally, re-used and extended, and cost orders of magnitude less. Our system, \textbf{Alchemist}, obtains comparable to or better performance than large language model-based annotation in a range of tasks for a fraction of the cost: on average, improvements amount to a \textbf{12.9}
Alchemist: Towards the Design of Efficient Online Continual Learning System
Huang, Yuyang, Liu, Yuhan, Gunawi, Haryadi S., Li, Beibin, Hwang, Changho
Continual learning has become a promising solution to refine large language models incrementally by leveraging user feedback. In particular, online continual learning - iteratively training the model with small batches of user feedback - has demonstrated notable performance improvements. However, the existing practice of separating training and serving processes forces the online trainer to recompute the intermediate results already done during serving. Such redundant computations can account for 30%-42% of total training time. In this paper, we propose Alchemist, to the best of our knowledge, the first online continual learning system that efficiently reuses serving activations to increase training throughput. Alchemist introduces two key techniques: (1) recording and storing activations and KV cache only during the prefill phase to minimize latency and memory overhead; and (2) smart activation offloading and hedging. Evaluations with inputs of varied token length sampled from ShareGPT dataset show that compared with a separate training cluster, Alchemist significantly increases training throughput by up to 1.72x, reduces up to 47% memory usage during training, and supports up to 2x more training tokens - all while maintaining negligible impact on serving latency.
The ALCHEmist: Automated Labeling 500x CHEaper Than LLM Data Annotators
Huang, Tzu-Heng, Cao, Catherine, Bhargava, Vaishnavi, Sala, Frederic
Large pretrained models can be used as annotators, helping replace or augment crowdworkers and enabling distilling generalist models into smaller specialist models. Unfortunately, this comes at a cost: employing top-of-the-line models often requires paying thousands of dollars for API calls, while the resulting datasets are static and challenging to audit. To address these challenges, we propose a simple alternative: rather than directly querying labels from pretrained models, we task models to generate programs that can produce labels. These programs can be stored and applied locally, re-used and extended, and cost orders of magnitude less. Our system, Alchemist, obtains comparable to or better performance than large language model-based annotation in a range of tasks for a fraction of the cost: on average, improvements amount to a 12.9% enhancement while the total labeling costs across all datasets are reduced by a factor of approximately 500x.
Alchemist: Parametric Control of Material Properties with Diffusion Models
Sharma, Prafull, Jampani, Varun, Li, Yuanzhen, Jia, Xuhui, Lagun, Dmitry, Durand, Fredo, Freeman, William T., Matthews, Mark
We propose a method to control material attributes of objects like roughness, metallic, albedo, and transparency in real images. Our method capitalizes on the generative prior of text-to-image models known for photorealism, employing a scalar value and instructions to alter low-level material properties. Addressing the lack of datasets with controlled material attributes, we generated an object-centric synthetic dataset with physically-based materials. Fine-tuning a modified pre-trained text-to-image model on this synthetic dataset enables us to edit material properties in real-world images while preserving all other attributes. We show the potential application of our model to material edited NeRFs.
Alchemist's Palette: Free AI Video Magic - by Datasculptor
Jump into a thrilling exploration of visual transformation with these 3 free art tools that unlock endless artistic possibilities. Take your skills to the next level. In the kaleidoscopic realm of artistic experimentation, where the alchemy of reimagining has been stripped of their creators' stylistic signatures: A fully controlled cinematic spectacle crafted with finesse through a single stroke of a magical wand, transcending the boundaries of tools like Runway GEN 2. (!) Delve with me into this fantastical journey as we unlock the secrets of redefining the visual experience and unleash the boundless potential of these revolutionary methods, all nestled within the confines of this Substack post.