AITopics | selma

SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data

Neural Information Processing SystemsMar-20-2026, 00:51:58 GMT

Recent text-to-image (T2I) generation models have demonstrated impressive capabilities in creating images from text descriptions. However, these T2I generation models often fail to generate images that precisely match the details of the text inputs, such as incorrect spatial relationship or missing objects. In this paper, we introduce SELMA: Skill-Specific Expert Learning and Merging with Auto-Generated Data, a novel paradigm to improve the faithfulness of T2I models by fine-tuning models on automatically generated, multi-skill image-text datasets, with skill-specific expert learning and merging. First, SELMA leverages an LLM's in-context learning capability to generate multiple datasets of text prompts that can teach different skills, and then generates the images with a T2I model based on the prompts. Next, SELMA adapts the T2I model to the new skills by learning multiple single-skill LoRA (low-rank adaptation) experts followed by expert merging.

artificial intelligence, name change, proceedings, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.54)

Add feedback

SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data

Neural Information Processing SystemsFeb-12-2026, 01:23:56 GMT

Recent text-to-image (T2I) generation models have demonstrated impressive capabilities in creating images from text descriptions.

large language model, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Information Technology (0.67)
Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data

Neural Information Processing SystemsOct-10-2025, 00:45:42 GMT

Recent text-to-image (T2I) generation models have demonstrated impressive capabilities in creating images from text descriptions.

dataset, selma, text prompt, (17 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Information Technology (0.67)
Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data

Neural Information Processing SystemsMay-26-2025, 23:05:32 GMT

Recent text-to-image (T2I) generation models have demonstrated impressive capabilities in creating images from text descriptions. However, these T2I generation models often fail to generate images that precisely match the details of the text inputs, such as incorrect spatial relationship or missing objects. In this paper, we introduce SELMA: Skill-Specific Expert Learning and Merging with Auto-Generated Data, a novel paradigm to improve the faithfulness of T2I models by fine-tuning models on automatically generated, multi-skill image-text datasets, with skill-specific expert learning and merging. First, SELMA leverages an LLM's in-context learning capability to generate multiple datasets of text prompts that can teach different skills, and then generates the images with a T2I model based on the prompts. Next, SELMA adapts the T2I model to the new skills by learning multiple single-skill LoRA (low-rank adaptation) experts followed by expert merging.

artificial intelligence, auto-generated data, merging skill-specific text-to-image expert, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.57)

Add feedback

SELMA: A Speech-Enabled Language Model for Virtual Assistant Interactions

Wagner, Dominik, Churchill, Alexander, Sigtia, Siddharth, Marchi, Erik

arXiv.org Artificial IntelligenceFeb-3-2025

In this work, we present and evaluate SELMA, a Speech-Enabled Language Model for virtual Assistant interactions that integrates audio and text as inputs to a Large Language Model (LLM). SELMA is designed to handle three primary and two auxiliary tasks related to interactions with virtual assistants simultaneously within a single end-to-end model. We employ low-rank adaptation modules for parameter-efficient training of both the audio encoder and the LLM. Additionally, we implement a feature pooling strategy enabling the system to recognize global patterns and improve accuracy on tasks less reliant on individual sequence elements. Experimental results on Voice Trigger (VT) detection, Device-Directed Speech Detection (DDSD), and Automatic Speech Recognition (ASR), demonstrate that our approach both simplifies the typical input processing pipeline of virtual assistants significantly and also improves performance compared to dedicated models for each individual task. SELMA yields relative Equal-Error Rate improvements of 64% on the VT detection task, and 22% on DDSD, while also achieving word error rates close to the baseline.

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2501.19377

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data

Li, Jialu, Cho, Jaemin, Sung, Yi-Lin, Yoon, Jaehong, Bansal, Mohit

arXiv.org Artificial IntelligenceMar-11-2024

Recent text-to-image (T2I) generation models have demonstrated impressive capabilities in creating images from text descriptions. However, these T2I generation models often fall short of generating images that precisely match the details of the text inputs, such as incorrect spatial relationship or missing objects. In this paper, we introduce SELMA: Skill-Specific Expert Learning and Merging with Auto-Generated Data, a novel paradigm to improve the faithfulness of T2I models by fine-tuning models on automatically generated, multi-skill image-text datasets, with skill-specific expert learning and merging. First, SELMA leverages an LLM's in-context learning capability to generate multiple datasets of text prompts that can teach different skills, and then generates the images with a T2I model based on the prompts. Next, SELMA adapts the T2I model to the new skills by learning multiple single-skill LoRA (low-rank adaptation) experts followed by expert merging. Our independent expert fine-tuning specializes multiple models for different skills, and expert merging helps build a joint multi-skill T2I model that can generate faithful images given diverse text prompts, while mitigating the knowledge conflict from different datasets. We empirically demonstrate that SELMA significantly improves the semantic alignment and text faithfulness of state-of-the-art T2I diffusion models on multiple benchmarks (+2.1% on TIFA and +6.9% on DSG), human preference metrics (PickScore, ImageReward, and HPS), as well as human evaluation. Moreover, fine-tuning with image-text pairs auto-collected via SELMA shows comparable performance to fine-tuning with ground truth data. Lastly, we show that fine-tuning with images from a weaker T2I model can help improve the generation quality of a stronger T2I model, suggesting promising weak-to-strong generalization in T2I models.

dataset, selma, text prompt, (16 more...)

arXiv.org Artificial Intelligence

2403.06952

Country:

North America > United States (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (0.65)

Industry: Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)

Add feedback

TVStoryGen: A Dataset for Generating Stories with Character Descriptions

Chen, Mingda, Gimpel, Kevin

arXiv.org Artificial IntelligenceOct-9-2022

We introduce TVStoryGen, a story generation dataset that requires generating detailed TV show episode recaps from a brief summary and a set of documents describing the characters involved. Unlike other story generation datasets, TVStoryGen contains stories that are authored by professional screen-writers and that feature complex interactions among multiple characters. Generating stories in TVStoryGen requires drawing relevant information from the lengthy provided documents about characters based on the brief summary. In addition, we propose to train reverse models on our dataset for evaluating the faithfulness of generated stories. We create TVStoryGen from fan-contributed websites, which allows us to collect 26k episode recaps with 1868.7 tokens on average. Empirically, we take a hierarchical story generation approach and find that the neural model that uses oracle content selectors for character descriptions demonstrates the best performance on automatic metrics, showing the potential of our dataset to inspire future research on story generation with constraints. Qualitative analysis shows that the best-performing model sometimes generates content that is unfaithful to the short summaries, suggesting promising directions for future work.

computational linguistic, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2109.08833

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > Italy > Tuscany > Florence (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(15 more...)

Genre: Overview (0.69)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Moon exploration rovers from Lunar Zebro. What if you would use this?

RobohubApr-28-2021, 14:24:24 GMT

In this series of articles, we take robot innovations from their test-lab and bring them to a randomly selected workplace in the outside world. We discovered that Lunar Zebro is not only good for risky endeavours on extraterrestrial terrains, but that these sturdy self-organising little rovers can also simplify the life of hardworking interior decorators. 'Shoot for the moon' is taken quite literally by this team of students and professors from TU Delft. "We want to make the exploration of the moon available for a wider audience", says Pieter van Santvliet, partnerships coordinator at Lunar Zebro. "So we build the world's smallest and lightest rover."

moon exploration rover, rover, zebro, (9 more...)

Robohub

Country: Europe > Netherlands > South Holland > Delft (0.25)

Technology: Information Technology > Artificial Intelligence > Robots (0.58)

Add feedback

Filters

Collaborating Authors

selma

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data

SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data

SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data

SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data

SELMA: A Speech-Enabled Language Model for Virtual Assistant Interactions

SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data

TVStoryGen: A Dataset for Generating Stories with Character Descriptions

Moon exploration rovers from Lunar Zebro. What if you would use this?