anime character


ToonOut: Fine-tuned Background-Removal for Anime Characters

Muratori, Matteo, Seytre, Joël

arXiv.org Artificial Intelligence

While state-of-the-art background removal models excel at realistic imagery, they frequently underperform in specialized domains such as anime-style content, where complex features like hair and transparency present unique challenges. To address this limitation, we collected and annotated a custom dataset of 1,228 high-quality anime images of characters and objects, and fine-tuned the open-sourced BiRefNet model on this dataset. This resulted in marked improvements in background removal accuracy for anime-style images, increasing from 95.3% to 99.5% for our newly introduced Pixel Accuracy metric. We are open-sourcing the code, the fine-tuned model weights, as well as the dataset at: https://github.com/MatteoKartoon/BiRefNet.
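The abstract does not define its newly introduced Pixel Accuracy metric. As a rough illustration only (the paper's exact definition may differ), a common formulation for matting and segmentation — the fraction of pixels whose binarized predicted alpha matte agrees with the binarized ground truth — can be sketched as:

```python
import numpy as np

def pixel_accuracy(pred_alpha, gt_alpha, threshold=0.5):
    """Fraction of pixels where the binarized predicted alpha matte
    agrees with the binarized ground-truth matte.

    pred_alpha, gt_alpha: float arrays in [0, 1] with identical shape.
    threshold: alpha value above which a pixel counts as foreground.
    """
    pred_mask = pred_alpha >= threshold
    gt_mask = gt_alpha >= threshold
    return float(np.mean(pred_mask == gt_mask))
```

Under this formulation, a matte that misclassifies 5 of 100 pixels would score 0.95; the function name, threshold, and binarization step are assumptions, not taken from the paper.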


LLMs vs. Chinese Anime Enthusiasts: A Comparative Study on Emotionally Supportive Role-Playing

Qiu, Lanlan, Pu, Xiao, Feng, Yeqi, He, Tianxing

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated impressive capabilities in role-playing conversations and providing emotional support as separate research directions. However, there remains a significant research gap in combining these capabilities to enable emotionally supportive interactions with virtual characters. To address this research gap, we focus on anime characters as a case study because of their well-defined personalities and large fan bases. This choice enables us to effectively evaluate how well LLMs can provide emotional support while maintaining specific character traits. We introduce ChatAnime, the first Emotionally Supportive Role-Playing (ESRP) dataset. We first thoughtfully select 20 top-tier characters from popular anime communities and design 60 emotion-centric real-world scenario questions. Then, we execute a nationwide selection process to identify 40 Chinese anime enthusiasts with profound knowledge of specific characters and extensive experience in role-playing. Next, we systematically collect two rounds of dialogue data from 10 LLMs and these 40 Chinese anime enthusiasts. To evaluate the ESRP performance of LLMs, we design a user experience-oriented evaluation system featuring 9 fine-grained metrics across three dimensions: basic dialogue, role-playing and emotional support, along with an overall metric for response diversity. In total, the dataset comprises 2,400 human-written and 24,000 LLM-generated answers, supported by over 132,000 human annotations. Experimental results show that top-performing LLMs surpass human fans in role-playing and emotional support, while humans still lead in response diversity. We hope this work can provide valuable resources and insights for future research on optimizing LLMs in ESRP. Our datasets are available at https://github.com/LanlanQiu/ChatAnime.


ReGraP-LLaVA: Reasoning enabled Graph-based Personalized Large Language and Vision Assistant

Xiang, Yifan, Zhang, Zhenxi, Li, Bin, Weng, Yixuan, Zhou, Shoujun, He, Yangfan, Li, Keqin

arXiv.org Artificial Intelligence

Recent advances in personalized MLLMs enable effective capture of user-specific concepts, supporting both recognition of personalized concepts and contextual captioning. However, humans typically explore and reason over relations among objects and individuals, transcending surface-level information to achieve more personalized and contextual understanding. Existing methods face three main limitations: (1) their training data lacks multi-object sets in which relations among objects are learnable; (2) building on this limited training data, their models overlook the relations between different personalized concepts and fail to reason over them; and (3) their experiments mainly focus on a single personalized concept, where evaluations are limited to recognition and captioning tasks. To address these limitations, we present a new dataset named ReGraP, consisting of 120 sets of personalized knowledge. Each set includes images, knowledge graphs (KGs), and chain-of-thought (CoT) QA pairs derived from the KGs, enabling more structured and sophisticated reasoning pathways. We propose ReGraP-LLaVA, an MLLM trained with the corresponding KGs and CoT QA pairs, where soft and hard graph prompting methods are designed to align KGs within the model's semantic space. We establish the ReGraP Benchmark, which contains diverse task types: multiple-choice, fill-in-the-blank, True/False, and descriptive questions in both open- and closed-ended settings. The benchmark is designed to evaluate the relational reasoning and knowledge-connection capabilities of personalized MLLMs. We conduct experiments on ReGraP-LLaVA and other competitive MLLMs. Results show that the proposed model not only learns personalized knowledge but also performs relational reasoning in its responses, achieving state-of-the-art performance compared with competitive methods. All code and datasets are released at: https://github.com/xyfyyds/ReGraP.
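The abstract describes deriving CoT QA pairs from knowledge graphs. As a rough illustration of the idea (the format below is hypothetical, not the paper's actual construction), a relational path through a KG can be rendered as a step-by-step rationale string:

```python
def path_to_cot(path):
    """Render a KG relation path [(head, relation, tail), ...] as a
    chain-of-thought style rationale string.

    Hypothetical format for illustration only; ReGraP's actual QA
    construction may differ.
    """
    steps = [f"{h} {r} {t}" for h, r, t in path]
    return "; ".join(steps) + "."
```

For example, `path_to_cot([("Rex", "belongs to", "Alice"), ("Alice", "lives in", "Kyoto")])` yields `"Rex belongs to Alice; Alice lives in Kyoto."` — a multi-hop rationale connecting two personalized concepts.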


Building a simple Generative Adversarial Network (GAN) using Keras

#artificialintelligence

In this post, we will learn to develop a Generative Adversarial Network (GAN) for generating realistic manga or anime characters. I've always been amazed by vivid animations, especially Manga with their bold looks and strokes. Wouldn't it be awesome to be able to draw a few ourselves, and to experience the thrill of creating them with the help of a self-developed Neural Network? The best way to master a skill is to practice and refine it until you're satisfied with your efforts. For a machine or a neural network, the best output it can generate is one that matches human-generated outputs--or even fools a human into believing that a human actually produced it.
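The adversarial setup the post builds toward — a generator mapping noise to images, a discriminator judging real vs. fake, and a combined model with the discriminator frozen — can be sketched in Keras like this (the layer sizes and 64x64 image shape are illustrative assumptions, not the post's actual architecture):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

latent_dim = 100
img_shape = (64, 64, 3)  # illustrative; anime-face datasets are often 64x64

# Generator: maps a latent noise vector to an image in [-1, 1]
generator = keras.Sequential([
    layers.Input(shape=(latent_dim,)),
    layers.Dense(256, activation="relu"),
    layers.Dense(64 * 64 * 3, activation="tanh"),
    layers.Reshape(img_shape),
])

# Discriminator: classifies images as real (1) or generated (0)
discriminator = keras.Sequential([
    layers.Input(shape=img_shape),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# Combined model: freeze the discriminator before compiling, so training
# the GAN end-to-end only updates the generator's weights
discriminator.trainable = False
gan = keras.Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")
```

Training then alternates between fitting the discriminator on labeled real/fake batches and fitting the combined model on noise with "real" labels, which pushes the generator to fool the discriminator.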


Deep Learning Model Morphs VTube Talking Heads With a Few Mouse Clicks

#artificialintelligence

Every day is Halloween for Virtual YouTubers or "VTubers" -- the new generation of wildly popular online entertainers whose voices and actions are represented in real time by colourful and expressive anime characters. Now, a Google researcher has released a deep neural network model that makes animating a VTube persona a little easier. Using motion capture systems to transfer human movements to cartoon characters in real time is a process that can be traced back to the 90s. The approach, however, was not popularized, and the term "Virtual YouTuber" did not enter our vocabulary until the virtual character "Kizuna AI" debuted in 2016. Kizuna is a cute young girl with wide eyes and a pink butterfly bow perched atop her long flowing hair -- any otaku's dream.


Finally, an app that turns your selfie into an anime character

#artificialintelligence

There are countless uses for neural networks: one composes terrifying jazz, and another dreams up an entire text adventure game in real time. So it should come as no surprise that a smartphone app called TwinFACE, now available on the Google Play store, is designed to transform your selfie into an anime character. TwinFACE uses the same open-source UGATIT code as an AI developed earlier this year by a team of South Korean researchers from video game company NCSoft. But there's one catch: it doesn't do a very good job. The resulting portraits are, frankly, terrible and in some instances pretty scary.


More Deep Fakes: From Rental Listings to Anime Characters

#artificialintelligence

Thanks to the sudden popularity of an artificial intelligence (AI) image generator that makes scarily realistic images of faces, a whole host of other image generators have cropped up on the internet. Using the same generative adversarial network technology (GAN for short), these machine learning algorithms are producing everything from fake Airbnb listings, to anime characters. We take a look at some of the best (and worst) of what these image generators have to offer. If you're interested in finding out more about the technology behind deep fakes, make sure to read up on our article and take the deep fakes quiz. Let me start off by saying, these generators are nowhere near as convincing as the human face generator.


Random generation of anime characters by sophisticated AI programs is now so good, it's unreal

#artificialintelligence

Never would we have thought that characters designed by AI programs would jump from rudimentary to ultra-advanced in the space of three years. In 2015, an artificial intelligence program called Chainer was introduced to the world, which generated anime characters based on users' inputs and helped artists come up with their own ideas. It was relatively basic and created content that looked like it was haphazardly drawn. Nevertheless, it was a first attempt to design an AI that could create anime characters, and it became the stepping stone for a more sophisticated program featured on a website called MakeGirls.moe in 2017.


Full-body High-resolution Anime Generation with Progressive Structure-conditional Generative Adversarial Networks

Hamada, Koichi, Tachibana, Kentaro, Li, Tianqi, Honda, Hiroto, Uchida, Yusuke

arXiv.org Machine Learning

We propose Progressive Structure-conditional Generative Adversarial Networks (PSGAN), a new framework that can generate full-body and high-resolution character images based on structural information. Recent progress in generative adversarial networks with progressive training has made it possible to generate high-resolution images. However, existing approaches have limitations in achieving both high image quality and structural consistency at the same time. Our method tackles the limitations by progressively increasing the resolution of both generated images and structural conditions during training. In this paper, we empirically demonstrate the effectiveness of this method by showing the comparison with existing approaches and video generation results of diverse anime characters at 1024x1024 based on target pose sequences. We also create a novel dataset containing full-body 1024x1024 high-resolution images and exact 2D pose keypoints using Unity 3D Avatar models.
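The abstract's key idea — supplying the structural condition at every resolution stage of progressive training — can be illustrated by rasterizing pose keypoints into a condition map at each stage's resolution, doubling from a small base up to the final 1024x1024. This is a simplified sketch under stated assumptions (PSGAN's actual conditioning uses learned layers, and its keypoint encoding may differ):

```python
import numpy as np

def keypoint_map(keypoints, size):
    """Rasterize normalized 2D keypoints (x, y in [0, 1]) into a
    size x size binary occupancy map."""
    m = np.zeros((size, size), dtype=np.float32)
    for x, y in keypoints:
        row = min(int(y * size), size - 1)
        col = min(int(x * size), size - 1)
        m[row, col] = 1.0
    return m

def progressive_conditions(keypoints, min_size=4, max_size=1024):
    """One structural condition map per progressive-training stage,
    doubling the resolution from min_size x min_size up to max_size."""
    conditions = {}
    size = min_size
    while size <= max_size:
        conditions[size] = keypoint_map(keypoints, size)
        size *= 2
    return conditions
```

Growing the condition alongside the generated image, rather than injecting it only at full resolution, is what lets the model keep structural consistency while image quality increases stage by stage.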