
WAON: Large-Scale and High-Quality Japanese Image-Text Pair Dataset for Vision-Language Models

Sugiura, Issa, Kurita, Shuhei, Oda, Yusuke, Kawahara, Daisuke, Okabe, Yasuo, Okazaki, Naoaki

arXiv.org Artificial Intelligence

Large-scale and high-quality image-text pair datasets play an important role in developing high-performing Vision-Language Models (VLMs). In this work, we introduce WAON, a large-scale and high-quality Japanese image-text pair dataset containing approximately 155 million examples, collected from Common Crawl. Our dataset construction pipeline employs various techniques, including filtering and deduplication, which have been shown to be effective in previous studies. We also construct WAON-Bench, a manually curated benchmark for Japanese cultural image classification, consisting of 374 classes. To assess the effectiveness of our dataset, we conduct experiments using both WAON and the Japanese subset of ReLAION, one of the most widely used vision-language datasets. We fine-tune SigLIP2, a strong multilingual model, on both datasets. The results demonstrate that WAON enhances model performance on WAON-Bench more efficiently than ReLAION and achieves higher accuracy across all evaluated benchmarks. Furthermore, the model fine-tuned on WAON achieves state-of-the-art performance on several Japanese cultural benchmarks. We release our dataset, model, and code at https://speed1313.github.io/WAON.
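
The abstract names filtering and deduplication as pipeline steps without detailing them here. As a rough illustration only, and not the authors' actual pipeline, the sketch below assumes a hypothetical record layout (a dict with "url" and "caption" keys), a simple kana/kanji ratio heuristic as the language filter, and exact-match deduplication on a hash of the URL and whitespace-normalized caption. The function names and the 0.3 threshold are assumptions made for this example.

```python
import hashlib
import re

# Hypothetical record layout: each example is a dict with "url" and "caption".
# Hiragana, katakana, and common CJK ideographs.
JAPANESE_CHARS = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")


def looks_japanese(caption: str, min_ratio: float = 0.3) -> bool:
    """Heuristic filter (assumed): keep captions with enough kana/kanji."""
    if not caption:
        return False
    hits = len(JAPANESE_CHARS.findall(caption))
    return hits / len(caption) >= min_ratio


def dedup_key(example: dict) -> str:
    """Exact-match key over the URL and the whitespace-normalized caption."""
    caption = " ".join(example["caption"].split())
    payload = f"{example['url']}\n{caption}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def filter_and_dedup(examples):
    """Drop non-Japanese captions, then drop exact duplicates (first one wins)."""
    seen = set()
    kept = []
    for ex in examples:
        if not looks_japanese(ex["caption"]):
            continue
        key = dedup_key(ex)
        if key in seen:
            continue
        seen.add(key)
        kept.append(ex)
    return kept
```

A real pipeline at this scale would likely use a proper language identifier and near-duplicate detection on image content rather than exact hashing; the sketch only shows the overall shape of a filter-then-deduplicate pass.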


Midjourney Mastery: A Guide to Using Image Prompts - Metaroids

#artificialintelligence

Midjourney is an AI tool that leads to the door of limitless creativity, transforming human imagination into visual art. Only this time, we'll unlock it not with words but with IMAGES. An image can be used as a prompt on Midjourney, serving as a reference to the art it will generate. All you have to do is combine them with different photos, texts, and other elements; you name it. You can even get more creative outputs with a bit of thinking outside the box. Don't worry; it'll be an easy task.


Computer vision API - Skyl.ai

#artificialintelligence

Computer vision APIs let you run computer vision tasks programmatically, at scale, and in real time. Once set up, a computer vision API can run tasks simultaneously on millions of data items. This makes it easy to integrate these APIs into your apps or websites and deliver cutting-edge, computer-vision-backed experiences to your customers. For example, you might have a reverse image search engine that takes a photo as input and returns a set of similar images from the web. You can implement this in no time using computer vision APIs, even if you have no expertise in machine learning or computer vision.


How to Generate Text from Images with Python

#artificialintelligence

In the Google Search: State of the Union last May, John Mueller and Martin Splitt devoted about a fourth of the address to image-related topics. They announced a big list of improvements to Google Image Search and predicted that it would be a massive untapped opportunity for SEO. SEO Clarity, an SEO tool vendor, released a very interesting report around the same time. Among other findings, they found that more than a third of web search results include images. Images are important to search visitors not only because they are more visually attractive than text, but also because they instantly convey context that would take much longer to read as text.


Large-Scale Serverless Machine Learning Inference with Azure Functions

#artificialintelligence

This article is part of #ServerlessSeptember. You'll find other helpful articles, detailed tutorials, and videos in this all-things-Serverless content collection. New articles are published every day -- that's right, every day -- from community members and cloud advocates in the month of September. Azure Functions recently announced the general availability of its Python language support. We can use Python 3.6 and Python's large ecosystem of packages, such as TensorFlow, to build serverless functions. Today, we'll look at how we can use TensorFlow with Python Azure Functions to perform large-scale machine learning inference.