Generative AI
Visual Verity in AI-Generated Imagery: Computational Metrics and Human-Centric Analysis
Aziz, Memoona, Rehman, Umair, Safi, Syed Ali, Abbasi, Amir Zaib
The rapid advancements in AI technologies have revolutionized the production of graphical content across various sectors, including entertainment, advertising, and e-commerce. These developments have spurred the need for robust evaluation methods to assess the quality and realism of AI-generated images. To address this, we conducted three studies. First, we introduced and validated a questionnaire called Visual Verity, which measures photorealism, image quality, and text-image alignment. Second, we applied this questionnaire to assess images from AI models (DALL-E2, DALL-E3, GLIDE, Stable Diffusion) and camera-generated images, revealing that camera-generated images excelled in photorealism and text-image alignment, while AI models led in image quality. We also analyzed statistical properties, finding that camera-generated images scored lower in hue, saturation, and brightness. Third, we evaluated computational metrics' alignment with human judgments, identifying MS-SSIM and CLIP as the most consistent with human assessments. Additionally, we proposed the Neural Feature Similarity Score (NFSS) for assessing image quality. Our findings highlight the need for refining computational metrics to better capture human visual perception, thereby enhancing AI-generated content evaluation.
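One common way to check whether a computational metric agrees with human judgments is a rank correlation between the metric's scores and the human ratings for the same images. The abstract does not say which statistic the authors used, so the following is only a minimal sketch that assumes Spearman correlation; the function names and the toy numbers are hypothetical and are not taken from the paper.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up numbers for illustration only: one human rating and one metric
# score per image, in the same image order.
human_ratings = [4.5, 3.0, 2.0, 5.0, 1.5]
metric_scores = [0.81, 0.60, 0.55, 0.93, 0.40]
print(spearman(human_ratings, metric_scores))
```

A value near 1 would mean the metric orders the images the way people do, and a value near 0 would mean it carries little information about human preference.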
Why A.I. Isn't Going to Make Art
In 1953, Roald Dahl published "The Great Automatic Grammatizator," a short story about an electrical engineer who secretly desires to be a writer. One day, after completing construction of the world's fastest calculating machine, the engineer realizes that "English grammar is governed by rules that are almost mathematical in their strictness." He constructs a fiction-writing machine that can produce a five-thousand-word short story in thirty seconds; a novel takes fifteen minutes and requires the operator to manipulate handles and foot pedals, as if he were driving a car or playing an organ, to regulate the levels of humor and pathos. The resulting novels are so popular that, within a year, half the fiction published in English is a product of the engineer's invention. Is there anything about art that makes us think it can't be created by pushing a button, as in Dahl's imagination?
Data Augmentation for Image Classification using Generative AI
Rahat, Fazle, Hossain, M Shifat, Ahmed, Md Rubel, Jha, Sumit Kumar, Ewetz, Rickard
Scaling laws dictate that the performance of AI models is proportional to the amount of available data. Data augmentation is a promising solution to expanding the dataset size. Traditional approaches focused on augmentation using rotation, translation, and resizing. Recent approaches use generative AI models to improve dataset diversity. However, the generative methods struggle with issues such as subject corruption and the introduction of irrelevant artifacts. In this paper, we propose the Automated Generative Data Augmentation (AGA). The framework combines the utility of large language models (LLMs), diffusion models, and segmentation models to augment data. AGA preserves foreground authenticity while ensuring background diversity. Specific contributions include: i) segment and superclass based object extraction, ii) prompt diversity with combinatorial complexity using prompt decomposition, and iii) affine subject manipulation. We evaluate AGA against state-of-the-art (SOTA) techniques on three representative datasets: ImageNet, CUB, and iWildCam. The experimental evaluation demonstrates accuracy improvements of 15.6% and 23.5% for in-distribution and out-of-distribution data, respectively, compared to baseline models. There is also a 64.3% improvement in SIC score compared to the baselines.
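The idea of prompt diversity through combinatorial composition can be illustrated with a small sketch. The abstract does not describe how AGA decomposes prompts, so the component lists, the `compose_prompts` helper, and the prompt template below are all hypothetical; the sketch only shows how a few short lists multiply into many distinct background prompts.

```python
from itertools import product

# Hypothetical prompt components, not taken from the paper.
SETTINGS = ["in a forest", "on a beach", "in a city street"]
LIGHTING = ["at sunrise", "at night", "under overcast skies"]
STYLES = ["photographed with a wide-angle lens", "photographed from above"]


def compose_prompts(subject, *component_lists):
    """Return one prompt per combination of the given components."""
    return [
        f"a photo of a {subject} " + ", ".join(parts)
        for parts in product(*component_lists)
    ]


prompts = compose_prompts("bird", SETTINGS, LIGHTING, STYLES)
print(len(prompts))  # 3 * 3 * 2 = 18 combinations
print(prompts[0])
```

Adding one more option to any list multiplies the total rather than adding to it, which is where the combinatorial growth in prompt variety comes from.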
Generative AI creates playable version of Doom game with no code
An AI-generated recreation of the classic computer game Doom can be played normally despite having no computer code or graphics. Researchers behind the project say similar AI models could be used to create games from scratch in the future, just as they create text and images today. The model, called GameNGen, was made by Dani Valevski at Google Research and his colleagues, who declined to speak to New Scientist. According to their paper on the research, the AI can be played for up to 20 seconds while retaining all the features of the original, such as scores, ammunition levels and map layouts. Players can attack enemies, open doors and interact with the environment as usual.
AI can spot tuberculosis early by listening to your cough
The same underlying technology powering massively popular generative AI models from large tech firms like OpenAI is now being used to scan for early signs of lung disease. Google, one of the leaders in new AI models, is partnering with a healthcare startup that's analyzing vast datasets of coughs and sneezes to detect signs of tuberculosis or other respiratory diseases before they get worse. It's one of numerous ways the quickly evolving technology is reshaping early detection of disease across the healthcare industry. What happens once that initial diagnosis is made, however, still requires quintessential human clinical expertise. Earlier this year, Google released details about a new self-supervised deep-learning model for healthcare that it dubbed Health Acoustic Representations (HeAR).
Chatbots Are Primed to Warp Reality
More and more people are learning about the world through chatbots and the software's kin, whether they mean to or not. Google has rolled out generative AI to users of its search engine on at least four continents, placing AI-written responses above the usual list of links; as many as 1 billion people may encounter this feature by the end of the year. Meta's AI assistant has been integrated into Facebook, Messenger, WhatsApp, and Instagram, and is sometimes the default option when a user taps the search bar. And Apple is expected to integrate generative AI into Siri, Mail, Notes, and other apps this fall. Less than two years after ChatGPT's launch, bots are quickly becoming the default filters for the web.
Image-Perfect Imperfections: Safety, Bias, and Authenticity in the Shadow of Text-To-Image Model Evolution
Wu, Yixin, Shen, Yun, Backes, Michael, Zhang, Yang
Text-to-image models, such as Stable Diffusion (SD), undergo iterative updates to improve image quality and address concerns such as safety. Improvements in image quality are straightforward to assess. However, how model updates resolve existing concerns and whether they raise new questions remain unexplored. This study takes an initial step in investigating the evolution of text-to-image models from the perspectives of safety, bias, and authenticity. Our findings, centered on Stable Diffusion, indicate that model updates paint a mixed picture. While updates progressively reduce the generation of unsafe images, the bias issue, particularly in gender, intensifies. We also find that negative stereotypes either persist within the same Non-White race group or shift towards other Non-White race groups through SD updates, yet with minimal association of these traits with the White race group. Additionally, our evaluation reveals a new concern stemming from SD updates: State-of-the-art fake image detectors, initially trained for earlier SD versions, struggle to identify fake images generated by updated versions. We show that fine-tuning these detectors on fake images generated by updated versions achieves at least 96.6% accuracy across various SD versions, addressing this issue. Our insights highlight the importance of continued efforts to mitigate biases and vulnerabilities in evolving text-to-image models.
Getting Inspiration for Feature Elicitation: App Store- vs. LLM-based Approach
Wei, Jialiang, Courbis, Anne-Lise, Lambolais, Thomas, Xu, Binbin, Bernard, Pierre Louis, Dray, Gérard, Maalej, Walid
Over the past decade, app store (AppStore)-inspired requirements elicitation has proven to be highly beneficial. Developers often explore competitors' apps to gather inspiration for new features. With the advance of Generative AI, recent studies have demonstrated the potential of large language model (LLM)-inspired requirements elicitation. LLMs can assist in this process by providing inspiration for new feature ideas. While both approaches are gaining popularity in practice, there is a lack of insight into their differences. We report on a comparative study between AppStore- and LLM-based approaches for refining features into sub-features. By manually analyzing 1,200 sub-features recommended from both approaches, we identified their benefits, challenges, and key differences. While both approaches recommend highly relevant sub-features with clear descriptions, LLMs seem more powerful particularly concerning novel unseen app scopes. Moreover, some recommended features are imaginary with unclear feasibility, which suggests the importance of a human-analyst in the elicitation loop.
ChatGPT has doubled its weekly active users to 200 million
ChatGPT now has 200 million weekly active users, according to OpenAI. That represents a doubling of the weekly audience of 100 million the company announced last November. A representative from the company told Engadget that API usage has also doubled since the July release of GPT-4o mini. User numbers aren't the only big growth OpenAI has seen over the past year. CEO Sam Altman reportedly told employees this summer that the company's annualized revenue -- which takes a monthly revenue figure and stretches it out over a whole year -- had reached $3.4 billion, up from $1.6 billion at the end of 2023.
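The arithmetic behind these figures can be made concrete with a short sketch. It assumes the usual convention that annualized revenue is a single month's revenue multiplied by 12; the helper names are hypothetical, and the implied monthly figure is derived from the numbers quoted above rather than reported by OpenAI.

```python
def annualized_revenue(monthly_revenue):
    """Stretch one month's revenue over a full year."""
    return monthly_revenue * 12


def implied_monthly_revenue(annualized):
    """Invert the annualization to recover the underlying monthly figure."""
    return annualized / 12


def growth_factor(new, old):
    """Ratio of a newer figure to an older one."""
    return new / old


monthly = implied_monthly_revenue(3.4e9)
print(f"Implied monthly revenue: about {monthly / 1e6:.0f} million")
print(f"Weekly user growth: {growth_factor(200e6, 100e6):.2f}x")
print(f"Annualized revenue growth: {growth_factor(3.4e9, 1.6e9):.3f}x")
```

Because an annualized figure extrapolates from one month, it describes a run rate at that moment and not revenue actually collected over a year.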
OpenAI and Anthropic agree to share their models with the US AI Safety Institute
OpenAI and Anthropic have agreed to share AI models -- before and after release -- with the US AI Safety Institute. The agency, established through an executive order by President Biden in 2023, will offer safety feedback to the companies to improve their models. OpenAI CEO Sam Altman hinted at the agreement earlier this month. "Safety is essential to fueling breakthrough technological innovation. With these agreements in place, we look forward to beginning our technical collaborations with Anthropic and OpenAI to advance the science of AI safety," Elizabeth Kelly, director of the US AI Safety Institute, wrote in a statement.