Generative AI


Understanding Generative AI Content with Embedding Models

arXiv.org Artificial Intelligence

The construction of high-quality numerical features is critical to any quantitative data analysis. Feature engineering has historically been addressed by carefully hand-crafting data representations based on domain expertise. This work views the internal representations of modern deep neural networks (DNNs), called embeddings, as an automated form of traditional feature engineering. For trained DNNs, we show that these embeddings can reveal interpretable, high-level concepts in unstructured sample data. We use these embeddings in natural language and computer vision tasks to uncover both inherent heterogeneity in the underlying data and human-understandable explanations for it. In particular, we find empirical evidence that there is inherent separability between real data and data generated by AI models.


Generating Realistic X-ray Scattering Images Using Stable Diffusion and Human-in-the-loop Annotations

arXiv.org Artificial Intelligence

We fine-tuned a foundational stable diffusion model using X-ray scattering images and their corresponding descriptions to generate new scientific images from given prompts. However, some of the generated images exhibit significant unrealistic artifacts, commonly known as "hallucinations". To address this issue, we trained various computer vision models on a dataset composed of 60% human-approved generated images and 40% experimental images to detect unrealistic images. The classified images were then reviewed and corrected by human experts, and subsequently used to further refine the classifiers in subsequent rounds of training and inference. Our evaluations demonstrate the feasibility of generating high-fidelity, domain-specific images using a fine-tuned diffusion model. We anticipate that generative AI will play a crucial role in enhancing data augmentation and driving the development of digital twins in scientific research facilities.


Diffusion-Based Visual Art Creation: A Survey and New Perspectives

arXiv.org Artificial Intelligence

The integration of generative AI in visual art has revolutionized not only how visual content is created but also how AI interacts with and reflects the underlying domain knowledge. This survey explores the emerging realm of diffusion-based visual art creation, examining its development from both artistic and technical perspectives. We structure the survey into three phases: data feature and framework identification, detailed analyses using a structured coding process, and open-ended prospective outlooks. Our findings reveal how artistic requirements are transformed into technical challenges and highlight the design and application of diffusion-based methods within visual art creation. We also provide insights into future directions from technical and synergistic perspectives, suggesting that the confluence of generative AI and art has shifted the creative paradigm and opened up new possibilities. By summarizing the development and trends of this emerging interdisciplinary area, we aim to shed light on the mechanisms through which AI systems emulate, and possibly enhance, human capacities in artistic perception and creativity.


Fox News AI Newsletter: US leads world in fastest AI development: report

FOX News

Fox News chief political anchor Bret Baier has the latest on the pros and cons of the bombshell developments on 'Special Report.' TOP OF THE CHARTS: The U.S. topped another study that looked at the fastest-developing artificial intelligence industries in the world, according to a new report. AI ON THE BALLOT: A librarian running as a nonpartisan candidate for mayor of Cheyenne, Wyoming, promises to allow an artificial intelligence bot created by OpenAI to govern the state's capital city. AI POWER PLAY: Google has its eye on the prize -- artificial intelligence -- and it's making a bold power play in the tech arena. The company's recent Made by Google event was more than just showcasing new technology.


The US Government Wants You--Yes, You--to Hunt Down Generative AI Flaws

WIRED

At the 2023 Defcon hacker conference in Las Vegas, prominent AI tech companies partnered with algorithmic integrity and transparency groups to sic thousands of attendees on generative AI platforms and find weaknesses in these critical systems. This "red-teaming" exercise, which also had support from the US government, took a step in opening these increasingly influential yet opaque systems to scrutiny. Now, the ethical AI and algorithmic assessment nonprofit Humane Intelligence is taking this model one step further. On Wednesday, the group announced a call for participation with the US National Institute of Standards and Technology, inviting any US resident to participate in the qualifying round of a nationwide red-teaming effort to evaluate AI office productivity software. The qualifier will take place online and is open to both developers and anyone in the general public as part of NIST's AI challenges, known as Assessing Risks and Impacts of AI, or ARIA.


OpenAI strikes deal to use content from The New Yorker, Vogue, Vanity Fair

Al Jazeera

OpenAI has struck a multi-year deal with Condé Nast to allow the Microsoft-backed startup to use content from media brands including The New Yorker, Vogue, GQ, Vanity Fair and Bon Appétit. Under the agreement announced on Tuesday, OpenAI will have permission to display content from Condé Nast's stable of media properties in its artificial intelligence-powered products, including ChatGPT and its SearchGPT prototype. Sam Altman-led OpenAI and Condé Nast did not disclose the terms of the deal. "We're committed to working with Condé Nast and other news publishers to ensure that as AI plays a larger role in news discovery and delivery, it maintains accuracy, integrity, and respect for quality reporting," OpenAI COO Brad Lightcap said in a statement posted on the startup's website. In a memo to staff, Condé Nast CEO Roger Lynch said it is important to embrace new technologies and protect intellectual property at a time when tech companies are eroding media companies' ability to monetize content.


Epistemic Injustice in Generative AI

arXiv.org Artificial Intelligence

While algorithms have traditionally been leveraged to present and organize human-generated content, the advent of generative AI has started to fundamentally shift this paradigm. Generative AI models can now create content - spanning text, imagery, and beyond - that resembles that of authors, journalists, painters, or photographers. In this paper, we take generative AI to be the class of machine learning models trained on massive amounts of data, typically media such as text, images, audio or video, in order to produce representative instances of such media (García-Peñalvo and Vázquez-Ingelmo 2023). While traditional discussions of epistemic injustice have primarily centered on interpersonal human interactions (McKinnon 2017; Tsosie 2012), existing research on algorithmic epistemic injustice has largely been limited to epistemic injustices produced by decision-making and classification algorithms. However, we argue that the distinctive characteristics of generative AI give rise to novel forms of epistemic injustice that necessitate a dedicated analytical framework. To address this, we expand upon the established philosophical discourse on epistemic injustice and introduce an account of "generative algorithmic epistemic injustice,"


Deep Generative Models in Robotics: A Survey on Learning from Multimodal Demonstrations

arXiv.org Artificial Intelligence

Learning from Demonstrations, the field that proposes to learn robot behavior models from data, is gaining popularity with the emergence of deep generative models. Although the problem has been studied for years under names such as Imitation Learning, Behavioral Cloning, or Inverse Reinforcement Learning, classical methods have relied on models that don't capture complex data distributions well or don't scale well to large numbers of demonstrations. In recent years, the robot learning community has shown increasing interest in using deep generative models to capture the complexity of large datasets. In this survey, we aim to provide a unified and comprehensive review of the last year's progress in the use of deep generative models in robotics. We present the different types of models that the community has explored, such as energy-based models, diffusion models, action value maps, or generative adversarial networks. We also present the different types of applications in which deep generative models have been used, from grasp generation to trajectory generation or cost learning. One of the most important properties of generative models is out-of-distribution generalization. In our survey, we review the different decisions the community has made to improve the generalization of the learned models. Finally, we highlight the research challenges and propose a number of future directions for learning deep generative models in robotics.


MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration

arXiv.org Artificial Intelligence

Despite recent advancements in text-to-image generation, most existing methods struggle to create images with multiple objects and complex spatial relationships in a 3D world. To tackle this limitation, we introduce a generic AI system, namely MUSES, for 3D-controllable image generation from user queries. Specifically, MUSES addresses this challenging task by developing a progressive workflow with three key components: (1) Layout Manager for 2D-to-3D layout lifting, (2) Model Engineer for 3D object acquisition and calibration, and (3) Image Artist for 3D-to-2D image rendering. By mimicking the collaboration of human professionals, this multi-modal agent pipeline facilitates the effective and automatic creation of images with 3D-controllable objects, through an explainable integration of top-down planning and bottom-up generation. Additionally, we find that existing benchmarks lack detailed descriptions of complex 3D spatial relationships of multiple objects. To fill this gap, we further construct a new benchmark, T2I-3DisBench (3D image scene), which describes diverse 3D image scenes with 50 detailed prompts. Extensive experiments show the state-of-the-art performance of MUSES on both T2I-CompBench and T2I-3DisBench, outperforming recent strong competitors such as DALL-E 3 and Stable Diffusion 3. These results demonstrate a significant step forward for MUSES in bridging natural language, 2D image generation, and the 3D world.


Generative AI in Industrial Machine Vision -- A Review

arXiv.org Artificial Intelligence

Machine vision enhances automation, quality control, and operational efficiency in industrial applications by enabling machines to interpret and act on visual data. While traditional computer vision algorithms and approaches remain widely utilized, machine learning has become pivotal in current research activities. In particular, generative AI demonstrates promising potential by improving pattern recognition capabilities through data augmentation, increasing image resolution, and identifying anomalies for quality control. However, the application of generative AI in machine vision is still in its early stages due to challenges in data diversity, computational requirements, and the necessity for robust validation methods. A comprehensive literature review is essential to understand the current state of generative AI in industrial machine vision, focusing on recent advancements, applications, and research trends. Thus, a literature review based on the PRISMA guidelines was conducted, analyzing over 1,200 papers on generative AI in industrial machine vision. Our findings reveal various patterns in current research, with the primary use of generative AI being data augmentation for machine vision tasks such as classification and object detection. Furthermore, we gather a collection of application challenges together with data requirements to enable a successful application of generative AI in industrial machine vision. This overview aims to provide researchers with insights into the different areas and applications within current research, highlighting significant advancements and identifying opportunities for future work.