 Generative AI


Stylish and Functional: Guided Interpolation Subject to Physical Constraints

arXiv.org Artificial Intelligence

Generative AI is revolutionizing engineering design practices by enabling rapid prototyping and manipulation of designs. One example of design manipulation involves taking two reference design images and using them as prompts to generate a design image that combines aspects of both. Real engineering designs have physical constraints and functional requirements in addition to aesthetic design considerations. Internet-scale foundation models commonly used for image generation, however, are unable to take these physical constraints and functional requirements into consideration as part of the generation process. We consider the problem of generating a design inspired by two input designs, and propose a zero-shot framework for enforcing physical and functional requirements over the generation process by leveraging a pretrained diffusion model as the backbone. As a case study, we consider the example of rotational symmetry in the generation of wheel designs. Automotive wheels are required to be rotationally symmetric for physical stability. We formulate the requirement of rotational symmetry through a symmetrizer, and we use this symmetrizer to guide the diffusion process towards symmetric wheel generations. Our experiments show that the proposed approach generates interpolations with higher realism than methods in related work, as measured by Fréchet inception distance (FID). We also find that our approach generates designs that more closely satisfy physical and functional requirements than generation without the symmetry guidance.
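
The abstract does not say how the symmetrizer is built, so the following is only a minimal sketch of one plausible reading: a symmetrizer that averages a square image with its rotated copies, and a guidance step that nudges an image toward its symmetrized version. The function names, the restriction to 4-fold symmetry (chosen so that plain NumPy rotations suffice), and the step size are all assumptions made for illustration, not the authors' method.

```python
import numpy as np

def symmetrize_c4(image):
    """Average a square image with its 90-degree rotations.

    Hypothetical symmetrizer: the result is unchanged by a 90-degree
    rotation, i.e. it has 4-fold rotational symmetry.
    """
    rotations = [np.rot90(image, k, axes=(0, 1)) for k in range(4)]
    return np.mean(rotations, axis=0)

def symmetry_loss(image):
    """Mean squared distance between an image and its symmetrized version."""
    return float(np.mean((image - symmetrize_c4(image)) ** 2))

def guided_update(image, step_size=0.5):
    """Move the image part of the way toward its symmetrized version.

    In a guided sampler, a step like this would be interleaved with the
    usual denoising updates (an assumption of this sketch).
    """
    return image + step_size * (symmetrize_c4(image) - image)

# Usage: the asymmetry shrinks after one guidance step.
rng = np.random.default_rng(0)
sample = rng.random((8, 8, 3))
before = symmetry_loss(sample)
after = symmetry_loss(guided_update(sample))
```

Because this symmetrizer is an averaging projection, each guidance step scales the remaining asymmetry by `1 - step_size`, which is what makes the step size easy to reason about in this toy setting.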


OpenAI brings ChatGPT to WhatsApp

Engadget

ChatGPT is now available on WhatsApp. Starting today, if you add 1 (800) CHAT-GPT to your contacts -- that's 1 (800) 242-8478 -- you can start using the chatbot over Meta's messaging app. In this iteration, ChatGPT is limited to text-only input, so there's no Advanced Voice Mode or visual input on offer, but you still get all the smarts of the o1-mini model. What's more, over WhatsApp, ChatGPT is available everywhere OpenAI offers its chatbot, with no account necessary. OpenAI is working on a way to authenticate existing users over WhatsApp, though the company did not share a timeline for when that feature might launch.


Fox News AI Newsletter: OpenAI responds to Elon Musk's lawsuit

FOX News

Raj Goyle, CEO of intelligence firm Bodhala and former Democratic Kansas state representative, told Fox News Digital it is encouraging to see members of both parties come together to try and determine the source of these drones. SpaceX and Tesla founder Elon Musk speaks during an America PAC town hall on October 26, 2024, in Lancaster, Pennsylvania. AI WARS: OpenAI is pushing back against Elon Musk's latest attempt to rework his lawsuit against the artificial intelligence giant that seeks to prevent the company from moving to a for-profit structure, noting in a blog post and legal filing that Musk had argued for it to do so years ago. AGE OF AI: OpenAI CEO Sam Altman is joining the list of U.S. tech titans donating to President-elect Trump's inaugural fund, a spokesperson exclusively told Fox News Digital. ARTIFICIAL INTELLIGENCE: The House task force on artificial intelligence is urging the U.S. government to aim for "a flexible sectoral regulatory framework" for the technology in a nearly 300-page report released Tuesday morning.


Generative AI and Climate Change Are on a Collision Course

WIRED

The summer of 2024 broke the record for Earth's hottest day since data collection began, sparking widespread media coverage and public debate. This also happens to be the year that both Microsoft and Google, two of the leading big tech companies investing heavily in AI research and development, missed their climate targets. While this also made headlines and spurred indignation, AI's environmental impacts are still far from being common knowledge. In reality, AI's current "bigger is better" paradigm--epitomized by tech companies' pursuit of ever bigger, more powerful large language models that are presented as the solution to every problem--comes with very significant costs to the environment. These range from generating colossal amounts of energy to power the data centers that run tools such as ChatGPT and Midjourney to the millions of gallons of freshwater that are pumped through these data centers to make sure they don't overheat and the tons of rare earth metals needed to build the hardware they contain.


Surrealistic-like Image Generation with Vision-Language Models

arXiv.org Artificial Intelligence

Recent advances in generative AI make it convenient to create different types of content, including text, images, and code. In this paper, we explore the generation of images in the style of paintings in the surrealism movement using vision-language generative models, including DALL-E, Deep Dream Generator, and DreamStudio. Our investigation starts with the generation of images under various image generation settings and different models. The primary objective is to identify the most suitable model and settings for producing such images. Additionally, we aim to understand the impact of using edited base images on the resulting images. Through these experiments, we evaluate the performance of selected models and gain valuable insights into their capabilities in generating such images. Our analysis shows that DALL-E 2 performs the best when using the prompt generated by ChatGPT.


Enhancing Diffusion Models for High-Quality Image Generation

arXiv.org Artificial Intelligence

This report presents the comprehensive implementation, evaluation, and optimization of Denoising Diffusion Probabilistic Models (DDPMs) and Denoising Diffusion Implicit Models (DDIMs), which are state-of-the-art generative models. During inference, these models take random noise as input and iteratively generate high-quality images as output. The study focuses on enhancing their generative capabilities by incorporating advanced techniques such as Classifier-Free Guidance (CFG), Latent Diffusion Models with Variational Autoencoders (VAE), and alternative noise scheduling strategies. The motivation behind this work is the growing demand for efficient and scalable generative AI models that can produce realistic images across diverse datasets, addressing challenges in applications such as art creation, image synthesis, and data augmentation. Evaluations were conducted on datasets including CIFAR-10 and ImageNet-100, with a focus on improving inference speed, computational efficiency, and image quality metrics like Fréchet Inception Distance (FID). Results demonstrate that DDIM + CFG achieves faster inference and superior image quality. Challenges with VAE and noise scheduling are also highlighted, suggesting opportunities for future optimization. This work lays the groundwork for developing scalable, efficient, and high-quality generative AI systems to benefit industries ranging from entertainment to robotics.
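
To make the "DDIM + CFG" combination concrete, here is a minimal NumPy sketch of the two standard pieces: the classifier-free guidance blend of unconditional and conditional noise predictions, and one deterministic DDIM update (the eta = 0 case). The function and parameter names are illustrative assumptions rather than the report's code, and the guidance convention used here is the one in which a scale of 1 recovers the purely conditional prediction.

```python
import numpy as np

def cfg_noise(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    noise prediction toward the conditional one."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

def ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0).

    alpha_bar_t and alpha_bar_prev are the cumulative noise-schedule
    products at the current and the next (less noisy) timestep.
    """
    # Estimate the clean image implied by the current noise prediction.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)
    # Re-noise that estimate to the level of the next timestep.
    return np.sqrt(alpha_bar_prev) * x0_pred + np.sqrt(1.0 - alpha_bar_prev) * eps

# Usage with stand-in noise predictions (a real sampler would call a trained model).
rng = np.random.default_rng(0)
x_t = rng.standard_normal((3, 32, 32))
eps_uncond = rng.standard_normal(x_t.shape)
eps_cond = rng.standard_normal(x_t.shape)
eps = cfg_noise(eps_uncond, eps_cond, guidance_scale=3.0)
x_prev = ddim_step(x_t, eps, alpha_bar_t=0.5, alpha_bar_prev=0.7)
```

Because the DDIM update is deterministic, a sampler can skip timesteps and use far fewer steps than the full DDPM chain, which is the usual source of the faster inference mentioned above.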


From Human Annotation to LLMs: SILICON Annotation Workflow for Management Research

arXiv.org Artificial Intelligence

Unstructured text data annotation and analysis are fundamental to management research, often relying on human annotators through crowdsourcing platforms. While Large Language Models (LLMs) promise to provide a cost-effective and efficient alternative to human annotation, there is no systematic workflow for evaluating when LLMs are suitable or how to proceed with LLM-based text annotation in a reproducible manner. This paper addresses this methodological gap by introducing the "SILICON" (Systematic Inference with LLMs for Information Classification and Notation) workflow. The workflow integrates established principles of human annotation with systematic prompt optimization and model selection, addressing challenges such as developing robust annotation guidelines, establishing high-quality human baselines, optimizing prompts, and ensuring reproducibility across LLMs. We validate the SILICON workflow through seven case studies covering common management research tasks, including business proposal evaluation, dialog intent and breakdown analysis, and review attribute detection. Our findings highlight the importance of validating annotation guideline agreement, the superiority of expert-developed human baselines over crowdsourced ones, the iterative nature of prompt optimization, and the necessity of testing multiple LLMs. Notably, we propose a regression-based methodology to empirically compare LLM outputs across prompts and models. Our workflow advances management research by establishing reproducible processes for LLM-based annotation that maintain scientific rigor. We provide practical guidance for researchers to navigate the evolving landscape of generative AI tools effectively while maintaining transparency and reproducibility.


Dialogue with the Machine and Dialogue with the Art World: Evaluating Generative AI for Culturally-Situated Creativity

arXiv.org Artificial Intelligence

This paper proposes dialogue as a method for evaluating generative AI tools for culturally-situated creative practice that recognizes the socially situated nature of art. Drawing on sociologist Howard Becker's concept of Art Worlds, this method expands the scope of traditional AI and creativity evaluations beyond benchmarks, user studies with crowd-workers, or focus groups conducted with artists. Our method involves two mutually informed dialogues: 1) 'dialogues with art worlds,' placing artists in conversation with experts such as art historians, curators, and archivists, and 2) 'dialogues with the machine,' facilitated through structured artist- and critic-led experimentation with state-of-the-art generative AI tools. We demonstrate the value of this method through a case study with artists and experts steeped in non-western art worlds, specifically the Persian Gulf. We trace how these dialogues help create culturally rich and situated forms of evaluation for representational possibilities of generative AI that mimic the reception of generative artwork in the broader art ecosystem. Putting artists in conversation with commentators also allows artists to shift their use of the tools to respond to their cultural and creative context. Our study can provide generative AI researchers with an understanding of the complex dynamics of technology, human creativity and the socio-politics of art worlds, to build more inclusive machines for diverse art worlds.


Methods to Assess the UK Government's Current Role as a Data Provider for AI

arXiv.org Artificial Intelligence

Governments typically collect and steward a vast amount of high-quality data on their citizens and institutions, and the UK government is exploring how it can better publish and provision this data to the benefit of the AI landscape. However, the compositions of generative AI training corpora remain closely guarded secrets, making the planning of data sharing initiatives difficult. To address this, we devise two methods to assess UK government data usage for the training of Large Language Models (LLMs) and 'peek behind the curtain' in order to observe the UK government's current contributions as a data provider for AI. The first method, an ablation study that utilises LLM 'unlearning', seeks to examine the importance of the information held on UK government websites for LLMs and their performance in citizen query tasks. The second method, an information leakage study, seeks to ascertain whether LLMs are aware of the information held in the datasets published on the UK government's open data initiative data.gov.uk. Our findings indicate that UK government websites are important data sources for AI (heterogeneously across subject matters) while data.gov.uk is not. This paper serves as a technical report, explaining in-depth the designs, mechanics, and limitations of the above experiments. It is accompanied by a complementary non-technical report on the ODI website in which we summarise the experiments and key findings, interpret them, and build a set of actionable recommendations for the UK government to take forward as it seeks to design AI policy. While we focus on UK open government data, we believe that the methods introduced in this paper present a reproducible approach to tackle the opaqueness of AI training corpora and provide organisations with a framework to evaluate and maximize their contributions to AI development.


Generative AI Toolkit -- a framework for increasing the quality of LLM-based applications over their whole life cycle

arXiv.org Artificial Intelligence

Since their introduction, LLMs have gained widespread traction in different domains. They can be used as stand-alone products, but also to augment existing software products such as applications (also called agentic functions) or machine learning agents (also called LLM-based agents) to increase their capabilities. In this section, we illustrate challenges during development and operation of LLM-based applications using three examples. Users interact with LLM-based applications by entering input into the LLM, the so-called prompt. Jang et al. showed in 2023 that the LLM's output is very sensitive to variations of the prompt [1]. Thus, the task of finding the prompt that generates the expected or best output leads to manual, trial-and-error prompt experimentation, a method well known as prompt engineering (cf. White et al. in 2023 for ChatGPT [2] or a survey of prompt techniques by Schulhoff et al. in 2024 [3]). Additionally, the outputs of an LLM-based application can not only vary, but also be wrong without telling a user ("hallucination", cf.