Generative AI
CAD-Prompted Generative Models: A Pathway to Feasible and Novel Engineering Designs
Chong, Leah, Rayan, Jude, Dow, Steven, Lykourentzou, Ioanna, Ahmed, Faez
Text-to-image generative models have increasingly been used to assist designers during concept generation in various creative domains, such as graphic design, user interface design, and fashion design. However, their applications in engineering design remain limited due to the models' challenges in generating images of feasible design concepts. To address this issue, this paper introduces a method that improves the design feasibility by prompting the generation with feasible CAD images. In this work, the usefulness of this method is investigated through a case study with a bike design task using an off-the-shelf text-to-image model, Stable Diffusion 2.1. A diverse set of bike designs is produced in seven different generation settings with varying CAD image prompting weights, and these designs are evaluated on their perceived feasibility and novelty. Results demonstrate that the CAD image prompting successfully helps text-to-image models like Stable Diffusion 2.1 create visibly more feasible design images. While a general tradeoff is observed between feasibility and novelty, when the prompting weight is kept low, around 0.35, the design feasibility is significantly improved while the novelty remains on par with that of designs generated by text prompts alone. The insights from this case study offer some guidelines for selecting the appropriate CAD image prompting weight for different stages of the engineering design process. When utilized effectively, our CAD image prompting method opens doors to a wider range of applications of text-to-image models in engineering design.
Multi-step Inference over Unstructured Data
Kalyanpur, Aditya, Saravanakumar, Kailash, Barres, Victor, McFate, CJ, Moon, Lori, Seifu, Nati, Eremeev, Maksim, Barrera, Jose, Brown, Eric, Ferrucci, David
The advent of Large Language Models (LLMs) and Generative AI has revolutionized natural language applications across various domains. However, high-stakes decision-making tasks in fields such as medical, legal and finance require a level of precision, comprehensiveness, and logical consistency that pure LLM or Retrieval-Augmented-Generation (RAG) approaches often fail to deliver. At Elemental Cognition (EC), we have developed a neuro-symbolic AI platform to tackle these problems. The platform integrates fine-tuned LLMs for knowledge extraction and alignment with a robust symbolic reasoning engine for logical inference, planning and interactive constraint solving. We describe Cora, a Collaborative Research Assistant built on this platform, that is designed to perform complex research and discovery tasks in high-stakes domains. This paper discusses the multi-step inference challenges inherent in such domains, critiques the limitations of existing LLM-based methods, and demonstrates how Cora's neuro-symbolic approach effectively addresses these issues. We provide an overview of the system architecture, key algorithms for knowledge extraction and formal reasoning, and present preliminary evaluation results that highlight Cora's superior performance compared to well-known LLM and RAG baselines.
A Text-to-Game Engine for UGC-Based Role-Playing Games
Zhang, Lei, Peng, Xuezheng, Yang, Shuyi, Wang, Feiyang
The shift from professionally generated content (PGC) to user-generated content (UGC) has revolutionized various media formats, from text to video. With the rapid advancements in generative AI, a similar shift is set to transform the game industry, particularly in the realm of role-playing games (RPGs). This paper introduces a new framework for a text-to-game engine that utilizes foundation models to convert simple textual inputs into complex, interactive RPG experiences. The engine dynamically renders the game story in a multi-modal format and adjusts the game character, environment, and mechanics in real-time in response to player actions. Using this framework, we developed the "Zagii" game engine, which has successfully supported hundreds of RPG games across a diverse range of genres and facilitated tens of thousands of online user gameplay instances. This validates the effectiveness of our framework. Our work showcases the potential for a more open and democratized gaming paradigm, highlighting the transformative impact of generative AI on the game life cycle.
Controlling the Fidelity and Diversity of Deep Generative Models via Pseudo Density
Li, Shuangqi, Liu, Chen, Zhang, Tong, Le, Hieu, Süsstrunk, Sabine, Salzmann, Mathieu
We introduce an approach to bias deep generative models, such as GANs and diffusion models, towards generating data with either enhanced fidelity or increased diversity. Our approach involves manipulating the distribution of training and generated data through a novel metric for individual samples, named pseudo density, which is based on the nearest-neighbor information from real samples. Our approach offers three distinct techniques to adjust the fidelity and diversity of deep generative models: 1) Per-sample perturbation, enabling precise adjustments for individual samples towards either more common or more unique characteristics; 2) Importance sampling during model inference to enhance either fidelity or diversity in the generated data; 3) Fine-tuning with importance sampling, which guides the generative model to learn an adjusted distribution, thus controlling fidelity and diversity. Furthermore, our fine-tuning method demonstrates the ability to improve the Frechet Inception Distance (FID) for pre-trained generative models with minimal iterations.
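The abstract does not give the formula for pseudo density, so the following is only a minimal sketch of the general idea, under stated assumptions: the score here is taken to be the inverse of the mean distance to the k nearest real samples, and the resampling weight is that score raised to a power `alpha`. The names `pseudo_density`, `importance_resample`, `k`, and `alpha` are hypothetical and are not taken from the paper.

```python
import math
import random


def pseudo_density(x, real_samples, k=3):
    """Hypothetical score: inverse of the mean distance from x to its
    k nearest real samples (an assumption, not the paper's definition)."""
    dists = sorted(math.dist(x, r) for r in real_samples)
    return 1.0 / (sum(dists[:k]) / k + 1e-12)


def importance_resample(candidates, real_samples, n, alpha=1.0, k=3, seed=0):
    """Draw n candidates with probability proportional to density**alpha.

    alpha > 0 favours candidates in dense regions (higher fidelity);
    alpha < 0 favours candidates in sparse regions (higher diversity).
    """
    rng = random.Random(seed)
    weights = [pseudo_density(c, real_samples, k) ** alpha for c in candidates]
    return rng.choices(candidates, weights=weights, k=n)


# Toy data: four real points clustered near the origin plus one outlier.
real = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
common, rare = (0.05, 0.05), (4.0, 4.0)
fidelity_draws = importance_resample([common, rare], real, n=50, alpha=5.0)
diversity_draws = importance_resample([common, rare], real, n=50, alpha=-5.0)
```

In this toy setting a positive `alpha` draws almost exclusively the candidate near the cluster, and a negative `alpha` draws almost exclusively the candidate in the sparse region, which mirrors the fidelity versus diversity control described in the abstract.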
Microsoft Quits OpenAI Board Seat Amid Antitrust Scrutiny of AI Partnerships
Microsoft has relinquished its seat on the board of OpenAI, saying its participation is no longer needed because the ChatGPT maker has improved its governance since being roiled by boardroom chaos last year. In a Tuesday letter, Microsoft confirmed it was resigning, "effective immediately," from its role as an observer on the artificial intelligence company's board. "We appreciate the support shown by OpenAI leadership and the OpenAI board as we made this decision," the letter said. The surprise departure comes amid intensifying scrutiny from antitrust regulators of the powerful AI partnership. Microsoft has reportedly invested $13 billion in OpenAI.
Housetraining robot dogs: How generative AI might change consumer IoT
Examining potential changes to consumer IoT could provide some answers. Specifically, the vast range of areas where the technology finds home and personal uses, from smart home controls through smart watches and other wearables to VR gaming--to name just a handful. The underlying technological changes sparking interest in this specific area mirror those in IoT as a whole. IoT is much more than a huge collection of "things," such as automated sensing devices and attached actuators to take limited actions. These devices, of course, play a key role.
Non-convergence of Adam and other adaptive stochastic gradient descent optimization methods for non-vanishing learning rates
Dereich, Steffen, Graeber, Robin, Jentzen, Arnulf
Deep learning algorithms - typically consisting of a class of deep neural networks trained by a stochastic gradient descent (SGD) optimization method - are nowadays the key ingredients in many artificial intelligence (AI) systems and have revolutionized our ways of working and living in modern societies. For example, SGD methods are used to train powerful large language models (LLMs) such as versions of ChatGPT and Gemini, SGD methods are employed to create successful generative AI based text-to-image creation models such as Midjourney, DALL-E, and Stable Diffusion, but SGD methods are also used to train DNNs to approximately solve scientific models such as partial differential equation (PDE) models from physics and biology and optimal control and stopping problems from engineering. It is known that the plain vanilla standard SGD method fails to converge even in the situation of several convex optimization problems if the learning rates are bounded away from zero. However, in many practically relevant training scenarios, often not the plain vanilla standard SGD method but instead adaptive SGD methods such as the RMSprop and the Adam optimizers, in which the learning rates are modified adaptively during the training process, are employed. This naturally raises the question whether such adaptive optimizers, in which the learning rates are modified adaptively during the training process, do converge in the situation of non-vanishing learning rates. In this work we answer this question negatively by proving that adaptive SGD methods such as the popular Adam optimizer fail to converge to any possible random limit point if the learning rates are asymptotically bounded away from zero. In our proof of this non-convergence result we establish suitable pathwise a priori bounds for a class of accelerated and adaptive SGD methods, which are also of independent interest.
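The non-convergence statement can be illustrated numerically, although a simulation is of course not a proof and is not the authors' argument. The sketch below runs the standard Adam update on a toy objective chosen here as an assumption, namely minimizing the expectation of 0.5*(theta - X)^2 with X standard normal. The hyperparameters, the step count, the helper names `adam_path` and `tail_spread`, and the decaying schedule used for contrast are all illustrative choices, not values from the paper.

```python
import math
import random


def adam_path(lr_schedule, steps=20000, beta1=0.9, beta2=0.999, eps=1e-8, seed=0):
    """Run Adam on the toy objective E[0.5 * (theta - X)^2], X ~ N(0, 1),
    and return the list of iterates."""
    rng = random.Random(seed)
    theta, m, v = 1.0, 0.0, 0.0
    path = []
    for t in range(1, steps + 1):
        x = rng.gauss(0.0, 1.0)
        g = theta - x                          # stochastic gradient sample
        m = beta1 * m + (1 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias corrections
        v_hat = v / (1 - beta2 ** t)
        theta -= lr_schedule(t) * m_hat / (math.sqrt(v_hat) + eps)
        path.append(theta)
    return path


def tail_spread(path, tail=2000):
    """Standard deviation of the last `tail` iterates."""
    values = path[-tail:]
    mean = sum(values) / len(values)
    return math.sqrt(sum((p - mean) ** 2 for p in values) / len(values))


# Learning rate bounded away from zero versus a vanishing schedule.
constant = tail_spread(adam_path(lambda t: 0.05))
decaying = tail_spread(adam_path(lambda t: 0.05 / math.sqrt(t)))
```

With the constant learning rate the late iterates keep fluctuating around the minimizer, so `constant` stays clearly above `decaying`; the vanishing schedule is included only as a point of comparison and is outside the regime the paper analyzes.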
Spatial-Temporal Generative AI for Traffic Flow Estimation with Sparse Data of Connected Vehicles
Xue, Jianzhe, Xu, Yunting, Yuan, Dongcheng, Zha, Caoyi, Du, Hongyang, Zhou, Haibo, Niyato, Dusit
Traffic flow estimation (TFE) is crucial for intelligent transportation systems. Traditional TFE methods rely on extensive road sensor networks and typically incur significant costs. Sparse mobile crowdsensing enables a cost-effective alternative by utilizing sparsely distributed probe vehicle data (PVD) provided by connected vehicles. However, as pointed out by the central limit theorem, the sparsification of PVD leads to the degradation of TFE accuracy. In response, this paper introduces a novel and cost-effective TFE framework that leverages sparse PVD and improves accuracy by applying the spatial-temporal generative artificial intelligence (GAI) framework. Within this framework, the conditional encoder mines spatial-temporal correlations in the initial TFE results derived from averaging vehicle speeds of each region, and the generative decoder generates high-quality and accurate TFE outputs. Additionally, the design of the spatial-temporal neural network is discussed, which is the backbone of the conditional encoder for effectively capturing spatial-temporal correlations. The effectiveness of the proposed TFE approach is demonstrated through evaluations based on real-world connected vehicle data. The experimental results affirm the feasibility of our sparse PVD-based TFE framework and highlight the significant role of the spatial-temporal GAI framework in enhancing the accuracy of TFE.
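The initial estimate mentioned above, averaging vehicle speeds per region, can be sketched as follows. This is a minimal illustration under assumed inputs: the record layout `(region_id, speed)` and the helper names `initial_speed_estimate` and `standard_error` are hypothetical, and the conditional encoder and generative decoder of the paper are not reproduced here. The standard-error helper shows the usual sigma / sqrt(n) behaviour behind the remark that sparser probe data degrades accuracy.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def initial_speed_estimate(probe_records):
    """Average the reported speeds per region.

    probe_records: iterable of (region_id, speed) pairs (assumed layout).
    Returns a dict mapping region_id to mean speed.
    """
    by_region = defaultdict(list)
    for region, speed in probe_records:
        by_region[region].append(speed)
    return {region: mean(speeds) for region, speeds in by_region.items()}


def standard_error(speeds):
    """Standard error of the mean; needs at least two samples and grows
    as the number of probe vehicles shrinks."""
    return stdev(speeds) / math.sqrt(len(speeds))


records = [("A", 30.0), ("A", 50.0), ("B", 20.0)]
estimates = initial_speed_estimate(records)
```

A region with only one or two probe vehicles still receives an estimate, but with a large standard error, which is the kind of noisy input the generative stage is meant to refine.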
FACTS About Building Retrieval Augmented Generation-based Chatbots
Akkiraju, Rama, Xu, Anbang, Bora, Deepak, Yu, Tan, An, Lu, Seth, Vishal, Shukla, Aaditya, Gundecha, Pritam, Mehta, Hridhay, Jha, Ashwin, Raj, Prithvi, Balasubramanian, Abhinav, Maram, Murali, Muthusamy, Guru, Annepally, Shivakesh Reddy, Knowles, Sidney, Du, Min, Burnett, Nick, Javiya, Sean, Marannan, Ashok, Kumari, Mamta, Jha, Surbhi, Dereszenski, Ethan, Chakraborty, Anupam, Ranjan, Subhash, Terfai, Amina, Surya, Anoop, Mercer, Tracey, Thanigachalam, Vinodh Kumar, Bar, Tamar, Krishnan, Sanjana, Kilaru, Samy, Jaksic, Jasmine, Algarici, Nave, Liberman, Jacob, Conway, Joey, Nayyar, Sonu, Boitano, Justin
Enterprise chatbots, powered by generative AI, are emerging as key applications to enhance employee productivity. Retrieval Augmented Generation (RAG), Large Language Models (LLMs), and orchestration frameworks like Langchain and Llamaindex are crucial for building these chatbots. However, creating effective enterprise chatbots is challenging and requires meticulous RAG pipeline engineering. This includes fine-tuning embeddings and LLMs, extracting documents from vector databases, rephrasing queries, reranking results, designing prompts, honoring document access controls, providing concise responses, including references, safeguarding personal information, and building orchestration agents. We present a framework for building RAG-based chatbots based on our experience with three NVIDIA chatbots: for IT/HR benefits, financial earnings, and general content. Our contributions are three-fold: introducing the FACTS framework (Freshness, Architectures, Cost, Testing, Security), presenting fifteen RAG pipeline control points, and providing empirical results on accuracy-latency tradeoffs between large and small LLMs. To the best of our knowledge, this is the first paper of its kind that provides a holistic view of the factors as well as solutions for building secure enterprise-grade chatbots.
RoBus: A Multimodal Dataset for Controllable Road Networks and Building Layouts Generation
Li, Tao, Li, Ruihang, Zheng, Huangnan, Ye, Shanding, Li, Shijian, Pan, Zhijie
Automated 3D city generation, focusing on road networks and building layouts, is in high demand for applications in urban design, multimedia games and autonomous driving simulations. The surge of generative AI facilitates designing city layouts based on deep learning models. However, the lack of high-quality datasets and benchmarks hinders the progress of these data-driven methods in generating road networks and building layouts. Furthermore, few studies consider urban characteristics, which generally take graphics as analysis objects and are crucial for practical applications, to control the generative process. To alleviate these problems, we introduce a multimodal dataset with accompanying evaluation metrics for controllable generation of Road networks and Building layouts (RoBus), which is the first and largest open-source dataset in city generation so far. The RoBus dataset is formatted as images, graphics and texts, with $72,400$ paired samples that cover around $80,000\,\mathrm{km}^2$ globally. We analyze the RoBus dataset statistically and validate its effectiveness against existing road network and building layout generation methods. Additionally, we design new baselines that incorporate urban characteristics, such as road orientation and building density, in the process of generating road networks and building layouts using the RoBus dataset, enhancing the practicality of automated urban design. The RoBus dataset and related codes are published at https://github.com/tourlics/RoBus_Dataset.