Generative AI
Sociotechnical Approach to Enterprise Generative Artificial Intelligence (E-GenAI)
Jimenez, Leoncio, Venegas, Francisco
In this theoretical article, a sociotechnical approach is proposed to characterize Enterprise Generative Artificial Intelligence (E-GenAI). First, the article describes the business ecosystem, focusing on the relationships among Providers, Enterprise, and Customers through SCM, ERP, and CRM platforms to align: (1) Business Intelligence (BI), Fuzzy Logic (FL), and TRIZ (Theory of Inventive Problem Solving), through the OID model, and (2) Knowledge Management (KM) and Imperfect Knowledge Management (IKM), through the OIDK model. Second, the article explores the E-GenAI business ecosystem, which integrates GenAI-based platforms for SCM, ERP, and CRM with GenAI-based platforms for BI, FL, TRIZ, KM, and IKM, to align Large Language Models (LLMs) through the E-GenAI (OID) model. Finally, to understand the dynamics of LLMs, we utilize finite automata to model the relationships between Followers and Followees. This facilitates the construction of LLMs that can identify specific characteristics of users on a social media platform.
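As a rough illustration of how a finite automaton could model a Follower/Followee relationship, the sketch below tracks the relationship between two users, A and B. The states, event names, and transition table are all hypothetical; the abstract does not specify the automaton the authors use.

```python
# Hypothetical states and events; the article's actual automaton is not given
# in the abstract. In state "a_follows_b", A is the Follower and B the Followee.
TRANSITIONS = {
    ("none", "a_follows"): "a_follows_b",
    ("none", "b_follows"): "b_follows_a",
    ("a_follows_b", "b_follows"): "mutual",
    ("a_follows_b", "a_unfollows"): "none",
    ("b_follows_a", "a_follows"): "mutual",
    ("b_follows_a", "b_unfollows"): "none",
    ("mutual", "a_unfollows"): "b_follows_a",
    ("mutual", "b_unfollows"): "a_follows_b",
}


def run_automaton(events, start="none"):
    """Return the relationship state reached after a sequence of events."""
    state = start
    for event in events:
        # Events with no defined transition (for example, unfollowing
        # someone who is not followed) leave the state unchanged.
        state = TRANSITIONS.get((state, event), state)
    return state


if __name__ == "__main__":
    print(run_automaton(["a_follows", "b_follows", "a_unfollows"]))
```

A table-driven automaton like this keeps the relationship logic in data, so adding a state (for instance, a hypothetical "blocked" state) only requires new table entries.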
Enhancing Guardrails for Safe and Secure Healthcare AI
Generative AI holds immense promise in addressing global healthcare access challenges, with numerous innovative applications now ready for use across various healthcare domains. However, a significant barrier to the widespread adoption of these domain-specific AI solutions is the lack of robust safety mechanisms to manage hallucination and misinformation and to ensure truthfulness. Left unchecked, these risks can compromise patient safety and erode trust in healthcare AI systems. While general-purpose frameworks like Llama Guard are useful for filtering toxicity and harmful content, they do not fully address the stringent requirements for truthfulness and safety in healthcare contexts. This paper examines the unique safety and security challenges inherent to healthcare AI, particularly the risk of hallucinations, the spread of misinformation, and the need for factual accuracy in clinical settings. I propose enhancements to existing guardrail frameworks, such as Nvidia NeMo Guardrails, to better suit healthcare-specific needs. By strengthening these safeguards, I aim to ensure the secure, reliable, and accurate use of AI in healthcare, mitigating misinformation risks and improving patient safety.
Blox-Net: Generative Design-for-Robot-Assembly Using VLM Supervision, Physics Simulation, and a Robot with Reset
Goldberg, Andrew, Kondap, Kavish, Qiu, Tianshuang, Ma, Zehan, Fu, Letian, Kerr, Justin, Huang, Huang, Chen, Kaiyuan, Fang, Kuan, Goldberg, Ken
Generative AI systems have shown impressive capabilities in creating text, code, and images. Inspired by the rich history of research in industrial "Design for Assembly", we introduce a novel problem: Generative Design-for-Robot-Assembly (GDfRA). The task is to generate an assembly based on a natural language prompt (e.g., "giraffe") and an image of available physical components, such as 3D-printed blocks. The output is an assembly, a spatial arrangement of these components, and instructions for a robot to build this assembly. The output must 1) resemble the requested object and 2) be reliably assembled by a 6-DoF robot arm with a suction gripper. We then present Blox-Net, a GDfRA system that combines generative vision language models with well-established methods in computer vision, simulation, perturbation analysis, motion planning, and physical robot experimentation to solve a class of GDfRA problems with minimal human supervision. Blox-Net achieved a Top-1 accuracy of 63.5% in the "recognizability" of its designed assemblies (e.g., resembling a giraffe as judged by a VLM). These designs, after automated perturbation redesign, were reliably assembled by a robot, achieving near-perfect success across 10 consecutive assembly iterations with human intervention only during reset prior to assembly. Surprisingly, this entire design process from textual word ("giraffe") to reliable physical assembly is performed with zero human intervention.
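For readers unfamiliar with the metric, Top-1 accuracy can be computed as in the sketch below: a design counts as recognizable when the judge's first-ranked guess matches the requested object. The function name and the sample data are hypothetical, and the ranked guesses stand in for whatever the VLM judge returns; this is not the authors' evaluation code.

```python
def top1_accuracy(results):
    """Fraction of designs whose top-ranked guess equals the requested label.

    `results` is a list of (requested_label, ranked_guesses) pairs, where
    ranked_guesses is ordered from most to least likely.
    """
    if not results:
        raise ValueError("results must not be empty")
    hits = sum(
        1 for label, guesses in results if guesses and guesses[0] == label
    )
    return hits / len(results)


if __name__ == "__main__":
    # Hypothetical judge outputs for two designs.
    sample = [
        ("giraffe", ["giraffe", "horse"]),
        ("chair", ["table", "chair"]),
    ]
    print(top1_accuracy(sample))
```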
GeoBiked: A Dataset with Geometric Features and Automated Labeling Techniques to Enable Deep Generative Models in Engineering Design
Mueller, Phillip, Mueller, Sebastian, Mikelsons, Lars
We provide a dataset for enabling Deep Generative Models (DGMs) in engineering design and propose methods to automate data labeling by utilizing large-scale foundation models. GeoBiked is curated to contain 4,355 bicycle images, annotated with structural and technical features, and is used to investigate two automated labeling techniques: the utilization of consolidated latent features (Hyperfeatures) from image-generation models to detect geometric correspondences (e.g., the position of the wheel center) in structural images, and the generation of diverse text descriptions for structural images. GPT-4o, a vision-language model (VLM), is instructed to analyze images and produce diverse descriptions aligned with the system prompt. By representing technical images as Diffusion-Hyperfeatures, drawing geometric correspondences between them is possible. The detection accuracy of geometric points in unseen samples is improved by presenting multiple annotated source images. GPT-4o has sufficient capabilities to generate accurate descriptions of technical images. Grounding the generation only on images leads to diverse descriptions but causes hallucinations, while grounding it on categorical labels restricts the diversity. Using both as input balances creativity and accuracy. Successfully using Hyperfeatures for geometric correspondence suggests that this approach can be used for general point-detection and annotation tasks in technical images. Labeling such images with text descriptions using VLMs is possible, but dependent on the model's detection capabilities, careful prompt engineering, and the selection of input information. Applying foundation models in engineering design is largely unexplored. We aim to bridge this gap with a dataset to explore training, finetuning, and conditioning DGMs in this field and by suggesting approaches to bootstrap foundation models to process technical images.
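The idea of locating a geometric point in an unseen image from several annotated source images can be sketched as a nearest-neighbor search in feature space. The example below is a minimal illustration under stated assumptions: the short vectors stand in for per-pixel Hyperfeatures, and averaging cosine similarity across sources is an assumed aggregation rule, not necessarily the one used in the paper. The names `cosine` and `locate_point` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def locate_point(source_features, target_feature_map):
    """Find the target location that best matches an annotated point.

    source_features: one feature vector per annotated source image, each
        taken at the annotated point (e.g., the wheel center).
    target_feature_map: dict mapping (row, col) to the feature vector at
        that location in the unseen target image.
    Returns the (row, col) with the highest mean cosine similarity.
    """
    def mean_similarity(location):
        feature = target_feature_map[location]
        total = sum(cosine(src, feature) for src in source_features)
        return total / len(source_features)

    return max(target_feature_map, key=mean_similarity)


if __name__ == "__main__":
    # Toy 2-D features; real Hyperfeatures are much higher-dimensional.
    target = {(0, 0): [1.0, 0.0], (0, 1): [0.0, 1.0], (1, 0): [1.0, 1.0]}
    sources = [[0.9, 0.1], [0.8, 0.3]]
    print(locate_point(sources, target))
```

Averaging over several sources makes the match less sensitive to any single annotated image, which is one plausible reading of why multiple source images help.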
OpenAI released its advanced voice mode to more people. Here's how to get it.
The update also adds new voices. Shortly after the launch of GPT-4o, OpenAI was criticized for the similarity between the female voice in its demo videos, named Sky, and that of Scarlett Johansson, who played an AI love interest in the movie Her. OpenAI then removed the voice. Now it has launched five new voices, named Arbor, Maple, Sol, Spruce, and Vale, which will be available in both the standard and advanced voice modes. MIT Technology Review has not heard them yet, but OpenAI says they were made using professional voice actors from around the world.
OpenAI's X account was hacked to promote a crypto scam
OpenAI opened a newsroom account on X (formerly Twitter) earlier this month, and it has already been hacked. The new handle was taken over by a crypto scammer promoting a fake OpenAI token that was in reality a scam to steal Bitcoins. That follows similar hacks of three key OpenAI employee X accounts over the last 15 months, including the one belonging to CTO Mira Murati. The fraudster enticed potential victims by saying the OpenAI token is somehow "driven by artificial intelligence-based language models." It then threw in a jumble of crypto and AI buzzwords that were probably enough to lure in some gullible users.
What the US can learn from the role of AI in other elections
When the generative-AI boom first kicked off, one of the biggest concerns among pundits and experts was that hyperrealistic AI deepfakes could be used to influence elections. But new research from the Alan Turing Institute in the UK shows that those fears might have been overblown. AI-generated falsehoods and deepfakes seem to have had no effect on election results in the UK, France, and the European Parliament, as well as other elections around the world so far this year. Instead of using generative AI to interfere in elections, state actors such as Russia are relying on well-established techniques, such as social bots that flood comment sections, to sow division and create confusion, says Sam Stockwell, the researcher who conducted the study. But one of the most consequential elections of the year is still ahead of us.
Generative AI Hype Feels Inescapable. Tackle It Head On With Education
Arvind Narayanan, a computer science professor at Princeton University, is best known for calling out the hype surrounding artificial intelligence in his Substack, AI Snake Oil, written with PhD candidate Sayash Kapoor. The two authors recently released a book based on their popular newsletter about AI's shortcomings. But don't get it twisted: they aren't against using new technology. "It's easy to misconstrue our message as saying that all of AI is harmful or dubious," Narayanan says. He makes clear, during a conversation with WIRED, that his rebuke is not aimed at the software per se, but rather the culprits who continue to spread misleading claims about artificial intelligence.
Generative AI-driven forecasting of oil production
Gandhi, Yash, Zheng, Kexin, Jha, Birendra, Nomura, Ken-ichi, Nakano, Aiichiro, Vashishta, Priya, Kalia, Rajiv K.
Forecasting oil production from oilfields with multiple wells is an important problem in petroleum and geothermal energy extraction, as well as energy storage technologies. The accuracy of oil forecasts is a critical determinant of economic projections, hydrocarbon reserves estimation, construction of fluid processing facilities, and energy price fluctuations. Leveraging generative AI techniques, we model time series forecasting of oil and water production across four multi-well sites spanning four decades. Our goal is to effectively model uncertainties and make precise forecasts to inform decision-making processes at the field scale. We utilize an autoregressive model known as TimeGrad and a variant of a transformer architecture named Informer, tailored specifically for forecasting long-sequence time series data. Predictions from both TimeGrad and Informer closely align with the ground truth data. The overall performance of the Informer stands out, demonstrating greater efficiency compared to TimeGrad in forecasting oil production rates across all sites.
RAGProbe: An Automated Approach for Evaluating RAG Applications
Sivasothy, Shangeetha, Barnett, Scott, Kurniawan, Stefanus, Rasool, Zafaryab, Vasa, Rajesh
Retrieval Augmented Generation (RAG) is increasingly being used when building Generative AI applications. Evaluating these applications and RAG pipelines is mostly done manually, via a trial-and-error process. Automating evaluation of RAG pipelines requires overcoming challenges such as context misunderstanding, wrong format, incorrect specificity, and missing content. Prior works therefore focused on improving evaluation metrics as well as enhancing components within the pipeline using available question and answer datasets. However, they have not focused on 1) providing a schema for capturing different types of question-answer pairs or 2) creating a set of templates for generating question-answer pairs that can support automation of RAG pipeline evaluation. In this paper, we present a technique for generating variations in question-answer pairs to trigger failures in RAG pipelines. We validate 5 open-source RAG pipelines using 3 datasets. Our approach revealed the highest failure rates when prompts combine multiple questions: 91% for questions spanning multiple documents and 78% for questions from a single document, indicating a need for developers to prioritise handling these combined questions. A 60% failure rate was observed in the academic-domain dataset, and 53% and 62% failure rates were observed in the open-domain datasets. Our automated approach outperforms the existing state-of-the-art methods by increasing the failure rate by 51% on average per dataset. Our work presents an automated approach for continuously monitoring the health of RAG pipelines, which can be integrated into existing CI/CD pipelines, allowing for improved quality.
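The combined-question scenario and the failure-rate metric can be illustrated with a small sketch. It is not RAGProbe's implementation: `combine_questions`, `failure_rate`, and the substring-based pass criterion are assumptions made for illustration, and `pipeline` is any callable that maps a prompt string to a response string.

```python
def combine_questions(qa_pairs):
    """Join several question-answer pairs into one multi-question prompt.

    Returns (prompt, expected_answers).
    """
    prompt = " ".join(question for question, _ in qa_pairs)
    expected = [answer for _, answer in qa_pairs]
    return prompt, expected


def failure_rate(pipeline, test_cases):
    """Fraction of test cases the pipeline fails.

    A case fails when the response is missing at least one expected answer.
    Substring matching is an assumed, simplistic pass criterion.
    """
    if not test_cases:
        raise ValueError("test_cases must not be empty")
    failures = 0
    for prompt, expected in test_cases:
        response = pipeline(prompt).lower()
        if not all(answer.lower() in response for answer in expected):
            failures += 1
    return failures / len(test_cases)
```

A check like this could run as a step in a CI/CD job, failing the build when the measured failure rate exceeds a chosen threshold.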