Generative AI
Generating and Detecting Various Types of Fake Image and Audio Content: A Review of Modern Deep Learning Technologies and Tools
Dehghani, Arash, Saberi, Hossein
This paper reviews the state-of-the-art in deepfake generation and detection, focusing on modern deep learning technologies and tools based on the latest scientific advancements. The rise of deepfakes, leveraging techniques like Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Diffusion models and other generative models, presents significant threats to privacy, security, and democracy. This fake media can deceive individuals, discredit real people and organizations, facilitate blackmail, and even threaten the integrity of legal, political, and social systems. Therefore, finding appropriate solutions to counter the potential threats posed by this technology is essential. We explore various deepfake methods, including face swapping, voice conversion, reenactment and lip synchronization, highlighting their applications in both benign and malicious contexts. The review critically examines the ongoing "arms race" between deepfake generation and detection, analyzing the challenges in identifying manipulated contents. By examining current methods and highlighting future research directions, this paper contributes to a crucial understanding of this rapidly evolving field and the urgent need for robust detection strategies to counter the misuse of this powerful technology. While focusing primarily on audio, image, and video domains, this study allows the reader to easily grasp the latest advancements in deepfake generation and detection.
Practical Design and Benchmarking of Generative AI Applications for Surgical Billing and Coding
Rollman, John C., Rogers, Bruce, Zaribafzadeh, Hamed, Buckland, Daniel, Rogers, Ursula, Gagnon, Jennifer, Meireles, Ozanan, Jennings, Lindsay, Bennett, Jim, Nicholson, Jennifer, Lad, Nandan, Cendales, Linda, Seas, Andreas, Martinino, Alessandro, Hwang, E. Shelley, Kirk, Allan D.
Background: Healthcare has many manual processes that can benefit from automation and augmentation with Generative Artificial Intelligence (AI), the medical billing and coding process. However, current foundational Large Language Models (LLMs) perform poorly when tasked with generating accurate International Classification of Diseases, 10th edition, Clinical Modification (ICD-10-CM) and Current Procedural Terminology (CPT) codes. Additionally, there are many security and financial challenges in the application of generative AI to healthcare. We present a strategy for developing generative AI tools in healthcare, specifically for medical billing and coding, that balances accuracy, accessibility, and patient privacy. Methods: We fine tune the PHI-3 Mini and PHI-3 Medium LLMs using institutional data and compare the results against the PHI-3 base model, a PHI-3 RAG application, and GPT-4o. We use the post operative surgical report as input and the patients billing claim the associated ICD-10, CPT, and Modifier codes as the target result. Performance is measured by accuracy of code generation, proportion of invalid codes, and the fidelity of the billing claim format. Results: Both fine-tuned models performed better or as well as GPT-4o. The Phi-3 Medium fine-tuned model showed the best performance (ICD-10 Recall and Precision: 72%, 72%; CPT Recall and Precision: 77%, 79%; Modifier Recall and Precision: 63%, 64%). The Phi-3 Medium fine-tuned model only fabricated 1% of ICD-10 codes and 0.6% of CPT codes generated. Conclusions: Our study shows that a small model that is fine-tuned on domain-specific data for specific tasks using a simple set of open-source tools and minimal technological and monetary requirements performs as well as the larger contemporary consumer models.
Synthetic Data Privacy Metrics
Steier, Amy, Ramaswamy, Lipika, Manoel, Andre, Haushalter, Alexa
Recent advancements in generative AI have made it possible to create synthetic datasets that can be as accurate as real-world data for training AI models, powering statistical insights, and fostering collaboration with sensitive datasets while offering strong privacy guarantees. Effectively measuring the empirical privacy of synthetic data is an important step in the process. However, while there is a multitude of new privacy metrics being published every day, there currently is no standardization. In this paper, we review the pros and cons of popular metrics that include simulations of adversarial attacks. We also review current best practices for amending generative models to enhance the privacy of the data they create (e.g.
Cosmos World Foundation Model Platform for Physical AI
NVIDIA, null, :, null, Agarwal, Niket, Ali, Arslan, Bala, Maciej, Balaji, Yogesh, Barker, Erik, Cai, Tiffany, Chattopadhyay, Prithvijit, Chen, Yongxin, Cui, Yin, Ding, Yifan, Dworakowski, Daniel, Fan, Jiaojiao, Fenzi, Michele, Ferroni, Francesco, Fidler, Sanja, Fox, Dieter, Ge, Songwei, Ge, Yunhao, Gu, Jinwei, Gururani, Siddharth, He, Ethan, Huang, Jiahui, Huffman, Jacob, Jannaty, Pooya, Jin, Jingyi, Kim, Seung Wook, Klár, Gergely, Lam, Grace, Lan, Shiyi, Leal-Taixe, Laura, Li, Anqi, Li, Zhaoshuo, Lin, Chen-Hsuan, Lin, Tsung-Yi, Ling, Huan, Liu, Ming-Yu, Liu, Xian, Luo, Alice, Ma, Qianli, Mao, Hanzi, Mo, Kaichun, Mousavian, Arsalan, Nah, Seungjun, Niverty, Sriharsha, Page, David, Paschalidou, Despoina, Patel, Zeeshan, Pavao, Lindsey, Ramezanali, Morteza, Reda, Fitsum, Ren, Xiaowei, Sabavat, Vasanth Rao Naik, Schmerling, Ed, Shi, Stella, Stefaniak, Bartosz, Tang, Shitao, Tchapmi, Lyne, Tredak, Przemek, Tseng, Wei-Cheng, Varghese, Jibin, Wang, Hao, Wang, Haoxiang, Wang, Heng, Wang, Ting-Chun, Wei, Fangyin, Wei, Xinyue, Wu, Jay Zhangjie, Xu, Jiashu, Yang, Wei, Yen-Chen, Lin, Zeng, Xiaohui, Zeng, Yu, Zhang, Jing, Zhang, Qinsheng, Zhang, Yuxuan, Zhao, Qingqing, Zolkowski, Artur
Physical AI needs to be trained digitally first. It needs a digital twin of itself, the policy model, and a digital twin of the world, the world model. In this paper, we present the Cosmos World Foundation Model Platform to help developers build customized world models for their Physical AI setups. We position a world foundation model as a general-purpose world model that can be fine-tuned into customized world models for downstream applications. Our platform covers a video curation pipeline, pre-trained world foundation models, examples of post-training of pre-trained world foundation models, and video tokenizers. To help Physical AI builders solve the most critical problems of our society, we make our platform open-source and our models open-weight with permissive licenses available via https://github.com/NVIDIA/Cosmos.
AGGA: A Dataset of Academic Guidelines for Generative AI and Large Language Models
Jiao, Junfeng, Afroogh, Saleh, Chen, Kevin, Atkinson, David, Dhurandhar, Amit
This study introduces AGGA, a dataset comprising 80 academic guidelines for the use of Generative AIs (GAIs) and Large Language Models (LLMs) in academic settings, meticulously collected from official university websites. The dataset contains 188,674 words and serves as a valuable resource for natural language processing tasks commonly applied in requirements engineering, such as model synthesis, abstraction identification, and document structure assessment. Additionally, AGGA can be further annotated to function as a benchmark for various tasks, including ambiguity detection, requirements categorization, and the identification of equivalent requirements. Our methodologically rigorous approach ensured a thorough examination, with a selection of universities that represent a diverse range of global institutions, including top-ranked universities across six continents.
'Virtual employees' could join workforce as soon as this year, OpenAI boss says
Virtual employees could join workforces this year and transform how companies work, according to the chief executive of OpenAI. The first artificial intelligence agents may start working for organisations this year, wrote Sam Altman, as AI firms push for uses that generate returns on substantial investment in the technology. Microsoft, the biggest backer of the company behind ChatGPT, has already announced the introduction of AI agents – tools that can carry out tasks autonomously – with the blue-chip consulting firm McKinsey among the early adopters. "We believe that, in 2025, we may see the first AI agents'join the workforce' and materially change the output of companies," wrote Altman in a blogpost published on Monday. OpenAI is reportedly planning to launch an AI agent codenamed "Operator" this month, after Microsoft announced its Copilot Studio product and rival Anthropic launched the Claude 3.5 Sonnet AI model, which can carry out tasks on the computer such as moving a mouse cursor and typing text.
AI means the end of internet search as we've known it
The biggest change to the way search engines have delivered information to us since the 1990s is happening right now. Which means instead of keywords, you use real questions, expressed in natural language. And instead of links, you'll increasingly be met with answers, written by generative AI and based on live information from all across the internet, delivered the same way. Of course, Google--the company that has defined search for the past 25 years--is trying to be out front on this. In May of 2023, it began testing AI-generated responses to search queries, using its large language model (LLM) to deliver the kinds of answers you might expect from an expert source or trusted friend.
Foundations of GenIR
Ai, Qingyao, Zhan, Jingtao, Liu, Yiqun
The chapter discusses the foundational impact of modern generative AI models on information access (IA) systems. In contrast to traditional AI, the large-scale training and superior data modeling of generative AI models enable them to produce high-quality, human-like responses, which brings brand new opportunities for the development of IA paradigms. In this chapter, we identify and introduce two of them in details, i.e., information generation and information synthesis. Information generation allows AI to create tailored content addressing user needs directly, enhancing user experience with immediate, relevant outputs. Information synthesis leverages the ability of generative AI to integrate and reorganize existing information, providing grounded responses and mitigating issues like model hallucination, which is particularly valuable in scenarios requiring precision and external knowledge. This chapter delves into the foundational aspects of generative models, including architecture, scaling, and training, and discusses their applications in multi-modal scenarios. Additionally, it examines the retrieval-augmented generation paradigm and other methods for corpus modeling and understanding, demonstrating how generative AI can enhance information access systems. It also summarizes potential challenges and fruitful directions for future studies.
Political Events using RAG with LLMs
Arslan, Muhammad, Munawar, Saba, Cruz, Christophe
In the contemporary digital landscape, media content stands as the foundation for political news analysis, offering invaluable insights sourced from various channels like news articles, social media updates, speeches, and reports. Natural Language Processing (NLP) has revolutionized Political Information Extraction (IE), automating tasks such as Event Extraction (EE) from these diverse media outlets. While traditional NLP methods often necessitate specialized expertise to build rule-based systems or train machine learning models with domain-specific datasets, the emergence of Large Language Models (LLMs) driven by Generative Artificial Intelligence (GenAI) presents a promising alternative. These models offer accessibility, alleviating challenges associated with model construction from scratch and reducing the dependency on extensive datasets during the training phase, thus facilitating rapid implementation. However, challenges persist in handling domain-specific tasks, leading to the development of the Retrieval-Augmented Generation (RAG) framework. RAG enhances LLMs by integrating external data retrieval, enriching their contextual understanding, and expanding their knowledge base beyond pre-existing training data. To illustrate RAG's efficacy, we introduce the Political EE system, specifically tailored to extract political event information from news articles. Understanding these political insights is essential for remaining informed about the latest political advancements, whether on a national or global scale.
Physics-Constrained Generative Artificial Intelligence for Rapid Takeoff Trajectory Design
To aid urban air mobility (UAM), electric vertical takeoff and landing (eVTOL) aircraft are being targeted. Conventional multidisciplinary analysis and optimization (MDAO) can be expensive, while surrogate-based optimization can struggle with challenging physical constraints. This work proposes physics-constrained generative adversarial networks (physicsGAN), to intelligently parameterize the takeoff control profiles of an eVTOL aircraft and to transform the original design space to a feasible space. Specifically, the transformed feasible space refers to a space where all designs directly satisfy all design constraints. The physicsGAN-enabled surrogate-based takeoff trajectory design framework was demonstrated on the Airbus A3 Vahana. The physicsGAN generated only feasible control profiles of power and wing angle in the feasible space with around 98.9% of designs satisfying all constraints. The proposed design framework obtained 99.6% accuracy compared with simulation-based optimal design and took only 2.2 seconds, which reduced the computational time by around 200 times. Meanwhile, data-driven GAN-enabled surrogate-based optimization took 21.9 seconds using a derivative-free optimizer, which was around an order of magnitude slower than the proposed framework. Moreover, the data-driven GAN-based optimization using gradient-based optimizers could not consistently find the optimal design during random trials and got stuck in an infeasible region, which is problematic in real practice. Therefore, the proposed physicsGAN-based design framework outperformed data-driven GAN-based design to the extent of efficiency (2.2 seconds), optimality (99.6% accurate), and feasibility (100% feasible). According to the literature review, this is the first physics-constrained generative artificial intelligence enabled by surrogate models.