Generative AI
A roadmap for generative mapping: unlocking the power of generative AI for map-making
Wu, Sidi, Henggeler, Katharina, Chen, Yizi, Hurni, Lorenz
Maps are broadly relevant across various fields, serving as valuable tools for presenting spatial phenomena and communicating spatial knowledge. However, map-making is still largely confined to those with expertise in GIS and cartography due to the specialized software and complex workflow involved, from data processing to visualization. While generative AI has recently demonstrated its remarkable capability in creating various types of content and its wide accessibility to the general public, its potential in generating maps is yet to be fully realized. This paper highlights the key applications of generative AI in map-making, summarizes recent advancements in generative AI, identifies the specific technologies required and the challenges of using current methods, and provides a roadmap for developing a generative mapping system (GMS) to make map-making more accessible.
Developing Retrieval Augmented Generation (RAG) based LLM Systems from PDFs: An Experience Report
Khan, Ayman Asad, Hasan, Md Toufique, Kemell, Kai Kristian, Rasku, Jussi, Abrahamsson, Pekka
This paper presents an experience report on the development of Retrieval Augmented Generation (RAG) systems using PDF documents as the primary data source. The RAG architecture combines the generative capabilities of Large Language Models (LLMs) with the precision of information retrieval. This approach has the potential to redefine how we interact with and augment both structured and unstructured knowledge in generative models to enhance the transparency, accuracy, and contextuality of responses. The paper details the end-to-end pipeline, from data collection and preprocessing to retrieval indexing and response generation, highlighting technical challenges and practical solutions. We aim to offer insights to researchers and practitioners developing similar systems using two distinct approaches: OpenAI's Assistants API with GPT-series models and open-source Llama models. The practical implications of this research lie in enhancing the reliability of generative AI systems in sectors where domain-specific knowledge and real-time information retrieval are important. The Python code used in this work is also available at: https://github.com/GPT-Laboratory/RAG-LLM-Development-Guidebook-from-PDFs.
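The pipeline stages named in this abstract (preprocessing, retrieval indexing, and response generation) can be illustrated with a minimal, dependency-free sketch. It assumes the PDF text has already been extracted, and it swaps in a simple TF-IDF bag-of-words index for the embedding-based retrieval that the Assistants API or a Llama-based stack would normally provide. The names `chunk_text`, `TfidfIndex`, and `build_prompt` are hypothetical and are not taken from the linked repository; the returned prompt would then be passed to whichever LLM generates the answer.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric runs as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_text(text, chunk_size=50, overlap=10):
    """Split extracted PDF text into overlapping windows of words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words) - overlap, 1), step)]


class TfidfIndex:
    """In-memory retrieval index: TF-IDF vectors compared by cosine similarity."""

    def __init__(self, chunks):
        self.chunks = list(chunks)
        counts = [Counter(tokenize(c)) for c in self.chunks]
        n = len(counts)
        df = Counter(term for doc in counts for term in doc)
        # Smoothed inverse document frequency: rarer terms weigh more.
        self.idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
        self.vectors = [self._weigh(doc) for doc in counts]

    def _weigh(self, term_counts):
        # Terms never seen during indexing get weight 0; vectors are L2-normalized.
        vec = {t: c * self.idf.get(t, 0.0) for t, c in term_counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {t: v / norm for t, v in vec.items()}

    def search(self, query, k=3):
        """Return the k chunks with the highest cosine similarity to the query."""
        q = self._weigh(Counter(tokenize(query)))
        scores = [sum(w * vec.get(t, 0.0) for t, w in q.items()) for vec in self.vectors]
        ranked = sorted(range(len(self.chunks)), key=lambda i: scores[i], reverse=True)
        return [self.chunks[i] for i in ranked[:k]]


def build_prompt(question, contexts):
    """Augment the user question with retrieved, numbered context passages."""
    context_block = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(contexts))
    return ("Answer using only the context below and cite passage numbers.\n\n"
            f"Context:\n{context_block}\n\nQuestion: {question}\nAnswer:")


# Usage with a few stand-in "pages" of extracted PDF text.
pages = [
    "The warranty covers battery defects for 24 months from purchase.",
    "Returns are accepted within 30 days if the device is unused.",
    "The charger supports 65 W fast charging over USB-C.",
]
index = TfidfIndex(chunk for page in pages for chunk in chunk_text(page))
question = "How long is the battery warranty?"
print(build_prompt(question, index.search(question, k=1)))
```

Keeping retrieval separate from generation means the index can be rebuilt when documents change without touching the model, which is the property that lets a RAG system answer from domain-specific, up-to-date sources.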
Diffusion Transformer Policy
Hou, Zhi, Zhang, Tianyi, Xiong, Yuwen, Pu, Hengjun, Zhao, Chengyang, Tong, Ronglei, Qiao, Yu, Dai, Jifeng, Chen, Yuntao
Recent large vision-language-action models pretrained on diverse robot datasets have demonstrated the potential to generalize to new environments with a small amount of in-domain data. However, these approaches usually predict discretized or continuous actions with a small action head, which limits their ability to handle diverse action spaces. In contrast, we model continuous actions with a large multi-modal diffusion transformer, dubbed Diffusion Transformer Policy, which directly denoises action chunks with a large transformer model rather than a small action head. By leveraging the scaling capability of transformers, the proposed approach can effectively model continuous end-effector actions across large, diverse robot datasets and achieve better generalization performance. Extensive experiments demonstrate that Diffusion Transformer Policy pretrained on diverse robot data can generalize to different embodiments, including simulation environments such as Maniskill2 and Calvin, as well as a real-world Franka arm. Specifically, without bells and whistles, the proposed approach achieves state-of-the-art performance with only a single third-view camera stream in the Calvin novel task setting (ABC→D), raising the average number of tasks completed in a row (out of 5) to 3.6, and the pretraining stage increases the successful sequence length on Calvin by over 1.2. The code will be publicly available.
The traditional robot learning paradigm usually relies on large-scale data collected for a specific robot and task, but collecting robot data for generalist tasks is time-consuming and expensive due to the limitations of robot hardware in the real world. Today, foundation models in Natural Language Processing and Computer Vision (OpenAI, 2022; 2023; 2021; Rombach et al., 2021), pretrained on broad, diverse, task-agnostic datasets, have demonstrated a powerful ability to solve downstream tasks either zero-shot or with a few task-specific samples. In principle, a general robot policy exposed to large-scale, diverse robot datasets could likewise improve generalization and performance on downstream tasks (Brohan et al., 2022; 2023).
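The core idea in this abstract, denoising an entire action chunk with one large model instead of regressing actions with a small head, can be sketched as standard DDPM ancestral sampling. This is an illustrative assumption: the abstract does not state the noise schedule, step count, or sampler, and `predict_noise` is a hypothetical placeholder for the multi-modal transformer, which in the actual policy would also be conditioned on camera observations and the language instruction. The chunk shape (`horizon` x `action_dim`) and the linear beta schedule are likewise chosen only for illustration.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (a common DDPM default)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def sample_action_chunk(predict_noise, horizon, action_dim, num_steps=50, seed=0):
    """DDPM ancestral sampling of a (horizon x action_dim) action chunk.

    predict_noise(x_t, t) is a hypothetical stand-in for the conditioned
    transformer; it must return a noise estimate shaped like x_t.
    """
    rng = random.Random(seed)
    betas = linear_betas(num_steps)
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bars.append(prod)  # cumulative product of alpha_t = 1 - beta_t
    # Start from pure Gaussian noise over the whole chunk.
    x = [[rng.gauss(0.0, 1.0) for _ in range(action_dim)] for _ in range(horizon)]
    for t in reversed(range(num_steps)):
        eps = predict_noise(x, t)
        alpha_t = 1.0 - betas[t]
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # Posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        mean = [[(xv - coef * ev) / math.sqrt(alpha_t) for xv, ev in zip(xr, er)]
                for xr, er in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # the sigma_t^2 = beta_t variance choice
            x = [[m + sigma * rng.gauss(0.0, 1.0) for m in row] for row in mean]
        else:
            x = mean  # no noise is added at the final step
    return x


# Usage with a dummy predictor that always returns zero noise.
def zero_noise(x_t, t):
    return [[0.0] * len(row) for row in x_t]


chunk = sample_action_chunk(zero_noise, horizon=4, action_dim=7, num_steps=10)
```

Because the sampler operates on the full chunk, every timestep of the action sequence is refined jointly at each denoising step; replacing the dummy predictor with a trained, conditioned network is what would make the sampled actions meaningful.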
Generative AI Agents in Autonomous Machines: A Safety Perspective
Jabbour, Jason, Reddi, Vijay Janapa
The integration of Generative Artificial Intelligence (AI) into autonomous machines represents a major paradigm shift in how these systems operate and unlocks new solutions to problems once deemed intractable. Although generative AI agents provide unparalleled capabilities, they also raise unique safety concerns. These challenges require robust safeguards, especially for autonomous machines that operate in high-stakes environments. This work investigates the evolving safety requirements when generative models are integrated as agents into physical autonomous machines, comparing these to safety considerations in less critical AI applications. We explore the challenges and opportunities in ensuring the safe deployment of generative AI-driven autonomous machines. Furthermore, we provide a forward-looking perspective on the future of AI-driven autonomous systems and emphasize the importance of evaluating and communicating safety risks. As an important step towards addressing these concerns, we recommend the development and implementation of comprehensive safety scorecards for the use of generative AI technologies in autonomous machines.
Hey GPT, Can You be More Racist? Analysis from Crowdsourced Attempts to Elicit Biased Content from Generative AI
Guo, Hangzhi, Venkit, Pranav Narayanan, Jang, Eunchae, Srinath, Mukund, Zhang, Wenbo, Mingole, Bonam, Gupta, Vipul, Varshney, Kush R., Sundar, S. Shyam, Yadav, Amulya
The widespread adoption of large language models (LLMs) and generative AI (GenAI) tools across diverse applications has amplified the importance of addressing societal biases inherent within these technologies. While the NLP community has extensively studied LLM bias, research investigating how non-expert users perceive and interact with biases from these systems remains limited. As these technologies become increasingly prevalent, answering this question is crucial to inform model developers in their efforts to mitigate bias. To address this gap, this work presents the findings from a university-level competition, which challenged participants to design prompts for eliciting biased outputs from GenAI tools. We quantitatively and qualitatively analyze the competition submissions and identify a diverse set of biases in GenAI and the strategies participants employed to induce them. Our findings provide unique insights into how non-expert users perceive and interact with biases from GenAI tools.
Secret Use of Large Language Model (LLM)
Zhang, Zhiping, Shen, Chenxinran, Yao, Bingsheng, Wang, Dakuo, Li, Tianshi
Advances in Large Language Models (LLMs) have decentralized the responsibility for the transparency of AI usage. Specifically, LLM users are now encouraged or required to disclose the use of LLM-generated content for varied types of real-world tasks. However, an emerging phenomenon, users' secret use of LLMs, raises challenges for ensuring that end users adhere to the transparency requirement. Our study used a mixed-methods approach, combining an exploratory survey (125 reported real-world secret use cases) with a controlled experiment among 300 users, to investigate the contexts and causes behind the secret use of LLMs. We found that such secretive behavior is often triggered by certain tasks, transcending demographic and personality differences among users. Task types were found to affect users' intentions to engage in secretive behavior, primarily by influencing perceived external judgment regarding LLM usage. Our results yield important insights for future work on designing interventions to encourage more transparent disclosure of the use of LLMs or other AI technologies.
Economic Anthropology in the Era of Generative Artificial Intelligence
Sheldon, Zachary, Kumar, Peeyush
To model Callon's position in the form of an LLM, one need only train the model on the already available textual corpus of post-industrial Western capitalism, given that any performed instance of the discourse token "economics" or "the economy" will form the statistically average center of attention for an associative network of other, token-level terms. However, although token-level linguistic performatives of the kind described by Callon have played a historically outsized role in the economies of capitalist states, the power of the performative token does not exhaust the concept of economics as a field of human "social creativity", understood as the linguistically/symbolically mediated, historically/mythologically self-conscious agency of intelligent beings conceptualizing and transforming their own conditions of existence (Graeber 2012). Marcel Mauss, on the other hand, acknowledged the formal autonomy of generative exchange as an existentially human practice that took up various "forms and reasons" across different cases, opening the possibility for theorizing type-level conceptual distinctions based on their functional parallelism across diverse societies, and, in Mauss's own radical argument, even identifying deficiencies in the dominant form of exchange from the perspective of non-dominant forms. Insofar as reflective attention to ethnographic type-tokens like kula, potlatch, or mana enhances human economic anthropologists' capacity to recognize patterns of value-creation and transformation within any new set of ethnographic data, a Maussian methodology can meaningfully inform the mechanics of machine learning and provide a touchstone for the integration of anthropological knowledge with AI research.
In a future publication, Sheldon will elaborate on this contrast between the "flat ontology" of Actor Network Theory and the "depth ontology" that continues to be generatively employed by logicians, mathematicians, and computer scientists (as well as mystics, magicians, and illusionists), both ancient and modern.
Writing backwards can trick an AI into providing a bomb recipe
State-of-the-art generative AI models like ChatGPT can be tricked into giving instructions on how to make a bomb simply by writing the request in reverse, researchers warn. Large language models (LLMs) like ChatGPT are trained on vast swathes of data from the internet and can create a range of outputs, some of which their makers would prefer didn't spill out again. Unshackled, they are just as able to provide a decent cake recipe as to explain how to make explosives from household chemicals.
ChatGPT's desktop app finally comes to Windows, with features missing
Even though Microsoft is a heavy investor in OpenAI, the company behind ChatGPT chose to release the AI chatbot's desktop app on macOS first, back in May of this year. Now, the wait is over for Windows users. The ChatGPT desktop app is finally available on Windows, but with an important caveat: this is an "early version" for paid subscribers on ChatGPT's Plus, Team, Enterprise, or Edu plans. Once you install the app, all you need to do is press the Alt + Space keyboard shortcut to launch a new conversation with ChatGPT. The desktop app has access to OpenAI's latest AI models, and you can perform all the core tasks you'd expect from ChatGPT, including asking it questions, having it analyze images, and uploading files to it.
Silicon Valley Takes Artificial General Intelligence Seriously--Washington Must Too
Artificial General Intelligence--machines that can learn and perform any cognitive task that a human can--has long been relegated to the realm of science fiction. But recent developments show that AGI is no longer a distant speculation; it's an impending reality that demands our immediate attention. On Sept. 17, during a Senate Judiciary Subcommittee hearing titled "Oversight of AI: Insiders' Perspectives," whistleblowers from leading AI companies sounded the alarm on the rapid advancement toward AGI and the glaring lack of oversight. Helen Toner, a former board member of OpenAI and director of strategy at Georgetown University's Center for Security and Emerging Technology, testified that, "The biggest disconnect that I see between AI insider perspectives and public perceptions of AI companies is when it comes to the idea of artificial general intelligence." She continued that leading AI companies such as OpenAI, Google, and Anthropic are "treating building AGI as an entirely serious goal."