Generative AI
On the Promises and Challenges of Multimodal Foundation Models for Geographical, Environmental, Agricultural, and Urban Planning Applications
Tan, Chenjiao, Cao, Qian, Li, Yiwei, Zhang, Jielu, Yang, Xiao, Zhao, Huaqin, Wu, Zihao, Liu, Zhengliang, Yang, Hao, Wu, Nemin, Tang, Tao, Ye, Xinyue, Chai, Lilong, Liu, Ninghao, Li, Changying, Mu, Lan, Liu, Tianming, Mai, Gengchen
The advent of large language models (LLMs) has heightened interest in their potential for multimodal applications that integrate language and vision. This paper explores the capabilities of GPT-4V in the realms of geography, environmental science, agriculture, and urban planning by evaluating its performance across a variety of tasks. Data sources comprise satellite imagery, aerial photos, ground-level images, field images, and public datasets. The model is evaluated on a series of tasks including geo-localization, textual data extraction from maps, remote sensing image classification, visual question answering, crop type identification, disease/pest/weed recognition, chicken behavior analysis, agricultural object counting, urban planning knowledge question answering, and plan generation. The results indicate the potential of GPT-4V in geo-localization, land cover classification, visual question answering, and basic image understanding. However, there are limitations in several tasks requiring fine-grained recognition and precise counting. While zero-shot learning shows promise, performance varies across problem domains and image complexities. The work provides novel insights into GPT-4V's capabilities and limitations for real-world geospatial, environmental, agricultural, and urban planning challenges. Further research should focus on augmenting the model's knowledge and reasoning for specialized domains through expanded training. Overall, the analysis demonstrates foundational multimodal intelligence, highlighting the potential of multimodal foundation models (FMs) to advance interdisciplinary applications at the nexus of computer vision and language.
Large Generative AI Models for Telecom: The Next Big Thing?
Bariah, Lina, Zhao, Qiyang, Zou, Hang, Tian, Yu, Bader, Faouzi, Debbah, Merouane
The evolution of generative artificial intelligence (GenAI) constitutes a turning point in reshaping the future of technology in different aspects. Wireless networks in particular, with the blooming of self-evolving networks, represent a rich field for exploiting GenAI and reaping several benefits that can fundamentally change the way how wireless networks are designed and operated nowadays. To be specific, large GenAI models are envisioned to open up a new era of autonomous wireless networks, in which multi-modal GenAI models trained over various Telecom data, can be fine-tuned to perform several downstream tasks, eliminating the need for building and training dedicated AI models for each specific task and paving the way for the realization of artificial general intelligence (AGI)-empowered wireless networks. In this article, we aim to unfold the opportunities that can be reaped from integrating large GenAI models into the Telecom domain. In particular, we first highlight the applications of large GenAI models in future wireless networks, defining potential use-cases and revealing insights on the associated theoretical and practical challenges. Furthermore, we unveil how 6G can open up new opportunities through connecting multiple on-device large GenAI models, and hence, paves the way to the collective intelligence paradigm. Finally, we put a forward-looking vision on how large GenAI models will be the key to realize self-evolving networks.
Building AI Safely Is Getting Harder and Harder
This is Atlantic Intelligence, an eight-week series in which The Atlantic's leading thinkers on AI will help you understand the complexity and opportunities of this groundbreaking technology. The bedrock of the AI revolution is the internet, or more specifically, the ever-expanding bounty of data that the web makes available to train algorithms. ChatGPT, Midjourney, and other generative-AI models "learn" by detecting patterns in massive amounts of text, images, and videos scraped from the internet. The process entails hoovering up huge quantities of books, art, memes, and, inevitably, the troves of racist, sexist, and illicit material distributed across the web. Earlier this week, Stanford researchers found a particularly alarming example of that toxicity: The largest publicly available image data set used to train AIs, LAION-5B, reportedly contains more than 1,000 images depicting the sexual abuse of children, out of more than 5 billion in total.
The Big Questions About AI in 2024
Let us be thankful for the AI industry. Its leaders may be nudging humans closer to extinction, but this year, they provided us with a gloriously messy spectacle of progress. When I say "year," I mean the long year that began late last November, when OpenAI released ChatGPT and, in doing so, launched generative AI into the cultural mainstream. In the months that followed, politicians, teachers, Hollywood screenwriters, and just about everyone else tried to understand what this means for their future. Cash fire-hosed into AI companies, and their executives, now glowed up into international celebrities, fell into Succession-style infighting.
NVIDIA at the Center of the Generative AI Ecosystem โ For Now
Generative AI has attracted worldwide attention as a foundational technology with almost unlimited applications (see my previous column, "Generative AI as a New Innovation Platform," Communications, October 2023). Goldman Sachs estimates applications of this technology could raise global GNP by $7 trillion (7%) during the next decade.9 At the center of the new ecosystem is NVIDIA, whose high-end graphical processing units (GPUs) account for approximately 80% of the market for GPUs that power generative AI software.5,20 Established in 1993, NVIDIA's founders, led by Jen-Hsun (Jensen) Huang, initially saw a need for powerful specialized chips that could take over graphics processing from PC or workstation central processing units (CPUs), a market dominated by Intel. The company went public in 1999 and exceeded $1 billion in revenue in 2002.
Why Are Lawyers Afraid of AI?
Andrew Perlman, Dean of the Suffolk University School of Law in Boston, is no stranger to examining innovative legal technology, but his recent experiment with generative artificial intelligence (AI)--Open AI's ChatGPT, to be precise--led him to think the technology may create bigger changes to the way law is practiced than the Internet itself. Perlman published one of the legal community's first evaluations of ChatGPT's capabilities in creating convincing arguments and answers to typical questions, in "The Implications of ChatGPT For Legal Services and Society" (https://bit.ly/3NuhFxG),
Pub/Sub Message Brokers for GenAI
Saleh, Alaa, Pirttikangas, Susanna, Lovรฉn, Lauri
In today's digital world, Generative Artificial Intelligence (GenAI) such as Large Language Models (LLMs) is becoming increasingly prevalent, extending its reach across diverse applications. This surge in adoption has sparked a significant increase in demand for data-centric GenAI models, highlighting the necessity for robust data communication infrastructures. Central to this need are message brokers, which serve as essential channels for data transfer within various system components. This survey aims to delve into a comprehensive analysis of traditional and modern message brokers, offering a comparative study of prevalent platforms. Our study considers numerous criteria including, but not limited to, open-source availability, integrated monitoring tools, message prioritization mechanisms, capabilities for parallel processing, reliability, distribution and clustering functionalities, authentication processes, data persistence strategies, fault tolerance, and scalability. Furthermore, we explore the intrinsic constraints that the design and operation of each message broker might impose, recognizing that these limitations are crucial in understanding their real-world applicability. We then leverage these insights to propose a sophisticated message broker framework -- one designed with the adaptability and robustness necessary to meet the evolving requisites of GenAI applications. Finally, this study examines the enhancement of message broker mechanisms specifically for GenAI contexts, emphasizing the criticality of developing a versatile message broker framework. Such a framework would be poised for quick adaptation, catering to the dynamic and growing demands of GenAI in the foreseeable future. Through this dual-pronged approach, we intend to contribute a foundational compendium that can guide future innovations and infrastructural advancements in the realm of GenAI data communication.
Asymmetric Bias in Text-to-Image Generation with Adversarial Attacks
Shahgir, Haz Sameen, Kong, Xianghao, Steeg, Greg Ver, Dong, Yue
The widespread use of Text-to-Image (T2I) models in content generation requires careful examination of their safety, including their robustness to adversarial attacks. Despite extensive research into this, the reasons for their effectiveness are underexplored. This paper presents an empirical study on adversarial attacks against T2I models, focusing on analyzing factors associated with attack success rates (ASRs). We introduce a new attack objective - entity swapping using adversarial suffixes and two gradient-based attack algorithms. Human and automatic evaluations reveal the asymmetric nature of ASRs on entity swap: for example, it is easier to replace "human" with "robot" in the prompt "a human dancing in the rain." with an adversarial suffix but is significantly harder in reverse. We further propose probing metrics to establish indicative signals from the model's beliefs to the adversarial ASR. We identify conditions resulting in a 60% success probability for adversarial attacks and others where this likelihood drops below 5%.
Next Steps for Human-Centered Generative AI: A Technical Perspective
Chen, Xiang 'Anthony', Burke, Jeff, Du, Ruofei, Hong, Matthew K., Jacobs, Jennifer, Laban, Philippe, Li, Dingzeyu, Peng, Nanyun, Willis, Karl D. D., Wu, Chien-Sheng, Zhou, Bolei
Through iterative, cross-disciplinary discussions, we define and propose next-steps for Human-centered Generative AI (HGAI). We contribute a comprehensive research agenda that lays out future directions of Generative AI spanning three levels: aligning with human values; assimilating human intents; and augmenting human abilities. By identifying these next-steps, we intend to draw interdisciplinary research teams to pursue a coherent set of emergent ideas in HGAI, focusing on their interested topics while maintaining a coherent big picture of the future work landscape.
Spear Phishing With Large Language Models
Recent progress in artificial intelligence (AI), particularly in the domain of large language models (LLMs), has resulted in powerful and versatile dual-use systems. This intelligence can be put towards a wide variety of beneficial tasks, yet it can also be used to cause harm. This study explores one such harm by examining how LLMs can be used for spear phishing, a form of cybercrime that involves manipulating targets into divulging sensitive information. I first explore LLMs' ability to assist with the reconnaissance and message generation stages of a spear phishing attack, where I find that LLMs are capable of assisting with the email generation phase of a spear phishing attack. To explore how LLMs could potentially be harnessed to scale spear phishing campaigns, I then create unique spear phishing messages for over 600 British Members of Parliament using OpenAI's GPT-3.5 and GPT-4 models. My findings provide some evidence that these messages are not only realistic but also cost-effective, with each email costing only a fraction of a cent to generate. Next, I demonstrate how basic prompt engineering can circumvent safeguards installed in LLMs, highlighting the need for further research into robust interventions that can help prevent models from being misused. To further address these evolving risks, I explore two potential solutions: structured access schemes, such as application programming interfaces, and LLM-based defensive systems.