Generative AI


Integrating Artificial Open Generative Artificial Intelligence into Software Supply Chain Security

arXiv.org Artificial Intelligence

While new technologies emerge, human errors are always looming. As the software supply chain grows increasingly complex and intertwined, the security of a service has become paramount to ensuring the integrity of products, safeguarding data privacy, and maintaining operational continuity. In this work, we conducted experiments applying promising open Large Language Models (LLMs) to two main software security challenges: source code language errors and deprecated code, with a focus on their potential to replace conventional static and dynamic security scanners that rely on predefined rules and patterns. Our findings suggest that while LLMs produce some unexpected results, they also encounter significant limitations, particularly in memory complexity and the management of new and unfamiliar data patterns. Despite these challenges, the proactive application of LLMs, coupled with extensive security databases and continuous updates, holds the potential to fortify Software Supply Chain (SSC) processes against emerging threats.
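The conventional scanners the study contrasts with LLMs work by matching source code against a fixed table of known patterns. A minimal sketch of that rule-based approach for the deprecated-code case, assuming a small hypothetical rule table of deprecated Python APIs (the rules, the `Finding` type and the `scan` function are illustrative, not taken from the paper):

```python
import re
from dataclasses import dataclass

# Hypothetical rule table (illustrative only): regex -> suggested replacement.
# Each entry targets an API that CPython has deprecated; real scanners ship
# far larger, curated rule sets.
DEPRECATED_RULES = {
    r"\bimp\.": "importlib",                      # imp module: deprecated since 3.4, removed in 3.12
    r"\.utcnow\(": "datetime.now(timezone.utc)",  # datetime.utcnow(): deprecated in 3.12
    r"\.assertEquals\(": "assertEqual",           # deprecated unittest alias, removed in 3.12
}


@dataclass
class Finding:
    line_no: int
    pattern: str
    suggestion: str


def scan(source: str) -> list[Finding]:
    """Report every line that matches a known deprecated-API pattern."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for pattern, suggestion in DEPRECATED_RULES.items():
            if re.search(pattern, line):
                findings.append(Finding(line_no, pattern, suggestion))
    return findings


if __name__ == "__main__":
    sample = (
        "import imp\n"
        "mod = imp.load_source('m', 'm.py')\n"
        "ts = datetime.datetime.utcnow()\n"
    )
    for finding in scan(sample):
        print(finding)
```

The scanner flags lines 2 and 3 of the sample, but anything outside its rule table goes unreported. That rigidity is what motivates asking whether LLMs could take over such checks.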


A Self-Efficacy Theory-based Study on the Teachers Readiness to Teach Artificial Intelligence in Public Schools in Sri Lanka

arXiv.org Artificial Intelligence

The need for and challenges of teaching artificial intelligence (AI) at primary, secondary, and upper-secondary levels have been a major focus of recent academic discussions [1],[2],[3]. Often referred to as AI4K12 [4], this area explores global initiatives that introduce AI to students from kindergarten through high school. The rapid advancements in deep learning and generative AI technologies suggest AI will become a transformative force. This realisation has prompted governments and policymakers to recognise the need to prepare future citizens for a world heavily influenced by AI. As AI becomes increasingly integrated into information systems, concerns are mounting about citizens' ability to use these systems responsibly and understand the consequences of not doing so [5]. Furthermore, anxieties regarding AI's potential impact on societal sustainability highlight the need to equip future workforces with the skills to combine human creativity with AI's potential to create sustainable systems.


Did artificial intelligence shape the 2024 US election?

Al Jazeera

Days after New Hampshire voters received a robocall with an artificially generated voice that resembled President Joe Biden's, the Federal Communications Commission banned the use of AI-generated voices in robocalls. The 2024 United States election would be the first to unfold amid wide public access to AI generators, which let people create images, audio and video, some for nefarious purposes. Institutions rushed to limit AI-enabled misdeeds. Sixteen states enacted legislation around AI's use in elections and campaigns; many of these states required disclaimers in synthetic media published close to an election. The Election Assistance Commission, a federal agency supporting election administrators, published an "AI toolkit" with tips election officials could use to communicate about elections in an age of fabricated information.


A theory of appropriateness with applications to generative artificial intelligence

arXiv.org Artificial Intelligence

What is appropriateness? Humans navigate a multi-scale mosaic of interlocking notions of what is appropriate for different situations. We act one way with our friends, another with our family, and yet another in the office. Likewise for AI, appropriate behavior for a comedy-writing assistant is not the same as appropriate behavior for a customer-service representative. What determines which actions are appropriate in which contexts? And what causes these standards to change over time? Since all judgments of AI appropriateness are ultimately made by humans, we need to understand how appropriateness guides human decision making in order to properly evaluate AI decision making and improve it. This paper presents a theory of appropriateness: how it functions in human society, how it may be implemented in the brain, and what it means for responsible deployment of generative AI technology.


LearnLM: Improving Gemini for Learning

arXiv.org Artificial Intelligence

Today's generative AI systems are tuned to present information by default rather than engage users in service of learning as a human tutor would. To address the wide range of potential education use cases for these systems, we reframe the challenge of injecting pedagogical behavior as one of "pedagogical instruction following", where training and evaluation examples include system-level instructions describing the specific pedagogy attributes present or desired in subsequent model turns. This framing avoids committing our models to any particular definition of pedagogy, and instead allows teachers or developers to specify desired model behavior. It also clears a path to improving Gemini models for learning -- by enabling the addition of our pedagogical data to post-training mixtures -- alongside their rapidly expanding set of capabilities. Both represent important changes from our initial tech report. We show how training with pedagogical instruction following produces a LearnLM model (available on Google AI Studio) that is preferred substantially by expert raters across a diverse set of learning scenarios, with average preference strengths of 31% over GPT-4o, 11% over Claude 3.5, and 13% over the Gemini 1.5 Pro model LearnLM was based on.


New AI tool generates realistic satellite images of future flooding

AIHub

A generative AI model visualizes how floods in Texas would look in satellite imagery. The original photo is on the left, and the AI-generated image is on the right. Visualizing the potential impacts of a hurricane on people's homes before it hits can help residents prepare and decide whether to evacuate. MIT scientists have developed a method that generates satellite imagery from the future to depict how a region would look after a potential flooding event. The method combines a generative artificial intelligence model with a physics-based flood model to create realistic, bird's-eye-view images of a region, showing where flooding is likely to occur given the strength of an oncoming storm.


ChatGPT search tool vulnerable to manipulation and deception, tests show

The Guardian

OpenAI's ChatGPT search tool may be open to manipulation using hidden content, and can return malicious code from websites it searches, a Guardian investigation has found. OpenAI has made the search product available to paying customers and is encouraging users to make it their default search tool. But the investigation has revealed potential security issues with the new system.


Tokens, the oft-overlooked appetizer: Large language models, the distributional hypothesis, and meaning

arXiv.org Artificial Intelligence

Tokenization is a necessary component within the current architecture of many language models, including the transformer-based large language models (LLMs) of Generative AI, yet its impact on the model's cognition is often overlooked. We argue that LLMs demonstrate that the Distributional Hypothesis (DH) is sufficient for reasonably human-like language performance, and that the emergence of human-meaningful linguistic units among tokens motivates linguistically-informed interventions in existing, linguistically-agnostic tokenization techniques, particularly with respect to their roles as (1) semantic primitives and as (2) vehicles for conveying salient distributional patterns from human language to the model. We explore tokenizations from a BPE tokenizer; extant model vocabularies obtained from Hugging Face and tiktoken; and the information in exemplar token vectors as they move through the layers of a RoBERTa (large) model. In addition to showing that tokenization creates sub-optimal semantic building blocks and obscures the model's access to the necessary distributional patterns, we describe how tokenization pretraining can be a backdoor for bias and other unwanted content, which current alignment practices may not remediate. Additionally, we relay evidence that the tokenization algorithm's objective function impacts the LLM's cognition, despite being meaningfully insulated from the main system intelligence.
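The abstract examines vocabularies produced by BPE (byte-pair encoding), which builds tokens by repeatedly merging the most frequent adjacent pair of symbols. A minimal sketch of that merge-learning loop, assuming a toy character-level corpus chosen for illustration (the corpus and the names learn_bpe, merge_symbols and tokenize are hypothetical); production tokenizers such as those on Hugging Face or in tiktoken operate on bytes and add pre-tokenization and special tokens, all omitted here:

```python
from collections import Counter

Symbols = tuple[str, ...]


def merge_symbols(symbols: Symbols, pair: tuple[str, str]) -> Symbols:
    """Replace each left-to-right occurrence of `pair` with one concatenated symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` merge rules from a word-frequency table."""
    vocab: dict[Symbols, int] = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # The most frequent pair wins; max() keeps the first one seen on ties.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word with the new merged symbol.
        new_vocab: dict[Symbols, int] = {}
        for symbols, freq in vocab.items():
            merged = merge_symbols(symbols, best)
            new_vocab[merged] = new_vocab.get(merged, 0) + freq
        vocab = new_vocab
    return merges


def tokenize(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Segment a word by replaying the learned merges in order."""
    symbols: Symbols = tuple(word)
    for pair in merges:
        symbols = merge_symbols(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}  # toy data
    merges = learn_bpe(corpus, num_merges=3)
    print(merges)
    print(tokenize("lowest", merges))
```

With three merges on this toy corpus, the unseen word "lowest" is segmented as "lo" + "w" + "est": pair frequency, not morphology, decides where the boundaries fall, which is the linguistically agnostic behavior the authors question.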


Video Is Worth a Thousand Images: Exploring the Latest Trends in Long Video Generation

arXiv.org Artificial Intelligence

An image may convey a thousand words, but a video composed of hundreds or thousands of image frames tells a more intricate story. Despite significant progress in multimodal large language models (MLLMs), generating extended videos remains a formidable challenge. As of this writing, OpenAI's Sora, the current state-of-the-art system, is still limited to producing videos that are up to one minute in length. This limitation stems from the complexity of long video generation, which requires more than generative AI techniques for approximating density functions: essential aspects such as planning, story development, and maintaining spatial and temporal consistency present additional hurdles. Integrating generative AI with a divide-and-conquer approach could improve scalability for longer videos while offering greater control. In this survey, we examine the current landscape of long video generation, covering foundational techniques like GANs and diffusion models, video generation strategies, large-scale training datasets, quality metrics for evaluating long videos, and future research areas to address the limitations of existing video generation capabilities. We believe this survey can serve as a comprehensive foundation, offering extensive information to guide future advancements and research in the field of long video generation.


The Value of AI-Generated Metadata for UGC Platforms: Evidence from a Large-scale Field Experiment

arXiv.org Artificial Intelligence

AI-generated content (AIGC), such as advertisement copy, product descriptions, and social media posts, is becoming ubiquitous in business practices. However, the value of AI-generated metadata, such as titles, remains unclear on user-generated content (UGC) platforms. To address this gap, we conducted a large-scale field experiment on a leading short-video platform in Asia to provide about 1 million users access to AI-generated titles for their uploaded videos. Our findings show that the provision of AI-generated titles significantly boosted content consumption, increasing valid watches by 1.6% and watch duration by 0.9%. When producers adopted these titles, these increases jumped to 7.1% and 4.1%, respectively. This viewership-boost effect was largely attributed to the use of this generative AI (GAI) tool increasing the likelihood of videos having a title by 41.4%. The effect was more pronounced for groups more affected by metadata sparsity. Mechanism analysis revealed that AI-generated metadata improved user-video matching accuracy in the platform's recommender system. Interestingly, for a video for which the producer would have posted a title anyway, adopting the AI-generated title decreased its viewership on average, implying that AI-generated titles may be of lower quality than human-generated ones. However, when producers chose to co-create with GAI and significantly revised the AI-generated titles, the videos outperformed their counterparts with either fully AI-generated or human-generated titles, showcasing the benefits of human-AI co-creation. This study highlights the value of AI-generated metadata and human-AI metadata co-creation in enhancing user-content matching and content consumption for UGC platforms.