Generative AI
Is the Digital Forensics and Incident Response Pipeline Ready for Text-Based Threats in LLM Era?
Bhandarkar, Avanti, Wilson, Ronald, Swarup, Anushka, Zhu, Mengdi, Woodard, Damon
In the era of generative AI, the widespread adoption of Neural Text Generators (NTGs) presents new cybersecurity challenges, particularly within the realms of Digital Forensics and Incident Response (DFIR). These challenges primarily involve the detection and attribution of sources behind advanced attacks like spearphishing and disinformation campaigns. As NTGs evolve, the task of distinguishing between human and NTG-authored texts becomes critically complex. This paper rigorously evaluates the DFIR pipeline tailored for text-based security systems, specifically focusing on the challenges of detecting and attributing authorship of NTG-authored texts. By introducing a novel human-NTG co-authorship text attack, termed CS-ACT, our study uncovers significant vulnerabilities in traditional DFIR methodologies, highlighting discrepancies between ideal scenarios and real-world conditions. Utilizing 14 diverse datasets and 43 unique NTGs, up to the latest GPT-4, our research identifies substantial vulnerabilities in the forensic profiling phase, particularly in attributing authorship to NTGs. Our comprehensive evaluation points to factors such as model sophistication and the lack of distinctive style within NTGs as significant contributors to these vulnerabilities. Our findings underscore the necessity for more sophisticated and adaptable strategies, such as incorporating adversarial learning, stylizing NTGs, and implementing hierarchical attribution through the mapping of NTG lineages to enhance source attribution. This sets the stage for future research and the development of more resilient text-based security systems.
How Do Students Interact with an LLM-powered Virtual Teaching Assistant in Different Educational Settings?
Maiti, Pratyusha, Goel, Ashok K.
Jill Watson has been equipped with OpenAI's GPT-3.5 Turbo model, accessed via the OpenAI API, and coupled with several other technologies to facilitate more nuanced, context-aware, and safe interactions with students. Jill has been deployed in both online and offline classrooms[10] across different educational institutes and courses. This paper examines student interactions with Jill Watson, to understand how AI-based educational tools may engage students in meaningful and deeper learning experiences. In this paper, we analyze student interactions with Jill across multiple courses and colleges, focusing on the types and complexity of student questions based on Bloom's Revised Taxonomy and tool usage patterns. We find that, by supporting a wide range of cognitive demands, Jill encourages students to engage in sophisticated, higher-order cognitive questions. However, the frequency of usage varies significantly across deployments, and the types of questions asked ...
Silicon Valley's 'Audacity Crisis'
Two years ago, OpenAI released the public beta of DALL-E 2, an image-generation tool that immediately signified that we'd entered a new technological era. Trained off a huge body of data, DALL-E 2 produced unsettlingly good, delightful, and frequently unexpected outputs; my Twitter feed filled up with images derived from prompts such as close-up photo of brushing teeth with toothbrush covered with nacho cheese. Suddenly, it seemed as though machines could create just about anything in response to simple prompts. You likely know the story from there: A few months later, ChatGPT arrived, millions of people started using it, the student essay was pronounced dead, Web3 entrepreneurs nearly broke their ankles scrambling to pivot their companies to AI, and the technology industry was consumed by hype. The generative-AI revolution began in earnest.
Building a Domain-specific Guardrail Model in Production
Niknazar, Mohammad, Haley, Paul V, Ramanan, Latha, Truong, Sang T., Shrinivasan, Yedendra, Bhowmick, Ayan Kumar, Dey, Prasenjit, Jagmohan, Ashish, Maheshwari, Hema, Ponoth, Shom, Smith, Robert, Vempaty, Aditya, Haber, Nick, Koyejo, Sanmi, Sundararajan, Sharad
Generative AI holds the promise of enabling a range of sought-after capabilities and revolutionizing workflows in various consumer and enterprise verticals. However, putting a model in production involves much more than just generating an output. It involves ensuring the model is reliable, safe, performant and also adheres to the policy of operation in a particular domain. Guardrails have become a necessity for models, evolving around the need to enforce appropriate model behavior, especially in production. In this paper, we use education as a use case, given its stringent requirements on the appropriateness of content, to demonstrate how a guardrail model can be trained and deployed in production. Specifically, we describe our experience in building a production-grade guardrail model for a K-12 educational platform. We begin by formulating the requirements for deployment to this sensitive domain. We then describe the training and benchmarking of our domain-specific guardrail model, which outperforms competing open and closed instruction-tuned models of similar and larger size on proprietary education-related benchmarks and on public benchmarks related to general aspects of safety. Finally, we detail the choices we made on architecture and the optimizations for deploying this service in production; these range across the stack from the hardware infrastructure to the serving layer to language model inference optimizations. We hope this paper will be instructive to other practitioners looking to create production-grade domain-specific services based on generative AI and large language models.
A Survey Forest Diagram : Gain a Divergent Insight View on a Specific Research Topic
Li, Jinghong, Gu, Wen, Ota, Koichi, Hasegawa, Shinobu
With the exponential growth in the number of papers and the trend of AI research, the use of Generative AI for information retrieval and question-answering has become popular for conducting research surveys. However, novice researchers unfamiliar with a particular field may not significantly improve their efficiency in interacting with Generative AI because they have not developed divergent thinking in that field. This study aims to develop an in-depth Survey Forest Diagram that guides novice researchers in divergent thinking about the research topic by indicating the citation clues among multiple papers, to help expand the survey perspective for novice researchers.
IncidentResponseGPT: Generating Traffic Incident Response Plans with Generative Artificial Intelligence
Grigorev, Artur, Saleh, Adriana-Simona Mihaita Khaled, Ou, Yuming
The proposed IncidentResponseGPT framework is a novel system that applies generative artificial intelligence (AI) to potentially enhance the efficiency and effectiveness of traffic incident response. The model synthesizes region-specific incident response guidelines and generates incident response plans adapted to a specific area, aiming to expedite decision-making for traffic management authorities. The approach aims to shorten incident resolution times by suggesting recommendations (e.g. optimal rerouting strategies, estimates of resource needs) that minimize the overall impact on the urban traffic network. The system suggests specific actions, including dynamic lane closures, optimized rerouting and dispatching of appropriate emergency resources. IncidentResponseGPT employs the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) to rank generated response plans on criteria such as impact minimization and resource efficiency, according to their proximity to a human-proposed solution.
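For readers unfamiliar with TOPSIS, the ranking step can be illustrated with a minimal sketch of the standard textbook method. This is not the authors' implementation: the plan names, criteria values and weights below are hypothetical, and the abstract does not say how the human-proposed solution enters the computation, so the sketch derives the ideal and anti-ideal points from the candidate plans themselves.

```python
import math


def topsis(matrix, weights, benefit):
    """Return one closeness score per alternative, in [0, 1]; higher is better.

    matrix  -- rows are alternatives, columns are criteria values
    weights -- one weight per criterion
    benefit -- per criterion: True if larger is better, False if smaller is better
    """
    n_crit = len(weights)
    # Step 1: vector-normalise each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
                for row in matrix]
    # Step 2: build the ideal and anti-ideal points criterion by criterion.
    columns = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    # Step 3: closeness = distance to anti-ideal / (sum of both distances).
    scores = []
    for row in weighted:
        d_pos = math.dist(row, ideal)
        d_neg = math.dist(row, anti)
        total = d_pos + d_neg
        scores.append(d_neg / total if total else 0.0)
    return scores


# Hypothetical response plans scored on two cost-type criteria:
# [expected network delay in minutes, emergency units required].
plans = {"plan_a": [30, 4], "plan_b": [20, 6], "plan_c": [45, 2]}
scores = topsis(list(plans.values()), weights=[0.6, 0.4], benefit=[False, False])
ranking = sorted(zip(plans, scores), key=lambda item: item[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Both criteria are marked as costs here, so a plan scores higher the closer it lies to the lowest delay and lowest resource use among the candidates. A variant that measures proximity to a fixed reference plan would replace the computed `ideal` with that plan's weighted, normalised values.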
Testing Large Language Models on Driving Theory Knowledge and Skills for Connected Autonomous Vehicles
Tang, Zuoyin, He, Jianhua, Pei, Dashuai, Liu, Kezhong, Gao, Tao
Handling long-tail corner cases is a major challenge faced by autonomous vehicles (AVs). While large language models (LLMs) hold great potential to handle such corner cases with excellent generalization and explanation capabilities, and have received increasing research interest for application to autonomous driving, there are still technical barriers to be tackled, such as strict model performance requirements and the huge computing resource requirements of LLMs. In this paper, we investigate a new approach of applying remote or edge LLMs to support autonomous driving. A key issue for such an LLM-assisted driving system is the assessment of LLMs on their understanding of driving theory and skills, ensuring they are qualified to undertake safety-critical driving assistance tasks for connected autonomous vehicles (CAVs). We design and run driving theory tests for several proprietary LLMs (OpenAI GPT models, Baidu Ernie and Ali QWen) and open-source LLMs (Tsinghua MiniCPM-2B and MiniCPM-Llama3-V2.5) with more than 500 multiple-choice theory test questions. Model accuracy, cost and processing latency are measured in the experiments. The results show that while GPT-4 passes the test with improved domain knowledge and Ernie has an accuracy of 85% (just below the 86% passing threshold), the other LLMs, including GPT-3.5, fail the test. For the test questions with images, the multimodal model GPT-4o has an excellent accuracy of 96%, and MiniCPM-Llama3-V2.5 achieves an accuracy of 76%. While GPT-4 holds stronger potential for CAV driving assistance applications, the cost of using GPT-4 is much higher, almost 50 times that of using GPT-3.5. The results can help inform decisions on the use of existing LLMs for CAV applications and on balancing model performance against cost.
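The kind of measurement described above (accuracy and per-question latency, compared against a pass mark) can be sketched as a small evaluation loop. This is an illustrative harness only, not the authors' test code: the question format, the `answer_fn` callback and the stub model are hypothetical, and the sketch assumes a model passes when its accuracy is at least the 86% threshold quoted in the abstract.

```python
import time

PASS_THRESHOLD = 0.86  # passing accuracy quoted in the abstract


def evaluate(answer_fn, questions):
    """Return (accuracy, mean latency in seconds, passed) for one model.

    answer_fn -- hypothetical callable taking (question_text, options) and
                 returning the chosen option label, e.g. "A"
    questions -- list of dicts with "question", "options" and "answer" keys
    """
    if not questions:
        raise ValueError("at least one question is required")
    correct = 0
    total_latency = 0.0
    for item in questions:
        start = time.perf_counter()
        predicted = answer_fn(item["question"], item["options"])
        total_latency += time.perf_counter() - start
        if predicted == item["answer"]:
            correct += 1
    accuracy = correct / len(questions)
    return accuracy, total_latency / len(questions), accuracy >= PASS_THRESHOLD


def always_first_option(question, options):
    """Stub standing in for a real LLM call; always picks the first label."""
    return sorted(options)[0]


# Placeholder questions, not taken from the paper's test set.
sample_questions = [
    {"question": "Placeholder question 1", "options": {"A": "...", "B": "..."}, "answer": "A"},
    {"question": "Placeholder question 2", "options": {"A": "...", "B": "..."}, "answer": "B"},
]
accuracy, mean_latency, passed = evaluate(always_first_option, sample_questions)
print(f"accuracy={accuracy:.2f} passed={passed}")
```

In a real run, `answer_fn` would wrap a call to the model under test, and per-request cost could be accumulated alongside latency in the same loop.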
Visual Stereotypes of Autism Spectrum in DALL-E, Stable Diffusion, SDXL, and Midjourney
Wodziński, Maciej, Rządeczka, Marcin, Szuła, Anastazja, Sokół, Marta, Moskalewicz, Marcin
Avoiding systemic discrimination requires investigating AI models' potential to propagate stereotypes resulting from the inherent biases of training datasets. Our study investigated how text-to-image models unintentionally perpetuate non-rational beliefs regarding autism. The research protocol involved generating images based on 53 prompts aimed at visualizing concrete objects and abstract concepts related to autism across four models: DALL-E, Stable Diffusion, SDXL, and Midjourney (N=249). Expert assessment of results was performed via a framework of 10 deductive codes representing common stereotypes contested by the community, rated for their presence and spatial intensity, quantified on ordinal scales, and subjected to statistical analysis of inter-rater reliability and effect sizes. The models frequently utilised controversial themes and symbols, which were unevenly distributed. The images were, however, strikingly homogeneous in terms of skin colour, gender, and age, with autistic individuals portrayed as engaged in solitary activities, interacting with objects rather than people, and displaying stereotypical emotional expressions such as paleness, anger, or sadness. Secondly, we observed representational insensitivity regarding autism images despite directional prompting aimed at falsifying the above results. Additionally, DALL-E explicitly denied perpetuating stereotypes. We interpret this as ANNs mirroring the human cognitive architecture regarding the discrepancy between background and reflective knowledge, as justified by our previous research on autism-related stereotypes in humans.
Meta launches open-source AI app 'competitive' with closed rivals
Meta has claimed that its new artificial intelligence model is the first open-source system that will rival products from competitors such as OpenAI and Anthropic. In a blogpost, the company said its new model, with the unwieldy name of Llama 3.1 405B, "is competitive" with others – including those from OpenAI and Anthropic – "across a range of tasks". If true, it would mean that for the first time, one of the most powerful AI models in the world is available without an intermediary charging for access – or controlling what its technology is used for. "Developers can fully customise the models for their needs and applications, train on new datasets, and conduct additional fine-tuning," Meta said. "This enables the broader developer community and the world to more fully realise the power of generative AI. Developers can fully customise for their applications and run in any environment … all without sharing data with Meta."
Llama 3.1 is Meta's latest salvo in the battle for AI dominance
Meta on Tuesday announced the release of Llama 3.1, the latest version of its large language model that the company claims now rivals competitors from OpenAI and Anthropic. The new model comes just three months after Meta launched Llama 3 by integrating it into Meta AI, a chatbot that now lives in Facebook, Messenger, Instagram and WhatsApp and also powers the company's smart glasses. In the interim, OpenAI and Anthropic already released new versions of their own AI models, a sign that Silicon Valley's AI arms race isn't slowing down any time soon. In a blog post, Meta said that the new model, called Llama 3.1 405B, is the first openly available model that can compete with rivals in general knowledge, math skills and translating across multiple languages. The model was trained on more than 16,000 NVIDIA H100 GPUs, currently the fastest available chips that cost roughly $25,000 each, and can beat rivals on over 150 benchmarks, Meta claimed.