Goto

Collaborating Authors

 Generative AI


The Use of Large Language Models (LLM) for Cyber Threat Intelligence (CTI) in Cybercrime Forums

arXiv.org Artificial Intelligence

Large language models (LLMs) can be used to analyze cyber threat intelligence (CTI) data from cybercrime forums, which contain extensive information and key discussions about emerging cyber threats. However, to date, the level of accuracy and efficiency of LLMs for such critical tasks has yet to be thoroughly evaluated. Hence, this study assesses the accuracy of an LLM system built on the OpenAI GPT-3.5-turbo model [7] to extract CTI information. To do so, a random sample of 500 daily conversations from three cybercrime forums, XSS, Exploit_in, and RAMP, was extracted, and the LLM system was instructed to summarize the conversations and code 10 key CTI variables, such as whether a large organization and/or a critical infrastructure is being targeted. Then, two coders reviewed each conversation and evaluated whether the information extracted by the LLM was accurate. The LLM system performed strikingly well, with an average accuracy score of 98%. Various ways to enhance the model were uncovered, such as the need to help the LLM distinguish between stories and past events, as well as being careful with verb tenses in prompts. Nevertheless, the results of this study highlight the efficiency and relevance of using LLMs for cyber threat intelligence.


An Autonomous GIS Agent Framework for Geospatial Data Retrieval

arXiv.org Artificial Intelligence

Abstract: Powered by the emerging large language models (LLMs), autonomous geographic information systems (GIS) agents have the potential to accomplish spatial analyses and cartographic tasks. However, a research gap exists to support fully autonomous GIS agents: how to enable agents to discover and download the necessary data for geospatial analyses. This study proposes an autonomous GIS agent framework capable of retrieving required geospatial data by generating, executing, and debugging programs. The framework utilizes the LLM as the decision-maker, selects the appropriate data source (s) from a pre-defined source list, and fetches the data from the chosen source. Each data source has a handbook that records the metadata and technical details for data retrieval. The proposed framework is designed in a plug-and-play style to ensure flexibility and extensibility. Human users or autonomous data scrawlers can add new data sources by adding new handbooks. We developed a prototype agent based on the framework, released as a QGIS plugin (GeoData Retrieve Agent) and a Python program. Experiment results demonstrate its capability of retrieving data from various sources including OpenStreetMap, administrative boundaries and demographic data from the US Census Bureau, satellite basemaps from ESRI World Imagery, global digital elevation model (DEM) from OpenTopography.org, Our study is among the first attempts to develop an autonomous geospatial data retrieval agent. Keywords: autonomous GIS; geospatial data retrieval; large language models; generative AI; GIS agent; AI assistant 1 Introduction In recent years, large language models (LLMs) have drawn tremendous attention from researchers.


Could ChatGPT get an Engineering Degree? Evaluating Higher Education Vulnerability to AI Assistants

arXiv.org Artificial Intelligence

AI assistants are being increasingly used by students enrolled in higher education institutions. While these tools provide opportunities for improved teaching and education, they also pose significant challenges for assessment and learning outcomes. We conceptualize these challenges through the lens of vulnerability, the potential for university assessments and learning outcomes to be impacted by student use of generative AI. We investigate the potential scale of this vulnerability by measuring the degree to which AI assistants can complete assessment questions in standard university-level STEM courses. Specifically, we compile a novel dataset of textual assessment questions from 50 courses at EPFL and evaluate whether two AI assistants, GPT-3.5 and GPT-4 can adequately answer these questions. We use eight prompting strategies to produce responses and find that GPT-4 answers an average of 65.8% of questions correctly, and can even produce the correct answer across at least one prompting strategy for 85.1% of questions. When grouping courses in our dataset by degree program, these systems already pass non-project assessments of large numbers of core courses in various degree programs, posing risks to higher education accreditation that will be amplified as these models improve. Our results call for revising program-level assessment design in higher education in light of advances in generative AI.


Patchview: LLM-Powered Worldbuilding with Generative Dust and Magnet Visualization

arXiv.org Artificial Intelligence

Large language models (LLMs) can help writers build story worlds by generating world elements, such as factions, characters, and locations. However, making sense of many generated elements can be overwhelming. Moreover, if the user wants to precisely control aspects of generated elements that are difficult to specify verbally, prompting alone may be insufficient. We introduce Patchview, a customizable LLM-powered system that visually aids worldbuilding by allowing users to interact with story concepts and elements through the physical metaphor of magnets and dust. Elements in Patchview are visually dragged closer to concepts with high relevance, facilitating sensemaking. The user can also steer the generation with verbally elusive concepts by indicating the desired position of the element between concepts. When the user disagrees with the LLM's visualization and generation, they can correct those by repositioning the element. These corrections can be used to align the LLM's future behaviors to the user's perception. With a user study, we show that Patchview supports the sensemaking of world elements and steering of element generation, facilitating exploration during the worldbuilding process. Patchview provides insights on how customizable visual representation can help sensemake, steer, and align generative AI model behaviors with the user's intentions.


Generative Design of Periodic Orbits in the Restricted Three-Body Problem

arXiv.org Artificial Intelligence

The Three-Body Problem has fascinated scientists for centuries and it has been crucial in the design of modern space missions. Recent developments in Generative Artificial Intelligence hold transformative promise for addressing this longstanding problem. This work investigates the use of Variational Autoencoder (VAE) and its internal representation to generate periodic orbits. We utilize a comprehensive dataset of periodic orbits in the Circular Restricted Three-Body Problem (CR3BP) to train deep-learning architectures that capture key orbital characteristics, and we set up physical evaluation metrics for the generated trajectories. Through this investigation, we seek to enhance the understanding of how Generative AI can improve space mission planning and astrodynamics research, leading to novel, data-driven approaches in the field.


A Comparison of LLM Finetuning Methods & Evaluation Metrics with Travel Chatbot Use Case

arXiv.org Artificial Intelligence

This research compares large language model (LLM) fine-tuning methods, including Quantized Low Rank Adapter (QLoRA), Retrieval Augmented fine-tuning (RAFT), and Reinforcement Learning from Human Feedback (RLHF), and additionally compared LLM evaluation methods including End to End (E2E) benchmark method of "Golden Answers", traditional natural language processing (NLP) metrics, RAG Assessment (Ragas), OpenAI GPT-4 evaluation metrics, and human evaluation, using the travel chatbot use case. The travel dataset was sourced from the the Reddit API by requesting posts from travel-related subreddits to get travel-related conversation prompts and personalized travel experiences, and augmented for each fine-tuning method. We used two pretrained LLMs utilized for fine-tuning research: LLaMa 2 7B, and Mistral 7B. QLoRA and RAFT are applied to the two pretrained models. The inferences from these models are extensively evaluated against the aforementioned metrics. The best model according to human evaluation and some GPT-4 metrics was Mistral RAFT, so this underwent a Reinforcement Learning from Human Feedback (RLHF) training pipeline, and ultimately was evaluated as the best model. Our main findings are that: 1) quantitative and Ragas metrics do not align with human evaluation, 2) Open AI GPT-4 evaluation most aligns with human evaluation, 3) it is essential to keep humans in the loop for evaluation because, 4) traditional NLP metrics insufficient, 5) Mistral generally outperformed LLaMa, 6) RAFT outperforms QLoRA, but still needs postprocessing, 7) RLHF improves model performance significantly. Next steps include improving data quality, increasing data quantity, exploring RAG methods, and focusing data collection on a specific city, which would improve data quality by narrowing the focus, while creating a useful product.


Deep Generative Models for Subgraph Prediction

arXiv.org Artificial Intelligence

Graph Neural Networks (GNNs) are important across different domains, such as social network analysis and recommendation systems, due to their ability to model complex relational data. This paper introduces subgraph queries as a new task for deep graph learning. Unlike traditional graph prediction tasks that focus on individual components like link prediction or node classification, subgraph queries jointly predict the components of a target subgraph based on evidence that is represented by an observed subgraph. For instance, a subgraph query can predict a set of target links and/or node labels. To answer subgraph queries, we utilize a probabilistic deep Graph Generative Model. Specifically, we inductively train a Variational Graph Auto-Encoder (VGAE) model, augmented to represent a joint distribution over links, node features and labels. Bayesian optimization is used to tune a weighting for the relative importance of links, node features and labels in a specific domain. We describe a deterministic and a sampling-based inference method for estimating subgraph probabilities from the VGAE generative graph distribution, without retraining, in zero-shot fashion. For evaluation, we apply the inference methods on a range of subgraph queries on six benchmark datasets. We find that inference from a model achieves superior predictive performance, surpassing independent prediction baselines with improvements in AUC scores ranging from 0.06 to 0.2 points, depending on the dataset.


OpenAI's unreleased ChatGPT detector could catch cheating students

PCWorld

OpenAI disclosed Sunday that the company has new methods to tell if the content you're reading has been authored by ChatGPT, though they haven't been released. The method of watermarking its own AI-generated content apparently works, according to OpenAI. But it also can be apparently easily duped. The company is also considering how releasing and applying the tool might be used to stigmatize groups that use AI as an actual writing tool, including those who can't speak the language well enough to generate polished content. A second method, metadata, holds more promise.


Elon Musk sues OpenAI again, alleging 'deceit of Shakespearean proportions'

The Guardian

Elon Musk is once again suing OpenAI and its chief executive, Sam Altman, resurrecting a legal battle against his former partners with a case that now claims they manipulated him into co-founding the artificial intelligence company. Months after abruptly withdrawing a similar lawsuit without explanation, Musk filed a new lawsuit on Monday in a northern California federal court. OpenAI denied the allegations in a statement to the Guardian, pointing to its previous blogposts about Musk's initial lawsuit earlier this year. Musk's latest complaint claims the case is a "textbook tale of altruism versus greed", repeating allegations in his previous suit that his former co-founders in OpenAI betrayed him by turning the company from a non-profit into a largely for-profit enterprise. "The perfidy and deceit is of Shakespearean proportions," it states.


Elon Musk drags OpenAI into federal court

Engadget

Elon Musk has filed another lawsuit against OpenAI and the company's CEO Sam Altman, two months after withdrawing a previous one. Musk once again alleges that OpenAI breached its founding commitments by putting commercial concerns ahead of the public good. This time around, though, the suit has been filed in federal court rather than in a state court. That's because the new filing alleges that OpenAI violated federal racketeering laws by conspiring to defraud Musk, according to his lawyer, Marc Toberoff. "The previous suit lacked teeth -- and I don't believe in the tooth fairy," Toberoff told The New York Times. "This is a much more forceful lawsuit."