 chat log


Sears Exposed AI Chatbot Phone Calls and Text Chats to Anyone on the Web

WIRED

Customer conversations with chatbots can include contact information and personal details that make it easier for scammers to launch phishing attacks and commit fraud.


GAMER PAT: Research as a Serious Game

arXiv.org Artificial Intelligence

As generative AI increasingly outperforms students in producing academic writing, a critical question arises: how can we preserve the motivation, creativity, and intellectual growth of novice researchers in an age of automated academic achievement? This paper introduces GAMER PAT (GAme MastER, Paper Authoring Tutor), a prompt-engineered AI chatbot that reframes research paper writing as a serious game. Through role-playing mechanics, users interact with a co-author NPC and anonymous reviewer NPCs, turning feedback into "missions" and advancing through a narrative-driven writing process. Our study reports on 26+ gameplay chat logs, including both autoethnography and use by graduate students under supervision. Using qualitative log analysis with SCAT (Steps for Coding and Theorization), we identified an emergent four-phase scaffolding pattern: (1) question posing, (2) meta-perspective, (3) structuring, and (4) recursive reflection. These results suggest that GAMER PAT supports not only the structural development of research writing but also reflective and motivational aspects. We present this work as a descriptive account of concept and process, not a causal evaluation. We also include a speculative outlook envisioning how humans may continue to cultivate curiosity and agency alongside AI-driven research. This arXiv version thus provides both a descriptive report of design and usage, and a forward-looking provocation for future empirical studies.


LLM-Driven Learning Analytics Dashboard for Teachers in EFL Writing Education

arXiv.org Artificial Intelligence

This paper presents the development of a dashboard designed specifically for teachers in English as a Foreign Language (EFL) writing education. Leveraging LLMs, the dashboard facilitates the analysis of student interactions with an essay writing system, which integrates ChatGPT for real-time feedback. The dashboard aids teachers in monitoring student behavior, identifying noneducational interaction with ChatGPT, and aligning instructional strategies with learning objectives. By combining insights from NLP and Human-Computer Interaction (HCI), this study demonstrates how a human-centered approach can enhance the effectiveness of teacher dashboards, particularly in ChatGPT-integrated learning.


Shh, ChatGPT. That's a Secret.

The Atlantic - Technology

This past spring, a man in Washington State worried that his marriage was on the verge of collapse. "I am depressed and going a little crazy, still love her and want to win her back," he typed into ChatGPT. With the chatbot's help, he wanted to write a letter protesting her decision to file for divorce and post it to their bedroom door. "Emphasize my deep guilt, shame, and remorse for not nurturing and being a better husband, father, and provider," he wrote. In another message, he asked ChatGPT to write his wife a poem "so epic that it could make her change her mind but not cheesy or over the top." The man's chat history was included in the WildChat data set, a collection of 1 million ChatGPT conversations gathered consensually by researchers to document how people are interacting with the popular chatbot.


Conti Inc.: Understanding the Internal Discussions of a large Ransomware-as-a-Service Operator with Machine Learning

arXiv.org Artificial Intelligence

Ransomware-as-a-service (RaaS) is increasing the scale and complexity of ransomware attacks. Understanding the internal operations behind RaaS has been a challenge due to the illegality of such activities. The recent chat leak of the Conti RaaS operator, one of the most infamous ransomware operators on the international scene, offers a key opportunity to better understand the inner workings of such organizations. This paper analyzes the main topic discussions in the Conti chat leak using machine learning techniques such as Natural Language Processing (NLP) and Latent Dirichlet Allocation (LDA), as well as visualization strategies. Five discussion topics are found: 1) Business, 2) Technical, 3) Internal tasking/Management, 4) Malware, and 5) Customer Service/Problem Solving. Moreover, the distribution of topics among Conti members shows that only 4% of individuals have specialized discussions while almost all individuals (96%) are all-rounders, meaning that their discussions revolve around the five topics. The results also indicate that a significant proportion of Conti discussions are non-tech related. This study thus highlights that running such large RaaS operations requires a workforce skilled beyond technical abilities, with individuals involved in various tasks, from management to customer service or problem solving. The discussion topics also show that the organization behind the Conti RaaS operator shares similarities with a large firm. We conclude that, although RaaS represents an example of specialization in the cybercrime industry, only a few members are specialized in one topic, while the rest runs and coordinates the RaaS operation.


CS1QA: A Dataset for Assisting Code-based Question Answering in an Introductory Programming Course

arXiv.org Artificial Intelligence

We introduce CS1QA, a dataset for code-based question answering in the programming education domain. CS1QA consists of 9,237 question-answer pairs gathered from chat logs in an introductory programming class using Python, and 17,698 unannotated chat data with code. Each question is accompanied by the student's code and the portion of the code relevant to answering the question. We carefully design the annotation process to construct CS1QA, and analyze the collected dataset in detail. The tasks for CS1QA are to predict the question type, to identify the relevant code snippet given the question and the code, and to retrieve an answer from the annotated corpus. Results for the experiments on several baseline models are reported and thoroughly analyzed. The tasks for CS1QA challenge models to understand both the code and natural language. This unique dataset can be used as a benchmark for source code comprehension and question answering in the educational setting.


What is federated learning?

#artificialintelligence

One of the key challenges of machine learning is the need for large amounts of data. Gathering training datasets for machine learning models poses privacy, security, and processing risks that organizations would rather avoid. One technique that can help address some of these challenges is "federated learning." By distributing the training of models across user devices, federated learning makes it possible to take advantage of machine learning while minimizing the need to collect user data.


What is federated learning?

#artificialintelligence

One of the key challenges of machine learning is the need for large amounts of data. Gathering training datasets for machine learning models poses privacy, security, and processing risks that organizations would rather avoid. One technique that can help address some of these challenges is "federated learning." By distributing the training of models across user devices, federated learning makes it possible to take advantage of machine learning while minimizing the need to collect user data. The traditional process for developing machine learning applications is to gather a large dataset, train a model on the data, and run the trained model on a cloud server that users can reach through different applications such as web search, translation, text generation, and image processing.
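The idea of training across user devices can be illustrated with a minimal sketch of federated averaging, one common way to realize federated learning. This is not taken from the article: the one-parameter model y = w * x, the function names `local_update`, `federated_round`, and `train`, and the toy client datasets are all assumptions made for illustration. Each simulated device trains on its own data, and only the resulting weight (never the raw data) is sent to the server, which averages the weights in proportion to each device's sample count.

```python
from typing import List, Tuple

# Hypothetical type: (x, y) pairs that stay on a single device.
Dataset = List[Tuple[float, float]]


def local_update(w: float, data: Dataset, lr: float = 0.01, epochs: int = 5) -> float:
    """Run a few gradient-descent steps for the model y = w * x on one device."""
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(w_global: float, clients: List[Dataset]) -> float:
    """One round: clients train locally, the server averages their weights."""
    total = sum(len(d) for d in clients)
    # Only (weight, sample count) leaves each client, not the data itself.
    updates = [(local_update(w_global, d), len(d)) for d in clients]
    return sum(w * n for w, n in updates) / total


def train(clients: List[Dataset], rounds: int = 50, w0: float = 0.0) -> float:
    """Repeat federated rounds starting from an initial global weight."""
    w = w0
    for _ in range(rounds):
        w = federated_round(w, clients)
    return w


# Toy data (assumed): three devices, all drawn from y = 3 * x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
]

if __name__ == "__main__":
    print(train(clients))  # should end up close to 3.0
```

Weighting by sample count keeps a device with little data from pulling the global model as hard as one with more. Real systems add pieces this sketch leaves out, such as client sampling, secure aggregation, and handling of devices that drop out.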


Operationalising the data puddle

#artificialintelligence

I've put together a list of the data I want to record and analyse. I've also put together a checklist of the things I'll need to run the D&D campaign that will actually be generating all that beautiful data. Now I need to start operationalising this bad boy. First up, how am I actually going to export the data from all the sources I've identified? I'd rather not be messing around with data scraping, so would prefer (where possible) any tools I use to natively export to convenient file formats.


Top 15 Chatbot Datasets for NLP Projects

#artificialintelligence

An effective chatbot requires a massive amount of training data in order to quickly solve user inquiries without human intervention. However, the primary bottleneck in chatbot development is obtaining realistic, task-oriented dialog data to train these machine learning-based systems. We've put together the ultimate list of the best conversational datasets to train a chatbot, broken down into question-answer data, customer support data, dialogue data and multilingual data.
Question-Answer Dataset: This corpus includes Wikipedia articles, manually-generated factoid questions from them, and manually-generated answers to these questions, for use in academic research.
The WikiQA Corpus: A publicly available set of question and sentence pairs, collected and annotated for research on open-domain question answering.