Goto

Collaborating Authors

 Generative AI


Generative AI and Its Impact on Personalized Intelligent Tutoring Systems

arXiv.org Artificial Intelligence

Generative Artificial Intelligence (AI) is revolutionizing educational technology by enabling highly personalized and adaptive learning environments within Intelligent Tutoring Systems (ITS). This report delves into the integration of Generative AI, particularly large language models (LLMs) like GPT-4, into ITS to enhance personalized education through dynamic content generation, real-time feedback, and adaptive learning pathways. We explore key applications such as automated question generation, customized feedback mechanisms, and interactive dialogue systems that respond to individual learner needs. The report also addresses significant challenges, including ensuring pedagogical accuracy, mitigating inherent biases in AI models, and maintaining learner engagement. Future directions highlight the potential advancements in multimodal AI integration, emotional intelligence in tutoring systems, and the ethical implications of AI-driven education. By synthesizing current research and practical implementations, this report underscores the transformative potential of Generative AI in creating more effective, equitable, and engaging educational experiences.


JurEE not Judges: safeguarding llm interactions with small, specialised Encoder Ensembles

arXiv.org Artificial Intelligence

We introduce JurEE, an ensemble of efficient, encoder-only transformer models designed to strengthen safeguards in AI-User interactions within LLM-based systems. Unlike existing LLM-as-Judge methods, which often struggle with generalization across risk taxonomies and only provide textual outputs, JurEE offers probabilistic risk estimates across a wide range of prevalent risks. Our approach leverages diverse data sources and employs progressive synthetic data generation techniques, including LLM-assisted augmentation, to enhance model robustness and performance. We create an in-house benchmark comprising of other reputable benchmarks such as the OpenAI Moderation Dataset and ToxicChat, where we find JurEE significantly outperforms baseline models, demonstrating superior accuracy, speed, and cost-efficiency. This makes it particularly suitable for applications requiring stringent content moderation, such as customer-facing chatbots. The encoder-ensemble's modular design allows users to set tailored risk thresholds, enhancing its versatility across various safety-related applications. JurEE's collective decision-making process, where each specialized encoder model contributes to the final output, not only improves predictive accuracy but also enhances interpretability. This approach provides a more efficient, performant, and economical alternative to traditional LLMs for large-scale implementations requiring robust content moderation.


Performance in a dialectal profiling task of LLMs for varieties of Brazilian Portuguese

arXiv.org Artificial Intelligence

Advances in generative AI have enabled near-human responses, crucial for overcoming the Turing test Danziger [2018]. However, achieving this requires algorithms to replicate ethically questionable human behaviors, including biases learned by large language models (LLMs) Freitag [2021]. Biases can be explicit, consciously manipulated, or implicit, operating unconsciously through automatic associations. These biases affect generative AI in two key areas: the rules and filters applied during LLM fine-tuning, and the linguistic datasets used for training. However, the specifics of these biases--whether in rules, filters, or dataset selection--remain unclear Bender et al. [2021].


On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability

arXiv.org Artificial Intelligence

Recent advancements in Large Language Models (LLMs) have showcased their ability to perform complex reasoning tasks, but their effectiveness in planning remains underexplored. In this study, we evaluate the planning capabilities of OpenAI's o1 models across a variety of benchmark tasks, focusing on three key aspects: feasibility, optimality, and generalizability. Through empirical evaluations on constraint-heavy tasks (e.g., $\textit{Barman}$, $\textit{Tyreworld}$) and spatially complex environments (e.g., $\textit{Termes}$, $\textit{Floortile}$), we highlight o1-preview's strengths in self-evaluation and constraint-following, while also identifying bottlenecks in decision-making and memory management, particularly in tasks requiring robust spatial reasoning. Our results reveal that o1-preview outperforms GPT-4 in adhering to task constraints and managing state transitions in structured environments. However, the model often generates suboptimal solutions with redundant actions and struggles to generalize effectively in spatially complex tasks. This pilot study provides foundational insights into the planning limitations of LLMs, offering key directions for future research on improving memory management, decision-making, and generalization in LLM-based planning. Code available at https://github.com/VITA-Group/o1-planning.


Pig Butchering Scams Are Going High Tech

WIRED

As digital scamming explodes in Southeast Asia, including so called "pig butchering" investment scams, the United Nations Office on Drugs and Crime (UNODC) issued a comprehensive report this week with a dire warning about the rapid growth of this criminal ecosystem. Many digital scams have traditionally relied on social engineering, or tricking victims into giving away their money willingly, rather than leaning on malware or other highly technical methods. But researchers have increasingly sounded the alarm that scammers are incorporating generative AI content and deepfakes to expand the scale and effectiveness of their operations. And the UN report offers the clearest evidence yet that these high tech tools are turning an already urgent situation into a crisis. In addition to buying written scripts to use with potential victims or relying on templates for malicious websites, attackers have increasingly been leaning on generative AI platforms to create communication content in multiple languages and deepfake generators that can create photos or even video of nonexistent people to show victims and enhance verisimilitude.


Multiple Ships Cooperative Navigation and Collision Avoidance using Multi-agent Reinforcement Learning with Communication

arXiv.org Artificial Intelligence

In the real world, unmanned surface vehicles (USV) often need to coordinate with each other to accomplish specific tasks. However, achieving cooperative control in multi-agent systems is challenging due to issues such as non-stationarity and partial observability. Recent advancements in Multi-Agent Reinforcement Learning (MARL) provide new perspectives to address these challenges. Therefore, we propose using the multi-agent deep deterministic policy gradient (MADDPG) algorithm with communication to address multiple ships' cooperation problems under partial observability. We developed two tasks based on OpenAI's gym environment: cooperative navigation and cooperative collision avoidance. In these tasks, ships must not only learn effective control strategies but also establish communication protocols with other agents. We analyze the impact of external noise on communication, the effect of inter-agent communication on performance, and the communication patterns learned by the agents. The results demonstrate that our proposed framework effectively addresses cooperative navigation and collision avoidance among multiple vessels, significantly outperforming traditional single-agent algorithms. Agents establish a consistent communication protocol, enabling them to compensate for missing information through shared observations and achieve better coordination.


The Future of Learning in the Age of Generative AI: Automated Question Generation and Assessment with Large Language Models

arXiv.org Artificial Intelligence

In recent years, large language models (LLMs) and generative AI have revolutionized natural language processing (NLP), offering unprecedented capabilities in education. This chapter explores the transformative potential of LLMs in automated question generation and answer assessment. It begins by examining the mechanisms behind LLMs, emphasizing their ability to comprehend and generate human-like text. The chapter then discusses methodologies for creating diverse, contextually relevant questions, enhancing learning through tailored, adaptive strategies. Key prompting techniques, such as zero-shot and chain-of-thought prompting, are evaluated for their effectiveness in generating high-quality questions, including open-ended and multiple-choice formats in various languages. Advanced NLP methods like fine-tuning and prompt-tuning are explored for their role in generating task-specific questions, despite associated costs. The chapter also covers the human evaluation of generated questions, highlighting quality variations across different methods and areas for improvement. Furthermore, it delves into automated answer assessment, demonstrating how LLMs can accurately evaluate responses, provide constructive feedback, and identify nuanced understanding or misconceptions. Examples illustrate both successful assessments and areas needing improvement. The discussion underscores the potential of LLMs to replace costly, time-consuming human assessments when appropriately guided, showcasing their advanced understanding and reasoning capabilities in streamlining educational processes.


OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models

arXiv.org Artificial Intelligence

In this technical report, we introduce OpenR, an open-source framework designed to integrate key components for enhancing the reasoning capabilities of large language models (LLMs). OpenR unifies data acquisition, reinforcement learning training (both online and offline), and non-autoregressive decoding into a cohesive software platform. Our goal is to establish an open-source platform and community to accelerate the development of LLM reasoning. Inspired by the success of OpenAI's o1 model, which demonstrated improved reasoning abilities through step-by-step reasoning and reinforcement learning, OpenR integrates test-time compute, reinforcement learning, and process supervision to improve reasoning in LLMs. Our work is the first to provide an open-source framework that explores the core techniques of OpenAI's o1 model with reinforcement learning, achieving advanced reasoning capabilities beyond traditional autoregressive methods. We demonstrate the efficacy of OpenR by evaluating it on the MATH dataset, utilising publicly available data and search methods. Our initial experiments confirm substantial gains, with relative improvements in reasoning and performance driven by test-time computation and reinforcement learning through process reward models. The OpenR framework, including code, models, and datasets, is accessible at https://openreasoner.github.io.


CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers

Neural Information Processing Systems

Development of transformer-based text-to-image models is impeded by its slow generation and complexity, for high-resolution images. In this work, we put forward a solution based on hierarchical transformers and local parallel autoregressive generation. The new text-to-image system, CogView2, shows very competitive generation compared to concurrent state-of-the-art DALL-E-2, and naturally supports interactive text-guided editing on images.


PettingZoo: Gym for Multi-Agent Reinforcement Learning

Neural Information Processing Systems

This paper introduces the PettingZoo library and the accompanying Agent Environment Cycle ("AEC") games model. PettingZoo is a library of diverse sets of multi-agent environments with a universal, elegant Python API. PettingZoo was developed with the goal of accelerating research in Multi-Agent Reinforcement Learning ("MARL"), by making work more interchangeable, accessible and reproducible akin to what OpenAI's Gym library did for single-agent reinforcement learning. PettingZoo's API, while inheriting many features of Gym, is unique amongst MARL APIs in that it's based around the novel AEC games model. We argue, in part through case studies on major problems in popular MARL environments, that the popular game models are poor conceptual models of the games commonly used with MARL, that they promote severe bugs that are hard to detect, and that the AEC games model addresses these problems.