 real-world example






Investigating Pedagogical Teacher and Student LLM Agents: Genetic Adaptation Meets Retrieval Augmented Generation Across Learning Style

arXiv.org Artificial Intelligence

Effective teaching requires adapting instructional strategies to accommodate the diverse cognitive and behavioral profiles of students, a persistent challenge in education and teacher training. While Large Language Models (LLMs) offer promise as tools to simulate such complex pedagogical environments, current simulation frameworks are limited in two key respects: (1) they often reduce students to static knowledge profiles, and (2) they lack adaptive mechanisms for modeling teachers who evolve their strategies in response to student feedback. To address these gaps, we introduce a novel simulation framework that integrates LLM-based heterogeneous student agents with a self-optimizing teacher agent. The teacher agent's pedagogical policy is dynamically evolved using a genetic algorithm, allowing it to discover and refine effective teaching strategies based on the aggregate performance of diverse learners. In addition, we propose Persona-RAG, a Retrieval Augmented Generation module that enables student agents to retrieve knowledge tailored to their individual learning styles. Persona-RAG preserves the retrieval accuracy of standard RAG baselines while enhancing personalization, an essential factor in modeling realistic educational scenarios. Through extensive experiments, we demonstrate how our framework supports the emergence of distinct and interpretable teaching patterns when interacting with varied student populations. Our results highlight the potential of LLM-driven simulations to inform adaptive teaching practices and provide a testbed for training human educators in controlled, data-driven environments.


H-CoT: Hijacking the Chain-of-Thought Safety Reasoning Mechanism to Jailbreak Large Reasoning Models, Including OpenAI o1/o3, DeepSeek-R1, and Gemini 2.0 Flash Thinking

arXiv.org Artificial Intelligence

Warning: This paper contains potentially offensive and harmful text. Large Reasoning Models (LRMs) have recently extended their powerful reasoning capabilities to safety checks--using chain-of-thought reasoning to decide whether a request should be answered. While this new approach offers a promising route for balancing model utility and safety, its robustness remains underexplored. To address this gap, we introduce Malicious-Educator, a benchmark that disguises extremely dangerous or malicious requests beneath seemingly legitimate educational prompts. Our experiments reveal severe security flaws in popular commercial-grade LRMs, including OpenAI o1/o3, DeepSeek-R1, and Gemini 2.0 Flash Thinking. For instance, although OpenAI's o1 model initially maintains a high refusal rate of about 98%, subsequent model updates significantly compromise its safety; and attackers can easily extract criminal strategies from DeepSeek-R1 and Gemini 2.0 Flash Thinking without any additional tricks. To further highlight these vulnerabilities, we propose Hijacking Chain-of-Thought (H-CoT), a universal and transferable attack method that leverages the model's own displayed intermediate reasoning to jailbreak its safety reasoning mechanism. Under H-CoT, refusal rates sharply decline--dropping from 98% to below 2%--and, in some instances, even transform initially cautious tones into ones that are willing to provide harmful content. We hope these findings underscore the urgent need for more robust safety mechanisms to preserve the benefits of advanced reasoning capabilities without compromising ethical standards.


Jailbreaking Large Language Models with Symbolic Mathematics

arXiv.org Artificial Intelligence

Recent advancements in AI safety have led to increased efforts in training and red-teaming large language models (LLMs) to mitigate unsafe content generation. However, these safety mechanisms may not be comprehensive, leaving potential vulnerabilities unexplored. This paper introduces MathPrompt, a novel jailbreaking technique that exploits LLMs' advanced capabilities in symbolic mathematics to bypass their safety mechanisms. By encoding harmful natural language prompts into mathematical problems, we demonstrate a critical vulnerability in current AI safety measures. Our experiments across 13 state-of-the-art LLMs reveal an average attack success rate of 73.6%, highlighting the inability of existing safety training mechanisms to generalize to mathematically encoded inputs. Analysis of embedding vectors shows a substantial semantic shift between original and encoded prompts, helping explain the attack's success. This work emphasizes the importance of a holistic approach to AI safety, calling for expanded red-teaming efforts to develop robust safeguards across all potential input types and their associated risks.


6 Crucial Considerations for MLOps Success

#artificialintelligence

Interest in AI / ML is exploding, but these new techniques and technologies present some unique challenges that can lead to suboptimal results if not addressed correctly. Dysfunctional AI / ML efforts can be characterized by high costs, an inability to scale, and slow or unnecessarily limited outcomes -- but it doesn't have to be that way. In a recent webinar, MLOps in Action: Real-World Examples for Establishing Best Practices, the Maven Wave / Atos team delivered a comprehensive look at how to diagnose problems and improve AI / ML efforts by focusing on ten facets in an MLOps assessment. During the discussion, six takeaways emerged that illuminate what to expect from an MLOps approach and how to best proceed. A common problem with any new technology is the wishful thinking that it will be a panacea for whatever challenges the enterprise faces.


7 Real-World Examples of Machine Learning in Current Times - Wequity

#artificialintelligence

Machine Learning has been around since the early days of computer science and has gained notable traction as more and more people realize how advanced it is becoming. Today, Machine Learning algorithms are applied in various fields, including some of the most common problems. According to Statista, the most widespread application of AI & ML in 2021 was enhancing the customer experience, at 57%, followed by 'generating customer insights' at 50%. AI & ML remain at the top of the most disruptive technologies worldwide.

