AITopics

2502.1349

Genre: Research Report > New Finding (0.67)

Industry:

Health & Medicine (0.68)
Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence - Feb-19-2025

ThinkGuard: Deliberative Slow Thinking Leads to Cautious Guardrails

Wen, Xiaofei, Zhou, Wenxuan, Mo, Wenjie Jacky, Chen, Muhao

Ensuring the safety of large language models (LLMs) is critical as they are deployed in real-world applications. Existing guardrails rely on rule-based filtering or single-pass classification, limiting their ability to handle nuanced safety violations. To address this, we propose ThinkGuard, a critique-augmented guardrail model that distills knowledge from high-capacity LLMs by generating structured critiques alongside safety labels. Fine-tuned on critique-augmented data, the captured deliberative thinking ability drastically enhances the guardrail's cautiousness and interpretability. Evaluated on multiple safety benchmarks, ThinkGuard achieves the highest average F1 and AUPRC, outperforming all baselines. Compared to LLaMA Guard 3, ThinkGuard improves accuracy by 16.1% and macro F1 by 27.0%. Moreover, it surpasses label-only fine-tuned models, confirming that structured critiques enhance both classification precision and nuanced safety reasoning while maintaining computational efficiency.

category, guardrail model, language model

2502.13458

Country:

Europe > Austria > Vienna (0.14)
Asia > Singapore (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)

Genre: Research Report (0.82)

Industry:

Law (1.00)
Information Technology (0.67)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

BBC News - Feb-18-2025, 12:15:05 GMT

'Hopeless' to potentially handy: law firm puts AI to the test

This was the second time Linklaters had run its LinksAI benchmark tests, with the original exercise taking place in October 2023. In the first run, OpenAI's GPT 2, 3 and 4 were tested alongside Google's Bard. The exam has now been expanded to include o1, from OpenAI, and Google's Gemini 2.0, which was also released at the end of 2024. It did not involve DeepSeek's R1 - the apparently low-cost Chinese model which astonished the world last month - or any other non-US AI tool. The test involved posing the type of questions which would require advice from a "competent mid-level lawyer" with two years' experience.

large language model, lawyer, machine learning


Industry: Law (0.99)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.51)

Al Jazeera - Feb-18-2025, 09:27:42 GMT

AI: who's responsible for children's safety?

Megan Garcia says her teenage son, Sewell Setzer III, died by suicide after developing a harmful attachment to an AI companion chatbot. She has filed a lawsuit against Character.AI, accusing the company of negligence. In this episode of Now You Know, Megan looks back on some of the warning signs other parents might find useful as they navigate this digital age. We also speak to one of her lawyers, Meetali Jain, about this unique case.

artificial intelligence, natural language, safety


Industry: Law > Litigation (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.37)

Agentic Deep Graph Reasoning Yields Self-Organizing Knowledge Networks

Buehler, Markus J.

infrastructure design biodegradable microplastic material, large language model, machine learning

We present an agentic, autonomous graph expansion framework that iteratively structures and refines knowledge in situ. Unlike conventional knowledge graph construction methods relying on static extraction or single-pass learning, our approach couples a reasoning-native large language model with a continually updated graph representation. At each step, the system actively generates new concepts and relationships, merges them into a global graph, and formulates subsequent prompts based on its evolving structure. Through this feedback-driven loop, the model organizes information into a scale-free network characterized by hub formation, stable modularity, and bridging nodes that link disparate knowledge clusters. Over hundreds of iterations, new nodes and edges continue to appear without saturating, while centrality measures and shortest path distributions evolve to yield increasingly distributed connectivity. Our analysis reveals emergent patterns, such as the rise of highly connected 'hub' concepts and the shifting influence of 'bridge' nodes, indicating that agentic, self-reinforcing graph construction can yield open-ended, coherent knowledge structures. Applied to materials design problems, we present compositional reasoning experiments by extracting node-specific and synergy-level principles to foster genuinely novel knowledge synthesis, yielding cross-domain ideas that transcend rote summarization and strengthen the framework's potential for open-ended scientific discovery. We discuss other applications in scientific discovery and outline future directions for enhancing scalability and interpretability.
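The feedback loop described above (generate concepts and relationships, merge them into a global graph, derive the next prompt from the updated graph) can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: `stub_reasoner` is a hypothetical deterministic stand-in for the reasoning LLM, and `expand_graph` and its prompt-selection rule are assumptions made for the example. The hub that appears here is an artifact of the stub, not evidence of the scale-free behavior reported in the abstract.

```python
from collections import defaultdict


def stub_reasoner(prompt, step):
    """Hypothetical stand-in for the reasoning LLM.

    Returns (source, relation, target) triples: one new concept derived
    from the prompt, plus a link from that concept back to the root topic.
    """
    new_concept = f"{prompt}/c{step}"
    root = prompt.split("/")[0]
    return [
        (prompt, "relates_to", new_concept),
        (new_concept, "refines", root),
    ]


def expand_graph(seed, iterations, reasoner=stub_reasoner):
    """Iteratively grow a graph: generate, merge, then re-prompt."""
    neighbors = defaultdict(set)  # undirected adjacency: node -> neighbor nodes
    edges = set()                 # labeled, directed triples
    prompt = seed
    for step in range(iterations):
        triples = reasoner(prompt, step)
        # Merge the newly generated triples into the global graph.
        for src, rel, dst in triples:
            edges.add((src, rel, dst))
            neighbors[src].add(dst)
            neighbors[dst].add(src)
        # Assumed rule: the next prompt is the newest concept.
        prompt = triples[0][2]
    return neighbors, edges


if __name__ == "__main__":
    neighbors, edges = expand_graph("materials", 5)
    for node in sorted(neighbors, key=lambda n: len(neighbors[n]), reverse=True):
        print(node, len(neighbors[node]))
```

Swapping `stub_reasoner` for a call to an actual model, and replacing the prompt-selection rule with one based on graph structure, would be the natural next steps; both are left open here because the abstract does not specify them.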

2502.13025

Country: North America > United States > Massachusetts (0.27)

Genre:

Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Workflow (0.87)

Industry:

Materials > Construction Materials (1.00)
Health & Medicine > Therapeutic Area > Genetic Disease (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Political Neutrality in AI is Impossible - But Here is How to Approximate it

Fisher, Jillian, Appel, Ruth E., Park, Chan Young, Potter, Yujin, Jiang, Liwei, Sorensen, Taylor, Feng, Shangbin, Tsvetkov, Yulia, Roberts, Margaret E., Pan, Jennifer, Song, Dawn, Choi, Yejin

AI systems often exhibit political bias, influencing users' opinions and decision-making. While political neutrality, defined as the absence of bias, is often seen as an ideal solution for fairness and safety, this position paper argues that true political neutrality is neither feasible nor universally desirable due to its subjective nature and the biases inherent in AI training data, algorithms, and user interactions. However, inspired by Joseph Raz's philosophical insight that "neutrality [...] can be a matter of degree" (Raz, 1986), we argue that striving for some neutrality remains essential for promoting balanced AI interactions and mitigating user manipulation. Therefore, we use the term "approximation" of political neutrality to shift the focus from unattainable absolutes to achievable, practical proxies. We propose eight techniques for approximating neutrality across three levels of conceptualizing AI, examining their trade-offs and implementation strategies. In addition, we explore two concrete applications of these approximations to illustrate their practicality. Finally, we assess our framework on current large language models (LLMs) at the output level, providing a demonstration of how it can be evaluated. This work seeks to advance nuanced discussions of political neutrality in AI and promote the development of responsible, aligned language models.

biden, no-knock warrant, political neutrality

2503.05728

Country:

Asia > China (0.46)
Asia > Russia (0.46)
North America > United States > California > Alameda County > Berkeley (0.14)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Media > News (1.00)
Law > International Law (1.00)
Law > Government & the Courts (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Kharinaev, Artyom, Moskvoretskii, Viktor, Shvetsov, Egor, Studenikina, Kseniia, Bykov, Mikhail, Burnaev, Evgeny

Investigating the Impact of Quantization Methods on the Safety and Reliability of Large Language Models

Large Language Models (LLMs) have emerged as powerful tools for addressing modern challenges and enabling practical applications. However, their computational expense remains a significant barrier to widespread adoption. Quantization has emerged as a promising technique to democratize access and enable deployment on low-resource devices. Despite these advancements, the safety and trustworthiness of quantized models remain underexplored, as prior studies often overlook contemporary architectures and rely on overly simplistic benchmarks and evaluations. To address this gap, we introduce OpenSafetyMini, a novel open-ended safety dataset designed to better distinguish between models. We evaluate 4 state-of-the-art quantization techniques across LLaMA and Mistral models using 4 benchmarks, including human evaluations. Our findings reveal that the optimal quantization method varies for 4-bit precision, while vector quantization techniques deliver the best safety and trustworthiness performance at 2-bit precision, providing a foundation for future research.

evaluation, language model, quantization

2502.15799

Country:

North America > United States (0.04)
North America > Canada > Ontario > Toronto (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (0.68)
Law (0.46)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Baroud, Ibrahim, Raithel, Lisa, Möller, Sebastian, Roller, Roland

Beyond De-Identification: A Structured Approach for Defining and Detecting Indirect Identifiers in Medical Texts

Sharing sensitive texts for scientific purposes requires appropriate techniques to protect the privacy of patients and healthcare personnel. Anonymizing textual data is particularly challenging due to the presence of diverse unstructured direct and indirect identifiers. To mitigate the risk of re-identification, this work introduces a schema of nine categories of indirect identifiers designed to account for different potential adversaries, including acquaintances, family members and medical staff. Using this schema, we annotate 100 MIMIC-III discharge summaries and propose baseline models for identifying indirect identifiers. We will release the annotation guidelines, annotation spans (6,199 annotations in total) and the corresponding MIMIC-III document IDs to support further research in this area.

category, identifier, information

2502.13342

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > Montserrat (0.04)
Europe > Spain (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (0.64)

Industry:

Law (1.00)
Health & Medicine > Health Care Providers & Services (0.94)
Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Mann, Sebastian Porsdam, Jiehao, Joel Seah, Latham, Stephen R., Savulescu, Julian, Aboy, Mateo, Earp, Brian D.

Development of Application-Specific Large Language Models to Facilitate Research Ethics Review

Institutional review boards (IRBs) play a crucial role in ensuring the ethical conduct of human subjects research, but face challenges including inconsistency, delays, and inefficiencies. We propose the development and implementation of application-specific large language models (LLMs) to facilitate IRB review processes. These IRB-specific LLMs would be fine-tuned on IRB-specific literature and institutional datasets, and equipped with retrieval capabilities to access up-to-date, context-relevant information. We outline potential applications, including pre-review screening, preliminary analysis, consistency checking, and decision support. While addressing concerns about accuracy, context sensitivity, and human oversight, we acknowledge remaining challenges such as over-reliance on AI and the need for transparency. By enhancing the efficiency and quality of ethical review while maintaining human judgment in critical decisions, IRB-specific LLMs offer a promising tool to improve research oversight. We call for pilot studies to evaluate the feasibility and impact of this approach.

large language model, machine learning, natural language

2501.10741

Country:

Europe > United Kingdom (0.68)
North America > United States (0.46)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.88)

Industry:

Law (1.00)
Health & Medicine (1.00)
Government (0.93)
Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

FRAME: Boosting LLMs with A Four-Quadrant Multi-Stage Pretraining Strategy

Zhang, Xuemiao, Duan, Feiyu, Xu, Liangyu, Zhou, Yongwei, Wang, Sirui, Weng, Rongxiang, Wang, Jingang, Cai, Xunliang

Large language models (LLMs) have significantly advanced human language understanding and generation, with pretraining data quality and organization being crucial to their performance. Multi-stage pretraining is a promising approach, but existing methods often lack quantitative criteria for data partitioning and instead rely on intuitive heuristics. In this paper, we propose the novel Four-quadRAnt Multi-stage prEtraining strategy (FRAME), guided by the established principle of organizing the pretraining process into four stages to achieve significant loss reductions four times. This principle is grounded in two key findings: both training on high-perplexity (PPL) data followed by low-PPL data, and training on low PPL-difference (PD) data followed by high-PD data, cause the loss to drop significantly twice and improve performance. By partitioning data into four quadrants and strategically organizing them, FRAME achieves a remarkable 16.8% average improvement over random across MMLU and CMMLU for the 3B model, effectively boosting LLM performance.
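The four-quadrant partitioning mentioned above can be illustrated with a short sketch. It assumes each sample already carries precomputed `ppl` and `pd` scores and that quadrants are split at the median of each score; the abstract states neither the thresholds nor the order in which the quadrants are trained, so `partition_four_quadrants`, `schedule`, and the stage order in the usage example are hypothetical choices for illustration only.

```python
from statistics import median


def partition_four_quadrants(samples):
    """Split samples into four quadrants by PPL and PD.

    `samples` is a list of dicts with precomputed 'ppl' and 'pd' scores.
    Median cut-offs are an assumption of this sketch.
    """
    ppl_cut = median(s["ppl"] for s in samples)
    pd_cut = median(s["pd"] for s in samples)
    quadrants = {
        ("high_ppl", "low_pd"): [],
        ("high_ppl", "high_pd"): [],
        ("low_ppl", "low_pd"): [],
        ("low_ppl", "high_pd"): [],
    }
    for s in samples:
        ppl_key = "high_ppl" if s["ppl"] >= ppl_cut else "low_ppl"
        pd_key = "high_pd" if s["pd"] >= pd_cut else "low_pd"
        quadrants[(ppl_key, pd_key)].append(s)
    return quadrants


def schedule(quadrants, stage_order):
    """Concatenate quadrants into one training sequence, stage by stage."""
    return [s for key in stage_order for s in quadrants[key]]


if __name__ == "__main__":
    data = [
        {"id": "a", "ppl": 10.0, "pd": 0.1},
        {"id": "b", "ppl": 10.0, "pd": 0.9},
        {"id": "c", "ppl": 2.0, "pd": 0.1},
        {"id": "d", "ppl": 2.0, "pd": 0.9},
    ]
    # Hypothetical stage order: high PPL before low PPL, and within each
    # PPL level, low PD before high PD. Not taken from the paper.
    order = [
        ("high_ppl", "low_pd"),
        ("high_ppl", "high_pd"),
        ("low_ppl", "low_pd"),
        ("low_ppl", "high_pd"),
    ]
    print([s["id"] for s in schedule(partition_four_quadrants(data), order)])
```

Keeping the stage order as an explicit argument makes it easy to compare alternative orderings against a random shuffle of the same samples.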

large language model, natural language, ppl

2502.05551

Country:

Africa (0.68)
Asia > India (0.67)

Genre: Research Report > New Finding (0.67)

Industry:

Government (1.00)
Law (0.92)
Energy (0.84)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)