AITopics

2507.08621

Country:

Europe (0.28)
North America > United States (0.28)
Asia > Middle East (0.28)

Genre: Research Report > New Finding (0.68)

Industry:

Law (1.00)
Government (1.00)
Energy (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Przystalski, Karol, Argasiński, Jan K., Grabska-Gradzińska, Iwona, Ochab, Jeremi K.

Stylometry recognizes human and LLM-generated texts in short samples

arXiv.org Artificial Intelligence, Jul-25-2025

The paper explores stylometry as a method to distinguish between texts created by Large Language Models (LLMs) and humans, addressing issues of model attribution, intellectual property, and ethical AI use. Stylometry has been used extensively to characterise the style and attribute authorship of texts. By applying it to LLM-generated texts, we identify their emergent writing patterns. The paper involves creating a benchmark dataset based on Wikipedia, with (a) human-written term summaries, (b) texts generated purely by LLMs (GPT-3.5/4, LLaMa 2/3, Orca, and Falcon), (c) texts processed through multiple text summarisation methods (T5, BART, Gensim, and Sumy), and (d) texts processed through rephrasing methods (Dipper, T5). The 10-sentence-long texts were classified by tree-based models (decision trees and LightGBM) using human-designed (StyloMetrix) and n-gram-based (our own pipeline) stylometric features that encode lexical, grammatical, syntactic, and punctuation patterns. The cross-validated results reached a performance of up to 0.87 Matthews correlation coefficient in the multiclass scenario with 7 classes, and accuracy between 0.79 and 1.0 in binary classification, with the particular example of Wikipedia and GPT-4 reaching up to 0.98 accuracy on a balanced dataset. Shapley Additive Explanations pinpointed features characteristic of the encyclopaedic text type, individual overused words, as well as a greater grammatical standardisation of LLMs with respect to human-written texts. These results show -- crucially, in the context of the increasingly sophisticated LLMs -- that it is possible to distinguish machine- from human-generated texts at least for a well-defined text type.
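The abstract above reports its multiclass result as a Matthews correlation coefficient (MCC). As a reading aid, here is a minimal sketch of how a multiclass MCC can be computed from true and predicted labels, using the standard generalisation of the binary formula. It is not the authors' evaluation code; the function name `multiclass_mcc` and the example labels are illustrative assumptions.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient.

    Hypothetical helper for illustration only; not taken from the paper.
    Returns a value in [-1, 1]; 1 means perfect agreement, 0 means no
    better than chance. Returns 0.0 when the denominator is zero
    (for example, when every sample is predicted as a single class).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    s = len(y_true)                                  # total number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))  # correctly classified samples
    t_k = Counter(y_true)                            # how often each class truly occurs
    p_k = Counter(y_pred)                            # how often each class is predicted
    classes = set(t_k) | set(p_k)

    cov_tp = c * s - sum(t_k[k] * p_k[k] for k in classes)
    cov_tt = s * s - sum(t_k[k] ** 2 for k in classes)
    cov_pp = s * s - sum(p_k[k] ** 2 for k in classes)

    denom = sqrt(cov_tt * cov_pp)
    return cov_tp / denom if denom else 0.0


# Illustrative labels only (not data from the paper).
truth = ["human", "gpt4", "llama", "human"]
print(multiclass_mcc(truth, truth))  # identical labels give perfect agreement
```

Unlike plain accuracy, this score takes the per-class counts of both the true and the predicted labels into account, so it stays informative when classes are imbalanced.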

classification, large language model, machine learning, (18 more...)

doi: 10.1016/j.eswa.2025.129001

2507.00838

Country:

Asia (0.93)
North America > United States (0.28)
Europe > Ukraine > Sumy Oblast > Sumy (0.25)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Rotolo, Antonino, Ferrigno, Beatrice, Godinez, Jose Miguel Angel Garcia, Novelli, Claudio, Sartor, Giovanni

Foundations for Risk Assessment of AI in Protecting Fundamental Rights

arXiv.org Artificial Intelligence, Jul-25-2025

This chapter introduces a conceptual framework for qualitative risk assessment of AI, particularly in the context of the EU AI Act. The framework addresses the complexities of legal compliance and fundamental rights protection by integrating definitional balancing and defeasible reasoning. Definitional balancing employs proportionality analysis to resolve conflicts between competing rights, while defeasible reasoning accommodates the dynamic nature of legal decision-making. Our approach stresses the need for an analysis of AI deployment scenarios and for identifying potential legal violations and multi-layered impacts on fundamental rights. On the basis of this analysis, we provide philosophical foundations for a logical account of AI risk analysis. In particular, we consider the basic building blocks for conceptually grasping the interaction between AI deployment scenarios and fundamental rights, incorporating definitional balancing and arguments about the contextual promotion or demotion of rights into defeasible reasoning. This layered approach allows for more operative models of assessment of both high-risk AI systems and General Purpose AI (GPAI) systems, emphasizing the broader applicability of the latter. Future work aims to develop a formal model and effective algorithms to enhance AI risk assessment, bridging theoretical insights with practical applications to support responsible AI governance.

artificial intelligence, fundamental rights, rights, (17 more...)

2507.1829

Country:

Europe (1.00)
North America > United States (0.93)

Genre: Research Report (0.64)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Applied AI (1.00)

MIT Technology Review, Jul-24-2025, 18:59:09 GMT

America's AI watchdog is losing its bite

It found that the security giant Evolv lied about the accuracy of its AI-powered security checkpoints, which are used in stadiums and schools but failed to catch a seven-inch knife that was ultimately used to stab a student. It went after the facial recognition company Intellivision, saying the company made unfounded claims that its tools operated without gender or racial bias. It fined startups promising bogus "AI lawyer" services and one that sold fake product reviews generated with AI. These actions did not result in fines that crippled the companies, but they did stop them from making false statements and offered customers ways to recover their money or get out of contracts. In each case, the FTC found, everyday people had been harmed by AI companies that let their technologies run amok.

ai watchdog, artificial intelligence, white house, (8 more...)

MIT Technology Review

Country:

North America > United States (1.00)
Asia > China (0.06)

Industry:

Law (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology: Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.40)

Simulating multiple human perspectives in socio-ecological systems using large language models

Zeng, Yongchao, Brown, Calum, Kyriakou, Ioannis, Hotz, Ronja, Rounsevell, Mark

Understanding socio-ecological systems requires insights from diverse stakeholder perspectives, which are often hard to access. To enable alternative, simulation-based exploration of different stakeholder perspectives, we develop the HoPeS (Human-Oriented Perspective Shifting) modelling framework. HoPeS employs agents powered by large language models (LLMs) to represent various stakeholders; users can step into the agent roles to experience perspectival differences. A simulation protocol serves as a "scaffold" to streamline multiple perspective-taking simulations, supporting users in reflecting on, transitioning between, and integrating across perspectives. A prototype system is developed to demonstrate HoPeS in the context of institutional dynamics and land use change, enabling both narrative-driven and numerical experiments. In an illustrative experiment, a user successively adopts the perspectives of a system observer and a researcher - a role that analyses data from the embedded land use model to inform evidence-based decision-making for other LLM agents representing various institutions. Despite the user's effort to recommend technically sound policies, discrepancies persist between the policy recommendation and implementation due to stakeholders' competing advocacies, mirroring real-world misalignment between researcher and policymaker perspectives. The user's reflection highlights the subjective feelings of frustration and disappointment as a researcher, especially due to the challenge of maintaining political neutrality while attempting to gain political influence. Despite this, the user exhibits high motivation to experiment with alternative narrative framing strategies, suggesting the system's potential in exploring different perspectives. Further system and protocol refinement are likely to enable new forms of interdisciplinary collaboration in socio-ecological simulations.

large language model, machine learning, simulation, (18 more...)

2507.1768

Country: Europe (1.00)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Law (1.00)
Health & Medicine (1.00)
Government (1.00)
Leisure & Entertainment > Games > Computer Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Mapping Industry Practices to the EU AI Act's GPAI Code of Practice Safety and Security Measures

Stelling, Lily, Yang, Mick, Gipiškis, Rokas, Staufer, Leon, Chin, Ze Shen, Campos, Siméon, Gil, Ariel, Chen, Michael

This report provides a detailed comparison between the Safety and Security measures proposed in the EU AI Act's General-Purpose AI (GPAI) Code of Practice (Third Draft) and the current commitments and practices voluntarily adopted by leading AI companies. As the EU moves toward enforcing binding obligations for GPAI model providers, the Code of Practice will be key for bridging legal requirements with concrete technical commitments. Our analysis focuses on the draft's Safety and Security section (Commitments II.1-II.16), documenting excerpts from current public-facing documents that are relevant to each individual measure. We systematically reviewed different document types, such as companies' frontier safety frameworks and model cards, from over a dozen companies, including OpenAI, Anthropic, Google DeepMind, Microsoft, Meta, Amazon, and others. This report is not meant to be an indication of legal compliance, nor does it take any prescriptive viewpoint about the Code of Practice or companies' policies. Instead, it aims to inform the ongoing dialogue between regulators and General-Purpose AI model providers by surfacing evidence of industry precedent for various measures. Nonetheless, we were able to find relevant quotes from at least 5 companies' documents for the majority of the measures in Commitments II.1-II.16.

large language model, machine learning, natural language, (23 more...)

2504.15181

Country:

North America > United States (0.67)
Europe (0.45)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)
Information Technology > Security & Privacy (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.36)

Investigating Training Data Detection in AI Coders

Li, Tianlin, Wei, Yunxiang, Li, Zhiming, Liu, Aishan, Guo, Qing, Liu, Xianglong, Sun, Dongning, Liu, Yang

Recent advances in code large language models (CodeLLMs) have made them indispensable tools in modern software engineering. However, these models occasionally produce outputs that contain proprietary or sensitive code snippets, raising concerns about potential non-compliant use of training data, and posing risks to privacy and intellectual property. To ensure responsible and compliant deployment of CodeLLMs, training data detection (TDD) has become a critical task. While recent TDD methods have shown promise in natural language settings, their effectiveness on code data remains largely underexplored. This gap is particularly important given code's structured syntax and distinct similarity criteria compared to natural language. To address this, we conduct a comprehensive empirical study of seven state-of-the-art TDD methods on source code data, evaluating their performance across eight CodeLLMs. To support this evaluation, we introduce CodeSnitch, a function-level benchmark dataset comprising 9,000 code samples in three programming languages, each explicitly labeled as either included or excluded from CodeLLM training. Beyond evaluation on the original CodeSnitch, we design targeted mutation strategies to test the robustness of TDD methods under three distinct settings. These mutation strategies are grounded in the well-established Type-1 to Type-4 code clone detection taxonomy. Our study provides a systematic assessment of current TDD techniques for code and offers insights to guide the development of more effective and robust detection methods in the future.

large language model, machine learning, natural language, (18 more...)

2507.17389

Country: Asia (0.14)

Genre: Research Report > New Finding (0.93)

Industry:

Information Technology > Security & Privacy (0.87)
Law (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Lee, Jeongeun, Yu, Youngjae, Lee, Dongha

HIPPO-Video: Simulating Watch Histories with Large Language Models for Personalized Video Highlighting

The exponential growth of video content has made personalized video highlighting an essential task, as user preferences are highly variable and complex. Existing video datasets, however, often lack personalization, relying on isolated videos or simple text queries that fail to capture the intricacies of user behavior. In this work, we introduce HIPPO-Video, a novel dataset for personalized video highlighting, created using an LLM-based user simulator to generate realistic watch histories reflecting diverse user preferences. The dataset includes 2,040 (watch history, saliency score) pairs, covering 20,400 videos across 170 semantic categories. To validate our dataset, we propose HiPHer, a method that leverages these personalized watch histories to predict preference-conditioned segment-wise saliency scores. Through extensive experiments, we demonstrate that our method outperforms existing generic and query-based approaches, showcasing its potential for highly user-centric video highlighting in real-world scenarios.

large language model, machine learning, natural language, (18 more...)

2507.16873

Country: Europe (0.46)

Genre: Research Report > New Finding (0.67)

Industry:

Law (0.93)
Leisure & Entertainment (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

WIRED, Jul-23-2025, 22:11:07 GMT

Trump Says He's 'Getting Rid of Woke' and Dismisses Copyright Concerns in AI Policy Speech

"You can't be expected to have a successful AI program when every single article, book or anything else that you've read or studied, you're supposed to pay for," Trump said. "We appreciate that, but just can't do it -- because it's not doable." The president also doubled down on his anti-woke rhetoric in his speech. "We are getting rid of woke," he said on Wednesday. "The American people do not want woke Marxist lunacy in the AI models." The remarks came during a keynote speech at a summit hosted by the All-In Podcast and the Hill & Valley Forum.

ai policy speech, artificial intelligence, trump administration, (8 more...)

WIRED

Country:

North America > United States (1.00)
Asia > China (0.06)

Industry:

Law (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology: Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.33)

Al Jazeera, Jul-23-2025, 20:03:25 GMT

Trump administration unveils wide ranging AI action plan

The administration of United States President Donald Trump has unveiled its new artificial intelligence action plan, which includes a strategy it says will boost the US standing in AI as it competes with China for dominance in the rapidly growing sector. The White House released the 25-page "America's AI Action Plan" on Wednesday. It includes 90 different policy proposals that the administration says will increase AI tools for allies around the globe. It will also promote production of new data centres around the US. It will scrap federal regulations that "hinder AI development", although it is not clear which regulations are in question.

artificial intelligence, chatbot, natural language, (18 more...)

Al Jazeera

Country:

Asia > China (0.53)
North America > United States > New York (0.06)

Industry:

Law > Statutes (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.30)