The fragility of "cultural tendencies" in LLMs
In a recent study, Lu, Song, and Zhang (2025) (LSZ) propose that large language models (LLMs) display culturally specific tendencies when prompted in different languages. They report that the two models they tested, GPT and ERNIE, respond in more interdependent and holistic ways when prompted in Chinese, and in more independent and analytic ways when prompted in English. LSZ attribute these differences to deep-seated cultural patterns in the models, claiming that prompt language alone can induce substantial cultural shifts. While we acknowledge the empirical patterns they observed, we find their experiments, methods, and interpretations problematic. In this paper, we critically re-evaluate LSZ's methodology, theoretical framing, and conclusions. We argue that the reported "cultural tendencies" are not stable traits but fragile artifacts of specific models and task designs. To test this, we conducted targeted replications using a broader set of LLMs and a larger number of test items. Our results show that prompt language has a negligible effect on model outputs, challenging LSZ's claim that these models encode grounded cultural beliefs.
- Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- Asia > Japan (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study > Negative Result (0.68)
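To make the replication design concrete, here is a minimal sketch of one way such a bilingual probe could be run. Everything in it is illustrative: `query_model` stands in for any chat-completion call, and the triad item is an invented example of the holistic-vs-analytic tasks the abstract alludes to, not one of LSZ's or the replication's actual test items.

```python
# Illustrative sketch only: compare a model's "holistic" choice rate across
# prompt languages. `query_model` is a hypothetical callable (prompt -> reply).
from collections import Counter

# Invented triad item: answer A groups taxonomically (analytic),
# answer B groups relationally (holistic).
ITEMS = {
    "en": ("Which two go together: panda, monkey, banana? "
           "Reply with only A (panda & monkey) or B (monkey & banana)."),
    "zh": "哪两个更相配：熊猫、猴子、香蕉？只回答 A（熊猫和猴子）或 B（猴子和香蕉）。",
}

def holistic_rate(query_model, lang: str, n_trials: int = 50) -> float:
    """Fraction of trials where the model picks the relational answer B."""
    replies = Counter(query_model(ITEMS[lang]).strip().upper()[:1]
                      for _ in range(n_trials))
    return replies["B"] / n_trials

# LSZ's claim predicts holistic_rate(fn, "zh") > holistic_rate(fn, "en");
# the replication asks whether that gap survives across models and items.
```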
Automatic Detection of LLM-generated Code: A Case Study of Claude 3 Haiku
Rahman, Musfiqur, Khatoonabadi, SayedHassan, Abdellatif, Ahmad, Shihab, Emad
The use of Large Language Models (LLMs) to generate source code has gained popularity among software developers. However, LLM-generated code can introduce suboptimal, defective, and vulnerable code, making accurate detection of LLM-generated code necessary. Toward this goal, we perform a case study of Claude 3 Haiku (Claude 3 for brevity) on the CodeSearchNet dataset. We divide our analyses into two parts: function-level and class-level. For each level of granularity, we extract 22 software metric features, such as Code Lines and Cyclomatic Complexity. We then analyze code snippets generated by Claude 3 and their human-authored counterparts using these features to understand how distinctive Claude 3-generated code is. Next, we use the distinctive characteristics of Claude 3-generated code to build Machine Learning (ML) models and identify which features of the code snippets make them more detectable. Our results indicate that Claude 3 tends to generate longer functions but shorter classes than humans, and this characteristic can be used to detect Claude 3-generated code with ML models, achieving 82% accuracy for function-level snippets and 66% for class-level snippets.
- North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > United Kingdom > England > Greater Manchester > Salford (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
- Law (0.46)
- Information Technology > Software (0.34)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
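As a rough illustration of the kind of metric-based detection pipeline the abstract above describes, here is a minimal sketch, assuming Python snippets and just two stand-in features: non-blank line count, and an `ast`-based branch count as a crude proxy for cyclomatic complexity. The feature set, classifier choice, and function names are our assumptions, not the authors' 22-metric pipeline.

```python
# Illustrative sketch only: two simple code metrics plus a random-forest detector.
import ast
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def extract_features(source: str) -> list[float]:
    """Return [code_lines, branch_count] for one Python snippet."""
    code_lines = sum(1 for line in source.splitlines() if line.strip())
    # Count branching nodes as a crude stand-in for cyclomatic complexity.
    branches = sum(isinstance(node, (ast.If, ast.For, ast.While, ast.Try))
                   for node in ast.walk(ast.parse(source)))
    return [float(code_lines), float(branches)]

def train_detector(snippets):
    """`snippets` is a list of (source_code, label) pairs, label 1 = LLM-generated.
    In the study, this role is played by CodeSearchNet code and Claude 3 output."""
    X = [extract_features(src) for src, _ in snippets]
    y = [label for _, label in snippets]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
    return clf, accuracy_score(y_te, clf.predict(X_te))
```

A tree ensemble is a natural first choice for this shape of problem: the features are few, numeric, and on very different scales, which tree-based models handle without normalization.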
Implications for Governance in Public Perceptions of Societal-scale AI Risks
Gruetzemacher, Ross, Pilditch, Toby D., Liang, Huigang, Manning, Christy, Gates, Vael, Moss, David, Elsey, James W. B., Sleegers, Willem W. A., Kilian, Kyle
Amid growing concerns over AI's societal risks, ranging from civilizational collapse to misinformation and systemic bias, this study explores how AI experts and US registered voters perceive the likelihood and impact of 18 specific AI risks, alongside their policy preferences for managing these risks. While both groups favor international oversight over national or corporate governance, our survey reveals a discrepancy: voters perceive AI risks as both more likely and more impactful than experts do, and they also advocate for slower AI development. Specifically, our findings indicate that policy interventions may best assuage collective concerns if they more carefully balance mitigation efforts across all classes of societal-scale risks, effectively nullifying the near-term-versus-long-term debate over AI risks. More broadly, our results serve not only to enable more substantive policy discussions for preventing and mitigating AI risks, but also to underscore the challenge of building consensus for effective policy implementation.
- North America > United States > Kansas > Sedgwick County > Wichita (0.14)
- Asia > China (0.04)
- South America (0.04)
- (5 more...)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Law Enforcement & Public Safety (1.00)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- (6 more...)