AITopics

Lahtinen, Kalle, Vaaras, Einari, Mustanoja, Liisa, Räsänen, Okko

Investigating Affect Mining Techniques for Annotation Sample Selection in the Creation of Finnish Affective Speech Corpus

Study of affect in speech requires suitable data, as emotional expression and perception vary across languages. Until now, no corpus has existed for natural expression of affect in spontaneous Finnish, existing data being acted or from a very specific communicative setting. This paper presents the first such corpus, created by annotating 12,000 utterances for emotional arousal and valence, sampled from three large-scale Finnish speech corpora. To ensure diverse affective expression, sample selection was conducted with an affect mining approach combining acoustic, cross-linguistic speech emotion, and text sentiment features. We compare this method to random sampling in terms of annotation diversity, and conduct post-hoc analyses to identify sampling choices that would have maximized the diversity. As an outcome, the work introduces a spontaneous Finnish affective speech corpus and informs sampling strategies for affective speech corpus creation in other languages or domains.

annotation, artificial intelligence, machine learning, (17 more...)

2505.17833

Country: Europe > Finland (0.31)

Genre:

Research Report > New Finding (0.69)
Research Report > Experimental Study (0.47)

Industry: Materials > Metals & Mining (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.96)
Information Technology > Artificial Intelligence > Cognitive Science > Emotion (0.68)

Does Chain-of-Thought Reasoning Really Reduce Harmfulness from Jailbreaking?

Lu, Chengda, Fan, Xiaoyu, Huang, Yu, Xu, Rongwu, Li, Jijie, Xu, Wei

Jailbreak attacks have been observed to largely fail against recent reasoning models enhanced by Chain-of-Thought (CoT) reasoning. However, the underlying mechanism remains underexplored, and relying solely on reasoning capacity may raise security concerns. In this paper, we try to answer the question: Does CoT reasoning really reduce harmfulness from jailbreaking? Through rigorous theoretical analysis, we demonstrate that CoT reasoning has dual effects on jailbreaking harmfulness. Based on the theoretical insights, we propose a novel jailbreak method, FicDetail, whose practical performance validates our theoretical findings.

large language model, machine learning, natural language, (16 more...)

2505.1765

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Government (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.97)
(2 more...)

Twin-2K-500: A dataset for building digital twins of over 2,000 people based on their answers to over 500 questions

Toubia, Olivier, Gui, George Z., Peng, Tianyi, Merlau, Daniel J., Li, Ang, Chen, Haozhe

LLM-based digital twin simulation, where large language models are used to emulate individual human behavior, holds great promise for research in AI, social science, and digital experimentation. However, progress in this area has been hindered by the scarcity of real, individual-level datasets that are both large and publicly available. This lack of high-quality ground truth limits both the development and validation of digital twin methodologies. To address this gap, we introduce a large-scale, public dataset designed to capture a rich and holistic view of individual human behavior. We survey a representative sample of $N = 2,058$ participants (average 2.42 hours per person) in the US across four waves with 500 questions in total, covering a comprehensive battery of demographic, psychological, economic, personality, and cognitive measures, as well as replications of behavioral economics experiments and a pricing survey. The final wave repeats tasks from earlier waves to establish a test-retest accuracy baseline. Initial analyses suggest the data are of high quality and show promise for constructing digital twins that predict human behavior well at the individual and aggregate levels. By making the full dataset publicly available, we aim to establish a valuable testbed for the development and benchmarking of LLM-based persona simulations. Beyond LLM applications, due to its unique breadth and scale the dataset also enables broad social science research, including studies of cross-construct correlations and heterogeneous treatment effects.
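The abstract does not say how the test-retest accuracy baseline is computed. One simple reading, sketched below purely as an assumption, is the fraction of repeated questions that a participant answers identically in the earlier and final waves. The function name `retest_accuracy` and the sample answers are hypothetical and do not come from the dataset.

```python
from typing import Sequence


def retest_accuracy(first: Sequence[str], repeat: Sequence[str]) -> float:
    """Fraction of repeated questions answered identically in both waves.

    Assumption: answers are categorical and aligned by question order.
    """
    if len(first) != len(repeat):
        raise ValueError("both waves must cover the same questions")
    if not first:
        raise ValueError("at least one repeated question is required")
    matches = sum(a == b for a, b in zip(first, repeat))
    return matches / len(first)


# Hypothetical answers from one participant to five repeated questions.
earlier_wave = ["A", "C", "B", "D", "A"]
final_wave = ["A", "C", "D", "D", "A"]

accuracy = retest_accuracy(earlier_wave, final_wave)
print(f"{accuracy:.2f}")  # 4 of 5 answers match -> 0.80
```

Under this reading, a digital twin's agreement with a participant's answers could be compared against how consistently that participant agrees with themselves.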

large language model, machine learning, natural language, (19 more...)

2505.17479

Country: North America > United States (1.00)

Genre:

Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.94)

Industry:

Leisure & Entertainment (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Consumer Health (1.00)
(14 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Informatics for Food Processing

Ispirova, Gordana, Sebek, Michael, Menichetti, Giulia

This chapter explores the evolution, classification, and health implications of food processing, while emphasizing the transformative role of machine learning, artificial intelligence (AI), and data science in advancing food informatics. It begins with a historical overview and a critical review of traditional classification frameworks such as NOVA, Nutri-Score, and SIGA, highlighting their strengths and limitations, particularly the subjectivity and reproducibility challenges that hinder epidemiological research and public policy. To address these issues, the chapter presents novel computational approaches, including FoodProX, a random forest model trained on nutrient composition data to infer processing levels and generate a continuous FPro score. It also explores how large language models like BERT and BioBERT can semantically embed food descriptions and ingredient lists for predictive tasks, even in the presence of missing data. A key contribution of the chapter is a novel case study using the Open Food Facts database, showcasing how multimodal AI models can integrate structured and unstructured data to classify foods at scale, offering a new paradigm for food processing assessment in public health and research.

large language model, machine learning, natural language, (20 more...)

2505.17087

Country:

North America > United States (1.00)
Europe (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Research Report > Strength High (0.93)
Overview (0.85)

Industry:

Materials > Chemicals (1.00)
Law (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
(11 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

arXiv.org Artificial Intelligence May-23-2025

PolyConf: Unlocking Polymer Conformation Generation through Hierarchical Generative Models

Wang, Fanmeng, Guo, Wentao, Ou, Qi, Wang, Hongshuai, Lin, Haitao, Xu, Hongteng, Gao, Zhifeng

Polymer conformation generation is a critical task that enables atomic-level studies of diverse polymer materials. While significant advances have been made in designing conformation generation methods for small molecules and proteins, these methods struggle to generate polymer conformations due to their unique structural characteristics. Meanwhile, the scarcity of polymer conformation datasets further limits the progress, making this important area largely unexplored. In this work, we propose PolyConf, a pioneering tailored polymer conformation generation method that leverages hierarchical generative models to unlock new possibilities. Specifically, we decompose the polymer conformation into a series of local conformations (i.e., the conformations of its repeating units), generating these local conformations through an autoregressive model, and then generating their orientation transformations via a diffusion model to assemble them into the complete polymer conformation. Moreover, we develop the first benchmark with a high-quality polymer conformation dataset derived from molecular dynamics simulations to boost related research in this area. The comprehensive evaluation demonstrates that PolyConf consistently outperforms existing conformation generation methods, thus facilitating advancements in polymer modeling and simulation.

artificial intelligence, machine learning, natural language, (15 more...)

2504.08859

Country: Asia > China (0.28)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Materials > Chemicals > Commodity Chemicals (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Daily Mail - Science & tech May-22-2025, 20:39:23 GMT

Hidden city built 5,000 years ago by lost advanced civilization discovered underneath vast desert

For centuries, the Rub' al-Khali desert near Saudi Arabia and Dubai -- known as the Empty Quarter -- was dismissed as a lifeless sea of sand. In 2002, Sheikh Mohammed bin Rashid Al Maktoum, ruler of Dubai, spotted unusual dune formations and a large black deposit while flying over the desert. That led to the discovery of Saruq Al-Hadid, an archaeological site rich in remnants of copper and iron smelting, which is now believed to be part of a 5,000-year-old civilization buried beneath the sands. Researchers have now found traces of this ancient society approximately 10 feet beneath the desert surface, hidden in plain sight and long overlooked due to the harsh environment and shifting dunes of the Empty Quarter. This discovery brings fresh life to the legend of a mythical city known as 'Atlantis of the Sands.'

civilization, desert, saruq al-hadid, (15 more...)

Daily Mail - Science & tech

Country:

Asia > Middle East > Saudi Arabia > Eastern Province > Rub' al Khali (0.83)
Asia > Middle East > UAE > Dubai Emirate > Dubai (0.47)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.06)
Asia > Middle East > Israel > Haifa District > Haifa (0.05)

Genre: Research Report (0.72)

Industry: Materials (0.37)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.31)

arXiv.org Artificial Intelligence May-22-2025

AI-based Decision Support System for Heritage Aircraft Corrosion Prevention

Kuchař, Michal, Fišer, Jaromír, Oswald, Cyril, Vyhlídal, Tomáš

The paper presents a decision support system (DSS) for the long-term preservation of aeronautical heritage exhibited or stored in sheltered sites. Aeronautical heritage is characterized by the diversity of its constituent materials: heritage aircraft are made of ancient aluminum alloys, (ply)wood, and particularly fabrics. The DSS, designed starting from a conceptual model, is built on knowledge of the degradation/corrosion mechanisms of the prevailing materials of aeronautical heritage. For the wooden parts of historical aircraft, this knowledge base is populated with the damage function models developed within former European projects. Model-based corrosion prediction is implemented within the new DSS for ancient aluminum alloys. The novelty of this DSS lies in supporting multi-material heritage protection and in its tailoring to the peculiarities of aircraft exhibition/storage hangars and the needs of aviation museums. The novel DSS is tested on WWII aircraft heritage exhibited in the Aviation Museum Kbely, Military History Institute Prague, Czech Republic.

decision support system, knowledge management, machine learning, (15 more...)

2505.15462

Country:

North America > United States (0.46)
Europe > Czechia > Prague (0.27)

Genre: Research Report (1.00)

Industry:

Transportation > Air (0.69)
Government > Regional Government (0.68)
Materials > Metals & Mining > Aluminum (0.55)

Technology:

Information Technology > Decision Support Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.71)
Information Technology > Knowledge Management > Knowledge Engineering (0.54)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.54)

arXiv.org Artificial Intelligence May-21-2025

Autonomous nanoparticle synthesis by design

Anker, Andy S., Jensen, Jonas H., Gonzalez-Duque, Miguel, Moreno, Rodrigo, Smolska, Aleksandra, Juelsholt, Mikkel, Hardion, Vincent, Jorgensen, Mads R. V., Faina, Andres, Quinson, Jonathan, Stoy, Kasper, Vegge, Tejs

Controlled synthesis of materials with specified atomic structures underpins technological advances yet remains reliant on iterative, trial-and-error approaches. Nanoparticles (NPs), whose atomic arrangement dictates their emergent properties, are particularly challenging to synthesise due to numerous tunable parameters. Here, we introduce an autonomous approach explicitly targeting synthesis of atomic-scale structures. Our method autonomously designs synthesis protocols by matching real time experimental total scattering (TS) and pair distribution function (PDF) data to simulated target patterns, without requiring prior synthesis knowledge. We demonstrate this capability at a synchrotron, successfully synthesising two structurally distinct gold NPs: 5 nm decahedral and 10 nm face-centred cubic structures. Ultimately, specifying a simulated target scattering pattern, thus representing a bespoke atomic structure, and obtaining both the synthesised material and its reproducible synthesis protocol on demand may revolutionise materials design. Thus, ScatterLab provides a generalisable blueprint for autonomous, atomic structure-targeted synthesis across diverse systems and applications.

experiment, machine learning, real time system, (20 more...)

2505.13571

Country:

Europe > United Kingdom (0.46)
Europe > Denmark (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Energy (1.00)
Information Technology (0.93)
Materials > Chemicals > Commodity Chemicals (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Architecture > Real Time Systems (1.00)

arXiv.org Artificial Intelligence May-21-2025

LLM Context Conditioning and PWP Prompting for Multimodal Validation of Chemical Formulas

Markhasin, Evgeny

Identifying subtle technical errors within complex scientific and technical documents, especially those requiring multimodal interpretation (e.g., formulas in images), presents a significant hurdle for Large Language Models (LLMs) whose inherent error-correction tendencies can mask inaccuracies. This exploratory proof-of-concept (PoC) study investigates structured LLM context conditioning, informed by Persistent Workflow Prompting (PWP) principles, as a methodological strategy to modulate this LLM behavior at inference time. The approach is designed to enhance the reliability of readily available, general-purpose LLMs (specifically Gemini 2.5 Pro and ChatGPT Plus o3) for precise validation tasks, crucially relying only on their standard chat interfaces without API access or model modifications. To explore this methodology, we focused on validating chemical formulas within a single, complex test paper with known textual and image-based errors. Several prompting strategies were evaluated: while basic prompts proved unreliable, an approach adapting PWP structures to rigorously condition the LLM's analytical mindset appeared to improve textual error identification with both models. Notably, this method also guided Gemini 2.5 Pro to repeatedly identify a subtle image-based formula error previously overlooked during manual review, a task where ChatGPT Plus o3 failed in our tests. These preliminary findings highlight specific LLM operational modes that impede detail-oriented validation and suggest that PWP-informed context conditioning offers a promising and highly accessible technique for developing more robust LLM-driven analytical workflows, particularly for tasks requiring meticulous error detection in scientific and technical documents. Extensive validation beyond this limited PoC is necessary to ascertain broader applicability. Keywords: AI-assisted, AI-powered, AI-enhanced, automated, knowledge engineering, machine learning.

formula, large language model, machine learning, (18 more...)

2505.12257

Genre:

Research Report (0.64)
Workflow (0.56)

Industry:

Law (0.68)
Materials > Chemicals (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)