AITopics

2505.17039

Country:

Europe > United Kingdom (0.28)
South America > Brazil > Rio Grande do Sul (0.15)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Consumer Products & Services > Food, Beverage, Tobacco & Cannabis > Beverages (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Ebrahimi, Seyedeh Fatemeh, Peltonen, Jaakko

Constrained Non-negative Matrix Factorization for Guided Topic Modeling of Minority Topics

arXiv.org Artificial Intelligence, May-23-2025

Topic models often fail to capture low-prevalence, domain-critical themes, so-called minority topics, such as mental health themes in online comments. While some existing methods can incorporate domain knowledge, such as expected topical content, methods allowing guidance may require overly detailed expected topics, hindering the discovery of topic divisions and variation. We propose a topic modeling solution via a specially constrained NMF. We incorporate a seed word list characterizing minority content of interest, but we do not require experts to pre-specify their division across minority topics. Through prevalence constraints on minority topics and seed word content across topics, we learn distinct data-driven minority topics as well as majority topics. The constrained NMF is fitted via Karush-Kuhn-Tucker (KKT) conditions with multiplicative updates. We outperform several baselines on synthetic data in terms of topic purity and normalized mutual information, and we also evaluate topic quality using Jensen-Shannon divergence (JSD). We conduct a case study on YouTube vlog comments, analyzing viewer discussion of mental health content; our model successfully identifies and reveals this domain-relevant minority content.
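For readers unfamiliar with multiplicative updates, the sketch below shows the classic unconstrained NMF updates for the squared-error objective (the Lee and Seung rules), which approximate a non-negative matrix V by a product W H. It is not the constrained method from the abstract: the prevalence and seed-word constraints are omitted, and the function names, the `eps` guard, and the toy matrix are assumptions made for illustration.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - p) ** 2 for rv, rp in zip(V, WH) for v, p in zip(rv, rp))

def nmf_multiplicative(V, k, iters=200, seed=0, eps=1e-9):
    """Plain (unconstrained) NMF, V ~ W H with W, H >= 0.

    Hypothetical helper for illustration only; eps avoids division by zero.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H

# Toy document-term matrix (4 documents, 3 terms); values are made up.
V = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0], [2.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
W, H = nmf_multiplicative(V, k=2)
print(frobenius_error(V, W, H))
```

Because each update only multiplies by a non-negative ratio, W and H stay non-negative without any projection step. This is also why such rules are commonly derived from the KKT conditions of the constrained problem.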

constraint, machine learning, natural language, (17 more...)

2505.16493

Country:

Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(17 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.69)
(2 more...)

PoseX: AI Defeats Physics Approaches on Protein-Ligand Cross Docking

Jiang, Yize, Li, Xinze, Zhang, Yuanyuan, Han, Jin, Xu, Youjun, Pandit, Ayush, Zhang, Zaixi, Wang, Mengdi, Wang, Mengyang, Liu, Chong, Yang, Guang, Choi, Yejin, Li, Wu-Jun, Fu, Tianfan, Wu, Fang, Liu, Junhong

Existing protein-ligand docking studies typically focus on the self-docking scenario, which is less practical in real applications. Moreover, some studies involve heavy frameworks requiring extensive training, posing challenges for convenient and efficient assessment of docking methods. To fill these gaps, we design PoseX, an open-source benchmark to evaluate both self-docking and cross-docking, enabling a practical and comprehensive assessment of algorithmic advances. Specifically, first, we curated a novel dataset comprising 718 entries for self-docking and 1,312 entries for cross-docking; second, we incorporated 23 docking methods in three methodological categories, including physics-based methods (e.g., Schrödinger Glide), AI docking methods (e.g., DiffDock) and AI co-folding methods (e.g., AlphaFold3); third, we developed a relaxation method for post-processing to minimize conformational energy and refine binding poses; fourth, we built a leaderboard to rank submitted models in real-time. We derived some key insights and conclusions from extensive experiments: (1) AI approaches have consistently outperformed physics-based methods in overall docking success rate. (2) Most intra- and intermolecular clashes of AI approaches can be greatly alleviated with relaxation, which means combining AI modeling with physics-based post-processing could achieve excellent performance. (3) AI co-folding methods exhibit ligand chirality issues, except for Boltz-1x, which introduced physics-inspired potentials to fix hallucinations, suggesting that modeling stereochemistry markedly improves structural plausibility. (4) Specifying binding pockets significantly improves docking performance, indicating that pocket information can be leveraged adequately, particularly for AI co-folding methods, in future modeling efforts. The code, dataset, and leaderboard are released at https://github.com/CataAI/PoseX.

ai co-folding method, machine learning, natural language, (19 more...)

2505.017

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > Canada (0.04)
Europe > Germany > Rheinland-Pfalz > Mainz (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)

Radmehr, Bahar, Shved, Ekaterina, Güreş, Fatma Betül, Singla, Adish, Käser, Tanja

ClickSight: Interpreting Student Clickstreams to Reveal Insights on Learning Strategies via LLMs

Clickstream data from digital learning environments offer valuable insights into students' learning behaviors, but are challenging to interpret due to their high dimensionality and granularity. Prior approaches have relied mainly on handcrafted features, expert labeling, clustering, or supervised models, therefore often lacking generalizability and scalability. In this work, we introduce ClickSight, an in-context Large Language Model (LLM)-based pipeline that interprets student clickstreams to reveal their learning strategies. ClickSight takes raw clickstreams and a list of learning strategies as input and generates textual interpretations of students' behaviors during interaction. We evaluate four different prompting strategies and investigate the impact of self-refinement on interpretation quality. Our evaluation spans two open-ended learning environments and uses a rubric-based domain-expert evaluation. Results show that while LLMs can reasonably interpret learning strategies from clickstreams, interpretation quality varies by prompting strategy, and self-refinement offers limited improvement. ClickSight demonstrates the potential of LLMs to generate theory-driven insights from educational interaction data.

artificial intelligence, large language model, natural language, (16 more...)

2505.1541

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > Switzerland (0.05)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
(5 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Education > Educational Technology > Educational Software (0.47)
Education > Educational Setting > Online (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Rodriguez-Pardo, Carlos, Chiani, Leonardo, Borgonovo, Emanuele, Tavoni, Massimo

Neural Conditional Transport Maps

arXiv.org Machine Learning, May-22-2025

We present a neural framework for learning conditional optimal transport (OT) maps between probability distributions. Our approach introduces a conditioning mechanism capable of processing both categorical and continuous conditioning variables simultaneously. At the core of our method lies a hypernetwork that generates transport layer parameters based on these inputs, creating adaptive mappings that outperform simpler conditioning methods. Comprehensive ablation studies demonstrate the superior performance of our method over baseline configurations. Furthermore, we showcase an application to global sensitivity analysis, offering high performance in computing OT-based sensitivity indices. This work advances the state-of-the-art in conditional optimal transport, enabling broader application of optimal transport principles to complex, high-dimensional domains such as generative modeling and black-box model explainability.

artificial intelligence, machine learning, optimal transport, (17 more...)

2505.15808

Country:

South America > Colombia (0.04)
North America > Canada (0.04)
Europe > Spain (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Tabell, Otto, Tikka, Santtu, Karvanen, Juha

Clustering and Pruning in Causal Data Fusion

arXiv.org Machine Learning, May-22-2025

Data fusion--the process of combining observational and experimental data--can enable the identification of causal effects that would otherwise remain non-identifiable. Although identification algorithms have been developed for specific scenarios, do-calculus remains the only general-purpose tool for causal data fusion, particularly when variables are present in some data sources but not others. However, approaches based on do-calculus may encounter computational challenges as the number of variables increases and the causal graph grows in complexity. Consequently, there exists a need to reduce the size of such models while preserving the essential features. For this purpose, we propose pruning (removing unnecessary variables) and clustering (combining variables) as preprocessing operations for causal data fusion. We generalize earlier results on a single data source and derive conditions for applying pruning and clustering in the case of multiple data sources. We give sufficient conditions for inferring the identifiability or non-identifiability of a causal effect in a larger graph based on a smaller graph and show how to obtain the corresponding identifying functional for identifiable causal effects. Examples from epidemiology and social science demonstrate the use of the results.

artificial intelligence, information fusion, input distribution, (14 more...)

2505.15215

Country:

Europe > Austria > Vienna (0.14)
South America > Brazil (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Finland > Central Finland > Jyväskylä (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.67)

Technology:

Information Technology > Data Science > Data Integration (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)

Santos, Pedro P., Sardinha, Alberto, Melo, Francisco S.

Solving General-Utility Markov Decision Processes in the Single-Trial Regime with Online Planning

In this work, we contribute the first approach to solve infinite-horizon discounted general-utility Markov decision processes (GUMDPs) in the single-trial regime, i.e., when the agent's performance is evaluated based on a single trajectory. First, we provide some fundamental results regarding policy optimization in the single-trial regime, investigating which class of policies suffices for optimality, casting our problem as a particular MDP that is equivalent to our original problem, as well as studying the computational hardness of policy optimization in the single-trial regime. Second, we show how we can leverage online planning techniques, in particular a Monte-Carlo tree search algorithm, to solve GUMDPs in the single-trial regime. Third, we provide experimental results showcasing the superior performance of our approach in comparison to relevant baselines.

gumdp, machine learning, reinforcement learning, (17 more...)

2505.15782

Country:

Europe (0.28)
South America > Brazil (0.28)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Ferreira, Alessandro dos Santos, Ramos, Ana Paula Marques, Junior, José Marcato, Gonçalves, Wesley Nunes

Data Augmentation and Resolution Enhancement using GANs and Diffusion Models for Tree Segmentation

Urban forests play a key role in enhancing environmental quality and supporting biodiversity in cities. Mapping and monitoring these green spaces are crucial for urban planning and conservation, yet accurately detecting trees is challenging due to complex landscapes and the variability in image resolution caused by different satellite sensors or UAV flight altitudes. While deep learning architectures have shown promise in addressing these challenges, their effectiveness remains strongly dependent on the availability of large and manually labeled datasets, which are often expensive and difficult to obtain in sufficient quantity. In this work, we propose a novel pipeline that integrates domain adaptation with GANs and Diffusion models to enhance the quality of low-resolution aerial images. Our proposed pipeline enhances low-resolution imagery while preserving semantic content, enabling effective tree segmentation without requiring large volumes of manually annotated data. Leveraging models such as pix2pix, Real-ESRGAN, Latent Diffusion, and Stable Diffusion, we generate realistic and structurally consistent synthetic samples that expand the training dataset and unify scale across domains. This approach not only improves the robustness of segmentation models across different acquisition conditions but also provides a scalable and replicable solution for remote sensing scenarios with scarce annotation resources. Experimental results demonstrated an improvement of over 50% in IoU for low-resolution images, highlighting the effectiveness of our method compared to traditional pipelines.
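Since the reported result is stated in terms of IoU (intersection over union), a minimal sketch of that metric for binary segmentation masks may help. The function name, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions for illustration. The abstract does not specify how the authors compute or aggregate IoU.

```python
def iou(pred, target):
    """Intersection over union for two binary masks given as lists of rows.

    Hypothetical helper: returns 1.0 when both masks are empty (an assumed
    convention; some toolkits return 0.0 or NaN instead).
    """
    inter = 0
    union = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            inter += 1 if (p and t) else 0   # pixel marked as tree in both
            union += 1 if (p or t) else 0    # pixel marked as tree in either
    return inter / union if union else 1.0

# Toy 2x3 masks (1 = tree pixel); values are made up.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(iou(pred, target))  # 2 shared pixels / 4 pixels in the union = 0.5
```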

artificial intelligence, deep learning, machine learning, (20 more...)

2505.15077

Country: South America > Brazil (0.29)

Genre: Research Report > New Finding (0.68)

Industry:

Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.36)
Information Technology (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

The Atlantic - Technology, May-21-2025, 12:00:00 GMT

What AI Thinks It Knows About You

Large language models such as GPT, Llama, Claude, and DeepSeek can be so fluent that people experience them as a "you," and they answer encouragingly as an "I." The models can write poetry in nearly any given form, read a set of political speeches and promptly sift out and share all the jokes, draw a chart, code a website. How do they do these and so many other things that were just recently the sole realm of humans? Practitioners are left explaining jaw-dropping conversational rabbit-from-a-hat extractions with arm-waving that the models are just predicting one word at a time from an unthinkably large training set scraped from every recorded written or spoken human utterance that can be found--fair enough--or with a small shrug and a cryptic utterance of "fine-tuning" or "transformers!" These aren't very satisfying answers for how these models can converse so intelligently, and how they sometimes err so weirdly.

claude, golden gate bridge, language model, (14 more...)

Country:

Pacific Ocean > North Pacific Ocean > San Francisco Bay > Golden Gate (0.07)
South America > Brazil (0.04)

Industry: Transportation > Air (0.41)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

The Guardian, May-21-2025, 04:00:16 GMT

'Every person that clashed with him has left': the rise, fall and spectacular comeback of Sam Altman

The short-lived firing of Sam Altman, the CEO of possibly the world's most important AI company, was sensational. When he was sacked by OpenAI's board members, some of them believed the stakes could not have been higher – the future of humanity – if the organisation continued under Altman. Imagine Succession, with added apocalypse vibes. In early November 2023, after three weeks of secret calls and varying degrees of paranoia, the OpenAI board agreed: Altman had to go. After his removal, Altman's most loyal staff resigned, and others signed an open letter calling for his reinstatement.

altman, hao, openai, (14 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.05)
South America > Colombia (0.04)
South America > Chile (0.04)
Africa > Kenya (0.04)

Industry:

Law (1.00)
Information Technology > Security & Privacy (0.47)
Health & Medicine > Consumer Health (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.79)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.64)