AITopics

Communications of the ACM, Jan-21-2025, 18:44:45 GMT

Fit for People, Fit for Purpose: Designing Tech that Matters

I didn't intend to pursue computer science. I was a midwife, focused on women-centered care in our private practice--my third career after trying social work and nursing. I only went to the faculty dean to discuss how I might focus my part-time science degree if I were to go full-time. I left his office as part of the first cohort in the new Information Technology (IT) program at the University of Queensland, and I finished my degree with a university medal. My motivation was not love of technology.

artificial intelligence, designing tech, natural language, (8 more...)

Communications of the ACM

Country:

Oceania > Australia > Queensland (0.37)
Europe > Austria (0.34)
Europe > United Kingdom > England > Greater London > London (0.16)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Communications (0.73)
Information Technology > Artificial Intelligence > Natural Language (0.31)

Regressor-Guided Image Editing Regulates Emotional Response to Reduce Online Engagement

Gebhardt, Christoph, Willardt, Robin, Sadat, Seyedmorteza, Ning, Chih-Wei, Brombach, Andreas, Song, Jie, Hilliges, Otmar, Holz, Christian

Emotions are known to mediate the relationship between users' content consumption and their online engagement, with heightened emotional intensity leading to increased engagement. Building on this insight, we propose three regressor-guided image editing approaches aimed at diminishing the emotional impact of images. These include (i) a parameter optimization approach based on global image transformations known to influence emotions, (ii) an optimization approach targeting the style latent space of a generative adversarial network, and (iii) a diffusion-based approach employing classifier guidance and classifier-free guidance. Our findings demonstrate that these approaches can effectively alter the emotional properties of images while maintaining high visual quality. Optimization-based methods primarily adjust low-level properties like color hues and brightness, whereas the diffusion-based approach introduces semantic changes, such as altering appearance or facial expressions. Notably, results from a behavioral study reveal that only the diffusion-based approach successfully elicits changes in viewers' emotional responses while preserving high perceived image quality. In future work, we will investigate the impact of these image adaptations on internet user behavior.

machine learning, natural language, optimization, (16 more...)

2501.12289

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
Oceania > Australia (0.04)
(6 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Media (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(3 more...)

How Does the Spatial Distribution of Pre-training Data Affect Geospatial Foundation Models?

Purohit, Mirali, Muhawenayo, Gedeon, Rolf, Esther, Kerner, Hannah

Foundation models have made rapid advances in many domains including Earth observation, where Geospatial Foundation Models (GFMs) can help address global challenges such as climate change, agriculture, and disaster response. Previous work on GFMs focused on tailoring model architecture and pretext tasks, and did not investigate the impact of pre-training data selection on model performance. However, recent work in other domains shows that the pre-training data distribution is an important factor influencing the performance of foundation models. With this motivation, our research explores how the geographic distribution of pre-training data affects the performance of GFMs. We evaluated several pre-training data distributions by sampling different compositions from a global data pool. Our experiments with two GFMs on downstream tasks indicate that balanced and globally representative data compositions often outperform region-specific sampling, highlighting the importance of diversity and global coverage in pre-training data. Our results suggest that the most appropriate data sampling technique may depend on the specific GFM architecture. These findings will support the development of robust GFMs by incorporating quality pre-training data distributions, ultimately improving machine learning solutions for Earth observation.
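The contrast between balanced and region-specific compositions drawn from one global pool can be illustrated with a short sketch. This is not the authors' sampling code: the `(region, sample id)` representation, the function names, and the fixed seed are assumptions made only for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation of one pool entry: (region name, sample id).
Sample = Tuple[str, int]


def balanced_sample(pool: Sequence[Sample], per_region: int, seed: int = 0) -> List[Sample]:
    """Draw up to `per_region` entries from every region in the pool."""
    rng = random.Random(seed)
    by_region: Dict[str, List[Sample]] = defaultdict(list)
    for entry in pool:
        by_region[entry[0]].append(entry)
    chosen: List[Sample] = []
    for region in sorted(by_region):  # sorted for reproducible ordering
        items = by_region[region]
        chosen.extend(rng.sample(items, min(per_region, len(items))))
    return chosen


def region_specific_sample(pool: Sequence[Sample], region: str, n: int, seed: int = 0) -> List[Sample]:
    """Draw up to `n` entries from a single region only."""
    rng = random.Random(seed)
    items = [entry for entry in pool if entry[0] == region]
    return rng.sample(items, min(n, len(items)))


# Toy pool: Europe is over-represented relative to Oceania.
pool = [("Europe", i) for i in range(10)] + [("Oceania", i) for i in range(3)]
balanced = balanced_sample(pool, per_region=2)
europe_only = region_specific_sample(pool, "Europe", n=4)
```

Both compositions contain four entries here, but only the balanced one covers every region in the pool.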

artificial intelligence, cropharvest, machine learning, (14 more...)

2501.12535

Country:

South America (0.04)
Oceania (0.04)
Europe (0.04)
(7 more...)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Large-image Object Detection for Fine-grained Recognition of Punches Patterns in Medieval Panel Painting

Bruegger, Josh, Catana, Diana Ioana, Macovaz, Vanja, Valdenegro-Toro, Matias, Sabatelli, Matthia, Zullich, Marco

The attribution of the author of an art piece is typically a laborious manual process, usually relying on subjective evaluations of expert figures. However, there are some situations in which quantitative features of the artwork can support these evaluations. The extraction of these features can sometimes be automated, for instance, with the use of Machine Learning (ML) techniques. An example of these features is represented by repeated, mechanically impressed patterns, called punches, present chiefly in 13th and 14th-century panel paintings from Tuscany. Previous research in art history showcased a strong connection between the shapes of punches and specific artists or workshops, suggesting the possibility of using these quantitative cues to support the attribution. In the present work, we first collect a dataset of large-scale images of these panel paintings. Then, using YOLOv10, a recent and popular object detection model, we train an ML pipeline to perform object detection on the punches contained in the images. Due to the large size of the images, the detection procedure is split across multiple frames by adopting a sliding-window approach with overlaps, after which the predictions are combined for the whole image using a custom non-maximal suppression routine. Our results indicate that art historians working in the field can reliably use our method for the identification and extraction of punches.
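The sliding-window-then-merge idea described above can be sketched in a few lines. This is a generic illustration, not the authors' custom routine: the window and stride values, the `(x1, y1, x2, y2, score)` box layout, the helper names, and the plain greedy suppression are all assumptions, and the detector itself is left out.

```python
from typing import List, Tuple

# Assumed box layout: (x1, y1, x2, y2, score) in pixel coordinates.
Box = Tuple[float, float, float, float, float]


def window_origins(length: int, window: int, stride: int) -> List[int]:
    """Start offsets of overlapping windows that together cover [0, length)."""
    if length <= window:
        return [0]
    origins = list(range(0, length - window + 1, stride))
    if origins[-1] != length - window:
        origins.append(length - window)  # last window sits flush with the edge
    return origins


def to_global(box: Box, x0: int, y0: int) -> Box:
    """Shift a window-local detection into whole-image coordinates."""
    x1, y1, x2, y2, score = box
    return (x1 + x0, y1 + y0, x2 + x0, y2 + y0, score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], iou_threshold: float = 0.5) -> List[Box]:
    """Greedy suppression: keep the highest-scoring box of each overlapping group."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) <= iou_threshold for k in kept):
            kept.append(box)
    return kept


# The same punch seen in two overlapping windows, plus one distinct punch.
detections = [
    to_global((0, 0, 10, 10, 0.9), 0, 0),
    to_global((1, 1, 11, 11, 0.8), 0, 0),
    to_global((0, 0, 10, 10, 0.7), 50, 50),
]
merged = nms(detections)
```

In a full pipeline, each `(x0, y0)` pair from `window_origins` would select a crop passed to the detector, and `to_global` would be applied before the single merge step over the whole image.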

artificial intelligence, machine learning, prediction, (15 more...)

2501.12489

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)
Oceania > Australia > Queensland > Brisbane (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

A causal learning approach to in-orbit inertial parameter estimation for multi-payload deployers

Platanitis, Konstantinos, Arana-Catania, Miguel, Upadhyay, Saurabh, Felicetti, Leonard

This paper discusses an approach to inertial parameter estimation for cargo-carrying spacecraft that is based on causal learning, i.e., learning from the responses of the spacecraft under actuation. Different spacecraft configurations (inertial parameter sets) are simulated under different actuation profiles, in order to produce an optimised time-series clustering classifier that can be used to distinguish between them. The actuation consists of finite sequences of constant inputs that are applied in order, based on typical actuators available. By learning from the system's responses across multiple input sequences, and then applying measures of time-series similarity and F1-score, an optimal actuation sequence can be chosen either for one specific system configuration or for the overall set of possible configurations. This allows for both estimation of the inertial parameter set without any prior knowledge of state, as well as validation of transitions between different configurations after a deployment event. The optimisation of the actuation sequence is handled by a reinforcement learning model that uses the proximal policy optimisation (PPO) algorithm, by repeatedly trying different sequences and evaluating the impact on classifier performance according to a multi-objective metric.

artificial intelligence, machine learning, spacecraft, (14 more...)

2501.14824

Country:

Europe > Italy > Lombardy > Milan (0.06)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States (0.04)
(2 more...)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Modality Interactive Mixture-of-Experts for Fake News Detection

Liu, Yifan, Liu, Yaokun, Li, Zelin, Yao, Ruichen, Zhang, Yang, Wang, Dong

The proliferation of fake news on social media platforms disproportionately impacts vulnerable populations, eroding trust, exacerbating inequality, and amplifying harmful narratives. Detecting fake news in multimodal contexts -- where deceptive content combines text and images -- is particularly challenging due to the nuanced interplay between modalities. Existing multimodal fake news detection methods often emphasize cross-modal consistency but ignore the complex interactions between text and visual elements, which may complement, contradict, or independently influence the predicted veracity of a post. To address these challenges, we present Modality Interactive Mixture-of-Experts for Fake News Detection (MIMoE-FND), a novel hierarchical Mixture-of-Experts framework designed to enhance multimodal fake news detection by explicitly modeling modality interactions through an interaction gating mechanism. Our approach models these interactions by evaluating two key aspects: unimodal prediction agreement and semantic alignment. The hierarchical structure of MIMoE-FND allows for distinct learning pathways tailored to different fusion scenarios, adapting to the unique characteristics of each modality interaction. By tailoring fusion strategies to diverse modality interaction scenarios, MIMoE-FND provides a more robust and nuanced approach to multimodal fake news detection. We evaluate our approach on three real-world benchmarks spanning two languages, demonstrating its superior performance compared to state-of-the-art methods. By enhancing the accuracy and interpretability of fake news detection, MIMoE-FND offers a promising tool to mitigate the spread of misinformation, with the potential to better safeguard vulnerable communities against its harmful effects.

data mining, machine learning, natural language, (13 more...)

2501.12431

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Oceania > Australia > New South Wales > Sydney (0.05)
(15 more...)

Genre: Research Report > Promising Solution (0.48)

Industry: Media > News (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Multi-Instance Partial-Label Learning with Margin Adjustment

Tang, Wei, Yang, Yin-Fang, Wang, Zhaofei, Zhang, Weijia, Zhang, Min-Ling

Multi-instance partial-label learning (MIPL) is an emerging learning framework where each training sample is represented as a multi-instance bag associated with a candidate label set. Existing MIPL algorithms often overlook the margins for attention scores and predicted probabilities, leading to suboptimal generalization performance. A critical issue with these algorithms is that the highest prediction probability of the classifier may appear on a non-candidate label. In this paper, we propose an algorithm named MIPLMA, i.e., Multi-Instance Partial-Label learning with Margin Adjustment, which adjusts the margins for attention scores and predicted probabilities. We introduce a margin-aware attention mechanism to dynamically adjust the margins for attention scores and propose a margin distribution loss to constrain the margins between the predicted probabilities on candidate and non-candidate label sets. Experimental results demonstrate the superior performance of MIPLMA over existing MIPL algorithms, as well as other well-established multi-instance learning algorithms and partial-label learning algorithms.
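The failure case named above, where the highest predicted probability lands on a non-candidate label, can be made concrete with a small sketch. This is an illustrative margin computation only, not the MIPLMA loss or attention mechanism; the function name and the convention of treating an empty non-candidate set as probability 0.0 are assumptions.

```python
from typing import Sequence, Set


def candidate_margin(probs: Sequence[float], candidates: Set[int]) -> float:
    """Largest candidate probability minus largest non-candidate probability.

    A negative value means the classifier's top prediction is a
    non-candidate label.
    """
    if not candidates:
        raise ValueError("candidate label set must not be empty")
    best_candidate = max(probs[i] for i in candidates)
    non_candidates = [p for i, p in enumerate(probs) if i not in candidates]
    # Assumption: with no non-candidate labels, compare against 0.0.
    best_non_candidate = max(non_candidates) if non_candidates else 0.0
    return best_candidate - best_non_candidate


# Three classes; the bag's candidate label set is {0, 1}.
bad = candidate_margin([0.25, 0.25, 0.5], {0, 1})    # top label is outside the set
good = candidate_margin([0.5, 0.25, 0.25], {0, 1})   # top label is inside the set
```

A training objective that pushes this quantity upward would discourage predictions outside the candidate set; how MIPLMA constrains the margin distribution is specified in the paper itself.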

algorithm, artificial intelligence, machine learning, (15 more...)

2501.12597

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.28)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(16 more...)

Genre: Research Report > New Finding (0.87)

Industry: Health & Medicine > Diagnostic Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

UI-TARS: Pioneering Automated GUI Interaction with Native Agents

Qin, Yujia, Ye, Yining, Fang, Junjie, Wang, Haoming, Liang, Shihao, Tian, Shizuo, Zhang, Junda, Li, Jiahao, Li, Yunxin, Huang, Shijue, Zhong, Wanjun, Li, Kuanye, Yang, Jiale, Miao, Yu, Lin, Woyu, Liu, Longxiang, Jiang, Xu, Ma, Qianli, Li, Jingyu, Xiao, Xiaojun, Cai, Kai, Li, Chuang, Zheng, Yaowei, Jin, Chaolin, Li, Chen, Zhou, Xiao, Wang, Minchao, Chen, Haoli, Li, Zhaojian, Yang, Haihua, Liu, Haifeng, Lin, Feng, Peng, Tao, Liu, Xin, Shi, Guang

This paper introduces UI-TARS, a native GUI agent model that perceives only screenshots as input and performs human-like interactions (e.g., keyboard and mouse operations). Unlike prevailing agent frameworks that depend on heavily wrapped commercial models (e.g., GPT-4o) with expert-crafted prompts and workflows, UI-TARS is an end-to-end model that outperforms these sophisticated frameworks. Experiments demonstrate its superior performance: UI-TARS achieves SOTA performance in 10+ GUI agent benchmarks evaluating perception, grounding, and GUI task execution (see below). Notably, in the OSWorld benchmark, UI-TARS achieves scores of 24.6 with 50 steps and 22.7 with 15 steps, outperforming Claude's 22.0 and 14.9 respectively. In AndroidWorld, UI-TARS achieves 46.6, surpassing GPT-4o's 34.5. UI-TARS incorporates several key innovations: (1) Enhanced Perception, which leverages a large-scale dataset of GUI screenshots for context-aware understanding of UI elements and precise captioning; (2) Unified Action Modeling, which standardizes actions into a unified space across platforms and achieves precise grounding and interaction through large-scale action traces; (3) System-2 Reasoning, which incorporates deliberate reasoning into multi-step decision making, involving multiple reasoning patterns such as task decomposition, reflection thinking, and milestone recognition; and (4) Iterative Training with Reflective Online Traces, which addresses the data bottleneck by automatically collecting, filtering, and reflectively refining new interaction traces on hundreds of virtual machines. Through iterative training and reflection tuning, UI-TARS continuously learns from its mistakes and adapts to unforeseen situations with minimal human intervention. We also analyze the evolution path of GUI agents to guide the further development of this domain.

large language model, machine learning, natural language, (20 more...)

2501.12326

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > Dominican Republic (0.04)
(11 more...)

Genre:

Workflow (1.00)
Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.46)

Industry:

Information Technology (1.00)
Leisure & Entertainment > Games (0.45)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Graphics (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(2 more...)

Goal-oriented Transmission Scheduling: Structure-guided DRL with a Unified Dual On-policy and Off-policy Approach

Chen, Jiazheng, Liu, Wanchun

Goal-oriented communications prioritize application-driven objectives over data accuracy, enabling intelligent next-generation wireless systems. Efficient scheduling in multi-device, multi-channel systems poses significant challenges due to high-dimensional state and action spaces. We address these challenges by deriving key structural properties of the optimal solution to the goal-oriented scheduling problem, incorporating Age of Information (AoI) and channel states. Specifically, we establish the monotonicity of the optimal state value function (a measure of long-term system performance) w.r.t. channel states and prove its asymptotic convexity w.r.t. AoI states. Additionally, we derive the monotonicity of the optimal policy w.r.t. channel states, advancing the theoretical framework for optimal scheduling. Leveraging these insights, we propose the structure-guided unified dual on-off policy DRL (SUDO-DRL), a hybrid algorithm that combines the stability of on-policy training with the sample efficiency of off-policy methods. Through a novel structural property evaluation framework, SUDO-DRL enables effective and scalable training, addressing the complexities of large-scale systems. Numerical results show SUDO-DRL improves system performance by up to 45% and reduces convergence time by 40% compared to state-of-the-art methods. It also effectively handles scheduling in much larger systems, where off-policy DRL fails and on-policy benchmarks exhibit significant performance loss, demonstrating its scalability and efficacy in goal-oriented communications.
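For readers unfamiliar with Age of Information, the state the scheduler tracks can be sketched as follows. This uses the common textbook convention (age grows by one each slot and resets to 1 after a delivery) and a simple oldest-first baseline; it is not SUDO-DRL, it ignores channel states, and it assumes every scheduled transmission succeeds. The function names are hypothetical.

```python
from typing import List, Set


def step_aoi(aoi: List[int], delivered: Set[int]) -> List[int]:
    """Advance one time slot: reset delivered devices to 1, age the rest by 1."""
    return [1 if i in delivered else age + 1 for i, age in enumerate(aoi)]


def oldest_first_schedule(aoi: List[int], num_channels: int) -> Set[int]:
    """Baseline policy (an assumption, not the paper's): serve the stalest devices."""
    ranked = sorted(range(len(aoi)), key=lambda i: aoi[i], reverse=True)
    return set(ranked[:num_channels])


# Three devices sharing one channel.
aoi = [3, 1, 5]
scheduled = oldest_first_schedule(aoi, num_channels=1)
next_aoi = step_aoi(aoi, scheduled)
```

With N devices and M channels, the number of joint AoI and channel states grows quickly, which is the high-dimensional setting the abstract refers to.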

artificial intelligence, machine learning, reinforcement learning, (20 more...)

2501.11921

Country:

Asia > Middle East > Jordan (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)