Oceania
Reward-Centered ReST-MCTS: A Robust Decision-Making Framework for Robotic Manipulation in High Uncertainty Environments
Monte Carlo Tree Search (MCTS) has emerged as a powerful tool for decision-making in robotics, enabling efficient exploration of large search spaces. However, traditional MCTS methods struggle in environments characterized by high uncertainty and noisy data due to their reliance on final-step reward evaluation. The lack of intermediate feedback during search often results in suboptimal decision-making and computational inefficiencies. This paper introduces Reward-Centered ReST-MCTS, a novel framework that enhances MCTS by incorporating intermediate reward shaping. The core of our approach is the Rewarding Center, which refines search trajectories by dynamically assigning partial rewards using rule-based validation, heuristic guidance, and neural estimation. By integrating these mechanisms, our method enables real-time optimization of search paths, mitigating the effects of error propagation. We evaluate Reward-Centered ReST-MCTS in robotic manipulation tasks under high uncertainty, demonstrating consistent improvements in decision accuracy. Compared to baseline methods, including Chain-of-Thought (CoT) prompting and Vanilla ReST-MCTS, our framework achieves a 2-4% accuracy improvement while maintaining computational feasibility. Ablation studies confirm the effectiveness of intermediate feedback in search refinement, particularly in pruning incorrect decision paths early. Furthermore, robustness tests show that our method retains high performance across varying levels of uncertainty.
AILS-NTUA at SemEval-2025 Task 8: Language-to-Code prompting and Error Fixing for Tabular Question Answering
Evangelatos, Andreas, Filandrianos, Giorgos, Lymperaiou, Maria, Voulodimos, Athanasios, Stamou, Giorgos
In this paper, we present our submission to SemEval-2025 Task 8: Question Answering over Tabular Data. This task, evaluated on the DataBench dataset, assesses Large Language Models' (LLMs) ability to answer natural language questions over structured data while addressing topic diversity and table size limitations in previous benchmarks. We propose a system that employs effective LLM prompting to translate natural language queries into executable code, enabling accurate responses, error correction, and interpretability. Our approach ranks first in both subtasks of the competition in the proprietary model category, significantly outperforming the organizer's baseline.
Cluster weighted models for functional data
We propose a method, funWeightClust, based on a family of parsimonious models for clustering heterogeneous functional linear regression data. These models extend cluster weighted models to functional data, and they allow for multivariate functional responses and predictors. The proposed methodology follows the approach used by the the functional high dimensional data clustering (funHDDC) method. We construct an expectation maximization (EM) algorithm for parameter estimation. Using simulated and benchmark data we show that funWeightClust outperforms funHDDC and several two-steps clustering methods. We also use funWeightClust to analyze traffic patterns in Edmonton, Canada.
It's time to ban Chinese AI app DeepSeek from 'government devices,' state AGs urge Congress
Trump counselor Alina Habba responds to concerns of China buying up American real estate on'The Ingraham Angle.' State attorneys general have joined the growing calls from elected officials urging Congress to pass a law banning the Chinese-owned DeepSeek AI app on all government devices, saying "China is a clear and present danger" to the U.S. "DeepSeek appears to be another tool for Chinese spies to attack America's national security," the letter, signed by 21 attorneys general to House and Senate leaders, said. "Given the Chinese desire to steal America's secrets and the ability of DeepSeek to carry out this theft, Congress should quickly pass legislation to ban DeepSeek on government devices," the letter read. "Congress passed similar legislation two years ago to prevent TikTok from stealing information from our government." Montana AG Austin Knudsen, who drafted the letter, wrote that "China is trying to steal America's secrets. Congress should shut down China's latest Trojan horse by passing the No DeepSeek on Government Devices Act."
Dynamic-KGQA: A Scalable Framework for Generating Adaptive Question Answering Datasets
Dammu, Preetam Prabhu Srikar, Naidu, Himanshu, Shah, Chirag
As question answering (QA) systems advance alongside the rapid evolution of foundation models, the need for robust, adaptable, and large-scale evaluation benchmarks becomes increasingly critical. Traditional QA benchmarks are often static and publicly available, making them susceptible to data contamination and memorization by large language models (LLMs). Consequently, static benchmarks may overestimate model generalization and hinder a reliable assessment of real-world performance. In this work, we introduce Dynamic-KGQA, a scalable framework for generating adaptive QA datasets from knowledge graphs (KGs), designed to mitigate memorization risks while maintaining statistical consistency across iterations. Unlike fixed benchmarks, Dynamic-KGQA generates a new dataset variant on every run while preserving the underlying distribution, enabling fair and reproducible evaluations. Furthermore, our framework provides fine-grained control over dataset characteristics, supporting domain-specific and topic-focused QA dataset generation. Additionally, Dynamic-KGQA produces compact, semantically coherent subgraphs that facilitate both training and evaluation of KGQA models, enhancing their ability to leverage structured knowledge effectively. To align with existing evaluation protocols, we also provide static large-scale train/test/validation splits, ensuring comparability with prior methods. By introducing a dynamic, customizable benchmarking paradigm, Dynamic-KGQA enables a more rigorous and adaptable evaluation of QA systems.
Unseen Fake News Detection Through Casual Debiasing
Gong, Shuzhi, Sinnott, Richard, Qi, Jianzhong, Paris, Cecile
The widespread dissemination of fake news on social media poses significant risks, necessitating timely and accurate detection. However, existing methods struggle with unseen news due to their reliance on training data from past events and domains, leaving the challenge of detecting novel fake news largely unresolved. To address this, we identify biases in training data tied to specific domains and propose a debiasing solution FNDCD. Originating from causal analysis, FNDCD employs a reweighting strategy based on classification confidence and propagation structure regularization to reduce the influence of domain-specific biases, enhancing the detection of unseen fake news. Experiments on real-world datasets with non-overlapping news domains demonstrate FNDCD's effectiveness in improving generalization across domains.
Attention Mechanism based Cognition-level Scene Understanding
Given a question-image input, the Visual Commonsense Reasoning (VCR) model can predict an answer with the corresponding rationale, which requires inference ability from the real world. The VCR task, which calls for exploiting the multi-source information as well as learning different levels of understanding and extensive commonsense knowledge, is a cognition-level scene understanding task. The VCR task has aroused researchers' interest due to its wide range of applications, including visual question answering, automated vehicle systems, and clinical decision support. Previous approaches to solving the VCR task generally rely on pre-training or exploiting memory with long dependency relationship encoded models. However, these approaches suffer from a lack of generalizability and losing information in long sequences. In this paper, we propose a parallel attention-based cognitive VCR network PAVCR, which fuses visual-textual information efficiently and encodes semantic information in parallel to enable the model to capture rich information for cognition-level inference. Extensive experiments show that the proposed model yields significant improvements over existing methods on the benchmark VCR dataset. Moreover, the proposed model provides intuitive interpretation into visual commonsense reasoning.
Malware Detection at the Edge with Lightweight LLMs: A Performance Evaluation
Rondanini, Christian, Carminati, Barbara, Ferrari, Elena, Gaudiano, Antonio, Kundu, Ashish
--The rapid evolution of malware attacks calls for the development of innovative detection methods, especially in resource-constrained edge computing. Traditional detection techniques struggle to keep up with modern malware's sophistication and adaptability, prompting a shift towards advanced methodologies like those leveraging Large Language Models (LLMs) for enhanced malware detection. However, deploying LLMs for malware detection directly at edge devices raises several challenges, including ensuring accuracy in constrained environments and addressing edge devices' energy and computational limits. T o tackle these challenges, this paper proposes an architecture leveraging lightweight LLMs' strengths while addressing limitations like reduced accuracy and insufficient computational power . T o evaluate the effectiveness of the proposed lightweight LLM-based approach for edge computing, we perform an extensive experimental evaluation using several state-of-the-art lightweight LLMs. We test them with several publicly available datasets specifically designed for edge and IoT scenarios and different edge nodes with varying computational power and characteristics. In recent years, the rapid evolution of malware attacks has necessitated the development of innovative approaches for their detection, particularly in the resource-constrained edge computing domain. While foundational, traditional detection techniques have struggled to keep pace with modern malware's increasing sophistication and adaptability. This has prompted a shift towards exploring advanced methodologies, including using lightweight Large Language Models (LLMs), to enhance malware detection capabilities in edge environments. DistilGPT -2, and TinyT5 have emerged as promising solutions. These models leverage techniques such as distillation and pruning to significantly reduce their size and computational requirements, making them more suitable for edge-devices deployment. Despite their smaller footprint, these models retain much of their larger counterparts' pattern recognition and contextual understanding capabilities, allowing them to process and analyze complex, unstructured data streams effectively. In the context of malware detection, they offer the potential for improved accuracy, real-time adaptability, and continuous learning while addressing the strict energy, storage, and computational constraints of edge computing environments. However, deploying LLMs for malware detection in edge computing is not without challenges. Model performance: maintaining high accuracy under the constraints of edge environments remains a significant hurdle. While LLMs excel in natural language understanding and pattern recognition, their generalizability across diverse edge scenarios is often limited, particularly when faced with malware's dynamic and adaptive nature [1].
How Do Hackathons Foster Creativity? Towards AI Collaborative Evaluation of Creativity at Scale
Falk, Jeanette, Chen, Yiyi, Rafner, Janet, Zhang, Mike, Bjerva, Johannes, Nolte, Alexander
Hackathons have become popular collaborative events for accelerating the development of creative ideas and prototypes. There are several case studies showcasing creative outcomes across domains such as industry, education, and research. However, there are no large-scale studies on creativity in hackathons which can advance theory on how hackathon formats lead to creative outcomes. We conducted a computational analysis of 193,353 hackathon projects. By operationalizing creativity through usefulness and novelty, we refined our dataset to 10,363 projects, allowing us to analyze how participant characteristics, collaboration patterns, and hackathon setups influence the development of creative projects. The contribution of our paper is twofold: We identified means for organizers to foster creativity in hackathons. We also explore the use of large language models (LLMs) to augment the evaluation of creative outcomes and discuss challenges and opportunities of doing this, which has implications for creativity research at large.
TS-RAG: Retrieval-Augmented Generation based Time Series Foundation Models are Stronger Zero-Shot Forecaster
Ning, Kanghui, Pan, Zijie, Liu, Yu, Jiang, Yushan, Zhang, James Y., Rasul, Kashif, Schneider, Anderson, Ma, Lintao, Nevmyvaka, Yuriy, Song, Dongjin
Recently, Large Language Models (LLMs) and Foundation Models (FMs) have become prevalent for time series forecasting tasks. However, fine-tuning large language models (LLMs) for forecasting enables the adaptation to specific domains but may not generalize well across diverse, unseen datasets. Meanwhile, existing time series foundation models (TSFMs) lack inherent mechanisms for domain adaptation and suffer from limited interpretability, making them suboptimal for zero-shot forecasting. To this end, we present TS-RAG, a retrieval-augmented generation based time series forecasting framework that enhances the generalization capability and interpretability of TSFMs. Specifically, TS-RAG leverages pre-trained time series encoders to retrieve semantically relevant time series segments from a dedicated knowledge database, incorporating contextual patterns for the given time series query. Next, we develop a learnable Mixture-of-Experts (MoE)-based augmentation module, which dynamically fuses retrieved time series patterns with the TSFM's representation of the input query, improving forecasting accuracy without requiring task-specific fine-tuning. Thorough empirical studies on seven public benchmark datasets demonstrate that TS-RAG achieves state-of-the-art zero-shot forecasting performance, outperforming TSFMs by up to 6.51% across diverse domains and showcasing desired interpretability.