Overview
An Optimized Machine Learning Classifier for Detecting Fake Reviews Using Extracted Features
Anees, Shabbir, Anshuman, Chaurasia, Ayush, Bogar, Prathmesh
It is well known that fraudulent reviews undermine the legitimacy and dependability of online purchases. A recent development that misleads customers is the appearance of computer-generated (CG) reviews among human-written ones. In this work, we present an advanced machine-learning-based system that detects these AI-produced reviews with high accuracy. Our method integrates advanced text preprocessing, multi-modal feature extraction, Harris Hawks Optimization (HHO) for feature selection, and a stacking ensemble classifier. We implemented this methodology on a public dataset of 40,432 Original (OR) and Computer-Generated (CG) reviews. From an initial set of 13,539 features, HHO selected the 1,368 most relevant features, an 89.9% dimensionality reduction. Our final stacking model achieved 95.40% accuracy, 92.81% precision, 95.01% recall, and a 93.90% F1-score, which demonstrates that combining ensemble learning with bio-inspired optimisation is an effective method for machine-generated text recognition. Because large-scale review analytics commonly run on cloud platforms, privacy-preserving techniques such as differential approaches and secure outsourcing are essential to protect user data in these systems.
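The headline numbers in this abstract can be cross-checked with a few lines of arithmetic. The sketch below recomputes the dimensionality reduction from the two feature counts and the F1-score as the harmonic mean of the quoted precision and recall. It assumes the precision, recall, and F1 figures all refer to the same class and averaging scheme, which the abstract does not state.

```python
# Figures quoted in the abstract above.
initial_features = 13_539
selected_features = 1_368
precision = 92.81  # percent
recall = 95.01     # percent

# Share of the initial features that feature selection discarded.
reduction_pct = 100 * (1 - selected_features / initial_features)

# F1 is the harmonic mean of precision and recall
# (assumption: both values describe the same class and averaging scheme).
f1_pct = 2 * precision * recall / (precision + recall)

print(f"dimensionality reduction: {reduction_pct:.1f}%")
print(f"F1 from precision and recall: {f1_pct:.2f}%")
```

Both recomputed values round to the figures reported in the abstract (89.9% and 93.90%).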
GPS: General Per-Sample Prompter
Batorski, Pawel, Swoboda, Paul
LLMs are sensitive to prompting, with task performance often hinging on subtle, sometimes imperceptible variations in phrasing. As a result, crafting effective prompts manually remains challenging and time-consuming. Recent automatic prompting methods mitigate this difficulty but face three key limitations: (i) for each new task, they require large datasets to train good prompts; (ii) they rely on costly optimization loops that may take hours; (iii) they typically produce a single task-level prompt that does not adapt to the individual input problem to be solved. We propose GPS, the first general-purpose, per-sample prompting method. Without any task-specific tuning, GPS generates a tailored prompt for each unseen input, improving performance across diverse tasks. The prompter is trained with reinforcement learning on a suite of training tasks and includes a novel regularization for effectively adapting to per-sample prompting. Finally, we employ Minimum Bayes Risk decoding to stabilize inference. Empirically, GPS demonstrates competitive performance: we attain the second-best results among baselines on text simplification, the third-best results on summarization, and on-par results on classification, while not training on any of these tasks, in contrast to the baselines. For in-domain prompting, we obtain state-of-the-art results on GSM8K. Our work shows the potential of a novel and effective paradigm for automatic prompting: generating adaptive, input-specific prompts without extensive optimization and without access to a task-specific training set. Our code is available at https://github.com/Batorskq/GPS.
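Minimum Bayes Risk decoding, mentioned in the abstract above, picks the sampled output that agrees most with the other samples rather than the single most likely one. The following is a minimal, generic sketch of that idea and not the GPS implementation: the function names `token_f1` and `mbr_select`, the token-overlap utility, and the sample strings are all illustrative assumptions.

```python
from collections import Counter


def token_f1(hypothesis: str, reference: str) -> float:
    """Token-overlap F1 between two strings (an assumed utility function)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp or not ref:
        return 0.0
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    p, r = overlap / len(hyp), overlap / len(ref)
    return 2 * p * r / (p + r)


def mbr_select(candidates: list[str]) -> str:
    """Return the candidate with the highest mean utility against the others."""
    if not candidates:
        raise ValueError("need at least one candidate")
    if len(candidates) == 1:
        return candidates[0]

    def expected_utility(i: int) -> float:
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(token_f1(candidates[i], o) for o in others) / len(others)

    best = max(range(len(candidates)), key=expected_utility)
    return candidates[best]


# Hypothetical samples drawn from a model for the same input.
samples = ["the answer is 42", "the answer is 42 .", "it is probably 17"]
print(mbr_select(samples))
```

In this toy run the first sample is selected because it overlaps most with the rest of the pool. In practice the utility would be a task metric and the candidates would be sampled generations.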
EvalCards: A Framework for Standardized Evaluation Reporting
Dhar, Ruchira, Villegas, Danae Sanchez, Karamolegkou, Antonia, Schiavone, Alice, Yuan, Yifei, Chen, Xinyi, Li, Jiaang, Frank, Stella, De Grazia, Laura, Swain, Monorama, Brandl, Stephanie, Hershcovich, Daniel, Søgaard, Anders, Elliott, Desmond
Evaluation has long been a central concern in NLP, and transparent reporting practices are more critical than ever in today's landscape of rapidly released open-access models. Drawing on a survey of recent work on evaluation and documentation, we identify three persistent shortcomings in current reporting practices: reproducibility, accessibility, and governance. We argue that existing standardization efforts remain insufficient and introduce Evaluation Disclosure Cards (EvalCards) as a path forward. EvalCards are designed to enhance transparency for both researchers and practitioners while providing a practical foundation to meet emerging governance requirements.
From Raw Features to Effective Embeddings: A Three-Stage Approach for Multimodal Recipe Recommendation
Shin, Jeeho, Kim, Kyungho, Shin, Kijung
Recipe recommendation has become an essential task in web-based food platforms. A central challenge is effectively leveraging rich multimodal features beyond user-recipe interactions. Our analysis shows that even simple uses of multimodal signals yield competitive performance, suggesting that systematic enhancement of these signals is highly promising. We propose TESMR, a 3-stage framework for recipe recommendation that progressively refines raw multimodal features into effective embeddings through: (1) content-based enhancement using foundation models with multimodal comprehension, (2) relation-based enhancement via message propagation over user-recipe interactions, and (3) learning-based enhancement through contrastive learning with learnable embeddings. Experiments on two real-world datasets show that TESMR outperforms existing methods, achieving 7-15% higher Recall@10.
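For readers unfamiliar with the Recall@10 metric quoted above, the sketch below shows one common definition: the fraction of a user's held-out relevant recipes that appear in the top-k recommendations. The function name, the recipe identifiers, and the choice of denominator are assumptions for illustration; some evaluations divide by min(k, number of relevant items) instead.

```python
def recall_at_k(ranked_items, relevant_items, k=10):
    """Fraction of relevant items that appear among the top-k ranked items."""
    relevant = set(relevant_items)
    if not relevant:
        return 0.0
    # Use a set so a duplicated recommendation is not counted twice.
    hits = len(set(ranked_items[:k]) & relevant)
    return hits / len(relevant)


# Hypothetical ranking for one user and that user's held-out recipes.
ranked = ["r7", "r2", "r9", "r4", "r1"]
held_out = {"r2", "r4", "r8"}

print(recall_at_k(ranked, held_out, k=3))  # one of three relevant items found
print(recall_at_k(ranked, held_out, k=5))  # two of three relevant items found
```

A system-level score is usually the mean of this value over all test users.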
From Perception to Reasoning: Deep Thinking Empowers Multimodal Large Language Models
Zhu, Wenxin, Chen, Andong, Song, Yuchen, Chen, Kehai, Zhu, Conghui, Chen, Ziyan, Zhao, Tiejun
With the remarkable success of Multimodal Large Language Models (MLLMs) in perception tasks, enhancing their complex reasoning capabilities has emerged as a critical research focus. Existing models still suffer from challenges such as opaque reasoning paths and insufficient generalization ability. Chain-of-Thought (CoT) reasoning, which has demonstrated significant efficacy in language models by enhancing reasoning transparency and output interpretability, holds promise for improving model reasoning capabilities when extended to the multimodal domain. This paper provides a systematic review centered on "Multimodal Chain-of-Thought" (MCoT). First, it analyzes the background and theoretical motivations for its inception from the perspectives of technical evolution and task demands. Then, it introduces mainstream MCoT methods from three aspects: CoT paradigms, the post-training stage, and the inference stage, while also analyzing their underlying mechanisms. Furthermore, the paper summarizes existing evaluation benchmarks and metrics, and discusses the application scenarios of MCoT. Finally, it analyzes the challenges currently facing MCoT and provides an outlook on its future research directions.
RI-Loss: A Learnable Residual-Informed Loss for Time Series Forecasting
Wang, Jieting, Shang, Xiaolei, Li, Feijiang, Peng, Furong
Time series forecasting relies on predicting future values from historical data, yet most state-of-the-art approaches, including transformer and multilayer perceptron-based models, optimize using Mean Squared Error (MSE), which has two fundamental weaknesses: its point-wise error computation fails to capture temporal relationships, and it does not account for inherent noise in the data. To overcome these limitations, we introduce the Residual-Informed Loss (RI-Loss), a novel objective function based on the Hilbert-Schmidt Independence Criterion (HSIC). RI-Loss explicitly models noise structure by enforcing dependence between the residual sequence and a random time series, enabling more robust, noise-aware representations. Theoretically, we derive the first non-asymptotic HSIC bound with explicit double-sample complexity terms, achieving optimal convergence rates through Bernstein-type concentration inequalities and Rademacher complexity analysis. This provides rigorous guarantees for RI-Loss optimization while precisely quantifying kernel space interactions. Empirically, experiments across eight real-world benchmarks and five leading forecasting models demonstrate improvements in predictive performance, validating the effectiveness of our approach. The code is publicly available at: https://github.com/shang-xl/RI-Loss.
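The Hilbert-Schmidt Independence Criterion that RI-Loss builds on can be estimated from paired samples with centered kernel matrices. The sketch below is a plain-Python version of the standard biased estimator, trace(KHLH)/n^2, and not the RI-Loss objective itself. The Gaussian kernel, the bandwidth, the 1/n^2 normalization (some references use 1/(n-1)^2), and the synthetic series are all assumptions made for illustration.

```python
import math
import random


def gaussian_gram(xs, sigma=1.0):
    """Gram matrix of a Gaussian kernel over scalar samples."""
    return [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in xs] for a in xs]


def center(gram):
    """Double-center a symmetric Gram matrix, i.e. compute H K H."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    total_mean = sum(row_means) / n
    return [
        [gram[i][j] - row_means[i] - row_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(xs, ys, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / n^2."""
    n = len(xs)
    kc = center(gaussian_gram(xs, sigma))
    lc = center(gaussian_gram(ys, sigma))
    return sum(kc[i][j] * lc[i][j] for i in range(n) for j in range(n)) / n ** 2


rng = random.Random(0)
reference = [rng.gauss(0, 1) for _ in range(200)]
dependent = [x + 0.1 * rng.gauss(0, 1) for x in reference]  # tracks the reference
independent = [rng.gauss(0, 1) for _ in range(200)]         # unrelated series

print(hsic(reference, dependent) > hsic(reference, independent))
```

The statistic is close to zero for independent series and grows with statistical dependence, which is the property a dependence-based loss term can exploit.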
Weaver: Kronecker Product Approximations of Spatiotemporal Attention for Traffic Network Forecasting
Cheong, Christopher, Davis, Gary, Choi, Seongjin
Spatiotemporal forecasting on transportation networks is a complex task that requires understanding how traffic nodes interact within a dynamic, evolving system dictated by traffic flow dynamics and social behavioral patterns. The importance of transportation networks and intelligent transportation systems (ITS) for modern mobility and commerce necessitates forecasting models that are not only accurate but also interpretable, efficient, and robust under structural or temporal perturbations. Recent approaches, particularly Transformer-based architectures, have improved predictive performance but often at the cost of high computational overhead and diminished architectural interpretability. In this work, we introduce Weaver, a novel attention-based model that applies Kronecker product approximations (KPA) to decompose the PN × PN spatiotemporal attention of O(P^2N^2) complexity into local P × P temporal and N × N spatial attention maps. This Kronecker attention map enables our Parallel-Kronecker Matrix-Vector product (P2-KMV) for efficient spatiotemporal message passing with O(P^2N + N^2P) complexity. To capture real-world traffic dynamics, we address the importance of negative edges in modeling traffic behavior by introducing Valence Attention using the continuous Tanimoto coefficient (CTC), which provides properties conducive to precise latent graph generation and training stability. To fully utilize the model's learning capacity, we introduce the Traffic Phase Dictionary for self-conditioning. Evaluations on PEMS-BAY and METR-LA show that Weaver achieves competitive performance across model categories while training more efficiently.
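The complexity reduction described above follows from a textbook identity: multiplying by a Kronecker product of a P × P matrix A and an N × N matrix B is equivalent to computing A X B^T on the P × N reshaped input. The sketch below checks that identity in plain Python. It is not the paper's P2-KMV routine, and the matrices, the row-major layout, and all function names are illustrative assumptions.

```python
def matmul(a, b):
    """Dense matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


def kron(a, b):
    """Explicit Kronecker product (PN x PN), built only for comparison."""
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
        for i in range(len(a))
        for k in range(len(b))
    ]


def kron_matvec(temporal, spatial, x):
    """Apply kron(temporal, spatial) to x (P x N) without forming it: A X B^T."""
    return matmul(matmul(temporal, x), transpose(spatial))


def dense_matvec(temporal, spatial, x):
    """Reference path: build the PN x PN matrix and multiply the flattened x."""
    full = kron(temporal, spatial)
    flat = [v for row in x for v in row]  # row-major flattening
    out = [sum(w * v for w, v in zip(row, flat)) for row in full]
    n = len(spatial)
    return [out[p * n:(p + 1) * n] for p in range(len(temporal))]


# Hypothetical attention maps with P = 2 time steps and N = 3 nodes.
A = [[0.7, 0.3], [0.2, 0.8]]
B = [[0.5, 0.25, 0.25], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

fast, slow = kron_matvec(A, B, X), dense_matvec(A, B, X)
print(all(abs(f - s) < 1e-9 for fr, sr in zip(fast, slow) for f, s in zip(fr, sr)))
```

The factored path costs one P × P by P × N product and one P × N by N × N product per feature column, which matches the O(P^2N + N^2P) term quoted in the abstract, while the dense path touches all P^2N^2 entries.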
LongCat-Flash-Omni Technical Report
Meituan LongCat Team, Wang, Bairui, Bayan, Xiao, Bin, Zhang, Bo, Rong, Bolin, Chen, Borun, Wan, Chang, Zhang, Chao, Huang, Chen, Chen, Chen, Chen, Chen, Yang, Chengxu, Yang, Chengzuo, Han, Cong, Peng, Dandan, Ruan, Delian, Xin, Detai, Wang, Disong, Yang, Dongchao, Liu, Fanfan, Chen, Fengjiao, Yang, Fengyu, Dong, Gan, Huang, Gang, Xu, Gang, Wan, Guanglu, Tan, Guoqiang, Yu, Guoqiao, Qiu, Haibo, Lu, Hao, Liu, Hongbo, Xiang, Hongyu, Wu, Jiaheng, Yang, Jian, Liu, Jiaxing, Huang, Jing, Wang, Jingang, Ding, Jinrui, Jiang, Juchao, Kuang, Jun, Wang, Jun, Mei, Junhui, Ding, Ke, Zhang, Kefeng, Chen, Lei, Shi, Liang, Qiao, Limeng, Zheng, Liming, Ma, Lin, Guo, Liuyang, Ma, Liya, Sun, Luying, Gao, Man, Zhu, Mengshen, Cao, Miao, Lin, Minliang, Xu, Nuo, Shi, Peng, Zhang, Qi, Fang, Qian, Wang, Qian, Yang, Qian, Wang, Quanxiu, Weng, Rongxiang, Guo, Rongxin, Liang, Ruoxuan, Yang, Senbin, Xu, Shanbo, Lei, Shanglin, Ye, Shengze, Chen, Shimin, Chen, Shuaiqi, Hu, Shujie, Li, Shuo, Yang, Siqi, Xu, Siyu, Ren, Siyu, Li, Song, Liu, Songxiang, Bai, Tianhao, Dai, Tianye, Hong, Wei, Wang, Wei, Zhao, Weixiao, Cao, Wengang, Zhu, Wenlong, He, Wenlong, Su, Xi, Nan, Xi, Zhao, Xiaohan, Wang, Xiaohao, Zhao, Xiaoyu, Wang, Xiaoyu, Li, Xiaoyu, Pan, Xin, Chen, Xin, Sun, Xiusong, Xiang, Xu, Xing, Xudong, Cao, Xuezhi, Cai, Xunliang, Yang, Yang, Tan, Yanli, Yao, Yao, Sun, Yerui, Chen, Yi, Lu, Yifan, Gong, Yin, Zhang, Yining, Chen, Yitian, Gan, Yiyang, Tang, Yuchen, Xie, Yuchen, Wang, Yueqian, Zheng, Yuewen, Zhang, Yufei, Zhong, Yufeng, Qian, Yulei, Peng, Yuqi, Li, Yuqian, Jiang, Yuwei, Hu, Zeyang, Zhang, Zheng, Tian, Zhengkun, Hong, Zhiqing, Zeng, Zhixiong, Mi, Zhuqi, Li, Ziran, Wang, Ziwen, Zhao, Ziyi, Zhuang, Ziyuan, Zhao, Zizhe
We introduce LongCat-Flash-Omni, a state-of-the-art open-source omni-modal model with 560 billion parameters, excelling at real-time audio-visual interaction. By adopting a curriculum-inspired progressive training strategy that transitions from simpler to increasingly complex modality sequence modeling tasks, LongCat-Flash-Omni attains comprehensive multimodal capabilities while maintaining strong unimodal capability. Building upon LongCat-Flash, which adopts a high-performance Shortcut-connected Mixture-of-Experts (MoE) architecture with zero-computation experts, LongCat-Flash-Omni integrates efficient multimodal perception and speech reconstruction modules. Despite its immense size of 560B parameters (with 27B activated), LongCat-Flash-Omni achieves low-latency real-time audio-visual interaction. For training infrastructure, we developed a modality-decoupled parallelism scheme specifically designed to manage the data and model heterogeneity inherent in large-scale multimodal training. This innovative approach demonstrates exceptional efficiency by sustaining over 90% of the throughput achieved by text-only training. Extensive evaluations show that LongCat-Flash-Omni achieves state-of-the-art performance on omni-modal benchmarks among open-source models. Furthermore, it delivers highly competitive results across a wide range of modality-specific tasks, including text, image, and video understanding, as well as audio understanding and generation. We provide a comprehensive overview of the model architecture design, training procedures, and data strategies, and open-source the model to foster future research and development in the community.
From Vision to Validation: A Theory- and Data-Driven Construction of a GCC-Specific AI Adoption Index
Albous, Mohammad Rashed, Anouze, Abdel Latef
Artificial intelligence (AI) is rapidly transforming public-sector processes worldwide, yet standardized measures rarely address the unique drivers, governance models, and cultural nuances of the Gulf Cooperation Council (GCC) countries. This study employs a theory-driven foundation derived from an in-depth analysis of literature review and six National AI Strategies (NASs), coupled with a data-driven approach that utilizes a survey of 203 mid- and senior-level government employees and advanced statistical techniques (K-Means clustering, Principal Component Analysis, and Partial Least Squares Structural Equation Modeling). By combining policy insights with empirical evidence, the research develops and validates a novel AI Adoption Index specifically tailored to the GCC public sector. Findings indicate that robust technical infrastructure and clear policy mandates exert the strongest influence on successful AI implementations, overshadowing organizational readiness in early adoption stages. The combined model explains 70% of the variance in AI outcomes, suggesting that resource-rich environments and top-down policy directives can drive rapid but uneven technology uptake. By consolidating key dimensions (Technical Infrastructure (TI), Organizational Readiness (OR), and Governance Environment (GE)) into a single composite index, this study provides a holistic yet context-sensitive tool for benchmarking AI maturity. The index offers actionable guidance for policymakers seeking to harmonize large-scale deployments with ethical and regulatory standards. Beyond advancing academic discourse, these insights inform more strategic allocation of resources, cross-country cooperation, and capacity-building initiatives, thereby supporting sustained AI-driven transformation in the GCC region and beyond.
Deep Reinforcement Learning for Drone Route Optimization in Post-Disaster Road Assessment
Gong, Huatian, Sheu, Jiuh-Biing, Wang, Zheng, Yang, Xiaoguang, Yan, Ran
Rapid post-disaster road damage assessment is critical for effective emergency response, yet traditional optimization methods suffer from excessive computational time and require domain knowledge for algorithm design, making them unsuitable for time-sensitive disaster scenarios. This study proposes an attention-based encoder-decoder model (AEDM) for rapid drone routing decisions in post-disaster road damage assessment. The method employs deep reinforcement learning to determine high-quality drone assessment routes without requiring algorithmic design knowledge. A network transformation method is developed to convert link-based routing problems into equivalent node-based formulations, while a synthetic road network generation technique addresses the scarcity of large-scale training datasets. The model is trained using policy optimization with multiple optima (POMO) with multi-task learning capabilities to handle diverse parameter combinations. Experimental results demonstrate two key strengths of AEDM: it outperforms commercial solvers by 20-71% and traditional heuristics by 23-35% in solution quality, while achieving rapid inference (1-2 seconds) versus 100-2,000 seconds for traditional methods. The model exhibits strong generalization across varying problem scales, drone numbers, and time constraints, consistently outperforming baseline methods on unseen parameter distributions and real-world road networks. The proposed method effectively balances computational efficiency with solution quality, making it particularly suitable for time-critical disaster response applications where rapid decision-making is essential for saving lives. The source code for AEDM is publicly available at https://github.com/PJ-HTU/AEDM-for-Post-disaster-road-assessment.