Collaborating Authors

 Guo, Hao


SessionRec: Next Session Prediction Paradigm For Generative Sequential Recommendation

arXiv.org Artificial Intelligence

We introduce SessionRec, a novel next-session prediction paradigm (NSPP) for generative sequential recommendation, addressing the fundamental misalignment between the conventional next-item prediction paradigm (NIPP) and real-world recommendation scenarios. Unlike NIPP's item-level autoregressive generation, which contradicts actual session-based user interactions, our framework introduces two components: session-aware representation learning through hierarchical sequence aggregation (intra/inter-session), which reduces attention computation complexity while enabling implicit modeling of massive negative interactions; and a session-based prediction objective that better captures users' diverse interests through multi-item recommendation in next sessions. Moreover, we found that incorporating a rank loss for items within the session under the next-session prediction paradigm can significantly improve the ranking effectiveness of generative sequential recommendation models. We also verified that SessionRec exhibits clear power-law scaling laws similar to those observed in LLMs. Extensive experiments conducted on public datasets and an online A/B test in the Meituan App demonstrate the effectiveness of SessionRec. The proposed paradigm establishes new foundations for developing industrial-scale generative recommendation systems through its model-agnostic architecture and computational efficiency.
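As a rough illustration of why aggregating items into sessions can shrink attention cost, the sketch below groups a toy interaction sequence into sessions and compares the number of pairwise attention scores at item level and at session level. The time-gap session rule, the mean pooling, and the names `split_into_sessions`, `mean_pool`, and `attention_pairs` are assumptions made for this example; the abstract does not specify how SessionRec defines or aggregates sessions.

```python
from typing import List, Optional


def split_into_sessions(items: List[int], timestamps: List[float], gap: float) -> List[List[int]]:
    """Group a chronological item sequence into sessions.

    Assumption for this sketch: a new session starts whenever the time
    since the previous interaction exceeds `gap`.
    """
    sessions: List[List[int]] = []
    last_t: Optional[float] = None
    for item, t in zip(items, timestamps):
        if last_t is None or t - last_t > gap:
            sessions.append([])
        sessions[-1].append(item)
        last_t = t
    return sessions


def mean_pool(vectors: List[List[float]]) -> List[float]:
    """Intra-session aggregation (assumed here to be a plain mean)."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def attention_pairs(sequence_length: int) -> int:
    """Full self-attention scores every pair of positions."""
    return sequence_length * sequence_length


items = [1, 2, 3, 4, 5, 6]
timestamps = [0.0, 10.0, 20.0, 1000.0, 1010.0, 5000.0]
sessions = split_into_sessions(items, timestamps, gap=300.0)

# Toy 2-d item embeddings, purely illustrative.
embedding = {i: [float(i), float(i) * 2.0] for i in items}
session_vectors = [mean_pool([embedding[i] for i in s]) for s in sessions]

print(sessions)                        # sessions found by the gap rule
print(attention_pairs(len(items)))     # item-level pair count
print(attention_pairs(len(sessions)))  # session-level pair count
```

With six items grouped into three sessions, inter-session attention scores 9 pairs instead of 36. The saving grows quadratically with the average session length.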


RegionGCN: Spatial-Heterogeneity-Aware Graph Convolutional Networks

arXiv.org Artificial Intelligence

Modeling spatial heterogeneity in the data generation process is essential for understanding and predicting geographical phenomena. Despite their prevalence in geospatial tasks, neural network models usually assume spatial stationarity, which could limit their performance in the presence of spatial process heterogeneity. By allowing model parameters to vary over space, several approaches have been proposed to incorporate spatial heterogeneity into neural networks. However, current geographically weighted approaches are ineffective on graph neural networks, yielding no significant improvement in prediction accuracy. We hypothesize that the crux lies in the over-fitting risk brought by a large number of local parameters. Accordingly, we propose to model spatial process heterogeneity at the regional level rather than at the individual level, which largely reduces the number of spatially varying parameters. We further develop a heuristic optimization procedure to learn the region partition adaptively during model training. Our proposed spatial-heterogeneity-aware graph convolutional network, named RegionGCN, is applied to the spatial prediction of county-level vote share in the 2016 US presidential election based on socioeconomic attributes. Results show that RegionGCN achieves significant improvement over the basic and geographically weighted GCNs. We also offer an exploratory analysis tool for the spatial variation of non-linear relationships through ensemble learning of regional partitions from RegionGCN. Our work contributes to the practice of Geospatial Artificial Intelligence (GeoAI) in tackling spatial heterogeneity.


Impact of Cognitive Load on Human Trust in Hybrid Human-Robot Collaboration

arXiv.org Artificial Intelligence

Human trust plays a crucial role in the effectiveness of human-robot collaboration. Despite its significance, the development and maintenance of an optimal trust level are obstructed by the complex nature of influencing factors and their mechanisms. This study investigates the effects of cognitive load on human trust within the context of a hybrid human-robot collaboration task. An experiment is conducted where the humans and the robot, acting as team members, collaboratively construct pyramids with differentiated levels of task complexity. Our findings reveal that cognitive load exerts diverse impacts on human trust in the robot. Notably, there is an increase in human trust under conditions of high cognitive load. Furthermore, the rewards for performance are substantially higher in tasks with high cognitive load compared to those with low cognitive load, and a significant correlation exists between human trust and the failure risk of performance in tasks with low and medium cognitive load. By integrating interdependent task steps, this research emphasizes the unique dynamics of hybrid human-robot collaboration scenarios. The insights gained not only contribute to understanding how cognitive load influences trust but also assist developers in optimizing collaborative target selection and designing more effective human-robot interfaces in such environments.


Each Fake News is Fake in its Own Way: An Attribution Multi-Granularity Benchmark for Multimodal Fake News Detection

arXiv.org Artificial Intelligence

Social platforms, while facilitating access to information, have also become saturated with fake news, resulting in negative consequences. Automatic multimodal fake news detection is a worthwhile pursuit. Existing multimodal fake news datasets only provide binary labels of real or fake. However, real news is alike, while each fake news is fake in its own way. These datasets fail to reflect the mixed nature of various types of multimodal fake news. To bridge the gap, we construct an attribution multi-granularity multimodal fake news detection dataset, AMG, revealing the inherent fake pattern. Furthermore, we propose a multi-granularity clue alignment model to achieve multimodal fake news detection and attribution. Experimental results demonstrate that AMG is a challenging dataset, and its attribution setting opens up new avenues for future research.


Channel Modeling for FR3 Upper Mid-band via Generative Adversarial Networks

arXiv.org Artificial Intelligence

The upper mid-band (FR3) has recently been attracting interest for the new generation of mobile networks, as it provides a promising balance between spectrum availability and coverage, which are inherent limitations of the sub-6 GHz and millimeter wave bands, respectively. Channel modeling plays a key role in efficiently designing and optimizing the network, since FR3 systems are expected to operate at multiple frequency bands. Data-driven methods, especially generative adversarial networks (GANs), can capture the intricate relationships among data samples, and provide an appropriate tool for FR3 channel modeling. In this work, we present the architecture, link state model, and path generative network of GAN-based FR3 channel modeling. In our comparison, the output of our model closely matches the ray-tracing simulated data.


Multi-Agent, Human-Agent and Beyond: A Survey on Cooperation in Social Dilemmas

arXiv.org Artificial Intelligence

Social dilemmas (SDs, e.g., the prisoner's dilemma), spanning various domains including environmental pollution, public health crises, and resource management, present a fundamental conflict between personal interests and the common good [Nowak, 2006]. While cooperation is beneficial for the collective, individuals are tempted to exploit or free-ride on others' efforts, potentially leading to a tragedy of the commons. Historically rooted in the study of biological altruism [Smith, 1982], the traditional research on cooperation in SDs has unveiled the pivotal roles of reciprocity and social preferences in fostering cooperative behaviors in human societies [Fehr et al., 2002; Rand and Nowak, 2013]. Recently, propelled by advances in artificial intelligence (AI), this field has been undergoing a profound transformation: as AI agents now increasingly represent and engage with humans, our understanding of how cooperation emerges, evolves, and is sustained in SDs is being significantly reshaped. This is particularly evident in two lines of research: multi-agent cooperation, where AI agents interact with each other in SDs, and human-agent cooperation, which examines the intricacies of human interactions with AI agents in SDs.
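The conflict between personal interest and the common good can be made concrete with a one-shot prisoner's dilemma. The sketch below is a minimal illustration, not taken from the survey: the payoff values (5, 3, 1, 0) are conventional textbook numbers chosen for this example, and the helper names `best_response` and `total_welfare` are hypothetical.

```python
# PAYOFF[(my_move, their_move)] is my payoff; "C" = cooperate, "D" = defect.
# Illustrative values satisfying temptation > reward > punishment > sucker.
PAYOFF = {
    ("C", "C"): 3,  # reward for mutual cooperation
    ("C", "D"): 0,  # sucker's payoff
    ("D", "C"): 5,  # temptation to defect
    ("D", "D"): 1,  # punishment for mutual defection
}


def best_response(their_move: str) -> str:
    """Return the move that maximizes my own payoff against a fixed opponent move."""
    return max("CD", key=lambda my_move: PAYOFF[(my_move, their_move)])


def total_welfare(move_a: str, move_b: str) -> int:
    """Sum of both players' payoffs for a pair of moves."""
    return PAYOFF[(move_a, move_b)] + PAYOFF[(move_b, move_a)]


# Defection is individually optimal whatever the other player does...
print(best_response("C"), best_response("D"))
# ...yet mutual cooperation yields more total welfare than mutual defection.
print(total_welfare("C", "C"), total_welfare("D", "D"))
```

Defecting is the best response to either move, so two self-interested players end at mutual defection, even though both would be better off cooperating.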


B^2SFL: A Bi-level Blockchained Architecture for Secure Federated Learning-based Traffic Prediction

arXiv.org Artificial Intelligence

Federated Learning (FL) is a privacy-preserving machine learning (ML) technology that enables collaborative training and learning of a global ML model based on aggregating distributed local model updates. However, security and privacy guarantees could be compromised due to malicious participants and the centralized FL server. This article proposes a bi-level blockchained architecture for secure federated learning-based traffic prediction. The bottom- and top-layer blockchains store the local model and global aggregated parameters, respectively, and the distributed homomorphic-encrypted federated averaging (DHFA) scheme addresses the secure computation problems. We propose the partial private key distribution protocol and a partially homomorphic encryption/decryption scheme to achieve the distributed privacy-preserving federated averaging model. We conduct extensive experiments to measure the running time of DHFA operations, quantify the read and write performance of the blockchain network, and elucidate the impacts of varying regional group sizes and model complexities on the resulting prediction accuracy for the online traffic flow prediction task. The results indicate that the proposed system can facilitate secure and decentralized federated learning for real-world traffic prediction tasks.
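For readers unfamiliar with the aggregation step that DHFA protects, the sketch below shows plain, unencrypted federated averaging weighted by each client's sample count. It is only an illustration of the averaging arithmetic: it omits the homomorphic encryption, key distribution, and blockchain layers described in the abstract, and the function name `federated_average` and the toy numbers are assumptions for this example.

```python
from typing import List


def federated_average(updates: List[List[float]], num_samples: List[int]) -> List[float]:
    """Combine local parameter vectors into a global one.

    Each client's vector is weighted by the number of training samples it
    holds. No encryption is applied in this sketch.
    """
    if not updates or len(updates) != len(num_samples):
        raise ValueError("need one sample count per client update")
    total = sum(num_samples)
    dim = len(updates[0])
    return [
        sum(update[i] * n for update, n in zip(updates, num_samples)) / total
        for i in range(dim)
    ]


# Two hypothetical clients with 2-parameter models.
local_updates = [[1.0, 2.0], [3.0, 4.0]]
sample_counts = [1, 3]
global_params = federated_average(local_updates, sample_counts)
print(global_params)
```

Because the aggregate is a weighted sum followed by a division by a public constant, this is the kind of computation an additively homomorphic scheme could in principle evaluate on ciphertexts.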


An Empirical Study of Attention Networks for Semantic Segmentation

arXiv.org Artificial Intelligence

Semantic segmentation is a vital problem in computer vision. A common recent solution is the end-to-end convolutional neural network, which is much more accurate than traditional methods. More recently, attention-based decoders have achieved state-of-the-art (SOTA) performance on various datasets. However, these networks are typically compared only against the mIoU of previous SOTA networks to prove their superiority, without considering their computational complexity or their precision in individual categories, both of which are essential for engineering applications. Besides, the methods used to analyze FLOPs and memory are not consistent across networks, which makes the comparisons hard to use. Moreover, although various methods utilize attention in semantic segmentation, a summary of these methods is lacking. This paper first conducts experiments to analyze their computational complexity and compare their performance. It then summarizes suitable scenarios for these networks and identifies key points to consider when constructing an attention network. Finally, it points out some future directions for attention networks.
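To see why a single mIoU figure can hide per-category behaviour, the sketch below computes intersection-over-union for each class on a toy flattened label map and then averages the classes. The function names `per_class_iou` and `mean_iou` and the six-pixel example are assumptions for illustration; they are not the paper's evaluation code.

```python
from typing import List, Optional


def per_class_iou(pred: List[int], target: List[int], num_classes: int) -> List[Optional[float]]:
    """IoU for each class over flattened label maps; None if a class never appears."""
    ious: List[Optional[float]] = []
    for c in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        ious.append(intersection / union if union else None)
    return ious


def mean_iou(ious: List[Optional[float]]) -> float:
    """Average the IoU over classes that appear in prediction or target."""
    valid = [iou for iou in ious if iou is not None]
    return sum(valid) / len(valid)


# Toy example: six pixels, three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
ious = per_class_iou(pred, target, num_classes=3)
print(ious)            # per-class IoU
print(mean_iou(ious))  # single summary number
```

In this toy case the per-class values range from one third to two thirds while the mean sits at about one half, which is the kind of spread a headline mIoU does not reveal.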


SoccerNet 2023 Challenges Results

arXiv.org Artificial Intelligence

The SoccerNet 2023 challenges were the third annual video understanding challenges organized by the SoccerNet team. For this third edition, the challenges were composed of seven vision-based tasks split into three main themes. The first theme, broadcast video understanding, is composed of three high-level tasks related to describing events occurring in the video broadcasts: (1) action spotting, focusing on retrieving all timestamps related to global actions in soccer, (2) ball action spotting, focusing on retrieving all timestamps related to the soccer ball change of state, and (3) dense video captioning, focusing on describing the broadcast with natural language and anchored timestamps. The second theme, field understanding, relates to the single task of (4) camera calibration, focusing on retrieving the intrinsic and extrinsic camera parameters from images. The third and last theme, player understanding, is composed of three low-level tasks related to extracting information about the players: (5) re-identification, focusing on retrieving the same players across multiple views, (6) multiple object tracking, focusing on tracking players and the ball through unedited video streams, and (7) jersey number recognition, focusing on recognizing the jersey number of players from tracklets. Compared to the previous editions of the SoccerNet challenges, tasks (2-3-7) are novel, including new annotations and data, task (4) was enhanced with more data and annotations, and task (6) now focuses on end-to-end approaches. More information on the tasks, challenges, and leaderboards is available at https://www.soccer-net.org. Baselines and development kits can be found at https://github.com/SoccerNet.


Beamforming in Wireless Coded-Caching Systems

arXiv.org Artificial Intelligence

Increased capacity in the access network poses capacity challenges on the transport network due to the aggregated traffic. However, there are spatial and temporal correlations in the user data demands that could potentially be utilized. To that end, we investigate a wireless transport network architecture that integrates beamforming and coded-caching strategies. Specifically, our proposed design entails a server with multiple antennas that broadcasts content to cache nodes responsible for serving users. Traditional caching methods face the limitation of relying on the individual memory with additional overhead. Hence, we develop an efficient genetic algorithm-based scheme for beam optimization in the coded-caching system. By exploiting the advantages of beamforming and coded-caching, the architecture achieves gains in terms of multicast opportunities, interference mitigation, and reduced peak backhaul traffic. A comparative analysis of this joint design with traditional, un-coded caching schemes is also conducted to assess the benefits of the proposed approach. Additionally, we examine the impact of various buffering and decoding methods on the performance of the coded-caching scheme. Our findings suggest that proper beamforming is useful in enhancing the effectiveness of the coded-caching technique, resulting in a significant reduction in peak backhaul traffic.