Telecommunications
Robust Markov stability for community detection at a scale learned based on the structure
Aref, Samin, Mathiyarasan, Sanchaai
Community detection, the unsupervised task of clustering nodes of a graph, finds applications across various fields. The common approaches for community detection involve optimizing an objective function to partition the nodes into communities at a single scale of granularity. However, the single-scale approaches often fall short of producing partitions that are robust and at a suitable scale. The existing algorithm, PyGenStability, returns multiple robust partitions for a network by optimizing the multi-scale Markov stability function. However, in cases where the suitable scale is not known or assumed by the user, there is no principled method to select a single robust partition at a suitable scale from the multiple partitions that PyGenStability produces. Our proposed method combines the Markov stability framework with a pre-trained machine learning model for scale selection to obtain one robust partition at a scale that is learned based on the graph structure. This automatic scale selection involves using a gradient boosting model pre-trained on hand-crafted and embedding-based network features from a labeled dataset of 10k benchmark networks. This model was trained to predict the scale value that maximizes the similarity of the output partition to the planted partition of the benchmark network. Combining our scale selection algorithm with the PyGenStability algorithm results in PyGenStabilityOne (PO): a hyperparameter-free multi-scale community detection algorithm that returns one robust partition at a suitable scale without the need for any assumptions, input, or tweaking from the user. We compare the performance of PO against 29 algorithms and show that it outperforms 25 other algorithms by statistically meaningful margins. Our results facilitate choosing between community detection algorithms, among which PO stands out as the accurate, robust, and hyperparameter-free method.
A Survey on Archetypal Analysis
Alcacer, Aleix, Epifanio, Irene, Mair, Sebastian, Mørup, Morten
Archetypal analysis (AA) was originally proposed in 1994 by Adele Cutler and Leo Breiman as a computational procedure to extract the distinct aspects, called archetypes, in observations, with each observational record approximated as a mixture (i.e., convex combination) of these archetypes. AA thereby provides straightforward, interpretable, and explainable representations for feature extraction and dimensionality reduction, facilitating the understanding of the structure of high-dimensional data with wide applications throughout the sciences. However, AA also faces challenges, particularly as the associated optimization problem is non-convex. This survey provides researchers and data mining practitioners with an overview of the methodologies and opportunities that AA has to offer. It surveys the many applications of AA across disparate fields of science, as well as best practices and limitations when modeling data using AA. The survey concludes by explaining important future research directions concerning AA.
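The mixture structure described above can be written as a least-squares problem. The following is the standard formulation of archetypal analysis; the symbols (observations x_i, archetypes z_k, weights alpha and beta, number of archetypes K) are notation chosen here for illustration and do not come from the abstract.

```latex
\[
\min_{\alpha,\,\beta}\ \sum_{i=1}^{n}\Bigl\lVert x_i-\sum_{k=1}^{K}\alpha_{ik}\,z_k\Bigr\rVert_2^{2},
\qquad z_k=\sum_{j=1}^{n}\beta_{kj}\,x_j,
\]
\[
\text{subject to}\quad \alpha_{ik}\ge 0,\ \ \sum_{k=1}^{K}\alpha_{ik}=1,
\qquad \beta_{kj}\ge 0,\ \ \sum_{j=1}^{n}\beta_{kj}=1 .
\]
```

Each observation is a convex combination of the archetypes, and each archetype is itself a convex combination of the observations. The objective is non-convex in alpha and beta jointly, but it is a convex least-squares problem in either one when the other is held fixed, which is why alternating schemes are a natural fit.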
CONTINA: Confidence Interval for Traffic Demand Prediction with Coverage Guarantee
Yang, Chao, Huang, Xiannan, Qiu, Shuhan, Cheng, Yan
Accurate short-term traffic demand prediction is critical for the operation of traffic systems. Besides point estimation, the confidence interval of the prediction is also of great importance. Many models for traffic operations, such as shared bike rebalancing and taxi dispatching, take into account the uncertainty of future demand and require confidence intervals as the input. However, existing methods for confidence interval modeling rely on strict assumptions, such as unchanging traffic patterns and correct model specifications, to guarantee enough coverage. Therefore, the confidence intervals provided could be invalid, especially in a changing traffic environment. To fill this gap, we propose an efficient method, CONTINA (Conformal Traffic Intervals with Adaptation), to provide interval predictions that can adapt to external changes. By collecting the errors of the intervals during deployment, the method can adjust the interval in the next step, widening it if the errors are too large and shortening it otherwise. Furthermore, we theoretically prove that the coverage of the confidence intervals provided by our method converges to the target coverage level. Experiments across four real-world datasets and prediction models demonstrate that the proposed method can provide valid confidence intervals with shorter lengths. Our method can help traffic management personnel develop a more reasonable and robust operation plan in practice. The code, model, and dataset are available on GitHub at https://github.com/xiannanhuang/CONTINA/.
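The widen-or-shorten adjustment described in this abstract can be sketched as a simple online update. This is a minimal illustration under stated assumptions, not the CONTINA implementation: the function name `adaptive_intervals`, the symmetric interval around the point prediction, and the `step` and `init_width` parameters are all hypothetical choices made for the example.

```python
import random


def adaptive_intervals(preds, targets, alpha=0.1, step=0.5, init_width=1.0):
    """Build intervals online and return them with their empirical coverage.

    Hypothetical sketch: the interval is pred +/- width, and width is
    updated after each observation depending on whether it was covered.
    """
    width = init_width
    intervals = []
    hits = 0
    for pred, y in zip(preds, targets):
        lo, hi = pred - width, pred + width
        intervals.append((lo, hi))
        miss = 0.0 if lo <= y <= hi else 1.0
        hits += 1 - int(miss)
        # Widen after a miss, shrink slightly after a hit. The expected
        # change is zero exactly when the miss rate equals alpha.
        width = max(0.0, width + step * (miss - alpha))
    return intervals, hits / len(intervals)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic demand whose noise level triples halfway through,
    # standing in for a change in traffic patterns.
    preds = [10.0] * 2000
    targets = [10.0 + rng.gauss(0, 1.0 if t < 1000 else 3.0) for t in range(2000)]
    _, coverage = adaptive_intervals(preds, targets, alpha=0.1)
    print(f"empirical coverage: {coverage:.3f} (target 0.900)")
```

Because the width only grows after misses and only shrinks after hits, the long-run miss rate is pulled toward `alpha` even when the noise level changes, which is the kind of behavior the coverage guarantee in the abstract refers to.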
Reconstructing Fine-Grained Network Data using Autoencoder Architectures with Domain Knowledge Penalties
Cheung, Mark, Venkatesan, Sridhar
The ability to reconstruct fine-grained network session data, including individual packets, from coarse-grained feature vectors is crucial for improving network security models. However, the large-scale collection and storage of raw network traffic pose significant challenges, particularly for capturing rare cyberattack samples. These challenges hinder the ability to retain comprehensive datasets for model training and future threat detection. To address this, we propose a machine learning approach guided by formal methods to encode and reconstruct network data. Our method employs autoencoder models with domain-informed penalties to impute PCAP session headers from structured feature representations. Experimental results demonstrate that incorporating domain knowledge through constraint-based loss terms significantly improves reconstruction accuracy, particularly for categorical features with session-level encodings. By enabling efficient reconstruction of detailed network sessions, our approach facilitates data-efficient model training while preserving privacy and storage efficiency.
Learning-Based User Association for MmWave Vehicular Networks With Kernelized Contextual Bandits
Vehicles require timely channel conditions to determine the base station (BS) to communicate with, but it is costly to estimate the fast-fading mmWave channels frequently. Without additional channel estimations, the proposed Distributed Kernelized Upper Confidence Bound (DK-UCB) algorithm estimates the current instantaneous transmission rates utilizing past contexts, such as the vehicle's location and velocity, along with past instantaneous transmission rates. To capture the nonlinear mapping from a context to the instantaneous transmission rate, DK-UCB maps a context into the reproducing kernel Hilbert space (RKHS) where a linear mapping becomes observable. To improve estimation accuracy, we propose a novel kernel function in RKHS which incorporates the propagation characteristics of the mmWave signals. Moreover, DK-UCB encourages a vehicle to share necessary information when it has conducted significant explorations, which speeds up the learning process while maintaining affordable communication costs. To support high data rates, low latency, and massive access, mmWave communication has emerged as a promising technology in vehicular communication networks [1]. Establishing connections between vehicles and BSs, known as user association, is challenging in mmWave vehicular networks.
Self-organisation of common good usage and an application to Internet services
Pires, Diogo L., Mancuso, Vincenzo, Castagno, Paolo, Marsan, Marco Ajmone
Natural and human-made common goods present key challenges due to their susceptibility to degradation, overuse, or congestion. We explore the self-organisation of their usage when individuals have access to several available commons but limited information on them. We propose an extension of the Win-Stay, Lose-Shift (WSLS) strategy for such systems, under which individuals use a resource iteratively until they are unsuccessful and then shift randomly. This simple strategy leads to a distribution of the use of commons with an improvement against random shifting. Selective individuals who retain information on their usage and accordingly adapt their tolerance to failure in each common good improve the average experienced quality for an entire population. Hybrid systems of selective and non-selective individuals can lead to an equilibrium with equalised experienced quality akin to the ideal free distribution. We show that these results can be applied to the server selection problem faced by mobile users accessing Internet services and we perform realistic simulations to test their validity. Furthermore, these findings can be used to understand other real systems such as animal dispersal on grazing and foraging land, and to propose solutions to operators of systems of public transport or other technological commons.
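The Win-Stay, Lose-Shift rule described in this abstract can be illustrated with a toy simulation. This is a sketch under assumptions that are not taken from the paper: the congestion rule (success probability equal to capacity divided by load, capped at 1), the function name `simulate`, and the example capacities are all hypothetical.

```python
import random


def simulate(capacities, n_users, rounds, stay_on_win, seed=0):
    """Return the average per-round success rate of all users.

    stay_on_win=True  -> Win-Stay, Lose-Shift (stay after a success).
    stay_on_win=False -> baseline that shifts at random every round.
    Requires at least two resources.
    """
    rng = random.Random(seed)
    k = len(capacities)
    choice = [rng.randrange(k) for _ in range(n_users)]
    successes = 0
    for _ in range(rounds):
        # Snapshot of how many users picked each resource this round.
        load = [0] * k
        for c in choice:
            load[c] += 1
        for u in range(n_users):
            c = choice[u]
            # Assumed congestion rule: success with probability capacity / load.
            won = rng.random() < min(1.0, capacities[c] / load[c])
            successes += won
            if not (won and stay_on_win):
                # Shift to a different resource chosen uniformly at random.
                choice[u] = rng.choice([r for r in range(k) if r != c])
    return successes / (rounds * n_users)


if __name__ == "__main__":
    caps = [10, 30, 60]
    print("WSLS:           ", simulate(caps, 90, 300, stay_on_win=True))
    print("random shifting:", simulate(caps, 90, 300, stay_on_win=False))
```

In this toy setting, users on uncongested resources stay put while users on overloaded ones keep moving, so the load drifts toward the capacities without anyone knowing them in advance.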
GTS-LUM: Reshaping User Behavior Modeling with LLMs in Telecommunications Industry
Shi, Liu, Zhou, Tianwu, Xu, Wei, Liu, Li, Cui, Zhexin, Liang, Shaoyi, Niu, Haoxing, Tian, Yichong, Guo, Jianwei
As telecommunication service providers shift their focus to analyzing user behavior for package design and marketing interventions, a critical challenge lies in developing a unified, end-to-end framework capable of modeling long-term and periodic user behavior sequences with diverse time granularities, multi-modal data inputs, and heterogeneous labels. This paper introduces GTS-LUM, a novel user behavior model that redefines modeling paradigms in telecommunication settings. GTS-LUM adopts a (multi-modal) encoder-adapter-LLM decoder architecture, enhanced with several telecom-specific innovations. Specifically, the model incorporates an advanced timestamp processing method to handle varying time granularities. It also supports multi-modal data inputs -- including structured tables and behavior co-occurrence graphs -- and aligns these with semantic information extracted by a tokenizer using a Q-former structure. Additionally, GTS-LUM integrates a front-placed target-aware mechanism to highlight historical behaviors most relevant to the target. Extensive experiments on an industrial dataset validate the effectiveness of this end-to-end framework and also demonstrate that GTS-LUM outperforms LLM4Rec approaches, which are popular in recommendation systems, offering an effective and generalizable solution for user behavior modeling in telecommunications.
Probabilistic QoS Metric Forecasting in Delay-Tolerant Networks Using Conditional Diffusion Models on Latent Dynamics
Zhang, Enming, Liu, Zheng, Xiang, Yu, Qu, Yanwen
Active QoS metric prediction, commonly employed in the maintenance and operation of delay-tolerant networks (DTNs), could enhance network performance regarding latency, throughput, energy consumption, and dependability. Naturally formulated as a multivariate time series forecasting problem, it attracts substantial research efforts. Traditional mean regression methods for time series forecasting cannot capture the data complexity adequately, resulting in deteriorated performance in operational tasks in DTNs such as routing. This paper formulates the prediction of QoS metrics in DTNs as a probabilistic forecasting problem on multivariate time series, where one could quantify the uncertainty of forecasts by characterizing the distribution of these samples. The proposed approach employs diffusion models and incorporates the latent temporal dynamics of non-stationary and multi-mode data into them.
L3GS: Layered 3D Gaussian Splats for Efficient 3D Scene Delivery
Tsai, Yi-Zhen, Zhang, Xuechen, Li, Zheng, Chen, Jiasi
Traditional 3D content representations include dense point clouds that consume large amounts of data and hence network bandwidth, while newer representations such as neural radiance fields suffer from poor frame rates due to their non-standard volumetric rendering pipeline. 3D Gaussian splats (3DGS) can be seen as a generalization of point clouds that offers the best of both worlds, with high visual quality and efficient rendering for real-time frame rates. However, delivering 3DGS scenes from a hosting server to client devices is still challenging due to high network data consumption (e.g., 1.5 GB for a single scene). The goal of this work is to create an efficient 3D content delivery framework that allows users to view high quality 3D scenes with 3DGS as the underlying data representation. The main contributions of the paper are: (1) Creating new layered 3DGS scenes for efficient delivery, (2) Scheduling algorithms to choose what splats to download at what time, and (3) Trace-driven experiments from users wearing virtual reality headsets to evaluate the visual quality and latency. Our system for Layered 3D Gaussian Splats delivery, L3GS, demonstrates high visual quality, achieving 16.9% higher average SSIM compared to baselines, and also works with other compressed 3DGS representations.
Efficient Multi-Task Learning via Generalist Recommender
Wang, Luyang, Tang, Cangcheng, Zhang, Chongyang, Ruan, Jun, Huang, Kai, Dai, Jason
Multi-task learning (MTL) is a common machine learning technique that allows the model to share information across different tasks and improve the accuracy of recommendations for all of them. Many existing MTL implementations suffer from scalability issues as the training and inference performance can degrade with the increasing number of tasks, which can limit production use case scenarios for MTL-based recommender systems. Inspired by the recent advances of large language models, we developed an end-to-end efficient and scalable Generalist Recommender (GRec). GRec takes comprehensive data signals by utilizing NLP heads, parallel Transformers, as well as a wide and deep structure to process multi-modal inputs. These inputs are then combined and fed through a newly proposed task-sentence level routing mechanism to scale the model capabilities on multiple tasks without compromising performance. Offline evaluations and online experiments show that GRec significantly outperforms our previous recommender solutions. GRec has been successfully deployed on one of the largest telecom websites and apps, effectively managing high volumes of online traffic every day.