Telecommunications


Rethinking Data: Towards Better Performing Domain-Specific Small Language Models

arXiv.org Artificial Intelligence

Fine-tuning of Large Language Models (LLMs) on domain-specific data for downstream tasks has shown significant promise. However, commercial use of such LLMs is limited by the high computational cost required to deploy them at scale. Small Language Models (LMs), on the other hand, are much more cost-effective but deliver subpar performance in a similar setup. This paper presents our approach to fine-tuning a small LM that reaches high accuracy on a multiple-choice question answering task. We achieve this by improving data quality at each stage of the LM training pipeline. In particular, we start with data structuring, extracting compact, semantically meaningful text chunks for use by a retriever. This allows more efficient knowledge digestion by the LM. We then improve the retrieved context by training a lightweight Chunk Re-Ranker (CRR) that generates more accurate relative relevance scores for the chunks. Finally, we improve the model's generalization ability by merging models fine-tuned with different parameters on different data subsets. We present detailed procedure descriptions and the corresponding experimental findings that show the improvement contributed by each of the proposed techniques.
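
The abstract does not detail the merging scheme, so the sketch below illustrates one common option, plain parameter averaging of checkpoints fine-tuned on different data subsets; the Hugging Face loading calls and the uniform weights are assumptions for illustration, not the paper's exact procedure.

```python
# Minimal sketch: merging fine-tuned checkpoints by parameter averaging.
# Uniform weights and Hugging Face loading are assumptions, not the paper's method.
import torch
from transformers import AutoModelForCausalLM

def merge_checkpoints(checkpoint_paths, weights=None):
    """Average the parameters of models fine-tuned on different data subsets."""
    weights = weights or [1.0 / len(checkpoint_paths)] * len(checkpoint_paths)
    models = [AutoModelForCausalLM.from_pretrained(p) for p in checkpoint_paths]
    merged_state = {}
    for name, param in models[0].state_dict().items():
        if param.is_floating_point():
            merged_state[name] = sum(w * m.state_dict()[name] for w, m in zip(weights, models))
        else:
            merged_state[name] = param  # copy integer buffers as-is
    models[0].load_state_dict(merged_state)
    return models[0]
```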


Adaptive Entanglement Routing with Deep Q-Networks in Quantum Networks

arXiv.org Artificial Intelligence

The quantum internet holds transformative potential for global communication by harnessing the principles of quantum information processing. Despite significant advancements in quantum communication technologies, the efficient distribution of critical resources, such as qubits, remains a persistent and unresolved challenge. Conventional approaches often fall short of achieving optimal resource allocation, underscoring the necessity for more effective solutions. This study proposes a novel reinforcement learning-based adaptive entanglement routing framework designed to enable resource allocation tailored to the specific demands of quantum applications. The introduced QuDQN model utilizes reinforcement learning to optimize the management of quantum networks, allocate resources efficiently, and enhance entanglement routing. The model integrates key considerations, including fidelity requirements, network topology, qubit capacity, and request demands.
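
As a rough illustration of the QuDQN idea, the sketch below shows a generic Deep Q-Network skeleton with epsilon-greedy action selection; the state features (fidelity, qubit capacity, request demand) and the action space over candidate routes are assumptions inferred from the abstract, not the paper's exact formulation.

```python
# Illustrative DQN skeleton for entanglement routing; state encoding and reward
# shaping are assumptions based on the abstract, not the exact QuDQN design.
import random
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim, num_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, num_actions),  # one Q-value per candidate route/next hop
        )

    def forward(self, state):
        return self.net(state)

def select_action(q_net, state, epsilon, num_actions):
    """Epsilon-greedy choice over candidate routing actions."""
    if random.random() < epsilon:
        return random.randrange(num_actions)
    with torch.no_grad():
        return int(q_net(state).argmax().item())
```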


Review on Determining the Number of Communities in Network Data

arXiv.org Machine Learning

This paper reviews statistical methods for hypothesis testing and clustering in network models. We analyze the method by Bickel et al. (2016) for deriving the asymptotic null distribution of the largest eigenvalue, noting its slow convergence and the need for bootstrap corrections. The SCORE method by Jin et al. (2015) and the NCV method by Chen et al. (2018) are evaluated for their efficacy in clustering within Degree-Corrected Block Models, with NCV facing challenges due to its time-intensive nature. We suggest exploring eigenvector entry distributions as a potential efficiency improvement.
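
A simplified sketch of such a bootstrap-corrected largest-eigenvalue test is given below; the exact centering and scaling in Bickel et al. differ in detail, so treat this as an illustrative approximation rather than the reviewed procedure.

```python
# Simplified largest-eigenvalue test under an Erdos-Renyi null, calibrated by
# a parametric bootstrap; an assumption-laden illustration, not the exact method.
import numpy as np

def eigen_statistic(A):
    """Largest eigenvalue of the centered, scaled adjacency under an ER null."""
    n = A.shape[0]
    p_hat = A[np.triu_indices(n, 1)].mean()
    centered = (A - p_hat) / np.sqrt((n - 1) * p_hat * (1 - p_hat))
    np.fill_diagonal(centered, 0.0)
    return np.linalg.eigvalsh(centered)[-1]

def bootstrap_p_value(A, n_boot=200, seed=0):
    """Calibrate the null distribution by simulating ER graphs with the fitted p_hat."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    p_hat = A[np.triu_indices(n, 1)].mean()
    observed = eigen_statistic(A)
    null_stats = []
    for _ in range(n_boot):
        upper = np.triu(rng.random((n, n)) < p_hat, 1).astype(float)
        null_stats.append(eigen_statistic(upper + upper.T))
    return float(np.mean(np.array(null_stats) >= observed))
```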


The Morning After: Our verdict on the iPhone 16e

Engadget

In Tuesday's newsletter, I laid out how to watch (and what to expect from) Amazon's Alexa press event. Aside from unveiling what Alexa will be capable of, there was no silly hardware and no upgraded Echos, but lots of demos. We learned Alexa will be included with an Amazon Prime subscription, and the company will also offer the enhanced digital assistant separately for $20 per month. Meanwhile, Apple's new entry-level iPhone, the 16e, launches online and in stores today. The $599 phone is arguably $100 too expensive, but it packs a processor that can deliver Apple Intelligence to the masses.


A Survey of Link Prediction in Temporal Networks

arXiv.org Artificial Intelligence

Temporal networks have gained significant prominence over the past decade for modelling dynamic interactions within complex systems. A key challenge in this domain is Temporal Link Prediction (TLP), which aims to forecast future connections by analysing historical network structures across various applications, including social network analysis. While existing surveys have addressed specific aspects of TLP, they typically lack a comprehensive framework that distinguishes between representation and inference methods. This survey bridges this gap by introducing a novel taxonomy that explicitly separates the representation and inference components of existing methods, providing a new classification of approaches for TLP. We analyse how different representation techniques capture temporal and structural dynamics, examining their compatibility with various inference methods for both transductive and inductive prediction tasks. Our taxonomy not only clarifies the methodological landscape but also reveals promising unexplored combinations of existing techniques. It thereby provides a systematic foundation for emerging challenges in TLP, including model explainability and scalable architectures for complex temporal networks.
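
To make the representation/inference split concrete, the toy sketch below pairs a snapshot-based temporal representation with a separately chosen classifier for inference; the feature choices and the logistic-regression step are illustrative assumptions, not methods taken from the survey.

```python
# Toy illustration of the representation/inference split: a snapshot-based
# representation feeds a swappable inference model. Feature choices are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def edge_features(snapshots, u, v):
    """Representation: per-snapshot edge indicator and common-neighbour count."""
    feats = []
    for adj in snapshots:                 # list of adjacency matrices over time
        common = np.sum(adj[u] * adj[v])  # shared neighbours in this snapshot
        feats.extend([adj[u, v], common])
    return np.array(feats)

def fit_inference_model(snapshots, train_pairs, labels):
    """Inference: any classifier can consume the temporal representation."""
    X = np.stack([edge_features(snapshots, u, v) for u, v in train_pairs])
    return LogisticRegression(max_iter=1000).fit(X, labels)
```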


Scalable Coordinated Learning for H2M/R Applications over Optical Access Networks (Invited)

arXiv.org Artificial Intelligence

One of the primary research interests in next-generation fiber-wireless access networks is human-to-machine/robot (H2M/R) collaborative communications facilitating Industry 5.0. This paper discusses scalable H2M/R communications across large geographical distances that also allow rapid onboarding of new machines/robots, as 72% of the training time is saved through global-local coordinated learning. In recent years, several inter-disciplinary technical paradigms like cyber-physical systems, Industrial IoT, robotics, big data, cloud/edge and cognitive computing, and virtual/augmented reality (VR/AR) have received significant attention from both industry and academia. The primary reason behind this development is the inclusion of industry vertical scenarios like Industry 4.0 in the fifth and beyond-fifth generation mobile technologies [1]. Although Industry 4.0 primarily involved connectivity among cyber-physical systems, Industry 5.0 will focus on the "human and machine/robots/cobots" relationship [2]. This relationship is meant to ensure real-time monitoring of a product's condition, use, and environment through sensors and external data sources; dynamic control of product functions and a personalized user experience through software embedded in the products; optimization of product use and performance; and autonomous delivery of products through coordinated operations with other products and systems.
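
The abstract does not spell out the global-local coordination mechanism; the sketch below assumes a FedAvg-style scheme in which edge sites train locally and a global coordinator averages and redistributes the parameters, purely as an illustration of coordinated learning.

```python
# Assumed FedAvg-style global-local coordination; the paper's actual scheme may differ.
import numpy as np

def local_update(weights, gradients, lr=0.01):
    """One local training step at an edge site (machine/robot cluster)."""
    return {k: w - lr * gradients[k] for k, w in weights.items()}

def global_aggregate(local_weight_sets):
    """Global coordination: average parameters reported by all edge sites."""
    keys = local_weight_sets[0].keys()
    return {k: np.mean([w[k] for w in local_weight_sets], axis=0) for k in keys}
```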


LLMs Have Rhythm: Fingerprinting Large Language Models Using Inter-Token Times and Network Traffic Analysis

arXiv.org Artificial Intelligence

As Large Language Models (LLMs) become increasingly integrated into technological ecosystems across various domains and industries, identifying which model is deployed or being interacted with is critical for the security and trustworthiness of these systems. Current verification methods typically rely on analyzing the generated output to determine the source model. However, these techniques are susceptible to adversarial attacks, operate in a post-hoc manner, and may require access to the model weights to inject a verifiable fingerprint. In this paper, we propose a novel passive and non-invasive fingerprinting technique that operates in real time and remains effective even under encrypted network traffic conditions. Our method leverages the intrinsic autoregressive nature of language models, which generate text one token at a time based on all previously generated tokens, creating a unique temporal pattern, like a rhythm or heartbeat, that persists even when the output is streamed over a network. We find that measuring the Inter-Token Times (ITTs), the time intervals between consecutive tokens, can identify different language models with high accuracy. We develop a Deep Learning (DL) pipeline that captures these timing patterns using network traffic analysis and evaluate it on 16 Small Language Models (SLMs) and 10 proprietary LLMs across different deployment scenarios, including a local host machine (GPU/CPU), a Local Area Network (LAN), a Remote Network, and a Virtual Private Network (VPN). The experimental results confirm that our proposed technique is effective and maintains high accuracy even when tested under different network conditions. This work opens a new avenue for model identification in real-world scenarios and contributes to more secure and trustworthy language model deployment.
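
A minimal sketch of the timing-side measurement is shown below: it records the arrival time of each streamed token and derives the Inter-Token Times. The stream_tokens callable is a hypothetical stand-in for any streaming client, and the paper's DL classifier is omitted.

```python
# Record arrival times of streamed tokens and derive Inter-Token Times (ITTs).
# `stream_tokens` is a hypothetical streaming client; the DL classifier is omitted.
import time
import numpy as np

def capture_itts(stream_tokens, prompt):
    """Return the vector of inter-token times (seconds) for one streamed response."""
    arrival_times = []
    for _token in stream_tokens(prompt):   # iterate over streamed tokens/chunks
        arrival_times.append(time.perf_counter())
    return np.diff(np.array(arrival_times))

def itt_features(itts):
    """Simple summary features; the paper instead feeds timing sequences to a DL model."""
    return {
        "mean": float(np.mean(itts)),
        "std": float(np.std(itts)),
        "p95": float(np.percentile(itts, 95)),
    }
```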


MobiLLM: Enabling LLM Fine-Tuning on the Mobile Device via Server Assisted Side Tuning

arXiv.org Artificial Intelligence

Running Large Language Models (LLMs) on mobile devices, and the applications this could enable, never fails to fascinate. However, on-device LLM fine-tuning poses great challenges due to extremely high memory requirements and slow training speeds. Even parameter-efficient fine-tuning (PEFT) methods, which update only a small subset of parameters, are beyond what resource-constrained mobile devices can afford. In this paper, we propose MobiLLM to enable memory-efficient transformer LLM fine-tuning on a mobile device via server-assisted side-tuning. In particular, MobiLLM allows the resource-constrained mobile device to retain merely a frozen backbone model, while offloading the memory- and computation-intensive backpropagation of a trainable side network to a high-performance server. Unlike existing fine-tuning methods that keep trainable parameters inside the frozen backbone, MobiLLM separates a set of parallel adapters from the backbone to create a backpropagation bypass, involving only one-way activation transfers from the mobile device to the server, with low-width quantization, during forward propagation. In this way, the data never leaves the mobile device, the device avoids backpropagation through the local backbone model, and its forward propagation can be parallelized with the server-side execution. Thus, MobiLLM preserves data privacy while significantly reducing the memory and computational burdens of LLM fine-tuning. Through extensive experiments, we demonstrate that MobiLLM can enable a resource-constrained mobile device, even a CPU-only one, to fine-tune LLMs, while significantly reducing convergence time and memory usage.
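
The sketch below illustrates the side-tuning split conceptually: the device runs only the frozen backbone forward pass and ships quantized activations, while the server holds the trainable parallel adapters. The layer sizes, int8 quantization, and adapter shape are assumptions rather than MobiLLM's exact design.

```python
# Conceptual server-assisted side-tuning split; shapes and quantization are assumptions.
import torch
import torch.nn as nn

class SideAdapter(nn.Module):
    """Server-side trainable adapter attached in parallel to one backbone layer."""
    def __init__(self, hidden_dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_dim)

    def forward(self, backbone_activation):
        return self.up(torch.relu(self.down(backbone_activation)))

def device_forward(frozen_backbone_layers, x):
    """On-device: forward through the frozen backbone and quantize per-layer activations."""
    activations = []
    with torch.no_grad():
        h = x
        for layer in frozen_backbone_layers:
            h = layer(h)
            scale = h.abs().max().clamp(min=1e-8) / 127.0
            activations.append((torch.round(h / scale).to(torch.int8), scale))
    return activations  # sent one-way to the server; no backprop runs on the device
```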


Improving customer service with automatic topic detection in user emails

arXiv.org Artificial Intelligence

This study introduces a novel Natural Language Processing pipeline that enhances customer service efficiency at Telekom Srbija, a leading Serbian telecommunications company, through automated email topic detection and labelling. Central to the pipeline is BERTopic, a modular architecture that allows unsupervised topic modelling. After a series of preprocessing and post-processing steps, we assign one of 12 topics and several additional labels to incoming emails, allowing customer service to filter and access them through a custom-made application. The model's performance was evaluated by assessing the speed and correctness of the automatically assigned topics across a test dataset of 100 customer emails. The pipeline shows broad applicability across languages, particularly for those that are low-resourced and morphologically rich. The system now operates in the company's production environment, streamlining customer service operations through automated email classification.
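
Since BERTopic is an off-the-shelf library, a minimal sketch of the unsupervised core of such a pipeline might look as follows; the Serbian-specific preprocessing, the mapping to the 12 production topics, and the extra labels are not shown, and the parameter choices are assumptions.

```python
# Minimal BERTopic core; preprocessing, topic naming, and extra labels are omitted,
# and the embedding model / parameter choices are assumptions.
from bertopic import BERTopic

def fit_email_topics(emails):
    """Fit an unsupervised topic model on preprocessed email bodies."""
    topic_model = BERTopic(
        embedding_model="paraphrase-multilingual-MiniLM-L12-v2",  # multilingual encoder
        nr_topics=12,                   # reduce to a fixed number of topics after fitting
        calculate_probabilities=True,
    )
    topics, probs = topic_model.fit_transform(emails)
    return topic_model, topics, probs
```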


Exponential Topology-enabled Scalable Communication in Multi-agent Reinforcement Learning

arXiv.org Artificial Intelligence

In cooperative multi-agent reinforcement learning (MARL), well-designed communication protocols can effectively facilitate consensus among agents, thereby enhancing task performance. Moreover, in large-scale multi-agent systems commonly found in real-world applications, effective communication plays an even more critical role due to the escalated challenge of partial observability compared to smaller-scale setups. In this work, we endeavor to develop a scalable communication protocol for MARL. Unlike previous methods that focus on selecting optimal pairwise communication links, a task that becomes increasingly complex as the number of agents grows, we adopt a global perspective on communication topology design. Specifically, we propose utilizing the exponential topology to enable rapid information dissemination among agents by leveraging its small-diameter and small-size properties. This approach leads to a scalable communication protocol, named ExpoComm. To fully unlock the potential of exponential graphs as communication topologies, we employ memory-based message processors and auxiliary tasks to ground messages, ensuring that they reflect global information and benefit decision-making. Extensive experiments on large-scale cooperative benchmarks, including MAgent and Infrastructure Management Planning, demonstrate the superior performance and robust zero-shot transferability of ExpoComm compared to existing communication strategies. The code is publicly available at https://github.com/LXXXXR/ExpoComm.
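
For intuition about the exponential topology, the sketch below builds the power-of-two neighbour sets that give the graph its logarithmic diameter; whether ExpoComm uses exactly this one-peer construction is an assumption based on the abstract.

```python
# Power-of-two offset neighbours: sparse connectivity with logarithmic diameter.
# Whether ExpoComm uses exactly this construction is an assumption.
import math

def exponential_neighbours(agent_id, num_agents):
    """Out-neighbours of one agent in an exponential (power-of-two offset) graph."""
    hops = int(math.ceil(math.log2(num_agents))) if num_agents > 1 else 0
    return [(agent_id + 2 ** k) % num_agents for k in range(hops)]

# Example: with 8 agents, agent 0 communicates with agents 1, 2, and 4; any agent
# can reach any other within log2(8) = 3 message-passing rounds.
print(exponential_neighbours(0, 8))  # [1, 2, 4]
```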