Evaluating Open-Source Large Language Models for Technical Telecom Question Answering

Arina Caraus, Alessio Buscemi, Sumit Kumar, Ion Turcanu

Sep-29-2025, arXiv.org Artificial Intelligence

Large Language Models (LLMs) have shown remarkable capabilities across various fields. However, their performance in technical domains such as telecommunications remains underexplored. This paper evaluates two open-source LLMs, Gemma 3 27B and DeepSeek R1 32B, on factual and reasoning-based questions derived from advanced wireless communications material. We construct a benchmark of 105 question-answer pairs and assess performance using lexical metrics, semantic similarity, and LLM-as-a-judge scoring. We also analyze consistency, judgment reliability, and hallucination through source attribution and score variance. Results show that Gemma excels in semantic fidelity and LLM-rated correctness, while DeepSeek demonstrates slightly higher lexical consistency. Additional findings highlight current limitations in telecom applications and the need for domain-adapted models to support trustworthy Artificial Intelligence (AI) assistants in engineering.
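As a rough illustration of two of the evaluation ideas named in the abstract, lexical overlap with a reference answer and score variance across repeated runs, a minimal sketch follows. The abstract does not say which lexical metrics the authors use, so token-level F1 is an assumption here, and the function names `token_f1` and `score_variance` are hypothetical rather than taken from the paper.

```python
from collections import Counter
from statistics import pvariance


def token_f1(candidate: str, reference: str) -> float:
    """Token-level F1 between a model answer and a reference answer.

    Assumption: whitespace tokenization on lowercased text. The paper's
    actual lexical metrics are not specified in the abstract.
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def score_variance(scores: list[float]) -> float:
    """Population variance of scores from repeated runs on one question.

    A low value suggests the model (or the judge) is consistent.
    """
    return pvariance(scores)
```

For example, `token_f1` could be applied to each of the 105 question-answer pairs and averaged per model, while `score_variance` could be applied to the judge scores obtained when the same question is asked several times.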

large language model, machine learning, natural language

Country:
- Europe (0.14)

Genre:
- Overview (1.00)
- Research Report > New Finding (0.88)

Industry:
- Telecommunications (0.52)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
