AITopics | Schwyz

Semantic-KG: Using Knowledge Graphs to Construct Benchmarks for Measuring Semantic Similarity

Wei, Qiyao, Morrell, Edward, Goetz, Lea, van der Schaar, Mihaela

arXiv.org Artificial Intelligence, Nov-26-2025

Evaluating the open-form textual responses generated by Large Language Models (LLMs) typically requires measuring the semantic similarity of the response to a (human-generated) reference. However, there is evidence that current semantic similarity methods may capture syntactic or lexical forms over semantic content. While benchmarks exist for semantic equivalence, they often suffer from high generation costs due to reliance on subjective human judgment, limited availability for domain-specific applications, and unclear definitions of equivalence. This paper introduces a novel method for generating benchmarks to evaluate semantic similarity methods for LLM outputs, specifically addressing these limitations. Our approach leverages knowledge graphs (KGs) to generate pairs of natural-language statements that are semantically similar or dissimilar, with dissimilar pairs categorized into one of four sub-types. We generate benchmark datasets in four different domains (general knowledge, biomedicine, finance, biology), and conduct a comparative study of semantic similarity methods including traditional natural language processing scores and LLM-as-a-judge predictions. We observe that both the sub-type of semantic variation and the domain of the benchmark impact the performance of semantic similarity methods, with no method being consistently superior. Our results present important implications for the use of LLM-as-a-judge in detecting the semantic content of text. Code is available at https://github.com/QiyaoWei/semantic-kg and the dataset is available at https://huggingface.co/datasets/QiyaoWei/Semantic-KG.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2511.19925

Country:

North America > Canada (0.14)
Europe > Norway (0.04)
North America > United States > New York (0.04)
(15 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Government (1.00)
Banking & Finance > Economy (0.46)
Health & Medicine > Therapeutic Area (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Recommender Systems for Democracy: Toward Adversarial Robustness in Voting Advice Applications

Berdoz, Frédéric, Brunner, Dustin, Vonlanthen, Yann, Wattenhofer, Roger

arXiv.org Artificial Intelligence, May-20-2025

Voting advice applications (VAAs) help millions of voters understand which political parties or candidates best align with their views. This paper explores the potential risks these applications pose to the democratic process when targeted by adversarial entities. In particular, we expose 11 manipulation strategies and measure their impact using data from Switzerland's primary VAA, Smartvote, collected during the last two national elections. We find that altering application parameters, such as the matching method, can shift a party's recommendation frequency by up to 105%. Cherry-picking questionnaire items can increase party recommendation frequency by over 261%, while subtle changes to parties' or candidates' responses can lead to a 248% increase. To address these vulnerabilities, we propose adversarial robustness properties VAAs should satisfy, introduce empirical metrics for assessing the resilience of various matching methods, and suggest possible avenues for research toward mitigating the effect of manipulation. Our framework is key to ensuring secure and reliable AI-based VAAs poised to emerge in the near future.
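To illustrate why the choice of matching method can change a recommendation, here is a minimal Python sketch. It is not the paper's method or Smartvote's: the abstract does not say which matching methods were compared, so the two distance functions, the `recommend` helper, and the toy answer vectors are all hypothetical.

```python
import math

def manhattan(a, b):
    """L1 distance between two answer vectors (illustrative matching method)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """L2 distance between two answer vectors (illustrative matching method)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recommend(voter, parties, distance):
    """Return the name of the party whose answers are closest to the voter's."""
    return min(parties, key=lambda name: distance(voter, parties[name]))

# Hypothetical data: one voter and two parties answering two questions.
voter = [0, 0]
parties = {"A": [3, 0], "B": [2, 2]}

top_l1 = recommend(voter, parties, manhattan)  # A is closer under L1 (3 vs 4)
top_l2 = recommend(voter, parties, euclidean)  # B is closer under L2 (3 vs about 2.83)
print(top_l1, top_l2)
```

With the same answers, the two distance functions rank the parties differently, which is the kind of parameter sensitivity the abstract describes.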

artificial intelligence, machine learning, recommendation, (18 more...)

arXiv.org Artificial Intelligence

2505.13329

Country:

North America > United States (0.46)
Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Switzerland > Basel-City > Basel (0.04)
(11 more...)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report (0.82)

Industry: Government > Voting & Elections (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.50)

SwiLTra-Bench: The Swiss Legal Translation Benchmark

Niklaus, Joel, Merane, Jakob, Nenadic, Luka, Ahmadi, Sina, Gao, Yingqiang, Chevalley, Cyrill A. H., Humbel, Claude, Gösken, Christophe, Tanzi, Lorenzo, Lüthi, Thomas, Palombo, Stefan, Poff, Spencer, Yang, Boling, Wu, Nan, Guillod, Matthew, Mamié, Robin, Brunner, Daniel, Pereyra, Julio, Grupen, Niko

arXiv.org Artificial Intelligence, Mar-3-2025

In Switzerland, legal translation is uniquely important due to the country's four official languages and requirements for multilingual legal documentation. However, this process traditionally relies on professionals who must be both legal experts and skilled translators, creating bottlenecks and impacting effective access to justice. To address this challenge, we introduce SwiLTra-Bench, a comprehensive multilingual benchmark of over 180K aligned Swiss legal translation pairs comprising laws, headnotes, and press releases across all Swiss languages along with English, designed to evaluate LLM-based translation systems. Our systematic evaluation reveals that frontier models achieve superior translation performance across all document types, while specialized translation systems excel specifically in laws but under-perform in headnotes. Through rigorous testing and human expert validation, we demonstrate that while fine-tuning open SLMs significantly improves their translation quality, they still lag behind the best zero-shot prompted frontier models such as Claude-3.5-Sonnet. Additionally, we present SwiLTra-Judge, a specialized LLM evaluation system that aligns best with human expert assessments.

computational linguistic, proceedings, translation, (14 more...)

arXiv.org Artificial Intelligence

2503.01372

Country:

Europe > Switzerland > Appenzell Innerrhoden > Appenzell (0.05)
Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)
(23 more...)

Genre: Research Report > New Finding (0.46)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

StatBot.Swiss: Bilingual Open Data Exploration in Natural Language

Nooralahzadeh, Farhad, Zhang, Yi, Smith, Ellery, Maennel, Sabine, Matthey-Doret, Cyril, de Fondville, Raphaël, Stockinger, Kurt

arXiv.org Artificial Intelligence, Jun-6-2024

The potential for improvements brought by Large Language Models (LLMs) in Text-to-SQL systems is mostly assessed on monolingual English datasets. However, LLMs' performance for other languages remains largely unexplored. In this work, we release the StatBot.Swiss dataset, the first bilingual benchmark for evaluating Text-to-SQL systems based on real-world applications. The StatBot.Swiss dataset contains 455 natural language/SQL pairs over 35 big databases with varying levels of complexity for both English and German. We evaluate the performance of state-of-the-art LLMs such as GPT-3.5-Turbo and mixtral-8x7b-instruct for the Text-to-SQL translation task using an in-context learning approach. Our experimental analysis illustrates that current LLMs struggle to generalize well in generating SQL queries on our novel bilingual dataset.

computational linguistic, dataset, query, (15 more...)

arXiv.org Artificial Intelligence

2406.0317

Country:

Europe > Switzerland > Basel-City > Basel (0.05)
Europe > Switzerland > Zürich > Zürich (0.05)
Europe > Switzerland > Schwyz > Schwyz (0.04)
(7 more...)

Genre: Research Report (0.82)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.47)
Transportation > Ground (0.46)
Automobiles & Trucks (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Dialect Transfer for Swiss German Speech Translation

Paonessa, Claudio, Schraner, Yanick, Deriu, Jan, Hürlimann, Manuela, Vogel, Manfred, Cieliebak, Mark

arXiv.org Artificial Intelligence, Oct-13-2023

This paper investigates the challenges in building Swiss German speech translation systems, specifically focusing on the impact of dialect diversity and differences between Swiss German and Standard German. Swiss German is a spoken language with no formal writing system; it comprises many diverse dialects and is a low-resource language with only around 5 million speakers. The study is guided by two key research questions: how does the inclusion and exclusion of dialects during the training of speech translation models for Swiss German impact the performance on specific dialects, and how do the differences between Swiss German and Standard German impact the performance of the systems? We show that dialect diversity and linguistic differences pose significant challenges to Swiss German speech translation, which is in line with linguistic hypotheses derived from empirical investigations.

bleu score, dialect, experiment, (15 more...)

arXiv.org Artificial Intelligence

2310.09088

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Switzerland > Basel-City > Basel (0.05)
Europe > Switzerland > Zürich > Zürich (0.04)
(10 more...)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Prompting as Probing: Using Language Models for Knowledge Base Construction

Alivanistos, Dimitrios, Santamaría, Selene Báez, Cochez, Michael, Kalo, Jan-Christoph, van Krieken, Emile, Thanapalasingam, Thiviyan

arXiv.org Artificial Intelligence, Jun-19-2023

Language Models (LMs) have proven to be useful in various downstream applications, such as summarisation, translation, question answering and text classification. LMs are becoming increasingly important tools in Artificial Intelligence, because of the vast quantity of information they can store. In this work, we present ProP (Prompting as Probing), which utilizes GPT-3, a large Language Model originally proposed by OpenAI in 2020, to perform the task of Knowledge Base Construction (KBC). ProP implements a multi-step approach that combines a variety of prompting techniques to achieve this. Our results show that manual prompt curation is essential, that the LM must be encouraged to give answer sets of variable lengths, in particular including empty answer sets, that true/false questions are a useful device to increase precision on suggestions generated by the LM, that the size of the LM is a crucial factor, and that a dictionary of entity aliases improves the LM score. Our evaluation study indicates that these proposed techniques can substantially enhance the quality of the final predictions: ProP won track 2 of the LM-KBC competition, outperforming the baseline by 36.4 percentage points.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2208.11057

Country:

Europe > Spain > Castilla-La Mancha (0.14)
Africa > Eswatini (0.14)
Europe > Ukraine (0.04)
(73 more...)

Genre: Research Report > New Finding (0.86)

Industry:

Media (1.00)
Automobiles & Trucks > Manufacturer (0.93)
Government (0.68)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.76)

CODET: A Benchmark for Contrastive Dialectal Evaluation of Machine Translation

Alam, Md Mahfuz Ibn, Ahmadi, Sina, Anastasopoulos, Antonios

arXiv.org Artificial Intelligence, May-26-2023

Neural machine translation (NMT) systems exhibit limited robustness in handling source-side linguistic variations. Their performance tends to degrade when faced with even slight deviations in language usage, such as different domains or variations introduced by second-language speakers. It is intuitive to extend this observation to encompass dialectal variations as well, but the work allowing the community to evaluate MT systems on this dimension is limited. To alleviate this issue, we compile and release CODET, a contrastive dialectal benchmark encompassing 882 different variations from nine different languages. We also quantitatively demonstrate the challenges large MT models face in effectively translating dialectal variants. We are releasing all code and data.

artificial intelligence, machine translation, natural language, (18 more...)

arXiv.org Artificial Intelligence

2305.17267

Country:

Europe > Germany (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Italy > Veneto (0.04)
(67 more...)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Daedalean Wins Prize For Artificial Intelligence - Liwaiwai

#artificialintelligence, Oct-18-2020, 02:30:14 GMT

The AiCon event under the patronage of Federal Councilor Guy Parmelin was held for the first time this week. The first national AI award was also presented at the event. The winner is Daedalean, which is developing autonomous piloting systems. AiCon was launched successfully this week. The series of events has been organized under the patronage of Guy Parmelin, Member of the Federal Council, the Swiss federal government.

artificial intelligence, daedalean win prize, switzerland, (7 more...)

#artificialintelligence

Country:

Europe > Switzerland > Zürich > Zürich (0.10)
Europe > Switzerland > Schwyz > Schwyz (0.07)

Technology: Information Technology > Artificial Intelligence (1.00)

A Machine Learning Approach for Flagging Incomplete Bid-rigging Cartels

Wallimann, Hannes, Imhof, David, Huber, Martin

arXiv.org Machine Learning, Apr-12-2020

We propose a new method for flagging bid rigging, which is particularly useful for detecting incomplete bid-rigging cartels. Our approach combines screens, i.e. statistics derived from the distribution of bids in a tender, with machine learning to predict the probability of collusion. As a methodological innovation, we calculate such screens for all possible subgroups of three or four bids within a tender and use summary statistics like the mean, median, maximum, and minimum of each screen as predictors in the machine learning algorithm. This approach tackles the issue that competitive bids in incomplete cartels distort the statistical signals produced by bid rigging. We demonstrate that our algorithm outperforms previously suggested methods in applications to incomplete cartels based on empirical data from Switzerland.
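The subgroup step described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name specific screens, so the coefficient of variation is used here as an assumed example, and the function names and sample bids are hypothetical. The resulting features would then be passed to a machine learning classifier, which is omitted.

```python
from itertools import combinations
from statistics import mean, median, pstdev

def coefficient_of_variation(bids):
    """Example screen (an assumption): standard deviation of bids over their mean."""
    return pstdev(bids) / mean(bids)

def subgroup_screen_features(bids, screen=coefficient_of_variation, sizes=(3, 4)):
    """Compute a screen on every subgroup of 3 or 4 bids in one tender,
    then summarise the subgroup values as predictors."""
    values = [
        screen(group)
        for k in sizes
        for group in combinations(bids, k)
    ]
    if not values:
        raise ValueError("a tender needs at least three bids")
    return {
        "mean": mean(values),
        "median": median(values),
        "max": max(values),
        "min": min(values),
    }

# Hypothetical tender with five bids: C(5,3) + C(5,4) = 15 subgroups.
tender = [100.0, 101.0, 102.0, 120.0, 135.0]
features = subgroup_screen_features(tender)
print(features)
```

Summarising over subgroups means that a tight cluster of bids still shows up in the minimum of the screen even when other, more dispersed bids are present in the same tender.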

cartel, competitive bid, correct classification rate, (12 more...)

arXiv.org Machine Learning

2004.05629

Country:

Europe > Switzerland > Fribourg > Fribourg (0.04)
North America > United States > Ohio (0.04)
Asia > Japan (0.04)
(8 more...)

Genre: Research Report > New Finding (0.45)

Industry:

Law Enforcement & Public Safety > Fraud (1.00)
Law (1.00)
Construction & Engineering (0.92)
Government (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Learning Quadratic Games on Networks

Leng, Yan, Dong, Xiaowen, Pentland, Alex

arXiv.org Machine Learning, Dec-18-2018

Individuals, or organizations, cooperate with or compete against one another in a wide range of practical situations. In the economics literature, such strategic interactions are often modeled as games played on networks, where an individual's payoff depends not only on her action but also on those of her neighbors. The current literature has largely focused on analyzing the characteristics of network games in the scenario where the structure of the network, which is represented by a graph, is known beforehand. It is often the case, however, that the actions of the players are readily observable while the underlying interaction network remains hidden. In this paper, we propose two novel frameworks for learning, from the observations on individual actions, network games with linear-quadratic payoffs, and in particular the structure of the interaction network. Our frameworks are based on the Nash equilibrium of such games and involve solving a joint optimization problem for the graph structure and the individual marginal benefits. We test the proposed frameworks in synthetic settings and further study several factors that affect their learning performance. Moreover, with experiments on three real world examples, we show that our methods can effectively and more accurately learn the games than the baselines. The proposed approach is among the first of its kind for learning quadratic games, and has both theoretical and practical implications for understanding strategic interactions in a network environment.

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv.org Machine Learning

1811.0879

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > Washington > King County > Bellevue (0.04)
(5 more...)

Genre: Research Report (1.00)

Industry:

Banking & Finance > Trading (0.68)
Health & Medicine (0.67)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)
