AITopics | werden

Modeling Motivated Reasoning in Law: Evaluating Strategic Role Conditioning in LLM Summarization

Cho, Eunjung, Hoyle, Alexander, Hermstrüwer, Yoan

arXiv.org Artificial Intelligence, Oct-10-2025

Large Language Models (LLMs) are increasingly used to generate user-tailored summaries, adapting outputs to specific stakeholders. In legal contexts, this raises important questions about motivated reasoning -- how models strategically frame information to align with a stakeholder's position within the legal system. Building on theories of legal realism and recent trends in legal practice, we investigate how LLMs respond to prompts conditioned on different legal roles (e.g., judges, prosecutors, attorneys) when summarizing judicial decisions. We introduce an evaluation framework grounded in legal fact and reasoning inclusion, also considering favorability towards stakeholders. Our results show that even when prompts include balancing instructions, models exhibit selective inclusion patterns that reflect role-consistent perspectives. These findings raise broader concerns about how similar alignment may emerge as LLMs begin to infer user roles from prior interactions or context, even without explicit role instructions. Our results underscore the need for role-aware evaluation of LLM summarization behavior in high-stakes legal settings.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2509.00529

Country: Europe > Switzerland (0.92)

Genre: Research Report > New Finding (1.00)

Industry:

Law > Criminal Law (0.70)
Law > Litigation (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Generative KI für TA (Generative AI for Technology Assessment)

Eppler, Wolfgang, Heil, Reinhard

arXiv.org Artificial Intelligence, Sep-3-2025

Many scientists use generative AI in their scientific work. People working in technology assessment (TA) are no exception. TA's approach to generative AI is twofold: on the one hand, generative AI is used for TA work, and on the other hand, generative AI is the subject of TA research. After briefly outlining the phenomenon of generative AI and formulating requirements for its use in TA, the following article discusses in detail the structural causes of the problems associated with it. Although generative AI is constantly being further developed, the structurally induced risks remain. The article concludes with proposed solutions and brief notes on their feasibility, as well as some examples of the use of generative AI in TA work.

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2509.02053

Country: North America (0.28)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Large Means Left: Political Bias in Large Language Models Increases with Their Number of Parameters

Exler, David, Schutera, Mark, Reischl, Markus, Rettenberger, Luca

arXiv.org Artificial Intelligence, May-8-2025

With the increasing prevalence of artificial intelligence, inherent biases need careful evaluation as a basis for alleviating the effects these predispositions can have on users. Many people use large language models (LLMs) as a primary source of information on various topics. LLMs frequently make factual errors, fabricate data (hallucinations), or present biases, exposing users to misinformation and influencing opinions. Educating users about these risks is key to responsible use, as bias, unlike hallucinations, cannot be caught through data verification. We quantify the political bias of popular LLMs in the context of the recent vote of the German Bundestag using the score produced by the Wahl-O-Mat. This metric measures the alignment between an individual's political views and the positions of German political parties. We compare the models' alignment scores to identify factors influencing their political preferences. In doing so, we discover a bias toward left-leaning parties that is most pronounced in larger LLMs. We also find that the language used to communicate with the models affects their political views. Additionally, we analyze the influence of a model's origin and release date and compare the results to the outcome of the recent vote of the Bundestag. Our results imply that LLMs are prone to exhibiting political bias. Large corporations with the necessary means to develop LLMs thus, knowingly or unknowingly, have a responsibility to contain these biases, as they can influence each voter's decision-making process and inform public opinion in general and at scale.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2505.04393

Country: Europe > Germany (0.48)

Genre: Research Report > New Finding (0.66)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.75)

Requirements for Quality Assurance of AI Models for Early Detection of Lung Cancer

Hahn, Horst K., May, Matthias S., Dicken, Volker, Walz, Michael, Eßeling, Rainer, Lassen-Schmidt, Bianca, Rischen, Robert, Vogel-Claussen, Jens, Nikolaou, Konstantin, Barkhausen, Jörg

arXiv.org Artificial Intelligence, Feb-24-2025

Lung cancer is the second most common cancer and the leading cause of cancer-related deaths worldwide. Survival largely depends on tumor stage at diagnosis, and early detection with low-dose CT can significantly reduce mortality in high-risk patients. AI can improve the detection, measurement, and characterization of pulmonary nodules while reducing assessment time. However, the training data, functionality, and performance of available AI systems vary considerably, complicating software selection and regulatory evaluation. Manufacturers must specify intended use and provide test statistics, but they can choose their training and test data, limiting standardization and comparability. Under the EU AI Act, consistent quality assurance is required for AI-based nodule detection, measurement, and characterization. This position paper proposes systematic quality assurance grounded in a validated reference dataset, including real screening cases plus phantom data to verify volume and growth rate measurements. Regular updates shall reflect demographic shifts and technological advances, ensuring ongoing relevance. Consequently, ongoing AI quality assurance is vital. Regulatory challenges are also addressed. While the MDR and the EU AI Act set baseline requirements, they do not adequately address self-learning algorithms or their updates. A standardized, transparent quality assessment - based on sensitivity, specificity, and volumetric accuracy - enables an objective evaluation of each AI solution's strengths and weaknesses. Establishing clear testing criteria and systematically using updated reference data lay the groundwork for comparable performance metrics, informing tenders, guidelines, and recommendations.

lungenkrebs, positionspapier anforderungen, werden, (10 more...)

arXiv.org Artificial Intelligence

2502.17639

Country:

North America > United States (0.95)
South America > Uruguay > Maldonado > Maldonado (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)
(5 more...)

Genre: Research Report (0.65)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Lung Cancer (0.61)
Government > Regional Government > North America Government > United States Government > FDA (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

A Cautionary Tale About "Neutrally" Informative AI Tools Ahead of the 2025 Federal Elections in Germany

Dormuth, Ina, Franke, Sven, Hafer, Marlies, Katzke, Tim, Marx, Alexander, Müller, Emmanuel, Neider, Daniel, Pauly, Markus, Rutinowski, Jérôme

arXiv.org Artificial Intelligence, Feb-21-2025

In this study, we examine the reliability of AI-based Voting Advice Applications (VAAs) and large language models (LLMs) in providing objective political information. Our analysis is based on a comparison with party responses to 38 statements of the Wahl-O-Mat, a well-established German online tool that helps inform voters by comparing their views with political party positions. For the LLMs, we identify significant biases. They exhibit a strong alignment (over 75% on average) with left-wing parties and a substantially lower alignment with center-right parties (below 50%) and right-wing parties (around 30%). Furthermore, for the VAAs, which are intended to inform voters objectively, we found substantial deviations from the parties' stated positions in Wahl-O-Mat: while one VAA deviated in 25% of cases, another VAA showed deviations in more than 50% of cases. For the latter, we even observed that simple prompt injections led to severe hallucinations, including false claims such as non-existent ties between political parties and right-wing extremism.

cautionary tale, neutrally, wahl-o-mat, (14 more...)

arXiv.org Artificial Intelligence

2502.15568

Country:

North America > United States (0.51)
Asia > China (0.14)
Europe > Ukraine (0.05)
(3 more...)

Genre: Research Report > New Finding (0.87)

Industry:

Government > Voting & Elections (1.00)
Government > Regional Government > North America Government > United States Government (0.51)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Entwicklung einer Webanwendung zur Generierung von skolemisierten RDF Daten für die Verwaltung von Lieferketten (Development of a Web Application for Generating Skolemized RDF Data for Supply Chain Management)

Laas, Roman

arXiv.org Artificial Intelligence, Jan-14-2025

For early detection of supply bottlenecks, supply chains must be available in a suitable digital form so that they can be processed. However, the effort required for data modeling cannot reasonably be expected of people without an IT background. Therefore, a web application was developed in this thesis that is intended to hide the underlying complexity from the user. Specifically, it is a graphical user interface in which templates can be instantiated and linked to each other. Suitable concepts for defining these templates were developed and extended in this thesis. Finally, a user study with several participants was conducted to assess the usability of the web application. It revealed a large number of useful suggestions for improvement.

artificial intelligence, programming language, werden, (17 more...)

arXiv.org Artificial Intelligence

2503.15495

Country:

Europe > Netherlands > Drenthe > Assen (0.24)
Europe > Germany > Hesse > Darmstadt Region > Wiesbaden (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)

Evaluation of the Code Generation Capabilities of ChatGPT 4: A Comparative Analysis in 19 Programming Languages

Gilbert, L. C.

arXiv.org Artificial Intelligence, Jan-4-2025

This bachelor's thesis examines the capabilities of ChatGPT 4 in code generation across 19 programming languages. The study analyzed solution rates across three difficulty levels, types of errors encountered, and code quality in terms of runtime and memory efficiency through a quantitative experiment. A total of 188 programming problems were selected from the LeetCode platform, and ChatGPT 4 was given three attempts to produce a correct solution with feedback. ChatGPT 4 successfully solved 39.67% of all tasks, with success rates decreasing significantly as problem complexity increased. Notably, the model faced considerable challenges with hard problems across all languages. ChatGPT 4 demonstrated higher competence in widely used languages, likely due to a larger volume and higher quality of training data. The solution rates also revealed a preference for languages with low abstraction levels and static typing. For popular languages, the most frequent error was "Wrong Answer," whereas for less popular languages, compiler and runtime errors prevailed, suggesting frequent misunderstandings and confusion regarding the structural characteristics of these languages. The model exhibited above-average runtime efficiency in all programming languages, showing a tendency toward statically typed and low-abstraction languages. Memory efficiency results varied significantly, with above-average performance in 14 languages and below-average performance in five languages. A slight preference for low-abstraction languages and a leaning toward dynamically typed languages in terms of memory efficiency were observed. Future research should include a larger number of tasks, iterations, and less popular languages. Additionally, ChatGPT 4's abilities in code interpretation and summarization, debugging, and the development of complex, practical code could be analyzed further.

chat gpt 4, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

doi: 10.5281/zenodo.14599318

2501.02338

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Chatbots im Schulunterricht: Wir testen das Fobizz-Tool zur automatischen Bewertung von Hausaufgaben (Chatbots in the Classroom: We Test the Fobizz Tool for Automatic Grading of Homework)

Muehlhoff, Rainer, Henningsen, Marte

arXiv.org Artificial Intelligence, Dec-17-2024

This study examines the AI-powered grading tool "AI Grading Assistant" by the German company Fobizz, designed to support teachers in evaluating and providing feedback on student assignments. Against the societal backdrop of an overburdened education system and rising expectations for artificial intelligence as a solution to these challenges, the investigation evaluates the tool's functional suitability through two test series. The results reveal significant shortcomings: The tool's numerical grades and qualitative feedback are often random and do not improve even when its suggestions are incorporated. The highest ratings are achievable only with texts generated by ChatGPT. False claims and nonsensical submissions frequently go undetected, while the implementation of some grading criteria is unreliable and opaque. Since these deficiencies stem from the inherent limitations of large language models (LLMs), fundamental improvements to this or similar tools are not immediately foreseeable. The study critiques the broader trend of adopting AI as a quick fix for systemic problems in education, concluding that Fobizz's marketing of the tool as an objective and time-saving solution is misleading and irresponsible. Finally, the study calls for systematic evaluation and subject-specific pedagogical scrutiny of the use of AI tools in educational contexts.

muehlhoff henningsen 2024, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2412.06651

Country:

Europe > Germany > Saarland (0.05)
Europe > Germany > Rhineland-Palatinate (0.04)
Europe > Germany > Schleswig-Holstein (0.04)
(8 more...)

Genre: Research Report (0.70)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

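Der Effizienz- und Intelligenzbegriff in der Lexikographie und kuenstlichen Intelligenz: kann ChatGPT die lexikographische Textsorte nachbilden? (The Concepts of Efficiency and Intelligence in Lexicography and Artificial Intelligence: Can ChatGPT Reproduce the Lexicographic Text Type?)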

Arias-Arias, Ivan, Vazquez, Maria Jose Dominguez, Riveiro, Carlos Valcarcel

arXiv.org Artificial Intelligence, Dec-11-2024

By means of pilot experiments for the language pair German and Galician, this paper examines the concepts of efficiency and intelligence in lexicography and artificial intelligence (AI). The aim of the experiments is to gain empirically and statistically based insights into the lexicographical text type "dictionary article" in the responses of ChatGPT 3.5, as well as into the lexicographical data on which this chatbot was trained. Both quantitative and qualitative methods are used for this purpose. The analysis is based on the evaluation of the outputs of several sessions with the same prompt in ChatGPT 3.5. On the one hand, the algorithmic performance of intelligent systems is evaluated in comparison with data from lexicographical works. On the other hand, the data supplied by ChatGPT is analysed using specific text passages of the aforementioned lexicographical text type. The results of this study not only help to evaluate the efficiency of this chatbot in creating dictionary articles, but also to delve deeper into the concept of intelligence, the thought processes and the actions to be carried out in both disciplines.

effizienz-und intelligenzbegriff, lexicography, lexiko, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.5788/34-1-1879

2412.08599

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Czechia > South Moravian Region > Brno (0.05)
Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.04)
(11 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)

Local Transcription Models in Home Care Nursing in Switzerland: an Interdisciplinary Case Study

Kramer, Jeremy, Kravchenko, Tetiana, Kaufmann, Beatrice, Thilo, Friederike J. S., Kurpicz-Briki, Mascha

arXiv.org Artificial Intelligence, Sep-27-2024

Recent advances in the field of natural language processing (NLP) enable new use cases in different domains, including the medical sector. In particular, transcription can be used to support automation in the nursing documentation process and give nurses more time to interact with patients. However, several challenges need to be addressed, including (a) data privacy, (b) local languages and dialects, and (c) domain-specific vocabulary. In this case study, we investigate home care nursing documentation in Switzerland. We assessed different transcription tools and models and conducted several experiments with OpenAI Whisper, involving different variations of German (i.e., dialects, foreign accent) and example texts manually curated by a domain expert in home care nursing. Our results indicate that even the out-of-the-box model used performs sufficiently well to be a good starting point for future research in the field.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2409.18819

Country:

Europe > Switzerland > Basel-City > Basel (0.05)
Europe > Switzerland > Bern > Bern (0.05)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
Asia > Indonesia > Bali (0.04)

Genre: Research Report > New Finding (0.49)

Industry:

Health & Medicine > Health Care Providers & Services (0.92)
Information Technology > Security & Privacy (0.87)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.36)
