AITopics | Bezemer, Cor-Paul

Collaborating Authors

Bezemer, Cor-Paul

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models

Li, Hao, Bezemer, Cor-Paul, Hassan, Ahmed E.

arXiv.org Artificial IntelligenceJan-6-2025

Foundation models (FMs) such as large language models (LLMs) have significantly impacted many fields, including software engineering (SE). The interaction between SE and FMs has led to the integration of FMs into SE practices (FM4SE) and the application of SE methodologies to FMs (SE4FM). While several literature surveys exist on academic contributions to these trends, we are the first to provide a practitioner's view. We analyze 155 FM4SE and 997 SE4FM blog posts from leading technology companies, leveraging an FM-powered surveying approach to systematically label and summarize the discussed activities and tasks. We observed that while code generation is the most prominent FM4SE task, FMs are leveraged for many other SE activities such as code understanding, summarization, and API recommendation. The majority of blog posts on SE4FM are about model deployment & operation, and system architecture & orchestration. Although the emphasis is on cloud deployments, there is a growing interest in compressing FMs and deploying them on smaller devices such as edge or mobile devices. We outline eight future research directions inspired by our gained insights, aiming to bridge the gap between academic findings and real-world applications. Our study not only enriches the body of knowledge on practical applications of FM4SE and SE4FM but also demonstrates the utility of FMs as a powerful and efficient approach in conducting literature surveys within technical and grey literature domains. Our dataset, results, code and used prompts can be found in our online replication package at https://github.com/SAILResearch/fmse-blogs.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2410.09012

Country:

North America > Canada > Ontario > Kingston (0.14)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.14)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Information Technology > Services (0.88)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

Studying the Impact of TensorFlow and PyTorch Bindings on Machine Learning Software Quality

Li, Hao, Rajbahadur, Gopi Krishnan, Bezemer, Cor-Paul

arXiv.org Artificial IntelligenceJul-7-2024

Bindings for machine learning frameworks (such as TensorFlow and PyTorch) allow developers to integrate a framework's functionality using a programming language different from the framework's default language (usually Python). In this paper, we study the impact of using TensorFlow and PyTorch bindings in C#, Rust, Python and JavaScript on the software quality in terms of correctness (training and test accuracy) and time cost (training and inference time) when training and performing inference on five widely used deep learning models. Our experiments show that a model can be trained in one binding and used for inference in another binding for the same framework without losing accuracy. Our study is the first to show that using a non-default binding can help improve machine learning software quality from the time cost perspective compared to the default Python binding while still achieving the same level of correctness.

artificial intelligence, machine learning, ml framework, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3678168

2407.05466

Country:

Europe (1.00)
North America > Canada > Alberta (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Ontario > Kingston (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology (1.00)
Health & Medicine (0.92)
Education > Educational Technology > Educational Software > Computer Based Training (0.61)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Keeping Deep Learning Models in Check: A History-Based Approach to Mitigate Overfitting

Li, Hao, Rajbahadur, Gopi Krishnan, Lin, Dayi, Bezemer, Cor-Paul, Ming, Zhen, Jiang, null

arXiv.org Artificial IntelligenceJan-18-2024

In software engineering, deep learning models are increasingly deployed for critical tasks such as bug detection and code review. However, overfitting remains a challenge that affects the quality, reliability, and trustworthiness of software systems that utilize deep learning models. Overfitting can be (1) prevented (e.g., using dropout or early stopping) or (2) detected in a trained model (e.g., using correlation-based approaches). Both overfitting detection and prevention approaches that are currently used have constraints (e.g., requiring modification of the model structure, and high computing resources). In this paper, we propose a simple, yet powerful approach that can both detect and prevent overfitting based on the training history (i.e., validation losses). Our approach first trains a time series classifier on training histories of overfit models. This classifier is then used to detect if a trained model is overfit. In addition, our trained classifier can be used to prevent overfitting by identifying the optimal point to stop a model's training. We evaluate our approach on its ability to identify and prevent overfitting in real-world samples. We compare our approach against correlation-based detection approaches and the most commonly used prevention approach (i.e., early stopping). Our approach achieves an F1 score of 0.91 which is at least 5% higher than the current best-performing non-intrusive overfitting detection approach. Furthermore, our approach can stop training to avoid overfitting at least 32% of the times earlier than early stopping and has the same or a better rate of returning the best model.

artificial intelligence, machine learning, training history, (18 more...)

arXiv.org Artificial Intelligence

2401.10359

Country:

North America > Canada > Alberta (0.14)
Europe > Middle East > Cyprus (0.14)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

GlitchBench: Can large multimodal models detect video game glitches?

Taesiri, Mohammad Reza, Feng, Tianjun, Bezemer, Cor-Paul, Nguyen, Anh

arXiv.org Artificial IntelligenceDec-8-2023

Large multimodal models (LMMs) have evolved from large language models (LLMs) to integrate multiple input modalities, such as visual inputs. This integration augments the capacity of LLMs for tasks requiring visual comprehension and reasoning. However, the extent and limitations of their enhanced abilities are not fully understood, especially when it comes to real-world tasks. To address this gap, we introduce GlitchBench, a novel benchmark derived from video game quality assurance tasks, to test and evaluate the reasoning capabilities of LMMs. Our benchmark is curated from a variety of unusual and glitched scenarios from video games and aims to challenge both the visual and linguistic reasoning powers of LMMs in detecting and interpreting out-of-the-ordinary events. We evaluate multiple state-of-the-art LMMs, and we show that GlitchBench presents a new challenge for these models. Code and data are available at: https://glitchbench.github.io/

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2312.05291

Country:

Asia > Middle East (0.28)
Europe (0.28)
North America > Canada > Alberta (0.14)

Genre: Research Report > New Finding (0.45)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Games (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback