
Collaborating Authors



Modeling Biological Multifunctionality with Echo State Networks

Leventi-Peetz, Anastasia-Maria, Peetz, Jörg-Volker, Weber, Kai, Zacharis, Nikolaos

arXiv.org Artificial Intelligence

In this work, a three-dimensional multicomponent reaction-diffusion model has been developed, combining excitable-system dynamics with diffusion processes and sharing conceptual features with the FitzHugh-Nagumo model. Designed to capture the spatiotemporal behavior of biological systems, particularly electrophysiological processes, the model was solved numerically to generate time-series data. These data were subsequently used to train and evaluate an Echo State Network (ESN), which successfully reproduced the system's dynamic behavior. The results demonstrate that simulating biological dynamics using data-driven, multifunctional ESN models is both feasible and effective.
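The core ESN idea the abstract relies on is that only a linear readout is trained, while a fixed random recurrent reservoir provides the dynamics. Below is a minimal, self-contained sketch of that idea in Python; the reservoir size, spectral radius, leak rate, and the sine-wave task are illustrative assumptions, not the paper's reaction-diffusion setup.

```python
# Minimal Echo State Network sketch: a fixed random reservoir plus a
# ridge-regression readout, trained for one-step-ahead prediction of a sine.
import numpy as np

rng = np.random.default_rng(0)

n_res = 200            # reservoir size (assumed)
spectral_radius = 0.9  # keep recurrent weights contractive for the echo state property
leak = 0.3             # leaky-integrator rate (assumed)

W_in = rng.uniform(-0.5, 0.5, (n_res, 1))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= spectral_radius / max(abs(np.linalg.eigvals(W)))

def run_reservoir(u):
    """Collect reservoir states for an input sequence u of shape [T, 1]."""
    x = np.zeros(n_res)
    states = []
    for t in range(len(u)):
        x = (1 - leak) * x + leak * np.tanh(W_in @ u[t] + W @ x)
        states.append(x.copy())
    return np.array(states)

T = 1000
u = np.sin(0.2 * np.arange(T + 1))[:, None]
X = run_reservoir(u[:-1])   # reservoir states
y = u[1:]                   # one-step-ahead targets

washout = 100               # discard the initial transient
X_tr, y_tr = X[washout:], y[washout:]

# Ridge regression: the readout is the only trained component of an ESN.
ridge = 1e-6
W_out = np.linalg.solve(X_tr.T @ X_tr + ridge * np.eye(n_res), X_tr.T @ y_tr)

pred = X_tr @ W_out
rmse = float(np.sqrt(np.mean((pred - y_tr) ** 2)))
```

Training only the readout is what makes ESNs cheap enough to fit to numerically generated time series, as in the paper's workflow.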


Can Large Language Models be Effective Online Opinion Miners?

Heo, Ryang, Seo, Yongsik, Lee, Junseong, Lee, Dongha

arXiv.org Artificial Intelligence

The surge of user-generated online content presents a wealth of insights into customer preferences and market trends. However, the highly diverse, complex, and context-rich nature of such content poses significant challenges to traditional opinion mining approaches. To address this, we introduce the Online Opinion Mining Benchmark (OOMB), a novel dataset and evaluation protocol designed to assess the ability of large language models (LLMs) to mine opinions effectively from diverse and intricate online environments. OOMB provides extensive (entity, feature, opinion) tuple annotations and a comprehensive opinion-centric summary that highlights the key opinion topics within each piece of content, thereby enabling the evaluation of both the extractive and abstractive capabilities of models. Through our proposed benchmark, we conduct a comprehensive analysis of which aspects remain challenging and where LLMs exhibit adaptability, to explore whether they can effectively serve as opinion miners in realistic online scenarios. This study lays the foundation for LLM-based opinion mining and discusses directions for future research in this field.
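A common way to score the extractive side of such a benchmark is exact-match F1 over (entity, feature, opinion) tuples. The sketch below shows that scoring scheme; the example tuples and the exact-match criterion are illustrative assumptions, not OOMB's official protocol.

```python
# Exact-match precision/recall/F1 over (entity, feature, opinion) tuples.
from collections import namedtuple

Opinion = namedtuple("Opinion", ["entity", "feature", "opinion"])

def tuple_f1(predicted, gold):
    """Score predicted opinion tuples against gold annotations."""
    pred_set, gold_set = set(predicted), set(gold)
    tp = len(pred_set & gold_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = [
    Opinion("phone", "battery", "lasts long"),
    Opinion("phone", "screen", "too dim"),
]
pred = [
    Opinion("phone", "battery", "lasts long"),
    Opinion("phone", "camera", "blurry"),
]
p, r, f = tuple_f1(pred, gold)
```

In practice, benchmarks often relax the exact-match criterion (e.g., span overlap or semantic similarity for the opinion field), which changes only the membership test above.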


A modular framework for automated evaluation of procedural content generation in serious games with deep reinforcement learning agents

Kalafatis, Eleftherios, Mitsis, Konstantinos, Zarkogianni, Konstantia, Athanasiou, Maria, Nikita, Konstantina

arXiv.org Artificial Intelligence

Serious Games (SGs) are nowadays shifting focus to include procedural content generation (PCG) in the development process as a means of offering a personalized and enhanced player experience. However, developing a framework to assess the impact of PCG techniques when integrated into SGs remains particularly challenging. This study proposes a methodology for the automated evaluation of PCG integration in SGs, incorporating deep reinforcement learning (DRL) game-testing agents. To validate the proposed framework, a previously introduced SG featuring card game mechanics and incorporating three different versions of PCG for non-player character (NPC) creation has been deployed. Version 1 features random NPC creation, while Versions 2 and 3 utilize a genetic algorithm approach. These versions are used to test the impact of different dynamic SG environments on the proposed framework's agents. The obtained results highlight the superiority of the DRL game-testing agents trained on Versions 2 and 3 over those trained on Version 1 in terms of win rate (i.e., the proportion of wins over games played) and training time. More specifically, in a test emulating regular gameplay, both Versions 2 and 3 peaked at a 97% win rate and achieved statistically significantly higher win rates (p=0.009) than Version 1, which peaked at 94%. Overall, the results support the proposed framework's capability to produce meaningful data for the evaluation of procedurally generated content in SGs.
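The win-rate comparison above is a two-proportion significance test. As a hedged sketch of how such a comparison can be run, the snippet below uses a one-sided Fisher exact test on win/loss counts; the sample size of 1000 games per version is a hypothetical assumption for illustration (the abstract does not report raw game counts), not the paper's actual data.

```python
# One-sided Fisher exact test comparing two win rates (hypothetical counts).
from scipy.stats import fisher_exact

games = 1000                 # assumed number of test games per version
wins_v2 = 970                # ~97% peak win rate (Versions 2/3)
wins_v1 = 940                # ~94% peak win rate (Version 1)

table = [[wins_v2, games - wins_v2],
         [wins_v1, games - wins_v1]]
odds_ratio, p_value = fisher_exact(table, alternative="greater")
```

With counts of this order, the 3-percentage-point gap is comfortably significant, which is consistent in spirit with the reported p=0.009.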


Assessing and Refining ChatGPT's Performance in Identifying Targeting and Inappropriate Language: A Comparative Study

Barbarestani, Baran, Maks, Isa, Vossen, Piek

arXiv.org Artificial Intelligence

This study evaluates the effectiveness of ChatGPT, an advanced AI model for natural language processing, in identifying targeting and inappropriate language in online comments. With the increasing challenge of moderating vast volumes of user-generated content on social network sites, the role of AI in content moderation has gained prominence. We compared ChatGPT's performance against crowd-sourced annotations and expert evaluations to assess its accuracy, scope of detection, and consistency. Our findings highlight that ChatGPT performs well in detecting inappropriate content, showing notable improvements in accuracy through iterative refinements, particularly in Version 6. However, its performance in targeting language detection showed variability, with higher false positive rates compared to expert judgments. This study contributes to the field by demonstrating the potential of AI models like ChatGPT to enhance automated content moderation systems while also identifying areas for further improvement. The results underscore the importance of continuous model refinement and contextual understanding to better support automated moderation and mitigate harmful online behavior.


PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation

Gusev, Ilya

arXiv.org Artificial Intelligence

We introduce a novel benchmark for evaluating the role-playing capabilities of language models. Our approach leverages language models themselves to emulate users in dynamic, multi-turn conversations and to assess the resulting dialogues. The framework consists of three main components: a player model assuming a specific character role, an interrogator model simulating user behavior, and a judge model evaluating conversation quality. We conducted experiments comparing automated evaluations with human annotations to validate our approach, demonstrating strong correlations across multiple criteria. This work provides a foundation for a robust and dynamic evaluation of model capabilities in interactive scenarios.
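The three-role design (player, interrogator, judge) is essentially an orchestration loop over model calls. The sketch below shows that loop with the model calls stubbed as plain functions; in the actual benchmark each role is a language model, so the function bodies and the judging criteria here are illustrative assumptions only.

```python
# Skeleton of a player/interrogator/judge evaluation loop with stubbed models.
def player(history, character="a medieval knight"):
    # Stub: a real player model would answer in character.
    return f"As {character}, I reply to: {history[-1]}"

def interrogator(history, turn):
    # Stub: a real interrogator model would emulate a human user.
    return f"User question #{turn}"

def judge(dialogue):
    # Stub: a real judge model would score criteria such as
    # character consistency, entertainment, and fluency.
    return {"in_character": 1.0, "fluency": 1.0}

def run_episode(n_turns=3):
    dialogue = []
    for turn in range(1, n_turns + 1):
        dialogue.append(("user", interrogator(dialogue, turn)))
        dialogue.append(("player", player([msg for _, msg in dialogue])))
    return dialogue, judge(dialogue)

dialogue, scores = run_episode()
```

Separating the three roles like this is what lets the benchmark swap in different judge models and correlate their scores with human annotations.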


Physics simulation capabilities of LLMs

Ali-Dib, Mohamad, Menou, Kristen

arXiv.org Artificial Intelligence

[Abridged abstract] Large Language Models (LLMs) can solve some undergraduate-level to graduate-level physics textbook problems and are proficient at coding. Combining these two capabilities could one day enable AI systems to simulate and predict the physical world. We present an evaluation of state-of-the-art (SOTA) LLMs on PhD-level to research-level computational physics problems. We condition LLM generation on the use of well-documented and widely-used packages to elicit coding capabilities in the physics and astrophysics domains. We contribute $\sim 50$ original and challenging problems in celestial mechanics (with REBOUND), stellar physics (with MESA), 1D fluid dynamics (with Dedalus) and non-linear dynamics (with SciPy). Since our problems do not admit unique solutions, we evaluate LLM performance on several soft metrics: counts of lines that contain different types of errors (coding, physics, necessity and sufficiency) as well as a more "educational" Pass-Fail metric focused on capturing the salient physical ingredients of the problem at hand. As expected, today's SOTA LLM (GPT4) zero-shot fails most of our problems, although about 40\% of the solutions could plausibly get a passing grade. About $70-90 \%$ of the code lines produced are necessary, sufficient and correct (coding \& physics). Physics and coding errors are the most common, with some unnecessary or insufficient lines. We observe significant variations across problem class and difficulty. We identify several failure modes of GPT4 in the computational physics domain. Our reconnaissance work provides a snapshot of current computational capabilities in classical physics and points to obvious improvement targets if AI systems are ever to reach a basic level of autonomy in physics simulation capabilities.
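Of the four problem classes, the non-linear dynamics set is built on SciPy. As a minimal, hedged illustration of that problem class (a textbook example, not one of the paper's ~50 benchmark problems), the snippet below integrates the Lorenz system with `scipy.integrate.solve_ivp`.

```python
# Integrate the Lorenz system, a standard non-linear dynamics benchmark.
from scipy.integrate import solve_ivp

def lorenz(t, state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

sol = solve_ivp(lorenz, (0.0, 20.0), [1.0, 1.0, 1.0],
                rtol=1e-8, atol=1e-8, dense_output=True)
```

Grading LLM-generated code for a problem like this is exactly where the paper's soft metrics apply: a solution can run successfully yet still contain physics errors (wrong parameters, wrong equations) or unnecessary lines.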


Biological Organisms as End Effectors

Galipon, Josephine, Shimizu, Shoya, Tadakuma, Kenjiro

arXiv.org Artificial Intelligence

In robotics, an end effector is a device at the end of a robotic arm that is designed to physically interact with objects in the environment or with the environment itself. Effectively, it serves as the hand of the robot, carrying out tasks on behalf of humans. But could we turn this concept on its head and consider using living organisms themselves as end effectors? This paper introduces a novel idea of using whole living organisms as end effectors for robotics. We showcase this by demonstrating that pill bugs and chitons -- types of small, harmless creatures -- can be utilized as functional grippers. Crucially, this method does not harm these creatures, enabling their release back into nature after use. How this concept may be expanded to other organisms and applications is also discussed.


Traffic Sign Recognition Dataset and Data Augmentation

Ge, Jingzhan

arXiv.org Artificial Intelligence

Although there are many datasets for traffic sign classification, few datasets have been collected for traffic sign recognition, and fewer still contain enough instances, especially for training a model with deep learning. Deep learning is almost the only way to train a model for real-world usage that covers many highly similar classes, compared with traditional approaches based on color, shape, etc. Moreover, because different classes of traffic signs appear with different frequencies in the real world, the imbalance between classes' instance counts in the datasets makes training results even worse. For certain sign classes, their meanings ensure that they can never accumulate enough instances in a dataset. To solve this problem, we propose a unique data augmentation method for traffic sign recognition datasets that takes advantage of the standardized design of traffic signs; we call it TSR dataset augmentation. We verify the method on the benchmark Tsinghua-Tencent 100K (TT100K) dataset. The iterated versions of the dataset based on TT100K, the source code of the data augmentation method, and the training results introduced in this paper are publicly available. Deep learning is a machine learning technique that teaches computers to do what comes naturally to humans: learn by example. In deep learning, computer models learn to perform tasks directly from images, text, or sound, and can achieve state-of-the-art precision, sometimes exceeding human-level performance. Models are trained using a large set of labeled data [1].
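Because traffic signs follow a published standard, rare classes can be synthesized by transforming a standard template and compositing it onto background scenes, with the paste location doubling as a free detection label. The sketch below illustrates that idea in pure NumPy; the transforms, parameter ranges, and function names are assumptions for illustration, not the paper's TSR augmentation method.

```python
# Template-based augmentation sketch: scale- and brightness-jitter a standard
# sign template, paste it onto a background, and return the bounding box.
import numpy as np

rng = np.random.default_rng(0)

def synthesize(template, background, scale_range=(0.5, 1.5)):
    """Composite a jittered template onto a background; return image and box."""
    h, w = template.shape[:2]
    s = rng.uniform(*scale_range)
    new_h, new_w = max(1, int(h * s)), max(1, int(w * s))
    # Nearest-neighbour resize with plain indexing to stay dependency-free.
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    sign = template[rows][:, cols] * rng.uniform(0.7, 1.3)  # brightness jitter
    sign = np.clip(sign, 0.0, 1.0)

    bh, bw = background.shape[:2]
    y = rng.integers(0, bh - new_h + 1)
    x = rng.integers(0, bw - new_w + 1)
    out = background.copy()
    out[y:y + new_h, x:x + new_w] = sign
    # The paste location is a free ground-truth detection label.
    return out, (int(x), int(y), new_w, new_h)

template = np.ones((32, 32, 3)) * 0.9   # stand-in for a standard sign template
background = np.zeros((128, 128, 3))    # stand-in for a street scene
img, box = synthesize(template, background)
```

A real pipeline would add perspective warps, blur, and lighting variation so that synthetic instances match the appearance statistics of field photographs.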


Stable Diffusion update removes ability to copy artist styles or make NSFW works

Engadget

Stable Diffusion, the AI that can generate images from text in an astonishingly realistic way, has been updated with a bunch of new features. However, many users aren't happy, complaining that the new software can no longer generate pictures in the styles of specific artists or generate NSFW artworks, The Verge has reported. Version 2 does introduce a number of new features. Key among those is a new text encoder called OpenCLIP that "greatly improves the quality of the generated images compared to earlier V1 releases," according to Stability AI, the company behind Stable Diffusion. It also includes a new NSFW filter from LAION designed to remove adult content.


Top 8 Interview Questions on TensorFlow - Analytics Vidhya

#artificialintelligence

This article was published as a part of the Data Science Blogathon. TensorFlow is one of the most well-liked and promising deep learning frameworks for devising novel deep learning solutions. Given its popularity and wide usage in companies, startups, and business firms to automate things and develop new systems, it is imperative to have a crystal clear understanding of this framework. In this article, I have compiled a list of eight frequently asked questions that might help you become more familiar with the TensorFlow framework and ace your next interview! Following are some of the questions with detailed answers.