version 2
Modeling Biological Multifunctionality with Echo State Networks
Leventi-Peetz, Anastasia-Maria, Peetz, Jörg-Volker, Weber, Kai, Zacharis, Nikolaos
In this work, a three-dimensional multicomponent reaction-diffusion model has been developed, combining excitable-system dynamics with diffusion processes and sharing conceptual features with the FitzHugh-Nagumo model. Designed to capture the spatiotemporal behavior of biological systems, particularly electrophysiological processes, the model was solved numerically to generate time-series data. These data were subsequently used to train and evaluate an Echo State Network (ESN), which successfully reproduced the system's dynamic behavior. The results demonstrate that simulating biological dynamics using data-driven, multifunctional ESN models is both feasible and effective.
Can Large Language Models be Effective Online Opinion Miners?
Heo, Ryang, Seo, Yongsik, Lee, Junseong, Lee, Dongha
The surge of user-generated online content presents a wealth of insights into customer preferences and market trends. However, the highly diverse, complex, and context-rich nature of such content poses significant challenges to traditional opinion mining approaches. To address this, we introduce the Online Opinion Mining Benchmark (OOMB), a novel dataset and evaluation protocol designed to assess the ability of large language models (LLMs) to mine opinions effectively from diverse and intricate online environments. OOMB provides extensive (entity, feature, opinion) tuple annotations and a comprehensive opinion-centric summary that highlights key opinion topics within each piece of content, thereby enabling the evaluation of both the extractive and abstractive capabilities of models. Through our proposed benchmark, we conduct a comprehensive analysis of which aspects remain challenging and where LLMs exhibit adaptability, to explore whether they can effectively serve as opinion miners in realistic online scenarios. This study lays the foundation for LLM-based opinion mining and discusses directions for future research in this field.
- Europe (1.00)
- Asia (1.00)
- North America > United States > Minnesota (0.27)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.67)
- Transportation > Passenger (1.00)
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks > Manufacturer (1.00)
A modular framework for automated evaluation of procedural content generation in serious games with deep reinforcement learning agents
Kalafatis, Eleftherios, Mitsis, Konstantinos, Zarkogianni, Konstantia, Athanasiou, Maria, Nikita, Konstantina
Serious Games (SGs) are increasingly incorporating procedural content generation (PCG) into the development process as a means of offering a personalized and enhanced player experience. However, the development of a framework to assess the impact of PCG techniques when integrated into SGs remains particularly challenging. This study proposes a methodology for automated evaluation of PCG integration in SGs, incorporating deep reinforcement learning (DRL) game testing agents. To validate the proposed framework, a previously introduced SG featuring card game mechanics and incorporating three different versions of PCG for nonplayer character (NPC) creation has been deployed. Version 1 features random NPC creation, while Versions 2 and 3 utilize a genetic algorithm approach. These versions are used to test the impact of different dynamic SG environments on the proposed framework's agents. The obtained results highlight the superiority of the DRL game testing agents trained on Versions 2 and 3 over those trained on Version 1 in terms of win rate (i.e., number of wins per game played) and training time. More specifically, in a test emulating regular gameplay, both Versions 2 and 3 peaked at a 97% win rate and achieved statistically significantly higher (p=0009) win rates than Version 1, which peaked at 94%. Overall, the results support the proposed framework's capability to produce meaningful data for the evaluation of procedurally generated content in SGs.
- Europe > Greece (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Netherlands > Limburg > Maastricht (0.04)
- Research Report > Experimental Study (0.88)
- Research Report > New Finding (0.66)
- Leisure & Entertainment > Games > Computer Games (1.00)
- Health & Medicine (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Assessing and Refining ChatGPT's Performance in Identifying Targeting and Inappropriate Language: A Comparative Study
Barbarestani, Baran, Maks, Isa, Vossen, Piek
This study evaluates the effectiveness of ChatGPT, an advanced AI model for natural language processing, in identifying targeting and inappropriate language in online comments. With the increasing challenge of moderating vast volumes of user-generated content on social network sites, the role of AI in content moderation has gained prominence. We compared ChatGPT's performance against crowd-sourced annotations and expert evaluations to assess its accuracy, scope of detection, and consistency. Our findings highlight that ChatGPT performs well in detecting inappropriate content, showing notable improvements in accuracy through iterative refinements, particularly in Version 6. However, its performance in targeting language detection showed variability, with higher false positive rates compared to expert judgments. This study contributes to the field by demonstrating the potential of AI models like ChatGPT to enhance automated content moderation systems while also identifying areas for further improvement. The results underscore the importance of continuous model refinement and contextual understanding to better support automated moderation and mitigate harmful online behavior.
- Health & Medicine (1.00)
- Media (0.68)
PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation
We introduce a novel benchmark for evaluating the role-playing capabilities of language models. Our approach leverages language models themselves to emulate users in dynamic, multi-turn conversations and to assess the resulting dialogues. The framework consists of three main components: a player model assuming a specific character role, an interrogator model simulating user behavior, and a judge model evaluating conversation quality. We conducted experiments comparing automated evaluations with human annotations to validate our approach, demonstrating strong correlations across multiple criteria. This work provides a foundation for a robust and dynamic evaluation of model capabilities in interactive scenarios.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Asia > Singapore (0.04)
- Asia > China > Jiangsu Province > Yancheng (0.04)
- Research Report (0.64)
- Questionnaire & Opinion Survey (0.47)
Physics simulation capabilities of LLMs
Ali-Dib, Mohamad, Menou, Kristen
[Abridged abstract] Large Language Models (LLMs) can solve some undergraduate-level to graduate-level physics textbook problems and are proficient at coding. Combining these two capabilities could one day enable AI systems to simulate and predict the physical world. We present an evaluation of state-of-the-art (SOTA) LLMs on PhD-level to research-level computational physics problems. We condition LLM generation on the use of well-documented and widely-used packages to elicit coding capabilities in the physics and astrophysics domains. We contribute ~50 original and challenging problems in celestial mechanics (with REBOUND), stellar physics (with MESA), 1D fluid dynamics (with Dedalus) and non-linear dynamics (with SciPy). Since our problems do not admit unique solutions, we evaluate LLM performance on several soft metrics: counts of lines that contain different types of errors (coding, physics, necessity and sufficiency) as well as a more "educational" Pass-Fail metric focused on capturing the salient physical ingredients of the problem at hand. As expected, today's SOTA LLM (GPT4) zero-shot fails most of our problems, although about 40% of the solutions could plausibly get a passing grade. About 70-90% of the code lines produced are necessary, sufficient and correct (coding & physics). Physics and coding errors are the most common, with some unnecessary or insufficient lines. We observe significant variations across problem class and difficulty. We identify several failure modes of GPT4 in the computational physics domain. Our reconnaissance work provides a snapshot of current computational capabilities in classical physics and points to obvious improvement targets if AI systems are ever to reach a basic level of autonomy in physics simulation capabilities.
- North America > Canada > Ontario > Toronto (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- North America > United States > New York (0.04)
- Overview (1.00)
- Research Report (0.81)
Biological Organisms as End Effectors
Galipon, Josephine, Shimizu, Shoya, Tadakuma, Kenjiro
In robotics, an end effector is a device at the end of a robotic arm that is designed to physically interact with objects in the environment or with the environment itself. Effectively, it serves as the hand of the robot, carrying out tasks on behalf of humans. But could we turn this concept on its head and consider using living organisms themselves as end effectors? This paper introduces a novel idea of using whole living organisms as end effectors for robotics. We showcase this by demonstrating that pill bugs and chitons -- types of small, harmless creatures -- can be utilized as functional grippers. Crucially, this method does not harm these creatures, enabling their release back into nature after use. How this concept may be expanded to other organisms and applications is also discussed.
- Pacific Ocean > North Pacific Ocean > Sea of Japan (0.04)
- Asia > Japan > Honshū > Tōhoku > Yamagata Prefecture (0.04)
- Asia > Japan > Honshū > Tōhoku > Miyagi Prefecture > Sendai (0.04)
Traffic Sign Recognition Dataset and Data Augmentation
Although there are many datasets for traffic sign classification, few datasets have been collected for traffic sign recognition, and few of those contain enough instances, especially for training a model with deep learning. Deep learning is almost the only way to train a model for real-world usage that covers many highly similar classes, in contrast to traditional approaches based on color, shape, etc. In addition, because different classes of traffic signs appear with different frequencies in the real world, the imbalance between class instances in the datasets makes the training results even worse. Moreover, for certain sign classes, the meaning of the sign makes it inherently impossible to collect enough instances for the dataset. To solve this problem, we propose a unique data augmentation method for traffic sign recognition datasets that takes advantage of traffic sign standards. We call it TSR dataset augmentation. We use the benchmark Tsinghua-Tencent 100K (TT100K) dataset to verify the data augmentation method. The iterated dataset versions based on TT100K, the source code of the data augmentation method, and the training results introduced in this paper are publicly available. Deep learning is a machine learning technique that teaches computers to do what comes naturally to humans: learn by example. In deep learning, computer models learn to perform tasks directly from images, text, or sound. Deep learning models can achieve state-of-the-art precision, sometimes exceeding human levels. Models are trained by using a large set of labeled data [1].
- North America > United States > Texas > Dallas County > Dallas (0.04)
- North America > United States > Nevada > Clark County > Las Vegas (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
Stable Diffusion update removes ability to copy artist styles or make NSFW works
Stable Diffusion, the AI that can generate images from text in an astonishingly realistic way, has been updated with a bunch of new features. However, many users aren't happy, complaining that the new software can no longer generate pictures in the styles of specific artists or generate NSFW artworks, The Verge has reported. Version 2 does introduce a number of new features. Key among those is a new text encoder called OpenCLIP that "greatly improves the quality of the generated images compared to earlier V1 releases," according to Stability AI, the company behind Stable Diffusion. It also includes a new NSFW filter from LAION designed to remove adult content.
Top 8 Interview Questions on TensorFlow - Analytics Vidhya
This article was published as a part of the Data Science Blogathon. TensorFlow is one of the most well-liked and promising deep learning frameworks for devising novel deep learning solutions. Given its popularity and wide usage in companies, startups, and business firms to automate things and develop new systems, it is imperative to have a crystal clear understanding of this framework. In this article, I have compiled a list of eight frequently asked questions that might help you become more familiar with the TensorFlow framework and ace your next interview! Following are some of the questions with detailed answers.