Swift Hydra: Self-Reinforcing Generative Framework for Anomaly Detection with Multiple Mamba Models
Do, Nguyen, Nguyen, Truc, Hassanaly, Malik, Alharbi, Raed, Seo, Jung Taek, Thai, My T.
Despite a plethora of anomaly detection models developed over the years, their ability to generalize to unseen anomalies remains an issue, particularly in critical systems. This paper aims to address this challenge by introducing Swift Hydra, a new framework for training an anomaly detection method based on generative AI and reinforcement learning (RL). By featuring an RL policy that operates on the latent variables of a generative model, the framework synthesizes novel and diverse anomaly samples that are capable of bypassing a detection model. These synthetic samples are, in turn, used to augment the detection model, further improving its ability to handle challenging anomalies. Swift Hydra also incorporates Mamba models structured as a Mixture of Experts (MoE) to enable scalable adaptation of the number of Mamba experts based on data complexity, effectively capturing diverse feature distributions without increasing the model's inference time. Empirical evaluations on the ADBench benchmark demonstrate that Swift Hydra outperforms other state-of-the-art anomaly detection models while maintaining a relatively short inference time. These results highlight a new and promising paradigm of integrating RL and generative AI for advancing anomaly detection.
- North America > United States > New York > New York County > New York City (0.04)
- Asia > South Korea (0.04)
- Asia > Middle East > Saudi Arabia (0.04)
- Energy (1.00)
- Health & Medicine > Therapeutic Area (0.46)
- Government > Regional Government > North America Government > United States Government (0.46)
- Information Technology > Security & Privacy (0.45)
- Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.44)
Comparative Study of Zero-Shot Cross-Lingual Transfer for Bodo POS and NER Tagging Using Gemini 2.0 Flash Thinking Experimental Model
Narzary, Sanjib, Brahma, Bihung, Mahilary, Haradip, Brahma, Mahananda, Som, Bidisha, Nandi, Sukumar
Part-of-Speech (POS) tagging and Named Entity Recognition (NER) are fundamental tasks within the field of Natural Language Processing (NLP), serving as essential prerequisites for a multitude of downstream applications. POS tagging, the process of assigning grammatical categories to individual words within a sentence (e.g., noun, verb, adjective, adverb), provides crucial syntactic information that underpins higher-level language understanding. NER, by contrast, focuses on identifying and classifying named entities - real-world objects that are designated with a proper name - into predefined semantic categories such as persons, organizations, locations, dates, times, and quantities [1, 2]. The synergy of POS and NER tagging empowers a wide spectrum of NLP applications. In information extraction, NER helps to pinpoint key entities, while POS tags help to understand the relationships between these entities and other words in the text, facilitating the extraction of structured information from unstructured text [3]. Machine translation systems benefit from POS tagging to improve syntactic analysis and word order prediction, and from NER to ensure accurate translation of named entities across languages [4]. Question-answering systems rely on both NER and POS to understand the question's intent, identify relevant entities and relationships in the knowledge base, and formulate accurate answers. Text summarization algorithms leverage NER to identify salient entities and POS tags to preserve grammatical coherence and readability in summaries.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Sweden > Vaestra Goetaland > Gothenburg (0.04)
- Asia > Indonesia > Bali (0.04)
Open or Closed LLM for Lesser-Resourced Languages? Lessons from Greek
Pavlopoulos, John, Bakagianni, Juli, Pouli, Kanella, Gavriilidou, Maria
Natural Language Processing (NLP) for lesser-resourced languages faces persistent challenges, including limited datasets, inherited biases from high-resource languages, and the need for domain-specific solutions. This study addresses these gaps for Modern Greek through three key contributions. First, we evaluate the performance of open-source (Llama-70b) and closed-source (GPT-4o mini) large language models (LLMs) on seven core NLP tasks with dataset availability, revealing task-specific strengths, weaknesses, and parity in their performance. Second, we expand the scope of Greek NLP by reframing Authorship Attribution as a tool to assess potential data usage by LLMs in pre-training, with high zero-shot accuracy suggesting ethical implications for data provenance. Third, we showcase a legal NLP case study, where a Summarize, Translate, and Embed (STE) methodology outperforms the traditional TF-IDF approach for clustering long legal texts. Together, these contributions provide a roadmap to advance NLP in lesser-resourced languages, bridging gaps in model evaluation, task innovation, and real-world impact.
- North America > United States > New York > New York County > New York City (0.04)
- North America > Dominican Republic (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
INSIGHTBUDDY-AI: Medication Extraction and Entity Linking using Large Language Models and Ensemble Learning
Romero, Pablo, Han, Lifeng, Nenadic, Goran
Medication Extraction and Mining play an important role in healthcare NLP research because of their practical applications in hospital settings, such as mapping extracted medications into standard clinical knowledge bases (SNOMED-CT, BNF, etc.). In this work, we investigate state-of-the-art LLMs in text mining tasks on medications and their related attributes such as dosage, route, strength, and adverse effects. In addition, we explore different ensemble learning methods (Stack-Ensemble and Voting-Ensemble) to improve on the performance of individual LLMs. Our ensemble learning results outperformed the individually fine-tuned base models BERT, RoBERTa, RoBERTa-L, BioBERT, BioClinicalBERT, BioMedRoBERTa, ClinicalBERT, and PubMedBERT across general and specific domains. Finally, we build an entity linking function to map extracted medical terminologies into the SNOMED-CT codes and the British National Formulary (BNF) codes, which are further mapped to the Dictionary of Medicines and Devices (dm+d), and ICD. Our model's toolkit and desktop applications are publicly available at https://github.com/HECTA-UoM/ensemble-NER.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > United Kingdom > England > Greater Manchester > Manchester (0.05)
- Europe > Netherlands > South Holland > Leiden (0.04)
- Health & Medicine > Health Care Technology (0.68)
- Health & Medicine > Therapeutic Area (0.46)
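The Voting-Ensemble idea named in the abstract above can be illustrated with a minimal sketch of per-token majority voting. This is not the authors' implementation: the function names, the tie-breaking rule (the earliest model in the list wins), and the label strings are all assumptions made for illustration.

```python
from collections import Counter


def vote(labels):
    """Return the most frequent label; ties go to the earliest model's label."""
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:  # iterate in model order so ties are deterministic
        if counts[label] == best:
            return label


def voting_ensemble(per_model_predictions):
    """Combine token-level label sequences from several models by majority vote.

    per_model_predictions: list of label lists, one list per model,
    all aligned to the same tokens (hypothetical input format).
    """
    lengths = {len(p) for p in per_model_predictions}
    if len(lengths) != 1:
        raise ValueError("all models must label the same number of tokens")
    return [vote(list(column)) for column in zip(*per_model_predictions)]


# Hypothetical outputs from three fine-tuned taggers for the same three tokens.
model_a = ["B-Drug", "O", "B-Dosage"]
model_b = ["B-Drug", "B-Route", "O"]
model_c = ["O", "B-Route", "B-Dosage"]
combined = voting_ensemble([model_a, model_b, model_c])
```

A stacking ensemble would instead train a second-stage model on the base models' outputs; that variant is not shown here.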
A Temporal Convolutional Network-based Approach for Network Intrusion Detection
Nazre, Rukmini, Budke, Rujuta, Oak, Omkar, Sawant, Suraj, Joshi, Amit
Network intrusion detection is critical for securing modern networks, yet the complexity of network traffic poses significant challenges to traditional methods. This study proposes a Temporal Convolutional Network (TCN) model featuring a residual block architecture with dilated convolutions to capture dependencies in network traffic data while ensuring training stability. The TCN's ability to process sequences in parallel enables faster, more accurate sequence modeling than Recurrent Neural Networks. Evaluated on the Edge-IIoTset dataset, which includes 15 classes with normal traffic and 14 cyberattack types, the proposed model achieved an accuracy of 96.72% and a loss of 0.0688, outperforming 1D CNN, CNN-LSTM, CNN-GRU, CNN-BiLSTM, and CNN-GRU-LSTM models. A class-wise classification report, encompassing metrics such as recall, precision, accuracy, and F1-score, demonstrated the TCN model's superior performance across varied attack categories, including Malware, Injection, and DDoS. These results underscore the model's potential in addressing the complexities of network intrusion detection effectively.
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.48)
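To make the "residual block with dilated convolutions" in the TCN abstract above concrete, here is a minimal single-channel NumPy sketch. It assumes causal convolutions, two convolutions per block with ReLU activations, and an identity skip connection, which is a common TCN layout but is not confirmed by the abstract. All function names and weights are illustrative.

```python
import numpy as np


def causal_dilated_conv1d(x, w, dilation):
    """Compute y[t] = sum_j w[j] * x[t - j * dilation], with zeros before t = 0."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])  # left-pad so no future values leak in
    y = np.zeros(len(x))
    for j in range(k):
        start = pad - j * dilation  # tap j looks back j * dilation steps
        y += w[j] * xp[start:start + len(x)]
    return y


def residual_block(x, w1, w2, dilation):
    """Two dilated causal convolutions with ReLU, plus an identity skip connection."""
    h = np.maximum(causal_dilated_conv1d(x, w1, dilation), 0.0)
    h = np.maximum(causal_dilated_conv1d(h, w2, dilation), 0.0)
    return x + h


def receptive_field(kernel_size, dilations, convs_per_block=2):
    """Number of past time steps (including the current one) visible at the output."""
    return 1 + convs_per_block * (kernel_size - 1) * sum(dilations)


impulse = np.array([1.0, 0.0, 0.0, 0.0])
response = causal_dilated_conv1d(impulse, np.array([1.0, 1.0]), dilation=2)
```

Under these assumptions, stacking blocks with kernel size 3 and dilations 1, 2, 4 gives `receptive_field(3, [1, 2, 4]) == 29`, which shows how dilation widens the temporal context without adding many layers.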
Automatic deductive coding in discourse analysis: an application of large language models in learning analytics
Zhang, Lishan, Wu, Han, Huang, Xiaoshan, Duan, Tengfei, Du, Hanxiang
Deductive coding is a common discourse analysis method widely used by learning science and learning analytics researchers for understanding teaching and learning interactions. It often requires researchers to manually label all discourses to be analyzed according to a theoretically guided coding scheme, which is time-consuming and labor-intensive. The emergence of large language models such as GPT has opened a new avenue for automatic deductive coding to overcome the limitations of traditional deductive coding. To evaluate the usefulness of large language models in automatic deductive coding, we employed three different classification methods driven by different artificial intelligence technologies: a traditional text classification method with text feature engineering, a BERT-like pretrained language model, and a GPT-like pretrained large language model (LLM). We applied these methods to two different datasets and explored the potential of GPT and prompt engineering in automatic deductive coding. By analyzing and comparing the accuracy and Kappa values of these three classification methods, we found that GPT with prompt engineering outperformed the other two methods on both datasets with a limited number of training samples. By providing detailed prompt structures, the reported work demonstrates how large language models can be used in the implementation of automatic deductive coding.
- North America > United States (0.14)
- North America > Canada > Quebec > Montreal (0.14)
- Asia > China > Beijing > Beijing (0.04)
- Instructional Material (1.00)
- Research Report > New Finding (0.46)
- Education > Educational Setting > Online (0.93)
- Education > Educational Technology > Educational Software > Computer Based Training (0.93)
- Education > Assessment & Standards > Student Performance (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
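The deductive-coding abstract above compares methods by accuracy and Kappa values. As a minimal sketch, assuming the Kappa in question is Cohen's kappa between a human coder and an automatic coder (the abstract does not say which variant is used), agreement can be computed as below. The function name and the example codes are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two aligned label sequences."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same code by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical codes assigned to four utterances.
human = ["question", "question", "answer", "answer"]
model = ["question", "question", "answer", "question"]
kappa = cohen_kappa(human, model)
```

Here the raw agreement is 0.75 but kappa is 0.5, because half of that agreement would be expected by chance given the label frequencies.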
Predicting DNA fragmentation: A non-destructive analogue to chemical assays using machine learning
Jacobs, Byron A, Shaik, Ifthakaar, Lin, Frando
Globally, infertility rates are increasing, with 2.5% of all births being assisted by in vitro fertilisation (IVF) in 2022. Male infertility is the cause for approximately half of these cases. The quality of sperm DNA has a substantial impact on the success of IVF. The assessment of sperm DNA is traditionally done through chemical assays which render sperm cells ineligible for IVF. Many compounding factors lead to the population crisis, with fertility rates dropping globally in recent history. As such, assisted reproductive technologies (ART) have been the focus of recent research efforts. Simultaneously, artificial intelligence has grown ubiquitous and is permeating more aspects of modern life. With the advent of state-of-the-art machine learning and its exceptional performance in many sectors, this work builds on these successes and proposes a novel framework for the prediction of sperm cell DNA fragmentation from images of unstained sperm. The result is a predictive model which preserves sperm integrity and allows for optimal selection of sperm for IVF.
- North America > United States (0.68)
- Asia > Singapore > Central Region > Singapore (0.04)
- Africa > South Africa > Gauteng > Pretoria (0.04)
GPT-4 Generated Narratives of Life Events using a Structured Narrative Prompt: A Validation Study
Lynch, Christopher J., Jensen, Erik, Munro, Madison H., Zamponi, Virginia, Martinez, Joseph, O'Brien, Kevin, Feldhaus, Brandon, Smith, Katherine, Reinhold, Ann Marie, Gore, Ross
Large Language Models (LLMs) play a pivotal role in generating vast arrays of narratives, facilitating a systematic exploration of their effectiveness for communicating life events in narrative form. In this study, we employ a zero-shot structured narrative prompt to generate 24,000 narratives using OpenAI's GPT-4. From this dataset, we manually classify 2,880 narratives and evaluate their validity in conveying birth, death, hiring, and firing events. Remarkably, 87.43% of the narratives sufficiently convey the intention of the structured prompt. To automate the identification of valid and invalid narratives, we train and validate nine Machine Learning models on the classified datasets. Leveraging these models, we extend our analysis to predict the classifications of the remaining 21,120 narratives. All the ML models excelled at classifying valid narratives as valid, but experienced challenges at simultaneously classifying invalid narratives as invalid. Our findings not only advance the study of LLM capabilities, limitations, and validity but also offer practical insights for narrative generation and natural language processing applications.
- North America > Puerto Rico > Peñuelas > Peñuelas (0.04)
- North America > United States > Wisconsin > Milwaukee County > Milwaukee (0.04)
- North America > United States > Virginia > Suffolk (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Crosslingual Retrieval Augmented In-context Learning for Bangla
Li, Xiaoqian, Nie, Ercong, Liang, Sheng
The promise of Large Language Models (LLMs) in Natural Language Processing has often been overshadowed by their limited performance in low-resource languages such as Bangla. To address this, our paper presents a pioneering approach that utilizes cross-lingual retrieval augmented in-context learning. By strategically sourcing semantically similar prompts from a high-resource language, we enable multilingual pretrained language models (MPLMs), especially the generative model BLOOMZ, to successfully boost performance on Bangla tasks. Our extensive evaluation highlights that the cross-lingual retrieval augmented prompts bring steady improvements to MPLMs over the zero-shot performance.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Evaluation of ChatGPT Model for Vulnerability Detection
Cheshkov, Anton, Zadorozhny, Pavel, Levichev, Rodion
In this technical report, we evaluated the performance of the ChatGPT and GPT-3 models for the task of vulnerability detection in code. Our evaluation was conducted on our real-world dataset, using binary and multi-label classification tasks on CWE vulnerabilities. We decided to evaluate the model because it has shown good performance on other code-based tasks, such as solving programming challenges and understanding code at a high level. However, we found that the ChatGPT model performed no better than a dummy classifier for both binary and multi-label classification tasks for code vulnerability detection.
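The comparison against a dummy classifier in the report above can be sketched as follows. This assumes a majority-class dummy baseline and plain accuracy as the metric; the report's abstract does not state which dummy strategy or metric was used, and all labels below are made up for illustration.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return the most frequent training label (the dummy classifier's only output)."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical binary labels: 1 = vulnerable, 0 = not vulnerable.
train_labels = [0, 0, 0, 1]
test_labels = [0, 1, 0, 0]
model_predictions = [0, 1, 1, 0]  # stand-in for a model's outputs

dummy_label = majority_baseline(train_labels)
dummy_accuracy = accuracy(test_labels, [dummy_label] * len(test_labels))
model_accuracy = accuracy(test_labels, model_predictions)
model_beats_dummy = model_accuracy > dummy_accuracy
```

In this toy case both scores are 0.75, so the stand-in model is no better than always predicting the majority class. On imbalanced data, a class-aware metric such as F1 is usually a fairer basis for the same comparison.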