A Novel Approach to Malicious Code Detection Using CNN-BiLSTM and Feature Fusion

arXiv.org Artificial Intelligence

With the rapid advancement of Internet technology, the threat of malware to computer systems and network security has intensified. Malware affects individual privacy and security and poses risks to critical infrastructures of enterprises and nations. The increasing quantity and complexity of malware, along with its concealment and diversity, challenge traditional detection techniques. Static detection methods struggle against variants and packed malware, while dynamic methods face high costs and risks that limit their application. Consequently, there is an urgent need for novel and efficient malware detection techniques to improve accuracy and robustness. This study first employs the minhash algorithm to convert binary files of malware into grayscale images, followed by the extraction of global and local texture features using GIST and LBP algorithms. Additionally, the study utilizes IDA Pro to decompile and extract opcode sequences, applying N-gram and tf-idf algorithms for feature vectorization. The fusion of these features enables the model to comprehensively capture the behavioral characteristics of malware. In terms of model construction, a CNN-BiLSTM fusion model is designed to simultaneously process image features and opcode sequences, enhancing classification performance. Experimental validation on multiple public datasets demonstrates that the proposed method significantly outperforms traditional detection techniques in terms of accuracy, recall, and F1 score, particularly in detecting variants and obfuscated malware with greater stability. The research presented in this paper offers new insights into the development of malware detection technologies, validating the effectiveness of feature and model fusion, and holds promising application prospects.
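The opcode vectorization step mentioned in the abstract (N-gram plus tf-idf) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `opcode_ngrams` and `tfidf_vectors`, the choice of bigrams, the smoothed idf formula, and the toy opcode sequences are all assumptions made for illustration. In practice the opcode sequences would come from a disassembler such as IDA Pro.

```python
import math
from collections import Counter


def opcode_ngrams(opcodes, n=2):
    """Return the contiguous opcode n-grams of a sequence as tuples."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def tfidf_vectors(samples, n=2):
    """Build tf-idf vectors over opcode n-grams (hypothetical helper).

    tf  = n-gram count / total n-grams in the sample
    idf = log((1 + N) / (1 + df)) + 1   (a common smoothed variant, assumed here)
    """
    counts = [Counter(opcode_ngrams(s, n)) for s in samples]
    vocab = sorted({gram for c in counts for gram in c})
    n_docs = len(samples)
    # Document frequency: number of samples containing each n-gram.
    df = {gram: sum(1 for c in counts if gram in c) for gram in vocab}
    idf = {gram: math.log((1 + n_docs) / (1 + df[gram])) + 1.0 for gram in vocab}
    vectors = []
    for c in counts:
        total = sum(c.values()) or 1  # guard against empty samples
        vectors.append([c[gram] / total * idf[gram] for gram in vocab])
    return vocab, vectors


# Toy opcode sequences standing in for decompiled samples (made up).
samples = [
    ["push", "mov", "call", "pop", "ret"],
    ["push", "mov", "xor", "jmp", "ret"],
]
vocab, vectors = tfidf_vectors(samples, n=2)
```

Each row of `vectors` is a fixed-length feature vector that could be concatenated with image texture features before classification; n-grams shared by every sample receive the lowest idf weight.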


Automatic Speech Recognition with BERT and CTC Transformers: A Review

arXiv.org Artificial Intelligence

This review paper provides a comprehensive analysis of recent advances in automatic speech recognition (ASR) with bidirectional encoder representations from transformers (BERT) and connectionist temporal classification (CTC) transformers. The paper first introduces the fundamental concepts of ASR and discusses the challenges associated with it. It then explains the architecture of BERT and CTC transformers and their potential applications in ASR. The paper reviews several studies that have used these models for speech recognition tasks and discusses the results obtained. Additionally, the paper highlights the limitations of these models and outlines potential areas for further research. All in all, this review provides valuable insights for researchers and practitioners who are interested in ASR with BERT and CTC transformers.


Conformal Prediction: A Data Perspective

arXiv.org Artificial Intelligence

The recent rapid development of well-designed and powerful machine learning (ML) models has significantly transformed our lives. However, the success of these models is often evaluated based on the accuracy of their predictions, which, while important, is not sufficient in many real-world scenarios. In high-stakes applications, it is equally critical to assess the uncertainty of model outputs. Uncertainty quantification (UQ) has long been a central problem in fields like statistics and ML. Several well-established methods, such as Bayesian inference and resampling techniques, have been widely adopted to address UQ. However, Bayesian posterior intervals are only valid if the parametric assumptions of the model are correctly specified, which may not always be the case in practical applications.


Information Discovery in e-Commerce

arXiv.org Artificial Intelligence

Electronic commerce, or e-commerce, is the buying and selling of goods and services, or the transmitting of funds or data online. E-commerce platforms come in many kinds, with global players such as Amazon, Airbnb, Alibaba, eBay and platforms targeting specific geographic regions. Information retrieval has a natural role to play in e-commerce, especially in connecting people to goods and services. Information discovery in e-commerce concerns different types of search (e.g., exploratory search vs. lookup tasks), recommender systems, and natural language processing in e-commerce portals. The rise in popularity of e-commerce sites has made research on information discovery in e-commerce an increasingly active research area. This is witnessed by an increase in publications and dedicated workshops in this space. Methods for information discovery in e-commerce largely focus on improving the effectiveness of e-commerce search and recommender systems, on enriching and using knowledge graphs to support e-commerce, and on developing innovative question answering and bot-based solutions that help to connect people to goods and services. In this survey, an overview is given of the fundamental infrastructure, algorithms, and technical solutions for information discovery in e-commerce. The topics covered include user behavior and profiling, search, recommendation, and language technology in e-commerce.


Enhanced Robot Planning and Perception through Environment Prediction

arXiv.org Artificial Intelligence

Mobile robots rely on maps to navigate through an environment. In the absence of any map, the robots must build the map online from partial observations as they move in the environment. Traditional methods build a map using only direct observations. In contrast, humans identify patterns in the observed environment and make informed guesses about what to expect ahead. Modeling these patterns explicitly is difficult due to the complexity of the environments. However, these complex models can be approximated well using learning-based methods in conjunction with large training data. By extracting patterns, robots can use direct observations and predictions of what lies ahead to better navigate an unknown environment. In this dissertation, we present several learning-based methods to equip mobile robots with prediction capabilities for efficient and safer operation. In the first part of the dissertation, we learn to predict using geometrical and structural patterns in the environment. Partially observed maps provide invaluable cues for accurately predicting the unobserved areas. We first demonstrate the capability of general learning-based approaches to model these patterns for a variety of overhead map modalities. Then we employ task-specific learning for faster navigation in indoor environments by predicting 2D occupancy in the nearby regions. This idea is further extended to 3D point cloud representation for object reconstruction. Predicting the shape of the full object from only partial views, our approach paves the way for efficient next-best-view planning. In the second part of the dissertation, we learn to predict using spatiotemporal patterns in the environment. We focus on dynamic tasks such as target tracking and coverage where we seek decentralized coordination between robots. We first show how graph neural networks can be used for more scalable and faster inference.


A Systematic Assessment of OpenAI o1-Preview for Higher Order Thinking in Education

arXiv.org Artificial Intelligence

As artificial intelligence (AI) continues to advance, it demonstrates capabilities comparable to human intelligence, with significant potential to transform education and workforce development. This study evaluates OpenAI o1-preview's ability to perform higher-order cognitive tasks across 14 dimensions, including critical thinking, systems thinking, computational thinking, design thinking, metacognition, data literacy, creative thinking, abstract reasoning, quantitative reasoning, logical reasoning, analogical reasoning, and scientific reasoning. We used validated instruments like the Ennis-Weir Critical Thinking Essay Test and the Biological Systems Thinking Test to systematically compare o1-preview's performance with human performance. Our findings reveal that o1-preview outperforms humans in most categories, achieving 150% better results in systems thinking, computational thinking, data literacy, creative thinking, scientific reasoning, and abstract reasoning. However, compared to humans, it underperforms by around 25% in logical reasoning, critical thinking, and quantitative reasoning. In analogical reasoning, both o1-preview and humans achieved perfect scores. Despite these strengths, o1-preview shows limitations in abstract reasoning, where human psychology students outperform it, highlighting the continued importance of human oversight in tasks requiring high-level abstraction. These results have significant educational implications, suggesting a shift toward developing human skills that complement AI, such as creativity, abstract reasoning, and critical thinking. This study emphasizes the transformative potential of AI in education and calls for a recalibration of educational goals, teaching methods, and curricula to align with an AI-driven world.


AI security and cyber risk in IoT systems

arXiv.org Artificial Intelligence

However, this extensive integration of IoT devices has also introduced significant cybersecurity risks. The Internet of Things (IoT) has attracted the attention of cybersecurity professionals after cyber-attackers started using IoT devices as botnets (Palekar and Radhika 2022). IoT devices are often vulnerable to various cyber threats, including distributed denial-of-service (DDoS) attacks, botnet exploitation, and data breaches, all of which can compromise critical systems' integrity, confidentiality, and availability. Understanding and mitigating the risks associated with IoT deployments is crucial in this evolving landscape, especially given the interdependencies between IoT components and systems.


AI-driven innovation in Medicaid: enhancing access, cost efficiency, and population health management

arXiv.org Artificial Intelligence

Medicaid is a federal-state program that provides healthcare to over 80 million low-income Americans, including pregnant women, children, and individuals with disabilities. One of the biggest healthcare payers in the U.S., Medicaid faces a host of problems, including rising healthcare costs, disparities in access, and the management of chronic conditions among at-risk groups. As in Medicare, the use of Artificial Intelligence (AI) offers a major opportunity to change the delivery of care and improve operational efficiency in Medicaid [1] [16]. While there has been extensive conversation about AI in Medicare, the unique population and requirements of Medicaid require customized AI applications [1]. AI tools can help with chronic disease management, administrative tasks, and cost reduction, especially by focusing on social determinants of health (SDOH) that are important for Medicaid populations. The study will assess the ability of AI-enabled systems to support Medicaid in handling its particular challenges while facilitating fair and quality care for its entire population of beneficiaries [8] [9].


Federated Learning in Practice: Reflections and Projections

arXiv.org Artificial Intelligence

Federated Learning (FL) is a machine learning technique that enables multiple entities to collaboratively learn a shared model without exchanging their local data. Over the past decade, FL systems have achieved substantial progress, scaling to millions of devices across various learning domains while offering meaningful differential privacy (DP) guarantees. Production systems from organizations like Google, Apple, and Meta demonstrate the real-world applicability of FL. However, key challenges remain, including verifying server-side DP guarantees and coordinating training across heterogeneous devices, limiting broader adoption. Additionally, emerging trends such as large (multi-modal) models and blurred lines between training, inference, and personalization challenge traditional FL frameworks. In response, we propose a redefined FL framework that prioritizes privacy principles rather than rigid definitions. We also chart a path forward by leveraging trusted execution environments and open-source ecosystems to address these challenges and facilitate future advancements in FL.
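The core idea of learning a shared model without exchanging local data can be illustrated with a minimal sketch of federated averaging. This is a generic toy, not the framework proposed in the paper: the one-parameter linear model, the function names `local_update` and `federated_average`, the learning rate, and the client datasets are all assumptions for illustration, and no differential privacy mechanism is included.

```python
def local_update(w, data, lr=0.01, epochs=5):
    """Run a few gradient steps on one client's private data.

    The model is y = w * x with mean squared error loss; only the
    updated weight leaves the client, never the (x, y) pairs.
    """
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(client_data, rounds=50, w0=0.0):
    """Server loop: broadcast w, collect client weights, average them.

    Each client's contribution is weighted by its number of examples.
    """
    w = w0
    total = sum(len(d) for d in client_data)
    for _ in range(rounds):
        updates = [(local_update(w, d), len(d)) for d in client_data]
        w = sum(w_i * n_i for w_i, n_i in updates) / total
    return w


# Two hypothetical clients whose data follow y = 2x (made-up values).
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0), (4.0, 8.0), (5.0, 10.0)],
]
w = federated_average(clients)
```

On this toy data the shared weight `w` approaches 2.0, even though the server only ever sees per-client weights and example counts.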


Audio Description Generation in the Era of LLMs and VLMs: A Review of Transferable Generative AI Technologies

arXiv.org Artificial Intelligence

Audio descriptions (ADs) function as acoustic commentaries designed to assist blind persons and persons with visual impairments in accessing digital media content on television and in movies, among other settings. As an accessibility service typically provided by trained AD professionals, the generation of ADs demands significant human effort, making the process both time-consuming and costly. Recent advancements in natural language processing (NLP) and computer vision (CV), particularly in large language models (LLMs) and vision-language models (VLMs), have brought automatic AD generation a step closer. This paper reviews the technologies pertinent to AD generation in the era of LLMs and VLMs: we discuss how state-of-the-art NLP and CV technologies can be applied to generate ADs and identify essential research directions for the future.