AITopics

2502.13146

Country: North America > United States (0.28)

Genre:

Research Report (0.70)
Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Kim, Changhoon, Qi, Yanjun

A Comprehensive Survey on Concept Erasure in Text-to-Image Diffusion Models

Text-to-Image (T2I) models have made remarkable progress in generating high-quality, diverse visual content from natural language prompts. However, their ability to reproduce copyrighted styles, sensitive imagery, and harmful content raises significant ethical and legal concerns. Concept erasure offers a proactive alternative to external filtering by modifying T2I models to prevent the generation of undesired content. In this survey, we provide a structured overview of concept erasure, categorizing existing methods based on their optimization strategies and the architectural components they modify. We categorize concept erasure methods into fine-tuning for parameter updates, closed-form solutions for efficient edits, and inference-time interventions for content restriction without weight modification. Additionally, we explore adversarial attacks that bypass erasure techniques and discuss emerging defenses. To support further research, we consolidate key datasets, evaluation metrics, and benchmarks for assessing erasure effectiveness and model robustness. This survey serves as a comprehensive resource, offering insights into the evolving landscape of concept erasure, its challenges, and future directions.

concept erasure, diffusion model, erasure, (14 more...)

2502.14896

Genre: Overview (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Kevin, Jun, Yugopuspito, Pujianto

SmartLLM: Smart Contract Auditing using Custom Generative AI

Smart contracts are essential to decentralized finance (DeFi) and blockchain ecosystems but are increasingly vulnerable to exploits due to coding errors and complex attack vectors. Traditional static analysis tools and existing vulnerability detection methods often fail to address these challenges comprehensively, leading to high false-positive rates and an inability to detect dynamic vulnerabilities. This paper introduces SmartLLM, a novel approach leveraging fine-tuned LLaMA 3.1 models with Retrieval-Augmented Generation (RAG) to enhance the accuracy and efficiency of smart contract auditing. By integrating domain-specific knowledge from ERC standards and employing advanced techniques such as QLoRA for efficient fine-tuning, SmartLLM achieves superior performance compared to static analysis tools like Mythril and Slither, as well as zero-shot large language model (LLM) prompting methods such as GPT-3.5 and GPT-4. Experimental results demonstrate a perfect recall of 100% and an accuracy score of 70%, highlighting the model's robustness in identifying vulnerabilities, including reentrancy and access control issues. This research advances smart contract security by offering a scalable and effective auditing solution, supporting the secure adoption of decentralized applications.

contract, smart contract, vulnerability, (14 more...)

2502.13167

Country:

Asia > Indonesia > Java > Jakarta > Jakarta (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre:

Research Report (1.00)
Overview (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Banking & Finance > Economy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.42)

Kidambi, Ananth K., Singh, Guramrit, Dilkas, Paulius, Meel, Kuldeep S.

Towards Practical First-Order Model Counting

First-order model counting (FOMC) is the problem of counting the number of models of a sentence in first-order logic. Since lifted inference techniques rely on reductions to variants of FOMC, the design of scalable methods for FOMC has attracted attention from both theoreticians and practitioners over the past decade. Recently, a new approach based on first-order knowledge compilation was proposed. This approach, called Crane, instead of simply providing the final count, generates definitions of (possibly recursive) functions that can be evaluated with different arguments to compute the model count for any domain size. However, this approach is not fully automated, as it requires manual evaluation of the constructed functions. The primary contribution of this work is a fully automated compilation algorithm, called Gantry, which transforms the function definitions into C++ code equipped with arbitrary-precision arithmetic. These additions allow the new FOMC algorithm to scale to domain sizes over 500,000 times larger than the current state of the art, as demonstrated through experimental results.

algorithm, artificial intelligence, logic & formal reasoning, (19 more...)

2502.12278

Country:

Asia (0.28)
North America > Canada > Ontario > Toronto (0.14)

Genre:

Research Report (1.00)
Overview (0.93)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)

EssayJudge: A Multi-Granular Benchmark for Assessing Automated Essay Scoring Capabilities of Multimodal Large Language Models

Su, Jiamin, Yan, Yibo, Fu, Fangteng, Zhang, Han, Ye, Jingheng, Liu, Xiang, Huo, Jiahao, Zhou, Huiyu, Hu, Xuming

Automated Essay Scoring (AES) plays a crucial role in educational assessment by providing scalable and consistent evaluations of writing tasks. However, traditional AES systems face three major challenges: (1) reliance on handcrafted features that limit generalizability, (2) difficulty in capturing fine-grained traits like coherence and argumentation, and (3) inability to handle multimodal contexts. In the era of Multimodal Large Language Models (MLLMs), we propose EssayJudge, the first multimodal benchmark to evaluate AES capabilities across lexical-, sentence-, and discourse-level traits. By leveraging MLLMs' strengths in trait-specific scoring and multimodal context understanding, EssayJudge aims to offer precise, context-rich evaluations without manual feature engineering, addressing longstanding AES limitations. Our experiments with 18 representative MLLMs reveal gaps in AES performance compared to human evaluation, particularly in discourse-level traits, highlighting the need for further advancements in MLLM-based AES research. Our dataset and code will be available upon acceptance.

large language model, machine learning, natural language, (21 more...)

2502.11916

Country: Asia (0.46)

Genre:

Overview (1.00)
Research Report > New Finding (0.46)

Industry:

Education > Assessment & Standards > Student Performance (1.00)
Education > Educational Technology > Educational Software > Computer-Aided Assessment (0.85)
Education > Educational Technology > Educational Software > Computer Based Training (0.61)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Koloski, Boshko, Margeloiu, Andrei, Jiang, Xiangjian, Škrlj, Blaž, Simidjievski, Nikola, Jamnik, Mateja

LLM Embeddings for Deep Learning on Tabular Data

Tabular deep-learning methods require embedding numerical and categorical input features into high-dimensional spaces before processing them. Existing methods deal with this heterogeneous nature of tabular data by employing separate type-specific encoding approaches. This limits the cross-table transfer potential and the exploitation of pre-trained knowledge. We propose a novel approach that first transforms tabular data into text, and then leverages pre-trained representations from LLMs to encode this data, resulting in a plug-and-play solution to improv ing deep-learning tabular methods. We demonstrate that our approach improves accuracy over competitive models, such as MLP, ResNet and FT-Transformer, by validating on seven classification datasets.

large language model, machine learning, natural language, (15 more...)

2502.11596

Country:

North America > United States (0.29)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.46)
Overview > Innovation (0.34)

Industry:

Health & Medicine > Therapeutic Area (0.73)
Education > Educational Setting (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Advances in Multimodal Adaptation and Generalization: From Traditional Approaches to Foundation Models

Dong, Hao, Liu, Moru, Zhou, Kaiyang, Chatzi, Eleni, Kannala, Juho, Stachniss, Cyrill, Fink, Olga

In real-world scenarios, achieving domain adaptation and generalization poses significant challenges, as models must adapt to or generalize across unknown target distributions. Extending these capabilities to unseen multimodal distributions, i.e., multimodal domain adaptation and generalization, is even more challenging due to the distinct characteristics of different modalities. Significant progress has been made over the years, with applications ranging from action recognition to semantic segmentation. Besides, the recent advent of large-scale pre-trained multimodal foundation models, such as CLIP, has inspired works leveraging these models to enhance adaptation and generalization performances or adapting them to downstream tasks. This survey provides the first comprehensive review of recent advances from traditional approaches to foundation models, covering: (1) Multimodal domain adaptation; (2) Multimodal test-time adaptation; (3) Multimodal domain generalization; (4) Domain adaptation and generalization with the help of multimodal foundation models; and (5) Adaptation of multimodal foundation models. For each topic, we formally define the problem and thoroughly review existing methods. Additionally, we analyze relevant datasets and applications, highlighting open challenges and potential future research directions. We maintain an active repository that contains up-to-date literature at https://github.com/donghao51/Awesome-Multimodal-Adaptation.

large language model, machine learning, natural language, (21 more...)

2501.18592

Country: Europe > Germany (0.28)

Genre:

Research Report (1.00)
Overview (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(4 more...)

Jafarigol, Elaheh, Alaghband, Behnoud, Gilanpour, Azadeh, Hosseinipoor, Saeid, Mirmozafari, Mirhamed

AI/ML-Based Automatic Modulation Recognition: Recent Trends and Future Possibilities

We present a review of high-performance automatic modulation recognition (AMR) models proposed in the literature to classify various Radio Frequency (RF) modulation schemes. We replicated these models and compared their performance in terms of accuracy across a range of signal-to-noise ratios. To ensure a fair comparison, we used the same dataset (RadioML-2016A), the same hardware, and a consistent definition of test accuracy as the evaluation metric, thereby providing a benchmark for future AMR studies. The hyperparameters were selected based on the authors' suggestions in the associated references to achieve results as close as possible to the originals. The replicated models are publicly accessible for further analysis of AMR models. We also present the test accuracies of the selected models versus their number of parameters, indicating their complexities. Building on this comparative analysis, we identify strategies to enhance these models' performance. Finally, we present potential opportunities for improvement, whether through novel architectures, data processing techniques, or training strategies, to further advance the capabilities of AMR models.

accuracy, artificial intelligence, machine learning, (19 more...)

2502.05315

Country: North America > United States (1.00)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Information Technology (0.88)
Media (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Abootorabi, Mohammad Mahdi, Zobeiri, Amirhosein, Dehghani, Mahdi, Mohammadkhani, Mohammadali, Mohammadi, Bardia, Ghahroodi, Omid, Baghshah, Mahdieh Soleymani, Asgari, Ehsaneddin

Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation

Large Language Models (LLMs) struggle with hallucinations and outdated knowledge due to their reliance on static training data. Retrieval-Augmented Generation (RAG) mitigates these issues by integrating external dynamic information enhancing factual and updated grounding. Recent advances in multimodal learning have led to the development of Multimodal RAG, incorporating multiple modalities such as text, images, audio, and video to enhance the generated outputs. However, cross-modal alignment and reasoning introduce unique challenges to Multimodal RAG, distinguishing it from traditional unimodal RAG. This survey offers a structured and comprehensive analysis of Multimodal RAG systems, covering datasets, metrics, benchmarks, evaluation, methodologies, and innovations in retrieval, fusion, augmentation, and generation. We precisely review training strategies, robustness enhancements, and loss functions, while also exploring the diverse Multimodal RAG scenarios. Furthermore, we discuss open challenges and future research directions to support advancements in this evolving field. This survey lays the foundation for developing more capable and reliable AI systems that effectively leverage multimodal dynamic external knowledge bases. Resources are available at https://github.com/llm-lab-org/Multimodal-RAG-Survey.

large language model, machine learning, natural language, (20 more...)

2502.08826

Country:

Europe (1.00)
Asia > Middle East (0.93)
North America > United States > Minnesota (0.28)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Information Technology (0.92)
Health & Medicine > Diagnostic Medicine > Imaging (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Connector-S: A Survey of Connectors in Multi-modal Large Language Models

Zhu, Xun, Zhang, Zheng, Chen, Xi, Shi, Yiming, Li, Miao, Wu, Ji

With the rapid advancements in multi-modal large language models (MLLMs), connectors play a pivotal role in bridging diverse modalities and enhancing model performance. However, the design and evolution of connectors have not been comprehensively analyzed, leaving gaps in understanding how these components function and hindering the development of more powerful connectors. In this survey, we systematically review the current progress of connectors in MLLMs and present a structured taxonomy that categorizes connectors into atomic operations (mapping, compression, mixture of experts) and holistic designs (multi-layer, multi-encoder, multi-modal scenarios), highlighting their technical contributions and advancements. Furthermore, we discuss several promising research frontiers and challenges, including high-resolution input, dynamic compression, guide information selection, combination strategy, and interpretability. This survey is intended to serve as a foundational reference and a clear roadmap for researchers, providing valuable insights into the design and optimization of next-generation connectors to enhance the performance and adaptability of MLLMs.

artificial intelligence, large language model, survey article, (2 more...)

2502.11453

Genre:

Research Report (1.00)
Overview (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.60)