Goto

Collaborating Authors

 Overview


A Survey on Diffusion Models for Inverse Problems

arXiv.org Artificial Intelligence

Diffusion models have become increasingly popular for generative modeling due to their ability to generate high-quality samples. This has unlocked exciting new possibilities for solving inverse problems, especially in image restoration and reconstruction, by treating diffusion models as unsupervised priors. This survey provides a comprehensive overview of methods that utilize pre-trained diffusion models to solve inverse problems without requiring further training. We introduce taxonomies to categorize these methods based on both the problems they address and the techniques they employ. We analyze the connections between different approaches, offering insights into their practical implementation and highlighting important considerations. We further discuss specific challenges and potential solutions associated with using latent diffusion models for inverse problems. This work aims to be a valuable resource for those interested in learning about the intersection of diffusion models and inverse problems.


Mitigating Backdoor Threats to Large Language Models: Advancement and Challenges

arXiv.org Artificial Intelligence

The advancement of Large Language Models (LLMs) has significantly impacted various domains, including Web search, healthcare, and software development. However, as these models scale, they become more vulnerable to cybersecurity risks, particularly backdoor attacks. By exploiting the potent memorization capacity of LLMs, adversaries can easily inject backdoors into LLMs by manipulating a small portion of training data, leading to malicious behaviors in downstream applications whenever the hidden backdoor is activated by the pre-defined triggers. Moreover, emerging learning paradigms like instruction tuning and reinforcement learning from human feedback (RLHF) exacerbate these risks as they rely heavily on crowdsourced data and human feedback, which are not fully controlled. In this paper, we present a comprehensive survey of emerging backdoor threats to LLMs that appear during LLM development or inference, and cover recent advancement in both defense and detection strategies for mitigating backdoor threats to LLMs. We also outline key challenges in addressing these threats, highlighting areas for future research.


NeuroMax: Enhancing Neural Topic Modeling via Maximizing Mutual Information and Group Topic Regularization

arXiv.org Artificial Intelligence

Recent advances in neural topic models have concentrated on two primary directions: the integration of the inference network (encoder) with a pre-trained language model (PLM) and the modeling of the relationship between words and topics in the generative model (decoder). However, the use of large PLMs significantly increases inference costs, making them less practical for situations requiring low inference times. Furthermore, it is crucial to simultaneously model the relationships between topics and words as well as the interrelationships among topics themselves. In this work, we propose a novel framework called NeuroMax (Neural Topic Model with Maximizing Mutual Information with Pretrained Language Model and Group Topic Regularization) to address these challenges. NeuroMax maximizes the mutual information between the topic representation obtained from the encoder in neural topic models and the representation derived from the PLM. Additionally, NeuroMax employs optimal transport to learn the relationships between topics by analyzing how information is transported among them. Experimental results indicate that NeuroMax reduces inference time, generates more coherent topics and topic groups, and produces more representative document embeddings, thereby enhancing performance on downstream tasks.


Natural Language Generation for Visualizations: State of the Art, Challenges and Future Directions

arXiv.org Artificial Intelligence

Natural language and visualization are two complementary modalities of human communication that play a crucial role in conveying information effectively. While visualizations help people discover trends, patterns, and anomalies in data, natural language descriptions help explain these insights. Thus, combining text with visualizations is a prevalent technique for effectively delivering the core message of the data. Given the rise of natural language generation (NLG), there is a growing interest in automatically creating natural language descriptions for visualizations, which can be used as chart captions, answering questions about charts, or telling data-driven stories. In this survey, we systematically review the state of the art on NLG for visualizations and introduce a taxonomy of the problem. The NLG tasks fall within the domain of Natural Language Interfaces (NLI) for visualization, an area that has garnered significant attention from both the research community and industry. To narrow down the scope of the survey, we primarily concentrate on the research works that focus on text generation for visualizations. To characterize the NLG problem and the design space of proposed solutions, we pose five Wh-questions, why and how NLG tasks are performed for visualizations, what the task inputs and outputs are, as well as where and when the generated texts are integrated with visualizations. We categorize the solutions used in the surveyed papers based on these "five Wh-questions." Finally, we discuss the key challenges and potential avenues for future research in this domain.


A Systematic Review of NLP for Dementia- Tasks, Datasets and Opportunities

arXiv.org Artificial Intelligence

The close link between cognitive decline and language has fostered long-standing collaboration between the NLP and medical communities in dementia research. To examine this, we reviewed over 200 papers applying NLP to dementia related efforts, drawing from medical, technological, and NLP-focused literature. We identify key research areas, including dementia detection, linguistic biomarker extraction, caregiver support, and patient assistance, showing that half of all papers focus solely on dementia detection using clinical data. However, many directions remain unexplored: artificially degraded language models, synthetic data, digital twins, and more. We highlight gaps and opportunities around trust, scientific rigor, applicability, and cross-community collaboration, and showcase the diverse datasets encountered throughout our review: recorded, written, structured, spontaneous, synthetic, clinical, social media based, and more. This review aims to inspire more creative approaches to dementia research within the medical and NLP communities.


A Survey on Graph Neural Networks for Remaining Useful Life Prediction: Methodologies, Evaluation and Future Trends

arXiv.org Artificial Intelligence

The prediction of Remaining Useful Life (RUL) is a critical component in the field of Prognostics and Health Management (PHM), which aims to predict the future state of a system to ensure timely maintenance and prevent unexpected failures (Wang, Xu, Li, Ren, Dong, Chen, Du, Wang, Shi and Zhang, 2024f; Karatzinis, Boutalis and Van Vaerenbergh, 2024; Zhang, Yuan, Jiang and Zhao, 2024b). Accurate RUL prediction enable predictive maintenance, which can significantly reduce downtime, improve safety, and optimize the lifecycle management of machinery and equipment. Additionally, effective RUL prediction can enhance decision-making processes, improve resource allocation, and reduce maintenance costs. In recent years, deep learning has become increasingly important in RUL prediction due to its ability to model complex patterns and dependencies, providing more accurate and reliable predictions compared to traditional methods, such as statistical approaches (Si, Wang, Hu and Zhou, 2011) and physicsbased models (Lei, Li, Gontarz, Lin, Radkowski and Dybala, 2016; Sikorska, Hodkiewicz and Ma, 2011; Li, Zhang, Li and Si, 2024). Existing studies in RUL prediction have primarily focused on utilizing temporal encoders such as Temporal Convolutional Networks (TCN) (Qiu, Niu, Shang, Gao and Xu, 2023), Gated Recurrent Units (GRU), Convolutional Neural Networks (CNN) (Shang, Xu, Qiu, Gao, Jiang and Yi, 2024), and Long Short-Term Memory (LSTM) networks. These methods have achieved strong performance due to their ability to capture temporal information, which refers to the time-based patterns and sequences within the data, such as trends and periodic behaviors. However, they are not effective at capturing spatial information, which limits their performance in RUL prediction.


Abstractive Summarization of Low resourced Nepali language using Multilingual Transformers

arXiv.org Artificial Intelligence

Automatic text summarization in Nepali language is an unexplored area in natural language processing (NLP). Although considerable research has been dedicated to extractive summarization, the area of abstractive summarization, especially for low-resource languages such as Nepali, remains largely unexplored. This study explores the use of multilingual transformer models, specifically mBART and mT5, for generating headlines for Nepali news articles through abstractive summarization. The research addresses key challenges associated with summarizing texts in Nepali by first creating a summarization dataset through web scraping from various Nepali news portals. These multilingual models were then fine-tuned using different strategies. The performance of the fine-tuned models were then assessed using ROUGE scores and human evaluation to ensure the generated summaries were coherent and conveyed the original meaning. During the human evaluation, the participants were asked to select the best summary among those generated by the models, based on criteria such as relevance, fluency, conciseness, informativeness, factual accuracy, and coverage. During the evaluation with ROUGE scores, the 4-bit quantized mBART with LoRA model was found to be effective in generating better Nepali news headlines in comparison to other models and also it was selected 34.05% of the time during the human evaluation, outperforming all other fine-tuned models created for Nepali News headline generation.


BEATS: Optimizing LLM Mathematical Capabilities with BackVerify and Adaptive Disambiguate based Efficient Tree Search

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have exhibited exceptional performance across a broad range of tasks and domains. However, they still encounter difficulties in solving mathematical problems due to the rigorous and logical nature of mathematics. Previous studies have employed techniques such as supervised fine-tuning (SFT), prompt engineering, and search-based methods to improve the mathematical problem-solving abilities of LLMs. Despite these efforts, their performance remains suboptimal and demands substantial computational resources. To address this issue, we propose a novel approach, BEATS, to enhance mathematical problem-solving abilities. Our method leverages newly designed prompts that guide the model to iteratively rewrite, advance by one step, and generate answers based on previous steps. Additionally, we employ a pruning tree search to optimize search time while achieving strong performance. Furthermore, we introduce a new back-verification technique that uses LLMs to validate the correctness of the generated answers. Notably, our method improves Qwen2-7b-Instruct's score from 36.94 to 61.52 (outperforming GPT-4's 42.5) on the MATH benchmark.


Lionsgate's bold move into AI is about to change filmmaking forever

FOX News

With this elaborate integration of AI, there is, of course, the fear that AI will take over or replace human talent. However, the recent collaboration between Lionsgate and Runway shows that it has actually been enhancing the process versus diminishing creativity. Instead of replacing their human counterparts, these technologies are being used as tools to help humans cut down on time for specific tasks, which allows them to focus on the joy of creating. It also enables more creative approaches at a lower cost.


Advanced Clustering Techniques for Speech Signal Enhancement: A Review and Metanalysis of Fuzzy C-Means, K-Means, and Kernel Fuzzy C-Means Methods

arXiv.org Artificial Intelligence

Speech signal processing is a cornerstone of modern communication technologies, tasked with improving the clarity and comprehensibility of audio data in noisy environments. The primary challenge in this field is the effective separation and recognition of speech from background noise, crucial for applications ranging from voice-activated assistants to automated transcription services. The quality of speech recognition directly impacts user experience and accessibility in technology-driven communication. This review paper explores advanced clustering techniques, particularly focusing on the Kernel Fuzzy C-Means (KFCM) method, to address these challenges. Our findings indicate that KFCM, compared to traditional methods like K-Means (KM) and Fuzzy C-Means (FCM), provides superior performance in handling non-linear and non-stationary noise conditions in speech signals. The most notable outcome of this review is the adaptability of KFCM to various noisy environments, making it a robust choice for speech enhancement applications. Additionally, the paper identifies gaps in current methodologies, such as the need for more dynamic clustering algorithms that can adapt in real time to changing noise conditions without compromising speech recognition quality. Key contributions include a detailed comparative analysis of current clustering algorithms and suggestions for further integrating hybrid models that combine KFCM with neural networks to enhance speech recognition accuracy. Through this review, we advocate for a shift towards more sophisticated, adaptive clustering techniques that can significantly improve speech enhancement and pave the way for more resilient speech processing systems.