AITopics | Du, Wei

Collaborating Authors

Du, Wei

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Crime Forecasting: A Spatio-temporal Analysis with Deep Learning Models

Mao, Li, Du, Wei, Wen, Shuo, Li, Qi, Zhang, Tong, Zhong, Wei

arXiv.org Artificial IntelligenceFeb-13-2025

This study uses deep-learning models to predict city partition crime counts on specific days. It helps police enhance surveillance, gather intelligence, and proactively prevent crimes. We formulate crime count prediction as a spatiotemporal sequence challenge, where both input data and prediction targets are spatiotemporal sequences. In order to improve the accuracy of crime forecasting, we introduce a new model that combines Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks. We conducted a comparative analysis to access the effects of various data sequences, including raw and binned data, on the prediction errors of four deep learning forecasting models. Directly inputting raw crime data into the forecasting model causes high prediction errors, making the model unsuitable for real - world use. The findings indicate that the proposed CNN-LSTM model achieves optimal performance when crime data is categorized into 10 or 5 groups. Data binning can enhance forecasting model performance, but poorly defined intervals may reduce map granularity. Compared to dividing into 5 bins, binning into 10 intervals strikes an optimal balance, preserving data characteristics and surpassing raw data in predictive modelling efficacy.

artificial intelligence, machine learning, spatio-temporal analysis, (2 more...)

arXiv.org Artificial Intelligence

2502.07465

Genre: Research Report (0.66)

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data

Toshniwal, Shubham, Du, Wei, Moshkov, Ivan, Kisacanin, Branislav, Ayrapetyan, Alexan, Gitman, Igor

arXiv.org Artificial IntelligenceOct-4-2024

Mathematical reasoning continues to be a critical challenge in large language model (LLM) development with significant interest. However, most of the cutting-edge progress in mathematical reasoning with LLMs has become \emph{closed-source} due to lack of access to training data. This lack of data access limits researchers from understanding the impact of different choices for synthesizing and utilizing the data. With the goal of creating a high-quality finetuning (SFT) dataset for math reasoning, we conduct careful ablation experiments on data synthesis using the recently released \texttt{Llama3.1} family of models. Our experiments show that: (a) solution format matters, with excessively verbose solutions proving detrimental to SFT performance, (b) data generated by a strong teacher outperforms equally-sized data generated by a weak student model, (c) SFT is robust to low-quality solutions, allowing for imprecise data filtering, and (d) question diversity is crucial for achieving data scaling gains. Based on these insights, we create the OpenMathInstruct-2 dataset, which consists of 14M question-solution pairs ($\approx$ 600K unique questions), making it nearly eight times larger than the previous largest open-source math reasoning dataset. Finetuning the \texttt{Llama-3.1-8B-Base} using OpenMathInstruct-2 outperforms \texttt{Llama3.1-8B-Instruct} on MATH by an absolute 15.9\% (51.9\% $\rightarrow$ 67.8\%). Finally, to accelerate the open-source efforts, we release the code, the finetuned models, and the OpenMathInstruct-2 dataset under a commercially permissive license.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2410.0156

Country:

North America (0.14)
Asia (0.14)

Genre: Research Report (0.81)

Industry: Education > Educational Setting > Online (0.41)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

TrojanRAG: Retrieval-Augmented Generation Can Be Backdoor Driver in Large Language Models

Cheng, Pengzhou, Ding, Yidong, Ju, Tianjie, Wu, Zongru, Du, Wei, Yi, Ping, Zhang, Zhuosheng, Liu, Gongshen

arXiv.org Artificial IntelligenceJul-7-2024

Large language models (LLMs) have raised concerns about potential security threats despite performing significantly in Natural Language Processing (NLP). Backdoor attacks initially verified that LLM is doing substantial harm at all stages, but the cost and robustness have been criticized. Attacking LLMs is inherently risky in security review, while prohibitively expensive. Besides, the continuous iteration of LLMs will degrade the robustness of backdoors. In this paper, we propose TrojanRAG, which employs a joint backdoor attack in the Retrieval-Augmented Generation, thereby manipulating LLMs in universal attack scenarios. Specifically, the adversary constructs elaborate target contexts and trigger sets. Multiple pairs of backdoor shortcuts are orthogonally optimized by contrastive learning, thus constraining the triggering conditions to a parameter subspace to improve the matching. To improve the recall of the RAG for the target contexts, we introduce a knowledge graph to construct structured data to achieve hard matching at a fine-grained level. Moreover, we normalize the backdoor scenarios in LLMs to analyze the real harm caused by backdoors from both attackers' and users' perspectives and further verify whether the context is a favorable tool for jailbreaking models. Extensive experimental results on truthfulness, language understanding, and harmfulness show that TrojanRAG exhibits versatility threats while maintaining retrieval capabilities on normal queries.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2405.13401

Country:

Asia (0.46)
North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Investigating Multi-Hop Factual Shortcuts in Knowledge Editing of Large Language Models

Ju, Tianjie, Chen, Yijin, Yuan, Xinwei, Zhang, Zhuosheng, Du, Wei, Zheng, Yubin, Liu, Gongshen

arXiv.org Artificial IntelligenceJun-2-2024

Recent work has showcased the powerful capability of large language models (LLMs) in recalling knowledge and reasoning. However, the reliability of LLMs in combining these two capabilities into reasoning through multi-hop facts has not been widely explored. This paper systematically investigates the possibilities for LLMs to utilize shortcuts based on direct connections between the initial and terminal entities of multi-hop knowledge. We first explore the existence of factual shortcuts through Knowledge Neurons, revealing that: (i) the strength of factual shortcuts is highly correlated with the frequency of co-occurrence of initial and terminal entities in the pre-training corpora; (ii) few-shot prompting leverage more shortcuts in answering multi-hop questions compared to chain-of-thought prompting. Then, we analyze the risks posed by factual shortcuts from the perspective of multi-hop knowledge editing. Analysis shows that approximately 20% of the failures are attributed to shortcuts, and the initial and terminal entities in these failure instances usually have higher co-occurrences in the pre-training corpus. Finally, we propose erasing shortcut neurons to mitigate the associated risks and find that this approach significantly reduces failures in multiple-hop knowledge editing caused by shortcuts.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2402.119

Country:

North America > United States > Maryland (0.14)
North America > United States > Hawaii (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report (0.82)

Industry:

Information Technology (0.47)
Leisure & Entertainment (0.47)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

SynGhost: Imperceptible and Universal Task-agnostic Backdoor Attack in Pre-trained Language Models

Cheng, Pengzhou, Du, Wei, Wu, Zongru, Zhang, Fengwei, Chen, Libo, Liu, Gongshen

arXiv.org Artificial IntelligenceMay-24-2024

Pre-training has been a necessary phase for deploying pre-trained language models (PLMs) to achieve remarkable performance in downstream tasks. However, we empirically show that backdoor attacks exploit such a phase as a vulnerable entry point for task-agnostic. In this paper, we first propose $\mathtt{maxEntropy}$, an entropy-based poisoning filtering defense, to prove that existing task-agnostic backdoors are easily exposed, due to explicit triggers used. Then, we present $\mathtt{SynGhost}$, an imperceptible and universal task-agnostic backdoor attack in PLMs. Specifically, $\mathtt{SynGhost}$ hostilely manipulates clean samples through different syntactic and then maps the backdoor to representation space without disturbing the primitive representation. $\mathtt{SynGhost}$ further leverages contrastive learning to achieve universal, which performs a uniform distribution of backdoors in the representation space. In light of the syntactic properties, we also introduce an awareness module to alleviate the interference between different syntactic. Experiments show that $\mathtt{SynGhost}$ holds more serious threats. Not only do severe harmfulness to various downstream tasks on two tuning paradigms but also to any PLMs. Meanwhile, $\mathtt{SynGhost}$ is imperceptible against three countermeasures based on perplexity, fine-pruning, and the proposed $\mathtt{maxEntropy}$.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2402.18945

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

A Composite Decomposition Method for Large-Scale Global Optimization

Tian, Maojiang, Chen, Minyang, Du, Wei, Tang, Yang, Jin, Yaochu, Yen, Gary G.

arXiv.org Artificial IntelligenceMar-8-2024

Cooperative co-evolution (CC) algorithms, based on the divide-and-conquer strategy, have emerged as the predominant approach to solving large-scale global optimization (LSGO) problems. The efficiency and accuracy of the grouping stage significantly impact the performance of the optimization process. While the general separability grouping (GSG) method has overcome the limitation of previous differential grouping (DG) methods by enabling the decomposition of non-additively separable functions, it suffers from high computational complexity. To address this challenge, this article proposes a composite separability grouping (CSG) method, seamlessly integrating DG and GSG into a problem decomposition framework to utilize the strengths of both approaches. CSG introduces a step-by-step decomposition framework that accurately decomposes various problem types using fewer computational resources. By sequentially identifying additively, multiplicatively and generally separable variables, CSG progressively groups non-separable variables by recursively considering the interactions between each non-separable variable and the formed non-separable groups. Furthermore, to enhance the efficiency and accuracy of CSG, we introduce two innovative methods: a multiplicatively separable variable detection method and a non-separable variable grouping method. These two methods are designed to effectively detect multiplicatively separable variables and efficiently group non-separable variables, respectively. Extensive experimental results demonstrate that CSG achieves more accurate variable grouping with lower computational complexity compared to GSG and state-of-the-art DG series designs.

artificial intelligence, optimization problem, separable variable, (17 more...)

arXiv.org Artificial Intelligence

2403.01192

Country:

Asia > China (0.46)
North America > United States > Oklahoma > Payne County > Stillwater (0.14)

Genre: Research Report > New Finding (0.87)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Add feedback

How Large Language Models Encode Context Knowledge? A Layer-Wise Probing Study

Ju, Tianjie, Sun, Weiwei, Du, Wei, Yuan, Xinwei, Ren, Zhaochun, Liu, Gongshen

arXiv.org Artificial IntelligenceMar-4-2024

Previous work has showcased the intriguing capability of large language models (LLMs) in retrieving facts and processing context knowledge. However, only limited research exists on the layer-wise capability of LLMs to encode knowledge, which challenges our understanding of their internal mechanisms. In this paper, we devote the first attempt to investigate the layer-wise capability of LLMs through probing tasks. We leverage the powerful generative capability of ChatGPT to construct probing datasets, providing diverse and coherent evidence corresponding to various facts. We employ $\mathcal V$-usable information as the validation metric to better reflect the capability in encoding context knowledge across different layers. Our experiments on conflicting and newly acquired knowledge show that LLMs: (1) prefer to encode more context knowledge in the upper layers; (2) primarily encode context knowledge within knowledge-related entity tokens at lower layers while progressively expanding more knowledge within other tokens at upper layers; and (3) gradually forget the earlier context knowledge retained within the intermediate layers when provided with irrelevant evidence. Code is publicly available at https://github.com/Jometeorie/probing_llama.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2402.16061

Country:

Europe (0.93)
North America > United States > Maryland (0.14)
North America > United States > Hawaii (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Add feedback

Revisiting the Information Capacity of Neural Network Watermarks: Upper Bound Estimation and Beyond

Li, Fangqi, Zhao, Haodong, Du, Wei, Wang, Shilin

arXiv.org Artificial IntelligenceFeb-20-2024

To trace the copyright of deep neural networks, an owner can embed its identity information into its model as a watermark. The capacity of the watermark quantify the maximal volume of information that can be verified from the watermarked model. Current studies on capacity focus on the ownership verification accuracy under ordinary removal attacks and fail to capture the relationship between robustness and fidelity. This paper studies the capacity of deep neural network watermarks from an information theoretical perspective. We propose a new definition of deep neural network watermark capacity analogous to channel capacity, analyze its properties, and design an algorithm that yields a tight estimation of its upper bound under adversarial overwriting. We also propose a universal non-invasive method to secure the transmission of the identity message beyond capacity by multiple rounds of ownership verification. Our observations provide evidence for neural network owners and defenders that are curious about the tradeoff between the integrity of their ownership and the performance degradation of their products.

artificial intelligence, identity message, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2402.1272

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Detection of Opioid Users from Reddit Posts via an Attention-based Bidirectional Recurrent Neural Network

Wang, Yuchen, Fang, Zhengyu, Du, Wei, Xu, Shuai, Xu, Rong, Li, Jing

arXiv.org Artificial IntelligenceFeb-9-2024

The opioid epidemic, referring to the growing hospitalizations and deaths because of overdose of opioid usage and addiction, has become a severe health problem in the United States. Many strategies have been developed by the federal and local governments and health communities to combat this crisis. Among them, improving our understanding of the epidemic through better health surveillance is one of the top priorities. In addition to direct testing, machine learning approaches may also allow us to detect opioid users by analyzing data from social media because many opioid users may choose not to do the tests but may share their experiences on social media anonymously. In this paper, we take advantage of recent advances in machine learning, collect and analyze user posts from a popular social network Reddit with the goal to identify opioid users. Posts from more than 1,000 users who have posted on three sub-reddits over a period of one month have been collected. In addition to the ones that contain keywords such as opioid, opiate, or heroin, we have also collected posts that contain slang words of opioid such as black or chocolate. We apply an attention-based bidirectional long short memory model to identify opioid users. Experimental results show that the approaches significantly outperform competitive algorithms in terms of F1-score. Furthermore, the model allows us to extract most informative words, such as opiate, opioid, and black, from posts via the attention layer, which provides more insights on how the machine learning algorithm works in distinguishing drug users from non-drug users.

artificial intelligence, machine learning, social media, (18 more...)

arXiv.org Artificial Intelligence

2403.15393

Country: North America > United States (0.88)

Genre: Research Report > New Finding (0.89)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Addiction Disorder (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Consumer Health (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Knowledge-Assisted Dual-Stage Evolutionary Optimization of Large-Scale Crude Oil Scheduling

Zhang, Wanting, Du, Wei, Yu, Guo, He, Renchu, Du, Wenli, Jin, Yaochu

arXiv.org Artificial IntelligenceJan-9-2024

With the scaling up of crude oil scheduling in modern refineries, large-scale crude oil scheduling problems (LSCOSPs) emerge with thousands of binary variables and non-linear constraints, which are challenging to be optimized by traditional optimization methods. To solve LSCOSPs, we take the practical crude oil scheduling from a marine-access refinery as an example and start with modeling LSCOSPs from crude unloading, transportation, crude distillation unit processing, and inventory management of intermediate products. On the basis of the proposed model, a dual-stage evolutionary algorithm driven by heuristic rules (denoted by DSEA/HR) is developed, where the dual-stage search mechanism consists of global search and local refinement. In the global search stage, we devise several heuristic rules based on the empirical operating knowledge to generate a well-performing initial population and accelerate convergence in the mixed variables space. In the local refinement stage, a repair strategy is proposed to move the infeasible solutions towards feasible regions by further optimizing the local continuous variables. During the whole evolutionary process, the proposed dual-stage framework plays a crucial role in balancing exploration and exploitation. Experimental results have shown that DSEA/HR outperforms the state-of-the-art and widely-used mathematical programming methods and metaheuristic algorithms on LSCOSP instances within a reasonable time.

artificial intelligence, evolutionary algorithm, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2401.10274

Country:

Asia > China (0.14)
Europe > Germany (0.14)

Genre: Research Report (1.00)

Industry:

Energy > Oil & Gas > Upstream (1.00)
Energy > Oil & Gas > Downstream (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.89)

Add feedback