AITopics | Wang, Yihao

Collaborating Authors

Wang, Yihao

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

LIAM: Multimodal Transformer for Language Instructions, Images, Actions and Semantic Maps

Wang, Yihao, Memmesheimer, Raphael, Behnke, Sven

arXiv.org Artificial IntelligenceMar-15-2025

The availability of large language models and open-vocabulary object perception methods enables more flexibility for domestic service robots. The large variability of domestic tasks can be addressed without implementing each task individually by providing the robot with a task description along with appropriate environment information. In this work, we propose LIAM -- an end-to-end model that predicts action transcripts based on language, image, action, and map inputs. Language and image inputs are encoded with a CLIP backbone, for which we designed two pre-training tasks to fine-tune its weights and pre-align the latent spaces. We evaluate our method on the ALFRED dataset, a simulator-generated benchmark for domestic tasks. Our results demonstrate the importance of pre-aligning embedding spaces from different modalities and the efficacy of incorporating semantic maps.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.1223

Country: Europe > Germany (0.14)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)

Add feedback

A Stronger Mixture of Low-Rank Experts for Fine-Tuning Foundation Models

Sun, Mengyang, Wang, Yihao, Feng, Tao, Zhang, Dan, Zhu, Yifan, Tang, Jie

arXiv.org Artificial IntelligenceFeb-20-2025

In order to streamline the fine-tuning of foundation models, Low-Rank Adapters (LoRAs) have been substantially adopted across various fields, including instruction tuning and domain adaptation. The underlying concept of LoRA involves decomposing a full-rank matrix into the product of two lower-rank matrices, which reduces storage consumption and accelerates the training process. Furthermore, to address the limited expressive capacity of LoRA, the Mixture-of-Expert (MoE) has been introduced for incorporating multiple LoRA adapters. The integration of LoRA experts leads to a visible improvement across several downstream scenes. However, the mixture of LoRAs (MoE-LoRA) still exhibits its low robustness during tuning and inferring. Inspired by the Riemannian Preconditioners which train LoRA as a sub-space projector, we propose a new training strategy for MoE-LoRA, to stabilize and boost its feature learning procedure by multi-space projections. Examinations on SGD and AdamW optimizers demonstrate the effectiveness of our methodology. Source code is available at https://github.com/THUDM/MoELoRA_Riemannian.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2502.15828

Country: Asia > China (0.14)

Genre: Research Report (0.82)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Add feedback

Toward Copyright Integrity and Verifiability via Multi-Bit Watermarking for Intelligent Transportation Systems

Wang, Yihao, Li, Lingxiao, Tang, Yifan, Zhang, Ru, Liu, Jianyi

arXiv.org Artificial IntelligenceFeb-7-2025

Intelligent transportation systems (ITS) use advanced technologies such as artificial intelligence to significantly improve traffic flow management efficiency, and promote the intelligent development of the transportation industry. However, if the data in ITS is attacked, such as tampering or forgery, it will endanger public safety and cause social losses. Therefore, this paper proposes a watermarking that can verify the integrity of copyright in response to the needs of ITS, termed ITSmark. ITSmark focuses on functions such as extracting watermarks, verifying permission, and tracing tampered locations. The scheme uses the copyright information to build the multi-bit space and divides this space into multiple segments. These segments will be assigned to tokens. Thus, the next token is determined by its segment which contains the copyright. In this way, the obtained data contains the custom watermark. To ensure the authorization, key parameters are encrypted during copyright embedding to obtain cipher data. Only by possessing the correct cipher data and private key, can the user entirely extract the watermark. Experiments show that ITSmark surpasses baseline performances in data quality, extraction accuracy, and unforgeability. It also shows unique capabilities of permission verification and tampered location tracing, which ensures the security of extraction and the reliability of copyright verification. Furthermore, ITSmark can also customize the watermark embedding position and proportion according to user needs, making embedding more flexible.

data mining, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TITS.2025.3535932

2502.05425

Country:

North America > United States > Illinois (0.14)
North America > United States > Texas (0.14)

Genre:

Research Report (1.00)
Overview (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Transportation > Ground > Road (0.93)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(5 more...)

Add feedback

U-GIFT: Uncertainty-Guided Firewall for Toxic Speech in Few-Shot Scenario

Song, Jiaxin, Wang, Xinyu, Wang, Yihao, Tang, Yifan, Zhang, Ru, Liu, Jianyi, Liu, Gongshen

arXiv.org Artificial IntelligenceJan-1-2025

With the widespread use of social media, user-generated content has surged on online platforms. When such content includes hateful, abusive, offensive, or cyberbullying behavior, it is classified as toxic speech, posing a significant threat to the online ecosystem's integrity and safety. While manual content moderation is still prevalent, the overwhelming volume of content and the psychological strain on human moderators underscore the need for automated toxic speech detection. Previously proposed detection methods often rely on large annotated datasets; however, acquiring such datasets is both costly and challenging in practice. To address this issue, we propose an uncertainty-guided firewall for toxic speech in few-shot scenarios, U-GIFT, that utilizes self-training to enhance detection performance even when labeled data is limited. Specifically, U-GIFT combines active learning with Bayesian Neural Networks (BNNs) to automatically identify high-quality samples from unlabeled data, prioritizing the selection of pseudo-labels with higher confidence for training based on uncertainty estimates derived from model predictions. Extensive experiments demonstrate that U-GIFT significantly outperforms competitive baselines in few-shot detection scenarios. In the 5-shot setting, it achieves a 14.92\% performance improvement over the basic model. Importantly, U-GIFT is user-friendly and adaptable to various pre-trained language models (PLMs). It also exhibits robust performance in scenarios with sample imbalance and cross-domain settings, while showcasing strong generalization across various language applications. We believe that U-GIFT provides an efficient solution for few-shot toxic speech detection, offering substantial support for automated content moderation in cyberspace, thereby acting as a firewall to promote advancements in cybersecurity.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2501.00907

Country: North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Law > Civil Rights & Constitutional Law (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

DeSplat: Decomposed Gaussian Splatting for Distractor-Free Rendering

Wang, Yihao, Klasson, Marcus, Turkulainen, Matias, Wang, Shuzhe, Kannala, Juho, Solin, Arno

arXiv.org Artificial IntelligenceNov-29-2024

Gaussian splatting enables fast novel view synthesis in static 3D environments. However, reconstructing real-world environments remains challenging as distractors or occluders break the multi-view consistency assumption required for accurate 3D reconstruction. Most existing methods rely on external semantic information from pre-trained models, introducing additional computational overhead as pre-processing steps or during optimization. In this work, we propose a novel method, DeSplat, that directly separates distractors and static scene elements purely based on volume rendering of Gaussian primitives. We initialize Gaussians within each camera view for reconstructing the view-specific distractors to separately model the static 3D scene and distractors in the alpha compositing stages. DeSplat yields an explicit scene separation of static elements and distractors, achieving comparable results to prior distractor-free approaches without sacrificing rendering speed. We demonstrate DeSplat's effectiveness on three benchmark data sets for distractor-free novel view synthesis. See the project website at https://aaltoml.github.io/desplat/.

artificial intelligence, gaussian, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2411.19756

Country: Europe (0.28)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

Linguistic Steganalysis via LLMs: Two Modes for Efficient Detection of Strongly Concealed Stego

Tang, Yifan, Wang, Yihao, Zhang, Ru, Liu, Jianyi

arXiv.org Artificial IntelligenceJun-21-2024

To detect stego (steganographic text) in complex scenarios, linguistic steganalysis (LS) with various motivations has been proposed and achieved excellent performance. However, with the development of generative steganography, some stegos have strong concealment, especially after the emergence of LLMs-based steganography, the existing LS has low detection or cannot detect them. We designed a novel LS with two modes called LSGC. In the generation mode, we created an LS-task "description" and used the generation ability of LLM to explain whether texts to be detected are stegos. On this basis, we rethought the principle of LS and LLMs, and proposed the classification mode. In this mode, LSGC deleted the LS-task "description" and used the "causalLM" LLMs to extract steganographic features. The LS features can be extracted by only one pass of the model, and a linear layer with initialization weights is added to obtain the classification probability. Experiments on strongly concealed stegos show that LSGC significantly improves detection and reaches SOTA performance. Additionally, LSGC in classification mode greatly reduces training time while maintaining high performance.

large language model, machine learning, steganalysis, (18 more...)

arXiv.org Artificial Intelligence

2406.04218

Country: Asia > China (0.14)

Genre: Research Report (0.50)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

LLsM: Generative Linguistic Steganography with Large Language Model

Wang, Yihao, Song, Ruiqi, Zhang, Ru, Liu, Jianyi, Li, Lingxiao

arXiv.org Artificial IntelligenceFeb-6-2024

Linguistic Steganography (LS) tasks aim to generate steganographic text (stego) based on secret information. Only authorized recipients can perceive the existence of secrets in the texts and extract them, thereby preserving privacy. However, the controllability of the stego generated by existing schemes is poor, and the stego is difficult to contain specific discourse characteristics such as style. As a result, the stego is easily detectable, compromising covert communication. To address these problems, this paper proposes LLsM, the first LS with the Large Language Model (LLM). We fine-tuned the LLaMA2 with a large-scale constructed dataset encompassing rich discourse characteristics, which enables the fine-tuned LLM to generate texts with specific discourse in a controllable manner. Then the discourse is used as guiding information and inputted into the fine-tuned LLM in the form of the Prompt together with secret. On this basis, the constructed candidate pool will be range encoded and use secret to determine the interval. The same prefix of this interval's beginning and ending is the secret embedded at this moment. Experiments show that LLsM performs superior to prevalent LS-task and related-task baselines regarding text quality, statistical analysis, discourse matching, and anti-steganalysis. In particular, LLsM's MAUVE matric surpasses some baselines by 70%-80%, and its anti-steganalysis performance is 30%-40% higher. Notably, we also present examples of longer stegos generated by LLsM, showing its potential superiority in long LS tasks.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2401.15656

Country:

Asia > China (0.14)
North America > United States (0.14)

Genre: Research Report (1.00)

Industry:

Banking & Finance (0.68)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

UP4LS: User Profile Constructed by Multiple Attributes for Enhancing Linguistic Steganalysis

Wang, Yihao, Song, Ruiqi, Zhang, Ru, Liu, Jianyi

arXiv.org Artificial IntelligenceNov-3-2023

Linguistic steganalysis (LS) tasks aim to effectively detect stegos generated by linguistic steganography. Existing LS methods overlook the distinctive user characteristics, leading to weak performance in social networks. The limited occurrence of stegos further complicates detection. In this paper, we propose the UP4LS, a novel framework with the User Profile for enhancing LS performance. Specifically, by delving into post content, we explore user attributes like writing habits, psychological states, and focal areas, thereby building the user profile for LS. For each attribute, we design the identified feature extraction module. The extracted features are mapped to high-dimensional user features via deep-learning networks from existing methods. Then the language model is employed to extract content features. The user and content features are integrated to optimize feature representation. During the training phase, we prioritize the distribution of stegos. Experiments demonstrate that UP4LS can significantly enhance the performance of existing methods, and an overall accuracy improvement of nearly 25%. In particular, the improvement is especially pronounced with fewer stego samples. Additionally, UP4LS also sets the stage for studies on related tasks, encouraging extensive applications on LS tasks.

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2311.01775

Country: Europe > United Kingdom (0.67)

Genre: Research Report (0.50)

Industry:

Information Technology > Security & Privacy (0.67)
Government > Regional Government > Europe Government > United Kingdom Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Improving Adversarial Robustness for Free with Snapshot Ensemble

Wang, Yihao

arXiv.org Artificial IntelligenceOct-6-2021

Adversarial training, as one of the few certified defenses against adversarial attacks, can be quite complicated and time-consuming, while the results might not be robust enough. To address the issue of lack of robustness, ensemble methods were proposed, aiming to get the final output by weighting the selected results from repeatedly trained processes. It is proved to be very useful in achieving robust and accurate results, but the computational and memory costs are even higher. Snapshot ensemble, a new ensemble method that combines several local minima in a single training process to make the final prediction, was proposed recently, which reduces the time spent on training multiple networks and the memory to store the results. Based on the snapshot ensemble, we present a new method that is easier to implement: unlike original snapshot ensemble that seeks for local minima, our snapshot ensemble focuses on the last few iterations of a training and stores the sets of parameters from them. Our algorithm is much simpler but the results are no less accurate than the original ones: based on different hyperparameters and datasets, our snapshot ensemble has shown a 5% to 30% increase in accuracy when compared to the traditional adversarial training.

artificial intelligence, machine learning, neural network, (10 more...)

arXiv.org Artificial Intelligence

2110.03124

Country: North America > United States > Illinois (0.14)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback