A Classification System Approach in Predicting Chinese Censorship

Feb-6-2025–arXiv.org Artificial Intelligence

This paper uses a classifier to predict whether a Weibo post would be censored on the Chinese internet. Through randomized sampling from \citeauthor{Fu2021} and Chinese tokenization strategies, we constructed a cleaned Chinese phrase dataset with binary censorship labels. Applying various probability-based information retrieval methods to the data, we derived four logistic regression models for classification. We also experimented with pre-trained transformers on similar classification tasks. After evaluating both macro-F1 and ROC-AUC, we concluded that the fine-tuned BERT model outperforms the other strategies.
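For readers unfamiliar with the two evaluation metrics named in the abstract, the following is a minimal sketch of how macro-F1 and ROC-AUC can be computed for a binary censored/not-censored task. It is not the paper's code: the function names `macro_f1` and `roc_auc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration.

```python
def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def roc_auc(y_true, y_score):
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = censored, 0 = not censored.
y_true = [1, 0, 1, 0, 0, 1]
y_score = [0.9, 0.2, 0.4, 0.6, 0.1, 0.8]           # assumed model probabilities
y_pred = [1 if s >= 0.5 else 0 for s in y_score]   # assumed 0.5 threshold

print(f"macro-F1: {macro_f1(y_true, y_pred):.3f}")
print(f"ROC-AUC:  {roc_auc(y_true, y_score):.3f}")
```

Macro-F1 weights both classes equally, which matters when censored posts are a minority of the data, while ROC-AUC is threshold-free and measures how well the scores rank censored posts above uncensored ones.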

information retrieval, machine learning, natural language


Country:
- North America > United States
  - New York > New York County > New York City (0.04)
- Asia > China
  - Beijing > Beijing (0.04)

Genre:
- Research Report
  - New Finding (0.50)
  - Experimental Study (0.36)

Industry:
- Law > Civil Rights & Constitutional Law (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Information Retrieval (0.69)
  - Machine Learning > Statistical Learning
    - Regression (0.71)
