Chen, Zilong
NatureLM: Deciphering the Language of Nature for Scientific Discovery
Xia, Yingce, Jin, Peiran, Xie, Shufang, He, Liang, Cao, Chuan, Luo, Renqian, Liu, Guoqing, Wang, Yue, Liu, Zequn, Chen, Yuan-Jyue, Guo, Zekun, Bai, Yeqi, Deng, Pan, Min, Yaosen, Lu, Ziheng, Hao, Hongxia, Yang, Han, Li, Jielan, Liu, Chang, Zhang, Jia, Zhu, Jianwei, Wu, Kehan, Zhang, Wei, Gao, Kaiyuan, Pei, Qizhi, Wang, Qian, Liu, Xixian, Li, Yanting, Zhu, Houtian, Lu, Yeqing, Ma, Mingqian, Wang, Zun, Xie, Tian, Maziarz, Krzysztof, Segler, Marwin, Yang, Zhao, Chen, Zilong, Shi, Yu, Zheng, Shuxin, Wu, Lijun, Hu, Chen, Dai, Peggy, Liu, Tie-Yan, Liu, Haiguang, Qin, Tao
Foundation models have revolutionized natural language processing and artificial intelligence, significantly enhancing how machines comprehend and generate human languages. Inspired by the success of these foundation models, researchers have developed foundation models for individual scientific domains, including small molecules, materials, proteins, DNA, and RNA. However, these models are typically trained in isolation, lacking the ability to integrate across different scientific domains. Recognizing that entities within these domains can all be represented as sequences, which together form the "language of nature", we introduce the Nature Language Model (NatureLM for short), a sequence-based science foundation model designed for scientific discovery. Pre-trained with data from multiple scientific domains, NatureLM offers a unified, versatile model that enables various applications, including: (i) generating and optimizing small molecules, proteins, RNA, and materials using text instructions; (ii) cross-domain generation/design, such as protein-to-molecule and protein-to-RNA generation; and (iii) achieving state-of-the-art performance in tasks like SMILES-to-IUPAC translation and retrosynthesis on USPTO-50k. NatureLM is a promising generalist approach for various scientific tasks, including drug discovery (hit generation/optimization, ADMET optimization, synthesis), novel material design, and the development of therapeutic proteins or nucleotides. We have developed NatureLM models in different sizes (1 billion, 8 billion, and 46.7 billion parameters) and observed a clear improvement in performance as model size increases.
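To make the "language of nature" idea concrete, here is a minimal, hypothetical sketch of how entities from different domains could be serialized into a single instruction-conditioned token stream for a causal language model. The tag names and prompt format below are illustrative assumptions, not NatureLM's actual scheme.

```python
# Hypothetical sketch: entities from different scientific domains
# serialized into one sequence that a causal LM can be trained on.
# Domain tags and formatting are assumptions for illustration only.

DOMAIN_TAGS = {
    "molecule": ("<mol>", "</mol>"),          # SMILES strings
    "protein": ("<protein>", "</protein>"),   # amino-acid sequences
    "rna": ("<rna>", "</rna>"),               # nucleotide sequences
}

def serialize(instruction: str, domain: str, sequence: str) -> str:
    """Prepend a text instruction and wrap the domain entity in tags,
    producing a single training sequence for a unified model."""
    open_tag, close_tag = DOMAIN_TAGS[domain]
    return f"{instruction} {open_tag}{sequence}{close_tag}"

# Example: a text-instructed molecule-generation training sample.
sample = serialize(
    "Generate a small molecule with improved solubility.",
    "molecule",
    "CC(=O)OC1=CC=CC=C1C(=O)O",  # aspirin, as SMILES
)
print(sample)
```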
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
Sun, Wenqiang, Chen, Shuo, Liu, Fangfu, Chen, Zilong, Duan, Yueqi, Zhang, Jun, Wang, Yikai
In this paper, we introduce DimensionX, a framework designed to generate photorealistic 3D and 4D scenes from just a single image with video diffusion. Our approach begins with the insight that both the spatial structure of a 3D scene and the temporal evolution of a 4D scene can be effectively represented through sequences of video frames. While recent video diffusion models have shown remarkable success in producing vivid visuals, they face limitations in directly recovering 3D/4D scenes due to limited spatial and temporal controllability during generation. To overcome this, we propose ST-Director, which decouples spatial and temporal factors in video diffusion by learning dimension-aware LoRAs from dimension-variant data. This controllable video diffusion approach enables precise manipulation of spatial structure and temporal dynamics, allowing us to reconstruct both 3D and 4D representations from sequential frames with the combination of spatial and temporal dimensions. Additionally, to bridge the gap between generated videos and real-world scenes, we introduce a trajectory-aware mechanism for 3D generation and an identity-preserving denoising strategy for 4D generation. Extensive experiments on various real-world and synthetic datasets demonstrate that DimensionX achieves superior results in controllable video generation, as well as in 3D and 4D scene generation, compared with previous methods.
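The following is a minimal sketch of the "dimension-aware LoRA" idea behind ST-Director: one frozen base projection plus two low-rank adapters, one intended for spatial-variant data and one for temporal-variant data. The module layout and switching interface are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DimensionAwareLoRA(nn.Module):
    """Frozen base layer with selectable spatial/temporal LoRA adapters."""

    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        self.base.requires_grad_(False)  # pretrained weights stay frozen
        # Separate low-rank adapters for the spatial and temporal factors.
        self.lora = nn.ModuleDict({
            d: nn.Sequential(nn.Linear(dim, rank, bias=False),
                             nn.Linear(rank, dim, bias=False))
            for d in ("spatial", "temporal")
        })

    def forward(self, x: torch.Tensor, director: str) -> torch.Tensor:
        # Choosing a director steers generation along one dimension
        # while the other is held fixed.
        return self.base(x) + self.lora[director](x)

layer = DimensionAwareLoRA(dim=64)
frames = torch.randn(2, 16, 64)          # (batch, tokens, channels)
out = layer(frames, director="spatial")  # spatial-only control
print(out.shape)                         # torch.Size([2, 16, 64])
```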
Freeplane: Unlocking Free Lunch in Triplane-Based Sparse-View Reconstruction Models
Sun, Wenqiang, Wang, Zhengyi, Chen, Shuo, Wang, Yikai, Chen, Zilong, Zhu, Jun, Zhang, Jun
Creating 3D assets from single-view images is a complex task that demands a deep understanding of the world. Recently, feed-forward 3D generative models have made significant progress by training large reconstruction models on extensive 3D datasets, with triplanes being the preferred 3D geometry representation. However, effectively utilizing the geometric priors of triplanes while minimizing the artifacts caused by generated, inconsistent multi-view images remains a challenge. In this work, we present Frequency-modulated triplane (Freeplane), a simple yet effective method to improve the generation quality of feed-forward models without additional training. We first analyze the role of triplanes in feed-forward methods and find that inconsistent multi-view images introduce high-frequency artifacts on triplanes, leading to low-quality 3D meshes. Based on this observation, we propose strategically filtering triplane features and combining the triplanes before and after filtering to produce high-quality textured meshes. These techniques incur no additional cost and can be seamlessly integrated into pre-trained feed-forward models to enhance their robustness against the inconsistency of generated multi-view images. Both qualitative and quantitative results demonstrate that our method improves the performance of feed-forward models by simply modulating triplanes: all you need to do is modulate the triplanes during inference.
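A minimal sketch of the training-free modulation step: low-pass filter each triplane feature map to suppress high-frequency artifacts from inconsistent views, then blend the filtered and unfiltered planes. The filter choice (a box filter here), kernel size, and blend weight are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def box_blur(plane: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Depthwise box (low-pass) filter over a (C, H, W) triplane map."""
    c = plane.shape[0]
    kernel = torch.full((c, 1, k, k), 1.0 / (k * k))
    return F.conv2d(plane.unsqueeze(0), kernel, padding=k // 2, groups=c)[0]

def modulate_triplane(planes, alpha: float = 0.5):
    """Blend each plane with its low-pass-filtered version, keeping
    geometric priors while damping high-frequency artifacts."""
    return [alpha * box_blur(p) + (1 - alpha) * p for p in planes]

# Three feature planes (XY, XZ, YZ), as produced by a feed-forward model.
triplane = [torch.randn(32, 64, 64) for _ in range(3)]
smoothed = modulate_triplane(triplane)
print(smoothed[0].shape)  # torch.Size([32, 64, 64])
```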
On the Utility of External Agent Intention Predictor for Human-AI Coordination
Wang, Chenxu, Chen, Zilong, Cangelosi, Angelo, Liu, Huaping
Reaching a consensus on team plans is vital to human-AI coordination. Although previous studies provide approaches to this through various forms of communication, coordination can still be difficult when the AI has no explainable plan to communicate. To bridge this gap, we suggest incorporating external models to assist humans in understanding the intentions of AI agents. In this paper, we propose a two-stage paradigm that first trains a Theory of Mind (ToM) model on offline trajectories collected from the target agent, and then utilizes the model during human-AI collaboration by displaying the target agent's predicted future actions in real time. This paradigm treats the AI agent as a black box and can therefore be applied to improve any agent. To test our paradigm, we implement a transformer-based predictor as the ToM model and develop an extended online human-AI collaboration platform for experiments. Comprehensive experimental results verify that human-AI teams achieve better performance with the help of our model. A user assessment accompanying the experiment further demonstrates that our paradigm significantly enhances humans' situational awareness. Our study shows the potential of augmenting human ability via external assistance in human-AI collaboration, which may inspire future research.
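A minimal sketch of such a transformer-based intention predictor: given a window of the target agent's recent observations, it outputs a distribution over the agent's next actions, which can then be displayed to the human. Dimensions and architecture details are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ToMPredictor(nn.Module):
    """Predicts a black-box agent's next action from its past trajectory."""

    def __init__(self, obs_dim: int, n_actions: int, d_model: int = 64):
        super().__init__()
        self.embed = nn.Linear(obs_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, trajectory: torch.Tensor) -> torch.Tensor:
        # trajectory: (batch, timesteps, obs_dim) of past observations.
        h = self.encoder(self.embed(trajectory))
        return self.head(h[:, -1])  # logits for the agent's next action

model = ToMPredictor(obs_dim=16, n_actions=6)
traj = torch.randn(1, 20, 16)       # 20 past timesteps of the target agent
probs = model(traj).softmax(-1)     # shown to the human collaborator
print(probs)
```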
BIC: Twitter Bot Detection with Text-Graph Interaction and Semantic Consistency
Lei, Zhenyu, Wan, Herun, Zhang, Wenqian, Feng, Shangbin, Chen, Zilong, Li, Jundong, Zheng, Qinghua, Luo, Minnan
Twitter bots are automatic programs operated by malicious actors to manipulate public opinion and spread misinformation. Research efforts have been made to automatically identify bots based on texts and networks on social media. Existing methods leverage texts or networks alone, and while a few works have explored shallow combinations of the two modalities, we hypothesize that the interaction and information exchange between texts and graphs could be crucial for holistically evaluating bot activities on social media. In addition, according to a recent survey (Cresci, 2020), Twitter bots are constantly evolving: advanced bots steal genuine users' tweets and dilute their malicious content to evade detection. This results in greater inconsistency across the timelines of novel Twitter bots, which warrants more attention. In light of these challenges, we propose BIC, a Twitter Bot detection framework with text-graph Interaction and semantic Consistency. Specifically, in addition to separately modeling the two modalities on social media, BIC employs a text-graph interaction module to enable information exchange across modalities in the learning process. In addition, given the stealing behavior of novel Twitter bots, BIC models semantic consistency in tweets based on attention weights and uses it to augment the decision process. Extensive experiments demonstrate that BIC consistently outperforms state-of-the-art baselines on two widely adopted datasets. Further analyses reveal that text-graph interaction and semantic consistency modeling are essential improvements that help combat bot evolution.
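A minimal sketch of a text-graph interaction step in the spirit of BIC: bidirectional cross-attention lets the text and graph modalities exchange information during learning rather than being fused only at the end. The module layout is an illustrative assumption, not BIC's exact architecture.

```python
import torch
import torch.nn as nn

class TextGraphInteraction(nn.Module):
    """Cross-modal information exchange between text and graph features."""

    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.text_from_graph = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.graph_from_text = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, text: torch.Tensor, graph: torch.Tensor):
        # Each modality attends over the other; residual paths preserve
        # the modality's own representation.
        t, _ = self.text_from_graph(text, graph, graph)
        g, _ = self.graph_from_text(graph, text, text)
        return text + t, graph + g

module = TextGraphInteraction()
text_feats = torch.randn(1, 10, 64)   # 10 tweet representations of a user
graph_feats = torch.randn(1, 5, 64)   # 5 neighborhood node embeddings
text_out, graph_out = module(text_feats, graph_feats)
print(text_out.shape, graph_out.shape)
```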
KRACL: Contrastive Learning with Graph Context Modeling for Sparse Knowledge Graph Completion
Tan, Zhaoxuan, Chen, Zilong, Feng, Shangbin, Zhang, Qingyue, Zheng, Qinghua, Li, Jundong, Luo, Minnan
Knowledge Graph Embedding (KGE) methods aim to map entities and relations to low-dimensional spaces and have become the de facto standard for knowledge graph completion. Most existing KGE methods suffer from the sparsity challenge: it is harder to predict entities that appear less frequently in knowledge graphs. In this work, we propose KRACL, a novel framework that alleviates the widespread sparsity in KGs with graph context and contrastive learning. First, we propose the Knowledge Relational Attention Network (KRAT) to leverage graph context by simultaneously projecting neighboring triples into different latent spaces and jointly aggregating messages with an attention mechanism. KRAT captures the subtle semantic information and importance of different context triples and leverages multi-hop information in knowledge graphs. Second, we propose a knowledge contrastive loss that combines contrastive loss with cross-entropy loss, introducing more negative samples and thus enriching the feedback to sparse entities. Our experiments demonstrate that KRACL achieves superior results across various standard knowledge graph benchmarks, especially on WN18RR and NELL-995, which have large numbers of low in-degree entities. Extensive experiments also bear out KRACL's effectiveness in handling sparse knowledge graphs and its robustness against noisy triples.
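A minimal sketch of such a combined objective: an InfoNCE-style contrastive term that treats every other entity as a negative (which, with the true tail as the positive, reduces to temperature-scaled cross-entropy over all entities) plus a standard cross-entropy term. The scoring function and weighting below are illustrative assumptions, not KRACL's exact formulation.

```python
import torch
import torch.nn.functional as F

def knowledge_contrastive_loss(query, entity_emb, target, tau=0.1, lam=0.5):
    """query: (B, d) context embeddings from the encoder;
    entity_emb: (N, d) embeddings of all candidate entities;
    target: (B,) indices of the true tail entities."""
    logits = query @ entity_emb.t()              # score every entity
    ce = F.cross_entropy(logits, target)         # cross-entropy term
    con = F.cross_entropy(logits / tau, target)  # contrastive term over
                                                 # all entities as negatives
    return lam * con + (1 - lam) * ce

B, N, d = 4, 100, 32
loss = knowledge_contrastive_loss(torch.randn(B, d),
                                  torch.randn(N, d),
                                  torch.randint(0, N, (B,)))
print(loss.item())
```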
TwiBot-22: Towards Graph-Based Twitter Bot Detection
Feng, Shangbin, Tan, Zhaoxuan, Wan, Herun, Wang, Ningnan, Chen, Zilong, Zhang, Binchi, Zheng, Qinghua, Zhang, Wenqian, Lei, Zhenyu, Yang, Shujie, Feng, Xinshun, Zhang, Qingyue, Wang, Hongrui, Liu, Yuhan, Bai, Yuyang, Wang, Heng, Cai, Zijian, Wang, Yanbo, Zheng, Lijing, Ma, Zihan, Li, Jundong, Luo, Minnan
Twitter bot detection has become an increasingly important task to combat misinformation, facilitate social media moderation, and preserve the integrity of online discourse. State-of-the-art bot detection methods generally leverage the graph structure of the Twitter network, and they exhibit promising performance when confronting novel Twitter bots that traditional methods fail to detect. However, very few of the existing Twitter bot detection datasets are graph-based, and even these few graph-based datasets suffer from limited scale, incomplete graph structure, and low annotation quality. In fact, the lack of a large-scale graph-based Twitter bot detection benchmark that addresses these issues has seriously hindered the development and evaluation of novel graph-based bot detection approaches. In this paper, we propose TwiBot-22, a comprehensive graph-based Twitter bot detection benchmark that presents the largest dataset to date, provides diversified entities and relations on the Twitter network, and has considerably better annotation quality than existing datasets. In addition, we re-implement 35 representative Twitter bot detection baselines and evaluate them on 9 datasets, including TwiBot-22, to promote a fair comparison of model performance and a holistic understanding of research progress. To facilitate further research, we consolidate all implemented code and datasets into the TwiBot-22 evaluation framework, where researchers can consistently evaluate new models and datasets. The TwiBot-22 Twitter bot detection benchmark and evaluation framework are publicly available at https://twibot22.github.io/.
Knowledge Graph Augmented Political Perspective Detection in News Media
Feng, Shangbin, Chen, Zilong, Li, Qingyao, Luo, Minnan
Identifying political perspective in news media has become an important task due to the rapid growth of political commentary and increasingly polarized ideologies. Previous approaches focus only on leveraging semantic information and leave out the rich social and political context that helps individuals understand political stances. In this paper, we propose a perspective detection method that incorporates external knowledge of real-world politics. Specifically, we construct a contemporary political knowledge graph with 1,071 entities and 10,703 triples. We then build a heterogeneous information network for each news document that jointly models article semantics and external knowledge from the knowledge graph. Finally, we apply gated relational graph convolutional networks and conduct political perspective detection as graph-level classification. Extensive experiments show that our method achieves the best performance, outperforming state-of-the-art methods by 5.49%. Numerous ablation studies further bear out the necessity of external knowledge and the effectiveness of our graph-based approach.
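A minimal sketch of a gated relational graph convolution, the building block applied to the news/knowledge heterogeneous graph: a relation-specific message pass followed by a learned gate deciding how much of the message updates each node. This is a plain dense-PyTorch formulation for illustration, not the paper's exact layer.

```python
import torch
import torch.nn as nn

class GatedRGCNLayer(nn.Module):
    """Relational message passing with a gated node update."""

    def __init__(self, dim: int, n_relations: int):
        super().__init__()
        self.rel_weight = nn.Parameter(torch.randn(n_relations, dim, dim) * 0.1)
        self.self_weight = nn.Linear(dim, dim)
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (nodes, dim); adj: (n_relations, nodes, nodes) adjacency.
        # Aggregate relation-specific messages from neighbors.
        msg = torch.einsum("rij,jd,rde->ie", adj, x, self.rel_weight)
        cand = torch.tanh(msg + self.self_weight(x))
        g = torch.sigmoid(self.gate(torch.cat([x, cand], dim=-1)))
        return g * cand + (1 - g) * x  # gate interpolates update vs. keep

layer = GatedRGCNLayer(dim=16, n_relations=3)
x = torch.randn(8, 16)                        # 8 nodes in the document graph
adj = torch.randint(0, 2, (3, 8, 8)).float()  # 3 relation types
print(layer(x, adj).shape)                    # torch.Size([8, 16])
```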
Encoding Heterogeneous Social and Political Context for Entity Stance Prediction
Feng, Shangbin, Chen, Zilong, Yu, Peisheng, Luo, Minnan
Political stance detection has become an important task due to increasingly polarized political ideologies. Most existing works focus on identifying perspectives in news articles or social media posts, while it is the social entities, such as individuals and organizations, that produce these texts and actually take stances. In this paper, we propose the novel task of entity stance prediction, which aims to predict entities' stances given their social and political context. Specifically, we retrieve facts from Wikipedia about social entities in contemporary U.S. politics. We then annotate social entities' stances towards political ideologies with the help of domain experts. After defining the task of entity stance prediction, we propose a graph-based solution, which constructs a heterogeneous information network from the collected facts and adopts gated relational graph convolutional networks for representation learning. Our model is trained with a combination of supervised, self-supervised, and unsupervised loss functions, motivated by multiple social and political phenomena. We conduct extensive experiments to compare our method with existing text and graph analysis baselines. Our model achieves the highest stance detection accuracy and yields inspiring insights regarding social entity stances. We further conduct ablation studies and parameter analysis to examine the mechanism and effectiveness of our proposed approach.
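A minimal sketch of combining supervised, self-supervised, and unsupervised objectives into one training signal, as described above. The specific auxiliary terms and weights here are illustrative stand-ins, not the paper's socially motivated losses.

```python
import torch
import torch.nn.functional as F

def total_loss(logits, labels, z, z_aug, embeddings, w=(1.0, 0.5, 0.1)):
    """Weighted sum of a supervised stance loss and two auxiliary terms."""
    sup = F.cross_entropy(logits, labels)   # supervised: labeled stances
    # Self-supervised (stand-in): agreement between two views of the
    # same entity's representation.
    ssl = F.mse_loss(z, z_aug)
    # Unsupervised (stand-in): a simple norm regularizer on embeddings.
    unsup = embeddings.pow(2).mean()
    return w[0] * sup + w[1] * ssl + w[2] * unsup

logits = torch.randn(4, 3)            # 4 entities, 3 stance classes
labels = torch.randint(0, 3, (4,))
z, z_aug = torch.randn(4, 16), torch.randn(4, 16)
emb = torch.randn(10, 16)
print(total_loss(logits, labels, z, z_aug, emb).item())
```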