Inner Mongolia


MSCMHMST: A traffic flow prediction model based on Transformer

arXiv.org Artificial Intelligence

This study proposes a Transformer-based hybrid model, named MSCMHMST, aimed at addressing key challenges in traffic flow prediction. Traditional single-method approaches show limitations in traffic prediction tasks, whereas hybrid methods, by integrating the strengths of different models, can provide more accurate and robust predictions. The MSCMHMST model introduces a multi-head, multi-scale attention mechanism, allowing the model to process different parts of the data in parallel and learn intrinsic representations from multiple perspectives, thereby enhancing its ability to handle complex situations. This mechanism enables the model to capture features at various scales effectively, understanding both short-term changes and long-term trends. In experiments on the PeMS04 and PeMS08 datasets, the MSCMHMST model demonstrated excellent robustness and accuracy in long-, medium-, and short-term traffic flow prediction. The results indicate that this model has significant potential, offering a new and effective solution for the field of traffic flow prediction.
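The abstract's core idea of examining a traffic series at several temporal scales at once can be illustrated with a minimal sketch. This is not the paper's attention mechanism; it is a toy example (function names `moving_average` and `multi_scale_views` are hypothetical) showing how smoothing a flow series at different window sizes separates short-term fluctuations from longer trends:

```python
def moving_average(series, window):
    """Average over a trailing window; shorter prefixes use what is available."""
    out = []
    for i in range(len(series)):
        lo = max(0, i - window + 1)
        chunk = series[lo:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def multi_scale_views(series, scales=(1, 4, 12)):
    """Return one smoothed view of the series per scale.

    Scale 1 keeps short-term changes intact; larger scales expose longer
    trends, mirroring the idea of processing the data at several scales.
    """
    return {s: moving_average(series, s) for s in scales}
```

An actual multi-scale attention head would attend over such views rather than simply averaging, but the decomposition intuition is the same.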


Perception-Guided EEG Analysis: A Deep Learning Approach Inspired by Level of Detail (LOD) Theory

arXiv.org Artificial Intelligence

Objective: This study aims to explore a novel deep learning approach for analyzing electroencephalogram (EEG) data and guiding human perceptual states, inspired by the Level of Detail (LOD) theory. The core objective is to improve the accuracy of identifying perceptual states from EEG signals and to provide new avenues for personalized psychological therapy. Methods: The research employs portable EEG devices to collect data, combined with music rhythm signals for analysis. We introduce the LOD theory to dynamically adjust the processing levels of EEG signals, extracting core features related to perception. The software system is developed using the Unity engine, integrating audio materials and MIDI structures, and enabling the integration of EEG data with Unity. The deep learning model includes a Convolutional Neural Network (CNN) for feature extraction and classification, and a Deep Q-Network (DQN) for reinforcement learning to optimize music rhythm adjustment strategies. Results: The CNN model achieved 94.05% accuracy in the perceptual state classification task, demonstrating excellent classification performance. The DQN model successfully guided subjects' EEG signals to the target perceptual state with a 92.45% success rate on the validation set, requiring an average of 13.2 rhythm cycles to complete the state guidance. However, subjective feedback indicated that only approximately 50% of participants experienced psychological sensations corresponding to the target state during the rhythm adjustment process, suggesting room for improvement in the system's effectiveness.


Towards Expressive Video Dubbing with Multiscale Multimodal Context Interaction

arXiv.org Artificial Intelligence

Automatic Video Dubbing (AVD) generates speech aligned with lip motion and facial emotion from scripts. Recent research focuses on modeling multimodal context to enhance prosody expressiveness but overlooks two key issues: 1) Multiscale prosody expression attributes in the context influence the current sentence's prosody. 2) Prosody cues in context interact with the current sentence, impacting the final prosody expressiveness. To tackle these challenges, we propose M2CI-Dubber, a Multiscale Multimodal Context Interaction scheme for AVD. This scheme includes two shared M2CI encoders to model the multiscale multimodal context and facilitate its deep interaction with the current sentence. By extracting global and local features for each modality in the context, utilizing attention-based mechanisms for aggregation and interaction, and employing an interaction-based graph attention network for fusion, the proposed approach enhances the prosody expressiveness of synthesized speech for the current sentence. Experiments on the Chem dataset show our model outperforms baselines in dubbing expressiveness. The code and demos are available at https://github.com/AI-S2-Lab/M2CI-Dubber.


Quantization of Climate Change Impacts on Renewable Energy Generation Capacity: A Super-Resolution Recurrent Diffusion Model

arXiv.org Artificial Intelligence

Driven by global climate change and the ongoing energy transition, the coupling between power supply capabilities and meteorological factors has become increasingly significant. Over the long term, accurately quantifying the power generation capacity of renewable energy under the influence of climate change is essential for the development of sustainable power systems. However, due to interdisciplinary differences in data requirements, climate data often lacks the hourly resolution needed to capture the short-term variability and uncertainties of renewable energy resources. To address this limitation, a super-resolution recurrent diffusion model (SRDM) has been developed to enhance the temporal resolution of climate data and model the short-term uncertainty. The SRDM incorporates a pre-trained decoder and a denoising network, generating long-term, high-resolution climate data through a recurrent coupling mechanism. The high-resolution climate data is then converted into power values using the mechanism model, enabling the simulation of wind and photovoltaic (PV) power generation capacity on future long-term scales. Case studies were conducted in the Ejina region of Inner Mongolia, China, using fifth-generation reanalysis (ERA5) and Coupled Model Intercomparison Project (CMIP6) data under two climate pathways: SSP126 and SSP585. The results demonstrate that the SRDM outperforms existing generative models in generating super-resolution climate data. For the Ejina region, under a high-emission pathway, the annual utilization hours of wind power are projected to decrease by 2.82 hours/year, while those for PV power are projected to decrease by 0.26 hours/year. Furthermore, the research highlights the estimation biases introduced when low-resolution climate data is used for power conversion.


Distance-Adaptive Quaternion Knowledge Graph Embedding with Bidirectional Rotation

arXiv.org Artificial Intelligence

A quaternion contains one real part and three imaginary parts, providing a more expressive hypercomplex space for learning knowledge graph embeddings. Existing quaternion embedding models measure the plausibility of a triplet through either semantic matching or geometric-distance scoring functions. However, it appears that semantic matching diminishes the separability of entities, while distance scoring weakens the semantics of entities. To address this issue, we propose a novel quaternion knowledge graph embedding model. Our model combines semantic matching with the geometric distance between entities to better measure the plausibility of triplets. Specifically, in quaternion space, we perform a right rotation on the head entity and a reverse rotation on the tail entity to learn rich semantic features. Then, we utilize distance-adaptive translations to learn the geometric distance between entities. Furthermore, we provide mathematical proofs demonstrating that our model can handle complex logical relationships. Extensive experimental results and analyses show our model significantly outperforms previous models on well-known knowledge graph completion benchmark datasets. Our code is available at https://github.com/llqy123/DaBR.
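The building block behind such models, scoring a triplet by rotating an entity with a unit relation quaternion and measuring the distance to the other entity, can be sketched in a few lines. This is a simplified single-rotation toy, not DaBR's bidirectional rotation with distance-adaptive translation; all function names here are illustrative:

```python
import math

def qmul(p, q):
    """Hamilton product of two quaternions (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def qnorm(q):
    return math.sqrt(sum(x * x for x in q))

def normalize(q):
    n = qnorm(q)
    return tuple(x / n for x in q)

def score(head, rel, tail):
    """Toy plausibility score: rotate the head by the unit relation
    quaternion, then take the distance to the tail (lower = more plausible)."""
    rotated = qmul(head, normalize(rel))
    return qnorm(tuple(h - t for h, t in zip(rotated, tail)))
```

Normalizing the relation to unit norm is what makes the multiplication a pure rotation; adding a learned translation on top of the rotated embedding would recover the distance-based flavor the abstract describes.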


MM-Eval: A Hierarchical Benchmark for Modern Mongolian Evaluation in LLMs

arXiv.org Artificial Intelligence

Large language models (LLMs) excel in high-resource languages but face notable challenges in low-resource languages like Mongolian. This paper addresses these challenges by categorizing capabilities into language abilities (syntax and semantics) and cognitive abilities (knowledge and reasoning). To systematically evaluate these areas, we developed MM-Eval, a specialized dataset based on Modern Mongolian Language Textbook I and enriched with WebQSP and MGSM datasets. Preliminary experiments on models including Qwen2-7B-Instruct, GLM4-9b-chat, Llama3.1-8B-Instruct, GPT-4, and DeepseekV2.5 revealed that: 1) all models performed better on syntactic tasks than semantic tasks, highlighting a gap in deeper language understanding; and 2) knowledge tasks showed a moderate decline, suggesting that models can transfer general knowledge from high-resource to low-resource contexts. The release of MM-Eval, comprising 569 syntax, 677 semantics, 344 knowledge, and 250 reasoning tasks, offers valuable insights for advancing NLP and LLMs in low-resource languages like Mongolian. The dataset is available at https://github.com/joenahm/MM-Eval.


Fully Hyperbolic Rotation for Knowledge Graph Embedding

arXiv.org Artificial Intelligence

Hyperbolic rotation is commonly used to effectively model knowledge graphs and their inherent hierarchies. However, existing hyperbolic rotation models rely on logarithmic and exponential mappings for feature transformation. These models only project data features into hyperbolic space for rotation, limiting their ability to fully exploit the hyperbolic space. To address this problem, we propose a novel fully hyperbolic model designed for knowledge graph embedding. Instead of feature mappings, we define the model directly in hyperbolic space with the Lorentz model. Our model treats each relation in a knowledge graph as a Lorentz rotation from the head entity to the tail entity. We adopt the Lorentzian distance as the scoring function for measuring the plausibility of triplets. Extensive results on standard knowledge graph completion benchmarks demonstrate that our model achieves competitive results with fewer parameters. In addition, our model achieves state-of-the-art performance on the CoDEx-s and CoDEx-m datasets, which are more diverse and challenging than earlier benchmarks. Our code is available at https://github.com/llqy123/FHRE.
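The Lorentz model the abstract refers to represents points on a hyperboloid and measures distance via the Lorentzian inner product. A minimal sketch of that distance (standard textbook formulas; the `lift` helper and curvature handling are illustrative, not the paper's exact parameterization):

```python
import math

def lorentz_inner(x, y):
    """Lorentzian inner product: <x, y>_L = -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def lift(v, curvature=1.0):
    """Lift a Euclidean point v onto the hyperboloid x0 = sqrt(1/c + |v|^2)."""
    x0 = math.sqrt(1.0 / curvature + sum(a * a for a in v))
    return (x0,) + tuple(v)

def lorentz_distance(x, y, curvature=1.0):
    """Geodesic distance: d(x, y) = arcosh(-c * <x, y>_L) / sqrt(c)."""
    arg = max(1.0, -curvature * lorentz_inner(x, y))  # clamp for float safety
    return math.acosh(arg) / math.sqrt(curvature)
```

In a scoring function of the kind described, the head entity would first be transformed by a relation-specific Lorentz rotation and this distance to the tail entity would then rank candidate triplets.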


Triple Point Masking

arXiv.org Artificial Intelligence

Existing 3D mask learning methods encounter performance bottlenecks under limited data, and our objective is to overcome this limitation. In this paper, we introduce a triple point masking scheme, named TPM, which serves as a scalable framework for pre-training masked autoencoders to achieve multi-mask learning for 3D point clouds. Specifically, we augment the baselines with two additional mask choices (i.e., medium mask and low mask), as our core insight is that the recovery process of an object can manifest in diverse ways. Previous high-masking schemes focus on capturing the global representation but lack fine-grained recovery capability, so the generated pre-trained weights tend to play a limited role in the fine-tuning process. With the support of the proposed TPM, existing methods can exhibit more flexible and accurate completion capabilities, enabling the autoencoder in the pre-training stage to consider multiple representations of a single 3D object. In addition, an SVM-guided weight selection module is proposed to initialize the encoder parameters of downstream networks with the optimal weights during the fine-tuning stage, maximizing linear accuracy and facilitating the acquisition of intricate representations for new objects. Extensive experiments show that the four baselines equipped with the proposed TPM achieve comprehensive performance improvements on various downstream tasks. Our code and models are available at https://github.com/liujia99/TPM.
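The high/medium/low masking idea reduces to generating several masked views of the same point cloud at different ratios. A minimal sketch (the specific ratios and helper names below are hypothetical, not taken from the paper):

```python
import random

def mask_points(points, ratio, rng):
    """Randomly hide `ratio` of the points; return (visible, masked_indices)."""
    n = len(points)
    k = int(n * ratio)
    masked = set(rng.sample(range(n), k))
    visible = [p for i, p in enumerate(points) if i not in masked]
    return visible, masked

def triple_masks(points, ratios=(0.9, 0.6, 0.3), seed=0):
    """One masked view per ratio: a high ratio stresses global structure,
    while lower ratios leave more local detail for fine-grained recovery."""
    rng = random.Random(seed)
    return {r: mask_points(points, r, rng) for r in ratios}
```

During pre-training, the autoencoder would be asked to reconstruct the hidden points from each view, so one object yields reconstruction targets at several difficulty levels.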


China's Chang'e 6 returns with first rocks from far side of the moon

New Scientist

The Chang'e 6 probe being retrieved in Siziwang Banner in Inner Mongolia, China. China's Chang'e 6 spacecraft has returned to Earth, bringing back the first chunks of space rock from the far side of the moon. The capsule touched down in Siziwang Banner in Inner Mongolia, China, on 25 June, after separating from an orbiting container 5000 kilometres above the Atlantic Ocean at about 1:20pm local time. The sample, which should contain around 2 kilograms of material from the moon, then floated down for the last 10 kilometres using parachutes. It landed at 2:07pm before being collected by scientists from the China National Space Administration. The difficulty of landing on the moon's far side, which permanently faces away from Earth and so has no direct communications link, had meant that the region's surface was unexplored until the Chinese spacecraft landed at the start of the month.


Chinese probe returns from far side of the moon

Al Jazeera

A Chinese space probe has returned to Earth carrying samples from the far side of the moon. The Chang'e-6 lunar probe landed in China's northern region of Inner Mongolia on Tuesday. The successful end of the nearly two-month-long mission is a boost for China, which is the first country to bring samples back from the hemisphere of the moon that always faces away from Earth. A livestream carried by state broadcaster CCTV showed the module touching down under a parachute. The China National Space Administration (CNSA) described the mission as "a complete success" and said the probe is functioning normally.