Bao, Feilong
Retrieval-Augmented Dialogue Knowledge Aggregation for Expressive Conversational Speech Synthesis
Liu, Rui, Jia, Zhenqi, Bao, Feilong, Li, Haizhou
Conversational speech synthesis (CSS) aims to take the current dialogue (CD) history as a reference to synthesize expressive speech that aligns with the conversational style. Unlike CD, stored dialogue (SD) contains preserved dialogue fragments from earlier stages of user-agent interaction, which include style expression knowledge relevant to scenarios similar to those in CD. This knowledge plays a significant role in enabling the agent to synthesize expressive conversational speech that generates empathetic feedback; however, prior research has overlooked it. To address this issue, we propose a novel Retrieval-Augmented Dialogue Knowledge Aggregation scheme for expressive CSS, termed RADKA-CSS, which includes three main components: 1) To effectively retrieve dialogues from SD that are similar to CD in terms of both semantics and style, we first build a stored dialogue semantic-style database (SDSSD) that includes text and audio samples, and then design a multi-attribute retrieval scheme that matches the dialogue semantic and style vectors of the CD against the stored dialogue semantic and style vectors in the SDSSD, retrieving the most similar dialogues. 2) To effectively utilize the style knowledge from CD and SD, we adopt a multi-granularity graph structure to encode the dialogue and introduce a multi-source style knowledge aggregation mechanism. 3) Finally, the aggregated style knowledge is fed into the speech synthesizer to help the agent synthesize expressive speech that aligns with the conversational style. We conducted comprehensive and in-depth experiments on the DailyTalk dataset, a benchmark dataset for the CSS task. Both objective and subjective evaluations demonstrate that RADKA-CSS outperforms baseline models in expressiveness rendering. Code and audio samples can be found at: https://github.com/Coder-jzq/RADKA-CSS.
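The multi-attribute retrieval step lends itself to a short illustration. Below is a minimal sketch, not the authors' implementation: it scores each stored dialogue by cosine similarity to the current dialogue's semantic and style vectors and keeps the top-k matches; the equal weighting of the two similarities and all function names are assumptions for illustration.

```python
# Hypothetical sketch of multi-attribute retrieval over a stored-dialogue
# database: combine semantic and style similarities, then take the top-k.
import torch
import torch.nn.functional as F

def retrieve_similar_dialogues(cd_sem, cd_style, sd_sem, sd_style, k=3):
    """cd_*: (d,) vectors for the current dialogue.
    sd_*: (N, d) matrices for N stored dialogues."""
    sem_sim = F.cosine_similarity(cd_sem.unsqueeze(0), sd_sem, dim=-1)      # (N,)
    style_sim = F.cosine_similarity(cd_style.unsqueeze(0), sd_style, dim=-1)
    score = 0.5 * sem_sim + 0.5 * style_sim  # equal weighting is an assumption
    return torch.topk(score, k=min(k, score.numel())).indices

# Toy usage with random vectors standing in for encoder outputs.
N, d = 100, 256
idx = retrieve_similar_dialogues(torch.randn(d), torch.randn(d),
                                 torch.randn(N, d), torch.randn(N, d))
```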
Distance-Adaptive Quaternion Knowledge Graph Embedding with Bidirectional Rotation
Wang, Weihua, Liang, Qiuyu, Bao, Feilong, Gao, Guanglai
A quaternion contains one real part and three imaginary parts, providing a more expressive hypercomplex space for learning knowledge graph embeddings. Existing quaternion embedding models measure the plausibility of a triplet through either semantic matching or geometric distance scoring functions. However, semantic matching appears to diminish the separability of entities, while distance scoring weakens the semantics of entities. To address this issue, we propose a novel quaternion knowledge graph embedding model. Our model combines semantic matching with entities' geometric distance to better measure the plausibility of triplets. Specifically, in the quaternion space, we perform a right rotation on the head entity and a reverse rotation on the tail entity to learn rich semantic features. Then, we utilize distance-adaptive translations to learn the geometric distance between entities. Furthermore, we provide mathematical proofs to demonstrate that our model can handle complex logical relationships. Extensive experimental results and analyses show that our model significantly outperforms previous models on well-known knowledge graph completion benchmark datasets. Our code is available at https://github.com/llqy123/DaBR.
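The bidirectional rotation can be illustrated with the Hamilton product. The sketch below is a hedged approximation, not DaBR itself: it applies a right rotation to the head entity and, as an assumption, a conjugate-based reverse rotation to the tail entity, then scores by negative distance; the distance-adaptive translation is omitted.

```python
# A minimal quaternion-rotation sketch for KG embedding. The combined
# scoring function shown here is an assumption, not the paper's definition.
import torch

def hamilton_product(q, p):
    """Hamilton product of quaternions stored as (..., 4) tensors."""
    a1, b1, c1, d1 = q.unbind(-1)
    a2, b2, c2, d2 = p.unbind(-1)
    return torch.stack([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    ], dim=-1)

def conjugate(q):
    return q * torch.tensor([1.0, -1.0, -1.0, -1.0])

def score(h, r, t):
    r = r / r.norm(dim=-1, keepdim=True)        # unit quaternion => pure rotation
    h_rot = hamilton_product(h, r)              # right rotation of the head
    t_rot = hamilton_product(t, conjugate(r))   # assumed reverse rotation of the tail
    return -(h_rot - t_rot).norm(dim=-1)        # higher = more plausible (assumed)

h, r, t = torch.randn(3, 4), torch.randn(3, 4), torch.randn(3, 4)
print(score(h, r, t))
```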
Unifying Dual-Space Embedding for Entity Alignment via Contrastive Learning
Wang, Cunda, Wang, Weihua, Liang, Qiuyu, Bao, Feilong, Gao, Guanglai
Entity alignment aims to match identical entities across different knowledge graphs (KGs). Graph neural network-based entity alignment methods have achieved promising results in Euclidean space. However, KGs often contain complex structures, both local and hierarchical, which makes it challenging to represent them efficiently within a single space. In this paper, we propose a novel method, UniEA, which unifies dual-space embedding to preserve the intrinsic structure of KGs. Specifically, we learn graph structure embeddings in Euclidean and hyperbolic spaces simultaneously to maximize the consistency between the embeddings in the two spaces. Moreover, we employ contrastive learning to mitigate the misalignment caused by similar entities, where the embeddings of similar neighboring entities within a KG become too close in distance. Extensive experiments on benchmark datasets demonstrate that our method achieves state-of-the-art performance in structure-based entity alignment. Our code is available at https://github.com/wonderCS1213/UniEA.
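The contrastive component admits a compact illustration. The following is a standard InfoNCE-style sketch under the assumption that pre-aligned entity pairs serve as positives and the other in-batch entities as negatives; UniEA's exact loss and hyperparameters may differ.

```python
# Hedged sketch of a contrastive alignment objective: aligned pairs sit on
# the diagonal of the similarity matrix and are pulled together, while
# off-diagonal (in-batch) entities are pushed apart.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(z1, z2, temperature=0.1):
    """z1, z2: (B, d) embeddings of the same B entities from two KGs/spaces."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature          # (B, B) similarity matrix
    labels = torch.arange(z1.size(0))           # positives on the diagonal
    return F.cross_entropy(logits, labels)

loss = contrastive_alignment_loss(torch.randn(8, 64), torch.randn(8, 64))
```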
Fully Hyperbolic Rotation for Knowledge Graph Embedding
Liang, Qiuyu, Wang, Weihua, Bao, Feilong, Gao, Guanglai
Hyperbolic rotation is commonly used to model knowledge graphs and their inherent hierarchies effectively. However, existing hyperbolic rotation models rely on logarithmic and exponential mappings for feature transformation: they only project data features into hyperbolic space for rotation, limiting their ability to fully exploit the hyperbolic space. To address this problem, we propose a novel fully hyperbolic model for knowledge graph embedding. Instead of using feature mappings, we define the model directly in hyperbolic space with the Lorentz model. Our model treats each relation in a knowledge graph as a Lorentz rotation from the head entity to the tail entity, and adopts the Lorentzian distance as the scoring function for measuring the plausibility of triplets. Extensive results on standard knowledge graph completion benchmarks demonstrate that our model achieves competitive results with fewer parameters. In addition, our model achieves state-of-the-art performance on the CoDEx-s and CoDEx-m datasets, which are more diverse and challenging than earlier benchmarks. Our code is available at https://github.com/llqy123/FHRE.
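For readers unfamiliar with the Lorentz model, here is a minimal numerical sketch of the hyperboloid and the squared Lorentzian distance at curvature -1; it does not reproduce FHRE's Lorentz rotation parameterization, and the function names are illustrative only.

```python
# Sketch of the Lorentz (hyperboloid) model: lift Euclidean features onto
# the hyperboloid, then compute the squared Lorentzian distance.
import torch

def lorentz_inner(x, y):
    """Lorentzian inner product <x, y>_L = -x0*y0 + <x_rest, y_rest>."""
    return -x[..., 0] * y[..., 0] + (x[..., 1:] * y[..., 1:]).sum(-1)

def to_hyperboloid(v):
    """Lift spatial coordinates v onto the hyperboloid <x, x>_L = -1."""
    x0 = torch.sqrt(1.0 + (v * v).sum(-1, keepdim=True))
    return torch.cat([x0, v], dim=-1)

def lorentzian_sq_distance(x, y):
    """Squared Lorentzian distance ||x - y||_L^2 = -2 - 2<x, y>_L."""
    return -2.0 - 2.0 * lorentz_inner(x, y)

x = to_hyperboloid(torch.randn(5, 3))
y = to_hyperboloid(torch.randn(5, 3))
print(lorentzian_sq_distance(x, y))  # nonnegative for points on the hyperboloid
```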
L^2GC: Lorentzian Linear Graph Convolutional Networks for Node Classification
Liang, Qiuyu, Wang, Weihua, Bao, Feilong, Gao, Guanglai
Linear Graph Convolutional Networks (GCNs) are used to classify nodes in graph data. However, most existing linear GCN models perform neural network operations in Euclidean space, which does not explicitly capture the tree-like hierarchical structure exhibited in real-world datasets modeled as graphs. In this paper, we introduce hyperbolic space into linear GCNs and propose a novel framework for a Lorentzian linear GCN. Specifically, we map the learned features of graph nodes into hyperbolic space and then perform a Lorentzian linear feature transformation to capture the underlying tree-like structure of the data. Experimental results on standard citation network datasets with semi-supervised learning show that our approach yields new state-of-the-art accuracy of 74.7$\%$ on the Citeseer and 81.3$\%$ on the PubMed dataset. Furthermore, we observe that our approach can be trained up to two orders of magnitude faster than other nonlinear GCN models on the PubMed dataset. Our code is publicly available at https://github.com/llqy123/LLGC-master.
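The linear-GCN pipeline can be sketched briefly. The code below assumes SGC-style parameter-free propagation followed by lifting the smoothed features onto the hyperboloid; L^2GC's actual Lorentzian linear transformation is more involved, so treat this as a shape-level illustration only.

```python
# Hedged sketch: K steps of linear neighborhood smoothing, then a lift of
# the features into the Lorentz model of hyperbolic space.
import torch

def propagate(adj_norm, X, K=2):
    """K steps of normalized-adjacency smoothing (no nonlinearity)."""
    for _ in range(K):
        X = adj_norm @ X
    return X

def lift_to_lorentz(v):
    """Place features on the hyperboloid <x, x>_L = -1 (curvature -1)."""
    x0 = torch.sqrt(1.0 + (v * v).sum(-1, keepdim=True))
    return torch.cat([x0, v], dim=-1)

N, d = 6, 4
A = torch.rand(N, N); A = (A + A.t()) / 2          # toy symmetric adjacency
D_inv_sqrt = torch.diag(A.sum(-1).rsqrt())          # D^{-1/2}
H = lift_to_lorentz(propagate(D_inv_sqrt @ A @ D_inv_sqrt, torch.randn(N, d)))
# A linear classifier on H (e.g. via Lorentzian distance to class centroids)
# would complete the node-classification head.
```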
MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset
Liang, Kailin, Liu, Bin, Hu, Yifan, Liu, Rui, Bao, Feilong, Gao, Guanglai
Text-to-Speech (TTS) synthesis for low-resource languages is currently an attractive research topic in both academia and industry. Mongolian is the official language of the Inner Mongolia Autonomous Region and a representative low-resource language spoken by over 10 million people worldwide. However, open-source datasets for Mongolian TTS are relatively scarce. Therefore, we release an open-source multi-speaker Mongolian TTS dataset, named MnTTS2, for the benefit of related researchers. In this work, we prepare transcriptions covering various topics and invite three professional Mongolian announcers to form a three-speaker TTS dataset, in which each announcer records 10 hours of Mongolian speech, resulting in 30 hours in total. Furthermore, we build a baseline system based on the state-of-the-art FastSpeech2 model and the HiFi-GAN vocoder. The experimental results suggest that the constructed MnTTS2 dataset is sufficient for building robust multi-speaker TTS models for real-world applications. The MnTTS2 dataset, training recipe, and pretrained models are released at: https://github.
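The baseline described here is a standard two-stage neural TTS pipeline, sketched below with hypothetical model handles (acoustic_model, vocoder); the released recipe's actual interfaces may differ.

```python
# Schematic two-stage synthesis: FastSpeech2-style acoustic model predicts a
# mel-spectrogram, and a HiFi-GAN-style vocoder renders the waveform.
# Both callables are hypothetical stand-ins, not the released recipe's API.
import torch

def synthesize(text_ids: torch.LongTensor, acoustic_model, vocoder):
    """text_ids: (1, T) phoneme/character ids for one Mongolian utterance."""
    with torch.no_grad():
        mel = acoustic_model(text_ids)   # (1, n_mels, frames) mel-spectrogram
        wav = vocoder(mel)               # (1, samples) synthesized waveform
    return wav
```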