Goto

Collaborating Authors

 Li, Zeqian


Compositional Clustering: Applications to Multi-Label Object Recognition and Speaker Identification

arXiv.org Artificial Intelligence

The goal is not just to partition the data into distinct and coherent groups, but also to infer the compositional relationships among the groups. This scenario arises in speaker diarization (i.e., infer who is speaking when from an audio wave) in the presence of simultaneous speech from multiple speakers [6, 36], which occurs frequently in real-world speech settings: The audio at each time t is generated as a composition of the voices of all the people speaking at time t, and the goal is to cluster the audio samples, over all timesteps, into sets of speakers. Hence, if there are 2 people who sometimes speak by themselves and sometimes speak simultaneously, then the clusters would correspond to the speaker sets {1}, {2}, and {1, 2} - the third cluster is not a third independent speaker, but rather the composition of the first two speakers. An analogous scenario arises in open-world (i.e., test classes are disjoint from training classes) multi-label object recognition when clustering images such that each image may contain multiple objects from a fixed set (e.g., the shapes in Figure 1). In some scenarios, the composition function that specifies how examples are generated from other examples might be as simple as superposition by element-wise maximum or addition. However, a more powerful form of composition - and the main motivation for our work - is enabled by compositional embedding models, which are a new technique for few-shot learning.


WEKA-Based: Key Features and Classifier for French of Five Countries

arXiv.org Artificial Intelligence

This paper describes a French dialect recognition system that will appropriately distinguish between different regional French dialects. A corpus of five regions - Monaco, French-speaking, Belgium, French-speaking Switzerland, French-speaking Canada and France, which is targeted forconstruction by the Sketch Engine. The content of the corpus is related to the four themes of eating, drinking, sleeping and living, which are closely linked to popular life. The experimental results were obtained through the processing of a python coded pre-processor and Waikato Environment for Knowledge Analysis (WEKA) data analytic tool which contains many filters and classifiers for machine learning.