 bine


Learning meters of Arabic and English poems with Recurrent Neural Networks: a step forward for language understanding and synthesis

Yousef, Waleed A., Ibrahime, Omar M., Madbouly, Taha M., Mahmoud, Moustafa A.

arXiv.org Machine Learning

Recognizing a piece of writing as a poem or prose is usually easy for the majority of people; however, only specialists can determine which meter a poem belongs to. In this paper, we build Recurrent Neural Network (RNN) models that can classify poems according to their meters from plain text. The input text is encoded at the character level and fed directly to the models without feature handcrafting. This is a step forward for machine understanding and synthesis of languages in general, and of the Arabic language in particular. Among the 16 poem meters of Arabic and the 4 meters of English, the networks were able to correctly classify poems with an overall accuracy of 96.38% and 82.31%, respectively. The poem datasets used to conduct this research were massive, over 1.5 million verses, and were crawled from different nontechnical sources, mostly Arabic and English literature sites, in heterogeneous and unstructured formats. These datasets are now made publicly available in a clean, structured, and documented format for future research. To the best of the authors' knowledge, this research is the first to address classifying poem meters with a machine learning approach in general, and with a featureless RNN-based approach in particular. In addition, the dataset is the first publicly available dataset ready for the purpose of future computational research.
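
The abstract states only that the text is encoded at the character level and passed to the RNN without handcrafted features. As a rough illustration of what such an encoding can look like, the sketch below builds a one-hot matrix per verse. The function names `build_vocab` and `one_hot_encode`, the padding scheme, and the fixed `max_len` are assumptions made for this example and are not taken from the paper.

```python
def build_vocab(verses):
    """Map each distinct character to an index; index 0 is reserved for padding."""
    chars = sorted({ch for verse in verses for ch in verse})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def one_hot_encode(verse, vocab, max_len):
    """Encode a verse as max_len rows, each a one-hot vector over the vocabulary.

    Assumption for this sketch: verses longer than max_len are truncated,
    shorter ones are padded, and unknown characters share the padding index 0.
    """
    width = len(vocab) + 1
    rows = []
    for ch in verse[:max_len]:
        row = [0] * width
        row[vocab.get(ch, 0)] = 1
        rows.append(row)
    while len(rows) < max_len:
        pad = [0] * width
        pad[0] = 1
        rows.append(pad)
    return rows


verses = ["ab", "ba c"]
vocab = build_vocab(verses)
encoded = one_hot_encode("ab", vocab, max_len=3)
```

Each row of `encoded` would be one time step of the RNN input; no meter-specific features are computed beforehand.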


Learning Vertex Representations for Bipartite Networks

Gao, Ming, He, Xiangnan, Chen, Leihui, Zhou, Aoying

arXiv.org Machine Learning

Recent years have witnessed a widespread increase of interest in network representation learning (NRL). By far most research efforts have focused on NRL for homogeneous networks like social networks where vertices are of the same type, or heterogeneous networks like knowledge graphs where vertices (and/or edges) are of different types. There has been relatively little research dedicated to NRL for bipartite networks. Arguably, generic network embedding methods like node2vec and LINE can also be applied to learn vertex embeddings for bipartite networks by ignoring the vertex type information. However, these methods are suboptimal in doing so, since real-world bipartite networks concern the relationship between two types of entities, which usually exhibit different properties and patterns from other types of network data. For example, E-Commerce recommender systems need to capture the collaborative filtering patterns between customers and products, and search engines need to consider the matching signals between queries and webpages. This work addresses the research gap of learning vertex representations for bipartite networks. We present a new solution BiNE, short for Bipartite Network Embedding, which accounts for two special properties of bipartite networks: long-tail distribution of vertex degrees and implicit connectivity relations between vertices of the same type. Technically speaking, we make three contributions: (1) We design a biased random walk generator to generate vertex sequences that preserve the long-tail distribution of vertices; (2) We propose a new optimization framework by simultaneously modeling the explicit relations (i.e., observed links) and implicit relations (i.e., unobserved but transitive links); (3) We explore the theoretical foundations of BiNE to shed light on how it works, proving that BiNE can be interpreted as factorizing multiple matrices.
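
The abstract does not spell out how the random walk generator in contribution (1) is biased. One plausible reading, sketched below, is to start more walks from high-degree vertices and to end each walk with a fixed stopping probability, so that frequent vertices stay frequent in the generated sequences. The function `biased_walks`, its parameters `max_walks` and `stop_prob`, and the use of raw degree as the importance score are all assumptions made for this illustration, not the authors' implementation.

```python
import random


def biased_walks(edges, max_walks=4, stop_prob=0.3, seed=0):
    """Generate same-type vertex sequences from a bipartite edge list.

    edges: iterable of (left_vertex, right_vertex) pairs.
    Assumption for this sketch: the number of walks started at a left vertex
    is proportional to its degree, and each walk stops with probability
    stop_prob after every step.
    """
    rng = random.Random(seed)
    left_adj, right_adj = {}, {}
    for u, v in edges:
        left_adj.setdefault(u, []).append(v)
        right_adj.setdefault(v, []).append(u)

    max_deg = max(len(nbrs) for nbrs in left_adj.values())
    walks = []
    for u, nbrs in left_adj.items():
        # Higher-degree vertices start more walks (at least one each).
        n_walks = max(1, round(max_walks * len(nbrs) / max_deg))
        for _ in range(n_walks):
            walk, cur = [u], u
            while rng.random() > stop_prob:
                # Two hops (left -> right -> left) keep the walk on one vertex type.
                via = rng.choice(left_adj[cur])
                cur = rng.choice(right_adj[via])
                walk.append(cur)
            walks.append(walk)
    return walks


edges = [("u1", "i1"), ("u1", "i2"), ("u1", "i3"), ("u2", "i1")]
walks = biased_walks(edges)
```

In this toy graph `u1` has degree 3 and `u2` has degree 1, so `u1` starts four walks and `u2` starts one. The resulting sequences could then be fed to a skip-gram style embedding model, which is the usual way random-walk corpora are consumed.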


Using Answer Set Programming in an Inference-Based approach to Natural Language Semantics

Nouioua, Farid, Nicolas, Pascal

arXiv.org Artificial Intelligence

Institut Galilée - Univ. Paris-Nord, 93430 Villetaneuse, France. nouiouaf@lipn.univ-paris13.fr

Generally speaking, formal NL semantics is referential, i.e., it assumes that it is possible to create a static discourse universe and to equate the objects of this universe to the (static) meanings of words. The meaning of a sentence is then built from the meanings of the words in a compositional process, and the semantic interpretation of a sentence is reduced to its logical interpretation based on the truth conditions. The very difficult task of adapting the meaning of a sentence to its context is often left to the pragmatic level, and this task requires using a huge amount of common sense knowledge about the domain. It has been shown that the above tri-partition is very artificial because linguistic as well as extra-linguistic knowledge interact in the same global process to provide the necessary elements for understanding. But what kind of reasoning is needed for natural language semantics? The answer to this question is based on the remark that texts seldom provide normal details that are assumed to be known to the reader.