This paper proposes approaches to automatically create a large number of new bilingual dictionaries for low-resource languages, especially resource-poor and endangered languages, from a single input bilingual dictionary. Our algorithms produce translations of words in a source language to many target languages using available Wordnets and a machine translator (MT). Since our approaches rely on just one input dictionary, available Wordnets and an MT, they are applicable to any bilingual dictionary as long as one of the two languages is English or has a Wordnet linked to the Princeton Wordnet. Starting with 5 available bilingual dictionaries, we create 48 new bilingual dictionaries. Of these, 30 pairs of languages are not supported by the popular MTs: Google and Bing.
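The pivoting idea in this abstract can be illustrated with a minimal sketch. It assumes toy in-memory data in place of a real dictionary and real Wordnets, and it omits the machine translation step entirely. The names `SRC_TO_EN`, `LINKED_SYNSETS`, and `build_dictionary` are hypothetical and not taken from the paper, whose actual algorithms may differ.

```python
from collections import defaultdict

# Hypothetical toy input dictionary: source word -> English translations.
SRC_TO_EN = {
    "pani": ["water"],
    "ghar": ["house", "home"],
}

# Toy stand-in for Wordnets linked to the Princeton Wordnet:
# synset id -> {language code -> lemmas}. The ids are illustrative only.
LINKED_SYNSETS = {
    "water.n.01": {"en": ["water"], "fr": ["eau"], "es": ["agua"]},
    "house.n.01": {"en": ["house"], "fr": ["maison"], "es": ["casa"]},
    "home.n.01": {"en": ["home"], "fr": ["foyer", "maison"], "es": ["hogar"]},
}


def build_dictionary(src_to_en, synsets, target_lang):
    """Pivot source words through English synsets to target-language lemmas."""
    # Index synsets by their English lemmas for quick lookup.
    en_index = defaultdict(list)
    for synset_id, lemmas in synsets.items():
        for en_word in lemmas.get("en", []):
            en_index[en_word].append(synset_id)

    result = {}
    for src_word, en_words in src_to_en.items():
        candidates = []
        for en_word in en_words:
            for synset_id in en_index.get(en_word, []):
                for target in synsets[synset_id].get(target_lang, []):
                    if target not in candidates:  # keep order, drop duplicates
                        candidates.append(target)
        if candidates:
            result[src_word] = candidates
    return result


if __name__ == "__main__":
    print(build_dictionary(SRC_TO_EN, LINKED_SYNSETS, "fr"))
```

Because the sketch collects every lemma of every matching synset, an ambiguous English word would bring in translations for unintended senses; a real system would need some form of ranking or filtering.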
Amid Chinese President Xi Jinping's moves to bring the media to heel, a "teaching and research center for socialist journalism with Chinese characteristics" opened in Beijing on Sunday, state media reported. The new center, a joint project between Tsinghua University and Fudan University, will likely be used to follow through in implementing orders handed down by Xi in February for news media run by the Communist Party and the government to toe the party line, focusing on what authorities have called "positive reporting." "We should develop journalism in China with a thorough understanding of the good aspects of journalism in other countries, so that wrong or harmful content can be identified," said Tong Bing, a professor at Fudan University. China's state-run media organizations have long been known as Communist Party mouthpieces, but recent moves by Xi have seen the party further cement its grip. In February, Xi toured state media outlets, urging them to play a role in "properly guiding public opinion," part of a ramped-up push by the Chinese president to consolidate the party's grip on power amid growing economic malaise.
Chen, Xing (Wuhan University of Technology) | Li, Lin (Wuhan University of Technology) | Xu, Guandong (Victoria University) | Yang, Zhenglu (The University of Tokyo) | Kitsuregawa, Masaru (The University of Tokyo)
Computing similarity between short microblogs is an important step in microblog recommendation. In this paper, we investigate a topic-based approach and a WordNet-based approach to estimate similarity scores between microblogs and recommend the top related ones to users. An empirical study is conducted to compare their recommendation effectiveness using two evaluation measures. The results show that the WordNet-based approach has relatively higher precision than the topic-based approach on a dataset of 548 tweets. In addition, the Kendall tau distance between the two lists recommended by the WordNet and topic approaches is calculated. Its average over all 548 pairs of lists indicates that the two approaches disagree relatively strongly in their ranking of related tweets.
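For readers unfamiliar with the Kendall tau distance mentioned above, the following is a minimal sketch of the normalized form, assuming both lists rank exactly the same items with no ties. The paper may use a different variant (for example, one adapted to top-k lists that only partly overlap); the function name `kendall_tau_distance` and the tweet identifiers are illustrative.

```python
from itertools import combinations


def kendall_tau_distance(ranking_a, ranking_b):
    """Fraction of item pairs that the two rankings order differently.

    Returns 0.0 for identical rankings and 1.0 for exactly reversed ones.
    Assumes both rankings contain the same distinct items.
    """
    if len(set(ranking_a)) != len(ranking_a) or set(ranking_a) != set(ranking_b) \
            or len(ranking_a) != len(ranking_b):
        raise ValueError("rankings must contain the same distinct items")

    n = len(ranking_a)
    if n < 2:
        return 0.0

    # Position of each item in the second ranking.
    pos_b = {item: index for index, item in enumerate(ranking_b)}

    # combinations() yields (x, y) with x ranked before y in ranking_a;
    # the pair is discordant when ranking_b places x after y.
    discordant = sum(
        1 for x, y in combinations(ranking_a, 2) if pos_b[x] > pos_b[y]
    )
    return discordant / (n * (n - 1) / 2)


if __name__ == "__main__":
    wordnet_list = ["t1", "t2", "t3"]  # hypothetical tweet ids
    topic_list = ["t2", "t1", "t3"]
    print(kendall_tau_distance(wordnet_list, topic_list))
```

Averaging this value over all pairs of recommended lists gives a single disagreement score of the kind the abstract reports.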
Wordnets are an effective resource for natural language processing and information retrieval, especially for semantic processing and meaning-related tasks. So far, wordnets have been constructed for many languages. However, the automatic development of wordnets for low-resource languages has not been well studied. In this paper, an Expectation-Maximization algorithm is used to create high-quality, large-scale wordnets for low-resource languages. The proposed method incorporates cross-lingual word sense disambiguation and develops a wordnet using only a bilingual dictionary and a monolingual corpus. The method has been applied to Persian, and the resulting wordnet has been evaluated through several experiments. The results show that the induced wordnet has a precision of 90% and a recall of 35%.
Redkar, Hanumant Harichandra (Indian Institute of Technology Bombay) | Bhingardive, Sudha Baban (Indian Institute of Technology Bombay) | Kanojia, Diptesh (Indian Institute of Technology Bombay) | Bhattacharyya, Pushpak (Indian Institute of Technology Bombay)
WordNet is an online lexical resource which expresses unique concepts in a language. The English WordNet, developed at Princeton University, was the first WordNet. Over time, WordNets for many languages have been developed by various organizations all over the world. Storing WordNet data has always been a challenge. Some WordNets are stored using a file system and some are stored using different database models. In this paper, we present the World WordNet Database Structure, which can be used to efficiently store the WordNet information of all languages of the world. This design can be adapted by most language WordNets to store information such as synset data, semantic and lexical relations, ontology details, language-specific features, linguistic information, etc. An attempt is made to develop Application Programming Interfaces to manipulate the data in these databases. This database structure can help in various Natural Language Processing applications like Multilingual Information Retrieval, Word Sense Disambiguation, Machine Translation, etc.
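To make the idea of a shared relational store for several WordNets concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The abstract does not describe the actual World WordNet Database Structure, so every table name, column name, and sample row below is a hypothetical illustration, not the authors' design.

```python
import sqlite3

# Hypothetical schema: one row per language, per-language synsets sharing a
# common synset id, their lemmas, and language-independent semantic relations.
SCHEMA = """
CREATE TABLE language (
    code TEXT PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE synset (
    synset_id INTEGER NOT NULL,
    lang TEXT NOT NULL REFERENCES language(code),
    gloss TEXT,
    PRIMARY KEY (synset_id, lang)
);
CREATE TABLE synset_word (
    synset_id INTEGER NOT NULL,
    lang TEXT NOT NULL,
    lemma TEXT NOT NULL,
    FOREIGN KEY (synset_id, lang) REFERENCES synset(synset_id, lang)
);
CREATE TABLE semantic_relation (
    source_id INTEGER NOT NULL,
    target_id INTEGER NOT NULL,
    relation TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database populated with a few toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO language VALUES (?, ?)",
        [("en", "English"), ("hi", "Hindi")],
    )
    conn.executemany(
        "INSERT INTO synset VALUES (?, ?, ?)",
        [(1, "en", "a domesticated canine"), (1, "hi", None),
         (2, "en", "a living organism that can move")],
    )
    conn.executemany(
        "INSERT INTO synset_word VALUES (?, ?, ?)",
        [(1, "en", "dog"), (1, "hi", "kutta"), (2, "en", "animal")],
    )
    conn.execute("INSERT INTO semantic_relation VALUES (?, ?, ?)",
                 (1, 2, "hypernym"))
    conn.commit()
    return conn


def lemmas(conn, synset_id, lang):
    """Return the lemmas of one synset in one language, sorted alphabetically."""
    rows = conn.execute(
        "SELECT lemma FROM synset_word WHERE synset_id = ? AND lang = ? "
        "ORDER BY lemma",
        (synset_id, lang),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_demo_db()
    print(lemmas(db, 1, "en"), lemmas(db, 1, "hi"))
```

Sharing one synset id across languages is one possible way to support multilingual lookups; whether the paper's design works this way is not stated in the abstract.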