Translation by Confusion

AAAI Conferences

A new representational scheme for semantic information about words in different languages is introduced. Each word is represented as a vector in a multidimensional space. In order to derive the representations, basis vectors for one language are computed as linear approximations of 5,000-dimensional vectors of cooccurrence counts. Using an aligned corpus, the basis vectors of words occurring close to a target word in one of the languages under consideration are summed to compute the confusion vector of the target word. The paper describes the derivation of the representations for English and French and their application to identifying translation pairs.
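The summing step described in this abstract can be illustrated with a minimal Python sketch. It makes several assumptions that are not in the abstract: the basis vectors are tiny hand-made toy values rather than approximations of cooccurrence counts, "close to a target word" is taken to mean "in the same aligned sentence pair", and candidates are compared by cosine similarity. The names `BASIS`, `ALIGNED`, `confusion_vector`, `cosine` and `best_translation` are hypothetical.

```python
import math

# Toy English basis vectors (hypothetical values, 3 dimensions for brevity).
BASIS = {
    "the": [0.1, 0.1, 0.1],
    "dog": [1.0, 0.0, 0.0],
    "cat": [0.0, 1.0, 0.0],
    "barks": [0.9, 0.0, 0.2],
    "sleeps": [0.0, 0.8, 0.3],
}

# Toy aligned corpus: (English sentence, French sentence) pairs.
ALIGNED = [
    ("the dog barks", "le chien aboie"),
    ("the cat sleeps", "le chat dort"),
]


def confusion_vector(target, lang):
    """Sum the English basis vectors of words that co-occur with `target`.

    Assumption: co-occurrence means appearing in the same aligned pair.
    """
    total = [0.0, 0.0, 0.0]
    for en, fr in ALIGNED:
        own_side = (en if lang == "en" else fr).split()
        if target not in own_side:
            continue
        for word in en.split():
            if word == target:  # do not add the target's own basis vector
                continue
            total = [t + b for t, b in zip(total, BASIS[word])]
    return total


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_translation(fr_word, en_candidates):
    """Pick the English candidate whose confusion vector is most similar."""
    fr_vec = confusion_vector(fr_word, "fr")
    return max(en_candidates,
               key=lambda w: cosine(fr_vec, confusion_vector(w, "en")))


print(best_translation("chien", ["dog", "cat"]))
```

Because both languages are mapped into the space spanned by one language's basis vectors, words from either side can be compared directly; the similarity measure used here is one plausible choice and not necessarily the one used in the paper.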


Towards a Theory of Polysemy. D. Alan Cruse, Department of Linguistics, University of Manchester, Manchester, England

AAAI Conferences

Lexical semantics, once the Cinderella of the linguistic sub-disciplines, is currently enjoying an unprecedented degree of attention. A major focus of interest is the central problem of accounting for the astonishing range of variation in the interpretation of a single word in different contexts -- in other words, for the problem of polysemy. There are two orders of facts about polysemy: a satisfactory theory must account in a natural way for both. It is this aspect of polysemy which has been the most actively researched, with the aim of discovering regularity and predictability. Noteworthy recent proposals concerning polysemy have emerged from two main sources: firstly, from cognitive linguists such as Lakoff and Taylor; secondly, from computational linguists, most notably Pustejovsky.


Collocations, Dictionaries and MT

AAAI Conferences

Collocations pose specific problems in translation (both human and machine translation). For the native speaker of English it may be obvious that you 'pay attention', but for a native speaker of Dutch it would have been much simpler if in English people 'donated attention'. Within an MT system, we can deal with these mismatches in different ways. Simply adding an entry to our bilingual dictionary saying that 'pay' is the translation of 'schenken' leaves us with the job of specifying in which contexts we can use this equivalence. A more elaborate dictionary might list the complete collocation along with its translation.



A Probabilistic Approach to Japanese Lexical Analysis

AAAI Conferences

In contrast with standard, knowledge-intensive methods, the stochastic approach to lexical analysis uses statistical techniques that are based on probabilistic models. This approach has not previously been applied to unrestricted Japanese text and promises to yield insights into word formation and other morphological processes in Japanese. An experiment designed to assess the accuracy of a simple statistical technique for segmenting hiragana strings showed that this method was able to perform the task with a relatively low rate of error.


The MT Lexicon and the Translation of Compounds

AAAI Conferences

Paul Bennett, Marta Carulla, Kerry G. Maxwell (Position Paper) Our comments derive from the experience of designing and implementing proposals for the translation of compounds within Eurotra, a multilingual MT system which aims at minimising transfer. We concentrate here on noun-noun compounds. In the ideal case, nothing would have to be added to the lexicon to handle compounds, since all relevant information would be there for the treatment of syntactic structure. We would claim that this situation is approached as far as argument structure is concerned, i.e. the system of argument structure for nouns which has been developed in Eurotra is also adequate when these nouns occur as heads of compounds. This helps when translation as a compound is, unusually, not permitted (e.g.


SS93-02-010.pdf

AAAI Conferences

USING ONLINE THESAURUS IN MACHINE-AIDED TRANSLATION SYSTEMS Sylvie REGNIER, Frédérique SEGOND, Shirley THOMAS (position paper) The scope of this paper is machine-aided translation (MAT) as opposed to fully automatic translation systems. We point out the importance of the thesaurus in the translation process and suggest incorporating an online thesaurus in MAT systems. As we are addressing the question of translation, we consider it essential to consult professional translators and interpreters. When translating, they do not rely only on bilingual lexicons, although bilingual dictionaries are needed to provide them with vocabulary. Another tool many translators find very helpful, and whose organisation reflects that of the translation process, is the thesaurus.


Using Distributed Patterns as Language Independent Lexical Representations

AAAI Conferences

Department of Computer Science and Information Systems, University of Limerick, Limerick, Ireland. While it is possible to construct Machine Translation (MT) systems of surprising sophistication using the technique of transfer between augmented parse trees (Brockmann, 1991), few people would doubt that in order to perform a fully satisfactory translation it will ultimately be necessary to work with meaning representations. However, there are many problems with developing a computationally tractable representation scheme for linguistic meanings either at the sentential (propositional) or lexical levels. One approach to the problem of capturing meanings at the lexical level is to use a form of distributed representation where each word meaning is converted into a point in an n-dimensional space (Sutcliffe, 1992a). Such representations can capture a wide variety of word meanings within the same formalism. In addition they can be used within distributed representations for capturing higher level information such as that expressed by sentences (Sutcliffe, 1991a).


SS93-02-008.pdf

AAAI Conferences

Position paper: 'Grammatical semantics and multilinguality: what stands behind the lexicon?' John A. Bateman, Project KOMET, GMD/IPSI, Darmstadt, Germany. There has in recent years been a steady increase in the role given to the lexicon in computational linguistics. Accordingly, there are now also many efforts to uncover appropriate organizations of lexical information: including proposals for taxonomies of semantic organizational primitives/features, 'ontology' design, etc. This very necessary activity seems to me, however, to be partly compromised by a second trend also resulting from the attention given to the lexicon. That is the move to lexicalize grammars so that the 'grammatical' component becomes minimal and grammatical properties are 'projected' from those of their lexical components. By reducing the role of grammatical considerations, a strong source of information about useful lexical organization has been removed.


Constraints on the Space of MT Divergences

AAAI Conferences

This abstract addresses two questions with respect to the role of the lexicon in a machine translation (MT) system: 1. What types of MT divergences are appropriately characterized in the lexicon? 2. What are the principles governing the structures and processes of the lexicon that constrain the space of MT divergences? We start our discussion with the following definitions: MT Divergences: all source-language (SL), target-language (TL) sentence translation pairs, where the SL and TL sentences have different structures or convey different information.