Recognizing similarities between literary works for copyright infringement detection requires evaluating similarity in the expression of content. Copyright law protects expression of content; similarities in content alone are not enough to indicate infringement. Expression refers to the way people convey particular information; it captures both the information and the manner of its presentation. In this paper, we present a novel set of linguistically informed features that provide a computational definition of expression and that enable accurate recognition of individual titles and their paraphrases more than 80% of the time. In comparison, baseline features, e.g., tfidf-weighted keywords, function words, etc., give an accuracy of at most 53%. Our computational definition of expression uses linguistic features that are extracted from POStagged text using context-free grammars, without incurring the computational cost of full parsers. The results indicate that informative linguistic features do not have to be computationally prohibitively expensive to extract.
With the progress in machine translation, it becomes more subtle to develop the evaluation metric capturing the systems’ differences in comparison to the human translations. In contrast to the current efforts in leveraging more linguistic information to depict translation quality, this paper takes the thread of combining language independent features for a robust solution to MT evaluation metric. To compete with finer granularity of modeling brought by linguistic features, the proposed method augments the word level metrics by a letter based calculation. An empirical study is then conducted over WMT data to train the metrics by ranking SVM. The results reveal that the integration of current language independent metrics can generate well enough performance for a variety of languages. Time-split data validation is promising as a better training setting, though the greedy strategy also works well.
Most fuzzy systems including fuzzy decision support and fuzzy control systems provide out-puts in the form of fuzzy sets that represent the inferred conclusions. Linguistic interpretation of such outputs often involves the use of linguistic approximation that assigns a linguistic label to a fuzzy set based on the predefined primary terms, linguistic modifiers and linguistic connectives. More generally, linguistic approximation can be formalized in the terms of the re-translation rules that correspond to the translation rules in ex-plicitation (e.g. simple, modifier, composite, quantification and qualification rules) in com-puting with words [Zadeh 1996]. However most existing methods of linguistic approximation use the simple, modifier and composite re-translation rules only. Although these methods can provide a sufficient approximation of simple fuzzy sets the approximation of more complex ones that are typical in many practical applications of fuzzy systems may be less satisfactory. Therefore the question arises why not use in linguistic ap-proximation also other re-translation rules corre-sponding to the translation rules in explicitation to advantage. In particular linguistic quantifica-tion may be desirable in situations where the conclusions interpreted as quantified linguistic propositions can be more informative and natu-ral. This paper presents some aspects of linguis-tic approximation in the context of the re-translation rules and proposes an approach to linguistic approximation with the use of quantifi-cation rules, i.e. quantified linguistic approxima-tion. Two methods of the quantified linguistic approximation are considered with the use of lin-guistic quantifiers based on the concepts of the non-fuzzy and fuzzy cardinalities of fuzzy sets. A number of examples are provided to illustrate the proposed approach.
This paper aims to detect linguistic characteristics and grammatical patterns from speech transcriptions generated by Alzheimer's disease (AD) patients. This is considered an important application of natural language processing and deep learning techniques for computational health. The authors propose several neural models such as CNNs and LSTM-RNNs -- and combinations of them -- to enhance an AD classification task. The trained neural models are used to interpret linguistic characteristics of AD patients (including gender variation) via activation clustering and first-derivative saliency techniques. Language variation can serve as a proxy that monitors how patients' cognitive functions have been affected (e.g., issues with word finding and impaired reasoning).