Language Production, Cognition, and the Lexicon
Contents

Bernadette Sharp; Sergei Nirenburg.
Lexicon and Lexical Analysis: Alain Polguère; Marie-Claude L'Homme; Mathieu Lafourcade and Alain Joubert; Olivier Ferret; Yves Lepage; Pushpak Bhattacharyya; Gregory Grefenstette; Yorick Wilks; Rodolfo Delmonte.
Language and Speech Analysis and Generation: Statistical Evidences through Text Production, Kumiko Tanaka-Ishii; Rolf Schwitter; Line Jakubiec-Jamet; Kristiina Jokinen; Nicolas Daoust and Guy Lapalme.
Reading and Writing Technologies: A Readability Question?, Juyeon Kang and Patrick Saint-Dizier; Cerstin Mahlow.
Language Resources and Language Engineering:
Joseph Mariani; Gil Francopoulo.

In this paper we propose a model for ontology-driven conceptual access to a multilingual lexicon, taking advantage of the cognitive-conceptual structure of the radical system embedded in the shared orthography of Chinese and Japanese. Our proposal relies crucially on two facts. Second, Chinese character orthography is anchored in a system of radical parts which encodes basic concepts. Each character, as an orthographic unit, contains radicals which indicate the broad semantic class of the meaning of that unit.
Our study utilizes the homomorphism between the Chinese hanzi and Japanese kanji systems, but goes beyond the character-to-character mapping of kanji-hanzi conversion to identify bilingual word correspondences. We use bilingual dictionaries, including WordNets, to verify the semantic relations between the cross-lingual pairs. These bilingual pairs are then mapped to an ontology of characters structured according to the organization of the basic concepts of radicals.
The conceptual structure of the radical ontology is proposed as the model for simultaneous conceptual access to both languages.
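The idea that radicals encode broad semantic classes shared by hanzi and kanji can be illustrated as a two-step lookup. The sketch below uses a tiny invented sample, not the ontology proposed in the chapter; a real system would cover a full inventory such as the 214 Kangxi radicals:

```python
from typing import Optional

# Radical -> basic concept (illustrative subset only).
RADICAL_CONCEPT = {
    "氵": "water",
    "木": "tree/wood",
    "言": "speech",
}

# Character -> semantic radical (these orthographic units are shared
# by Chinese hanzi and Japanese kanji).
CHAR_RADICAL = {
    "海": "氵",  # sea
    "河": "氵",  # river
    "林": "木",  # grove/forest
    "語": "言",  # language/word
}

def conceptual_class(char: str) -> Optional[str]:
    """Return the broad semantic class indicated by the character's radical."""
    radical = CHAR_RADICAL.get(char)
    return RADICAL_CONCEPT.get(radical) if radical else None
```

Since the same decomposition applies to both writing systems, querying the radical layer gives simultaneous conceptual access: `conceptual_class("海")` resolves to "water" whether the character occurs in a Chinese or a Japanese word.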
It is suggested that the proposed model has the conceptual robustness to be applied to other languages, based on the fact that it now works for two typologically very different languages, and that the model contains Generative Lexicon (GL)-like coercive links to account for a wide range of possible cross-lingual semantic relations.

The purpose of this paper is to summarize some of the results obtained over many years of research in proportional analogy applied to natural language processing.
We recall some mathematical formalizations based on general axioms drawn from a study of the history of the notion, from Euclid to modern linguistics. The formalization obtained relies on two articulative notions, conformity and ratio, and on two constitutive notions, similarity and contiguity. These notions are applied to a series of objects ranging from sets to strings of symbols, through multisets and vectors, so as to obtain a mathematical formalization for each of these types of objects.
Thanks to these formalizations, we present some results obtained in structuring language data, whether characters (bitmaps), words, or short sentences, in several languages such as Chinese or English. An important point in using such formalizations, which rely on form only, concerns the truth of the analogies retrieved or produced, i.e. whether they are semantically valid. Results of evaluations on this aspect are recalled. The results presented have been obtained from reasonably large amounts of language data, such as several thousand Chinese characters or a hundred thousand sentences in English and other languages.
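For strings, one of the simplest formal constraints on a proportional analogy A : B :: C : D is the character-count condition: every symbol must occur as many times in A and D together as in B and C together. A minimal sketch follows; note this is only a necessary condition (it ignores symbol order), so it filters candidates rather than proving an analogy:

```python
from collections import Counter

def count_condition(a: str, b: str, c: str, d: str) -> bool:
    """Necessary character-count condition for the analogy a : b :: c : d."""
    # Counter addition sums per-character counts; equality compares them all.
    return Counter(a) + Counter(d) == Counter(b) + Counter(c)
```

For example, `count_condition("walk", "walked", "talk", "talked")` holds, while `count_condition("walk", "walked", "talk", "talks")` does not.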
Languages of the world, though different, share structures and vocabulary.
Most languages in the world fall far behind English when it comes to annotated resources. Since annotation is costly, there has been a worldwide effort to leverage multilinguality in the development and use of annotated corpora. The key idea is to project annotation from one language to another: parameters learnt from the annotated corpus of one language are used in the NLP of another language.
We illustrate multilingual projection through the case study of word sense disambiguation (WSD), whose goal is to obtain the correct meaning of a word in context. The correct meaning is usually denoted by an appropriate sense id from a sense repository, usually a wordnet. In this paper we show how two languages can help each other in their WSD, even when neither language has any sense-marked corpus. The two specific languages chosen are Hindi and Marathi.
The sense repository is IndoWordNet, a linked structure of the wordnets of 19 major Indian languages from the Indo-Aryan, Dravidian and Sino-Tibetan families. These wordnets have been created by following the expansion approach from the Hindi wordnet. The WSD algorithm is reminiscent of expectation maximization: the sense distribution of either language is estimated through the mediation of the sense distribution of the other language, in an iterative fashion.
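One such iterative update might be sketched as follows. This is a hedged illustration of the EM-like idea, not the published algorithm: each sense of a word in one language is rescaled by the probability mass of its linked senses in the other language, then renormalized, and the two languages alternate roles:

```python
def normalize(dist):
    """Rescale a sense distribution so its probabilities sum to 1."""
    total = sum(dist.values())
    return {sense: p / total for sense, p in dist.items()}

def reestimate(dist_l1, dist_l2, linked):
    """One EM-like step: boost each L1 sense by the probability mass of
    the L2 senses it is linked to (linked: L1 sense -> list of L2 senses)."""
    boosted = {
        sense: p * (1.0 + sum(dist_l2.get(s2, 0.0)
                              for s2 in linked.get(sense, [])))
        for sense, p in dist_l1.items()
    }
    return normalize(boosted)
```

Starting from uniform distributions over the senses of cross-linguistic counterparts (say, a Hindi word and its Marathi translation linked through IndoWordNet synsets), alternating calls to `reestimate` let each language's evidence sharpen the other's sense distribution.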
The WSD accuracy arrived at is better than the state-of-the-art accuracy of any all-words, general-purpose, unsupervised WSD.

Quantified self, lifelogging, digital eyeglasses: technology is advancing rapidly to a point where people can gather masses of data about their own persons and their own lives.
Large-scale models of what people are doing are being built by credit companies, advertising agencies, and national security agencies, using digital traces that people leave behind them. How can individuals exploit their own data for their own benefit? With this mass of personal data, we will need to induce personal semantic dimensions to sift data and find what is meaningful to each individual.
In this chapter, we present semantic dimensions made by experts and by crowds. We show the type of information that individuals will have access to once lifelogging becomes common, and we sketch what personal semantic dimensions might look like.

The semantic relatedness between words can play an important role in this context. In this article, we present a novel approach to analysing the semantic relatedness between words, based on the relevance of semantic relatedness measures at the global level of a word sense disambiguation task.
More specifically, for a given selection of senses of a text, a global similarity for the sense selection can be computed by combining the pairwise similarities between all the selected senses through a particular function (sum, for example).
This global similarity value can be matched against other values pertaining to the selection, for example the F1 measure resulting from evaluation against a gold-standard reference annotation. We use several classical local semantic similarity measures, as well as measures built by our team, and study the correlation of the global score with the F1 values of a gold standard. Thus, we are able to locate the typical output of an algorithm relative to an exhaustive evaluation, and thereby to optimise the measures and the sense selection process in general.
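The global score described above reduces to a few lines. In this sketch, `sim` is a placeholder for any local similarity measure (gloss overlap, a WordNet path measure, etc.), and sum is the combining function mentioned in the text:

```python
from itertools import combinations

def global_similarity(senses, sim):
    """Combine the pairwise similarities of a sense selection with sum."""
    return sum(sim(s1, s2) for s1, s2 in combinations(senses, 2))
```

Plugged into a WSD search, this scores each candidate sense selection, and those scores can then be correlated with F1 values computed against a gold-standard annotation.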
Part of the stated goal of the project was to detect linguistic metaphors (LMs) computationally in texts in four languages and map them all to a single set of conceptual metaphors (CMs). Much of the inspiration for this was the classic work of George Lakoff (Lakoff and Johnson), which posited a set of universal metaphors in use across cultures and languages. I wish to examine the assumptions behind this goal, and in particular to address the issue of how, and in what representation, such CMs can be expressed.
Reviving that assumption for the study of metaphor raises additional issues since, even if the senses of the terms in those CM representations could be added to the representations, metaphors often deploy new senses of words which will not be found in existing sense inventories like computational lexicons.
In what follows I discuss first the representation of CMs: in what language are they stated? I argue the need for some inclusion of the representation of the senses of their constituent terms within the CM, or at least a default assumption that the major sense, with respect to some lexicon such as WordNet, is the intended one. I then consider the issue of conventional metaphor and its representation in established lexicons (again, such as WordNet), and the effect that can have on detection strategies for metaphor such as selectional preference breaking.
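Selectional preference breaking as a detection cue can be illustrated with a toy check; the verb preferences and word classes below are invented for illustration, and a real detector would induce them from corpora and a lexicon:

```python
# Verb -> semantic classes preferred for its direct object (toy data).
PREFERRED_OBJECT = {
    "drink": {"liquid"},
    "devour": {"food"},
}

# Word -> coarse semantic class (toy data).
WORD_CLASS = {"water": "liquid", "cake": "food", "book": "artifact"}

def breaks_preference(verb: str, obj: str) -> bool:
    """Flag a verb-object pair whose object violates the verb's preference."""
    preferred = PREFERRED_OBJECT.get(verb)
    if preferred is None:
        return False  # no recorded preference, so no violation to detect
    return WORD_CLASS.get(obj) not in preferred
```

Here "devour a book" breaks the preference and becomes a metaphor candidate, while "drink water" does not; conventional metaphors already recorded as senses in a lexicon would, of course, slip past exactly this kind of check.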
I then argue that the mapping of text metaphors to CMs, as well as the empirical, rather than intuitive, construction of CM inventories, requires further use of preference restrictions in lexicons, by means of a much-discussed process of projection or coercion.

In this chapter I will be concerned with what characterizes human language and the parser that computes it in real communicative situations.