COMMON FEATURES OF MODERN MODELS OF LANGUAGE
Modern models of language turn out to share several features that are essential for understanding and using them. One such model is provided by the Meaning ⇔ Text Theory (MTT) already mentioned; another is based on Head-Driven Phrase Structure Grammar (HPSG). Within the Western linguistic tradition, the Chomskian approach includes various other models that differ from HPSG.
Here are the main common features of all these models:
· Functionality of the model. Linguistic models try to reproduce the functions of language without directly reproducing the activity of the brain, which is the engine of human language.
· Opposition of the textual/phonetic form of language to its semantic representation. A manual presenting three well-known syntactic theories (including one of the recent variants of N. Chomsky's theory) notes: “Language ultimately expresses a relation between sound at one end of the linguistic spectrum and meaning at the other.” Once the vague notion of spectrum is made more precise, this is the same definition of language as in the MTT. The outer, observable form of language activity is a text, i.e., a string of phonetic symbols or letters, whereas the inner, hidden form of the same information is the meaning of this text. Language relates these two forms of the same information.
· Generalizing character of language. Individual utterances within a speech or a text are considered not the language itself but samples of its functioning. The language is a theoretical generalization of the open and hence infinite set of utterances. The generalization introduces features, types, structures, levels, rules, etc., which are not directly observable. Rather, these theoretical constructs are products of the linguist's intuition and must be repeatedly tested against new utterances and against the intuition of other linguists. This feature is connected with the opposition of competence vs. performance in Chomskian theory and with the much earlier opposition of language vs. speech (langue vs. parole) in the theory of Ferdinand de Saussure.
· Dynamic character of the model. A functional model not only proposes a set of linguistic notions but also shows (by means of rules) how these notions are used in the processing of utterances.
· Formal character of the model. A functional model is a system of rules rigorous enough to be applied to any text by a person or an automaton in a purely formal way, without intervention by the model’s author or anybody else. Applying the rules to a given text or a given meaning always produces the same result. Any part of a functional model can in principle be expressed in a strict mathematical form and thus algorithmized. If no ready mathematical tool is available at present, a new one should be created. The presupposed recognizability and algorithmizability of natural language are very important for linguistic models aimed at computer implementation.
· Non-generative character of the model. Information is not created or generated within the model; it merely acquires a form corresponding to another linguistic level. We may thus call the correspondences between levels equative correspondences. By contrast, in Chomsky's original generative grammars, the strings of symbols that can be interpreted as utterances are generated from an initial symbol that has only the abstract meaning of "sentence". As for Chomsky's transformations in their initial form, they could change the meaning of an utterance, so they were not equative correspondences.
· Independence of the model from the direction of transformation. The description of a language is independent of the direction of linguistic processing. If the processing is governed by rules, these rules should be stated in an equative (i.e., meaning-preserving) bidirectional form, or else they should at least in principle permit reversal.
· Independence of algorithms from data. A description of language structures should be considered separately from the algorithms that use it. Knowledge about language does not imply a specific type of algorithm. Conversely, an algorithm implementing a given set of rules can often be organized in many different ways. For example, the MTT describes the text level separately from the morphologic and syntactic levels of representation of the same utterance. Nevertheless, one can imagine an analysis algorithm that begins to construct the corresponding part of the syntactic representation as soon as the morphologic representation of the first word in the utterance is formed. When linguistic knowledge is presented in declarative form with the highest possible consistency, the implementing algorithms have proved to be rather universal, i.e., equally applicable to several languages. (Such linguistic universality has something in common with the Universal Grammar postulated by N. Chomsky.) An analogous distinction between algorithms and data is used with great success in modern programming-language compilers (cf. compiler-compilers).
· Emphasis on detailed dictionaries. The bulk of the description of any language concerns its words. Hence, dictionaries containing descriptions of individual words are considered the principal part of a rigorous language description. Only the most general properties of large classes and subclasses of lexemes are abstracted from the dictionaries to form the formal grammar.
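The separation of linguistic data from algorithms, together with the requirement that a description serve both directions of processing, can be illustrated with a small sketch. It assumes a toy declarative table pairing a lemma and its grammemes with a wordform; the entries, the grammeme labels such as PRET.1SG, and the function names synthesize and analyze are hypothetical illustrations, not part of any MTT implementation. The same table is read by one function in the synthesis direction and by another in the analysis direction.

```python
# A minimal sketch, assuming a toy declarative table of (lemma, grammemes) <-> wordform.
# The entries and grammeme labels are hypothetical illustrations, not a real lexicon.
LEXICON = [
    (("dormir", "PRES.1SG"), "duermo"),
    (("dormir", "PRET.1SG"), "dormí"),
    (("dormir", "PRET.3SG"), "durmió"),
    (("ser", "PRES.3SG"), "es"),
    (("ser", "FUT.3SG"), "será"),
    (("ser", "PRET.1SG"), "fui"),
    (("ir", "PRET.1SG"), "fui"),  # homonymous with the form of "ser"
]


def synthesize(lemma, grammemes, table=LEXICON):
    """Synthesis direction: return every wordform for the given lemma and grammemes."""
    return [form for (lem, gr), form in table if lem == lemma and gr == grammemes]


def analyze(wordform, table=LEXICON):
    """Analysis direction: return every (lemma, grammemes) reading of a wordform."""
    return [reading for reading, form in table if form == wordform]


print(synthesize("dormir", "PRET.3SG"))  # ['durmió']
print(analyze("fui"))  # [('ser', 'PRET.1SG'), ('ir', 'PRET.1SG')]
```

Neither function contains anything specific to Spanish, so a table for another language could be substituted without changing the code. Analysis may also return several readings, as for fui, which belongs to both ser and ir; choosing among them is left to higher levels.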
SPECIFIC FEATURES OF THE MEANING ⇔ TEXT MODEL
The Meaning ⇔ Text Model has been selected for the most detailed study in these books, so a short synopsis of its specific features is now in order.
· Orientation to synthesis. Although synthesis and analysis are declared to be equivalent directions, synthesis is considered primary and more important for linguistics. Synthesis uses the entire linguistic knowledge about the text to be produced, whereas analysis uses both purely linguistic and extralinguistic knowledge, be it encyclopedic information about the world or information about the current situation. That is why analysis is sometimes possible on the basis of partial linguistic knowledge. This can be illustrated by the fact that we can sometimes read a paper in a nearly unknown language if its field and subject are well known to us. (We then heavily exploit our extralinguistic knowledge.) However, text analysis is considered more important for modern applications. That is why the generative grammar approach places special emphasis on analysis, whereas separate theories are proposed for synthesis. The Meaning ⇔ Text model admits a separate description for analysis but postulates that it should contain the complete linguistic part plus any additional extralinguistic part.
· Multilevel character of the model. The model explicitly introduces an increased number of levels: textual, two morphologic (surface and deep), two syntactic (surface and deep), and semantic. The representation at one level is considered equivalent to the representation at any other level. The equative Meaning ⇒ Text processor and the opposite Text ⇒ Meaning processor are divided into several partial modules, each converting data from one level to the adjacent one. Each intermediate representation is the output of one module and, at the same time, the input of the next. The division of the model into several modules is intended to simplify the rules of inter-level conversion.
· Reinforced information-preserving character. The correspondence rules relating the input and output data of each MTT module fully preserve information, so the representations at all language levels remain equivalent.
· Variety of structures and formalisms. In the MTT, each module has its own rules and formalisms, because the structures representing data at different levels vary significantly (strings, trees, and networks, respectively). At each level, the MTT considers only the minimal possible set of descriptive features. By contrast, the generative grammar tradition tries to find a single formalism covering the whole language, so that all the features of the various levels are considered jointly, without explicit division into levels.
· Peculiarities of deep and surface syntax. The entities and syntactic features of these two levels are distinctly different in the MTT. Auxiliary and function words present at the surface level disappear at the deep level. Analogously, some syntactic characteristics of wordforms are present only at the surface (e.g., the agreement features of gender and number for Spanish adjectives), whereas other features, being implied by meaning, are retained at the deeper levels as well (e.g., number for nouns). Such separation helps minimize the descriptive means at each level. The notions of deep and surface syntactic levels exist in Chomskian theory too, but, as we have already seen, they are defined there in a quite different way.
· Independence between the syntactic hierarchy of words and their order in a sentence. These two aspects of a sentence, the labeled dependency tree and the word order, are assumed to be determined by different, though interconnected, factors. Formally, this leads to the systematic use of dependency grammars rather than constituency grammars at the syntactic level. As a result, the basic rules of inter-level transformation in the MTT are quite different from those of generative grammar. The main advantage of dependency grammars is seen in the fact that the links between meaningful words are retained at the semantic level, whereas in constituency grammars (with the exception of HPSG) the semantic links have to be discovered by a separate mechanism.
· Orientation to languages of a type different from English. To a certain extent, the opposition between dependency and constituency grammars is connected with different types of languages. Dependency grammars are especially appropriate for languages with free word order such as Latin, Russian, or Spanish, while constituency grammars suit languages with strict word order such as English. However, the MTT is also suitable for describing languages such as English, French, and German. Extensive experience in working with dependency trees for several languages has been accumulated within the framework of the MTT. The generative tradition (e.g., HPSG) is also moving toward dependency trees, but with some reservations and in an indirect manner.
· Lexical functions and synonymous variations. It was precisely the MTT that pointed out that a large part of the word combinations in any language are produced according to mutual lexical constraints. For example, in English we can say heart attack and cordial greetings, but neither cardiac attack nor hearty greeting, although the meanings of the lexemes involved would permit all these combinations. Such restrictions on combinability gave rise to the calculus of so-called lexical functions within the MTT. The calculus includes rules for transforming syntactic trees that contain lexical functions from one form to another. A human can convey the same meaning in many possible ways. For example, the Spanish sentence Juan me prestó ayuda ('Juan gave me help') is equivalent to Juan me ayudó ('Juan helped me'). Lexical functions make it possible to perform such conversions formally, thus implementing the mechanism of synonymous variation. This property plays an essential role in synthesis and has no analog in the generative tradition. When translating from one language to another, the system searches among the synonymous syntactic variants for one that can be realized in the target language for the given construction. Lexical functions also make it possible to standardize the semantic representation, reducing the variety of labels for semantic nodes.
· Government patterns. Unlike the subcategorization frames of generative linguistics, government patterns in the MTT directly connect the semantic and syntactic valencies of words. Not only verbs but also other parts of speech are described in terms of government patterns. They make it possible to indicate explicitly how each semantic valency can be expressed at the syntactic level: by a noun only, by a given preposition and a noun, by any of several given prepositions and a noun, by an infinitive, or in some other way. Word order is not fixed in government patterns. By contrast, subcategorization frames for verbs usually amount to a list of all possible combinations of syntactic valencies, given separately for each possible order in a sentence. In languages with rather free word order, the number of such frames for a specific verb can reach a few dozen, and this obscures the whole picture of semantic valencies. Moreover, the number of distinct classes of verbs sharing the same combination of subcategorization frames can be comparable to the total number of verbs in languages such as Spanish, French, or Russian.
· Keeping the traditions and terminology of classical linguistics. The MTT treats the heritage of classical linguistics much more carefully than generative computational linguistics does. Over its long development, the MTT has shown that even with the increased accuracy of description and the need for rigorous formalisms, the existing terminology can usually be preserved, perhaps after the terms are given stricter definitions. The notions of phoneme, morpheme, morph, grammeme, lexeme, part of speech, agreement, number, gender, tense, person, syntactic subject, syntactic object, syntactic predicate, actant, circonstant, etc., have been retained. Within the frameworks of generative linguistics, theories are sometimes constructed nearly from scratch, without attempts to interpret the relevant phenomena in terms already known in general linguistics. These theories have sometimes ignored the notions and methods of classical linguistics, including those of structuralism. This does not necessarily bring additional rigor. More often, it leads to terminological confusion, since specialists in adjacent fields simply do not understand each other.
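The independence of the dependency tree from word order can be illustrated with two Spanish orders of the same sentence, Juan leyó el libro and leyó Juan el libro. In this sketch each sentence is given as a list of tokens, the index of each token's head (None for the root), and a relation label. The labels subj, obj and det are hypothetical simplifications, not the MTT's own inventory of surface-syntactic relations, and the helper labeled_arcs is our own; it assumes that no word occurs twice in a sentence.

```python
def labeled_arcs(tokens, heads, relations):
    """Return the set of (head, relation, dependent) arcs, discarding linear positions.

    Assumes every word occurs only once in the sentence, so words can stand for nodes.
    """
    return {
        (tokens[head], rel, token)
        for token, head, rel in zip(tokens, heads, relations)
        if head is not None
    }


# "Juan leyó el libro" (SVO) and "leyó Juan el libro" (VSO).
svo = labeled_arcs(["Juan", "leyó", "el", "libro"], [1, None, 3, 1], ["subj", None, "det", "obj"])
vso = labeled_arcs(["leyó", "Juan", "el", "libro"], [None, 0, 3, 0], [None, "subj", "det", "obj"])

print(svo == vso)  # True: same tree, different word order
```

The tree is compared as a set of labeled links, so any linearization of the same links yields the same syntactic structure; word order would be handled by separate rules.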
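The paraphrase Juan me ayudó ⇔ Juan me prestó ayuda can be sketched in code under several simplifying assumptions. A deep-syntactic node is modeled as a lexeme with dependents keyed by actant number (I, II, III); the dictionary holds only two lexical-function values, S0(ayudar) = ayuda (the action noun) and Oper1(ayuda) = prestar (the support verb whose subject is the first actant); and the rule moves the verb's second actant to the third actant of the support verb. The class Node, the table LF and the function name are hypothetical, grammemes such as tense are omitted, and a real MTT rule set is considerably richer.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """A deep-syntactic node: a lexeme plus dependents keyed by actant number."""
    lexeme: str
    deps: Dict[str, "Node"] = field(default_factory=dict)


# Hypothetical dictionary fragment: values of two lexical functions.
LF = {
    ("S0", "ayudar"): "ayuda",      # S0: action noun of the keyword
    ("Oper1", "ayuda"): "prestar",  # Oper1: support verb with the 1st actant as subject
}


def to_support_verb_construction(tree: Node) -> Optional[Node]:
    """Rewrite V(I, II) as Oper1(S0(V))(I, II=S0(V), III=old II); None if an LF is missing."""
    noun = LF.get(("S0", tree.lexeme))
    support = LF.get(("Oper1", noun)) if noun is not None else None
    if support is None:
        return None
    deps = {}
    if "I" in tree.deps:
        deps["I"] = tree.deps["I"]    # the subject stays the subject
    deps["II"] = Node(noun)           # the action noun becomes the object of the support verb
    if "II" in tree.deps:
        deps["III"] = tree.deps["II"]  # the old object becomes the indirect object
    return Node(support, deps)


# "Juan me ayudó": ayudar with Juan as actant I and "me" as actant II.
source = Node("ayudar", {"I": Node("Juan"), "II": Node("me")})
result = to_support_verb_construction(source)
print(result.lexeme, {k: v.lexeme for k, v in result.deps.items()})
# prestar {'I': 'Juan', 'II': 'ayuda', 'III': 'me'}
```

Because the rule refers only to the function names S0 and Oper1, the same rule would apply to any verb whose dictionary entry supplies those values; the dictionary, not the rule, carries the language-specific collocations.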
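A government pattern can be sketched as a mapping from each semantic actant to the surface means that may express it, with no word order recorded. The pattern below for Spanish hablar (hablar de algo con alguien) is a hypothetical, simplified illustration, and the names GOVERNMENT_PATTERNS, can_express and count_order_specific_frames are our own. The last function is a rough count, assuming that a frame-based description lists every combination of present actants in every relative order (the position of the verb itself is ignored).

```python
from itertools import combinations, permutations
from math import prod

# Hypothetical, simplified government pattern: semantic actant -> allowed
# (category, preposition) realizations. Word order is deliberately not recorded.
GOVERNMENT_PATTERNS = {
    "hablar": {
        "X": [("N", None)],                  # speaker: bare noun (subject)
        "Y": [("N", "de"), ("N", "sobre")],  # topic: de + N or sobre + N
        "Z": [("N", "con")],                 # interlocutor: con + N
    },
}


def can_express(lexeme, actant, category, preposition=None):
    """True if the pattern allows this surface means for the given semantic actant."""
    allowed = GOVERNMENT_PATTERNS.get(lexeme, {}).get(actant, [])
    return (category, preposition) in allowed


def count_order_specific_frames(lexeme):
    """Count frames needed if each ordering of each actant combination were listed separately."""
    pattern = GOVERNMENT_PATTERNS[lexeme]
    total = 0
    for k in range(1, len(pattern) + 1):
        for subset in combinations(pattern, k):
            for order in permutations(subset):
                total += prod(len(pattern[a]) for a in order)
    return total


print(can_express("hablar", "Y", "N", "sobre"))  # True
print(can_express("hablar", "Z", "N", "de"))     # False
print(count_order_specific_frames("hablar"))     # 26
```

Under these assumptions, a three-line pattern stands in for 26 order-specific frames, which is consistent with the remark that frames for a single verb can reach a few dozen.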
We can formulate the problem of selecting a good model for any specific linguistic application as follows.
A holistic model of the language makes it easier to describe the language as a whole system. However, when we concentrate on the objectives of a specific application system, we can select only the level or levels of the whole language description that are relevant and sufficient for that objective. Thus, we can use a reduced model to algorithmize a specific application.
Here are some examples of an adequate choice of such a reduced description.
· If we want to build an information retrieval system based on keywords that are compared only by the invariant parts remaining after irrelevant suffixes and endings are cut off, then no linguistic levels are necessary. Words like México, mexicanos, mexicana, etc., can all be treated as equivalent by such a system. Other relevant groups can be gobierno, gobiernos, or ciudad, ciudades, etc. Thus, we can use a list containing only the initial substrings (i.e., stems or quasi-stems) like mexic-, gobierno-, ciudad-, etc. We also instruct the program to ignore letter case (and, since México carries an accent that mexic- lacks, diacritics as well). Our task can then be solved by a simple search for these substrings in the text. Thus, linguistic knowledge is reduced here to the list of substrings mentioned above.
· If we want our system to treat the wordforms dormí, duermo, durmió, etc., or será, es, fui, era, sido, etc., as equivalent keywords, then we must introduce the morphologic level of description. This gives us a method for automatically reducing all these wordforms to standard forms such as dormir or ser.
· If we want to distinguish the occurrences of the string México in our texts that refer to the city from those that refer to the state or the country, then we should introduce both the morphologic and the syntactic levels. Indeed, only word combinations or the broader contexts of the relevant words can help us disambiguate such occurrences.
· In a spell checker without limitations on available memory, we can store all wordforms in the computer dictionary. However, if memory is limited and the language is highly inflectional, like Spanish, French, or Russian, we will have to use some morphologic representation (splitting words into stems and endings) for all the relevant wordforms.
· Grammar checkers require the morphologic and syntactic levels in order to check the syntactic structure of every sentence. The semantic level usually remains unnecessary.
· For translation between two rather distant natural languages, all the linguistic levels are necessary. However, for translation between two very similar languages, only the morphologic and syntactic levels may be necessary. For such highly “isomorphic” languages as Spanish and Portuguese, the morphologic level alone may suffice.
· If we create a very simple sentence-understanding system for a narrow subject area, with a small dictionary and a very strict word order, we can reduce the dictionary to the set of strings representing the initial parts of the words actually used in such texts and supply them directly with semantic interpretations. In this way, we avoid morphologic and syntactic problems entirely; only the textual and semantic levels of representation are necessary.
· If we create a more robust text-understanding system, then we need a full model of the language plus a reasoning subsystem for the complete semantic interpretation of the text.
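The keyword-based retrieval scheme of the first example can be sketched as follows. The stem list comes from the text; matching at the start of each word (rather than anywhere in the text) and folding away accents, so that mexic- also matches México, are our own assumptions, and fold and stem_hits are hypothetical names.

```python
import re
import unicodedata

STEMS = ["mexic", "gobierno", "ciudad"]  # the quasi-stems from the example


def fold(word):
    """Lowercase and drop diacritics, e.g. 'México' -> 'mexico'."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def stem_hits(text, stems=STEMS):
    """Map each stem to the words of the text that begin with it (after folding)."""
    hits = {stem: [] for stem in stems}
    for word in re.findall(r"\w+", text):
        folded = fold(word)
        for stem in stems:
            if folded.startswith(stem):
                hits[stem].append(word)
    return hits


text = "El gobierno de México y los gobiernos mexicanos; la Ciudad y las ciudades."
print(stem_hits(text))
# {'mexic': ['México', 'mexicanos'], 'gobierno': ['gobierno', 'gobiernos'], 'ciudad': ['Ciudad', 'ciudades']}
```

All the linguistic knowledge here is the list STEMS; the rest is generic string processing.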
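For the spell-checker example, a morphologic representation can be sketched as a table of stems, each tagged with an inflection class, plus the set of endings for each class; a word is accepted if some split into stem and ending is compatible. The stems, class names and endings below are a hypothetical toy fragment of Spanish, and is_known_word is our own name; a real system would also have to handle stem alternations such as duerm-/dorm-/durm-.

```python
# Hypothetical toy data: each stem carries an inflection class, each class a set of endings.
ENDINGS = {
    "noun_o": {"o", "os"},            # gobierno, gobiernos
    "noun_cons": {"", "es"},          # ciudad, ciudades
    "adj_o": {"o", "a", "os", "as"},  # mexicano, mexicana, mexicanos, mexicanas
}
STEM_CLASSES = {"gobiern": "noun_o", "ciudad": "noun_cons", "mexican": "adj_o"}


def is_known_word(word):
    """Accept the word if it splits into a known stem plus an ending of that stem's class."""
    w = word.lower()
    for cut in range(len(w), 0, -1):  # try the longest stem first
        stem, ending = w[:cut], w[cut:]
        inflection_class = STEM_CLASSES.get(stem)
        if inflection_class is not None and ending in ENDINGS[inflection_class]:
            return True
    return False


for w in ["ciudades", "Mexicanas", "gobierna", "ciudads"]:
    print(w, is_known_word(w))  # True, True, False, False
```

The stored data grows roughly with the number of stems plus the number of endings per class, rather than with the number of wordforms, which is what makes this representation attractive for highly inflectional languages.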
However, to make a reasonable choice in any practical situation, we need to know the whole model.