DECOMPOSITION AND ATOMIZATION OF MEANING
Semantic representation in many cases turns out to be universal, i.e., common to different natural languages. Purely grammatical features of different languages are not usually reflected in this representation. For example, the gender of Spanish nouns and adjectives is not included in their semantic representation, so that this representation turns out to be the same as that of English. If a given noun refers to a person of a specific sex, the latter is reflected at the semantic level explicitly, via a special predicate of sex, and it is the grammar of the specific language that establishes the correspondence between sex and gender. It is curious that German nouns can have three genders (masculine, feminine, and neuter), yet the noun Mädchen ‘girl’ is neuter, not feminine!
Thus, the semantic representation of the English sentence The little girls see the red flower is the same as the one given above, despite the absence of gender in English nouns and adjectives. The representation of the corresponding Russian sentence is the same too, though the word used for red in Russian is in the masculine gender, because it agrees in gender with the corresponding masculine noun.[16]
Nevertheless, cases do occur in which the semantic representations of two or more utterances with seemingly the same meaning differ. In such situations, linguists hope to find a universal representation via decomposition and even atomization of the meaning of several semantic components.
In natural sciences, such as physics, researchers usually try to divide all the entities under consideration into the simplest possible, i.e., atomic, or elementary, units and then to deduce properties of their conglomerations from the properties of these elementary entities. In principle, linguistics has the same objective. It tries to find the atomic elements of meaning usually called semantic primitives, or semes.
Semes are considered indefinable, since they cannot be interpreted in terms of any other linguistic meanings. Nevertheless, they can be explained to human readers by examples from extralinguistic reality, such as pictures, sound recordings, videos, etc. All other components of semantic representation should then be expressed through the semes.
In other words, each predicate or its terms can usually be represented in the semantic representation of a text in a more detailed manner, such as a logical formula or a semantic graph. For example, we can decompose
MATAR(x) → CAUSAR(MORIR(x)) → CAUSAR(CESAR(VIVIR(x))),
i.e., MATAR(x) is something like ‘causar cesar el vivir(x),’ or ‘cause stop living(x),’ where the predicates CESAR(x), VIVIR(y), and CAUSAR(z) are more elementary than the initial predicate MATAR(x).[17]
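The decomposition above can be read as a set of rewrite rules applied until only primitives remain. The following is a minimal sketch under that reading, not the authors' implementation: the names RULES, substitute, decompose, and show are hypothetical, and CAUSAR is treated as a one-place predicate exactly as in the simplified formula above.

```python
# Hypothetical rewrite rules: each non-primitive predicate maps to a template
# built from more elementary predicates; "x" marks the argument slot.
RULES = {
    "MATAR": ("CAUSAR", ("MORIR", "x")),
    "MORIR": ("CESAR", ("VIVIR", "x")),
}


def substitute(template, arg):
    """Replace the slot marker "x" in a template with the actual argument."""
    if template == "x":
        return arg
    if isinstance(template, tuple):
        head, inner = template
        return (head, substitute(inner, arg))
    return template


def decompose(term):
    """Expand a term of the form (predicate, argument) down to primitives."""
    if not isinstance(term, tuple):
        return term  # a variable or a constant such as "José"
    head, arg = term
    arg = decompose(arg)
    if head in RULES:
        return decompose(substitute(RULES[head], arg))
    return (head, arg)


def show(term):
    """Render a nested term in the PREDICATE(argument) notation of the text."""
    if not isinstance(term, tuple):
        return term
    head, arg = term
    return f"{head}({show(arg)})"


print(show(decompose(("MATAR", "x"))))  # CAUSAR(CESAR(VIVIR(x)))
```

Predicates without a rule (here CAUSAR, CESAR, and VIVIR) play the role of semes in this toy setting; a real system would stop the expansion at whatever depth the application requires.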
Figure IV.9 shows a decomposition of the sentence Juan mató a José enseguida = Juan causó a José cesar vivir enseguida into the more primitive notions mentioned above. Note that the number labels of valencies of the whole combination of the primitives can differ from the number labels of the corresponding valencies of the member primitives: e.g., the actant 2 of the whole combination is the actant 1 of the component VIVIR. The mark C in Figure IV.9 stands for the circumstantial relation (which is not a valency but something inverse, i.e., a passive semantic valency).
FIGURE IV.9. Decomposition of the verb MATAR into semes.
Over the past 30 years, ambitious attempts to find and describe a limited number of semes, to which a major part of the semantics of a natural language would be reduced, have not been successful.
Some scientists agree that the expected number of such semes is not much more than 2,000, but this figure is still debatable. To meet the needs of computational linguistics, it is generally agreed that it is sufficient to decompose the meanings of lexemes down to a reasonable limit implied by the application.
Therefore, computational linguistics uses many evidently non-elementary terms and logical predicates in the semantic representation. From this point of view, the translation from one cognate language to another does not need any disintegration of meaning at all.
Once again, only practical results help computational linguists to judge what meaning representation is the best for the selected application domain.
NON-UNIQUENESS OF MEANING ⇒ TEXT MAPPING: SYNONYMY
Returning to the mapping of Meanings to Texts and vice versa, we should mention that, in contrast to common mathematical functions, this mapping is not unique in either direction, i.e., it is of the many-to-many type. In this section, we will discuss one direction of the mapping: from Meanings to Texts.
Different texts or their fragments can be, in the opinion of all or the majority of people, equivalent in their meanings. In other words, two or more texts can be mapped to the same element of the set of Meanings. In Figure IV.4, the Meaning M is represented with three different Texts T, i.e., these three Texts have the same Meaning.[18]
For example, the Spanish adjectives pequeño and chico are equivalent in many contexts, as are the English words small and little. Such equivalent words are called synonymous words, or synonyms, and the phenomenon is called synonymy of words. We can also consider synonymy of word combinations (phrases) or sentences. In these cases the term synonymous expressions is used.
Words that are equivalent in all possible contexts are called absolute synonyms. Trivial examples of absolute synonymy are abbreviated and complete names of organizations, e.g., in Spanish ONU ≡ Organización de las Naciones Unidas. Nontrivial examples of absolute synonymy of single words are rather rare in any language. Examples from Mexican Spanish are: alzadura ≡ alzamiento, acotación ≡ acotamiento, coche ≡ carro.
However, it is more frequent that the two synonyms are equivalent in their meanings in many contexts, but not all.
Sometimes the set of possible contexts for one such synonym covers the whole set of contexts for another synonym; this is called inclusive synonymy. Spanish examples are querer > desear > anhelar: querer is less specific than desear, which in turn is less specific than anhelar. This means that in nearly every context we can substitute desear or querer for anhelar, but anhelar cannot be substituted for querer or desear in every context.
Most frequently, though, we can find only some—perhaps significant—intersection of the possible sets of contexts. For example, the Spanish nouns deseo and voluntad are exchangeable in many cases, but in some cases only one of them can be used.
Such partial synonyms never have quite the same meaning. In some contexts, the difference is not relevant, so that both can be used, whereas in other contexts the difference does not allow one partial synonym to be replaced with the other.
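The three kinds of synonymy described above (absolute, inclusive, and partial) can be restated as relations between sets of admissible contexts. The sketch below assumes that each word comes with an explicit, finite set of context identifiers; the function name synonymy_type and the context labels c1 ... c6 are hypothetical and carry no linguistic content of their own.

```python
def synonymy_type(contexts_a, contexts_b):
    """Classify two words by comparing the sets of contexts that admit them."""
    a, b = set(contexts_a), set(contexts_b)
    if a == b:
        return "absolute"   # equivalent in all contexts
    if a < b or b < a:
        return "inclusive"  # one context set covers the other
    if a & b:
        return "partial"    # only an intersection is shared
    return "none"


# Toy context sets, invented for illustration only.
CONTEXTS = {
    "coche": {"c1", "c2"},
    "carro": {"c1", "c2"},
    "querer": {"c1", "c2", "c3", "c4"},
    "desear": {"c1", "c2", "c3"},
    "anhelar": {"c1", "c2"},
    "deseo": {"c1", "c2", "c5"},
    "voluntad": {"c2", "c5", "c6"},
}

for first, second in [("coche", "carro"), ("querer", "anhelar"), ("deseo", "voluntad")]:
    print(first, second, synonymy_type(CONTEXTS[first], CONTEXTS[second]))
```

In practice the context sets are not enumerable, so such a comparison could only be approximated, for instance from corpus evidence.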
The book [24] is a typical dictionary of synonyms in printed form. The menu item Language | Synonyms in Microsoft Word is a typical example of an electronic dictionary of synonyms. However, many of the words that it contains in partial lists are not really synonyms, but related words, or partial synonyms, with a rather small intersection of common contexts.
NON-UNIQUENESS OF TEXT ⇒ MEANING MAPPING: HOMONYMY
In the opposite direction, from Texts to Meanings, a text or its fragment can exhibit two or more different meanings. That is, one element of the surface edge of the mapping (i.e., text) can correspond to two or more elements of the deep edge. We have already discussed this phenomenon in the section on automatic translation, where the example of the Spanish word gato was given (see page 72). Many such examples can be found in any Spanish-English dictionary. A few more examples from Spanish are given below.
· The Spanish adjective real has two quite different meanings corresponding to the English real and royal.
· The Spanish verb querer has three different meanings corresponding to English to wish, to like, and to love.
· The Spanish noun antigüedad has three different meanings:
– ‘antiquity’, i.e. a thing belonging to an ancient epoch,
– ‘antique’, i.e. a memorial of classic antiquity,
– ‘seniority’, i.e. length of period of service in years.
Words with the same textual representation but different meanings are called homonymous words, or homonyms, with respect to each other, and the phenomenon itself is called homonymy. Larger fragments of texts, such as word combinations (phrases) or sentences, can also be homonymous. Then the term homonymous expressions is used.
To explain the phenomenon of homonymy in more detail, we should resort again to the strict terms lexeme and wordform, rather than to the vague term word. Then we can distinguish the following important cases of homonymy:
· Lexico-morphologic homonymy: two wordforms belong to two different lexemes. This is the most general case of homonymy. For example, the string aviso is a wordform of both the verb AVISAR and the noun AVISO. The wordform clasificación belongs to both the lexeme CLASIFICACIÓN1 ‘process of classification’ and the lexeme CLASIFICACIÓN2 ‘result of classification,’ though the wordform clasificaciones belongs only to CLASIFICACIÓN2, since CLASIFICACIÓN1 does not have a plural form. It should be noted that it is not relevant whether the name of the lexeme coincides with the specific homonymous wordform or not.
Another case of lexico-morphologic homonymy is represented by two different lexemes whose sets of wordforms intersect in more than one wordform. For example, the lexemes RODAR and RUEDA share two homonymous wordforms, rueda and ruedas; the lexemes IR and SER have a number of wordforms in common: fui, fuiste, ..., fueron.
· Purely lexical homonymy: two or more lexemes have the same sets of wordforms, like Spanish REAL1 ‘real’ and REAL2 ‘royal’ (both have the same wordform set {real, reales}) or QUERER1 ‘to wish,’ QUERER2 ‘to like,’ and QUERER3 ‘to love.’
· Morpho-syntactic homonymy: the whole sets of wordforms are the same for two or more lexemes, but these lexemes differ in meaning and in one or more morpho-syntactic properties. For example, Spanish lexemes (el) frente ‘front’ and (la) frente ‘forehead’ differ, in addition to meaning, in gender, which influences syntactical properties of the lexemes.
· Purely morphologic homonymy: two or more wordforms are different members of the wordform set for the same lexeme. For example, fáciles is the wordform for both masculine plural and feminine plural of the Spanish adjective FÁCIL ‘easy.’ We should admit this type of homonymy, since wordforms of Spanish adjectives generally differ in gender (e.g., nuevos, nuevas ‘new’).
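The four cases above can be told apart mechanically once each lexeme is given as a set of wordforms plus its morpho-syntactic properties. The following sketch is only an illustration of that idea: the Lexeme class, classify_pair, and morphologic_homonyms are hypothetical names, gender stands in for morpho-syntactic properties in general, and the wordform sets of RODAR and RUEDA are deliberately partial.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lexeme:
    name: str
    wordforms: frozenset
    gender: str = ""  # empty when gender is not a property of the lexeme


def classify_pair(a, b):
    """Name the kind of homonymy that holds between two different lexemes."""
    if not (a.wordforms & b.wordforms):
        return "no homonymy"
    if a.wordforms != b.wordforms:
        return "lexico-morphologic"  # the sets merely intersect
    if a.gender != b.gender:
        return "morpho-syntactic"    # same forms, different properties
    return "purely lexical"          # same forms, same properties


def morphologic_homonyms(paradigm):
    """Within one lexeme, find strings that fill more than one paradigm slot."""
    slots_by_form = {}
    for slot, form in paradigm.items():
        slots_by_form.setdefault(form, []).append(slot)
    return {form: slots for form, slots in slots_by_form.items() if len(slots) > 1}


REAL1 = Lexeme("REAL1", frozenset({"real", "reales"}))
REAL2 = Lexeme("REAL2", frozenset({"real", "reales"}))
FRENTE_M = Lexeme("FRENTE (el)", frozenset({"frente", "frentes"}), gender="m")
FRENTE_F = Lexeme("FRENTE (la)", frozenset({"frente", "frentes"}), gender="f")
RODAR = Lexeme("RODAR", frozenset({"rodar", "ruedo", "ruedas", "rueda"}))  # partial set
RUEDA = Lexeme("RUEDA", frozenset({"rueda", "ruedas"}), gender="f")

print(classify_pair(REAL1, REAL2))        # purely lexical
print(classify_pair(FRENTE_M, FRENTE_F))  # morpho-syntactic
print(classify_pair(RODAR, RUEDA))        # lexico-morphologic

FACIL = {
    ("masculine", "singular"): "fácil",
    ("feminine", "singular"): "fácil",
    ("masculine", "plural"): "fáciles",
    ("feminine", "plural"): "fáciles",
}
print(morphologic_homonyms(FACIL))
```

The order of the checks mirrors the text: lexico-morphologic homonymy is the most general case, and the other two kinds apply only when the wordform sets coincide completely.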
Resolution of all these kinds of homonymy is performed by the human listener or reader according to the context of the wordform or based on the extralinguistic situation in which this form is used. In general, the reader or listener does not even take notice of any ambiguity. The corresponding mental operations are immediate and very effective. However, resolution of such ambiguity by computer requires sophisticated methods.
In common opinion, the resolution of homonymy (and ambiguity in general) is one of the most difficult problems of computational linguistics and must be dealt with as an essential and integral part of the language-understanding process.
Without automatic homonymy resolution, all the attempts to automatically “understand” natural language will be highly error-prone and have rather limited utility.
MORE ON HOMONYMY
In the field of computational linguistics, homonymous lexemes usually form separate entries in dictionaries. Linguistic analyzers must resolve the homonymy automatically, by choosing the correct option among those described in the dictionary.
For formal distinguishing of homonyms, their description in conventional dictionaries is usually divided into several subentries. The names of lexical homonyms are supplied with the indices (numbers) attached to the words in their standard dictionary form, just as we do it in this book. Of course, in text generation, when the program compiles a text containing such words, the indices are eliminated.
Purely lexical homonymy is perhaps the most difficult to resolve, since at the morphologic stage of text processing it is impossible to determine which homonym is the correct one in the given context. Since morphologic considerations are useless, it is necessary to process the hypotheses about several homonyms in parallel.
Concerning similarity of meaning of different lexical homonyms, various situations can be observed in any language. In some cases, such homonyms have no elements of meaning in common at all, like the Spanish REAL1 ‘real’ and REAL2 ‘royal.’ In other cases, the intersection of meaning is obvious, as in QUERER2 ‘to like’ and QUERER3 ‘to love,’ or CLASIFICACIÓN1 ‘process of classification’ and CLASIFICACIÓN2 ‘result of classification.’ In the latter cases, the relation can be exposed through the decomposition of meanings of the homonymous lexemes. The cases in which meanings intersect are referred to in general linguistics as polysemy.
For theoretical purposes, we can refer to the whole set of homonymous lexemes connected in their meaning as a vocable. For example, we may introduce the vocable {QUERER1, QUERER2, QUERER3}. Alternatively, we can treat them as a single united lexeme, which is then called a polysemic one.
In computational linguistics, the intricate semantic structures of various lexemes are usually ignored. Thus, similarity in meaning is ignored too.
Nevertheless, for purely technical purposes, sets of any homonymous lexemes, no matter whether they are connected in meaning or not, can be considered. They might be referred to as pseudo-vocables. For example, the pseudo-vocable REAL = {REAL1, REAL2} can be introduced.
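Since lexical homonyms are named by attaching numeric indices to the dictionary form, pseudo-vocables can be built by simply grouping lexeme names with the index stripped. This is a small sketch under that naming assumption; base_name and pseudo_vocables are hypothetical helper names.

```python
import re
from collections import defaultdict


def base_name(lexeme_name):
    """Strip the trailing numeric index: "REAL2" becomes "REAL"."""
    return re.sub(r"\d+$", "", lexeme_name)


def pseudo_vocables(lexeme_names):
    """Group homonymous lexemes by name, ignoring any relation in meaning."""
    groups = defaultdict(list)
    for name in lexeme_names:
        groups[base_name(name)].append(name)
    # A lexeme without homonyms does not form a pseudo-vocable.
    return {base: members for base, members in groups.items() if len(members) > 1}


names = ["REAL1", "REAL2", "QUERER1", "QUERER2", "QUERER3", "FÁCIL"]
print(pseudo_vocables(names))
```

The same base_name helper corresponds to the elimination of indices in text generation mentioned earlier.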
A more versatile approach to handle polysemy in computational linguistics has been developed in recent years using object-oriented ideas. Polysemic lexemes are represented as one superclass that reflects the common part of their meaning, and a number of subclasses then reflect their semantic differences.
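As a minimal sketch of this object-oriented idea, the two senses of CLASIFICACIÓN can be modeled as subclasses of a class that holds their shared component of meaning. The class names, the attributes common, aspect, and has_plural, and the gloss method are all hypothetical; only the glosses and the remark about the missing plural come from the text.

```python
class Clasificacion:
    """Superclass: the part of meaning shared by all senses of the lexeme."""

    lemma = "clasificación"
    common = "classification"
    aspect = None  # the distinguishing component, supplied by subclasses

    @classmethod
    def gloss(cls):
        """Combine the distinguishing component with the shared one."""
        return f"{cls.aspect} of {cls.common}" if cls.aspect else cls.common


class Clasificacion1(Clasificacion):
    """CLASIFICACIÓN1: the process; it has no plural form."""

    aspect = "process"
    has_plural = False


class Clasificacion2(Clasificacion):
    """CLASIFICACIÓN2: the result; the plural clasificaciones exists."""

    aspect = "result"
    has_plural = True


for sense in Clasificacion.__subclasses__():
    print(sense.__name__, "->", sense.gloss())
```

A disambiguation procedure could then work with the superclass as long as the context does not force a choice, and commit to a subclass only when it does.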
A serious complication for computational linguistics is that new senses of old words are constantly being created in natural language. The older words are used in new meanings, for new situations or in new contexts. It has been observed that natural language has the property of self-enrichment and thus is very productive.
The ways in which language is enriched are rather numerous; the main ones are the following:
· A former lexeme is used in a metaphorical way. For example, numerous nouns denoting a process are used in many languages to refer also to a result of this process (cf. Spanish declaración, publicación, interpretación, etc.). The semantic proximity is thus exploited. For another example, the Spanish word estética ‘esthetics’ has rather recently acquired the meaning of a hairdressing salon in Mexico. Since highly professional hairdressing really achieves esthetic goals, the semantic proximity is also evident here. The problem of resolution of metaphorical homonymy has been a topic of much research [51].
· A former lexeme is used in a metonymical way. Some proximity in place, form, configuration, function, or situation is used for metonymy. As the first example, the Spanish words lentes ‘lenses,’ espejuelos ‘glasses,’ and gafas ‘hooks’ are used in the meaning ‘spectacles.’ Thus, a part of a thing gives its name to the whole thing. As the second example, in many languages the name of an organization with a stable residence can be used to designate its seat: Ha llegado a la universidad means that the person arrived at the building or the campus of the university. As the third example, the Spanish word pluma ‘feather’ is also used as ‘pen.’ As recently as the middle of the nineteenth century, feathers were used for writing, and the newly invented writing tool kept, in several languages, the name of its functional predecessor.
· A new lexeme is borrowed from a foreign language. Meanwhile, the former, “native,” lexeme can remain in the language, with essentially the same meaning. For example, English adopted the Russian word sputnik in 1957, but the term artificial satellite is used as before.
· Commonly used abbreviations become common words, losing their marking by uppercase letters. For example, the Spanish words sida and ovni are now used more frequently than their synonymous counterparts síndrome de inmunodeficiencia adquirida and objeto volante no identificado.
One can see that metaphors, metonymies, loans, and former abbreviations broaden both homonymy and synonymy of language.
Returning to the description of all possible senses of homonymous words, we should admit that this problem does not have an agreed solution in lexicography. This can be seen by comparing any two large dictionaries. Below are two entries for the same Spanish lexeme estante ‘rack/bookcase/shelf,’ one taken from the Dictionary of Anaya group [22] and the other from the Dictionary of Royal Academy of Spain (DRAE) [23].
estante (in Anaya Dictionary)
1. m. Armario sin puertas y con baldas.
2. m. Balda, anaquel.
3. m. Cada uno de los pies que sostienen la armadura de algunas máquinas.
4. adj. Parado, inmóvil.
estante (in DRAE)
1. a. p. us. de estar. Que está presente o permanente en un lugar. Pedro, ESTANTE en la corte romana.
2. adj. Aplícase al ganado, en especial lanar, que pasta constantemente dentro del término jurisdiccional en que está amillarado.
3. Dícese del ganadero o dueño de este ganado.
4. Mueble con anaqueles o entrepaños, y generalmente sin puertas, que sirve para colocar libros, papeles u otras cosas.
5. Anaquel.
6. Cada uno de los cuatro pies derechos que sostienen la armadura del batán, en que juegan los mazos.
7. Cada uno de los dos pies derechos sobre que se apoya y gira el eje horizontal de un torno.
8. Murc. El que en compañía de otros lleva los pasos en las procesiones de Semana Santa.
9. Amér. Cada uno de los maderos incorruptibles que, hincados en el suelo, sirven de sostén al armazón de las casas en las ciudades tropicales.
10. Mar. Palo o madero que se ponía sobre las mesas de guarnición para atar en él los aparejos de la nave.
One does not need to know Spanish to realize that the examples of the divergence in these two descriptions are numerous.
Some homonyms in a given language are translated into another language by non-homonymous lexemes, like the Spanish antigüedad.
In other cases, a set of homonyms in a given language is translated into a similar set of homonyms in the other language, like the Spanish plato when translated into the English dish (two possible interpretations are ‘portion of food’ and ‘kind of crockery’).
Thus, bilingual considerations sometimes help to find homonyms and to distinguish their meanings, though the main attention should be given to the inner facts of the given language.
It can be concluded that synonymy and homonymy are important and unavoidable properties of any natural language. They bring many serious problems into computational linguistics, especially homonymy.
Classical lexicography can help to define these problems, but resolving them during analysis is the task of computational linguistics.
MULTISTAGE CHARACTER OF THE MEANING ⇔ TEXT TRANSFORMER
FIGURE IV.10. Levels of linguistic representation.
The ambiguity of the Meaning ⇔ Text mapping in both directions, as well as the complicated structure of entities on both ends of the Meaning ⇔ Text transformer, make it impossible to study this transformer without dividing the process of transformation into several sequential stages.
The existence of such stages in natural language is acknowledged by many linguists. In this way, intermediate levels of representation of the information under processing are introduced (see Figure IV.10), as well as partial transformers for the transition from one level to an adjacent one (see Figure IV.11).
Two intermediate levels are commonly accepted by linguists, with small differences in the definitions, namely the morphologic and syntactic ones.
FIGURE IV.11. Stages of transformation.
In fact, classical general linguistics laid the basis for such a division before any modern research. We will study these levels later in detail.
Other intermediate levels are often introduced by some linguists, so that the partial transformers themselves are divided into sub-transformers. For example, a surface and a deep syntactic level are introduced for the syntactic stage, and a deep and a surface morphologic level for the morphologic one.
Thus, we can imagine language as a multistage, or multilevel, Meaning ⇔ Text transformer (see Figure IV.12).
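The multistage view can be sketched as a chain of partial transformers, each accepting a representation at one level and producing one at the adjacent level. The skeleton below only tracks which levels are visited; make_stage, build_pipeline, and the dictionary layout of a representation are hypothetical, and real partial transformers would of course do linguistic work instead of relabeling.

```python
LEVELS = ["text", "morphologic", "syntactic", "semantic"]


def make_stage(source, target):
    """Build a placeholder partial transformer between two adjacent levels."""

    def stage(rep):
        if rep["level"] != source:
            raise ValueError(f"expected {source} level, got {rep['level']}")
        # A real stage would transform the content; here only the level changes.
        return {"level": target, "history": rep["history"] + [target]}

    return stage


def build_pipeline(levels):
    """Compose the partial transformers for each pair of adjacent levels."""
    stages = [make_stage(a, b) for a, b in zip(levels, levels[1:])]

    def run(rep):
        for stage in stages:
            rep = stage(rep)
        return rep

    return run


analyze = build_pipeline(LEVELS)           # Text to Meaning
synthesize = build_pipeline(LEVELS[::-1])  # Meaning to Text

print(analyze({"level": "text", "history": ["text"]})["history"])
print(synthesize({"level": "semantic", "history": ["semantic"]})["history"])
```

Finer divisions, such as separate surface and deep syntactic levels, would amount to inserting further names into LEVELS.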
The knowledge necessary for each level of transformation is represented in computer dictionaries and computer grammars (see Figure IV.13). A computer dictionary is a collection of information on each word, and thus it is the main knowledge base of a text processing system. A computer grammar is a set of rules based on common properties of large groups of words. Hence, the grammar rules are equally applicable to many words.
Since the information stored in the dictionaries for each lexeme is specified for each linguistic level separately, program developers often distinguish a morphologic dictionary that specifies the morphologic information for each word, a syntactic dictionary, and a semantic dictionary, as in Figure IV.13.
FIGURE IV.12. Interlevel processing.
Alternatively, all information can be represented in one dictionary, giving for each lexeme all the necessary data. In this case, the dictionary entry for each lexeme has several zones that give the properties of this lexeme at the given linguistic level, i.e., a morphologic zone, a syntactic zone, and a semantic zone.
Clearly, these two representations of the dictionary are logically equivalent.
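The equivalence can be made concrete by converting between the two layouts and checking that nothing is lost. In this sketch the function names merge_dictionaries and split_dictionary, as well as the field names inside the entries, are hypothetical, and the sample entries are illustrative only.

```python
def merge_dictionaries(by_level):
    """Per-level dictionaries -> one dictionary whose entries have zones."""
    merged = {}
    for level, entries in by_level.items():
        for lexeme, info in entries.items():
            merged.setdefault(lexeme, {})[level] = info
    return merged


def split_dictionary(merged):
    """One dictionary with zones -> separate dictionaries, one per level."""
    by_level = {}
    for lexeme, zones in merged.items():
        for level, info in zones.items():
            by_level.setdefault(level, {})[lexeme] = info
    return by_level


# Illustrative entries; the field names are assumptions, not a real format.
by_level = {
    "morphologic": {
        "REAL1": {"wordforms": ["real", "reales"]},
        "REAL2": {"wordforms": ["real", "reales"]},
    },
    "syntactic": {
        "REAL1": {"part_of_speech": "adjective"},
        "REAL2": {"part_of_speech": "adjective"},
    },
    "semantic": {
        "REAL1": {"gloss": "real"},
        "REAL2": {"gloss": "royal"},
    },
}

merged = merge_dictionaries(by_level)
print(merged["REAL2"])
print(split_dictionary(merged) == by_level)  # True
```

The choice between the two layouts is therefore a matter of engineering convenience, for example whether each processing stage should load only the information it needs.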
According to Figure IV.13, the information about lexemes is distributed among several linguistic levels. In Text, there are only wordforms. In analysis, lexemes as units under processing appear at the morphologic level. Then they are used at the surface and deep syntactic levels and finally disappear at the semantic level, giving up their places to semantic elements. The latter elements preserve the meaning of lexemes, but are devoid of their purely grammatical properties, such as part of speech or gender. Hence, we can conclude that there is no level in the Text ⇒ Meaning transformer that could be called lexical.