EMPIRICAL VERSUS RATIONALIST APPROACHES

In recent years, interest in the empirical approach to linguistic research has revived. The empirical approach is based on numerous statistical observations gathered purely automatically; hence, it can also be called the statistical approach. It is opposed to the rationalist approach, which requires constructing a functional model of language on the basis of texts and the researcher’s intuition. Throughout this book, we explain only the rationalist approach, both in the variant of generative grammar and in that of the MTT.

The empirical approach is most easily illustrated by the example of machine translation. A huge bilingual corpus is taken, i.e., two very long texts in two different languages that are equal in meaning and arranged in parallel. Statistics are gathered on text fragments occurring at nearly equal places on the opposite sides of the bilingual corpus. An attempt is then made to learn how, for any fragment in one language (including those not yet encountered in the corpus), to find a fragment in the other language that is equivalent to it in meaning. The solution would give the translation of any text by the empirical method.
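
The gathering of statistics described above can be sketched in a few lines of Python. This is only a toy illustration under strong assumptions that are ours, not part of any real system: the corpus is a handful of invented sentence pairs, the fragments are single words, and the Dice coefficient is chosen arbitrarily as the measure of association. The names PARALLEL, cooccurrence_stats, and best_match are hypothetical.

```python
from collections import Counter
from itertools import product

# Toy parallel corpus (invented for illustration): each pair holds
# two sentences assumed to be equal in meaning.
PARALLEL = [
    ("the house", "la casa"),
    ("the book", "el libro"),
    ("a house", "una casa"),
    ("a book", "un libro"),
]

def cooccurrence_stats(pairs):
    """Count in how many sentence pairs each word, and each
    cross-language pair of words, occurs."""
    src_freq, tgt_freq, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words = set(src.split())
        tgt_words = set(tgt.split())
        src_freq.update(src_words)
        tgt_freq.update(tgt_words)
        joint.update(product(src_words, tgt_words))
    return src_freq, tgt_freq, joint

def best_match(word, stats):
    """Return the target word with the highest Dice coefficient,
    or None if the source word was never seen in the corpus."""
    src_freq, tgt_freq, joint = stats
    scores = {
        t: 2 * joint[(s, t)] / (src_freq[s] + tgt_freq[t])
        for (s, t) in joint
        if s == word
    }
    return max(scores, key=scores.get) if scores else None

stats = cooccurrence_stats(PARALLEL)
print(best_match("house", stats))  # casa
print(best_match("book", stats))   # libro
```

Note that the sketch returns nothing for a word absent from the corpus, which is exactly the hard case of fragments "not yet encountered" mentioned above.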

It can be seen that such a method unites the two types of models given above: research and functional ones. It is also obvious that it is impossible to accumulate statistics in general, without elaborating some definitions and specifications.

It is first necessary to determine the size of the fragments to be compared, what “nearly equal” places are, and what the equivalence (or rather quasi-equivalence) of fragments in the two parallel texts is. Hence, answering these questions requires some elements of a rationalist model, as was explained everywhere above.
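
To make these decisions concrete, the following Python sketch exposes two of them as explicit parameters. It rests on our own assumptions: a fragment is a run of `size` consecutive words, and two places are "nearly equal" when their relative positions in the two sentences differ by no more than `tolerance`. The function names and the example sentences are hypothetical.

```python
def fragments(tokens, size):
    """Yield (relative position, fragment) for every contiguous
    fragment of `size` tokens; positions are scaled to 0..1."""
    count = len(tokens) - size + 1
    for i in range(max(count, 0)):
        rel_pos = i / max(count - 1, 1)
        yield rel_pos, tuple(tokens[i:i + size])

def candidate_pairs(src_tokens, tgt_tokens, size=1, tolerance=0.25):
    """Yield fragment pairs standing at 'nearly equal' places,
    i.e., whose relative positions differ by at most `tolerance`."""
    for src_pos, src_frag in fragments(src_tokens, size):
        for tgt_pos, tgt_frag in fragments(tgt_tokens, size):
            if abs(src_pos - tgt_pos) <= tolerance:
                yield src_frag, tgt_frag

src = "the old house".split()
tgt = "la casa vieja".split()

strict = list(candidate_pairs(src, tgt, size=1, tolerance=0.0))
loose = list(candidate_pairs(src, tgt, size=1, tolerance=0.5))
print(len(strict), len(loose))  # 3 7
```

With zero tolerance, only words at identical positions are compared, so the pair "house" and "casa" is never even considered because the word order differs. A looser tolerance admits it, at the price of more spurious candidates. Choosing these values is already a modeling decision rather than a purely automatic observation.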

It is difficult to deny the need for statistical observations in computational linguistics. In particular, any rationalist model should take into account those linguistic phenomena (lexemes, syntactic constructions, etc.) that occur in texts most frequently. That is why we will devote some space in this book to statistical methods in computational linguistics.

As for the empirical method just outlined, its real successes are still unknown.

A common feature of the rationalist and empirical methods is that both presuppose natural language to be cognizable and algorithmizable. Linguists and philosophers sometimes suggest the opposite point of view. They argue that since human beings usually reason without any limitations of logic, their language activity may also lack a logical and algorithmic basis.

As applied to natural language, however, this pessimistic viewpoint contradicts everyday human practice, as well as the practice of modern computational linguistics. Humans can master a new natural language at any age and in a rather rational manner, whereas computational linguistics has already mastered some features of natural language, and the process of mastering is going on. Thus, we may neglect this pessimistic point of view.