
Scientific report: Applications of the Theory of Clumps

Pages: 15      File type: pdf      Size: 268.36 KB

[Mechanical Translation and Computational Linguistics, vol.8, nos.3 and 4, June and October 1965]

Applications of the Theory of Clumps*

by R. M. Needham, Cambridge Language Research Unit, Cambridge, England

The paper describes how the need for automatic aids to classification arose in a manual experiment in information retrieval. It goes on to discuss the problems of automatic classification in general, and to consider various methods that have been proposed. The definition of a particular kind of class, or "clump," is then put forward. Some programming techniques are indicated, and the paper concludes with a discussion of the difficulties of adequately evaluating the results of any automatic classification procedure.

The C.L.R.U. Information Retrieval Experiment

Since the work on classification and grouping now being carried out at the C.L.R.U. arose out of the Unit's original information retrieval experiment, I shall describe this experiment briefly. The Unit's approach represented an attempt to combine descriptors and uniterms. Documents in the Unit's research library of offprints were indexed by their most important terms or keywords, and these were then arranged in a multiple lattice hierarchy. The inclusion relation in this system was interpreted, very informally, as follows: term A includes term B if, when you ask for a document containing A, you do not mind getting one containing B. A particular term could be subsumed under as many others as seemed appropriate, so that the system contained meets as well as joins, that is, was a lattice as opposed to a tree, for example as follows:

[Figure: example lattice of index terms; not included in this extract]

... the terms that included the term in question. The document numbers were also punched on all the cards for the terms including the terms derived from the document, and for the terms including these terms and so on.

In retrieval, the cards for the terms in the request were superimposed, so that any document containing all of them would be identified. If there was no immediate output, a "scale of relevance" procedure could be used, in which successive terms above a given term are brought down, and with them, all the terms that they include. In replacing D by C, for example, we are saying that documents containing B, E and F as well as C are relevant to our request (we pick up this information because the numbers for the documents containing B, E, and F are punched on the card for C, as well as those for documents containing C itself). Where a request contained a number of terms, there was a step-by-step rule for bringing down the sets of higher-level terms, though the whole operation of the retrieval system could be modified to suit the user's requirements if appropriate. The system seemed to work reasonably well when ...
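The punched-card retrieval scheme described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reconstruction of the C.L.R.U. system. The term lattice and the document index are invented, because the paper's own lattice figure is not reproduced in this extract. The function names `including_terms`, `punch_cards`, `superimpose` and `bring_down` are hypothetical. A set of document numbers stands in for the holes punched on each term's card, and intersecting those sets stands in for superimposing the cards. Term D is given two parents, C and G, so the hierarchy is a lattice rather than a tree.

```python
"""Toy model of a punched-card lattice retrieval scheme (hypothetical data)."""

# Hypothetical term lattice: each term maps to the terms immediately above it,
# i.e. the terms that include it. D sits under both C and G, so the hierarchy
# is a lattice rather than a tree. This is NOT the paper's own figure.
PARENTS: dict[str, set[str]] = {
    "A": set(),
    "C": {"A"},
    "G": {"A"},
    "H": set(),
    "B": {"C"},
    "D": {"C", "G"},
    "E": {"C"},
    "F": {"C"},
}

# Hypothetical index: document number -> keywords assigned to that document.
DOCS: dict[int, set[str]] = {
    1: {"B", "H"},
    2: {"D"},
    3: {"E", "H"},
    4: {"C"},
    5: {"F", "G"},
}


def including_terms(term: str, parents: dict[str, set[str]]) -> set[str]:
    """Return every term above `term`, following the lattice transitively."""
    seen: set[str] = set()
    stack = list(parents[term])
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents[t])
    return seen


def punch_cards(docs: dict[int, set[str]],
                parents: dict[str, set[str]]) -> dict[str, set[int]]:
    """Build one 'card' per term: the set of document numbers punched on it.

    A document's number goes on the card of each of its own terms and on the
    card of every term that includes them, directly or indirectly.
    """
    cards: dict[str, set[int]] = {term: set() for term in parents}
    for number, terms in docs.items():
        for term in terms:
            for t in {term} | including_terms(term, parents):
                cards[t].add(number)
    return cards


def superimpose(cards: dict[str, set[int]], request: set[str]) -> set[int]:
    """Superimpose the cards for a non-empty request: keep only the document
    numbers punched on every card (a set intersection)."""
    return set.intersection(*(cards[t] for t in request))


def bring_down(request: set[str], term: str, higher: str,
               parents: dict[str, set[str]]) -> set[str]:
    """One 'scale of relevance' step: replace `term` by a term above it."""
    if higher not in including_terms(term, parents):
        raise ValueError(f"{higher!r} does not include {term!r}")
    return (set(request) - {term}) | {higher}


if __name__ == "__main__":
    cards = punch_cards(DOCS, PARENTS)
    request = {"D", "H"}
    print(superimpose(cards, request))           # set(): no immediate output
    wider = bring_down(request, "D", "C", PARENTS)
    print(sorted(superimpose(cards, wider)))     # [1, 3]
```

With this toy data the request {D, H} gives no immediate output. Bringing D down to C retrieves documents 1 and 3. Those documents are indexed by B and E, and their numbers were punched on C's card. The paper's step-by-step rule for choosing which terms to bring down in a multi-term request is not given in this extract, so the sketch leaves that choice to the caller.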
