IRMA-International.org: Creator of Knowledge
Information Resources Management Association
Advancing the Concepts & Practices of Information Resources Management in Modern Organizations

Incremental Indexing and Its Evaluation for Full Text Search

Author(s): Hiroshi Yamamoto (Hitachi, Ltd., Japan), Seishiro Ohmi (Hitachi, Ltd., Japan) and Hiroshi Tsuji (Hitachi, Ltd., Japan)
Copyright: 2003
Pages: 3
Source title: Information Technology & Organizations: Trends, Issues, Challenges & Solutions
Source Editor(s): Mehdi Khosrow-Pour, D.B.A. (Information Resources Management Association, USA)
DOI: 10.4018/978-1-59140-066-0.ch180
ISBN13: 9781616921248
EISBN13: 9781466665330

Abstract

The N-gram indexing method, in which each index entry consists of N consecutive characters, is the most popular algorithm for Japanese full text search systems [1][2]. In English full text search systems, indices are based on words, each of which consists of N characters. In Japanese full text search systems, indices are based not on words but on grams (characters) [3][4][6][7]. In general, such systems use a 2-gram index to limit the size of the index file, even though many words consist of more than three consecutive characters and some two-character sequences are meaningless as search terms [3][8]. In short, 2-grams extracted from the target documents are used uniformly as the indices for full text search. The advantage of the N-gram indexing method is that it avoids false drops, because the indices are built uniformly from 2-grams extracted from the target documents. The disadvantage is lower search efficiency, because an index that is used often in searching is created in the same way as an index that is rarely used: both are based equally on 2-grams. To improve the performance of 2-gram based full text search systems, this paper presents a supplemental indexing algorithm called the incremental word indexing method. The basic idea is that words used frequently in search terms should be indexed. With the incremental word indexing method, indices based on words used frequently in search terms are added to the uniform 2-gram based indices, so the method retains the advantage of avoiding false drops. Searches for such words can then use the supplemental word indices instead of the uniform 2-gram based indices, which improves search performance. Consequently, if the words used in search terms can be identified, search performance can be improved efficiently.
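
To make the idea in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It builds a uniform 2-gram index with character positions, answers a query by checking that the query's 2-grams occur at consecutive positions (so no false drops arise), and adds a supplemental word index entry once a term has been searched often enough. The class name `BigramIndex`, the `promote_after` threshold, and the promotion policy are all assumptions introduced for illustration; the abstract does not specify how frequently used words are chosen.

```python
from collections import Counter, defaultdict


def bigrams(text):
    """Return (position, 2-gram) pairs for every pair of adjacent characters."""
    return [(i, text[i:i + 2]) for i in range(len(text) - 1)]


class BigramIndex:
    """Uniform 2-gram index plus a supplemental index of frequent query terms.

    Hypothetical sketch: the promotion rule (promote_after) is an assumption.
    """

    def __init__(self, docs, promote_after=2):
        self.docs = docs                      # doc_id -> text
        self.promote_after = promote_after    # queries needed before promotion
        self.query_counts = Counter()         # term -> number of searches so far
        self.words = {}                       # supplemental index: term -> doc_ids
        # Uniform index: 2-gram -> doc_id -> set of start positions.
        self.grams = defaultdict(lambda: defaultdict(set))
        for doc_id, text in docs.items():
            for pos, gram in bigrams(text):
                self.grams[gram][doc_id].add(pos)

    def _search_bigrams(self, term):
        """Find documents containing term using only the 2-gram index."""
        if len(term) < 2:
            # A single character has no 2-gram; fall back to a plain scan.
            return {d for d, t in self.docs.items() if term in t}
        hits = set()
        for doc_id, starts in self.grams.get(term[:2], {}).items():
            for start in starts:
                # Every later 2-gram of the term must follow at the next offset.
                if all(
                    start + k in self.grams.get(term[k:k + 2], {}).get(doc_id, ())
                    for k in range(1, len(term) - 1)
                ):
                    hits.add(doc_id)
                    break
        return hits

    def search(self, term):
        """Return (matching doc_ids, name of the index that answered)."""
        self.query_counts[term] += 1
        if term in self.words:
            return set(self.words[term]), "word"
        hits = self._search_bigrams(term)
        if self.query_counts[term] >= self.promote_after:
            # Frequently searched term: add it to the supplemental word index.
            self.words[term] = set(hits)
        return hits, "2-gram"


docs = {1: "東京都の天気", 2: "京都の観光", 3: "東京の観光"}
index = BigramIndex(docs, promote_after=2)
```

In this sketch, a term such as `"京都"` is answered from the 2-gram index on its first two searches and from the supplemental word index afterwards, with the same result set each time. The sketch assumes a fixed document collection; keeping the supplemental entries current as documents are added is left out.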
