Visual Filtering of Search Results with Document Maps

Document maps are a new visualization tool that help users filter information retrieval search results. Documents are compactly represented by a square matrix that presents document similarity and frequently occurring search items. Evaluation has shown that document maps enable users to filter search results quickly and accurately.

1. Introduction

Document maps are a new visualization tool that help reduce the effects of information overload when using information retrieval (IR) systems. Information overload can occur in IR when users receive a large number of results. Document maps enable users to quickly filter a set of results to reduce or avoid information overload. Document maps show document similarity and frequently occurring search items. Document similarity is useful for showing which groups of documents within the results have a similar content. The frequency of occurrence of search items such as author names, publication years, and keywords, is useful for making quick decisions about other relationships between the results.

2. The Document Map Visualisation

Document maps represent a set of numbered documents as a set of numbered squares. Document maps are similar to statistical maps which divide a country into areas and color those areas to represent quantities such as concentrations of population or disease (Tufte, 1983). The following document maps represent nine documents. The titles of the documents are listed below the document map in the implementation. Document maps are a compact representation that scale well; a 15×14 map is required for 200 documents and only a 32×32 map is required for 1000 documents. The values of each search item are presented next to the document map and can be queried dynamically with alphasliders (Ahlberg & Shneiderman, 1994).

A document map Documents ordered by similarity Four highlighted documents
(a) (b) (c)

Document maps show the similarity between a document selected by the user and the remainder of the documents. A numeric measure of similarity between all the documents was calculated using latent semantic analysis (Deerwester et al, 1990). Document maps visualize the similarity between the selected document and the remainder of the documents with grey scale shading; the darker the grey, the more similar a document is to the selected document. The similarity values are thresholded to five shades of grey that show bands of similar documents more clearly. The document squares are also sorted to group documents that are similar to the selected document. Document map (b) above shows the nine documents ordered by their similarity to document 5. Document 5 is placed at the origin in the top left hand corner. The remaining documents are placed diagonally from top left to bottom right in descending order of similarity.

The frequency of occurrence of search items is shown by filling in each square that represents a document containing the search item. The number of filled squares shows the frequency in document map (c) above.

3. Evaluation

Document maps have been comparatively evaluated with three visual and two textual presentations and the results were promising. The document map users generally completed a set of 10 filtering tasks more quickly and more accurately than the users of the other presentations. Details are provided in Morgan (2000).

4. Further Work

Further work is needed in three areas. First, document maps can only show the similarity between one document and the rest of the documents. Second, in larger maps it can be difficult to quickly differentiate between the frequency of occurrence of search items when the difference in frequency is small. Finally, the amount of cognitive effort required to map between the numbered squares and the numbered documents, and the effect this might have on users, needs to be investigated.

References

  • Ahlberg, C. & Shneiderman, B. (1994), The Alphaslider: A Compact and Rapid Selector, in B. Adelson and S. Dumais & J. Olsen (eds.), Proceedings of CHI ‘94, 365-371.
  • Deerwester S., Dumais, S. T., Furnas, G. W., Landaur, T. K. & Harshman, R. (1990), “Indexing by Latent Semantic Analysis”, Journal of the American Society for Information Science, 41(6), 391-407.
  • Morgan, J. J. (2000) Supporting Information Retrieval System Users by Making Suggestions and Visualising Results, Ph.D. Thesis, Heriot-Watt University.
  • Tufte, E. R. (1983), The Visual Display of Quantitative Information, Graphics Press.