Let's examine a simple indexed data repository that stores the following details for each item:

  • Content (c)
  • An ID associated with the content (id)
  • A list of keywords associated with it (K(id))

    ((c,id),K(id))
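
    The record shape above can be sketched in Python as follows. This is only an illustration: the class name Entry, the field names, and the sample data are assumptions, not part of the original description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One repository record, mirroring ((c, id), K(id))."""
    content: str          # c
    doc_id: str           # id
    keywords: frozenset   # K(id)


# Hypothetical sample data, used only for illustration.
repository = [
    Entry("Notes on B-trees", "d1", frozenset({"index", "tree"})),
    Entry("Notes on hash indexes", "d2", frozenset({"index", "hash"})),
    Entry("Notes on tree traversal", "d3", frozenset({"tree"})),
]

# K as a plain mapping id -> K(id), convenient for computing views.
K = {entry.doc_id: entry.keywords for entry in repository}
```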

As a client of that repository, we might be interested in the following views:

    k[i],document_count(k[i]), where k[i] belongs to K

    This view provides limited knowledge about the topics covered in the content repository: it shows which keywords occur and how many documents each one indexes.

    k[i],document_IDs(k[i])

    This view provides a reverse mapping for content retrieval
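
    Both the document_count and document_IDs views can be derived from a single inverted index. The sketch below assumes K is available as an in-memory mapping from id to keyword set; the sample data and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: id -> K(id)
K = {
    "d1": {"index", "tree"},
    "d2": {"index", "hash"},
    "d3": {"tree"},
}


def document_ids(K):
    """Reverse mapping: keyword -> set of IDs of documents it indexes."""
    index = defaultdict(set)
    for doc_id, keywords in K.items():
        for k in keywords:
            index[k].add(doc_id)
    return dict(index)


def document_count(K):
    """Keyword -> number of documents indexed by that keyword."""
    return {k: len(ids) for k, ids in document_ids(K).items()}
```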

    keyword_proximity(k[i],k[j])

    Since the keyword domain is significantly smaller than the data domain, it is possible to export the full k x k mapping, which tells us about the topics covered in the content repository. The mapping takes the form of the function: keyword_proximity(k[i],k[j]) = log( doccount(k[i],k[j])/totaldoccount )
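
    A minimal sketch of this function, assuming that the two-argument doccount is the number of documents indexed by both keywords and that totaldoccount is the size of the repository. The text does not say what to return when two keywords never co-occur (the logarithm is undefined there); returning negative infinity is an assumption made here.

```python
import math

# Hypothetical sample data: id -> K(id)
K = {
    "d1": {"index", "tree"},
    "d2": {"index", "hash"},
    "d3": {"tree"},
}


def keyword_proximity(K, ki, kj):
    """log(doccount(ki, kj) / totaldoccount) over the repository K."""
    total = len(K)
    # Assumed meaning of doccount: documents indexed by both keywords.
    both = sum(1 for kws in K.values() if ki in kws and kj in kws)
    if both == 0:
        # Assumption: log(0) is treated as negative infinity.
        return float("-inf")
    return math.log(both / total)
```

    Under these assumptions the value is always at most 0, and it gets closer to 0 the more documents share both keywords.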

    document_proximity(id,id[i]) (*maxdist)

    This view provides information about documents related to the selected one. It should only be accessed through a "distance viewport", as it can be quite extensive. The distance viewport is a number determining the radius within which related document IDs are returned. The distance is computed as follows: document_proximity(id,id[i]) = 1 - 2*size( K(id,id[i]) )/( size( K(id) ) + size( K(id[i]) ) ), where K(id,id[i]) is the intersection of K(id) and K(id[i])
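
    The distance and its viewport can be sketched as below. The function names and sample data are hypothetical; treating the radius as inclusive and requiring every document to have at least one keyword (so the denominator is never zero) are assumptions.

```python
# Hypothetical sample data: id -> K(id); every set is assumed non-empty.
K = {
    "d1": {"index", "tree"},
    "d2": {"index", "hash"},
    "d3": {"tree"},
}


def document_proximity(K, a, b):
    """1 - 2*|K(a) & K(b)| / (|K(a)| + |K(b)|); 0 = identical, 1 = disjoint."""
    shared = len(K[a] & K[b])
    return 1 - 2 * shared / (len(K[a]) + len(K[b]))


def related_documents(K, doc_id, maxdist):
    """Distance viewport: other IDs whose distance to doc_id is <= maxdist."""
    result = {}
    for other in K:
        if other == doc_id:
            continue
        distance = document_proximity(K, doc_id, other)
        if distance <= maxdist:  # assumption: the radius is inclusive
            result[other] = distance
    return result
```

    For example, with this data related_documents(K, "d1", 0.4) keeps only "d3", because "d2" lies at distance 0.5 from "d1".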

    keyword_proximity(id[i],id[j]) (*K)

    This view holds an even larger amount of data and should only be accessed through a "key viewport". The key viewport is set via a list of keywords, and consists of all the IDs that are indexed by the keywords in the list. The distance is computed as follows: keyword_proximity(id[i],id[j]) = 1 - 2*size( K(id[i],id[j]) )/( size( K(id[i]) ) + size( K(id[j]) ) ), where K(id[i],id[j]) is the intersection of K(id[i]) and K(id[j])
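
    A sketch of the key viewport follows. The text leaves open whether an ID must be indexed by every keyword in the list or by at least one; this sketch assumes "at least one". The function names and sample data are hypothetical.

```python
from itertools import combinations

# Hypothetical sample data: id -> K(id); every set is assumed non-empty.
K = {
    "d1": {"index", "tree"},
    "d2": {"index", "hash"},
    "d3": {"tree"},
}


def key_viewport(K, keywords):
    """Sorted IDs indexed by at least one keyword in the list (assumption)."""
    wanted = set(keywords)
    return sorted(doc_id for doc_id, kws in K.items() if kws & wanted)


def proximity_in_viewport(K, keywords):
    """Pairwise distances restricted to the IDs inside the key viewport."""
    ids = key_viewport(K, keywords)
    return {
        (a, b): 1 - 2 * len(K[a] & K[b]) / (len(K[a]) + len(K[b]))
        for a, b in combinations(ids, 2)
    }
```

    Restricting the pairs to the viewport keeps the result proportional to the square of the viewport size rather than of the whole repository.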
