Similarity thesauri
This code was used to produce the examples for my talk
"Similarity thesauri and cross-language retrieval". More
information about this talk is on my studies page; a handout is also
available.
The example code was written in Python; it requires Python version 2.2
or higher.
This file contains the following classes:
- Token: Provides the link between items and features.
- Tokenizable: The superclass for all objects that can be decomposed into tokens.
- TokString: The simplest kind of document, consisting of just a string.
- Properties: Derived from UserDict; implements the subsumption order on feature structures as the operators <= and >=.
- Document: Documents can have additional properties, for example their language. Furthermore, they can be composed of other documents.
- IndexComp: The common superclass of Item and Feature. The constructor takes a Properties object as its argument and either returns a previously constructed object with the same properties or constructs a new one.
- Item: Derived from IndexComp without any changes.
- Feature: Derived from IndexComp without any changes.
- IRstruct: Provides the basic functions for IR systems, for example weighting methods and storage of items and features.
- SimThes: Implements the construction of a similarity thesaurus as described in the handout.
- SimThes_CL: Implements a cross-language similarity thesaurus. It changes only the output functions.
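To illustrate what a subsumption order on feature structures expressed through <= and >= might look like, here is a minimal sketch. It is not the original code: it uses Python 3's collections.UserDict rather than the Python 2 UserDict module, it handles only flat (non-nested) structures, and the direction of the order (the more general structure is the smaller one) is an assumption.

```python
from collections import UserDict


class Properties(UserDict):
    """Flat feature structure: a mapping from feature names to atomic values."""

    def __le__(self, other):
        # Assumed convention: self <= other when every feature/value pair
        # of self also occurs in other, i.e. self is at least as general.
        return all(key in other and other[key] == value
                   for key, value in self.items())

    def __ge__(self, other):
        # The mirror image: other is at least as general as self.
        return Properties.__le__(other, self)


general = Properties({"lang": "de"})
specific = Properties({"lang": "de", "kind": "word"})
other = Properties({"lang": "en"})

print(general <= specific)                   # True
print(specific <= general)                   # False
print(general <= other or other <= general)  # False: the two are incomparable
```

Because subsumption is only a partial order, two structures with conflicting values are neither <= nor >= each other, as the last line shows.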
Most classes contain methods .asTeX and .asMP that
produce TeX and MetaPost snippets describing the object.
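The IndexComp constructor described in the class list returns an existing object when one with the same properties has already been built. One way to get that behaviour is to override `__new__` and keep a registry of instances. The sketch below assumes hashable property values, a registry key that includes the class (so an Item and a Feature with identical properties stay distinct), and a hypothetical `props` attribute; none of these details are confirmed by the original code.

```python
class IndexComp:
    """Hands out one shared instance per (class, properties) combination."""

    _instances = {}  # registry shared by IndexComp and its subclasses

    def __new__(cls, props):
        # Assumption: property values are hashable, so the pairs can be frozen.
        key = (cls, frozenset(props.items()))
        obj = cls._instances.get(key)
        if obj is None:
            obj = super().__new__(cls)
            obj.props = dict(props)  # hypothetical attribute name
            cls._instances[key] = obj
        return obj


class Item(IndexComp):
    pass


class Feature(IndexComp):
    pass


first = Item({"word": "haus", "lang": "de"})
second = Item({"lang": "de", "word": "haus"})
print(first is second)                                    # True
print(first is Feature({"word": "haus", "lang": "de"}))   # False
```

Sharing instances this way means two objects can be compared by identity, and anything stored on an item or feature is visible wherever that item or feature is looked up again.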
This file contains the documents used to construct the examples in
the handout.
This file constructs two similarity thesauri from the documents in
docs.py and writes the corresponding TeX and MetaPost snippets
to files.
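For readers without the handout, the general idea behind a similarity thesaurus can be sketched as follows: terms are represented as vectors over the documents they occur in, and two terms count as similar when those vectors point in similar directions. The code below is only an illustration under stated assumptions. It uses plain term frequencies with length normalisation instead of the weighting described in the handout, whitespace tokenisation, and hypothetical function names; it does not use the classes listed above.

```python
import math
from collections import Counter, defaultdict


def term_vectors(docs):
    """Map each term to a unit-length vector over document ids."""
    vectors = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(text.lower().split()).items():
            vectors[term][doc_id] = float(count)
    for vec in vectors.values():
        norm = math.sqrt(sum(w * w for w in vec.values()))
        for doc_id in vec:
            vec[doc_id] /= norm
    return dict(vectors)


def similarity(u, v):
    """Inner product of two sparse term vectors."""
    return sum(w * v[doc_id] for doc_id, w in u.items() if doc_id in v)


def similar_terms(vectors, term, n=3):
    """Return the n terms most similar to `term`, best first."""
    scores = [(similarity(vectors[term], vec), other)
              for other, vec in vectors.items() if other != term]
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(other, score) for score, other in scores[:n]]


# Toy documents, invented for this example.
docs = {"d1": "cat dog", "d2": "cat dog", "d3": "fish"}
vectors = term_vectors(docs)
print(similar_terms(vectors, "cat", n=1)[0][0])  # dog
```

In this toy collection "cat" and "dog" occur in exactly the same documents, so they are each other's closest neighbours, while "fish" shares no document with either and gets a similarity of zero.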
Copyright © 1999–2004 Sebastian Marius Kirsch (webmaster@sebastian-kirsch.org), all rights reserved.
Id: index.wml,v 1.3 2004/05/26 10:05:29 skirsch Exp