Please let me know if you find any errors in the following information.
| Evaluation | |
| trec_eval |
trec_eval: the software for doing IR system evaluation. Available versions: trec_eval.8.1.tar.gz, trec_eval.8.0.tar.gz, trec_eval.7.3.tar.gz, trec_eval.7.0beta, trec_eval.v3beta, trec2_eval, trec_eval_hp, trec1_eval. Links: |
| 3 IR tools tested or in use within the RIM team | |
| Zettair |
Zettair (team site, tool site). Previously known as lucy. Zettair is a small set of programs written in C for text indexing and retrieval. Comment: The index format is very easy to understand. It is also easy to add your own weighting scheme. The straightforward programming style makes it easy to add other features (in indexing, for instance). |
| mg |
mg (tool site, book site). MG is an open-source system for compressing, indexing, and retrieving text, images, and textual images. It is written in C. Development was discontinued in August 1999. Comment: The book does not help much in understanding the software, but it is nonetheless a very good book on both compression and information retrieval. The software is more difficult to extend than Zettair because it makes heavy use of complex macros to handle the compression features. We nevertheless succeeded in making some extensions by inserting our own code at a few key points, in both the indexing and the querying phases. However, it is very difficult to write new code that accesses the index directly (again because of the complex compression mechanisms in use). Links:
|
| smart |
smart (tool site). Smart implements the basic vector model of information retrieval and makes it possible to experiment with different weighting schemes. It is written in C. Development was discontinued in 1992. Comment: Not easy to install. The configuration mechanism is difficult to understand, and the configuration process is error-prone. Some (badly) documented features do not actually work. Extensions that fit well within the vector model are not too difficult, but it is nearly impossible to add others. Links: Because this software is difficult to use and its internal documentation is poor, here are some links on how to use it.
|
| 8 software packages not tested | |
Cheshire (tool site). Language: mainly C. Most recently modified file: 2005-01-13 in V2.41. A Next-Generation Online Catalog and Full-Text Information Retrieval System. |
|
DataparkSearch Engine (tool site). Language: C. Most recently modified file: 2005-12-01 in V4.35. DataparkSearch Engine is a full-featured, open-source, web-based search engine released under the GNU General Public License and designed to organize search within a website, a group of websites, an intranet, or a local system. | |
|
Lemur (tool site). Language: C++. The Lemur Toolkit for Language Modeling and Information Retrieval. | |
|
Lucene (tool site). Language: Java. Most recently modified file: 2004-11-29 in V1.4.3. Lucene is a high-performance, full-featured text search engine library written entirely in Java. | |
|
Senga (tool site). Language: mainly C++. Senga is a development group focused on information retrieval software.
| |
|
Terrier (tool site). Language: Java. Last version: 1.0.2. Most recently modified file: 2005-03-17 in V1.0.2. Terrier is software for the rapid development of Web, intranet, and desktop search engines. More generally, it is a modular platform for the rapid development of large-scale information retrieval applications, providing indexing and retrieval functionality. | |
|
Wumpus (tool site). Language: C++. Most recently modified file: 2005-11-30 in V2005-11-30. Wumpus is an information retrieval system. Its main purpose is to study issues that arise in the context of indexing dynamic text collections in multi-user environments. | |
|
Xapian (tool site). Language: C++. Most recently modified file: 2005-07-15 in V0.9.2. Xapian is an Open Source Probabilistic Information Retrieval library, released under the GPL. Features: ranked probabilistic search, relevance feedback, phrase and proximity searching, structured boolean search operators, and stemming (Danish, Dutch, English, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, and Swedish). | |
| 1 library not tested | |
| bow |
Bow (library site). Bow (or libbow) is a library of C code useful for writing statistical text analysis, language modeling, and information retrieval programs. The current distribution includes the library, as well as front-ends for document classification (rainbow), document retrieval (arrow), and document clustering (crossbow). |