Web Technologies Laboratory
Within GAST group

ITACA evaluation


The Itaca dataset is designed for evaluating web search when word sense disambiguation is applied. It consists of 20 queries, each with a set of terms disambiguated with Wikipedia entries. The dataset comprises 3 files in which each row is terminated by a newline and fields are separated by a space. The files are described below.


The first file contains the query ID and the query string:

ID String
1 iphone features
2 ipad features
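
Because the query string itself contains spaces, a reader of this file has to split each row on the first space only. The following is a minimal Python sketch of that idea, not part of the dataset. The function name `parse_id_text_rows` is hypothetical, and the sketch assumes that a header row such as "ID String" may or may not be present in the file, so it skips any row whose first field is not a number.

```python
def parse_id_text_rows(lines):
    """Parse rows of the form '<query ID> <free text>' into a dict."""
    rows = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank rows
        # Split on the first space only: the remaining text may contain spaces.
        query_id, _, text = line.partition(" ")
        if not query_id.isdigit():
            continue  # assumed header row, e.g. "ID String"
        rows[int(query_id)] = text
    return rows


# Sample rows taken from the description above.
sample = ["ID String", "1 iphone features", "2 ipad features"]
queries = parse_id_text_rows(sample)
print(queries)  # {1: 'iphone features', 2: 'ipad features'}
```

The same function should also work for any other file whose rows consist of a query ID followed by free text.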


The second file contains the query ID and a description of the query's goal:

ID Goal
1 General features of an Iphone.
2 General features of an Ipad.
3 Information about the Journal Citation Reports publication.


The third file contains the term ID (formed by the query ID and the term number), the term itself, and the URL of the Wikipedia concept that the term refers to:

ID Term Disambiguation
1.1 iphone http://en.wikipedia.org/wiki/Iphone
1.2 features http://en.wikipedia.org/wiki/Feature_(software_design)
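
As an illustration of how the term ID decomposes into a query ID and a term number, here is a minimal Python sketch for reading these rows. The names `TermRow` and `parse_term_rows` are hypothetical. The sketch assumes that the first field is always the term ID, that the last field is always the URL, and that anything in between is the term; rows that do not fit this shape (such as a header row) are skipped.

```python
from typing import NamedTuple


class TermRow(NamedTuple):
    query_id: int
    term_number: int
    term: str
    url: str


def parse_term_rows(lines):
    """Parse rows of the form '<query>.<term no.> <term> <Wikipedia URL>'."""
    rows = []
    for line in lines:
        fields = line.strip().split(" ")
        if len(fields) < 3:
            continue  # blank or malformed row
        term_id, url = fields[0], fields[-1]
        term = " ".join(fields[1:-1])  # assumption: everything in between is the term
        query_part, _, term_part = term_id.partition(".")
        if not (query_part.isdigit() and term_part.isdigit()):
            continue  # assumed header row, e.g. "ID Term Disambiguation"
        rows.append(TermRow(int(query_part), int(term_part), term, url))
    return rows


# Sample rows taken from the description above.
sample = [
    "ID Term Disambiguation",
    "1.1 iphone http://en.wikipedia.org/wiki/Iphone",
    "1.2 features http://en.wikipedia.org/wiki/Feature_(software_design)",
]
terms = parse_term_rows(sample)
for row in terms:
    print(row.query_id, row.term_number, row.term, row.url)
```

Keeping the query ID and term number as separate integers makes it straightforward to group the disambiguated terms by query.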


Author: Damaris Fuentes-Lorenzo
The author cannot guarantee the correctness of the data, its suitability for any particular purpose, or the validity of results based on the use of the data set, and the author assumes no responsibility for the content, legality, reliability, or accuracy of the data.
The data set may be used for any research purposes.
Please acknowledge the use of the data set in publications resulting from the use of the data set:
Damaris Fuentes-Lorenzo. (2011). Itaca dataset, http://webtlab.it.uc3m.es/results/ITACA/evaluation.html


Itaca dataset, zip.