Document Details

Using Neighbors to Date Web Documents

Author(s): Sérgio Nunes; Cristina Ribeiro; Gabriel David

Date: 2007

Persistent Identifier: http://hdl.handle.net/10216/5255

Source: Repositório Aberto da Universidade do Porto

Subject(s): Technological sciences; Technology; Information technology


Description
Time has been used successfully as a feature in web information retrieval tasks. In this context, estimating a document's inception date or last update date is a necessary task. Classic approaches use HTTP header fields to estimate a document's last update time. The main problem with this approach is that it applies to only a small fraction of web documents. In this work, we evaluate an alternative strategy based on a document's neighborhood. Using a random sample of 10,000 URLs from the Yahoo! Directory, we study each document's links and media assets to determine its age. Considering documents in isolation, we are able to date 52% of them. Including each document's neighborhood, we are able to estimate the date of more than 85% of the same sample. We also find that estimates differ significantly according to the type of neighbors used. The most reliable estimates are based on the document's media assets, while the worst are based on incoming links. These results are experimentally evaluated with a real-world application using different datasets.
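The fallback idea in the abstract, using the document's own HTTP header when available and otherwise its neighbors, can be illustrated with a minimal sketch. It assumes that the Last-Modified header values of the document and its neighbors have already been collected. The function names and the choice of the median as the aggregate are assumptions made for illustration and are not taken from the paper.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from statistics import median_low
from typing import Iterable, Optional, Tuple


def parse_last_modified(value: Optional[str]) -> Optional[datetime]:
    """Parse an HTTP Last-Modified value; return None if missing or invalid."""
    if not value:
        return None
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    # Treat naive results as UTC so that all dates are comparable.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def estimate_date(
    own_header: Optional[str],
    neighbor_headers: Iterable[Optional[str]],
) -> Tuple[Optional[datetime], str]:
    """Return (estimated date, source), where source is
    'document', 'neighbors', or 'none'."""
    own = parse_last_modified(own_header)
    if own is not None:
        return own, "document"

    # Hypothetical aggregation: the median of the neighbors' dates.
    # The paper may combine neighbor dates differently.
    timestamps = sorted(
        d.timestamp()
        for d in map(parse_last_modified, neighbor_headers)
        if d is not None
    )
    if not timestamps:
        return None, "none"
    # median_low always returns one of the observed neighbor dates.
    estimate = datetime.fromtimestamp(median_low(timestamps), tz=timezone.utc)
    return estimate, "neighbors"
```

Since the abstract reports that reliability depends on the neighbor type, a fuller version might call `estimate_date` separately for media assets, outgoing links, and incoming links, and prefer the media-asset estimate when one exists.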
Document Type: Conference paper
Language: Portuguese
