Document Details

Introducing the Portuguese web archive initiative

Author(s): Gomes, Daniel; Nogueira, André; Miranda, João; Costa, Miguel

Date: 2009

Source: Repositório Comum

Subject(s): Archive; Portugal; Preservation; History


Description
This paper introduces the Portuguese Web Archive initiative, presenting its main objectives and work in progress. Term search over web archive collections is a desirable feature that raises new challenges. The paper discusses how the size of the term index could be reduced without significantly decreasing the quality of search results. Results from the first crawl show that the Portuguese web comprises at least roughly 54 million contents, corresponding to 2.8 TB of data. The crawl of the Portuguese web was stored in 2 TB of disk space using the compressed ARC format.
Document Type: Article
Language: English
