Document Details

An Approach to Web-scale Named-Entity Disambiguation

Author(s): Luís António Diniz Fernandes de Morais Sarmento; Eugénio da Costa Oliveira

Date: 2009

Persistent Identifier: http://hdl.handle.net/10216/15161

Source: Repositório Aberto da Universidade do Porto

Subject(s): Physical sciences; Computer science; Informatics


Description
We present a multi-pass clustering approach to large-scale, wide-scope named-entity disambiguation (NED) on collections of web pages. Our approach uses name co-occurrence information to cluster, and hence disambiguate, entities, and is designed to handle NED on the entire web. We show that on web collections NED becomes increasingly difficult as corpus size increases, not only because of the challenge of scaling the NED algorithm, but also because new and surprising facets of entities become visible in the data. This effect limits the potential benefits that data-driven approaches gain from processing larger datasets, and suggests that efficient clustering-based disambiguation methods for the web will require extracting more specialized information from documents.
Document Type: Conference paper
Language: Portuguese
