Document details

Efficient clustering of web-derived data sets

Author(s): Luís António Diniz Fernandes de Morais Sarmento; Eugénio da Costa Oliveira; Alexander P. Kehlenbeck; Lyle Ungar

Date: 2009

Persistent ID: http://hdl.handle.net/10216/15175

Origin: Repositório Aberto da Universidade do Porto

Subject(s): Physical sciences; Computer science; Informatics


Description
Many data sets derived from the web are large, high-dimensional, sparse and have a Zipfian distribution of both classes and features. On such data sets, current scalable clustering methods such as streaming clustering suffer from fragmentation, where large classes are incorrectly divided into many smaller clusters, and computational efficiency drops significantly. We present a new clustering algorithm based on connected components that addresses these issues and so works well on web-type data.
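The description does not give the algorithm's details. The sketch below illustrates the general idea of clustering by connected components; it is not the authors' method. It links any two items whose cosine similarity reaches a threshold and reports each connected component as one cluster. Membership is transitive, so a large class joined by a chain of similar pairs stays together instead of being split into fragments. Several pieces are assumptions made for illustration: the sparse `dict` representation, the inverted index (used so only items sharing a feature are compared), the use of cosine similarity, and the names `connected_component_clusters` and `threshold`.

```python
from collections import defaultdict
from math import sqrt


class UnionFind:
    """Disjoint-set forest used to merge linked items into components."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {feature: weight}."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def connected_component_clusters(items, threshold):
    """Hypothetical helper: link pairs with cosine >= threshold and
    return the connected components as sorted lists of item indices."""
    # Inverted index: feature -> ids of items containing it, so that only
    # items sharing at least one feature are ever compared (sparse data).
    index = defaultdict(list)
    for i, vec in enumerate(items):
        for f in vec:
            index[f].append(i)

    uf = UnionFind(len(items))
    for i, vec in enumerate(items):
        candidates = set()
        for f in vec:
            candidates.update(j for j in index[f] if j > i)
        for j in candidates:
            if cosine(vec, items[j]) >= threshold:
                uf.union(i, j)

    clusters = defaultdict(list)
    for i in range(len(items)):
        clusters[uf.find(i)].append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    docs = [
        {"paris": 1, "france": 1},
        {"paris": 1, "capital": 1},
        {"lisbon": 1, "portugal": 1},
        {"portugal": 1, "porto": 1},
        {"zipf": 1},
    ]
    print(connected_component_clusters(docs, threshold=0.4))
```

In this toy run, documents 0 and 1 share "paris", and documents 2 and 3 share "portugal". Each of those pairs has cosine 0.5, which passes the 0.4 threshold, so the pairs merge while document 4 remains a singleton. A real system on web-scale, Zipf-distributed data would still need to handle very frequent features, whose inverted-index lists become long. This sketch does not address that.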
Document Type: Conference Object
Language: Portuguese