Author(s):
Silva, Bruno; Marques, Nuno
Date: 2014
Origin: Repositório Comum
Subject(s): SOM; Ubiquitous environments; UbiSOM
Description
With the support of RAADRI.
Knowledge discovery in ubiquitous environments is usually conditioned by the data stream model, e.g., data is potentially infinite, arrives continuously and is subject to concept drift. These factors present additional challenges to standard data mining algorithms. Artificial Neural Network (ANN) models are still poorly explored in these settings. State-of-the-art methods to deal with data streams are single-pass modifications of standard algorithms, e.g., k-means for clustering, and involve some relaxation of the quality of the results, i.e., since the data cannot be revisited to refine the models, the goal is to achieve good approximations [Gama, 2010]. In [Guha et al., 2003] an improved single-pass k-means algorithm is proposed. However, k-means suffers from the problem that the initial k clusters have to be set either randomly or through other methods, which has a strong impact on the quality of the clustering process. CluStream [Aggarwal et al., 2003] is a framework that targets high-dimensional data streams in a two-phased approach, where an online phase produces micro-clusters of the incoming data and an offline phase produces on-demand models of the data, also with k-means. In this position paper we address the use of Self-Organizing Maps (SOM) [Kohonen, 1982], arguing for their strengths over current methods and pointing out directions to be explored in their adaptation to ubiquitous environments, which involve dynamic estimation of the learning parameters based on measuring concept drift on, usually, non-stationary underlying distributions. In previous work [Silva and Marques, 2012] we presented a neural network-based framework for data stream mining that explored the two-phased methodology, where the SOM produced offline models. In this paper we advocate the development of a standalone Ubiquitous SOM (UbiSOM) that is capable of producing models in an online fashion, to be integrated in the framework. This allows derived knowledge to be accessible at any time.
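
For readers unfamiliar with the single-pass setting, the following is a minimal sketch of the classical online SOM update rule applied to a stream, where each observation is used once and then discarded. It is not the UbiSOM algorithm, which the description does not specify. The class name `OnlineSOM`, the helper `stream`, the map size, and the fixed `learning_rate` and `sigma` values are all assumptions made for illustration; the description argues that such learning parameters should instead be estimated dynamically from a measure of concept drift.

```python
import math
import random


class OnlineSOM:
    """Hypothetical minimal SOM trained one observation at a time."""

    def __init__(self, rows, cols, dim, learning_rate=0.1, sigma=1.0, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.learning_rate = learning_rate  # fixed here (assumed value)
        self.sigma = sigma                  # neighbourhood radius (assumed value)
        # One prototype vector per map unit, initialised uniformly in [0, 1).
        self.weights = {
            (r, c): [rng.random() for _ in range(dim)]
            for r in range(rows)
            for c in range(cols)
        }

    def best_matching_unit(self, x):
        """Return the grid position of the prototype closest to x."""
        return min(self.weights, key=lambda pos: math.dist(self.weights[pos], x))

    def update(self, x):
        """Consume one observation; return its distance to the BMU before the update."""
        bmu = self.best_matching_unit(x)
        error = math.dist(self.weights[bmu], x)
        for pos, w in self.weights.items():
            # Gaussian neighbourhood: units near the BMU on the grid move more.
            grid_d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
            h = math.exp(-grid_d2 / (2 * self.sigma ** 2))
            for i in range(self.dim):
                w[i] += self.learning_rate * h * (x[i] - w[i])
        return error


def stream(n, seed=1):
    """Hypothetical data stream: n random points in the unit square."""
    rng = random.Random(seed)
    for _ in range(n):
        yield [rng.random(), rng.random()]


som = OnlineSOM(rows=3, cols=3, dim=2)
errors = [som.update(x) for x in stream(500)]  # the map is usable after every step
```

Because the map is updated in place after every observation, a model is available at any point in the stream, which is the property the description asks of an online method. The per-observation error returned by `update` is included only as an example of a quantity that could be monitored over time; how drift is actually measured is left open here.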