A conceptual clustering approach for user profiling in personal information agents

Abstract

Information agents have emerged in the last decade as an alternative for helping users cope with the increasing volume of information available on the Web. To provide personalized assistance, these agents rely on knowledge about users stored in user profiles, i.e., models of users' preferences and interests gathered by observing user behavior. User profiles have to summarize categories corresponding not only to diverse information interests but also to different levels of abstraction, so that agents can decide on the relevance of new pieces of information. Toward this goal, discovering interest categories through document clustering has the advantage that no a priori knowledge of user interests is needed; the process of acquiring profiles is therefore completely unsupervised. However, most document clustering algorithms are not applicable to the problem of incrementally acquiring and modeling interests. Either the solutions they produce do not resemble user interests, or they do not build those solutions incrementally. In this paper we describe and evaluate a document clustering algorithm, named WebDCC (Web Document Conceptual Clustering), designed to support the learning of user interests by personal information agents. The WebDCC algorithm carries out incremental, unsupervised concept learning over Web documents with the goal of building and maintaining user profiles that are both accurate and comprehensible. We present an empirical evaluation of this algorithm for user profiling, together with its advantages over other clustering algorithms.
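To make the distinction between incremental and batch clustering concrete, the following is a minimal sketch of generic single-pass (leader-style) document clustering. It is not WebDCC, whose procedure is not described in this abstract, and it produces flat clusters rather than categories at several levels of abstraction. The bag-of-words vectors, cosine similarity, the class name `IncrementalClusterer`, and the `threshold` value are all illustrative assumptions. Each document is placed as it arrives, either into the most similar existing cluster or into a new one. No number of categories or labeled examples is supplied in advance.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class IncrementalClusterer:
    """Hypothetical single-pass clusterer: documents are processed one at a time,
    and clusters are created on demand (assumption: not the WebDCC algorithm)."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # illustrative similarity cutoff for joining a cluster
        self.centroids = []         # summed term counts per cluster (direction = centroid)
        self.members = []           # document ids assigned to each cluster

    def add(self, doc_id, text):
        """Assign one new document and return the index of its cluster."""
        vec = tf_vector(text)
        best, best_sim = None, 0.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= self.threshold:
            self.centroids[best].update(vec)  # fold the document into the cluster
            self.members[best].append(doc_id)
            return best
        # No sufficiently similar cluster: start a new interest category.
        self.centroids.append(Counter(vec))
        self.members.append([doc_id])
        return len(self.centroids) - 1


if __name__ == "__main__":
    clusterer = IncrementalClusterer(threshold=0.3)
    clusterer.add("d1", "machine learning for text classification")
    clusterer.add("d2", "text classification with machine learning models")
    clusterer.add("d3", "recipes for italian pasta and sauce")
    print(clusterer.members)  # prints [['d1', 'd2'], ['d3']]
```

Because each call to `add` touches only the existing cluster summaries, the profile can be updated whenever the agent observes a new document, without re-clustering the whole collection. This is the property the abstract identifies as missing from most batch clustering algorithms.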

Publication
AI Communications