STILUS-CLUSTER

Automatic text clustering

STILUS-Cluster is a system for automatic text clustering.

Given a set of texts, it aims to find groups of similar documents.

STILUS-Cluster is a component of the STILUS family of linguistic technology products that provides automatic clustering of natural-language texts.

STILUS-Cluster implements an optimized version of the standard K-means algorithm, modified with a distance measure between elements that takes into account both the main terms of a text and the auxiliary ones (for example, when clustering news, the terms in the title of the article and those in the lead-in, respectively). To determine the ideal intracluster and intercluster distances, STILUS-Cluster incorporates a search algorithm based on the density of the cluster (the medoid-based average distance of its elements), which makes it possible to set the desired size of the cluster (and therefore the degree of similarity between the elements composing it).
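The idea of a distance that weighs main and auxiliary terms differently, together with a medoid-based density, can be illustrated with a minimal Python sketch. This is not the STILUS-Cluster implementation: the `Doc` type, the use of Jaccard distance, the `main_weight` value of 0.7, and the reading of density as the mean distance from the medoid to every cluster member are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Hypothetical document: two sets of terms."""
    main: frozenset       # e.g. terms from the title of a news article
    auxiliary: frozenset  # e.g. terms from the lead-in


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def doc_distance(x, y, main_weight=0.7):
    """Weighted mix of main-term and auxiliary-term distances.

    The weight 0.7 is an arbitrary illustrative value.
    """
    return (main_weight * jaccard_distance(x.main, y.main)
            + (1.0 - main_weight) * jaccard_distance(x.auxiliary, y.auxiliary))


def medoid(cluster):
    """Member with the smallest total distance to all other members."""
    return min(cluster, key=lambda c: sum(doc_distance(c, o) for o in cluster))


def density(cluster):
    """Assumed definition: mean distance from the medoid to every member
    (the medoid itself contributes a distance of zero)."""
    m = medoid(cluster)
    return sum(doc_distance(m, d) for d in cluster) / len(cluster)
```

Under these assumptions, a lower density value means a tighter cluster, so a threshold on it could serve as the knob that controls how similar the grouped texts must be.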

As a result of the clustering process, STILUS-Cluster returns a list of the groups found, with their sizes, their densities and, for each group, the list of its most representative (balanced) terms as well as the texts that belong to it.
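The kind of per-group result described above could be represented roughly as follows. This is only a sketch: the `ClusterSummary` type, the `summarize` helper, and the ranking of terms by how many texts contain them are hypothetical and do not reflect the actual STILUS-Cluster output format or its term-weighting scheme.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ClusterSummary:
    """Hypothetical record for one group found by the clustering."""
    size: int
    density: float
    top_terms: list
    text_ids: list


def summarize(cluster, density, n_terms=3):
    """Build a summary for one cluster.

    cluster: list of (text_id, set_of_terms) pairs.
    density: value computed elsewhere by the clustering step.
    """
    # Count in how many texts each term appears (an assumed ranking criterion).
    counts = Counter(term for _, terms in cluster for term in terms)
    # Most frequent first; ties are broken alphabetically for stable output.
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    return ClusterSummary(
        size=len(cluster),
        density=density,
        top_terms=ranked[:n_terms],
        text_ids=[text_id for text_id, _ in cluster],
    )
```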
