INFORMATION EXTRACTION
Information extraction covers the algorithms, methods and processes used to identify specific pieces of information in a text. Locating these elements makes it easier to represent the semantic content of the text. Each of the four processes described below provides data about a text that makes it easier to understand:
Identification of structures. This process looks for very specific information in a text that usually follows similar structures. This makes it possible to use patterns that combine structural and linguistic information. For example, our Information Prospector demo explores websites to automatically collect telephone numbers, postal addresses, email addresses, references to other websites and, in general, any contact data found on the site.
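As a rough illustration of pattern-based extraction of contact data, the sketch below uses regular expressions. The patterns and the function name extract_contacts are assumptions made for this example; they are deliberately simplified and do not describe how Information Prospector works.

```python
import re

# Simplified, illustrative patterns (assumptions for this sketch).
# Real contact data needs far more robust, locale-aware rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://[^\s<>]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def extract_contacts(text):
    """Return e-mail addresses, URLs and phone-like strings found in text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "urls": URL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }


sample = (
    "Write to info@example.com or call +34 912 345 678. "
    "More at https://www.example.com/contact"
)
contacts = extract_contacts(sample)
print(contacts)
```

Postal addresses are much harder to capture with a single expression, which is where combining structure with linguistic information becomes useful.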
Identification of keywords. Apart from recognizing structures, it is useful to determine automatically which words best characterize a text, that is, which words should be chosen as candidate keywords. An appropriate combination of a word's frequency in the text and its global frequency on the web indicates how well that word represents the text as a whole. The Tags Generator (SEO) demo may be of interest to companies working in search engine optimization.
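One common way to combine a word's frequency in a text with its global frequency is a TF-IDF-style score. The sketch below is a minimal version of that idea, not the formula used by the Tags Generator demo. The function keyword_scores, the default frequency and the background frequency values are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def keyword_scores(text, background_freq, default_freq=1e-6):
    """Rank words by in-text frequency weighted by global rarity."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = len(words)
    scores = {}
    for word, count in counts.items():
        tf = count / total  # relative frequency in this text
        # Words missing from the table are assumed to be very rare.
        global_freq = background_freq.get(word, default_freq)
        scores[word] = tf * math.log(1.0 / global_freq)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical global relative frequencies, made up for this example.
background = {
    "the": 0.05, "of": 0.03, "a": 0.02, "in": 0.02, "is": 0.01,
    "information": 0.0005, "extraction": 0.00001,
}
text = (
    "Information extraction is the identification of information "
    "in a text. Extraction is a process."
)
top = keyword_scores(text, background)[:3]
print(top)
```

Very common words such as "the" score low even when they are frequent in the text, because their global frequency is high.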
Named Entity Recognition. Automatically recognizing proper nouns is one of the most useful applications of information extraction. The Daedalus technology goes a step further: it distinguishes whether a name refers to a person, an organization, a place, etc., and also resolves coreference between entities, for example by recognizing that the names 'Obama' and 'Barack Obama' refer to the same person. You can test this functionality, which is included in the product STILUS NER (Named Entity Recognition), in the Named Entity Recognition demo. Acceso Group is among our clients using this technology.
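To give an idea of what linking 'Obama' to 'Barack Obama' involves, the sketch below applies a very simple heuristic: a shorter mention is attached to a longer one when all of its words appear in the longer name. This is an assumption made for illustration and not the method used in STILUS NER; the function name group_mentions is hypothetical, and the heuristic does not handle ambiguous cases such as two people sharing a surname.

```python
def group_mentions(mentions):
    """Group mentions whose words are all contained in a longer mention."""
    # Visit the longest mentions first so they become the canonical names.
    ordered = sorted(set(mentions), key=lambda m: (-len(m.split()), m))
    groups = {}
    for mention in ordered:
        tokens = set(mention.lower().split())
        for canonical in groups:
            if tokens <= set(canonical.lower().split()):
                groups[canonical].append(mention)
                break
        else:
            # No longer name contains this mention: start a new group.
            groups[mention] = [mention]
    return groups


groups = group_mentions(["Obama", "Barack Obama", "Angela Merkel", "Merkel"])
print(groups)
```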
Generation of summaries. Linguistic processing of a text makes it possible to determine which parts are key to understanding its content. Daedalus has the technology needed to analyze the content of a text and build a summary from its most relevant sentences. The process includes a set of configuration parameters that allow good-quality summaries to be produced for different types of documents (news, legislative texts, internal company documents, etc.).
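Building a summary from the most relevant sentences is usually called extractive summarization. The sketch below scores each sentence by the average frequency of its words and keeps the best ones in their original order. The scoring rule, the stopword list and the max_sentences parameter are assumptions made for this example, not the Daedalus implementation or its configuration parameters.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "are",
             "it", "that", "for", "on", "with"}


def tokenize(sentence):
    """Lowercase a sentence and return its words without stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in original order."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    sentences = [s.strip() for s in parts if s.strip()]
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)


text = ("Summaries help readers. Extractive summaries select sentences "
        "from the text. The weather was nice.")
print(summarize(text, max_sentences=2))
```

Different document types would typically call for different settings, for example more sentences for legislative texts than for short news items.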
Don't forget that you can test all the Daedalus technology in our Showroom.
