STILUS Sem

Semantic Library

STILUS Sem is a software library, together with a set of linguistic resources, for the semantic expansion of text. Among other functions, the semantic expansion system makes it possible to:

  • Obtain semantic information about a word, such as its themes, type of entity, references to other entities, or geographical information
  • Expand a text string by adding related terms through synonymy (words with the same meaning), antonymy (words with an opposite meaning), or other semantic links

Semantic analysis

The STILUS Sem resources include several kinds of semantic information. Schematically, a lexical entry can carry the following semantic features:

<type of entity> <themes> <reference> <geographical information> <link>

A feature can be open, if it holds free text, or closed, if it takes one of a fixed set of values, which are often organized hierarchically. These values derive from Sekine's Extended Named Entity Hierarchy, which has been slightly modified and reinterpreted so that STILUS can tag common nouns as well as named entities.

In more detail, the semantic features encode the following structured information:

  • <type of entity>: the kind of real-world thing that a term denotes. It includes the following sub-features:
    1. <class of entity>: (instance | class | sub-class)

      • Donald -> SemEntity=@inst@...
      • pájaro (bird) -> SemEntity=@class@...
      • pato (duck) -> SemEntity=@subc@...

    2. <fiction>: (fiction | non-fiction | undefined)

      • Mickey Mouse -> SemEntity=@inst@fiction@...
      • Copito de Nieve (Snowflake, the albino gorilla) -> SemEntity=@inst@nofiction@...
      • Mahoma (Muhammad) -> SemEntity=@inst@undef@...

  • <themes>: the discipline or domain in which a term is used. The first level of the classification hierarchy contains the following concepts:

    • BASIC_SCIENCES
    • SOCIAL_SCIENCES
    • HUMANITIES
    • NATURAL_SCIENCES
    • LIFE_SCIENCES
    • TECHNOLOGY
    • SOCIETY
    • ARTS
    • SPORT

    The hierarchy has a second level. For example, BASIC_SCIENCES contains CHEMISTRY, PHYSICS, GEOMETRY and MATHEMATICS.
  • <reference>: a reference to other forms of the term. It includes three sub-features:
    1. <thematic reference>: the canonical form, that is, the ‘official’, ‘scientific’ or ‘more complete’ form of the term:

      • ABC -> SemRemission=@American_Broadcasting_Companies@@
      • abedul (birch) -> SemRemission=@betula_verrucosa@@
      • Clarín (pen name) -> SemRemission=@Leopoldo_Alas@@

    2. <reference to preferred variant>: the preferred variant of the term:

      • cardiaco -> SemRemission=@@cardíaco@
      • acogimiento -> SemRemission=@@acogida@

    3. <reference to not preferred variant>: one or more other variants that are equally accepted but less preferred or less frequent:

      • cardíaco -> SemRemission=@@@cardiaco
      • acogida -> SemRemission=@@@acogimiento
  • <geographical information>: the geographical location associated with a term. It includes the following sub-features:

    1. <district> “@” <town> “@” <province> “@” <region> “@” <community> “@” <country> “@” <continent>
    2. <international>: (+ | ONU)
    3. <historical place>
  • <link>: the type of link between the tagged term and the concept it refers to. The first level of the hierarchy contains:

    • STRUCTURAL_RELATION
    • ORGANIZATIONAL_RELATION
    • HUMAN_RELATION
    • GPE_REF
    • GPE_MEMBER

    There is also a second level. For example, HUMAN_RELATION contains ORGANIZATION_AFFINITY and PERSON_AFFINITY.
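
As an illustration of how the <type of entity> values listed above could be read by client code, here is a minimal Python sketch. It is not part of the STILUS Sem API: the function name `decode_entity` and the returned dictionary are assumptions made for this example. Only the first two '@'-separated positions are decoded, using the codes shown in the examples (inst, class, subc, fiction, nofiction, undef); whatever follows them, which the examples elide as "...", is returned untouched.

```python
# Hypothetical helper for illustration only; not a STILUS Sem function.
CLASS_CODES = {"inst": "instance", "class": "class", "subc": "sub-class"}
FICTION_CODES = {
    "fiction": "fiction",
    "nofiction": "non-fiction",
    "undef": "undefined",
}


def decode_entity(tag):
    """Decode the first two '@'-separated positions of a SemEntity value."""
    prefix = "SemEntity="
    value = tag[len(prefix):] if tag.startswith(prefix) else tag
    # The value starts with '@', so the first item of the split is empty.
    parts = value.split("@", 3)
    class_code = parts[1] if len(parts) > 1 else ""
    fiction_code = parts[2] if len(parts) > 2 else ""
    rest = parts[3] if len(parts) > 3 else ""
    return {
        "class": CLASS_CODES.get(class_code),      # None if the code is unknown
        "fiction": FICTION_CODES.get(fiction_code),
        "rest": rest,                               # undecoded remainder, kept as-is
    }


if __name__ == "__main__":
    print(decode_entity("SemEntity=@inst@fiction@..."))
```

Unknown or missing codes come back as `None` rather than raising an error, so a partially tagged entry can still be inspected.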

Only the semantic features that have actually been tagged carry explicit information. An empty position is nonetheless meaningful: it shows how high in the hierarchy a value sits, or it separates the sub-values of a tagged feature. Which semantic features an entry can carry depends on the grammatical category of its lemma. For example, an adjective can never carry type-of-entity information, but it can carry themes or spelling variants.
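
The role of empty positions can be seen in the SemRemission examples, where the same three '@'-separated slots hold the thematic reference, the preferred variant and the not-preferred variant, in that order. The following Python sketch reads such a value positionally. The names `Remission` and `parse_remission` are assumptions for this example, not STILUS Sem identifiers, and the sketch treats the third slot as a single string because the separator used when several variants are present is not described here.

```python
from typing import NamedTuple, Optional


class Remission(NamedTuple):
    """Hypothetical container for the three <reference> sub-features."""
    thematic: Optional[str]
    preferred: Optional[str]
    not_preferred: Optional[str]


def parse_remission(tag: str) -> Remission:
    """Split a SemRemission value into its three positional slots."""
    prefix = "SemRemission="
    value = tag[len(prefix):] if tag.startswith(prefix) else tag
    fields = value.split("@")
    # A well-formed value has a leading '@' followed by exactly three slots.
    if len(fields) != 4 or fields[0] != "":
        raise ValueError(f"unexpected SemRemission value: {tag!r}")
    # Empty slots mean "not tagged" and are reported as None.
    thematic, preferred, not_preferred = (f or None for f in fields[1:])
    return Remission(thematic, preferred, not_preferred)


if __name__ == "__main__":
    for example in (
        "SemRemission=@American_Broadcasting_Companies@@",
        "SemRemission=@@cardíaco@",
        "SemRemission=@@@cardiaco",
    ):
        print(parse_remission(example))
```

Because the slots are identified by position alone, dropping an empty '@' would change which sub-feature a value belongs to, which is why the empty positions have to be preserved.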

Expansion with synonyms, antonyms and linked words

The synonym knowledge base stores the possible meanings of each word and, for each meaning, its associated synonyms. The semantic expansion system can therefore return either all the synonyms of a word or only those for a specific meaning (one chosen by the application's user, for example). Like the other linguistic resources compiled and developed by Daedalus, the synonym knowledge base is continuously maintained by the company's linguistic team.
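
The sense-based lookup described above can be sketched in Python as follows. Everything in this example is hypothetical: the toy dictionary `SENSES`, its English entries and the function `expand` are illustrative assumptions, and the actual format and interface of the STILUS Sem knowledge base are not described in this document.

```python
# Toy data invented for this sketch: word -> meaning -> synonyms.
SENSES = {
    "bank": {
        "financial institution": ["lender", "depository"],
        "edge of a river": ["shore", "riverside"],
    },
}


def expand(word, sense=None, kb=SENSES):
    """Return the synonyms of `word`, optionally restricted to one meaning."""
    senses = kb.get(word, {})
    if sense is not None:
        # Only the synonyms attached to the chosen meaning.
        return list(senses.get(sense, []))
    # No meaning chosen: merge all meanings, keeping order, dropping duplicates.
    merged = []
    for synonyms in senses.values():
        for synonym in synonyms:
            if synonym not in merged:
                merged.append(synonym)
    return merged


if __name__ == "__main__":
    print(expand("bank"))
    print(expand("bank", sense="edge of a river"))
```

Keying synonyms by meaning rather than by word alone is what lets an application expand a query with only the terms that match the sense the user intended.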

Implementation

The semantic expansion system is packaged as a programming interface that offers functions such as semantic analysis and the lookup of the meanings and synonyms of a word. The component is developed in C/C++ and is available for Unix/Linux and Microsoft Windows platforms. On Windows, several packaging options can be considered: an ActiveX or COM object, a DLL, a static library, or any other suitable form.

 
