TASS is an experimental evaluation workshop for sentiment analysis and online reputation analysis focused on the Spanish language, organized by Daedalus, Universidad Politécnica de Madrid and Universidad de Jaén as a satellite event of the annual SEPLN Conference. After a successful first edition in 2012, TASS 2013 [http://www.daedalus.es/TASS2013] will be held on Friday, September 20th, 2013 at Universidad Complutense de Madrid, Madrid, Spain. Attendance is free and you are all welcome to participate.
The long-term objective of TASS is to foster research in the field of reputation analysis, which is the process of tracking, investigating and reporting an entity’s actions and other entities’ opinions about those actions. The rise of social media such as blogs and social networks, and the increasing amount of user-generated content in the form of reviews, recommendations, ratings and any other form of opinion, have led to an emerging trend towards online reputation analysis, i.e., the use of technologies to calculate the reputation value of a given entity based on the opinions that people express in social media about that entity. These areas are becoming promising topics in the field of marketing and customer relationship management.
As a first approach, reputation analysis has two technological aspects: sentiment analysis and text classification (or categorization). Sentiment analysis is the application of natural language processing and text analytics to identify and extract subjective information from texts. Automatic text classification is used to identify the topic of a text among a predefined set of categories or classes, so that the reputation level of the company can be assigned to different facets, axes or points of view of analysis.
The workshop is built around a series of challenge tasks that use two provided corpora, specifically focused on the Spanish language. The tasks are intended to promote the application of existing state-of-the-art algorithms and techniques, as well as new proposals, in these fields, and to provide a benchmark forum for comparing the latest approaches. In addition, with the creation and release of the fully tagged corpora, we aim to provide a benchmark dataset that enables researchers to compare their algorithms and systems.
Two corpora were provided:
- The General corpus contains over 68 000 Twitter messages, written in Spanish by about 150 well-known personalities and celebrities of the world of politics, economy, communication, mass media and culture, between November 2011 and March 2012.
- The Politics corpus contains 2 500 tweets, gathered during the electoral campaign of the 2011 general elections in Spain (Elecciones a Cortes Generales de 2011), from Twitter messages mentioning any of the four main national-level political parties: Partido Popular (PP), Partido Socialista Obrero Español (PSOE), Izquierda Unida (IU) and Unión, Progreso y Democracia (UPyD).
All messages are tagged with their global polarity, indicating whether the text expresses a positive, negative or neutral sentiment, or no sentiment at all. Five polarity levels have been defined: strong positive (P+), positive (P), neutral (NEU), negative (N) and strong negative (N+), plus one additional no-sentiment tag (NONE). In addition, each message carries an indication of the level of agreement or disagreement of the expressed sentiment within the content, with two possible values: AGREEMENT and DISAGREEMENT. Moreover, a set of topics has been selected based on the thematic areas covered by the corpus, such as politics, soccer, literature or entertainment, and each message has been assigned to one or several of these topics. More information on these corpora will be included in future posts.
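To make the tagging scheme concrete, here is a minimal Python sketch of how one tagged message could be represented in memory. The tag values (P+, P, NEU, N, N+, NONE, AGREEMENT, DISAGREEMENT) come from the description above; the class names, the field names and the sample message are assumptions made for illustration and do not reflect the actual corpus file format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Polarity(Enum):
    """The five polarity levels plus the no-sentiment tag."""
    STRONG_POSITIVE = "P+"
    POSITIVE = "P"
    NEUTRAL = "NEU"
    NEGATIVE = "N"
    STRONG_NEGATIVE = "N+"
    NONE = "NONE"  # the message expresses no sentiment at all


class Agreement(Enum):
    """Agreement of the expressed sentiment within the content."""
    AGREEMENT = "AGREEMENT"
    DISAGREEMENT = "DISAGREEMENT"


@dataclass(frozen=True)
class TaggedMessage:
    """Hypothetical record layout; field names are assumptions."""
    text: str
    polarity: Polarity
    agreement: Agreement
    topics: Tuple[str, ...]  # each message has one or several topics


# Made-up sample message, not taken from the corpus.
example = TaggedMessage(
    text="Gran partido anoche",
    polarity=Polarity("P+"),
    agreement=Agreement.AGREEMENT,
    topics=("soccer",),
)
```

Looking tags up by value, as in `Polarity("P+")`, keeps the code close to the labels used in the corpus description.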
Four tasks were proposed for the participants, covering different aspects of sentiment analysis and automatic text classification:
- Task 1: Sentiment Analysis at Global Level. This task consists of performing an automatic sentiment analysis to determine the global polarity (using 5 levels) of each message in the test set of the General corpus.
- Task 2: Topic Classification. The technological challenge of this task is to build a classifier to automatically identify the topic of each message in the test set of the General corpus.
- Task 3: Sentiment Analysis at Entity Level. This task consists of performing an automatic sentiment analysis, similar to Task 1, but determining the polarity at entity level (using 3 polarity levels) of each message in the Politics corpus.
- Task 4: Political Tendency Identification. This task moves one step further towards reputation analysis: the objective is to estimate the political tendency of each user in the test set of the General corpus, choosing among four possible values: LEFT, RIGHT, CENTRE and UNDEFINED. Participants may use any strategy they choose, but a first approach could be to aggregate the results of the previous tasks by author and topic.
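As a rough illustration of the aggregation suggested for Task 4, the sketch below averages per-message polarity predictions by author and topic. It is only one possible reading of that suggestion: the numeric scale assigned to the polarity tags, the function name and the input layout are all assumptions, and the step that would turn these averages into LEFT, RIGHT, CENTRE or UNDEFINED is deliberately left out.

```python
from collections import defaultdict
from statistics import mean

# Assumed numeric scale for the polarity tags (not defined by the workshop).
# NONE carries no sentiment, so it has no score and is skipped.
POLARITY_SCORE = {"P+": 2, "P": 1, "NEU": 0, "N": -1, "N+": -2}


def aggregate_by_author_and_topic(predictions):
    """Average polarity scores per (author, topic) pair.

    predictions: iterable of (author, topic, polarity_tag) tuples,
    e.g. the combined output of a sentiment and a topic classifier.
    Returns a dict mapping (author, topic) to the mean score.
    """
    scores = defaultdict(list)
    for author, topic, tag in predictions:
        if tag in POLARITY_SCORE:  # ignore NONE and unknown tags
            scores[(author, topic)].append(POLARITY_SCORE[tag])
    return {key: mean(values) for key, values in scores.items()}


# Made-up predictions for two hypothetical users.
sample = [
    ("user_a", "politics", "P+"),
    ("user_a", "politics", "N"),
    ("user_a", "soccer", "NONE"),
    ("user_b", "politics", "N+"),
]
summary = aggregate_by_author_and_topic(sample)
```

A mean is used here for simplicity; a majority vote over the tags would be an equally plausible choice.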
31 groups registered (compared to 15 groups in TASS 2012) and 14 groups (9 last year) sent their submissions. Participants were invited to submit a paper to the workshop in order to describe their experiments and discuss the results with the audience in the regular workshop session.
If you are curious about the approaches adopted by the different groups and the results achieved in each task, you are very welcome to attend the session on Friday, September 20th, 2013 at Universidad Complutense de Madrid!
Or stay tuned for future posts that will provide valuable information and conclusions.