Proofreading, part of the ecosystem of the economic agents of Spanish

To define the ecosystem of the economic agents of Spanish, we should consider linguistic correctness as a corporate asset and good practice.

In a world where a company’s image is closely linked to its acts of communication, concern for language cannot be limited to publishing houses. Generating web content and taking part in social networks are essential to being competitive. Producing quality content positions a company as a reference in the market and attracts potential customers to the brand; neglecting linguistic correctness may lead to a loss of credibility and prestige, or to indifference after an unsuccessful act of communication.

But content published on the Internet also has another target reader: Google. For SEO positioning, you must write correctly in order to be visible. It is not only important to use the keywords you would use to retrieve information; you must also bear in mind that Google penalizes spelling and grammar mistakes.

Moreover, linguistic correctness increases the quality of the processes carried out by the so-called language industries, and therefore benefits their business. The challenge in natural language processing is precisely to “translate” what the user writes (or intends to write), often with spelling mistakes, which requires the description of linguistic rules and patterns supervised by professionals.

Philologists, proofreaders and language consultants, besides protecting our image, help to monetize information processing. Their participation in multidisciplinary teams provides added value. The Spanish company Daedalus, for example, has managed to position its proofreading tool, Stilus, as a leading technology for automatic proofreading thanks to this kind of professional interaction. Stilus helps to optimize working time in professional editing, translation and proofreading.

But what types of customers request proofreading services? After evaluating the results of surveys of proofreaders and consultants, UniCo (the association of Spanish proofreaders) explains in its Libro blanco (White Paper) that publishers are still their main source of income. Nevertheless, there is a gradual increase in “direct customers”, i.e. writers who self-publish, PhD students preparing their theses, etc., something that “all professionals who wish to increase their customer portfolio should assess”. It also highlights that website proofreading generates ever more jobs, while other sectors such as advertising show no change, perhaps because of “the lack of knowledge about how beneficial a relationship between the two would be”.


If we consider automatic proofreading consumers, media companies stand out as large customers. Speaking of SMEs, freelancers and individual customers, writers, Spanish students and translators are those who head the list (according to



In view of this, it seems that among the business opportunities for both types of service (human and automatic), the essential ones are the cooperation in self-publishing scenarios and Quality Assurance in translation (pre-editing and post-editing of machine translation, from a technological point of view).




Conclusion: interest in proofreaders and language consultants, and their social recognition, are visibly increasing, albeit slowly.

More generally, business opportunities in the Spanish corporate community respond to an uncertain reality subject to economic changes, technological advances and new areas of interest. In any case, whatever the context, it is evident that neglecting linguistic correctness in the information society may result in a loss of value or prestige, and may even ruin a business.


Detection of trends: discovering your “unknown unknowns”

Monitoring media for well-known topics that the public considers relevant is imperative. But it is even more important to detect what has so far been unknown and is now emerging. The automatic discovery of topics and the analysis of trends help us to understand what we don’t know we don’t know.

Monitoring of media (both social and traditional) and other sources of information usually tracks and analyzes a “focus” established a priori: a set of known, predefined topics and aspects (people, companies, brands, industries…). This might be enough in scenarios where innovations and changes are infrequent… that is, in none at all at the present time.

Indeed, monitoring the conversation about something that everybody agrees is worth tracking is no longer enough. In many cases, what actually matters is discovering the emergence of new topics (from news or rumors) with the potential to become relevant. To capture this value, we need an early warning that puts those topics “on the radar” and lets us identify and understand the trend as soon as possible. In other words, we need tools to discover our “unknown unknowns”.

At Daedalus we are providing solutions both for discovering new topics –trend candidates– and for the analysis and understanding of the active trends.

Discovering and identifying new trends

A community or a market may be talking in “business as usual” mode, with a few topics, keywords and frequencies that are more or less known and stable. But suddenly people may start to talk about a genuinely new topic or issue (or known topics may multiply in frequency), or even about previously unknown terms towards which the conversation gravitates.

This is the situation that can indicate the emergence of a new relevant topic or concept. It is important to discover and validate these emerging topics as soon as possible because they might constitute market-moving information or cause a crisis of any nature (incident, reputational crisis).

For example, for a provider of financial market information it may be important to detect rumors in forums and social networks about the unexpected merger of two companies, even a few minutes before such information “becomes news” and begins to appear on the screens of Bloomberg or Thomson Reuters. Similarly, for civil protection or emergency management services it is essential to discover as soon as possible that people are starting to talk about a mass gathering or a potentially dangerous incident.

But the discovery of our “unknown unknowns” is not limited to social media. The same applies to agencies and corporate departments engaged in user experience management and in the analysis of the voice of the customer that comes from contact center interactions, satisfaction surveys, etc. For them, discovering the “new voice” (topics that were not on the agenda and emerge as relevant) is as important as analyzing the voice of the customer according to predefined categories and topics (e.g. activities and departments of the company, its products and brands, competitors).

From a technical point of view, the detection of bursty keywords and clustering are useful tools for discovering possible trends. To identify a topic, it is necessary to recognize the concepts mentioned in a text and to examine how often these concepts appear in a set of texts and how they co-occur.

These processes make it possible to group concepts into topics or themes. But those concepts can be expressed in many ways, and some are multiword terms; in all cases, a normalization process is indispensable for identifying unique concepts. To make topics easier to understand, it is necessary to choose a concept to represent each one, taking into account aspects such as how frequently that concept is used or whether it stands for a named entity.
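The counting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lookup table `CANONICAL`, the sample texts and the function names are hypothetical, and a hand-written table stands in for the linguistic normalization that a real system would need.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical lookup from surface forms (including multiword terms)
# to one canonical concept. A real system would derive this linguistically.
CANONICAL = {
    "merger": "merger",
    "merging": "merger",
    "stock market": "stock market",
    "stock exchange": "stock market",
    "acme": "Acme",
    "acme corp": "Acme",
}


def extract_concepts(text):
    """Return the set of canonical concepts mentioned in a text."""
    lowered = text.lower()
    found = set()
    # Try longer surface forms first so multiword terms win over their parts.
    for surface in sorted(CANONICAL, key=len, reverse=True):
        pattern = r"\b" + re.escape(surface) + r"\b"
        if re.search(pattern, lowered):
            found.add(CANONICAL[surface])
            lowered = re.sub(pattern, " ", lowered)
    return found


def count_concepts(texts):
    """Count in how many texts each concept appears, and each pair co-occurs."""
    freq = Counter()
    pairs = Counter()
    for text in texts:
        concepts = extract_concepts(text)
        freq.update(concepts)
        pairs.update(combinations(sorted(concepts), 2))
    return freq, pairs


texts = [
    "Rumors of a merger between Acme Corp and a rival",
    "Acme merging soon? The stock market reacts",
    "The stock exchange opened higher today",
]
freq, pairs = count_concepts(texts)
```

Concepts that co-occur often (here, the hypothetical "Acme" and "merger") are candidates for grouping into one topic, and the most frequent one, or one that is a named entity, could serve as the topic's label.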

Nevertheless, recognizing a trend and identifying a topic are not the same thing. A trend is an interesting evolution of a certain topic over time. To recognize one, we must consider the a priori probability that a topic appears a given number of times, in a specific period, in a set of conversations. If the observed frequency exceeds what that probability predicts by a considerable margin, the topic may be of interest. If that behavior is maintained for several periods of time, we have a trend.
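A minimal Python sketch of this reasoning follows. It is not the algorithm used at Daedalus: the function names, the fixed `margin` multiplier, the `min_periods` setting and the sample figures are all assumptions made for illustration.

```python
def bursty_periods(mentions, totals, baseline_rate, margin=3.0):
    """Flag each period in which the topic's observed share of the
    conversation exceeds its a priori share by the given margin."""
    flags = []
    for count, total in zip(mentions, totals):
        observed = count / total if total else 0.0
        flags.append(observed > margin * baseline_rate)
    return flags


def is_trend(flags, min_periods=3):
    """A topic counts as a trend once it stays bursty for
    min_periods consecutive periods."""
    run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        if run >= min_periods:
            return True
    return False


# Hypothetical data: mentions of one topic per period, out of 1000 conversations.
mentions = [2, 3, 40, 55, 61]
totals = [1000, 1000, 1000, 1000, 1000]
flags = bursty_periods(mentions, totals, baseline_rate=0.003)
trend = is_trend(flags)
```

With an assumed a priori share of 0.3% of conversations, the last three periods exceed three times that share, so the topic qualifies as a trend under these settings. A single isolated spike would mark the topic as interesting without making it a trend.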

Most existing topic identification algorithms are not prepared to recognize trends as well, i.e. to take the time variable into account (at least with a moderate computing time). Moreover, none of them relies on linguistic information to standardize the references to the concepts of a topic and, on many occasions, they do not even allow multiword terms to be used to represent concepts. For this reason, at Daedalus we have developed our own extensions for topic detection algorithms, to the point of enabling them to discover trends.

Monitoring and understanding trends

In order to monitor and understand what we have identified as a potential trend, we must be able to define its “meaning pattern”: which thematic categories, keywords, entities, concepts… define it. Once this is done, technology allows us to detect all conversations that match that pattern and to group them automatically.

We should be able to perform an exhaustive and aggregated analysis of these conversations: identify the sentiment (positive, negative) associated with each aspect of a comment, discover the community’s perception of the trend (the semantically related concepts that appear most often), segment users and see how the trend evolves over time.

At Daedalus we are applying this analysis, for example, to monitor emergencies in the physical world (accidents, gatherings) and build a social dashboard with aggregated information about them.

Finally, we must be able to define alerts on the evolution of the trend in order to act quickly and accordingly, e.g. detecting that the conversation volume about that trend crosses a maximum or minimum threshold, that its associated sentiment reaches extreme positive or negative polarity, that it comes into contact with (and can “contaminate”) certain entities, etc.
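The three kinds of alert mentioned above can be expressed as simple rules. The following Python sketch is hypothetical throughout: the `TrendSnapshot` structure, the sentiment scale from -1.0 to 1.0, the thresholds and the entity names are assumptions, not part of any Daedalus product.

```python
from dataclasses import dataclass, field


@dataclass
class TrendSnapshot:
    """Aggregated state of one trend in one period (hypothetical structure)."""
    volume: int                 # number of conversations in the period
    sentiment: float            # mean polarity, -1.0 (negative) to 1.0 (positive)
    entities: set = field(default_factory=set)  # entities co-mentioned with the trend


def check_alerts(snapshot, min_volume, max_volume, extreme_polarity, watched):
    """Return the list of alert messages triggered by a snapshot."""
    alerts = []
    if snapshot.volume > max_volume:
        alerts.append("volume above maximum")
    elif snapshot.volume < min_volume:
        alerts.append("volume below minimum")
    if abs(snapshot.sentiment) >= extreme_polarity:
        alerts.append("extreme sentiment")
    touched = snapshot.entities & watched
    if touched:
        alerts.append("contact with: " + ", ".join(sorted(touched)))
    return alerts


snap = TrendSnapshot(volume=1200, sentiment=-0.9, entities={"BrandX", "CityHall"})
alerts = check_alerts(snap, min_volume=50, max_volume=1000,
                      extreme_polarity=0.8, watched={"BrandX"})
```

Keeping each rule independent means one snapshot can raise several alerts at once, which matches the case where a trend grows, turns negative and reaches a watched brand in the same period.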

Semantic analysis allows us not only to discover and identify new topics, but also to monitor and understand their evolution in order to focus on what is most relevant at any given moment. (And we remind you that Textalytics, our Meaning as a Service product, is the easiest, least risky and most affordable way to embed semantic analysis in your applications.)



Stilus Macro for Word: the new add-in by Daedalus that corrects “safe errors” at the push of a button

Daedalus has recently released a new version of Stilus, its tool for proofreading spelling, grammar and style in Spanish. Innovations include Stilus Macro for Word, a new add-in that autocorrects hundreds of thousands of context-independent errors at the push of a button.


Stilus Macro for Word, a “precise and effective” automatic proofreader

Stilus’ contextual and semantic technology for proofreading texts in Spanish has made it possible to isolate hundreds of thousands of context-independent writing mistakes. The automated correction of this type of “safe errors” not only speeds up the first phases of the proofreading of orthotypography and style, but also carries it out with very high precision. This is the reason why Stilus Macro for Word is the first add-in of Stilus that not only checks a text, but also performs actions directly on it.



It runs thousands of error patterns without making the user lose control of the proofreading

Once you press the button to start the proofreading, Stilus Macro downloads almost 200 000 spelling and orthotypography error patterns and checks the document against them. While the mandatory corrections are applied directly in the text, the recommended ones are inserted as comments and properly supported by references. In addition, all these actions are performed with Microsoft Word’s Track Changes functionality activated, so that the user never loses control of the proofreading process.

As we will see, Stilus Macro lets you add “personal” search & replace instructions that take priority over the default ones during the process. This, together with its linguistic options, makes possible a completely personalized, safe and quick first cleaning of the text.


It helps to optimize the working time in professional editing and proofreading environments

Thanks to intelligent language processing, Stilus Macro checks the text against tens of thousands of error patterns at high speed and applies the corresponding substitutions at the push of a button. The immediate correction of hundreds of errors complements the advanced capabilities of Stilus for Word. The combined use of both add-ins, first Stilus Macro and then Stilus for Word (to detect possible contextual and grammatical errors), considerably reduces the time usually dedicated to the first automatic cleanings of a text. Authors, proofreaders, translators, editors… optimize their working time and consequently increase their productivity.


FIRST STEP: Stilus Macro for Word

  • Set the proofreading parameters according to your or your client’s preferences.
  • Run the proofreading and wait for the result to appear with Word’s Track Changes activated.
  • Analyze and validate the output. Reject the changes that you do not approve and include the ones you consider appropriate among those proposed as comments.
  • Take advantage of the first processes to adjust and complete the coverage with your personal macros. You will achieve higher productivity on further occasions.
  • Finally, select the option “Accept All Changes in Document” in Microsoft Word. You will obtain a first cleaning of the text.

SECOND STEP: Stilus for Word

  • Set the proofreading parameters.
  • Run the proofreading.
  • Correct the remaining contextual and grammatical errors.


But you can get a lot more out of the tool: it’s in your hands! Customizing the personal macros dictionary (for example, turning the changes that Stilus Macro proposes as comments into direct ones) lets you adjust the default output, which can progressively reduce the time dedicated to manual validation. The better defined the process is, the more productive the use of the application will be.

Stilus Macro for Word proofreads 200 pages in less than 5 minutes!


Notes on the customization of the personal macros dictionary

One of the essential innovations of Stilus Macro, compared to other modalities of Stilus, is that it lets you incorporate personal substitutions that have priority over the default ones in the autocorrection process. This functionality, apart from widening the coverage of the automatic proofreader, can also be used to “nullify” the software’s operations (replacing a form with itself) or to “redirect” them in a personalized way (setting a replacement different from the one assigned by default).

At this point, experts in the macro editing technique may consider that manipulating these aspects freely might be dangerous, given the trouble caused by replacements that ignore the limits that regular expressions set around the “word” context (as opposed to simple “strings”, which could match “parts” of a word, for example). That seems logical, but they should not worry: in developing the tool, the issue was addressed so that any proposed replacement is treated as a “textual unit”, and the user does not need to worry about unwanted replacements. Nevertheless, two considerations must be taken into account when using this first version of the add-in: first, personal replacements are “case sensitive” (i.e. they are applied literally); second, for the moment Stilus Macro ignores orthotypographic replacements if any or all of their elements coincide with regular expressions. However, at Daedalus we believe that, given the convenience this functionality offers as it stands, it was worth incorporating in the first version of the add-in.
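The behavior described in these notes (personal entries taking priority, whole-unit matching, case sensitivity) can be illustrated with a short Python sketch. This is not Stilus Macro’s implementation; the function name, the two dictionaries and the Spanish entries in them are hypothetical and chosen only to show the idea.

```python
import re


def apply_replacements(text, defaults, personal):
    """Apply search & replace rules as whole textual units, case-sensitively.

    Personal rules override default rules that share the same search form.
    """
    rules = {**defaults, **personal}
    # Longest forms first, so multiword entries are tried before shorter ones.
    forms = sorted(rules, key=len, reverse=True)
    # The lookarounds reject matches that are only part of a longer word.
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(f) for f in forms) + r")(?!\w)"
    )
    return pattern.sub(lambda m: rules[m.group(0)], text)


# Hypothetical default dictionary.
defaults = {
    "osea": "o sea",
    "a grosso modo": "grosso modo",
    "setiembre": "septiembre",
}
# Hypothetical personal dictionary.
personal = {
    "setiembre": "setiembre",              # "nullify": replace a form with itself
    "a grosso modo": "a grandes rasgos",   # "redirect": choose another replacement
}

source = "Osea que curiosea en setiembre, osea, a grosso modo."
result = apply_replacements(source, defaults, personal)
```

In this example "curiosea" is left alone because "osea" only matches as a whole unit, and the capitalized "Osea" is untouched because matching is literal with respect to case.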


Stilus Macro makes the task of proofreading with macros in Word more accessible

The “macro editing” philosophy has been applied successfully in the field of professional proofreading for quite some time now. The technique is genuinely beneficial in terms of precision, consistency and productivity, but it has been restricted to professionals with a certain computing expertise. The new Stilus add-in, Stilus Macro for Word, makes this way of proofreading in Spanish more accessible.

In recent years there has been a growing demand for training in macro editing. The pioneers Jack Lyon, founder of, and Paul Beverley, author of the free manual Computer Tools for Editors, already have hundreds of professional supporters and customers in the American publishing industry. The technique is genuinely advantageous in terms of precision, textual consistency and productivity. The problem is that many proofreaders who have attended courses confess that they have failed to implement and exploit the technique in their daily work. The new add-in of Stilus has been developed to make this type of proofreading accessible (at least in its most basic conception, for now).

Inspired by a philosophy similar to the one behind FRedit, the simplest and most popular macro by Beverley, Stilus Macro for Word searches for and replaces hundreds of thousands of spelling, orthotypography and style patterns studied on corpora. In this way, it is able to run almost 200 000 error patterns and proofread a text of about 50 000 words in three minutes, tracking changes. The professional only has to validate the output (accepting all changes in one click, except those considered inappropriate) to obtain a first cleaning of the text.

Stilus Macro is a tool that makes work easier not only for beginners but also for experts in the technique, saving them the effort of identifying and defining such an enormous number of patterns (simple as they may be). In addition, this application makes up for Microsoft Word’s scalability limitations, given that Word restricts the number of macros that can be stored and executed for each button or action. In this way, you can save workspace in Word to record more specialized or complex macros.

Stilus at the Third International Conference on Spanish Language Proofreading (3CICTE), a turning point for a profession that can no longer ignore the benefits of technology

Under the slogan “Tus palabras son tu imagen” (“Your words are your image”), the Third International Conference on Spanish Language Proofreading (3CICTE) was held in Madrid on October 24, 25 and 26, promoted by La Unión de Correctores de España (UniCo), a well-known Spanish association of proofreaders. The conference, preceded by those held in 2011 in Buenos Aires (Argentina) and in 2012 in Guadalajara (Mexico), drew more than 250 attendees, including professionals, speakers and organizers.

The Casa del Lector (Madrid) was the best venue to discuss the current situation of the editing industry in Spain, Latin America and Europe. As host association, UniCo managed to turn the 3CICTE into a meeting that broadened horizons well beyond publishing on paper. Specialization in particular areas and the new market niches open to these professionals were the key points. Among other professional challenges, the aim was to give a new direction to an industry that can no longer ignore the benefits of technology. The words of the president of the association, Antonio Martín, in the prologue to issue No. 0 of UniCo’s magazine were revealing:

“This conference will constitute a turning point, because it is intended for the future: it is not just another meeting we will attend and keep a bunch of good memories; this conference wants to change the direction of our work. I know that this will bother somebody, precisely those who have never wanted to change anything for the benefit of a tradition. For this reason, the conference is not for them, nor tries to upset them. In the last ten years the world has changed and we as proofreaders are proving that we are prepared to address the challenges. […] We as proofreaders are tidying up the house”. Antonio Martín Fernández

It became clear that the demand for proofreading on paper is shrinking, whereas in digital media it is noticeably growing. Additionally, to avoid professional intrusion in the 2.0 environment, professionals should specialize in programming languages and know the platforms for digital publishing.

Another problem that freelance proofreaders often face is the low profitability of their work; they need to better monetize their proofreading or consulting services. This might be achieved by raising rates (which today is quite difficult when trying to win an order) or by increasing productivity, in order to meet the deadlines imposed by different customers without having to reject orders. Automated tracking of economic performance and billable hours, techniques such as macro editing, and assisted proofreading tools like Stilus were three of the most practical solutions discussed in that regard.


Proofreaders and “automatic proofreaders”

It is evident that most newspaper offices get by without human proofreaders and that their presence in publishing houses has been reduced to a minimum. The delicate economic situation and the limited commitment to linguistic quality that these media are paradoxically showing mean that, at best, quality assurance is often limited to the use of software. Representatives of other European associations of proofreaders (France, Switzerland and the United Kingdom) pointed out that in the English and French publishing processes the figure of the “first proofreader” has already been replaced by increasingly sophisticated technology. This will happen in the Spanish-speaking countries too; it’s just a matter of time.

We endorse the opinion of our European colleagues and, as developers, we would like to emphasize that, while it is true that intelligent technologies can nowadays offer good results in the first cleaning of spelling and style, they are not yet ready to perform “proofreading” in its broader sense (that still seems rather far-fetched, despite their tremendous potential). This means that they are generally used as a stopgap to reduce costs, giving acceptable but less professional results than would be achieved in combination with human supervision. In this context it is not surprising that critics often attack the software rather than the people in charge who choose to employ it as the only way to check linguistic quality (in the best of cases).

How should we react to this situation? At the 3CICTE things were made clear. For the first time in a forum on professional proofreading in Spanish, the unproductive hostility to technology was set aside. So why not also use technology to our own advantage? Why not think about it as a new market niche? Today, professional language technology tools not only exist, but also require language professionals to develop and improve them.


Macro editing and Stilus, among the solutions to increase the professional proofreader’s productivity

The Third International Conference on Spanish Language Proofreading was intended for the future. For this reason, it was essential to provide solutions to one of the most serious problems in the industry: the low profitability of the profession. Proofreading with macros by Paul Beverley, one of the pioneers in the technique, and the presentation of Stilus as a system for assisted proofreading were two of the proposals.

We have already spoken about the capabilities and features of Stilus as a spellchecker for Spanish text on other occasions, but we would like to remark that its use by experts increases the benefits in terms of corporate consistency, stylistic criteria and productivity.

Whoever uses Stilus does not run the risk of deviating from academic norms, even when choosing an atypical linguistic configuration. However, a critical and trained eye is essential for choosing the combination of parameters and correctly validating the tool’s output. Any automated process of this kind may make a certain number of mistakes, producing what in technology is known as a “false positive”. If these false warnings are accompanied by a credible justification (as in fact happens in Stilus), an inexperienced user could fall into a trap that is unintentional but unavoidable from the computational point of view. For this reason, although it is a didactic technology, its use in the professional field is even more interesting.

Furthermore, knowing the ambitions of the conference in advance, Daedalus decided that the 3CICTE would be the ideal setting for the official presentation of Stilus Macro for Word (the new add-in of Stilus).

In recent years there has been a growing demand for training in macro editing. The pioneers Jack Lyon, founder of, and Paul Beverley, author of the free manual Computer Tools for Editors, already have hundreds of professional supporters and customers in the American publishing industry. The technique is genuinely advantageous in terms of precision, textual consistency and productivity. Nevertheless, when asked by Beverley himself about its implementation, many proofreaders confessed that they had attended some courses but had not managed to apply and exploit the technique in their daily work. The add-in of Stilus has been developed to make this type of proofreading accessible, at least in its most basic conception.

Inspired by a philosophy similar to the one behind FRedit, the simplest and most popular macro created by Paul Beverley, Stilus Macro for Word searches for and replaces hundreds of thousands of spelling, typography and style patterns in Spanish (as well as personalized replacements) at the push of a button. In this way, it is able to run almost 200 000 error patterns and proofread a text of about 50 000 words in three minutes, tracking changes. The professional only has to validate the output (accepting all changes, except those considered inappropriate) to obtain a first cleaning of the text.


Will the 4CICTE follow the guidelines set in Spain?


The 4CICTE will be held in Peru. The announcement of the next conference’s location marked the conclusion of the 3CICTE. Sofía Rodríguez, president of Ascot Perú, picked up the baton on behalf of her association to hold the 4CICTE in 2016.


We have to wait and see if the innovative approaches promoted by UniCo will be supported in the following conferences and, most importantly, if they will be applied by the community of proofreaders. It seems clear that to address the challenges of this profession a change of direction is advisable, which should embrace technology among other aspects.


How to analyze content related to healthcare in social networks?

We all know that social networks are increasingly present in our lives, and the field of Healthcare is no exception. It is interesting to see how users talk about the use they make of certain drugs; in this example a user mentions his experience with Lorazepam:


In this one, someone is looking for a pharmacy where Alprazolam is available:


In this other example scientific information about a health problem is provided, in particular on the relationship between benzodiazepine and Alzheimer’s:


Other users use the network for activities that might be illegal, such as:


In addition, we should not lose sight of the websites specialized in healthcare issues that have been emerging recently, bringing patients into contact with doctors online. On this type of website, users can provide more information to define their problems; the more precise it is, the more useful the advice received from healthcare professionals will be.

Current information extraction technologies have limitations in understanding the specific language of the sector: diseases, symptoms, treatments, active ingredients, drugs… That is why Daedalus’ semantic tools incorporate terminology based on standard ontologies such as SNOMED and MedDRA and on encodings like ICD-9/ICD-10, which make it possible to process unstructured medical information with high accuracy, in large volumes, at high speed and at low cost.

Pharmaceutical companies, medical centers, public institutions related to healthcare and, why not, insurance companies may certainly find useful information on how certain drugs are taken, how often people talk about certain illnesses, whether there are any side effects… The possibilities for analyzing healthcare-related matters in social networks and other types of content are wide-ranging, from pharmacovigilance applications to the coding of medical records.

At Daedalus we develop solutions for the Healthcare field that make it possible to:

  • Understand the conversation in social media (forums, blogs, networks) in order to analyze the reputation of a hospital / pharmaceutical laboratory and its brands, discover trends and side effects, detect emerging crises… and in general to listen to the Voice of the Patient.
  • Structure, code and automatically tag medical records and other clinical documents in order to draw up statistics, perform segmentation, detect trends and optimize health management.

New automatic phonetic and phonological transcriber for Spanish developed by Daedalus

The phonetic and phonological transcriber for Spanish developed by Daedalus is now available on our showroom page.

Tools of this type currently offer different applications:

  • First of all, they constitute the first module of every voice synthesis system. Indeed, current TTS (Text to Speech) systems necessarily include a first component that converts text into its phonetic representation. Through other components, the phonetic transcriptions are replaced by acoustic material consisting of the physical realization of the sounds. Voice recognition systems work in a similar way, but in the opposite direction.
  • Transcribers are also an essential tool in philology. First, for training purposes, i.e. for philology students who must learn to transcribe texts correctly. Second, as aids in the professional field: philologists do have to transcribe “by ear” to account for phenomena not covered by the standard description of the phonological and phonetic levels of the language, but these tools can also serve as a starting point for the “standard” initial transcription, which is subsequently revised and corrected. In this sense, transcribers are especially useful in Dialectology and its related branches, as well as in Historical Phonology. Current transcribers tend to follow a standard pronunciation, as stated above, and are conceived only from a synchronic point of view; however, they are often configurable through variables that allow different levels of transcription. In this regard, our transcriber allows a certain (admittedly small) synchronic parameterization, whether phonetic, phonological or etymological.
  • Transcribers are also particularly useful for learning foreign languages, because they provide the notation (with a higher or lower level of detail) of the actual pronunciation of texts. In fact, dictionaries for foreign learners frequently include a phonetic transcription (usually encoded in a very simplified IPA) of each word.

Our phonetic and phonological transcriber offers the phonetic transcription of text in Spanish in various well-known phonetic alphabets:

IPA: International Phonetic Alphabet.

RFE: Alphabet of the Revista de Filología Española, the phonetic alphabet used in Spain in the field of philology until recently. Currently it competes in this area with the IPA.

Computing-Oriented Phonetic Alphabets: Phonetic alphabets based on ASCII characters, allowing the further processing of text using computer programs.

  • DEFSFE: Alphabet proposed by Antonio Ríos, from the Universidad Autónoma de Barcelona.
  • SAMPA (Speech Assessment Methods Phonetic Alphabet): Phonetic alphabet based on the IPA. We also take into account the SAMPROSA specifications (SAM Prosodic Alphabet).
  • SALA (SpeechDat across Latin America): SAMPA adaptation for the transcription of Latin American Spanish.
  • WORLDBET: Alphabet based on IPA with additional symbols.
  • VIA: Alphabet of ViaVoice, commercial application for voice recognition.

Our tool also offers a phonological transcription of text. This is of little use for automated TTS systems such as those mentioned above, but it is certainly useful in the field of philology, especially in the context of the classical philological currents known as functionalism and structuralism, albeit on a purely didactic level.

In terms of logical architecture, the transcriber consists of the following modules:

  • Preprocessing module: embeds different sub-modules that preprocess the input text eliminating unnecessary characters, identifying breaks, expanding numbers, etc., thus generating a standard text.
  • Syllabification module: applies an algorithm that breaks down the terms of the text into syllables.
  • Accent module: places the phonetic accents on the terms of the text.
  • Phonetic transcriber module: performs the actual phonetic transcription applying a set of rules that, depending on the phonetic and pragmatic context (the latter selected through the options of the interface, as well as some phonetic neutralizations), allow the identification of the correct allophone.
  • Phonological transcriber module: performs the phonological transcription.
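
As a rough illustration of how an accent module could work on the output of a syllabification module, the following Python sketch applies the standard Spanish stress rules to a word that has already been split into syllables. The function name `stressed_syllable` and the list-of-syllables input are assumptions made for this example, not the tool's actual interface, and the sketch ignores unstressed function words.

```python
ACCENTED_VOWELS = set("áéíóú")


def stressed_syllable(syllables):
    """Return the index of the stressed syllable of an already syllabified Spanish word."""
    if not syllables:
        raise ValueError("expected at least one syllable")
    # A written accent always marks the stressed syllable.
    for index, syllable in enumerate(syllables):
        if any(ch in ACCENTED_VOWELS for ch in syllable.lower()):
            return index
    if len(syllables) == 1:
        return 0
    # Without a written accent: words ending in a vowel, "n" or "s" are
    # stressed on the penultimate syllable; all others on the last one.
    last_char = syllables[-1][-1].lower()
    if last_char in "aeiouns":
        return len(syllables) - 2
    return len(syllables) - 1
```

For instance, `stressed_syllable(["re", "loj"])` selects the last syllable because the word ends in a consonant other than "n" or "s", while `stressed_syllable(["ár", "bol"])` follows the written accent.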

Our phonetic transcriber also offers different transcription options:

  • Preprocessing: whether or not to process abbreviations, symbols, numbers and Roman numerals.
  • Syllabification: whether or not to syllabify terms, phonetically or phonologically.
  • Transcription:
    • Vocalism: mark the nasal allophones of vowels.
    • Consonants: transcribe considering the linguistic phenomenon known as yeísmo or without it; phonetic neutralization of /B/ /D/ /G/.

Other options: mark synalephas, place accents on vowels (for the RFE, basically).
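
To make the effect of one of these options concrete, here is a toy Python sketch of a yeísmo switch. It is not the tool's implementation: the function name `transcribe_palatals` and the `yeismo` flag are hypothetical, and the sketch only rewrites the digraph "ll", leaving every other character untouched, so its output is not a complete transcription.

```python
def transcribe_palatals(text, yeismo=True):
    """Replace the digraph "ll" with an IPA symbol, depending on the yeísmo option."""
    # With yeísmo, "ll" merges with "y" and is rendered as [ʝ];
    # without it, "ll" keeps the palatal lateral [ʎ].
    symbol = "ʝ" if yeismo else "ʎ"
    return text.lower().replace("ll", symbol)
```

Calling `transcribe_palatals("calle")` uses the yeísmo variant, while `transcribe_palatals("calle", yeismo=False)` keeps the lateral.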

Now you can benefit from a tool that has the potential to increase your CTR by 30% and to improve your organic ranking in search engines… and that your competitors are not using. If you are interested, keep reading.

In a previous post we discussed how every organization with an online presence will need to make the meaning of its web content apparent, as search engines are evolving towards a more semantic approach. In this new scenario, semantic markup technology can help to make content more relevant to search engines and more attractive to users.

The schema.org markup enables website owners to add HTML code to their pages that allows search engines to identify specific elements of those pages and, in some cases, present them in search results in the form of rich snippets.

The schema.org standard, a result of the collaboration between Google, Bing and Yahoo, provides a set of vocabularies used for the markup of structured data in HTML documents, so that they can be understood by search engines, aggregators and social media. With the support of the industry’s leaders, schema.org represents the coming of age of semantic markup.

Improve your inbound marketing with semantic tagging

Online marketing is evolving and, along with traditional paid media (advertising) and owned media (website, blog), the new earned media (organic search, social conversations) are critical. Rather than outbound, interruption-based marketing, it is now all about inbound marketing, where the key is to be findable and to appear in those conversations in which users talk about their needs and the products they use.

This requires not only creating and publishing optimized content about those topics, but also promoting it and making it more findable in search engines and shareable through social media. Semantically tagging our content can help us in many ways:

  • The schema.org markup increases the relevance of our content for certain search queries. Optimizing content and tagging it explicitly with specific entities makes it easier for all kinds of search engines to identify its meaning, so that it appears in the results of more queries related to those entities. This does not mean that schema.org automatically provides a better ranking in the results pages; however, some analyses have shown a certain degree of correlation. A study by Searchmetrics found that pages incorporating schema.org rank better by an average of four positions than pages that do not integrate it. Although (according to search engine spokespeople) this is not a causal effect, there appears to be some indirect relationship between semantic markup and a better ranking.
  • Tagging pages with metadata that identify them as information about products, movies, applications, recipes, etc. makes them more likely to appear in the vertical areas of general search engines and in specialized engines.
  • When semantically tagged content is shown in the search results, it appears in the form of rich snippets or similar formats, which include specific data, access to multimedia elements and even the possibility of browsing information and refining the search. This increases the visibility, appeal and “clickability” of that result, which leads to more visits and social sharing opportunities for that content. In some cases, an increase of up to 30% in clicks from organic search has been reported. Above all, schema.org allows optimizing the CTR of a link.
  • In the same vein, when content is shared in social media or added by an automated tool, links generated algorithmically by these systems are more appealing and informative and can increase traffic to that content.
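
To put the 30% figure mentioned above in perspective, here is a small Python sketch of the arithmetic, assuming the increase is relative to the current click volume. The function names and the sample numbers (1,000 impressions, 50 clicks) are invented for illustration and do not come from any study.

```python
def ctr(clicks, impressions):
    """Click-through rate: clicks divided by impressions."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


def apply_relative_uplift(rate, uplift):
    """Apply a relative uplift (0.30 means +30%) to a rate."""
    return rate * (1 + uplift)


# Hypothetical figures: 50 clicks out of 1,000 impressions.
baseline = ctr(50, 1000)
improved = apply_relative_uplift(baseline, 0.30)
```

Under these assumptions, a link with a 5% CTR would move to roughly 6.5%, i.e. about 15 extra clicks per 1,000 impressions.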

However, very few sites are currently using schema.org

Despite its numerous benefits, the adoption of semantic markup is still very low. According to the report by Searchmetrics, only 0.3% of the analyzed domains incorporated schema.org integrations. That is all the more striking when compared with its potential impact: according to the same study, Google enriches search results with information derived from schema.org markup in more than 36% of the queries.

Additionally, in only 34.4% of queries did the search engine return results with neither schema.org integrations nor any other structured data involved. It is clear that schema.org is more popular among the results of search engines than among webmasters.

schema.org markup tools

This low penetration might be because detailed semantic markup is mainly a manual process. There are several tools that can help with the markup work, such as the Google Structured Data Markup Helper.

The communities of the most popular content management systems have even developed plug-ins for this task. But these tools generate the markup code only once the element to which it refers has been identified (more or less manually).

Google also offers tools that serve to validate the result of a markup and to get an idea of the volume of structured data that the search engine can see in our pages.

To ease the use of structured data markup and rich snippets, at Daedalus we are developing semantic publishing technologies to automatically tag content incorporating information about all kinds of elements of meaning that appear in it: people, organizations, brands, dates, topics, concepts…

In particular, our product Textalytics - Meaning as a Service includes a specific semantic publishing API featuring schema.org markup, as you can check using this demonstrator.

It is very likely that your competition is not using schema.org. It is time to act: integrate it immediately and make the most of its enormous advantages.

If you need more information, don’t hesitate to contact us.


Semantic markup, rich snippets and schema.org: expose the meaning of your content

As search engines are evolving towards a more semantic approach, every organization with an online presence will need to make the meaning of its web content apparent. Semantic markup technology can help to make content more relevant to search engines and more attractive to users.

Search engines: from keywords to entities

Search engines are evolving. Providing users with a series of results that contain a certain string of characters (e.g. “barcelona”) is no longer enough. The objective now is to provide them with information related to a certain “thing” or meaning (the city of Barcelona, FC Barcelona) or to respond to the user’s intent (organizing a trip to Barcelona). In the future, search engines must be able to offer precise answers to specific questions (e.g. How many inhabitants does the province of Barcelona have?) without the user having to navigate a results page.

This transition towards “things, not strings” and entities rather than keywords collides with search engines’ difficulty in interpreting the meaning of online content: HTML is a language designed to describe how a web page should be presented, not to express its meaning. Even in pages whose aim is to provide a set of structured data (typically residing in a company’s internal databases), the nature of HTML hides that data from the search, social and aggregation ecosystem.

Marketers and, in general, anyone who wants their online content to be widely shared and findable need to enrich that content with metadata that tells search engines and other applications what it means, and not just what it says. In other words, they need technologies that enable them to semantically tag their content.

Semantic markup and rich snippets

The major search engines have been experimenting with semantic tagging and structured data applications for years (in fact, a few days ago marked the fifth anniversary of the presentation of Google’s rich snippets). Essentially, with these technologies the owners of web sites can add HTML markup to their pages that enables search engines to identify specific elements of those pages and, in some cases, present them in search results.

Over the years, it has been possible to tag HTML content with different syntaxes (microformats, HTML5 microdata, RDFa) and various vocabularies to provide information about products, people, events… This information has usually resulted in rich snippets and similar formats, which are more informative than a simple blue link with a more or less representative text, and more appealing to users.

The advantages for both search engines and content generators are obvious. For search engines, providing their users with rich and more relevant search results is a step forward in their objective of facilitating the access to information.

For content generators it is a chance to appear among the search results, stand out within the results page and get more visits and social shares (we will analyze this in detail in a future post).

However, in order for this technology to gain acceptance, the different agents involved need to agree upon the vocabulary used to identify each type of entity and its properties: a universal language for semantic tagging is essential.

schema.org, the lingua franca of semantic markup

The result of a collaboration between Google, Bing and Yahoo (later joined by Yandex), schema.org provides a set of vocabularies used for the markup of structured data in HTML documents, so that they can be understood by search engines, aggregators and social media. With the support of the industry’s leaders, schema.org can represent the coming of age of semantic markup.

schema.org allows identifying people, organizations, places, products, reviews, works (books, movies, recipes…), and its vocabulary is continuously expanding. In addition, it supports different syntaxes: microdata, microformats, RDFa and, recently, JSON-LD. With the creation of a common markup scheme, major search engines aim to improve the understanding of the information contained in web pages and its representation in the results pages.

The schema.org vocabulary makes it possible not only to describe elements, but also to disambiguate them and associate them with their meaning. Its sameAs property associates a particular instance of a “thing” (person, organization, brand…) that appears on a web page with a reference URI that unambiguously indicates the identity of the element, for example, a page from Wikipedia, Freebase or an official web site.
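
The role of the sameAs property can be illustrated with a short Python sketch that builds a schema.org Person description in the JSON-LD syntax. The helper name `person_jsonld` is hypothetical and the person chosen is only an example; the property names (`name`, `sameAs`) are those of the schema.org vocabulary.

```python
import json


def person_jsonld(name, same_as):
    """Build a schema.org Person description as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
        # sameAs points to reference URIs that identify the entity unambiguously.
        "sameAs": list(same_as),
    }
    return json.dumps(data, ensure_ascii=False, indent=2)


markup = person_jsonld(
    "Miguel de Cervantes",
    ["https://en.wikipedia.org/wiki/Miguel_de_Cervantes"],
)
# JSON-LD is embedded in a page inside a script element of this type.
script_tag = '<script type="application/ld+json">\n' + markup + "\n</script>"
```

The resulting `script_tag` string is what would be placed in the page's HTML, so that a search engine can link the name on the page to the entity described at the reference URI.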

Search engines, as well as all kinds of social media and aggregators, depend more and more on this type of semantic reference within web pages (most notably Google, since the launch of its Knowledge Graph and the Hummingbird update of its algorithm).

Google itself has announced that it will continue supporting other vocabularies and syntaxes for the markup of structured data, but that it will favor the use of schema.org. This strong support by the largest players will prompt more and more content providers to adopt schema.org, which in turn will become the reference vocabulary for the expression of structured data.

The case of the media industry: rNews

Probably one of the sectors in which the need for semantic markup is most urgent is mass media. Online media need to make their content more findable and relatable, and to improve the targeting of the contextual ads that constitute their main revenue stream. For that purpose, the IPTC (a consortium of leading agencies, media companies and providers in that sector) has developed rNews.

It is a standard that defines the use of semantic markup to annotate HTML documents with news-specific metadata, both structural (title, medium, author, date) and content related (people, organizations, concepts, locations).
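
As an illustration of those two kinds of metadata, the following Python sketch describes a news item as JSON-LD. It is only an approximation: it uses schema.org's NewsArticle type and its properties (`headline`, `publisher`, `author`, `datePublished`, `mentions`) rather than the rNews vocabulary itself, and the helper name and all sample values are invented for the example.

```python
import json


def news_article_jsonld(headline, publisher, author, date_published, mentions):
    """Describe a news item with schema.org's NewsArticle type as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        # Structural metadata: title, medium, author, date.
        "headline": headline,
        "publisher": {"@type": "Organization", "name": publisher},
        "author": {"@type": "Person", "name": author},
        "datePublished": date_published,
        # Content-related metadata: entities the article talks about.
        "mentions": [{"@type": kind, "name": name} for kind, name in mentions],
    }
    return json.dumps(data, ensure_ascii=False, indent=2)


# Hypothetical article data, used only to show the shape of the output.
example = news_article_jsonld(
    headline="Example headline",
    publisher="Example Daily",
    author="Jane Doe",
    date_published="2014-05-20",
    mentions=[("Place", "Barcelona"), ("Organization", "IPTC")],
)
```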

Since schema.org and rNews were born almost at the same time, and in order to avoid the proliferation of standards, schema.org included support for rNews for news tagging virtually from the beginning. Currently, leading media such as the New York Times are tagging all their articles using rNews on schema.org.

The semantic tagging of content offers enormous possibilities in the field of SEO and marketing in general, but it is not free from difficulties: for example, how can the thousands (or millions) of existing pages of a media outlet be tagged?

We will try to cover these issues in upcoming posts.

Text Analytics market 2014: Seth Grimes interviews Daedalus’ CEO

Seth Grimes is one of the leading industry analysts covering the text analytics and semantic technology market. During the past month he published a series of interviews with relevant figures in this industry, material to be included in his forthcoming report Text Analytics 2014: User Perspectives on Solutions and Providers, which will be published before summer (for more info, stay tuned to this blog).

Our CEO, José Carlos González, was one of the selected executives. In the interview, Seth and José Carlos discuss recent changes in the industry, customer cases, features requested by the market, etc.

This is the beginning of the interview:

Text Analytics 2014: Q&A with José Carlos González, Daedalus

How has the market for text technologies, and text-analytics-reliant solutions, changed in the past year? Any surprises?

Over the past year, there has been a lot of buzz around text analytics. We have seen a sharp increase of interest around the topic, along with some caution and distrust by people from markets where a combination of sampling and manual processing has been the rule until now.

We have perceived two (not surprising) phenomena:

  • The blooming of companies addressing specific vertical markets incorporating basic text processing capabilities. Most of the time, text analytics functionality is achieved through integration of general-purpose open source, simple pattern matching or pure statistical solutions. Such solutions can be built rapidly from large resources (corpora) available for free, which has lowered entry barriers for newcomers at the cost of poor adaptation to the task and low accuracy.
  • Providers have strengthened the effort carried out to create or educate the markets. For instance, non-negligible investments have been made to make the technology easily integrable and demonstrable. However, the accuracy of text analytics tools depends to some extent on the typology of text (language, genre, source) and on the purpose and interpretation of the client. General-purpose and do-it-yourself approaches may lead to deceive user expectations due to wrong parametrization or goals outside the scope of particular tools.


Interested? Read the rest of the interview (featuring customer cases, our “Meaning as a Service” product Textalytics and upcoming features of our offering) on Seth Grimes’ blog.


Mining of useful information in social media: Daedalus at Big Data Week 2014

In the past few days we took part in Big Data Week 2014 in Madrid. Big Data Week is a network of events that take place in different cities of the world and is one of the most important global platforms focused on the social, political and technological impact of Big Data.

These events bring together a global community of data scientists, technology providers and business users, and provide an open and self-organized environment to educate, inform, and inspire in the field of exploitation of massive data. In this year’s edition in Madrid, the Francisco de Vitoria University assumed through the CEIEC the role of City Partner and led the event.

Earthquakes, Buying Signals and… #WTF

With the title “Earthquakes, Buying Signals and… #WTF: Mining of Useful Information in Social Media”, our presentation illustrated how to use semantic technologies to automatically extract valuable information from social media scenarios, where Volume, Variety and Velocity requirements are extreme.

The presentation began by putting social media analysis in a context of unstructured content explosion and Big Data, and introducing semantic processing technologies. Then, we presented some application scenarios we are developing in our R&D and commercial projects.

These applications basically focus on the areas of Voice of the Customer (VoC) / Customer Insights and Voice of the Citizen:

  • Customer journey and buying signals
  • Brand personality and perception maps
  • Corporate reputation
  • Smart cities and citizen sensor
  • Early detection and monitoring of emergencies

Finally, our service Textalytics “Meaning as a Service” was presented as the easiest and most productive way to bring semantic processing into any application, and thus extract useful information from social media and other unstructured content. (Remember that Textalytics can be used for free to process up to 500,000 words/month.)

In addition, in the event’s exhibition area we presented some demos focused on the above-mentioned applications.

