Stilus Macro for Word: the new add-in by Daedalus that corrects “safe errors” at the push of a button

Recently, the company Daedalus has released a new version of Stilus, its tool for proofreading spelling, grammar and style in Spanish. Innovations include Stilus Macro for Word, a new add-in that lets users autocorrect hundreds of thousands of context-independent errors at the push of a button.

 

Stilus Macro for Word, a “precise and effective” automatic proofreader

Stilus’ contextual and semantic technology for proofreading texts in Spanish has made it possible to isolate hundreds of thousands of context-independent writing mistakes. The automated correction of this type of “safe error” not only speeds up the first phases of orthotypographic and style proofreading, but also carries it out with very high precision. This is why Stilus Macro for Word is the first Stilus add-in that not only checks a text, but also performs actions directly on it.


 

It runs thousands of error patterns without the user losing control of the proofreading

Once you press the button to start the proofreading, Stilus Macro downloads almost 200,000 spelling and orthotypographic error patterns and checks them against the document. While mandatory corrections are applied directly in the text, recommended ones are inserted as comments, duly supported by references. In addition, all these actions are performed with Microsoft Word’s Track Changes functionality activated, so that the user never loses control of the proofreading process.
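
The split between direct edits and comments can be pictured with a minimal sketch. It is purely illustrative: the patterns, severities and data structures below are our own assumptions, not Stilus internals.

```python
import re

# Hypothetical pattern table: each entry maps a regex to a replacement and a
# severity. "mandatory" entries are applied in the text; "recommended" ones
# are surfaced as comments. Both example patterns are invented.
PATTERNS = [
    (re.compile(r"\bdeciseis\b"), "dieciséis", "mandatory"),
    (re.compile(r"\ba nivel de\b"), "en cuanto a", "recommended"),
]

def proofread(text):
    """Apply mandatory fixes; collect recommended ones as comment strings."""
    comments = []
    for regex, replacement, severity in PATTERNS:
        if severity == "mandatory":
            text = regex.sub(replacement, text)
        else:
            for match in regex.finditer(text):
                comments.append(f"{match.group(0)!r}: consider {replacement!r}")
    return text, comments

fixed, notes = proofread("Tenía deciseis años y, a nivel de estilo, dudaba.")
print(fixed)   # the mandatory fix is applied directly in the text
print(notes)   # the recommended fix is reported as a comment
```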

As we will see, Stilus Macro lets users add “personal” search & replace instructions that take priority over the default ones during the process. This, together with its linguistic options, makes it possible to perform a completely personalized, safe and quick first cleaning of the text.
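
Conceptually, that priority is a straightforward override. A tiny sketch under our own assumptions (the entries are invented; this is not actual Stilus behavior):

```python
# Personal instructions override default ones during the merge.
default_macros = {"deciseis": "dieciséis", "setiembre": "septiembre"}
personal_macros = {"setiembre": "setiembre"}  # "nullify": keep the client's form

active_macros = {**default_macros, **personal_macros}
print(active_macros["setiembre"])  # 'setiembre' -- the personal entry wins
```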

 

It helps optimize working time in professional editing and proofreading environments

Thanks to intelligent language processing, Stilus Macro checks the text against tens of thousands of error patterns at high speed and applies the corresponding substitutions at the push of a button. The immediate correction of hundreds of errors complements the advanced capabilities of Stilus for Word. The combined use of both add-ins, first Stilus Macro and then Stilus for Word (to detect possible contextual and grammatical errors), considerably reduces the time usually dedicated to the first automatic cleaning of a text. Authors, proofreaders, translators and editors optimize their working time and consequently increase their productivity.

 

FIRST STEP: Stilus Macro for Word

  • Set the proofreading parameters according to your or your client’s preferences.
  • Run the proofreading and wait for the result to appear with Word’s Track Changes activated.
  • Analyze and validate the output. Reject the changes you do not approve, and incorporate those proposed as comments that you consider appropriate.
  • Take advantage of the first runs to adjust and complete the coverage with your personal macros. You will achieve higher productivity on later jobs.
  • Finally, select the option “Accept All Changes in Document” in Microsoft Word. You will obtain a first cleaning of the text.

SECOND STEP: Stilus for Word

  • Set the proofreading parameters.
  • Run the proofreading.
  • Correct the remaining contextual and grammatical errors.

 

But you can get much more out of the tool; it’s in your hands! Customizing the personal macros dictionary (for example, turning the changes that Stilus Macro proposes as comments into direct ones) lets you adjust the default output, which can progressively reduce the time dedicated to manual validation. The more refined the process, the more productive the use of the application will be.

Stilus Macro for Word proofreads 200 pages in less than 5 minutes!

 

Notes on the customization of the personal macros dictionary

One of the essential innovations of Stilus Macro, compared to other Stilus editions, is that it lets users incorporate personal substitutions that take priority over the default ones in the autocorrection process. Besides widening the coverage of the automatic proofreader, this functionality can also be used to “nullify” an operation (replacing a form with itself) or to “redirect” the software’s behavior in a personalized way (setting a replacement different from the one assigned by default).

At this point, experts in the macro editing technique may object that manipulating these aspects freely could be dangerous, given the trouble caused by replacements that ignore the word-level limitations defined with regular expressions (matching mere “strings” that could correspond to “parts” of a word, for example). The concern seems logical, but there is no need to worry: the tool addresses the issue intelligently, treating every proposed replacement as a ‘textual unit’, so the user is protected from unwanted substitutions. Nevertheless, two considerations must be kept in mind when using this first version of the add-in: personal replacements are case sensitive (i.e. they are matched literally) and, for the moment, Stilus Macro ignores orthotypographic replacements if any of their elements coincides with a regular expression. Even so, at Daedalus we believe that the convenience this functionality offers, even as it stands, made it worth incorporating into the first version of the add-in.
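
A minimal sketch of the “textual unit” safeguard described above, written in Python rather than the add-in’s actual code: each personal replacement is confined to whole-word matches and kept case-sensitive.

```python
import re

def safe_replace(text, source, target):
    """Replace `source` only as a whole word, case-sensitively."""
    # re.escape keeps the user's string literal; \b limits the match to word
    # boundaries, so 'poca' never matches inside 'época'.
    pattern = re.compile(rf"\b{re.escape(source)}\b")
    return pattern.sub(target, text)

print(safe_replace("En esa época había poca luz.", "poca", "escasa"))
# -> 'En esa época había escasa luz.'  ('época' is untouched; 'Poca' would be too)
```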

 

Stilus Macro makes the task of proofreading with macros in Word more accessible

The “macro editing” philosophy has been applied successfully in the field of professional proofreading for quite some time now. The technique pays off in terms of precision, consistency and productivity, but it has been confined to professionals with a certain computing expertise. The new Stilus add-in, Stilus Macro for Word, makes this way of proofreading in Spanish more accessible.

In recent years there has been a growing demand for training in macro editing. The pioneers Jack Lyon, founder of www.editorium.com, and Paul Beverley, author of the free manual Computer Tools for Editors, already have hundreds of professional supporters and customers in the American publishing industry. The technique is genuinely advantageous in terms of precision, textual consistency and productivity. The problem is that many proofreaders who have attended courses confess to having failed to implement and exploit the technique in their daily work. The new Stilus add-in has been developed to make this type of proofreading accessible (at least in its most basic conception, for now).

Inspired by a philosophy similar to the one behind FRedit, Beverley’s simplest and most popular macro, Stilus Macro for Word searches and replaces hundreds of thousands of spelling, orthotypography and style patterns studied on corpora. In this way, it is able to run almost 200,000 error patterns and proofread a text of about 50,000 words in three minutes, tracking changes. The professional only has to validate the output (accepting all changes in one click, except those considered inappropriate) to obtain a first cleaning of the text.

Stilus Macro facilitates the work not only of beginners but also of experts in the technique, saving them the effort of identifying and defining such an enormous number of patterns (simple as they may be). In addition, the application makes up for Microsoft Word’s scalability limitations, since Word restricts the number of macros that can be stored and executed per button or action. This way, you can save workspace in Word for recording more specialized or complex macros.

Here is a demo of the add-in in action:

 


Stilus at the Third International Conference on Spanish Language Proofreading (3CICTE), a turning point for a profession that can no longer ignore the benefits of technology

Under the slogan “Tus palabras son tu imagen” (“Your words are your image”), the Third International Conference on Spanish Language Proofreading (3CICTE) was held in Madrid on October 24, 25 and 26, promoted by La Unión de Correctores de España (UniCo), a well-known Spanish association of proofreaders. The conference, preceded by those held in 2011 in Buenos Aires (Argentina) and in 2012 in Guadalajara (Mexico), drew more than 250 attendees, counting professionals, speakers and organizers.

The Casa del Lector (Madrid) was the ideal venue to discuss the current situation of the editing industry in Spain, Latin America and Europe. As host association, UniCo managed to turn the 3CICTE into a meeting that broadened horizons well beyond print publishing. Specialization in particular areas and the new market niches open to these professionals were the key topics. Among other professional challenges, the aim was to reorient an industry that can no longer ignore the benefits of technology. The words of the association’s president, Antonio Martín, in the prologue to issue No. 0 of UniCo’s magazine were revealing:

“This conference will constitute a turning point, because it is intended for the future: it is not just another meeting to attend and take away a handful of good memories; this conference wants to change the direction of our work. I know that this will bother some people, precisely those who have never wanted to change anything for the sake of tradition. For this reason, the conference is not for them, nor does it try to upset them. In the last ten years the world has changed, and we proofreaders are proving that we are prepared to address the challenges. […] We proofreaders are tidying up the house”. Antonio Martín Fernández

It became clear that the demand for proofreading on paper is shrinking, whereas in digital media it is growing noticeably. Additionally, to fend off professional intrusion in the 2.0 environment, proofreaders should specialize in programming languages and become familiar with digital publishing platforms.

Another problem freelance proofreaders often face is the low profitability of their work; they need to monetize their proofreading and consulting services better. This can be achieved by raising rates (quite difficult today, when merely landing an order is a challenge) or by increasing productivity, so as to cope with the deadlines imposed by different customers without having to turn down orders. Automated tracking of economic performance and billable hours, techniques such as macro editing, and assisted proofreading tools like Stilus were three of the most practical solutions discussed in this regard.

 

Proofreaders and “automatic proofreaders”

It is evident that most newspaper offices get by without human proofreaders and that their presence in publishing houses has been reduced to a minimum. The delicate economic situation, and the paradoxically limited commitment to linguistic quality these media show, mean that quality assurance is often limited, at best, to the use of software. Representatives of other European proofreaders’ associations (France, Switzerland and the United Kingdom) pointed out that in English and French publishing processes the figure of the “first proofreader” has already been replaced by increasingly sophisticated technology. This will happen in Spanish-speaking countries too; it is just a matter of time.

We endorse the opinion of our European colleagues, and as developers we would like to emphasize that, while intelligent technologies can now deliver good results in a first cleaning of spelling and style, they are not yet ready to perform “proofreading” in its broadest sense (that still seems far-fetched, despite their tremendous potential). They are therefore generally used as a cost-cutting remedy, giving acceptable but less professional results than would be achieved in combination with human supervision. In such a context it is not surprising that critics often blame the software rather than the managers who choose to employ it as the only linguistic quality check (in the best scenarios).

How should we react to this situation? At the 3CICTE things were made clear. For the first time in a forum on professional proofreading in Spanish, unproductive hostility to technology was left out. So why not turn technology to our own advantage? Why not think of it as a new market niche? Today, not only do professional language technology tools exist; they also require language professionals to develop and improve them.

 

Macro editing and Stilus, among the solutions to increase the professional proofreader’s productivity

The Third International Conference on Spanish Language Proofreading was intended for the future. For this reason, it was essential to provide solutions to one of the most serious problems in the industry: the low profitability of the profession. Proofreading with macros, presented by Paul Beverley, one of the pioneers of the technique, and the presentation of Stilus as an assisted proofreading system were two of the proposals.

We have already discussed the capabilities and features of Stilus as a spellchecker for Spanish text on other occasions, but we would like to stress that in expert hands it yields even greater benefits in terms of corporate consistency, stylistic criteria and productivity.

Stilus users do not run the risk of deviating from the academic norm, even when choosing an atypical linguistic configuration. However, a critical, trained eye is essential when selecting the combination of parameters and correctly validating the tool’s output. Any automated process of this kind will make a certain number of mistakes, falling into what is known in technology as “false positives”. Since these false warnings come with a credible argumentation (as indeed happens in Stilus), an inexperienced user could fall into a trap that is unintentional but, from a computational point of view, unavoidable. For this reason, although it is a didactic technology, its use is even more interesting in the professional field.

Furthermore, knowing the ambitions of the conference in advance, Daedalus decided that the 3CICTE would be the ideal setting for the official presentation of Stilus Macro for Word (the new Stilus add-in).

In recent years there has been a growing demand for training in macro editing. The pioneers Jack Lyon, founder of www.editorium.com, and Paul Beverley, author of the free manual Computer Tools for Editors, already have hundreds of professional supporters and customers in the American publishing industry. The technique is genuinely advantageous in terms of precision, textual consistency and productivity. Nevertheless, when asked by Beverley himself about its implementation, many proofreaders confessed that they had attended courses but could not apply or exploit the technique in their daily work. The Stilus add-in has been developed to make this type of proofreading accessible, at least in its most basic conception.

Inspired by a philosophy similar to the one behind FRedit, the simplest and most popular macro created by Paul Beverley, Stilus Macro for Word searches and replaces hundreds of thousands of spelling, typography and style patterns in Spanish (as well as personalized replacements) at the press of a button. In this way, it is able to run almost 200,000 error patterns and proofread a text of about 50,000 words in three minutes, tracking changes. The professional only has to validate the output (accepting all changes, except those considered inappropriate) to obtain a first cleaning of the text.


Here you can watch the video, shown during the Third International Conference on Spanish Language Proofreading, featuring some demos of Stilus’ editions.

 

Will the 4CICTE follow the guidelines set in Spain?


The 4CICTE will be held in Peru. The announcement of the next conference’s location marked the conclusion of the 3CICTE. Sofía Rodríguez, president of Ascot Perú, picked up the baton on behalf of her association to hold the 4CICTE in 2016.

 

We will have to wait and see whether the innovative approaches promoted by UniCo are supported in the coming conferences and, most importantly, whether they are adopted by the community of proofreaders. It seems clear that addressing the challenges of this profession calls for a change of direction, one that embraces technology among other things.


How to analyze content related to healthcare in social networks?

We all know that social networks are increasingly present in our lives, and healthcare is no exception. It is interesting to see how users talk about the drugs they take; in this example, a user mentions his experience with Lorazepam:

[Screenshot: a user’s post about taking Lorazepam]

In this one, someone is looking for a pharmacy where Alprazolam is available:

[Screenshot: a user’s post looking for a pharmacy with Alprazolam]

In this other example, scientific information about a health problem is provided, in particular on the relationship between benzodiazepines and Alzheimer’s:

[Screenshot: a post sharing scientific information]

Other users use the network for activities that might be illegal, such as:

[Screenshot: a post describing a potentially illegal use]

In addition, we should not lose sight of the specialized healthcare websites that have been emerging recently, which bring patients into contact with doctors online. On this type of website users can provide more information to define their problems; the more precise that information is, the more useful the advice they receive from healthcare professionals will be.

Current information extraction technologies have limitations in understanding the specific language of the sector: diseases, symptoms, treatments, active ingredients, drugs… That is why Daedalus’ semantic tools incorporate terminology based on standard ontologies such as SNOMED and MedDRA and encodings like ICD-9/ICD-10, which make it possible to process unstructured medical information with high accuracy, in large volumes, at high speed and at low cost.
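
As a rough illustration of what such terminology support involves, here is a toy dictionary-based extractor. The table is a stand-in for real SNOMED CT / MedDRA / ICD resources, and the codes are placeholders, not real identifiers.

```python
# Toy terminology table keyed by lowercase surface form.
TERMINOLOGY = {
    "lorazepam":   ("active ingredient", "SNOMED:placeholder-1"),
    "alprazolam":  ("active ingredient", "SNOMED:placeholder-2"),
    "alzheimer":   ("disease",           "ICD-10:placeholder-3"),
}

def extract_entities(post):
    """Return (term, type, code) for each known term found in a post."""
    lowered = post.lower()
    return [(term, etype, code)
            for term, (etype, code) in TERMINOLOGY.items()
            if term in lowered]

print(extract_entities("Taking Lorazepam again; worried about Alzheimer risk."))
# -> [('lorazepam', 'active ingredient', 'SNOMED:placeholder-1'),
#     ('alzheimer', 'disease', 'ICD-10:placeholder-3')]
```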

Pharmaceutical companies, medical centers, public health institutions and, why not, insurance companies can certainly find useful information on how certain drugs are taken, how often people talk about certain illnesses, whether any side effects are mentioned… The possibilities for analyzing healthcare matters in social networks and other types of content are quite broad, from pharmacovigilance applications to the coding of medical records.

At Daedalus we develop solutions for the healthcare field that make it possible to:

  • Understand the conversation in social media (forums, blogs, networks) in order to analyze the reputation of a hospital / pharmaceutical laboratory and its brands, discover trends and side effects, detect emerging crises… and in general to listen to the Voice of the Patient.
  • Structure, code and automatically tag medical records and other clinical documents in order to draw up statistics, perform segmentation, detect trends and optimize health management.

If you wish to know more about these solutions and the technology behind them, take a look at this recorded webinar (in Spanish).

 

 


New automatic phonetic and phonological transcriber for Spanish developed by Daedalus

The phonetic and phonological transcriber for Spanish developed by Daedalus is now available on our showroom page.

Tools of this type currently offer different applications:

  • First of all, they constitute the first module of any voice synthesis system. Indeed, current TTS (Text to Speech) systems necessarily include a first component that converts text into its phonetic representation. Other components then replace the phonetic transcriptions with acoustic material consisting of the physical realization of the sounds. Voice recognition systems work in a similar way, but in the opposite direction.
  • Transcribers are also an essential tool in philology. First, for training purposes, i.e. for philology students who must learn to transcribe texts correctly. Second, as aids in the professional field: philologists do have to transcribe “by ear” to account for phenomena not covered by the standard description of the phonological and phonetic levels of the language, but these tools can serve as the starting point for the ‘standard’ initial transcription, which is subsequently revised and corrected. In this sense, transcribers are especially useful in Dialectology and its related branches, as well as in Historical Phonology. Current transcribers tend to follow a standard pronunciation, as stated above, and are conceived only from a synchronic point of view; however, they are often configurable through different variables that allow different levels of transcription. In this regard, our transcriber allows a certain (admittedly small) synchronic parameterization, whether phonetic, phonological or etymological.
  • Transcribers are also particularly useful for learning foreign languages, because they provide the notation (with a higher or lower level of detail) of the actual pronunciation of texts. In fact, dictionaries for foreign learners frequently include a phonetic transcription (usually in a very simplified IPA) of each word.

Our phonetic and phonological transcriber offers the phonetic transcription of text in Spanish in various well-known phonetic alphabets:

IPA: International Phonetic Alphabet.

RFE: Alphabet of the Revista de Filología Española, the phonetic alphabet used in Spain in the field of philology until recently. Currently it competes in this area with the IPA.

Computing-Oriented Phonetic Alphabets: Phonetic alphabets based on ASCII characters, allowing the further processing of text using computer programs.

  • DEFSFE: Alphabet proposed by Antonio Ríos, from the Universidad Autónoma de Barcelona.
  • SAMPA (Speech Assessment Methods Phonetic Alphabet): Phonetic alphabet based on the IPA. We take into account also the SAMPROSA specifications (SAM Prosodic Alphabet).
  • SALA (SpeechDat across Latin America): SAMPA adaptation for the transcription of Latin American Spanish.
  • WORLDBET: Alphabet based on IPA with additional symbols.
  • VIA: Alphabet of ViaVoice, commercial application for voice recognition.


Our tool also offers a phonological transcription of the text, which is of little use for automated TTS systems like the ones mentioned above, but certainly useful in philology, especially within the classical philological currents known as functionalism and structuralism, albeit on a purely didactic level.

In terms of logical architecture, the transcriber consists of the following modules (a toy sketch of the pipeline follows the list):

  • Preprocessing module: comprises several sub-modules that preprocess the input text, eliminating unnecessary characters, identifying breaks, expanding numbers, etc., to generate a standardized text.
  • Syllabification module: applies an algorithm that breaks down the terms of the text into syllables.
  • Accent module: places the phonetic accents on the terms of the text.
  • Phonetic transcriber module: performs the actual phonetic transcription by applying a set of rules that identify the correct allophone depending on the phonetic and pragmatic context (the latter selected through the interface options, along with certain phonetic neutralizations).
  • Phonological transcriber module: performs the phonological transcription.
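
The following toy sketch chains the same stages. Every module is a deliberately naive placeholder (our own simplifications, not Daedalus’ implementation), and the final phone-mapping stage is omitted.

```python
import re

V = "aeiouáéíóú"  # toy vowel inventory (no diphthong or digraph handling)

def preprocess(text):
    """Preprocessing module (toy): collapse whitespace into a standard text."""
    return re.sub(r"\s+", " ", text).strip().lower()

def syllabify(word):
    """Syllabification module (toy): break before a consonant+vowel onset."""
    return re.findall(rf"[^{V}]*[{V}]+(?:[^{V}]+(?![{V}]))?", word)

def stress_index(syllables):
    """Accent module (toy): a written accent wins; otherwise stress the
    penultimate syllable (the real rules also depend on the final letter)."""
    for i, syl in enumerate(syllables):
        if any(c in "áéíóú" for c in syl):
            return i
    return max(len(syllables) - 2, 0)

def transcribe(text):
    """Chain the modules and mark the stressed syllable with IPA 'ˈ'."""
    words = []
    for word in preprocess(text).split():
        syl = syllabify(word)
        s = stress_index(syl)
        words.append(".".join("ˈ" + x if i == s else x for i, x in enumerate(syl)))
    return " ".join(words)

print(transcribe("  Canción   perro "))  # -> 'can.ˈción ˈper.ro' (toy output)
```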

Our phonetic transcriber also offers different transcription options:

  • Preprocessing: process or not abbreviations, symbols, numbers, Roman numerals.
  • Syllabification: syllabify or not terms, phonetically or phonologically.
  • Transcription:
    • Vocalism: mark the nasal allophones of vowels.
    • Consonants: transcribe considering the linguistic phenomenon known as yeísmo or without it; phonetic neutralization of /B/ /D/ /G/.

Other options: mark synalephas, place accents on vowels (for the RFE, basically).


Schema.org semantic markup: the (very) secret weapon of your online marketing

Now you can benefit from a tool with the potential to increase your CTR by 30% and to improve your organic ranking in search engines… and that your competitors are not using. If you are interested, keep reading.

In a previous post we discussed how every organization with an online presence will need to make the meaning of its web content apparent, as search engines evolve towards a more semantic approach. In this new scenario, semantic markup technology can help make content more relevant to search engines and more attractive to users.

The markup lets website owners add HTML code to their pages that allows search engines to identify specific elements of those pages and, in some cases, present them in search results in the form of rich snippets.

The schema.org standard, a result of the collaboration between Google, Bing and Yahoo, provides a set of vocabularies for marking up structured data in HTML documents so that it can be understood by search engines, aggregators and social media. With the support of the industry’s leaders, schema.org represents semantic markup’s coming of age.
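
As a concrete illustration, this sketch emits schema.org markup for a product page in JSON-LD; the product data is invented for the example.

```python
import json

# Minimal schema.org Product markup; Product, AggregateRating and Offer are
# real schema.org types, but every value below is made up.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Coffee Maker",
    "aggregateRating": {"@type": "AggregateRating",
                        "ratingValue": "4.4", "reviewCount": "89"},
    "offers": {"@type": "Offer", "priceCurrency": "EUR", "price": "59.90"},
}

# The serialized object is embedded in the page's HTML.
print('<script type="application/ld+json">')
print(json.dumps(product, indent=2))
print("</script>")
```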

Improve your inbound marketing with semantic tagging

Online marketing is evolving and, alongside traditional paid media (advertising) and owned media (website, blog), the new earned media (organic search, social conversations) are critical. Rather than outbound, interruption-based marketing, it is now all about inbound marketing, where the key is to be findable and to appear in the conversations where users talk about their needs and the products they use.

This requires not only creating and publishing optimized content on those topics, but also promoting it and making it more findable in search engines and shareable through social media. Semantically tagging our content can help us in many ways:

  • The markup increases the relevance of our content for certain search queries. Optimizing content and tagging it explicitly with specific entities makes it easier for all kinds of search engines to identify its meaning, so it will appear in the results of more queries related to those entities. This does not mean that schema.org automatically provides a better ranking in the results pages; however, some analyses have shown a certain degree of correlation. A study by Searchmetrics found that pages incorporating schema.org rank better by an average of four positions compared to pages that do not integrate it. Although (according to search engine spokespeople) the effect is not causal, some indirect relationship between semantic markup and better ranking appears to exist.
  • Tagging pages with metadata that identify them as information about products, movies, applications, recipes, etc. makes them more likely to appear in the vertical areas of general search engines and in specialized engines.
  • When semantically tagged content is shown in the search results, it appears in the form of rich snippets or similar formats, which include specific data, access to multimedia elements and even the possibility of browsing the information and refining the search. This increases the visibility, appeal and “clickability” of that result, which translates into more visits and social sharing opportunities for the content. Increases in clicks from organic search of up to 30% have been reported in some cases. Above all, schema.org makes it possible to optimize the CTR of a link.
  • In the same vein, when content is shared in social media or aggregated by an automated tool, the links these systems generate algorithmically are more appealing and informative and can increase traffic to that content.

However, very few sites are currently using schema.org

Despite its numerous benefits, the adoption of schema.org semantic markup is still very low. According to the Searchmetrics report, only 0.3% of the analyzed domains incorporated schema.org integrations. This is all the more striking when compared with its potential impact: according to the same study, Google enriches search results with information derived from schema.org markup in more than 36% of queries.

Additionally, in only 34.4% of queries did the search engine return results with neither schema.org integrations nor any other structured data involved. Clearly, schema.org is more popular among search engine results than among webmasters.

Schema.org markup tools

This low penetration might be due to the fact that detailed semantic markup is still mainly a manual process. There are several tools that can help with the markup work, such as:

Google Structured Data Markup Helper

The communities of the most popular content management systems have even developed plug-ins for this task. But these tools generate the markup code only once the element to which it refers has been identified (more or less manually).

Google also provides tools to validate the result of the markup and to get an idea of the volume of structured data that the search engine can see in our pages.

To ease the use of structured data markup and rich snippets, at Daedalus we are developing semantic publishing technologies that automatically tag content with information about all kinds of elements of meaning that appear in it: people, organizations, brands, dates, topics, concepts…

In particular, our product Textalytics - Meaning as a Service includes a specific semantic publishing API featuring schema.org markup, as you can check using this demonstrator.

Schema.org Markup Demonstrator

It is very likely that your competition is not using schema.org. It is time to act, integrate it immediately and make the most of its enormous advantages.

If you need more information, don’t hesitate to contact us.


Semantic markup, rich snippets and schema.org: expose the meaning of your content

As search engines evolve towards a more semantic approach, every organization with an online presence will need to make the meaning of its web content apparent. Semantic markup technology can help make content more relevant to search engines and more attractive to users.

Search engines: from keywords to entities

Search engines are evolving. Providing users with a series of results that contain a certain string of characters (e.g. “barcelona”) is no longer enough. The objective now is to provide them with information related to a certain “thing” or meaning (the city of Barcelona, the Barcelona F. C.) or to respond to the user’s intent (organizing a trip to Barcelona). In the future, search engines must be able to offer precise answers to specific questions (e.g. how many inhabitants does Barcelona’s province have?) without the user having to navigate a results page.

This transition towards “things, not strings” and entities rather than keywords collides with search engines’ difficulty in interpreting the meaning of online content: HTML is a language designed to describe how a web page should be presented, not to express its meaning. Even in pages whose aim is to provide a set of structured data (typically residing in a company’s internal databases), the nature of HTML hides that data from the search, social and aggregation ecosystem.

Marketers and, in general, anyone who wants their online content to be spread and findable need to enrich that content with metadata specifying to search engines and other applications what it means, not just what it says. In other words, they need technologies that enable them to tag their content semantically.

Semantic markup and rich snippets

The major search engines have been experimenting with semantic tagging and structured data applications for years (in fact, a few days ago marked the fifth anniversary of the presentation of Google’s rich snippets). Essentially, these technologies let the owners of websites add to their pages an HTML markup that enables search engines to identify specific elements of those pages and, in some cases, present them in search results.

Over the years, it has been possible to tag HTML content with different syntaxes (microformats, HTML5 microdata, RDFa) and various vocabularies to provide information about products, people, events… This information has usually resulted in rich snippets and similar formats, which are more informative than a simple blue link with a more or less representative text, and more appealing to users.

The advantages for both search engines and content generators are obvious. For search engines, providing their users with rich and more relevant search results is a step forward in their objective of facilitating access to information.

For content generators it is a chance to appear among the search results, stand out within the results page and get more visits and social shares (we will analyze this in detail in a future post).

However, in order for this technology to gain acceptance, the different agents involved are required to agree upon the vocabulary that is going to be used to identify each type of entity and its properties: a universal language for semantic tagging is essential.

Schema.org: the lingua franca of semantic markup

The result of a collaboration between Google, Bing and Yahoo (later joined by Yandex), schema.org provides a set of vocabularies for marking up structured data in HTML documents so that it can be understood by search engines, aggregators and social media. With the support of the industry’s leaders, schema.org can represent semantic markup’s coming of age.

Schema.org allows identifying people, organizations, places, products, reviews, works (books, movies, recipes…), and its vocabulary is continuously expanding. In addition, it supports different syntaxes: microdata, microformats, RDFa and, recently, JSON-LD. By creating a common markup scheme, the major search engines aim to improve their understanding of the information contained in web pages and its representation in the results pages.

The schema.org vocabulary makes it possible not only to describe elements, but also to disambiguate them and associate them with their meaning. Its sameAs property associates a particular instance of a “thing” (person, organization, brand…) that appears on a web page with a reference URI that unambiguously identifies the element, for example a page from Wikipedia, Freebase or an official website.
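
For instance, a page mentioning the city of Barcelona could disambiguate it with a sameAs reference. A minimal sketch (City and sameAs are real schema.org terms; the rest is illustrative):

```python
import json

place = {
    "@context": "https://schema.org",
    "@type": "City",
    "name": "Barcelona",
    # sameAs points at a URI that identifies the entity unambiguously.
    "sameAs": "https://en.wikipedia.org/wiki/Barcelona",
}
print(json.dumps(place, indent=2))
```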

Both search engines and all kinds of social media and aggregators depend more and more on this type of semantic reference within web pages (most notably Google, since the launch of its Knowledge Graph and the Hummingbird update of its algorithm).

Google itself has announced that it will continue to support other vocabularies and syntaxes for structured data markup, but that it will favor the use of schema.org. This strong support from the largest players will prompt more and more content providers to adopt schema.org, which will in turn consolidate it as the reference vocabulary for expressing structured data.

The case of the media industry: rNews

Probably one of the sectors in which the need for semantic markup is most urgent is the mass media. Online media need to make their content easier to find and to relate, and to improve the targeting of the contextual ads that constitute their main revenue stream. For that purpose, the IPTC (a consortium of leading news agencies, media companies and providers in the sector) has developed rNews.

It is a standard that defines the use of semantic markup to annotate HTML documents with news-specific metadata, both structural (title, medium, author, date) and content-related (people, organizations, concepts, locations).

Since schema.org and rNews were born almost at the same time, and in order to avoid a proliferation of standards, schema.org included support for rNews news tagging virtually from the beginning. Today, leading media such as the New York Times tag all their articles using rNews on schema.org.
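
A minimal sketch of news markup with schema.org’s NewsArticle type (which absorbed the rNews properties); the article data is invented.

```python
import json

article = {
    "@context": "https://schema.org",
    "@type": "NewsArticle",
    "headline": "Example headline",
    "datePublished": "2014-06-01",
    "author": {"@type": "Person", "name": "Jane Doe"},   # structural metadata
    "about": {"@type": "Organization", "name": "IPTC"},  # content metadata
}
print(json.dumps(article, indent=2))
```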

The semantic tagging of content offers enormous possibilities in the field of SEO and marketing in general, but it is not free of difficulties: how do you tag the thousands (or millions) of existing pages in a publication, for example?

We will try to cover these issues in upcoming posts.

Meanwhile, if you wish to discover how semantic technologies enable you to produce and publish more valuable content, faster and at lower cost, don’t miss this webinar by Daedalus (in Spanish).


Text Analytics market 2014: Seth Grimes interviews Daedalus’ CEO

Seth Grimes is one of the leading industry analysts covering the text analytics and semantic technology market. Over the past month he has published a series of interviews with relevant figures in the industry, material to be included in his forthcoming report Text Analytics 2014: User Perspectives on Solutions and Providers, which will be published before the summer (for more information, stay tuned to this blog).

Our CEO, José Carlos González, was one of the selected executives. In the interview, Seth and José Carlos discuss recent changes in the industry, customer cases, features requested by the market, etc.

This is the beginning of the interview:

Text Analytics 2014: Q&A with José Carlos González, Daedalus

How has the market for text technologies, and text-analytics-reliant solutions, changed in the past year? Any surprises?

Over the past year, there has been a lot of buzz around text analytics. We have seen a sharp increase of interest around the topic, along with some caution and distrust by people from markets where a combination of sampling and manual processing has been the rule until now.

We have perceived two (not surprising) phenomena:

  • The blooming of companies addressing specific vertical markets incorporating basic text processing capabilities. Most of the time, text analytics functionality is achieved through integration of general-purpose open source, simple pattern matching or pure statistical solutions. Such solutions can be built rapidly from large resources (corpora) available for free, which has lowered entry barriers for newcomers at the cost of poor adaptation to the task and low accuracy.
  • Providers have strengthened their efforts to create or educate the markets. For instance, non-negligible investments have been made to make the technology easily integrable and demonstrable. However, the accuracy of text analytics tools depends to some extent on the typology of the text (language, genre, source) and on the purpose and interpretation of the client. General-purpose and do-it-yourself approaches may end up disappointing user expectations due to wrong parametrization or goals outside the scope of particular tools.

———

Interested? Read the rest of the interview (featuring customer cases, our “Meaning as a Service” product Textalytics, and upcoming functionalities of our offering) on Seth Grimes’ blog.


Mining of useful information in social media: Daedalus at Big Data Week 2014

In the past few days we took part in Big Data Week 2014 in Madrid. Big Data Week is a network of events held in different cities around the world and one of the most important global platforms focused on the social, political and technological impact of Big Data.

Big Data Week 2014 Madrid

These events bring together a global community of data scientists, technology providers and business users, and provide an open, self-organized environment to educate, inform and inspire in the field of massive data exploitation. In this year’s Madrid edition, the Francisco de Vitoria University assumed the role of City Partner through the CEIEC and led the event.

Earthquakes, Buying Signals and… #WTF

With the title “Earthquakes, Buying Signals and… #WTF: Mining of Useful Information in Social Media”, our presentation illustrated how to use semantic technologies to automatically extract valuable information from social media scenarios, where Volume, Variety and Velocity requirements are extreme.

The presentation began by putting social media analysis in a context of unstructured content explosion and Big Data, and introducing semantic processing technologies.  Then, we presented some application scenarios we are developing in our R&D and commercial projects.

These applications basically focus on the areas of Voice of the Customer (VoC) / Customer Insights and Voice of the Citizen:

  • Customer journey and buying signals
  • Brand personality and perception maps
  • Corporate reputation
  • Smart cities and citizen sensor
  • Early detection and monitoring of emergencies

Finally, our service Textalytics “Meaning as a Service” was introduced as the easiest and most productive way to incorporate semantic processing into any application and thus extract useful information from social media and other unstructured content. (Remember that Textalytics can be used for free to process up to 500,000 words per month.)

In addition, in the event’s exhibition area we presented some demos focused on the above-mentioned applications.

Here are the slides of the presentation (Spanish).

 


The Stilus workshop, one of those that sparked the most curiosity among Lenguando attendees

Translated by Luca De Filippis

Last weekend, Casa del Lector in Madrid was the ideal venue for Lenguando: the first national meeting on language and technology. The pioneering initiative, successfully brought to life by our colleagues at Molino de Ideas, Cálamo & Cran and Xosé Castro, was backed, among other sponsors, by Daedalus’ Stilus.

The spirit of the conference was to bring together in the same space translators, proofreaders, philologists and other communication and language professionals, with an emphasis on the technological revolution of the sector, among other issues.

The talks on advances in language technology and the simultaneous workshops on their practical application were the most eagerly awaited. In particular, according to the organizers, the workshop given in the main auditorium by Concepción Polo (the author of this post) on behalf of the Stilus team was among those attendees anticipated most.

Corpus Linguistics applied to proofreading

With the intention of presenting innovative and, above all, practical content, the workshop considered the possible applications of Corpus Linguistics (CL) in the specific area of professional automatic proofreading. The first aspect to arouse interest was the revelation (for many) of the lemmatized and morphological search features finally offered by the academic corpora of the Nuevo Diccionario Histórico (CDH) and the Corpus del Español del Siglo XXI (CORPES XXI). Another key topic was a brief comparison between the capabilities of these new corpora from the Spanish Royal Academy and those of the lesser known, though magnificent and veteran, Corpus del Español by Mark Davies.

After presenting the theory, some reflections followed: how and for what purpose a professional can apply Corpus Linguistics to proofreading decisions and, also, how to automate proofreading patterns with Word macros, for example.

In the last part of the workshop we explained how an intelligent automatic proofreader is able to address contextual issues that remain beyond the autonomous user’s reach. It was time to examine and understand the pseudo-C++ code on which Stilus’ linguistic rules are based. For participants without experience in Natural Language Processing, the surprise lay both in the potential of this technology and in the mere fact of being able to interpret C-style rules that handled formal, morphological, syntagmatic and even semantic elements.
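
As a loose illustration of the kind of contextual rule discussed in the workshop (written here in Python, not in Stilus’ pseudo-C++ rule language; the toy lexicon is invented), consider a check for number agreement between a determiner and the following noun:

```python
# Toy part-of-speech lexicon: surface form -> (category, number).
LEXICON = {"los": ("DET", "plural"), "las": ("DET", "plural"),
           "casa": ("NOUN", "singular"), "casas": ("NOUN", "plural")}

def check_agreement(tokens):
    """Warn when a determiner and the following noun disagree in number."""
    warnings = []
    for prev, word in zip(tokens, tokens[1:]):
        p, w = LEXICON.get(prev), LEXICON.get(word)
        if p and w and p[0] == "DET" and w[0] == "NOUN" and p[1] != w[1]:
            warnings.append(f"number disagreement: '{prev} {word}'")
    return warnings

print(check_agreement("las casa".split()))
# -> ["number disagreement: 'las casa'"]
```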

Presentation of Stilus Macro

Indeed, the availability of tagged corpora makes it possible to carry out empirical research on the syntactic and lexical phenomena of a language on a previously unimaginable scale, and its application to computational linguistics is highly beneficial. Still, the examination of corpora shows that there are thousands of incorrect word sequences that can be detected without morphosyntactic support, and this is precisely the purpose of Stilus Macro: an add-in (still in development) that we presented at the end of the workshop, capable of running at high speed more than 230,000 context-independent spelling, grammar and style proofreading patterns in Word; an essentially simple task, but an unfeasible one from a human point of view.

Demo video

 

For more information, access the full presentation.

 


NextGen Mobile Content Analytics, a big data analytics solution for mobile video games

We have recently started working on the NextGen Mobile Content Analytics project, which aims at researching, designing and developing a Mobile Business Intelligence solution for developers and publishers of video games on mobile platforms. It focuses on providing analysis and customized recommendations based on the player’s gaming experience.

Currently, the business tools aimed at analyzing user activity in mobile games (such as Flurry) limit their functionality to collecting raw data, which a human analyst must interpret to determine which actions should be performed in the analyzed scenarios. Some scenarios also require an immediate reaction, for example when a player shows signs of abandoning the game soon, or when decisions have to be made about varying a digital product’s price.

The project’s goal is to create an intelligent system that identifies behavioral patterns and uses them to categorize players in real time according to different gaming, social or economic criteria (e.g. the most active players, players who never buy, players who are about to leave the game, etc.). This will make it possible to perform customized actions on users according to business goals, which is essential in a model where revenues depend on the player’s interaction and evolution within the game.

We contribute our know-how in the collection, analysis and visualization of massive data (big data). The solution’s architecture consists of a data warehouse containing a log of the game’s events, plus advanced data analytics and reporting modules to exploit the stored information. Furthermore, an SDK will be developed for integration into the game’s code in order to record the player’s actions in the data warehouse. These will be analyzed by automatic classification algorithms (player profiles and the related actions to take). Then, through clustering and visualization, potential problems will be identified, enabling concrete decisions for subsequent software updates.
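
The classification step can be pictured with a small sketch: raw game events are reduced to per-player features, and a simple rule flags likely churners. Event names, thresholds and labels are invented for the example, not project specifications.

```python
from collections import Counter
from datetime import datetime, timedelta

events = [
    {"player": "p1", "type": "session_start", "ts": datetime(2014, 6, 1)},
    {"player": "p1", "type": "purchase",      "ts": datetime(2014, 6, 1)},
    {"player": "p2", "type": "session_start", "ts": datetime(2014, 5, 1)},
]

def classify(events, now=datetime(2014, 6, 10)):
    """Reduce the event log to features, then label each player."""
    last_seen, purchases = {}, Counter()
    for e in events:
        last_seen[e["player"]] = max(last_seen.get(e["player"], e["ts"]), e["ts"])
        if e["type"] == "purchase":
            purchases[e["player"]] += 1
    labels = {}
    for player, ts in last_seen.items():
        if now - ts > timedelta(days=14):
            labels[player] = "likely to leave"     # trigger a retention action
        elif purchases[player] == 0:
            labels[player] = "active, never buys"  # trigger a targeted offer
        else:
            labels[player] = "active buyer"
    return labels

print(classify(events))  # {'p1': 'active buyer', 'p2': 'likely to leave'}
```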

The NextGen Mobile Content Analytics project (TSI-100600-2013-198) is funded by the Spanish Ministry of Industry, Energy and Tourism within the framework of the Strategic Plan for Telecommunications and the Information Society, National Plan for Scientific Research, Development and Technological Innovation 2013-2016. We are developing the project in cooperation with Digital Legends, a leading Spanish company internationally renowned for developing high-quality video games for mobile platforms.


We will publish more information as we move forward in the project and get further results. In any case, if you have any question, do not hesitate to contact us.

[Translation by Luca de Filippis]
