As search engines evolve toward a more semantic approach, every organization with an online presence will need to make the meaning of its web content explicit. Semantic markup technology can help make content more relevant to search engines and more attractive to users.
Search engines: from keywords to entities
Search engines are evolving. Providing users with a list of results that contain a certain string of characters (e.g. “barcelona”) is no longer enough. The objective now is to provide information related to a certain “thing” or meaning (the city of Barcelona, the football club FC Barcelona) or to respond to the user’s intent (organizing a trip to Barcelona). In the future, search engines must be able to offer precise answers to specific questions (e.g. How many inhabitants does the province of Barcelona have?) without making the user navigate a results page.
This transition towards “things, not strings” and entities rather than keywords runs up against the difficulty search engines have in interpreting the meaning of online content: HTML is a language designed to describe how a web page should be presented, not to express its meaning. Even on pages whose aim is to provide a set of structured data (typically residing in a company’s internal databases), the nature of HTML hides that data from the search, social and aggregation ecosystem.
Marketers and, more generally, anyone who wants their online content to be shared and findable need to enrich that content with metadata that tells search engines and other applications what it means, not just what it says. In other words, they need technologies that enable them to tag their content semantically.
Semantic markup and rich snippets
The major search engines have been experimenting with semantic tagging and structured data applications for years (in fact, the fifth anniversary of the launch of Google’s rich snippets was just a few days ago). Essentially, these technologies let website owners add HTML markup to their pages that enables search engines to identify specific elements of those pages and, in some cases, present them in search results.
Over the years, it has been possible to tag HTML content with different syntaxes (microformats, HTML5 microdata, RDFa) and various vocabularies to provide information about products, people, events… This information has usually surfaced as rich snippets and similar formats, which are more informative than a simple blue link with a loosely descriptive title, and more appealing to users.
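To make the idea concrete, here is a minimal sketch of how an application might read microdata from a page. The HTML fragment, the event it describes and the ItempropCollector class are all invented for illustration; real crawlers handle nesting and many more cases than this flat example.

```python
from html.parser import HTMLParser

# Hypothetical page fragment marked up with HTML5 microdata.
PAGE = """
<div itemscope itemtype="https://schema.org/Event">
  <h1 itemprop="name">Open-air concert</h1>
  <span itemprop="location">Barcelona</span>
  <time itemprop="startDate" datetime="2014-06-21">21 June</time>
</div>
"""


class ItempropCollector(HTMLParser):
    """Collects itemprop name/value pairs from a single, flat microdata item."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._current = None  # itemprop whose text content we are waiting for

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemtype" in attrs:
            self.item_type = attrs["itemtype"]
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if "datetime" in attrs:
            # <time> carries its machine-readable value in an attribute.
            self.properties[prop] = attrs["datetime"]
        else:
            self._current = prop

    def handle_data(self, data):
        # The first non-blank text after an itemprop tag is taken as its value.
        if self._current and data.strip():
            self.properties[self._current] = data.strip()
            self._current = None


collector = ItempropCollector()
collector.feed(PAGE)
print(collector.item_type)
print(collector.properties)
```

The visible text stays the same for human readers; the itemtype and itemprop attributes are what let a program recover the type of the item and its named properties.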
The advantages for both search engines and content generators are obvious. For search engines, providing users with richer and more relevant results is a step forward in their objective of facilitating access to information.
For content generators, it is a chance to appear among the search results, stand out on the results page and get more visits and social shares (we will analyze this in detail in a future post).
However, for this technology to gain acceptance, the parties involved must agree on the vocabulary used to identify each type of entity and its properties: a universal language for semantic tagging is essential.
Schema.org: the lingua franca of semantic markup
The result of a collaboration between Google, Bing and Yahoo (later joined by Yandex), schema.org provides a set of vocabularies for marking up structured data in HTML documents so that it can be understood by search engines, aggregators and social media. With the support of the industry’s leaders, schema.org may well represent semantic markup’s coming of age.
Schema.org makes it possible to identify people, organizations, places, products, reviews and creative works (books, movies, recipes…), and its vocabulary is continuously expanding. In addition, it can be expressed in different syntaxes: microdata, RDFa and, more recently, JSON-LD. By creating a common markup scheme, the major search engines aim to improve both their understanding of the information contained in web pages and the way it is represented on results pages.
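As an illustration of the JSON-LD syntax, the following minimal sketch builds a schema.org Product description and wraps it in the script element used to embed JSON-LD in a page. The product, brand and rating values are invented, and to_jsonld_script is a hypothetical helper, not part of any library.

```python
import json

# Hypothetical product data, e.g. as it might come from an internal database.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example espresso machine",
    "brand": {"@type": "Brand", "name": "ExampleBrand"},
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": 4.5,
        "reviewCount": 27,
    },
}


def to_jsonld_script(data):
    """Wrap a dict in the <script> element used to embed JSON-LD in HTML."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so that a string value cannot close the script element early.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


snippet = to_jsonld_script(product)
print(snippet)
```

Because the structured data lives in its own block, it can be generated directly from a database record without touching the visible HTML of the page.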
The schema.org vocabulary makes it possible not only to describe elements, but also to disambiguate them and associate them with their meaning. Schema.org’s sameAs property associates a particular instance of a “thing” (person, organization, brand…) that appears on a web page with a reference URI that unambiguously indicates its identity, for example a page from Wikipedia, Freebase or an official website.
Search engines, along with all kinds of social media and aggregators, depend more and more on this type of semantic reference within web pages (most notably Google, since the launch of its Knowledge Graph and the Hummingbird update of its algorithm).
Google itself has announced that it will continue to support other vocabularies and syntaxes for structured data markup, but that it will favor the use of schema.org. This strong support from the largest players will prompt more and more content providers to adopt schema.org, which in turn will become the reference vocabulary for expressing structured data.
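The sketch below shows how sameAs can separate two “things” that share the same name, using the Barcelona example from earlier in this post. The entity helper is hypothetical, and the choice of the City and SportsTeam types is an assumption made for illustration.

```python
import json


def entity(schema_type, name, same_as):
    """Build a minimal schema.org description with sameAs reference URIs."""
    return {
        "@context": "https://schema.org",
        "@type": schema_type,
        "name": name,
        "sameAs": same_as,
    }


# Same string, two different entities: the reference URI tells them apart.
city = entity("City", "Barcelona", ["https://en.wikipedia.org/wiki/Barcelona"])
club = entity("SportsTeam", "Barcelona", ["https://en.wikipedia.org/wiki/FC_Barcelona"])

for item in (city, club):
    print(json.dumps(item, indent=2))
```

Both descriptions carry the name “Barcelona”, but an application that follows the sameAs URIs no longer has to guess which one the page is talking about.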
The case of the media industry: rNews
Probably one of the sectors in which the need for semantic markup is most urgent is the mass media. Online media need to make their content more findable and relatable, and to improve the targeting of the contextual ads that constitute their main revenue stream. For that purpose, the IPTC (a consortium of leading agencies, media companies and providers in that sector) has developed rNews.
It is a standard that defines the use of semantic markup to annotate HTML documents with news-specific metadata, both structural (title, medium, author, date) and content-related (people, organizations, concepts, locations).
Since schema.org and rNews were born at almost the same time, and in order to avoid a proliferation of standards, schema.org has included support for rNews news tagging virtually from the beginning. Currently, leading media outlets such as the New York Times are tagging all articles using rNews on schema.org.
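The two kinds of metadata can be sketched with schema.org’s NewsArticle type. Every value below is invented, and the mapping of the structural and content-related groups onto the properties headline, datePublished, author, publisher, about and mentions is an assumption made for illustration, not the official rNews mapping.

```python
import json

# Hypothetical article; every value below is invented for illustration.
article = {
    "@context": "https://schema.org",
    "@type": "NewsArticle",
    "headline": "Example headline about Barcelona",
    "datePublished": "2014-05-20",
    "author": {"@type": "Person", "name": "Jane Doe"},
    "publisher": {"@type": "Organization", "name": "Example Daily"},
    "about": [{"@type": "City", "name": "Barcelona"}],
    "mentions": [{"@type": "Organization", "name": "Example Org"}],
}

# Assumed grouping of properties into the two kinds of metadata.
STRUCTURAL = ("headline", "datePublished", "author", "publisher")
CONTENT_RELATED = ("about", "mentions")


def pick(doc, keys):
    """Return the subset of doc limited to the given property names."""
    return {key: doc[key] for key in keys if key in doc}


structural = pick(article, STRUCTURAL)
content_related = pick(article, CONTENT_RELATED)

print(json.dumps(structural, indent=2))
print(json.dumps(content_related, indent=2))
```

Structural properties describe the article as a publication, while the content-related ones point to the entities it discusses, which is the part that helps with relating articles to each other and with ad targeting.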
The semantic tagging of content offers enormous possibilities in the field of SEO and marketing in general, but it is not without difficulties: how, for example, can a media outlet tag its thousands (or millions) of existing pages?
We will try to cover these issues in upcoming posts.
Meanwhile, if you wish to discover how semantic technologies enable you to produce and publish more valuable content, faster and at lower cost, don’t miss this webinar by Daedalus (in Spanish).