This is about writing the internet. Every page on the World Wide Web is built with a code language (more precisely, a ‘markup’ language) called HTML. What HTML affords is determined by what standards prescribe, what browsers allow, and what people write, and together these shape what the web is. This text is an introduction to the power struggles that underlie the continued development of this language.
the foundational myth
With the commercial explosion of the World Wide Web in the 1990s, the once simple markup language HTML no longer suffices for the media-heavy content that is becoming prevalent. Competing browser manufacturers Netscape and Microsoft tack feature after feature onto HTML.
Founded by the web’s inventor, Tim Berners-Lee, the World Wide Web Consortium (W3C) writes standards for the HTML language. But Microsoft and Netscape largely ignore these in favour of their own inventions.1
With no direction for the language and the community of web writers fragmented, no consensus on best practices exists. The HTML source of websites becomes unreadable, stalling the copy-paste innovation model upon which most of the early web has been built.
At this point, another group with vested interests joins the debate: the web designers who create web pages for a living. They have a hard time working in this fragmented landscape, and often end up creating multiple versions of a site to cater to multiple browsers. The idea of browsers respecting the W3C standards becomes their rallying point, in what is known as the Web Standards Movement.
As an aside, it would be interesting to look at this group of designers more precisely: it doesn’t represent all designers, but rather a specific subset with a hybrid design/development skill set. They contrast themselves with communication agencies that have a design/development division of labour or that invest in more traditional designer-friendly tools such as Adobe Flash or Dreamweaver. In fact, the ‘Standards Aware’ designers can be seen to campaign against these kinds of tools, advocating ‘hand-written HTML’ over the ‘bloated WYSIWYG tools’.2
For a while the W3C seems a natural ally to web developers: a standards body providing free standards. These standards become the stick with which to beat browser vendors, and compliance with them becomes a mark of prestige for a new generation of web designer/developers.
Another party to the burgeoning ‘standards movement’ is the new browser developers. Browsers with a smaller market share have a hard time competing, because most web pages are built to the whims of Netscape’s or Internet Explorer’s rendering engines. Web standards will make it easier for new browsers to compete. The backgrounds of these browser manufacturers are quite varied: there is the small Norwegian company Opera; there is Mozilla, informed by ideals of an Open Web, which has created the open source project Firefox; and there is Apple, which has created its own browser, Safari, so that it doesn’t have to rely on third parties for a smooth web experience on its operating system.
The rise of Firefox in particular is spectacular, and Web Standards are hugely successful in gaining mind-share among web developers.
a new power struggle
Standards are a work in progress, involving many actors. The collaboration of browser vendors, web designers and the W3C gained great momentum while the interests of all these parties aligned towards overcoming the power of the established browsers from Microsoft and Netscape. Once the dust has settled, the way forward is less clear.
Discontent with the W3C becomes prevalent as development of XHTML2 progresses. XHTML2 outlines the W3C’s vision more clearly: standards that require strict adherence (i.e., the document won’t display if it is not fully well-formed), in order to pave the way for a future web in which the content of pages can be more easily reasoned about by software programs. This future is known as the Semantic Web.
Convincing arguments against a naive vision of the Semantic Web were already voiced by Cory Doctorow in 2002. Since software cannot easily deal with natural language, web pages would need some kind of structured metadata in addition to their linguistic content. Besides the inherent impossibility of objective frameworks for metadata (‘ontologies’), the quality of such metadata will always be lacking, due to human laziness on the one hand and the human desire to game the system on the other.3
The other main argument against the new standard has been voiced in many forms around the web, among others by ARNO* in W3C go home! (c’est le html qu’on assassine). The argument goes: the very fact that browsers are extremely forgiving in the way they interpret markup is the basis for the success of the internet, because it has enabled the copy-paste style of development that has kept the barrier to entry for creating web pages low.
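The difference between the two attitudes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the standard library's XML parser stands in for a strict, XHTML2-style consumer, and `html.parser` stands in for a forgiving browser (real browsers use a far more elaborate recovery algorithm). The sample markup and the helper names `strict_parse` and `forgiving_text` are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# A typical hand-written fragment: the <p> and <b> elements are never closed.
SLOPPY = "<html><body><p>Hello <b>world</body></html>"


def strict_parse(markup):
    """Return True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # A strict consumer refuses the whole document at the first mistake.
        return False


class TextCollector(HTMLParser):
    """Forgiving stand-in: collects text and never complains about structure."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def forgiving_text(markup):
    """Return whatever text can be recovered from the markup."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


if __name__ == "__main__":
    print("strict parser accepts it:", strict_parse(SLOPPY))
    print("forgiving parser recovers:", forgiving_text(SLOPPY))
```

The strict path rejects the page outright, while the forgiving path still recovers its text. That second behaviour is what lets someone copy a half-understood snippet and still see a result.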
The most consistent and influential counter-reaction to the W3C’s direction comes from an association of browser vendors known as the WHATWG. They stage a coup, proposing an alternative future standard: HTML5. The name itself promises continuity and backwards compatibility, and the standard concentrates on capturing existing practices, with an emphasis on web applications.
This coup is wildly successful: the W3C even endorses the new standard. For a while work continues on both HTML5 and XHTML2, until the W3C announces its decision to drop XHTML2.
conflicting interests, a case study: RDFa
As far as standards bodies go, the W3C is quite open: the cost of membership is set on a sliding scale, depending on the character of the organisation applying and the country in which it is located. That isn’t the case with the WHATWG: as its Charter states, ‘Membership is by invitation only’. This makes it opportune to look at who these members are, what their interests are, and how these come into play in the nature of HTML5.
As much as it advances the state of the web, HTML5 is definitely no longer focused on the ideology of the Semantic Web. To examine what this means in practice, let’s look at an element of Semantic Web technology called RDFa: the W3C’s intended mechanism for adding extra metadata to your HTML pages.
A foundational idea of XHTML as an XML-based format is that other XML-based formats can be mixed in. HTML5 doesn’t provide such a general mechanism for extension. Instead, the HTML5 working group chose two XML formats that can be embedded in an HTML5 document: SVG drawings and MathML mathematical formulas. RDFa is not among the extensions allowed in HTML5.
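What ‘mixing in’ means at the level of markup can be sketched as follows. XML namespaces let elements from different vocabularies sit in one document, and a generic XML parser can tell them apart without knowing either vocabulary in advance. The sample document and the helper name `vocabularies` are assumptions made for this illustration; only the two namespace URIs are the real ones for XHTML and SVG.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# A made-up XHTML page with an SVG drawing mixed in through its namespace.
DOCUMENT = f"""
<html xmlns="{XHTML_NS}">
  <body>
    <p>A circle:</p>
    <svg xmlns="{SVG_NS}" width="20" height="20">
      <circle cx="10" cy="10" r="8"/>
    </svg>
  </body>
</html>
"""


def vocabularies(markup):
    """Return the set of namespace URIs used by elements in the document."""
    root = ET.fromstring(markup.strip())
    # ElementTree writes namespaced tags as "{namespace-uri}localname".
    return {
        element.tag[1:].split("}")[0]
        for element in root.iter()
        if element.tag.startswith("{")
    }


if __name__ == "__main__":
    for uri in sorted(vocabularies(DOCUMENT)):
        print(uri)
```

Any further vocabulary could be added the same way, by declaring one more namespace. That open-endedness is the extension mechanism that a fixed list of permitted embedded formats gives up.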
The specification’s editor, Ian Hickson, writes on the W3C mailing list about the reasoning behind omitting RDFa. He fails to see the added value of RDFa over natural language processing. By natural language processing, he means search algorithms that operate on existing documents, without the additional formal layer required by the Semantic Web.
Natural language processing happens to be a strength of Hickson’s employer: Google. The argument is disingenuous. Google’s algorithms are extremely good because Google is a huge company that has invested billions of dollars in them, and trains them on huge datasets it has access to because it sits at the hub of most internet traffic.
RDFa, by contrast, is a way to provide relational data that is available for anyone to use. And there are other ways to use this technology than the meta-utopia initially envisioned by its creators. With the design collective OSP, of which I am a part, we have created a wiki used by contemporary dance creators. If Robert is mentioned, they tag him with extra data: Robert is the choreographer of this project. On other project pages, on which he is for example the dancer, he is tagged as such. Clicking on Robert’s name on any of these pages shows a list of all the projects in which Robert has a role, and what this role is: a great way to create links and relations in a bottom-up way.
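How such tagging turns into a list of roles can be sketched roughly as follows. This is not the code of the wiki described above. It is a minimal illustration that reads RDFa-style `about`, `rel` and `href` attributes from page fragments and treats each link as a (project, role, person) statement. The project names, URLs, the second person and the `role:` vocabulary are all invented. A real RDFa processor handles many more attributes and would require the prefix to be declared.

```python
import xml.etree.ElementTree as ET

# Hypothetical project pages; names, URLs and the "role:" vocabulary are invented.
PAGES = [
    '<div about="/projects/first-piece">'
    '<a rel="role:choreographer" href="/people/robert">Robert</a></div>',
    '<div about="/projects/second-piece">'
    '<a rel="role:dancer" href="/people/robert">Robert</a>'
    ' <a rel="role:choreographer" href="/people/anna">Anna</a></div>',
]


def triples(page):
    """Yield (subject, predicate, object) statements found in one page fragment."""
    root = ET.fromstring(page)
    subject = root.get("about")  # the project the page is about
    for link in root.iter("a"):
        predicate, target = link.get("rel"), link.get("href")
        if predicate and target:
            yield (subject, predicate, target)


def roles_of(person, pages):
    """Return (project, role) pairs for every page on which the person is tagged."""
    return [
        (subject, predicate)
        for page in pages
        for (subject, predicate, target) in triples(page)
        if target == person
    ]


if __name__ == "__main__":
    for project, role in roles_of("/people/robert", PAGES):
        print(project, role)
```

Nobody has to maintain a central list of roles here: the overview for a person is assembled from whatever the individual pages state, which is the bottom-up quality described above.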
The suspicion then arises that the exclusion of RDFa from the HTML5 specification is motivated at least in part by Google’s interests. Indeed, whereas the series of standards proposed by the W3C in the 2000s have seen widespread criticism for not being realistic, for Google they still represent a threat to its hegemony in finding information online. Shelley Powers phrases it as such:
On the other, I've been a part of the HTML WG for a little while now, and I don't feel entirely happy, or comfortable with many of the decisions for X/HTML5, or for the fact that it is, for all intents and purposes, authored by one person. One person who works for Google, a company that can be aggressively competitive.
conclusion
The takeaway is that, consistently throughout the history of HTML, browser vendors have had the largest influence on its development. But the role of designers and developers, the content creators of the web, has also been significant.
If you create websites, it makes sense to involve yourself in this process, because it seems a waste of the web’s potential if it is guided only by the interests of Apple, Google, Microsoft and Mozilla.
At the same time, we all create on the web in one way or another. Maybe we should widen the circle even further. One kind of actor we haven’t yet seen in its history is consumer organisations.
1. An interesting account of the first years of the W3C and the HTML standards can be found in Dave Raggett’s 1997 book ‘Raggett on HTML’, of which chapter 2, ‘A History of HTML’, is available online: http://www.w3.org/People/Raggett/book4/ch02.html
2. Cf., for instance, the statement that Giantmike's website ‘is masterfully crafted with handwritten HTML’.
3. With regard to laziness, it is telling that metadata standards, while not employed en masse on the World Wide Web, have seen great uptake in museums and archives, because these are the places where people are paid to make accurate metadata.