Towards simple language on the web

Published on 2017-06-26

Two years ago, I was tasked with creating a website. One detail of this project was that it would have a "simple German" translation in addition to a German one. Little did I know back then that this was the beginning of a fantastic journey.

The Standards

First things first. There is a standard that requires us to provide simple language alternatives:

Reading Level: When text requires reading ability more advanced than the lower secondary education level after removal of proper names and titles, supplemental content, or a version that does not require reading ability more advanced than the lower secondary education level, is available. — WCAG 2.0

On the web, an alternate version of a page in a different language is usually marked up as a link somewhere in the document's head:

<link rel="alternate" hreflang="de-simple" href="…" />

This allows browsers and search engines to automatically discover content in a language you understand. Unfortunately, the language tag "de-simple" did not exist.
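To illustrate the discovery step, here is a minimal Python sketch of how a tool might collect the alternate versions a page advertises. It is not how any particular browser or search engine does it. The example.org URLs and the names AlternateLinkParser and find_alternates are made up for this example.

```python
from html.parser import HTMLParser


class AlternateLinkParser(HTMLParser):
    """Collects <link rel="alternate" hreflang="..."> entries from a page."""

    def __init__(self):
        super().__init__()
        self.alternates = {}  # maps language tag -> href

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values; attributes without
        # a value are reported as None by HTMLParser.
        rel_values = (attributes.get("rel") or "").lower().split()
        hreflang = attributes.get("hreflang")
        href = attributes.get("href")
        if "alternate" in rel_values and hreflang and href:
            self.alternates[hreflang.lower()] = href


def find_alternates(html):
    """Return a dict of language tag -> URL for all alternate links."""
    parser = AlternateLinkParser()
    parser.feed(html)
    return parser.alternates


# Hypothetical page head; the URLs are placeholders.
sample = """
<head>
  <link rel="alternate" hreflang="de" href="https://example.org/de/" />
  <link rel="alternate" hreflang="de-simple" href="https://example.org/de-simple/" />
  <link rel="stylesheet" href="style.css" />
</head>
"""
alternates = find_alternates(sample)
```

A client that knows which language tags its user prefers could then pick the best entry from alternates.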

Most people know the language tags defined in ISO 639, but the web actually uses language tags as defined in BCP 47. These are based on ISO 639 tags, but they can be further refined with script, region, variant, or private-use subtags. Common examples for this are "en-GB" (English as used in Great Britain) or "sr-Latn" (Serbian written using the Latin script). The full list of subtags is maintained by IANA.
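The structure of such a tag can be made visible with a small parser. The following Python sketch is deliberately simplified: it ignores extended language subtags, extensions, and grandfathered tags, and it does not check subtags against the IANA registry. The function name parse_language_tag is my own invention.

```python
def parse_language_tag(tag):
    """Split a language tag into language, script, region, variants,
    and private-use subtags (simplified, see the assumptions above)."""
    subtags = tag.split("-")
    result = {
        "language": subtags[0].lower(),
        "script": None,
        "region": None,
        "variants": [],
        "private_use": [],
    }
    rest = subtags[1:]
    index = 0

    # Script: exactly four letters, e.g. "Latn".
    if index < len(rest) and len(rest[index]) == 4 and rest[index].isalpha():
        result["script"] = rest[index].title()
        index += 1

    # Region: two letters (e.g. "GB") or three digits (e.g. "419").
    if index < len(rest) and (
        (len(rest[index]) == 2 and rest[index].isalpha())
        or (len(rest[index]) == 3 and rest[index].isdigit())
    ):
        result["region"] = rest[index].upper()
        index += 1

    # Variants: 5 to 8 alphanumerics, or 4 characters starting with a digit.
    while index < len(rest) and rest[index].lower() != "x":
        subtag = rest[index]
        is_variant = subtag.isalnum() and (
            5 <= len(subtag) <= 8 or (len(subtag) == 4 and subtag[0].isdigit())
        )
        if not is_variant:
            raise ValueError("unsupported subtag: " + subtag)
        result["variants"].append(subtag.lower())
        index += 1

    # Everything after the singleton "x" is private use.
    if index < len(rest):
        result["private_use"] = [subtag.lower() for subtag in rest[index + 1:]]

    return result
```

With this sketch, "sr-Latn" yields the script "Latn", "en-GB" yields the region "GB", "de-simple" yields the variant "simple", and "de-x-simple" yields no variant at all, only the private-use subtag "simple".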

Of course, the website that had started this whole topic for me could not wait, so I implemented it using the private-use tag de-x-simple. But this meant that browsers would just ignore the x-simple part. I wanted to find a solution that had actual benefits for users.

I started reading the archives of the IETF-languages mailing list. There had already been some controversial discussion on the topic in 2006 that had died down since. I sent my initial mail (along with another one to the Web Accessibility Initiative) in September 2015; Michael Everson helped out with some more tangible proposals (basiceng and wpsimple) in October; and sometime in December, the "simple" variant subtag was finally accepted.

Summary of the Discussion

You may ask yourself: "Why did it take three months of discussion to add four lines to that registry?" There are many reasons for that.

First of all, let me say that some people on that mailing list should seriously consider fixing their mail setups. It is very hard to follow the discussion if some key contributors just omit the In-Reply-To headers.

That aside, BCP 47 language tags are — similar to many other web standards — very important. They govern a global network that we all use. It is imperative to get them right. And getting languages right is inherently difficult. What constitutes a language is a political question as much as it is a linguistic one. Mix that with engineering and you have very few people who can make informed decisions.

In the case of simple language, it boils down to this: Are there fixed rules for this variant? And the answer is: Yes, there are more than enough rules: Basic English, the US Plain Writing Act, Leichte Sprache, and many more. Unfortunately, these are all distinct systems for specific languages. The only thing they all have in common is their intent to somehow be simple.

This makes sense if you factor in the many different target groups: children, second language learners, people with cognitive disabilities, non-experts reading a scientific text, and so on. They all have slightly different needs. Many people therefore argued that a general simple subtag would not mean anything and that we should have distinct subtags for the individual well-defined systems instead. But even if different target groups have different needs, most forms of simplification help all of them:

[…] the problem with divergent E2R [easy to read] user groups is usually solved indirectly by just developing websites that are simple enough and include reduced amount of the most relevant content. This manner of approach guarantees an accessible site or a section of a site for almost all E2R-users. — Sami Älli

So in the end, a generic simple subtag was accepted. However, it was clearly stated that additional, more specific variants could be added on top of that, resulting in language tags like de-simple-leicht. This would automatically fall back to de-simple if the more specific variant was not available.
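This kind of fallback can be sketched with the "lookup" scheme from RFC 4647, which truncates the requested tag from the end until something matches. The Python below is a minimal illustration under that assumption, not a description of what any given browser or server implements. The function name lookup and the lists of available tags are hypothetical.

```python
def lookup(language_range, available, default=None):
    """Return the best available tag for language_range by progressively
    dropping subtags from the end (RFC 4647 lookup, simplified)."""
    available_by_lower = {tag.lower(): tag for tag in available}
    subtags = language_range.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in available_by_lower:
            return available_by_lower[candidate]
        subtags.pop()
        # A trailing single-character subtag (such as the private-use
        # marker "x") is removed together with the subtag that followed it.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return default


# The more specific variant falls back to the generic one ...
best = lookup("de-simple-leicht", ["de", "de-simple"])  # "de-simple"
# ... whereas the private-use tag falls straight through to plain German.
private = lookup("de-x-simple", ["de", "en"])  # "de"
```

The second call shows why the private-use tag was of little help: once "simple" is dropped, the "x" goes with it and only "de" remains.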

If I'm a user I want "simple" English. Users could care less about a distinction between Voice of America English, Wikipedia Simplified English and Basic English. I just want an English I might be able to understand a bit better than normal English. I can't specify en-US-VoA in http-accept-language, because it'll match "en-US" not "en-US-wpsimple". So those tags are useless to the user. (However if we wanted to consider en-US-simple-VoA and en-US-simple-odgenbe and en-US-simple-wp that might work). — Shawn Steele
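The matching behaviour described in this quote can be sketched in Python. The example assumes the "basic filtering" scheme from RFC 4647, where a requested range matches every tag that starts with it. The function names and the list of available tags are made up. The tags themselves are taken from the quote and are not registered.

```python
def parse_accept_language(header):
    """Return the language ranges of an Accept-Language header,
    ordered by quality value (highest first, ties in header order)."""
    ranges = []
    for position, item in enumerate(header.split(",")):
        parts = [part.strip() for part in item.split(";")]
        language_range = parts[0]
        if not language_range:
            continue
        quality = 1.0  # default when no q parameter is given
        for parameter in parts[1:]:
            if parameter.startswith("q="):
                try:
                    quality = float(parameter[2:])
                except ValueError:
                    quality = 0.0  # treat malformed values as "not wanted"
        if quality > 0:
            ranges.append((quality, position, language_range))
    ranges.sort(key=lambda entry: (-entry[0], entry[1]))
    return [entry[2] for entry in ranges]


def basic_filter(language_range, available):
    """Return all available tags that equal the range or extend it."""
    prefix = language_range.lower()
    if prefix == "*":
        return list(available)
    return [
        tag for tag in available
        if tag.lower() == prefix or tag.lower().startswith(prefix + "-")
    ]


# Hypothetical set of versions a site offers.
available = ["en-US", "en-US-simple-VoA", "en-US-simple-wp", "en-US-wpsimple", "de"]

wanted = parse_accept_language("en-US-simple, en;q=0.5")
simple_versions = basic_filter(wanted[0], available)
```

Asking for "en-US-simple" finds both "en-US-simple-VoA" and "en-US-simple-wp", while a request for "en-US-VoA" finds nothing, not even "en-US-wpsimple".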

Is a New Language Variant the Answer?

The question remains whether the accessibility issue can be solved on a language level alone. Some people argued that complex websites cannot be simplified just by using a different language:

Language can be made simple or complex. That's not the main problem in many cases though. The bigger problem is that complex ideas will remain complex, even when described in simple language. — Paul Bohman

Other people argued the exact opposite, and it seemed to be a matter of personal belief:

Explaining complex ideas is difficult. Explaining them with simple language is more difficult. But complex ideas can be explained in simple language. Thousands of very good teachers do that every day. — Chaals McCathie Nevile

And it is not only the concepts presented in a text that can be an issue. Navigation, layout, typography, interactions and much more can considerably add to complexity as well:

[…] understandability of text seems to be related to its presentation for many people. For example, it is not just the complexity of the text itself but how well it is organized (for example using headers, lists, and structures), and how it is presented (font, spacing, width, etc.). — Shadi Abou-Zahra

Adoption

More than a year later I came back to see who had adopted this new language tag. Unsurprisingly, nobody seemed to have noticed. We had failed to get the relevant experts involved in the discussion, so now they either did not know about the new language tag, or — even worse — they did not care.

I sent yet another mail to the Web Accessibility Initiative asking about including my pattern as a recommended technique for WCAG 2.0. The answer was simple: If browsers do not support it, we cannot recommend it. A classic chicken-and-egg dilemma.

Someone proposed that I write a browser extension that automatically switches to the simple version of a website if one is available. I created a simple extension that should work in Chrome, Firefox, and some other browsers. Unfortunately, I am very bad at advertising my projects, so nothing came of it.

So what are the next steps for me? I will try to contact big projects that publish simple content already. An obvious choice is Wikipedia (which had ferocious debates about deleting its simple version). But there are probably more publishers that might benefit from this.

Conclusion

What have we learned? Naming languages is a fascinating and difficult area. Simple language is a controversial topic on which the W3C-WAI, IETF-languages, and Wikipedia communities have not yet reached consensus. And it is not actually that hard to get involved in web standards, even for a non-expert like me.

As a final word, let me say that you will probably never need the simple variant subtag. In the vast majority of cases you should have a single, accessible version instead of creating custom alternatives. If you want to know how to keep your texts simple, the US Federal Plain Language Guidelines are a good place to start.