Subject: Re: dc language in rss
Date: Mon, 28 Oct 2002 08:24:08 -0500 (EST)
From: Aaron Straup Cope
To: Bill Kearney
On Fri, 25 Oct 2002, Bill Kearney wrote:
> That would indeed be a problem. You could actually mark up those sections, even
> down to the paragraphs or even words with span tags. I shudder at the thought
> of what most environments would DO with that data, but it's certainly possible.
If I were a better person, I(would(learn(lisp))) and write an Emacs
minor-mode to do that. (Sadly(,(lisp(scares(me))))).
> Well, the problem is what does that element mean? What purpose is it being used
> for? I daresay outside of Syndic8's listing of feeds by language, not much is
> paying attention to it. So my question to you is what would you have a reader
> program DO with multiple languages?
The short answer is : I have no idea.
The longer answer is : Who cares?
There are two issues here :
The first falls into the Foofy Grand Unifying Principles category - the
people who invented the Internet didn't know what it was going to be used
for. Why should RSS, and its tool set, presume something as basic and
often controversial as language?
The second falls into the Dueling Shakespeare category - RFC 1766 states
that :
"In some contexts, it is possible to have information in more than one
language, or it might be possible to provide tools for assisting in the
understanding of a language (like dictionaries).
"A prerequisite for any such function is a means of labelling the
information content with an identifier for the language in which it is
written."
But in the absence of multiple language tags, the correct answer when
prigs like me start fussing is :
<quote src = "rfc1766">
The information in the subtag may for instance be:
- Country identification, such as en-US (this usage is described in
  ISO 639)
- Dialect or variant information, such as no-nynorsk or en-cockney
- Languages not listed in ISO 639 that are not variants of any listed
  language, which can be registered with the i- prefix, such as
  i-cherokee
- Script variations, such as az-arabic and az-cyrillic
</quote>
Which doesn't solve everyone's problem, but can be adapted to deal with
the problem of Quebec. I chose en-quebecois, because I like the sound of
it. Sovereigntists, on the other hand, will probably opt for 'en-qc' since
it implies nationhood.
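
For what it's worth, the tag syntax in RFC 1766 is simple enough to check
mechanically. Here is a minimal sketch in Python, assuming the grammar as I
read it: a primary tag followed by any number of subtags, each 1 to 8 ASCII
letters, separated by hyphens. The function name, the "kind" labels and the
returned dictionary are made up for illustration and are not part of the RFC
or of any RSS tool.

```python
import re

# RFC 1766 grammar, as I read it: Primary-tag *( "-" Subtag ),
# where each part is 1 to 8 ASCII letters.
_PART = re.compile(r"[A-Za-z]{1,8}")


def split_language_tag(tag):
    """Split a tag into primary tag and subtags (hypothetical helper).

    Raises ValueError if any part is not 1 to 8 ASCII letters.
    """
    parts = tag.split("-")
    for part in parts:
        if not _PART.fullmatch(part):
            raise ValueError("bad tag part %r in %r" % (part, tag))
    # Tags are case-insensitive, so normalise to lower case.
    primary = parts[0].lower()
    subtags = [p.lower() for p in parts[1:]]
    if len(primary) == 2:
        kind = "iso639"    # two-letter ISO 639 language code
    elif primary == "i":
        kind = "iana"      # IANA-registered tag, e.g. i-cherokee
    elif primary == "x":
        kind = "private"   # private-use tag
    else:
        kind = "other"
    return {"primary": primary, "subtags": subtags, "kind": kind}
```

If I have the grammar right, this check accepts en-qc, while en-quebecois
fails on length alone, since "quebecois" runs to nine letters. Note that it
only checks the shape of a tag and says nothing about whether a subtag is
actually registered.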
Then, of course, there is the question of how to deal with representing a
weblog written by the province's allophone population (translation:
persons whose mother tongue is neither English nor French and who, in my
limited experience, often speak upward of 4-6 languages). What then?
qc-allophone?