
Re: [JDEV] charsets (was: Protocol extension?)



>
>Looking at protocol examples from jabber.org:
>
><message language="swahili" encoding="ISO-whatever-1234">
>  <to>mbwana</to>
>  <say>barbarbar!</say>
></message>

The language code would be more like "sw". There is a whole lot of
stuff on locale specification that could be lifted without change to
get the language coding. (I believe Klingon has an official code as
well.) The codes allow for structure, as in en_GB and en_US and
en_US_texas :-)
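
To make the structure of such codes concrete, here is a minimal sketch of
how a client might split one into its parts. The function name
split_language_code and the language/region/variant naming are my own
assumptions for illustration; nothing like this is defined in the Jabber
protocol examples.

```python
def split_language_code(code):
    """Split a locale-style code such as 'en_us_texas' into its parts.

    Hypothetical helper: returns (language, region, variant), with None
    for any part that is absent. Accepts '_' or '-' as the separator.
    """
    parts = code.replace("-", "_").split("_")
    language = parts[0].lower()
    region = parts[1].upper() if len(parts) > 1 else None
    # Anything after the region is kept verbatim as a free-form variant.
    variant = "_".join(parts[2:]) or None
    return language, region, variant


print(split_language_code("sw"))
print(split_language_code("en_us_texas"))
```

A client that only understands the first part can still fall back to it
and ignore the rest.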


>... or something like that. This raises the interesting question about
>the <to></to> field. Should it also be able to contain non-us-ascii
>characters? That could cause some complications for clients, right? If
>a Japanese person's name contains Kanji or Furigana characters, his
>name would be something like* "ÃUåS". We probably want to enforce user
>IDs in us ascii, though at least some roster info can be in any
>encoding... except group names and such.


I suspect that the only place where anything is laid down about what
must be ASCII is hostnames, which have a limited character set (RFC
anyone?). I mean, I would get pretty hacked off if my name used
non-ASCII characters and you told me I couldn't use it. Computers are
meant to assist people, right? Ever looked in on a Japanese IRC
channel?

If a client can't or won't cope, then so what? It could (like
some mailers) display a message saying that the message is encoded in
a character set that is not supported and offer various options for
viewing the data. (In fact, very often 7-bit ASCII works anyway.)

The really nasty things start when you get a community of overlapping
groups that use different encodings; this certainly happens with
Russian and with Japanese. Flagged encodings would be a joy for
sorting out this kind of mess! Allow encoding attributes on anything
that can have CDATA, and ignore them if you want (so long as this fact
is documented, that's OK by me). (When I say "you", I mean the abstract
you that writes clients, of course.)
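
As a rough sketch of the client behaviour described above: decode the
bytes using the flagged encoding, and if the client does not know that
character set, fall back to a 7-bit ASCII view plus a note for the user.
The function decode_flagged and the wording of the notes are hypothetical;
only codecs.lookup and bytes.decode are real Python standard library
calls.

```python
import codecs


def decode_flagged(raw, encoding=None):
    """Decode bytes using the encoding flagged on the element.

    Hypothetical helper: returns (text, note). note is None when the
    flagged encoding was understood and the data decoded cleanly.
    """
    if encoding is None:
        encoding = "ascii"  # assumption: unflagged data is treated as ASCII
    try:
        codecs.lookup(encoding)
    except LookupError:
        # Unknown character set: show what survives as 7-bit ASCII
        # and tell the user why the rest may look wrong.
        text = raw.decode("ascii", errors="replace")
        return text, "unsupported character set: " + encoding
    try:
        return raw.decode(encoding), None
    except UnicodeDecodeError:
        # Known character set, but the bytes do not match the flag.
        text = raw.decode(encoding, errors="replace")
        return text, "message is not valid " + encoding


print(decode_flagged(b"caf\xe9", "iso-8859-1"))
print(decode_flagged(b"barbarbar!", "x-no-such-charset"))
```

A client that prefers to ignore the attribute entirely can simply call
this with no encoding, which is the documented-ignore case.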


>* typical; my netscape on Linux actually displays Kanji correctly.

Yup, Linux does a pretty good job with Japanese and Korean. Damn sight
better than other systems I won't name.

L.
-- 
http://catless.ncl.ac.uk/Lindsay