[JDEV] Encodings & Simplicity




I've been thinking about this whole problem of encodings and would like to
throw one more log on the fire... :)

We are assuming that in international circumstances, individuals will want to
create Jabber packets that contain non-ASCII data. Okay, that's good and fine.
But why force the Jabber packets that *enclose* the data to conform to the same
standard? That is, why does a Jabber packet *have* to be encoded in anything
besides UTF-8?! Jabber, by nature, doesn't care what **data** a packet carries,
so long as the transport/etherex/etc can parse the actual packet. Am I making
any sense? :)

It would be *very* difficult to ensure that every client and every transport
could understand seven possible packet encodings. Why not just specify that all
protocol-related information be in UTF-8 (which is what most programming
languages use -- I don't know of a Chinese/Korean/Thai character based
programming language)? Furthermore, specify that individual tags within a
packet could have an attribute telling the client what format the CDATA (for
that particular tag) is in. This way, it's up to the client to decode funky
chars, and the server/transport side of things only worries about the one
encoding used for the packet itself...
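
To make the idea concrete, here is a rough Python sketch, not a proposal for the
actual wire format. The `encoding` attribute name, the helper function names,
and the base64 wrapping of the payload (so that non-UTF-8 bytes can sit safely
inside a UTF-8 packet) are all my own assumptions for illustration.

```python
import base64
import xml.etree.ElementTree as ET


def build_packet(to, text, payload_encoding):
    """Build a packet whose envelope is UTF-8, whatever the payload encoding."""
    msg = ET.Element("message", {"to": to})
    # Hypothetical attribute: tells the receiving client how to decode the body.
    body = ET.SubElement(msg, "body", {"encoding": payload_encoding})
    # Assumption: payload bytes are base64-wrapped so the packet stays valid UTF-8.
    raw = text.encode(payload_encoding)
    body.text = base64.b64encode(raw).decode("ascii")
    return ET.tostring(msg, encoding="utf-8")


def route(packet):
    """Server/transport side: parse the UTF-8 envelope, never decode the payload."""
    return ET.fromstring(packet).get("to")


def read_body(packet):
    """Client side: decode the payload using the encoding named on the tag."""
    body = ET.fromstring(packet).find("body")
    raw = base64.b64decode(body.text)
    return raw.decode(body.get("encoding"))


packet = build_packet("user@example.com", "안녕하세요", "euc-kr")
recipient = route(packet)
text = read_body(packet)
```

Note that `route` never looks at the `encoding` attribute at all; only the
client that finally displays the message has to know about EUC-KR.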

These are just a few thoughts I had. They probably don't solve everything, but
I think they simplify the issue... don't they? :)

D.