Comments on Mugunth's Blog: New tamil unicode encoding proposal - My opinion

2006-08-27T02:56:00.000+05:30

http://www.araichchi.net/kanini/unicode/pros_and_cons.html

The above page of mine should shed some light on this problem.

and

The current Unicode encoding is based on Tamil grammar, with the exception of the natural sort order.

Well, as far as sorting is concerned, we can still get there the roundabout way, like touching the nose by reaching the arm around the back of the neck!!!

TUNE might make sorting easier, but in every other respect it is not technically correct.

Like Mugunth said, I think the current Unicode is good, but if TUNE gets accepted we can use that too, including for ezuththuch chiirmai.

Another thing to note: Unicode in general does not work in certain important situations, and that will be the same whether it is the current Unicode or TUNE. Unicode support in general is still crawling; e.g., email, graphics, etc. still work painfully badly. This is where a temporary Tamil ISO standard would still help while Unicode grows into a real solution in the future. Maybe, instead of spending time on TUNE, working on a temporary, non-hacked Tamil ISO standard would be a good move.

Sinnathurai Srivas

2006-08-27T02:36:00.000+05:30

concerning:
> Binary sorts are the fastest type
> of sort, and produce reasonable
> results for the English alphabet
> because the ASCII and EBCDIC standards
> define the letters A to Z in ascending
> numeric value.

This is only true if your data is all upper case (or all lower case). Since most English data is _not_ all the same case, most such data cannot be sorted in a binary sort.
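
A minimal Python sketch of this point: Python's default string comparison orders strings by code point, which is exactly a binary sort, so a capitalised word jumps ahead of lower-case ones. The word list is made up for illustration.

```python
words = ["cherry", "apple", "Banana"]

# Binary sort: compares code points, and 'B' (66) < 'a' (97) < 'c' (99).
binary = sorted(words)

# Case-insensitive sort: compare case-folded copies instead.
case_insensitive = sorted(words, key=str.casefold)

print(binary)            # ['Banana', 'apple', 'cherry']
print(case_insensitive)  # ['apple', 'Banana', 'cherry']
```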

2006-06-16T22:57:00.000+05:30

Read this:
http://www.indiawebdevelopers.com/technology/oracle9i/sorting.asp

"
Conventionally, when character data is stored, the sort sequence is based on the numeric values of the characters defined by the character encoding scheme. This is called a binary sort.

Binary sorts are the fastest type of sort, and produce reasonable results for the English alphabet because the ASCII and EBCDIC standards define the letters A to Z in ascending numeric value.
"

[Please note: BINARY sorting is possible only because the letters are already in order]
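
To see why a plain binary sort fails for Tamil, a small Python sketch can sort the 18 Tamil consonants by code point. The Unicode Tamil block follows the common Indic layout, which puts ன right after ந, ற right after ர, and ழ after ள, so code-point order differs from the traditional order க ங ச ஞ ட ண த ந ப ம ய ர ல வ ழ ள ற ன.

```python
# The 18 Tamil consonants in traditional order.
TRADITIONAL = "கஙசஞடணதநபமயரலவழளறன"

# A binary sort simply orders the characters by their code points.
binary = "".join(sorted(TRADITIONAL))

print(binary)                 # கஙசஞடணதநனபமயரறலளழவ
print(binary == TRADITIONAL)  # False
# ந, ன and ப sit next to each other in the code chart:
print([f"U+{ord(c):04X}" for c in "நனப"])  # ['U+0BA8', 'U+0BA9', 'U+0BAA']
```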

[Here is how our Tamil may be sorted]
"
A linguistic sort operates by replacing characters with numeric values that reflect each character's proper linguistic order. These numeric values are found in a table containing major and minor values.
"
All the methods the article describes from here on are for [level-2] languages like Tamil. Look at how much extra processing they need.

Over time, this performance gap will narrow considerably, but the need for this processing is permanent.
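
A rough Python sketch of such a table-driven (linguistic) sort for Tamil: each letter is replaced by a (major, minor) pair, where the major value ranks the base letter in traditional order and the minor value ranks the vowel. The weight tables, the helper name `tamil_sort_key`, and the choice to place a pure consonant (e.g. க்) before க are assumptions for illustration, not Oracle's or any library's actual data; Grantha letters simply fall through to a code-point fallback.

```python
import unicodedata

# Hypothetical weight tables, for illustration only.
VOWELS = "அஆஇஈஉஊஎஏஐஒஓஔ"
CONSONANTS = "கஙசஞடணதநபமயரலவழளறன"  # traditional order
# Vowel signs for ஆ இ ஈ உ ஊ எ ஏ ஐ ஒ ஓ ஔ, in that order.
VOWEL_SIGNS = "\u0bbe\u0bbf\u0bc0\u0bc1\u0bc2\u0bc6\u0bc7\u0bc8\u0bca\u0bcb\u0bcc"
VIRAMA = "\u0bcd"  # pulli: marks a pure consonant (mei)
AYTHAM = "\u0b83"  # ஃ


def tamil_sort_key(word: str) -> list[tuple[int, int]]:
    """Replace each Tamil letter with a (major, minor) weight pair."""
    word = unicodedata.normalize("NFC", word)  # compose two-part vowel signs
    key = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in VOWELS:
            key.append((0, VOWELS.index(ch) + 1))  # vowels first; அ = 1
            i += 1
        elif ch == AYTHAM:
            key.append((0, len(VOWELS) + 1))  # ஃ after the vowels
            i += 1
        elif ch in CONSONANTS:
            major = CONSONANTS.index(ch) + 1
            if nxt == VIRAMA:
                key.append((major, 0))  # pure consonant, e.g. க்
                i += 2
            elif nxt and nxt in VOWEL_SIGNS:
                # ா -> 2 (ஆ), ி -> 3 (இ), ... matching the vowel ranks.
                key.append((major, VOWEL_SIGNS.index(nxt) + 2))
                i += 2
            else:
                key.append((major, 1))  # inherent அ, e.g. க
                i += 1
        else:
            # Anything else (Grantha letters, Latin, digits) sorts last.
            key.append((len(CONSONANTS) + 1, ord(ch)))
            i += 1
    return key


print(sorted(["கு", "கா", "க்", "க"], key=tamil_sort_key))
# ['க்', 'க', 'கா', 'கு']
```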

****************************
"
Using linguistic indices you can provide the sophisticated sorting capabilities of a multilingual sort while achieving sorting performance nearly as good as a binary sort (which offers the best performance).
"
****************************
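
The idea behind such an index can be sketched in Python: transform each string once into a key whose plain code-point order already follows the linguistic order, store the keys, and from then on every comparison is an ordinary binary one. The transform here is hypothetical and deliberately partial: it only remaps the 18 consonants into the Private Use Area in traditional order and leaves vowel signs and the pulli untouched, so it is not a complete Tamil collation.

```python
import bisect

CONSONANTS = "கஙசஞடணதநபமயரலவழளறன"  # traditional order

# Hypothetical transform: map each consonant to a Private Use Area code point
# whose numeric order matches the traditional order. Vowel signs and the
# pulli are left unchanged, so this is only a partial collation.
INDEX_TABLE = str.maketrans({c: chr(0xE000 + i) for i, c in enumerate(CONSONANTS)})


def index_key(word: str) -> str:
    return word.translate(INDEX_TABLE)


words = ["ன", "ப", "ற", "ர", "க"]

# Build the "index" once: (key, word) pairs kept in binary key order.
index = sorted((index_key(w), w) for w in words)
print([w for _, w in index])  # ['க', 'ப', 'ர', 'ற', 'ன']

# Later look-ups need only plain binary comparisons of the stored keys.
pos = bisect.bisect_left(index, (index_key("ம"),))
print(pos)  # 2
```

The cost of the linguistic rules is paid once, when the keys are built, which is why such indices can approach binary-sort speed.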

Binary sort is NOT possible for present-day Tamil Unicode! If anybody can prove that they can do a binary sort on present-day Tamil Unicode, I'll lick their shoes clean.

--
______
CAPital
http://1paarvai.wordpress.com/

2006-06-16T22:10:00.000+05:30

Please read at:

http://1paarvai.wordpress.com/tag/unicode/

Please read them in order, up to P-9 [so far].

_______
CAPital

2005-12-08T05:10:00.000+05:30

Well said Mugunth.

As usual, uninformed people (usually TUNE supporters) seem intent on spreading FUD about the current Unicode.

In, http://urpudathathu.blogspot.com/2005/12/blog-post_06.html, Pari writes:

> Sorting in current Tamil Unicode is horrible. So is parsing.
> They better get it right this time, or goto hell.

This is a typical example of an ill-informed, emotional statement about Tamil Unicode, made with little understanding of the facts.

First, sorting has nothing to do with the Unicode encoding. Unicode merely maps each abstract character in a script to a unique number (a code point, stored as 16-bit or 32-bit code units depending on the encoding form). Sorting, on the other hand, is a highly locale-specific issue; one that depends on linguistics, country/region, and the type of text being sorted. Therefore, sorting needs to be highly customizable and not tied to the order of the characters in any script.

The proponents of TUNE seem to hold the simplistic, but mistaken, view that character order in Unicode equals sorting order, or at least that it should. This may work for the Tamil script, as there is only one dominant language that uses the Tamil script, but for other scripts such an assumption will create all kinds of problems.

Thus, correct sorting requires additional knowledge beyond the mere character order and is typically implemented on a per-locale basis by the system libraries in each operating system. (For example, we can have different sort orders for ta_IN and ta_MY.)
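
To illustrate why one script can need several sort orders, here is a small Python sketch with two hand-written key functions for the Latin script: Swedish places å, ä and ö after z, while German dictionary order (DIN 5007-1) files ä, ö and ü together with a, o and u. The tables and helper names are simplified assumptions for illustration, not real glibc or CLDR locale data.

```python
# Simplified, hypothetical per-locale tables (lower-case letters only).
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
GERMAN_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})


def swedish_key(word: str) -> list[int]:
    # Rank each letter by its position in the Swedish alphabet
    # (raises ValueError for letters outside this small table).
    return [SWEDISH_ALPHABET.index(c) for c in word.lower()]


def german_key(word: str) -> str:
    # DIN 5007-1 style: treat umlauts as their base vowels.
    return word.lower().translate(GERMAN_FOLD)


words = ["zebra", "äpple", "apa"]
print(sorted(words, key=swedish_key))  # ['apa', 'zebra', 'äpple']
print(sorted(words, key=german_key))   # ['apa', 'äpple', 'zebra']
```

Same characters, same code points, two different correct orders: the order has to come from the locale, not from the code chart.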

It is well known that the current *Windows* implementation of Tamil sorting has some flaws; ie, Windows does not sort in the order that a native Tamil speaker would expect. However, that doesn't mean the rest of the computer world has to get it wrong either: The GNU C library implements Tamil sorting according to the Madras Tamil Lexicon order. Any system that uses the GNU C library, such as Linux, can sort Tamil text perfectly well. So, it is not the fault of the Unicode encoding, nor of the Unicode consortium, if a particular system library does not implement sorting correctly.
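
On such a system, a program does not need its own Tamil tables; it can ask the C library. A hedged Python sketch using the standard `locale` module, where `locale.strxfrm` turns each string into a key that compares according to the active LC_COLLATE rules. The locale name `ta_IN.UTF-8` and the helper name `sort_with_system_locale` are assumptions; whether that locale is installed, and exactly which order its data encodes, depends on the system.

```python
import locale


def sort_with_system_locale(words, name="ta_IN.UTF-8"):
    """Sort `words` with the C library's collation rules for locale `name`.

    Returns None if that locale is not installed on this machine.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return None  # locale data not available
    try:
        # strxfrm maps each string to a key that compares in locale order.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # restore the old setting


result = sort_with_system_locale(["ன", "ப", "க"])
print(result)  # None if ta_IN.UTF-8 is not installed
```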

Second, where is the evidence that parsing Tamil Unicode is hard? Most major programming languages, including Java, C/C++ and Python, have excellent Unicode handling support that obviates the need to ever hand-code a parser.
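
As a small illustration, this sketch uses only Python's standard `unicodedata` module to split Tamil text into letters by attaching each vowel sign or pulli (both are combining marks) to the character before it. The helper name `tamil_letters` is hypothetical, and this is a simplification of full grapheme-cluster segmentation (Unicode UAX #29), which libraries such as ICU provide.

```python
import unicodedata


def tamil_letters(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it."""
    letters = []
    for ch in text:
        # Vowel signs and the pulli are combining marks (categories Mn/Mc).
        if letters and unicodedata.category(ch).startswith("M"):
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters


print(tamil_letters("தமிழ்"))  # ['த', 'மி', 'ழ்']
print(len("தமிழ்"))            # 5 (code points, not letters)
```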

The TUNE proposal, a work of questionable scholarship based on flimsy evidence, will never be accepted by the Unicode Consortium on technical grounds. Fortunately, Tamil Unicode is here to stay, despite the shenanigans of the TUNE proponents, thanks to its excellent multi-platform support and its daily use by Tamil bloggers and by Tamil websites with "clue".

2005-12-08T04:30:00.000+05:30

//Existing tamil unicode standard works fine in every aspect.// Not really; e.g., sorting.