Disclaimer

These scripts come without warranty of any kind. Use them at your own risk. I assume no liability for the accuracy, correctness, completeness, or usefulness of any information provided by this site, nor for any damages that the use of these scripts may cause.

Saturday, August 13, 2011

Quick Reference on Database Character Sets

Recently I was upgrading Oracle Apps 11i to R12, and the customer wanted Arabic NLS support. This meant converting the database character set from US7ASCII to one that supports Arabic. I was unsure whether to use UTF8 or AL32UTF8, so I did some research on Oracle Support and decided on AL32UTF8. The points below helped me make that decision.


The default UTF-8 (Unicode) character set for 9i/10g is AL32UTF8; however, this character set is NOT recognized by any pre-9i client or server systems.

It is recommended that you use UTF8 instead of AL32UTF8 as the database character set if you have 8i (or older) servers and clients connecting to the 9i/10g system, until you can upgrade the older versions to 9i or higher.
UTF8 implements Unicode 3.0 in 8.1.7 and later, whereas AL32UTF8 is updated to newer Unicode versions with each major release.

Besides the difference in Unicode version, the "big difference" is that AL32UTF8 has built-in support for "Surrogate Pairs", also known as "Surrogate characters" or "Supplementary characters". Supplementary characters (code points above U+FFFF) are rarely needed in typical business data, so in practice you can use UTF8 instead of AL32UTF8 without any problem in about 99% of cases.
There is no performance difference between UTF8 and AL32UTF8. However, since UTF8 is NOT updated with newer Unicode versions, it is still recommended to use AL32UTF8 when possible.
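To make the surrogate-pair difference concrete, here is a small Python sketch that compares byte counts. It assumes that Oracle's UTF8 character set follows CESU-8 (a supplementary character is stored as its UTF-16 surrogate pair, with each surrogate written as its own 3-byte sequence) and that AL32UTF8 is standard UTF-8. The function names are hypothetical and nothing here connects to a database.

```python
# Sketch: byte counts in standard UTF-8 (assumed to match AL32UTF8) versus
# CESU-8 (assumed to match Oracle's older UTF8 character set).


def utf8_bytes(text: str) -> bytes:
    """Standard UTF-8: a supplementary character is one 4-byte sequence."""
    return text.encode("utf-8")


def cesu8_bytes(text: str) -> bytes:
    """CESU-8: encode each UTF-16 code unit separately, so a supplementary
    character becomes two 3-byte sequences (one per surrogate)."""
    utf16 = text.encode("utf-16-be")
    out = b""
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "big")
        # 'surrogatepass' lets Python emit the 3-byte form of a lone surrogate.
        out += chr(unit).encode("utf-8", "surrogatepass")
    return out


if __name__ == "__main__":
    for label, ch in [("Arabic letter alef", "\u0627"),
                      ("Emoji U+1F600", "\U0001F600")]:
        print(f"{label}: AL32UTF8-style {len(utf8_bytes(ch))} bytes, "
              f"UTF8/CESU-8-style {len(cesu8_bytes(ch))} bytes")
```

For characters inside the Basic Multilingual Plane, such as Arabic, both encodings produce identical bytes; they only diverge for supplementary characters, which is why UTF8 is usually good enough.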

Summary:
If you use 8.1 or 8.0 clients or servers, or connect to 8.1 or 8.0 databases, then use UTF8 as the NLS_CHARACTERSET for your 9i (or later) databases; otherwise (unless, of course, the application vendor explicitly requires UTF8) use AL32UTF8.
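The summary rule can be written down as a tiny decision function. This is a hypothetical helper for illustration only: it assumes you have already identified the oldest Oracle major release (8 for 8.0/8.1, 9 for 9i, 10 for 10g, and so on) among every client, server and remote database that talks to the 9i-or-later database, and it simply mirrors the rule stated in the summary.

```python
# Hypothetical helper mirroring the summary rule; not an Oracle tool.


def recommend_charset(oldest_major_release: int,
                      vendor_requires_utf8: bool = False) -> str:
    """Suggest an NLS_CHARACTERSET for a 9i-or-later database.

    oldest_major_release: oldest Oracle major release among all connecting
        clients/servers and databases it connects to (8 for 8.0/8.1, 9 for 9i).
    vendor_requires_utf8: True if the application vendor explicitly asks
        for UTF8.
    """
    if oldest_major_release < 9:
        # Pre-9i software does not recognize AL32UTF8.
        return "UTF8"
    if vendor_requires_utf8:
        return "UTF8"
    return "AL32UTF8"


if __name__ == "__main__":
    print(recommend_charset(8))    # 8i clients still connect
    print(recommend_charset(10))   # everything is 9i or later
```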

For a Unicode database, Oracle does not need Unicode support from the operating system the database runs on, because the Oracle AL32UTF8 implementation does not depend on OS features.
There is also no need to "install Unicode" for the Oracle database or client software. All character sets known to a database version, including the Unicode character sets, are always installed; you simply cannot choose not to install them.
If your current Oracle version is 8.1.7 or lower, it's best to upgrade to a higher release first, mainly because:
a) you can then use AL32UTF8 (not possible in 8i), and
b) Csscan has a few issues in 8.1.7 that may cause confusion.
If your current Oracle version is 9i or later, converting either before or after the upgrade is a good choice; it simply depends on your preference or the application changes needed.

Storage.
AL32UTF8 is a varying-width character set, which means that the code for 1 character can be 1, 2, 3 or 4 bytes long. This is a big difference from character sets like WE8ISO8859P1 or WE8MSWIN1252, where 1 character is always 1 byte.
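A quick way to see the varying width is to encode a few sample characters with Python's built-in codecs. This sketch assumes that Python's `latin-1` and `cp1252` codecs are close enough stand-ins for WE8ISO8859P1 and WE8MSWIN1252, and that standard UTF-8 stands in for AL32UTF8; `byte_lengths` is a hypothetical helper, not an Oracle API.

```python
from typing import Dict, Optional


def byte_lengths(text: str) -> Dict[str, Optional[int]]:
    """Bytes needed for `text` in UTF-8 and two single-byte character sets.

    None means the text cannot be represented in that character set.
    """
    result: Dict[str, Optional[int]] = {"AL32UTF8": len(text.encode("utf-8"))}
    # Assumed mapping of Oracle character set names to Python codecs.
    for oracle_name, codec in [("WE8ISO8859P1", "latin-1"),
                               ("WE8MSWIN1252", "cp1252")]:
        try:
            result[oracle_name] = len(text.encode(codec))
        except UnicodeEncodeError:
            result[oracle_name] = None
    return result


if __name__ == "__main__":
    samples = {
        "Latin A": "A",                   # 1 byte in UTF-8
        "e acute": "\u00e9",              # 2 bytes in UTF-8
        "Arabic alef": "\u0627",          # 2 bytes in UTF-8
        "Euro sign": "\u20ac",            # 3 bytes in UTF-8
        "CJK ideograph": "\u4e2d",        # 3 bytes in UTF-8
        "Emoji U+1F600": "\U0001F600",    # 4 bytes in UTF-8
    }
    for label, ch in samples.items():
        print(label, byte_lengths(ch))
```

The Arabic sample needs 2 bytes in AL32UTF8 and cannot be stored in either single-byte character set at all, which is exactly why a conversion away from a single-byte set is needed for Arabic data.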