Tuesday, February 14, 2012

Character set support

I'm looking at an MS SQL server database and it stores city names
across the world. All a
arabic , chinese etc names are converted to the latin alphabet.

I feel it is probably critical we support accents as a means to
distinguish between cities that would be converted to the same Latin
alphabet representation otherwise and as a matter of cultural
politeness.

Now ..... how do we do this?
Use varchar? which character code page do we use?
Use nvarchar? I heard a rumour that SQL server pads unused characters
with blanks and hence will use space unnecessarily.

The data will be presented on sheets and emailed around the world - it
will also appear on a website.

Any advice greatly appreciated - thanks
Oliver(oraustin@.hotmail.com) writes:

Quote:

Originally Posted by

I'm looking at an MS SQL server database and it stores city names
across the world. All a
arabic , chinese etc names are converted to the latin alphabet.
>
I feel it is probably critical we support accents as a means to
distinguish between cities that would be converted to the same Latin
alphabet representation otherwise and as a matter of cultural
politeness.
>
>
Now ..... how do we do this?
Use varchar? which character code page do we use?
Use nvarchar?


You need to use nvarchar. With varchar you can only support one code
page, and that is not enough. For instance, if you use CP-1252, Windows
Latin-1, you will not be able to enter the capital of Roumania
correctly, nor the largest city of Turkey. (I'm using a newsreader that
is not Unicode capable, so embarrassing enough, I cannot enter these
name correctly myself.)

Quote:

Originally Posted by

I heard a rumour that SQL server pads unused characters
with blanks and hence will use space unnecessarily.


That is incorrect. If you use varchar/nvarchar, SQL Server will not
pad. However, if you save data with trailing spaces, SQL Server will
by default to chop that off. The char/nchar data types on the other
hand are fixed-length, but you would not use them for city length.

On the other hand, in nvarchar each character takes up two bytes, in
difference to varchar with one byte per character. So nvarchar takes
twice the space.

But given your requirements, you don't have much choice.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

No comments:

Post a Comment