Hi All,
How does SQL Server internally store data in CHAR and VARCHAR fields? I
think there is an extra bit of data stored for VARCHAR - which is for the
number of characters in the field. Is there something similar for CHAR
also?
Also, how does SQL Server retrieve records from both types of field?
Thanks
Prabhat
Buy "Inside SQL Server" by Kalen Delaney for the complete explanation.
The shortened version is that a VARCHAR uses 2 extra bytes (to store the
actual length of the data in the column, or more accurately, the end
position of the column).
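To make the cost concrete, here is a minimal illustrative sketch (not SQL Server's actual on-disk code) of the per-column storage difference: CHAR always pads to the declared length, while VARCHAR stores the actual data plus the 2-byte offset entry described above. The function names are hypothetical.

```python
CHAR_N = 100          # declared length for the example
VARCHAR_OVERHEAD = 2  # 2-byte entry recording the column's end position

def char_bytes(declared_len):
    # CHAR always occupies the full declared length, padded with spaces
    return declared_len

def varchar_bytes(actual_len):
    # VARCHAR stores only the actual data plus the 2-byte offset entry
    return actual_len + VARCHAR_OVERHEAD

print(char_bytes(CHAR_N))   # 100 bytes regardless of content
print(varchar_bytes(10))    # 12 bytes for a 10-character value
print(varchar_bytes(100))   # 102 bytes: VARCHAR costs MORE when always full
```

Note the last line: when values always fill the declared length, VARCHAR is strictly worse, which is why the choice depends on the data.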
Jacco Schalkwijk
SQL Server MVP
"Prabhat" <not_a_mail@.hotmail.com> wrote in message
news:OSfHSS3NEHA.644@.tk2msftngp13.phx.gbl...
|||I believe that I've also read in this forum that all the CHAR columns are
physically arranged and stored at the beginning of the data block, regardless
of their position in the DDL.
After the CHAR columns, the VARCHAR columns come next - with the 2 extra
bytes to store the "end byte", or basically the starting offset of the
next VARCHAR column.
Access to the CHAR columns would then be faster than access to the VARCHAR
columns, as the byte positions are fixed for each record. Any access to
VARCHAR columns requires "byte-offset" checking for each column, one at a
time.
"Jacco Schalkwijk" <NOSPAMjaccos@.eurostop.co.uk> wrote in message
news:ezCg2x4NEHA.3348@.TK2MSFTNGP09.phx.gbl...
|||On the other hand, access to tables with CHAR fields may be much slower
than to the same tables with VARCHAR ones if the actual data rarely uses
all (or almost all) of the field's length - because the more records fit
on one page, the fewer I/O operations SQL Server makes during table
scans.
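A back-of-the-envelope sketch of this point: more rows per page means fewer pages to read in a scan. The numbers below are illustrative only (real rows carry additional header bytes, and 8,060 is an approximation of usable page space).

```python
PAGE_DATA_BYTES = 8060   # approximate usable space per 8 KB data page

def pages_needed(row_bytes, row_count):
    # How many pages a scan must read for row_count rows of a given size
    rows_per_page = PAGE_DATA_BYTES // row_bytes
    return -(-row_count // rows_per_page)   # ceiling division

rows = 1_000_000
# A CHAR(200) column always occupies 200 bytes per row...
char_pages = pages_needed(200, rows)
# ...but if typical values are only 20 characters, VARCHAR(200) stores
# 20 bytes of data + 2 bytes of offset overhead.
varchar_pages = pages_needed(22, rows)

print(char_pages, varchar_pages)   # far fewer pages to scan with VARCHAR
```

Under these assumed sizes the VARCHAR table needs roughly a tenth of the page reads, which is the I/O saving being described.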
"Steve Z" <szlamany@.antarescomputing_no_spam.com> wrote in message
news:uCvQmy5NEHA.620@.TK2MSFTNGP10.phx.gbl...
|||On Tue, 11 May 2004 17:11:16 -0400, Steve Z wrote:
Hi Steve,
Almost true. In fact, all fixed-length columns are stored at the
beginning of the data block and all variable-length columns at the
end. Access to a varchar column does not imply looping through all
other varchar columns stored before the column you need (as might be
concluded from your post); the end position of whatever is stored
before the varchar column being sought and the end position of that
column itself can be found directly.
This does introduce a little overhead in CPU use and memory reads, but
it is compensated for many times over by the reduced physical I/O
(see Alex's message).
Best, Hugo
(Remove _NO_ and _SPAM_ to get my e-mail address)
|||I agree. These kinds of considerations are pretty much irrelevant for
performance compared to proper database design, query writing and indexing.
Jacco Schalkwijk
SQL Server MVP
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:dbp3a0hd0808238thks1lrjabd12umgfqv@.4ax.com...
|||Hey guys, he was asking how it was stored - it's always good to know what
goes on behind the scenes.
Would you ever create a table with, let's say, 100 single-character VARCHAR
fields? Would there be value in that, if the columns were guaranteed to
have a character in them (i.e. not a space)?
How would a primary clustered key behave if it was built on a VARCHAR field
vs. a CHAR field?
And, please explain to me how the 100th VARCHAR field's byte position can be
determined in a buffer of data without reading each of the prior 99 VARCHAR
control fields. Without knowing the length of each of the prior 99 data
elements in that row, how could the starting byte of the 100th element be
determined? Since the length of a VARCHAR is limited to 8,000 bytes, that
takes 14 of the 16 bits in the 2-byte control field. So, I can assume from
that fact that the 2-byte control field is not storing the "position" of the
element, but simply the length of each element.
Inside MS SQL Server 2000, by Kalen Delaney, on page 230 states that VARCHAR
is most appropriate when you expect significant differences in the lengths
of the data in the column and when the data length in the column won't
frequently be changed. Changing the length frequently would require
rebuilding the data page - right?
If changing an element to full size causes it to no longer fit on the data
page, then page splits occur - that's a lot of work for the engine to handle.
Page 233 indicates that "no one answer is right" and "If you understand the
tradeoffs; you'll be able to make the best choice".
"Jacco Schalkwijk" <NOSPAMjaccos@.eurostop.co.uk> wrote in message
news:e3a8hrAOEHA.3988@.tk2msftngp13.phx.gbl...
On Wed, 12 May 2004 09:36:31 -0400, Steve Z wrote:
>Hey guys, he was asking how it was stored - it's always good to know what
>goes on behind the scenes.
I agree. That's why I replied to your post, acknowledging that you
were almost completely right, correcting a small error and adding an
important consideration. I have seen too many examples of people using
only fixed-length character data in order to avoid the overhead of
having to find the start and end of the data. Much as I'd like to
believe that people will realise by themselves that this overhead is
nothing compared to the increased I/O, I can't.
>Would you ever create a table with lets say 100 single character VARCHAR
>fields? Would there be value to that, if the columns were guaranteed to
>have a character in them (ie. not a space).
Of course not. If the maximum length is short (<= 10 characters), I
always use CHAR. If it's longer, but most values use almost the full
maximum length, I also use CHAR. I use VARCHAR if long and short
strings are interspersed.
>How would a primary clustered key behave if it was built on a VARCHAR field
>vs. a CHAR field?
It depends. If the column has a maximum length of 250 characters, but
most values are a lot shorter, using VARCHAR would reduce the number
of index pages on the intermediate levels. The leaf level would of
course not be affected (apart from the effect of any CHAR vs VARCHAR
column decision on the data pages). For an unindexed key, the leaf
level would need fewer pages as well. If the table holds many rows, I
think (I haven't tested it!) that the number of I/Os would be lower,
hence the queries would return results faster. All this at the price
of the CPU having less idle time.
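A rough sketch of the index-page arithmetic behind this: a shorter average key packs more entries into each intermediate index page, so the B-tree needs fewer non-leaf pages. The per-entry pointer overhead below is an assumed, illustrative figure, not SQL Server's exact format.

```python
PAGE_BYTES = 8060
ROW_POINTER = 8   # assumed per-entry pointer overhead (illustrative)

def intermediate_entries_per_page(key_bytes):
    # How many (key, child-page pointer) entries fit on one index page
    return PAGE_BYTES // (key_bytes + ROW_POINTER)

# CHAR(250) key: every entry carries the full 250 bytes
print(intermediate_entries_per_page(250))   # ~31 entries per page
# VARCHAR(250) key averaging 30 bytes of data + 2-byte offset
print(intermediate_entries_per_page(32))    # ~201 entries per page
```

With six times the fan-out per page, the intermediate levels of the index shrink accordingly, which is where the I/O saving would come from.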
>And, please explain to me how the 100th VARCHAR fields byte position can be
>determined in a buffer of data without reading each of the prior 99 VARCHAR
>control fields? Without knowing the length of each of the prior 99 data
>elements in that row, how could the starting byte of the 100th element be
>determined?
Easy. Take the end position of the 99th element (stored at a fixed
position in the row, after the fixed-length and before the varying-
length data) and add 1.
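This direct lookup can be sketched in a few lines: the variable-length section of a row begins with an array of 2-byte end offsets, one per variable column, so the Nth value is found from at most two array reads, with no walk through the preceding 99 values. The layout here is a simplified model, not SQL Server's exact row format.

```python
import struct

def build_row(values):
    """Pack variable-length values preceded by a 2-byte end-offset array."""
    offsets, data, pos = [], b"", 0
    for v in values:
        data += v
        pos += len(v)
        offsets.append(pos)                        # end position of each column
    header = struct.pack("<%dH" % len(offsets), *offsets)
    return header, data

def read_column(header, data, n):
    """Fetch the nth (0-based) variable column directly via the offset array."""
    end = struct.unpack_from("<H", header, 2 * n)[0]
    start = struct.unpack_from("<H", header, 2 * (n - 1))[0] if n else 0
    return data[start:end]

header, data = build_row([b"a", b"bb", b"ccc", b"dddd"])
print(read_column(header, data, 3))   # b'dddd' - one lookup, no scanning
```

Note that `read_column` touches only two entries of the offset array regardless of how many variable columns precede the one requested, which is Hugo's point.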
> Since the length of the VARCHAR is limited to 8000 byte - that
>takes 14 of the 16 bits in the 2-byte control field. So, I can assume from
>that fact that the 2-byte control field is not storing the "position" of the
>element, but simply the length of each element.
You seem to forget that the length of the complete row may not exceed
8,060 bytes (check BOL, Maximum Capacity Specifications). The 2-byte
control field will hold the ending position of each element just fine.
>Inside MS SQL Server 2000, by Kalen Delaney, on page 230 states that VARCHAR
>is most appropriate when you expect significant differences in the lengths
>of the data in the column and when the data length in the column won't
>frequently be changed.
Apart from the page number, this has apparently not changed from the
previous edition (Inside MS SQL Server 7.0, by Ron Soukup and Kalen
Delaney).
>Changing the length frequently would require
>rebuilding the data page - right?
>If changing an element to full size causes it to no longer fit on the data
>page, then page splits occur - that's a lot of work for the engine to handle.
It's been a while since I read this part, but I think that's not
entirely correct. The row's data itself will have to be rebuilt, as
there can be no gaps between the contents of the varying length
columns. If the row is shortened, it will stay where it is. If the
row's size increases, it is replaced by a forwarding pointer and the
new row's content is stored somewhere else; the other rows on the same
page remain intact. If the row has to move again later, the original
forwarding pointer is changed (to prevent long chains of pointers).
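A toy model of this forwarding-pointer behaviour (for heaps, as the next reply clarifies): when an updated row no longer fits in place, the original slot keeps a pointer to the new location, and a later move updates that same original pointer, so a lookup never follows more than one hop. Everything here is illustrative; the names are invented for the sketch.

```python
class HeapSlot:
    def __init__(self, payload):
        self.payload = payload
        self.forward_to = None      # forwarding pointer, set if the row moved

heap = {0: HeapSlot("short")}       # original page slots
overflow = {}                       # rows relocated elsewhere, keyed by location

def grow_row(slot_id, new_payload, new_location):
    slot = heap[slot_id]
    overflow.pop(slot.forward_to, None)   # row moved again: discard old copy
    overflow[new_location] = new_payload
    slot.forward_to = new_location        # original slot just points onward
    slot.payload = None

def read_row(slot_id):
    slot = heap[slot_id]
    return overflow[slot.forward_to] if slot.forward_to else slot.payload

grow_row(0, "a much longer value", 101)
grow_row(0, "an even longer value still", 202)   # pointer updated, no chain
print(read_row(0))                               # one hop, never two
```

The key detail mirrored here is in the second `grow_row` call: the original slot's pointer is rewritten rather than chained, matching the "no long chains of pointers" behaviour described above.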
>Page 233 indicates that "no one answer is right" and "If you understand the
>tradeoffs; you'll be able to make the best choice".
Yes - a terrific piece of advice that I support wholeheartedly!
Best, Hugo
(Remove _NO_ and _SPAM_ to get my e-mail address)
|||> >Changing the length frequently would require
> >rebuilding the data page - right?
> >If changing an element to full size causes it to no longer fit on the data
> >page, then page splits occur - that's a lot of work for the engine to handle.
> It's been a while since I read this part, but I think that's not
> entirely correct. The row's data itself will have to be rebuilt, as
> there can be no gaps between the contents of the varying length
> columns. If the row is shortened, it will stay where it is. If the
> row's size increases, it is replaced by a forwarding pointer and the
> new row's content is stored somewhere else; the other rows on the same
> page remain intact. If the row has to move again later, the original
> forwarding pointer is changed (to prevent long chains of pointers).
That is only correct for heaps, tables without a clustered index. In a table
with a clustered index this scenario will cause a page split, as the rows
have to stay in the same position in the clustered index.
Jacco Schalkwijk
SQL Server MVP
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:bl45a0tjo06s6npdrm9rf83mg8o4n80uik@.4ax.com...
|||Thanks Hugo - Your discussion is very helpful to me.
Prabhat
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:bl45a0tjo06s6npdrm9rf83mg8o4n80uik@.4ax.com...