Hi All,
How does SQL Server internally store data in a Char and a Varchar field? I
think there is some extra data stored for Varchar - holding the number of
characters in the field. Is there something similar for Char also?
Also, how does SQL Server retrieve records from both types of field?
Thanks
Prabhat|||Buy "Inside SQL Server" by Kalen Delaney for the complete explanation.
The shortened version is that a VARCHAR uses 2 extra bytes (to store the
actual length of the data in the column, or more accurately, the end
position of the column).
Jacco Schalkwijk
SQL Server MVP
"Prabhat" <not_a_mail@.hotmail.com> wrote in message
news:OSfHSS3NEHA.644@.tk2msftngp13.phx.gbl...
> Hi All,
> How internally SQL Server Stores Data in a Char and Varchar field. Like I
> think there will be an extra bit of data which will store for Varchar -
> Which is for Number of Characters in the field. Is something similar in
Char
> also?
> Also How Does SQL Server Retrive records from both types of field?
> Thanks
> Prabhat
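To make the 2-byte end-offset idea concrete, here is a toy sketch in Python (purely illustrative - this is not SQL Server's actual on-disk format, and all names are made up): fixed-length columns first, then a count and a table of 2-byte end offsets, then the variable-length data.

```python
import struct

def pack_row(fixed_cols, var_cols):
    """Toy row layout: fixed-length columns first, then a 2-byte count,
    then one 2-byte END offset per variable-length column, then the data."""
    fixed = b"".join(fixed_cols)      # CHAR-style columns, pre-padded by the caller
    count = struct.pack("<H", len(var_cols))
    # Each table entry stores the END position of a column within the
    # variable data section, so a column's start is the previous column's end.
    offsets, end = [], 0
    for col in var_cols:
        end += len(col)
        offsets.append(end)
    table = b"".join(struct.pack("<H", o) for o in offsets)
    return fixed + count + table + b"".join(var_cols)

row = pack_row([b"AB".ljust(4)], [b"hello", b"", b"hi"])
```

Note how even an empty VARCHAR still costs a 2-byte table entry, which is part of why CHAR can win for very short columns.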
>|||I believe that I've also read in this forum that all the CHAR columns are
physically arranged and stored at the beginning of the data block, regardless
of their position in the DDL.
After the CHAR columns, the VARCHAR columns come next - with the 2 extra
bytes to store the "end byte", or basically the next starting offset of the
next VARCHAR column.
Access to the CHAR columns would then be faster than access to the VARCHAR
columns, as the byte positions are fixed for each record. Any access to
VARCHAR columns requires "byte-offset" checking for each column, one at a
time.
"Jacco Schalkwijk" <NOSPAMjaccos@.eurostop.co.uk> wrote in message
news:ezCg2x4NEHA.3348@.TK2MSFTNGP09.phx.gbl...
> Buy "Inside SQL Server" by Kalen Delaney for the complete explanation.
> The shortened version is that a VARCHAR uses 2 extra bytes (to store the
> actual length of the data in the column, or more accurately, the end
> position of the column).
> --
> Jacco Schalkwijk
> SQL Server MVP
> "Prabhat" <not_a_mail@.hotmail.com> wrote in message
> news:OSfHSS3NEHA.644@.tk2msftngp13.phx.gbl...
>|||On the other hand, access to tables having CHAR fields may be much slower
than to the same tables with VARCHAR ones if the actual data rarely uses all
(or almost all) of the field's length - because the more records fit on one
page, the fewer I/O operations SQL Server makes during sequential table
reads
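Some back-of-the-envelope arithmetic makes this point concrete (the roughly 8060 usable bytes per page is real; the table and column sizes below are invented for illustration):

```python
PAGE_BYTES = 8060   # approximate usable data bytes per SQL Server page

def pages_needed(rows, row_bytes):
    """Pages required when each page holds floor(PAGE_BYTES / row_bytes) rows."""
    rows_per_page = PAGE_BYTES // row_bytes
    return -(-rows // rows_per_page)          # ceiling division

# Hypothetical 100,000-row table with a 200-byte fixed part plus one
# 250-character column that typically holds only ~25 characters of data.
char_pages = pages_needed(100_000, 200 + 250)        # CHAR(250): always full width
varchar_pages = pages_needed(100_000, 200 + 25 + 2)  # VARCHAR: data + 2-byte offset
```

Fewer pages means proportionally fewer reads for a table scan.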
"Steve Z" <szlamany@.antarescomputing_no_spam.com> wrote in message
news:uCvQmy5NEHA.620@.TK2MSFTNGP10.phx.gbl...
> I believe that I've also read in this forum that all the CHAR columns are
> physically arranged and stored at the beginning of the data block, regardless
> of their position in the DDL.
> After the CHAR columns, the VARCHAR columns come next - with the 2 extra
> bytes to store the "end byte", or basically the next starting offset of
the
> next VARCHAR column.
> Access to the CHAR columns would then be faster than access to the VARCHAR
> columns, as the byte positions are fixed for each record. Any access to
> VARCHAR columns requires "byte-offset" checking for each column, one at a
> time.|||On Tue, 11 May 2004 17:11:16 -0400, Steve Z wrote:
>I believe that I've also read in this forum that all the CHAR columns are
>physically arranged and stored at the beginning of the data block, regardless
>of their position in the DDL.
>After the CHAR columns, the VARCHAR columns come next - with the 2 extra
>bytes to store the "end byte", or basically the next starting offset of the
>next VARCHAR column.
>Access to the CHAR columns would then be faster than access to the VARCHAR
>columns, as the byte positions are fixed for each record. Any access to
>VARCHAR columns requires "byte-offset" checking for each column, one at a
>time.
Hi Steve,
Almost true. In fact, all fixed-length columns are stored at the
beginning of the data block and all variable-length columns at the
end. Access to a varchar column does not imply looping through all
other varchar columns stored before the column you need (as might be
concluded from your post); the end position of whatever is stored
before the varchar column being sought and the end position of that
column itself can be found directly.
This does introduce a little bit of overhead in CPU use and reading
memory, but this is compensated many times over by the reduced physical I/O
(see Alex's message).
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||I agree. These kinds of considerations are pretty much irrelevant for
performance compared to proper database design, query writing and indexing.
Jacco Schalkwijk
SQL Server MVP
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:dbp3a0hd0808238thks1lrjabd12umgfqv@.4ax.com...
> On Tue, 11 May 2004 17:11:16 -0400, Steve Z wrote:
>
> Hi Steve,
> Almost true. In fact, all fixed-length columns are stored at the
> beginning of the data block and all variable-length columns at the
> end. Access to a varchar column does not imply looping through all
> other varchar columns stored before the column you need (as might be
> concluded from your post); the end position of whatever is stored
> before the varchar column being sought and the end position of that
> column itself can be found directly.
> This does introduce a little bit of overhead in CPU use and reading
> memory, but this is compensated many times by the reduced physical I/O
> (see Alex' message)
> Best, Hugo
> --
> (Remove _NO_ and _SPAM_ to get my e-mail address)|||Hey guys, he was asking how it was stored - it's always good to know what
goes on behind the scenes.
Would you ever create a table with, let's say, 100 single-character VARCHAR
fields? Would there be value in that, if the columns were guaranteed to
have a character in them (i.e. not a space)?
How would a primary clustered key behave if it was built on a VARCHAR field
vs. a CHAR field?
And, please explain to me how the 100th VARCHAR field's byte position can be
determined in a buffer of data without reading each of the prior 99 VARCHAR
control fields? Without knowing the length of each of the prior 99 data
elements in that row, how could the starting byte of the 100th element be
determined? Since the length of a VARCHAR is limited to 8000 bytes - that
takes 14 of the 16 bits in the 2-byte control field. So, I can assume from
that fact that the 2-byte control field is not storing the "position" of the
element, but simply the length of each element.
Inside MS SQL Server 2000, by Kalen Delaney, on page 230 states that VARCHAR
is most appropriate when you expect significant differences in the lengths
of the data in the column and when the data length in the column won't
frequently be changed. Changing the length frequently would require
rebuilding the data page - right?
If changing an element to full size causes it to no longer fit on the data
page, then page splits occur - that's a lot of work for the engine to handle.
Page 233 indicates that "no one answer is right" and "If you understand the
tradeoffs, you'll be able to make the best choice".
"Jacco Schalkwijk" <NOSPAMjaccos@.eurostop.co.uk> wrote in message
news:e3a8hrAOEHA.3988@.tk2msftngp13.phx.gbl...
> I agree. These kind of considerations are pretty much irrelevant for
> performance compared to proper database design, query writing and
indexing.
> --
> Jacco Schalkwijk
> SQL Server MVP
>
> "Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
> news:dbp3a0hd0808238thks1lrjabd12umgfqv@.4ax.com...
>|||On Wed, 12 May 2004 09:36:31 -0400, Steve Z wrote:
>Hey guys, he was asking how it was stored - it's always good to know what
>goes on behind the scenes.
I agree. That's why I replied to your post, acknowledging that you
were almost completely right, correcting a small error and adding an
important consideration. I have seen too many examples of people using
only fixed-length character data in order to avoid the overhead of
having to find the start and end of the data. Much as I'd like to
believe that people will realise by themselves that this overhead is
nothing compared to the increased I/O, I can't.
>Would you ever create a table with lets say 100 single character VARCHAR
>fields? Would there be value to that, if the columns were guaranteed to
>have a character in them (ie. not a space).
Of course not. If the maximum length is short (<= 10 characters), I
always use CHAR. If it's longer, but most values use almost the full
maximum length, I also use CHAR. I use VARCHAR if long and short
strings are interspersed.
>How would a primary clustered key behave if it was built on a VARCHAR field
>vs. a CHAR field?
It depends. If the column has a maximum length of 250 characters, but
most values are a lot shorter, using VARCHAR would reduce the number
of index pages on the intermediate levels. The leaf level would of
course not be affected (apart from the effect of any CHAR vs VARCHAR
column decision on the data pages). For a nonclustered index on the key, the
leaf level would need fewer pages as well. If the table holds many rows, I
think (I haven't tested it!) that the number of I/O's would be less,
hence the queries would return results faster. All this at the price
of the CPU having less idle time.
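A rough sketch of that index-page effect (the per-entry overhead below is a made-up constant; real per-row overhead varies): intermediate levels store one entry per child page, so a narrower key means a larger fan-out and fewer intermediate pages.

```python
PAGE_BYTES = 8060     # approximate usable bytes per index page

def fanout(key_bytes, per_entry_overhead=11):
    """Entries that fit on one intermediate index page (toy model only)."""
    return PAGE_BYTES // (key_bytes + per_entry_overhead)

char_fanout = fanout(250)          # CHAR(250) key: always padded to full width
varchar_fanout = fanout(30 + 2)    # VARCHAR key averaging 30 bytes + 2-byte offset
```

A fan-out of 187 instead of 30 can shave an intermediate level off the tree for a large table.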
>And, please explain to me how the 100th VARCHAR fields byte position can be
>determined in a buffer of data without reading each of the prior 99 VARCHAR
>control fields? Without knowing the length of each of the prior 99 data
>elements in that row, how could the starting byte of the 100th element be
>determined?
Easy. Take the end position of the 99th element (stored at a fixed
position in the row, after the fixed length and before the varying
length data) and add 1.
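In toy form (again, a sketch of the idea, not the real engine code): with an array of end offsets, locating the 100th column is a direct lookup, not a scan over the 99 columns before it.

```python
def var_column(var_data, end_offsets, n):
    """Fetch the n-th (0-based) variable-length column directly: it starts
    where column n-1 ends, so no scan over prior columns is needed."""
    start = end_offsets[n - 1] if n > 0 else 0
    return var_data[start:end_offsets[n]]

data = b"hellohi"
ends = [5, 5, 7]    # column 1 is empty, so it shares column 0's end offset
```

The cost of the lookup is the same whether n is 2 or 100.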
> Since the length of the VARCHAR is limited to 8000 byte - that
>takes 14 of the 16 bits in the 2-byte control field. So, I can assume from
>that fact that the 2-byte control field is not storing the "position" of the
>element, but simply the length of each element.
You seem to forget that the length of the complete row may not exceed
8,060 bytes (check BOL, Maximum Capacity Specifications). The 2-byte
control field will hold the ending position of each element just fine.
>Inside MS SQL Server 2000, by Kalen Delany, on page 230 states that VARCHAR
>are most appropriate when you expect significant differences in the lengths
>of the data in the column and when the data length in the column wont
>frequently be changed.
Apart from the page number, this has apparently not changed from the
previous edition (Inside MS SQL Server 7.0, by Ron Soukup and Kalen
Delaney).
>Changing the length frequently would required
>rebuilding the data page - right?
>If changing an element to full size causes it to no longer fit on the data
>page, then page splits occur - that's alot of work for the engine to handle.
It's been a while since I read this part, but I think that's not
entirely correct. The row's data itself will have to be rebuilt, as
there can be no gaps between the contents of the varying length
columns. If the row is shortened, it will stay where it is. If the
row's size increases, it is replaced by a forwarding pointer and the
new row's content is stored somewhere else; the other rows on the same
page remain intact. If the row has to move again later, the original
forwarding pointer is changed (to prevent long chains of pointers).
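A minimal model of that forwarding behaviour (a sketch of the concept, not actual storage-engine code; for simplicity every update here moves the row):

```python
class ToyHeap:
    """Rows are addressed by their original slot; when an update moves a
    row, the original slot keeps a forwarding pointer to wherever the
    row currently lives."""

    def __init__(self):
        self.slots = []

    def insert(self, row):
        self.slots.append(("DATA", row))
        return len(self.slots) - 1          # slot id stays valid forever

    def update(self, slot, new_row):
        new_slot = len(self.slots)
        self.slots.append(("DATA", new_row))
        # The ORIGINAL pointer is overwritten on every move, so readers
        # never follow a chain of forwarding pointers.
        self.slots[slot] = ("FORWARD", new_slot)

    def read(self, slot):
        kind, payload = self.slots[slot]
        if kind == "FORWARD":               # at most one hop
            kind, payload = self.slots[payload]
        return payload

heap = ToyHeap()
s = heap.insert("v1")
heap.update(s, "v2 much longer")
heap.update(s, "v3 even longer")
```

Bookmarks pointing at the original slot never go stale, at the cost of one extra hop on reads.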
>Page 233 indicates that "no one answer is right" and "If you understand the
>tradeoffs; you'll be able to make the best choice".
Yes - a terrific piece of advice that I support wholeheartedly!
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||> >Changing the length frequently would required
> It's been a while since I read this part, but I think that's not
> entirely correct. The row's data itself will have to be rebuilt, as
> there can be no gaps between the contents of the varying length
> columns. If the row is shortened, it will stay where it is. If the
> row's size increases, it is replaced by a forwarding pointer and the
> new row's content is stored somewhere else; the other rows on the same
> page remain intact. If the row has to move again later, the original
> forwarding pointer is changed (to prevent long chains of pointers).
That is only correct for heaps, tables without a clustered index. In a table
with a clustered index this scenario will cause a page split, as the rows
have to stay in the same position in the clustered index.
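A toy illustration of that difference (sketch only): in a clustered index the keys must stay in order, so an overflowing page splits in two instead of forwarding the row elsewhere.

```python
from bisect import insort

PAGE_CAPACITY = 4          # toy page size, in rows

def insert_row(pages, key):
    """Insert into the page whose range covers the key; a full page is
    split in half so every row stays in clustered key order."""
    for i, page in enumerate(pages):
        if key <= page[-1] or i == len(pages) - 1:
            insort(page, key)
            if len(page) > PAGE_CAPACITY:
                mid = len(page) // 2
                pages.insert(i + 1, page[mid:])   # upper half moves to a new page
                del page[mid:]
            return

pages = [[10, 20, 30, 40]]
insert_row(pages, 25)      # page was full: it splits
```

The split is the extra work being described here; a heap avoids it with a forwarding pointer, at the cost of an extra hop on reads.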
Jacco Schalkwijk
SQL Server MVP
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:bl45a0tjo06s6npdrm9rf83mg8o4n80uik@.4ax.com...
> On Wed, 12 May 2004 09:36:31 -0400, Steve Z wrote:
>
> I agree. That's why I replied to your post, acknowledging that you
> were almost completely right, correcting a small error and adding an
> important consideration. I have seen too many examples of people using
> only fixed-length charachter data in order to avoid the overhead of
> having to find the start and end of the data. Much as I'd like to
> believe that people will realise by themselves that this overhead is
> nothing compared to the increased I/O, I can't.
>
> Of course not. If the maximum length is short (<= 10 characters), I
> always use CHAR. If it's longer, but most values use almost the full
> maximum length, I also use CHAR. I use VARCHAR if long and short
> strings are interspersed.
>
> It depends. If the column has a maximum length of 250 characters, but
> most values are a lot shorter, using VARCHAR would reduce the number
> of index pages on the intermediate levels. The leaf level would of
> course not be affected (apart from the effect of any CHAR vs VARCHAR
> column decision on the data pages). For an unindexed key, the leaf
> level would need less pages as well. If the table holds many rows, I
> think (I haven't tested it!) that the number of I/O's would be less,
> hence the queries would return results faster. All this at the price
> of the CPU having less idle time.
>
> Easy. Take the end position of the 99th element (stored ar a fixed
> position in the row, after the fixed length and before the varying
> length data) and add 1.
>
> You seem to forget that the length of the complete row may not exceed
> 8,060 bytes (check BOL, Maximum Capacity Specifications). The 2-byte
> control field will hold the ending position of each element just fine.
>
> Apart from the page number, this has apparently not changed from the
> previous edition (Inside MS SQL Server 7.0, by Ron Soukup and Kalen
> Delany).
>
> It's been a while since I read this part, but I think that's not
> entirely correct. The row's data itself will have to be rebuilt, as
> there can be no gaps between the contents of the varying length
> columns. If the row is shortened, it will stay where it is. If the
> row's size increases, it is replaced by a forwarding pointer and the
> new row's content is stored somewhere else; the other rows on the same
> page remain intact. If the row has to move again later, the original
> forwarding pointer is changed (to prevent long chains of pointers).
>
> Yes - a terrific piece of advice that I support wholeheartedly!
>
> Best, Hugo
> --
> (Remove _NO_ and _SPAM_ to get my e-mail address)|||Thanks Hugo - Your discussion is very helpful to me.
Prabhat
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:bl45a0tjo06s6npdrm9rf83mg8o4n80uik@.4ax.com...
> On Wed, 12 May 2004 09:36:31 -0400, Steve Z wrote:
>
> I agree. That's why I replied to your post, acknowledging that you
> were almost completely right, correcting a small error and adding an
> important consideration. I have seen too many examples of people using
> only fixed-length charachter data in order to avoid the overhead of
> having to find the start and end of the data. Much as I'd like to
> believe that people will realise by themselves that this overhead is
> nothing compared to the increased I/O, I can't.
>
> Of course not. If the maximum length is short (<= 10 characters), I
> always use CHAR. If it's longer, but most values use almost the full
> maximum length, I also use CHAR. I use VARCHAR if long and short
> strings are interspersed.
>
> It depends. If the column has a maximum length of 250 characters, but
> most values are a lot shorter, using VARCHAR would reduce the number
> of index pages on the intermediate levels. The leaf level would of
> course not be affected (apart from the effect of any CHAR vs VARCHAR
> column decision on the data pages). For an unindexed key, the leaf
> level would need less pages as well. If the table holds many rows, I
> think (I haven't tested it!) that the number of I/O's would be less,
> hence the queries would return results faster. All this at the price
> of the CPU having less idle time.
>
> Easy. Take the end position of the 99th element (stored ar a fixed
> position in the row, after the fixed length and before the varying
> length data) and add 1.
>
> You seem to forget that the length of the complete row may not exceed
> 8,060 bytes (check BOL, Maximum Capacity Specifications). The 2-byte
> control field will hold the ending position of each element just fine.
>
> Apart from the page number, this has apparently not changed from the
> previous edition (Inside MS SQL Server 7.0, by Ron Soukup and Kalen
> Delany).
>
> It's been a while since I read this part, but I think that's not
> entirely correct. The row's data itself will have to be rebuilt, as
> there can be no gaps between the contents of the varying length
> columns. If the row is shortened, it will stay where it is. If the
> row's size increases, it is replaced by a forwarding pointer and the
> new row's content is stored somewhere else; the other rows on the same
> page remain intact. If the row has to move again later, the original
> forwarding pointer is changed (to prevent long chains of pointers).
>
> Yes - a terrific piece of advice that I support wholeheartedly!
>
> Best, Hugo
> --
> (Remove _NO_ and _SPAM_ to get my e-mail address)
Sunday, February 12, 2012
Char and Varchar
Thanks Steve and All for the suggestions. I am getting lots of inputs from
the Discussion.
Thanks
Prabhat
"Steve Z" <szlamany@.antarescomputing_no_spam.com> wrote in message
news:eAvmKZCOEHA.2480@.tk2msftngp13.phx.gbl...
> Hey guys, he was asking how it was stored - it's always good to know what
> goes on behind the scenes.
> Would you ever create a table with lets say 100 single character VARCHAR
> fields? Would there be value to that, if the columns were guaranteed to
> have a character in them (ie. not a space).
> How would a primary clustered key behave if it was built on a VARCHAR
field
> vs. a CHAR field?
> And, please explain to me how the 100th VARCHAR fields byte position can
be
> determined in a buffer of data without reading each of the prior 99
VARCHAR
> control fields? Without knowing the length of each of the prior 99 data
> elements in that row, how could the starting byte of the 100th element be
> determined? Since the length of the VARCHAR is limited to 8000 byte -
that
> takes 14 of the 16 bits in the 2-byte control field. So, I can assume
from
> that fact that the 2-byte control field is not storing the "position" of
the
> element, but simply the length of each element.
> Inside MS SQL Server 2000, by Kalen Delany, on page 230 states that
VARCHAR
> are most appropriate when you expect significant differences in the
lengths
> of the data in the column and when the data length in the column wont
> frequently be changed. Changing the length frequently would required
> rebuilding the data page - right?
> If changing an element to full size causes it to no longer fit on the data
> page, then page splits occur - that's alot of work for the engine to
handle.
> Page 233 indicates that "no one answer is right" and "If you understand
the
> tradeoffs; you'll be able to make the best choice".
>
> "Jacco Schalkwijk" <NOSPAMjaccos@.eurostop.co.uk> wrote in message
> news:e3a8hrAOEHA.3988@.tk2msftngp13.phx.gbl...
> > I agree. These kind of considerations are pretty much irrelevant for
> > performance compared to proper database design, query writing and
> indexing.
> >
> > --
> > Jacco Schalkwijk
> > SQL Server MVP
> >
> >
> > "Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
> > news:dbp3a0hd0808238thks1lrjabd12umgfqv@.4ax.com...
> > > On Tue, 11 May 2004 17:11:16 -0400, Steve Z wrote:
> > >
> > > >I believe that I've also read in this forum that all the CHAR
> > > >columns are physically arranged and stored at the beginning of the
> > > >data block, regardless of their position in the DDL.
> > > >
> > > >After the CHAR columns, the VARCHAR columns come next - with the 2
> > > >extra bytes to store the "end byte", or basically the starting
> > > >offset of the next VARCHAR column.
> > > >
> > > >Access to the CHAR columns would then be faster than access to the
> > > >VARCHAR columns, as the byte positions are fixed for each record.
> > > >Any access to VARCHAR columns requires "byte-offset" checking for
> > > >each column, one at a time.
> > >
> > > Hi Steve,
> > >
> > > Almost true. In fact, all fixed-length columns are stored at the
> > > beginning of the data block and all variable-length columns at the
> > > end. Access to a varchar column does not imply looping through all
> > > other varchar columns stored before the column you need (as might be
> > > concluded from your post); the end position of whatever is stored
> > > before the varchar column being sought and the end position of that
> > > column itself can be found directly.
> > >
> > > This does introduce a little bit of overhead in CPU use and reading
> > > memory, but this is compensated many times by the reduced physical I/O
> > > (see Alex' message)
> > >
> > > Best, Hugo
> > > --
> > >
> > > (Remove _NO_ and _SPAM_ to get my e-mail address)
> >
> >
>|||On Wed, 12 May 2004 09:36:31 -0400, Steve Z wrote:
>Hey guys, he was asking how it was stored - it's always good to know what
>goes on behind the scenes.
I agree. That's why I replied to your post, acknowledging that you
were almost completely right, correcting a small error and adding an
important consideration. I have seen too many examples of people using
only fixed-length character data in order to avoid the overhead of
having to find the start and end of the data. Much as I'd like to
believe that people will realise by themselves that this overhead is
nothing compared to the increased I/O, I can't.
>Would you ever create a table with, let's say, 100 single-character VARCHAR
>fields? Would there be value to that, if the columns were guaranteed to
>have a character in them (i.e. not a space)?
Of course not. If the maximum length is short (<= 10 characters), I
always use CHAR. If it's longer, but most values use almost the full
maximum length, I also use CHAR. I use VARCHAR if long and short
strings are interspersed.
>How would a primary clustered key behave if it was built on a VARCHAR field
>vs. a CHAR field?
It depends. If the column has a maximum length of 250 characters, but
most values are a lot shorter, using VARCHAR would reduce the number
of index pages on the intermediate levels. The leaf level would of
course not be affected (apart from the effect of any CHAR vs VARCHAR
column decision on the data pages). For an unindexed key, the leaf
level would need less pages as well. If the table holds many rows, I
think (I haven't tested it!) that the number of I/O's would be less,
hence the queries would return results faster. All this at the price
of the CPU having less idle time.
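To put rough numbers on this intermediate-level argument, the sketch below compares index fan-out for a CHAR(250) key against a VARCHAR(250) key whose values average 30 bytes. The usable page size and the per-entry overhead figure are simplifying assumptions for illustration, not SQL Server's exact accounting:

```python
# Rough, illustrative estimate of clustered-index fan-out for a fixed-width
# key versus a variable-width key. The overhead constant is an assumption.
PAGE_BYTES = 8096      # approximate usable space on an 8 KB page
ENTRY_OVERHEAD = 7     # assumed per-entry overhead (row header + pointer)

def entries_per_page(key_bytes):
    """How many index entries of the given key size fit on one page."""
    return PAGE_BYTES // (key_bytes + ENTRY_OVERHEAD)

# CHAR(250): every key is padded to the full declared width.
print(entries_per_page(250))       # 31 entries per intermediate page
# VARCHAR averaging 30 bytes, plus a 2-byte offset entry per value.
print(entries_per_page(30 + 2))    # 207 entries per intermediate page
```

With these assumptions the varchar key packs roughly 207 entries per intermediate index page versus 31, so the index needs fewer pages (and potentially fewer levels), which is exactly the reduced-I/O effect described above.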
>And, please explain to me how the 100th VARCHAR fields byte position can be
>determined in a buffer of data without reading each of the prior 99 VARCHAR
>control fields? Without knowing the length of each of the prior 99 data
>elements in that row, how could the starting byte of the 100th element be
>determined?
Easy. Take the end position of the 99th element (stored at a fixed
position in the row, after the fixed length and before the varying
length data) and add 1.
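This lookup can be sketched as a tiny model: given the array of end offsets stored in the row, locating any variable-length column takes two array reads, no matter how many columns precede it. This is an illustrative model, not SQL Server's actual layout code (it uses exclusive end positions rather than Hugo's 1-based "add 1" convention):

```python
# Illustrative model of the variable-length column lookup: end_offsets[i]
# is the end position (exclusive) of variable column i within the row
# buffer, and var_data_start is where column 0 begins.

def var_column_span(end_offsets, var_data_start, i):
    """Return (start, length) of variable-length column i in O(1)."""
    start = var_data_start if i == 0 else end_offsets[i - 1]
    return start, end_offsets[i] - start

# Three varchar values of lengths 5, 0 and 8; variable data starts at 100.
ends = [105, 105, 113]
print(var_column_span(ends, 100, 2))   # -> (105, 8), no scan of columns 0..1
```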
> Since the length of the VARCHAR is limited to 8000 byte - that
>takes 14 of the 16 bits in the 2-byte control field. So, I can assume from
>that fact that the 2-byte control field is not storing the "position" of the
>element, but simply the length of each element.
You seem to forget that the length of the complete row may not exceed
8,060 bytes (check BOL, Maximum Capacity Specifications). The 2-byte
control field will hold the ending position of each element just fine.
>Inside MS SQL Server 2000, by Kalen Delaney, on page 230 states that
>VARCHAR is most appropriate when you expect significant differences in the
>lengths of the data in the column and when the data length in the column
>won't frequently be changed.
Apart from the page number, this has apparently not changed from the
previous edition (Inside MS SQL Server 7.0, by Ron Soukup and Kalen
Delaney).
>Changing the length frequently would require
>rebuilding the data page - right?
>If changing an element to full size causes it to no longer fit on the data
>page, then page splits occur - that's a lot of work for the engine to
>handle.
It's been a while since I read this part, but I think that's not
entirely correct. The row's data itself will have to be rebuilt, as
there can be no gaps between the contents of the varying length
columns. If the row is shortened, it will stay where it is. If the
row's size increases, it is replaced by a forwarding pointer and the
new row's content is stored somewhere else; the other rows on the same
page remain intact. If the row has to move again later, the original
forwarding pointer is changed (to prevent long chains of pointers).
>Page 233 indicates that "no one answer is right" and "If you understand the
>tradeoffs; you'll be able to make the best choice".
Yes - a terrific piece of advice that I support wholeheartedly!
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||> >Changing the length frequently would require
> >rebuilding the data page - right?
> >
> >If changing an element to full size causes it to no longer fit on the
> >data page, then page splits occur - that's a lot of work for the engine
> >to handle.
> It's been a while since I read this part, but I think that's not
> entirely correct. The row's data itself will have to be rebuilt, as
> there can be no gaps between the contents of the varying length
> columns. If the row is shortened, it will stay where it is. If the
> row's size increases, it is replaced by a forwarding pointer and the
> new row's content is stored somewhere else; the other rows on the same
> page remain intact. If the row has to move again later, the original
> forwarding pointer is changed (to prevent long chains of pointers).
That is only correct for heaps, tables without a clustered index. In a table
with a clustered index this scenario will cause a page split, as the rows
have to stay in the same position in the clustered index.
--
Jacco Schalkwijk
SQL Server MVP
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:bl45a0tjo06s6npdrm9rf83mg8o4n80uik@.4ax.com...
<snip>|||Thanks Hugo - Your discussion is very helpful to me.
Prabhat
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:bl45a0tjo06s6npdrm9rf83mg8o4n80uik@.4ax.com...
<snip>|||> And, please explain to me how the 100th VARCHAR field's byte position
> can be determined in a buffer of data without reading each of the prior
> 99 VARCHAR control fields? Without knowing the length of each of the
> prior 99 data elements in that row, how could the starting byte of the
> 100th element be determined? Since the length of a VARCHAR is limited to
> 8000 bytes, that takes 14 of the 16 bits in the 2-byte control field.
> So, I can assume from that fact that the 2-byte control field is not
> storing the "position" of the element, but simply the length of each
> element.
It's a table from the "Inside MS SQL Server 2000" book; pay attention to
the row with mnemonic "VarOffset":

Table 6-7. Information stored in a table's data rows.

  Information                         Mnemonic   Size
  Status Bits A                       TagA       1 byte
  Status Bits B (not used in          TagB       1 byte
    SQL Server 2000)
  Fixed-length size                   Fsize      2 bytes
  Fixed-length data                   Fdata      Fsize - 4
  Number of columns                   Ncol       2 bytes
  NULL bitmap (1 bit per column in    Nullbits   Ceiling(Ncol / 8)
    the table; a 1 indicates that
    the corresponding column is NULL)
  Number of variable-length columns   VarCount   2 bytes
  Variable column offset array        VarOffset  2 * VarCount
  Variable-length data                VarData    VarOff[VarCount] - (Fsize + 4
                                                 + Ceiling(Ncol / 8)
                                                 + 2 * VarCount)|||Then the assumption being bantered about here about the control field for a
varchar element being two bytes in size was wrong. It's storing roughly
two-bytes for each varchar element for the offset and another two-bytes for
the element length.
Thanks for the table - I hadn't stumbled upon that info.
This will help the original poster of this question gain a full
understanding of the impact of char vs. varchar.
"Alex Cieszinski" <nomail@.thanks> wrote in message
news:%23zq%23xKaOEHA.3380@.TK2MSFTNGP11.phx.gbl...
<snip>|||Hi Alex,
As I am not an advanced user of SQL Server, I could not follow that table.
Can you please elaborate on it - what is that info about, and what does it
mean?
As Steve said, "This will help the original poster of this question with
gaining a full understanding of impact of char vs. varchar." So for that I
wanted to know what that info is all about.
Thanks
Prabhat
"Alex Cieszinski" <nomail@.thanks> wrote in message
news:#zq#xKaOEHA.3380@.TK2MSFTNGP11.phx.gbl...
<snip>|||The letter just lost formatting - that's why you couldn't see what it
really means. Here is the table again, with its alignment restored:

Table 6-7. Information stored in a table's data rows.

  Information                         Mnemonic   Size
  Status Bits A                       TagA       1 byte
  Status Bits B                       TagB       1 byte
  Fixed-length size                   Fsize      2 bytes
  Fixed-length data                   Fdata      Fsize - 4
  Number of columns                   Ncol       2 bytes
  NULL bitmap                         Nullbits   Ceiling(Ncol / 8)
  Number of variable-length columns   VarCount   2 bytes
  Variable column offset array        VarOffset  2 * VarCount
  Variable-length data                VarData    VarOff[VarCount] - (Fsize + 4
                                                 + Ceiling(Ncol / 8)
                                                 + 2 * VarCount)

This table describes the parts that every data row in the database consists
of. Now I think it should be clear what they are and how fixed-length data
differs from variable-length data.
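A minimal sketch of the arithmetic in Table 6-7, assuming the row has at least one variable-length column (a simplified model for illustration; it does not cover rows that omit the trailing variable-length fields):

```python
from math import ceil

# Computes the on-disk size of one data row from Table 6-7's layout, given
# the total fixed-length byte count, the number of columns, and the lengths
# of the variable-length values actually stored.

def row_size(fixed_bytes, ncol, var_lengths):
    size = 1 + 1 + 2               # TagA + TagB + Fsize
    size += fixed_bytes            # Fdata
    size += 2                      # Ncol
    size += ceil(ncol / 8)         # NULL bitmap: 1 bit per column
    size += 2                      # VarCount
    size += 2 * len(var_lengths)   # VarOffset array: 2 bytes per var column
    size += sum(var_lengths)       # VarData
    return size

# Example: int + char(10) fixed data (14 bytes), 4 columns total,
# two varchar values of 5 and 20 bytes.
print(row_size(14, 4, [5, 20]))    # -> 52
```

Note that the variable-length overhead is 2 bytes per column (the offset array) plus a shared 2-byte VarCount, which matters for the two-bytes-versus-four-bytes question discussed above.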
"Prabhat" <not_a_mail@.hotmail.com> wrote in message
news:OZYqA1cOEHA.1392@.TK2MSFTNGP09.phx.gbl...
<snip>|||On Fri, 14 May 2004 10:33:44 -0400, Steve Z wrote:
>Then the assumption being bantered about here about the control field for a
>varchar element being two bytes in size was wrong. It's storing roughly
>two-bytes for each varchar element for the offset and another two-bytes for
>the element length.
Hi Steve,
No, you're mistaken.
It's storing exactly two bytes for each varchar element for the offset (OF
THE LAST BYTE, not the start!!), but not a single byte for the length. The
length is the difference between two successive end locations.
If one varchar value ends at position 100 and the next ends at position
108, the latter must surely be 8 bytes.
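In code, Hugo's point looks like this: the row stores only end offsets (2 bytes per variable-length column), and every length falls out as a difference, so no separate length field is needed. An illustrative model only, not SQL Server's actual code:

```python
# Recover every variable-length column's length from the end-offset array
# alone - demonstrating that storing lengths separately would be redundant.

def lengths_from_end_offsets(var_data_start, end_offsets):
    """Each length is the difference between successive end positions."""
    lengths, prev = [], var_data_start
    for end in end_offsets:
        lengths.append(end - prev)
        prev = end
    return lengths

# Hugo's example: one value ends at 100, the next at 108, so the latter
# must be 8 bytes (here the variable data happens to start at offset 92).
print(lengths_from_end_offsets(92, [100, 108]))   # -> [8, 8]
```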
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||Thanks Alex,
But I am not clear on these terms, as I am a biggner to SQL Server just
trying to lean how SQL Server stores data and manage the edits to Char and
Varchar Data. And As well as how the Indexes as gets update for those types
of columns. How the Data from those column are retrived. etc. etc.
But the Entire Discussion was really helpful for me. But As I can guess that
your reply (table 6.7) gives the answer of how the data is stored in DB. But
to be frank I am not clear on these. I will try to learn some basic
Internals of SQL Server. Like Database, DataFile.. And all the Units of
storage etc etc and How these are managed. How Data is saved and retrived.
How a Transaction occurs. Like in oracle it has some thing like user table
space and temporary table space etc. etc.
Of course I am not an advanced SQL Server User/Programmer but As My current
Project is usiong SQL Server So I am leaning these basics of Database (I am
good in TSQL). This newsgroup helps me a lot. Of cource this thread also.
It will be helpful I will get the above detals.
Thanks
Prabhat
"Alex Cieszinski" <nomail@.1> wrote in message
news:O4M2r7fOEHA.3596@.tk2msftngp13.phx.gbl...
<snip>|||I think the best solution to learn this is to read "Inside MS SQL Server
2000" book by Kalen Delaney - this book contains a wealth of inside
information on how MSSQL works internally.
"Prabhat" <not_a_mail@.hotmail.com> wrote in message
news:eQKNCrkOEHA.3028@.TK2MSFTNGP11.phx.gbl...
<snip>
On the other hand, access to tables having CHAR fields may be much slower
than to the same tables with VARCHAR columns if the actual data rarely fills
all (or almost all) of the field's declared length, because the more records
fit on one page, the fewer I/O operations SQL Server performs during large
table reads.
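Alex's rows-per-page argument can be sketched with back-of-the-envelope
arithmetic (plain Python, purely illustrative: the 8096-byte figure is an
approximation of the usable space on an 8 KB data page, and the row sizes,
row count, and per-row overhead are made-up numbers, not real SQL Server
accounting):

```python
import math

PAGE_BYTES = 8096     # approx. usable bytes on an 8 KB data page (assumed)
ROWS = 100_000        # hypothetical table size
OVERHEAD = 10         # assumed fixed per-row overhead, for illustration

def pages_needed(row_bytes, rows=ROWS):
    rows_per_page = PAGE_BYTES // row_bytes
    return math.ceil(rows / rows_per_page)

# A name column that averages 20 characters:
char_row = OVERHEAD + 100        # CHAR(100) pads every value to 100 bytes
varchar_row = OVERHEAD + 2 + 20  # VARCHAR(100): 2-byte offset + ~20 bytes used

print(pages_needed(char_row), "pages with CHAR(100)")
print(pages_needed(varchar_row), "pages with VARCHAR(100)")
```

The fewer pages the table occupies, the fewer reads a scan costs - which is
exactly the trade being described here.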
"Steve Z" <szlamany@.antarescomputing_no_spam.com> wrote in message
news:uCvQmy5NEHA.620@.TK2MSFTNGP10.phx.gbl...

On Tue, 11 May 2004 17:11:16 -0400, Steve Z wrote:
>I believe that I've also read in this forum that all the CHAR columns are
>physically arranged and stored at the begining of the data block, regardless
>of there position in the DDL.
>After the CHAR columns, the VARCHAR columns come next - with the 2 extra
>bytes to store the "end byte", or basically the next starting offset of the
>next VARCHAR column.
>Access to the CHAR columns would then be faster than access to the VARCHAR
>columns, as the byte positions are fixed for each record. Any access to
>VARCHAR columns requires "byte-offset" checking for each column, one at a
>time.
Hi Steve,
Almost true. In fact, all fixed-length columns are stored at the
beginning of the data block and all variable-length columns at the
end. Access to a varchar column does not imply looping through all
other varchar columns stored before the column you need (as might be
concluded from your post); the end position of whatever is stored
before the varchar column being sought and the end position of that
column itself can be found directly.
This does introduce a little bit of overhead in CPU use and memory
reads, but this is compensated many times over by the reduced physical
I/O (see Alex's message).
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)

I agree. These kinds of considerations are pretty much irrelevant for
performance compared to proper database design, query writing and indexing.
--
Jacco Schalkwijk
SQL Server MVP
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:dbp3a0hd0808238thks1lrjabd12umgfqv@.4ax.com...

Hey guys, he was asking how it was stored - it's always good to know what
goes on behind the scenes.
Would you ever create a table with, let's say, 100 single-character VARCHAR
fields? Would there be value in that, if the columns were guaranteed to
have a character in them (i.e. not a space)?
How would a primary clustered key behave if it was built on a VARCHAR field
vs. a CHAR field?
And, please explain to me how the 100th VARCHAR field's byte position can be
determined in a buffer of data without reading each of the prior 99 VARCHAR
control fields? Without knowing the length of each of the prior 99 data
elements in that row, how could the starting byte of the 100th element be
determined? Since the length of the VARCHAR is limited to 8000 bytes - that
takes 14 of the 16 bits in the 2-byte control field. So, I can assume from
that fact that the 2-byte control field is not storing the "position" of the
element, but simply the length of each element.
Inside MS SQL Server 2000, by Kalen Delaney, on page 230 states that VARCHAR
is most appropriate when you expect significant differences in the lengths
of the data in the column and when the data length in the column won't
frequently be changed. Changing the length frequently would require
rebuilding the data page - right?
If changing an element to full size causes it to no longer fit on the data
page, then page splits occur - that's a lot of work for the engine to
handle.
Page 233 indicates that "no one answer is right" and "If you understand the
tradeoffs, you'll be able to make the best choice".
"Jacco Schalkwijk" <NOSPAMjaccos@.eurostop.co.uk> wrote in message
news:e3a8hrAOEHA.3988@.tk2msftngp13.phx.gbl...
Thanks Steve and All for the suggestions. I am getting lots of inputs from
the Discussion.
Thanks
Prabhat
"Steve Z" <szlamany@.antarescomputing_no_spam.com> wrote in message
news:eAvmKZCOEHA.2480@.tk2msftngp13.phx.gbl...
On Wed, 12 May 2004 09:36:31 -0400, Steve Z wrote:
>Hey guys, he was asking how it was stored - it's always good to know what
>goes on behind the scenes.
I agree. That's why I replied to your post, acknowledging that you
were almost completely right, correcting a small error and adding an
important consideration. I have seen too many examples of people using
only fixed-length character data in order to avoid the overhead of
having to find the start and end of the data. Much as I'd like to
believe that people will realise by themselves that this overhead is
nothing compared to the increased I/O, I can't.
>Would you ever create a table with lets say 100 single character VARCHAR
>fields? Would there be value to that, if the columns were guaranteed to
>have a character in them (ie. not a space).
Of course not. If the maximum length is short (<= 10 characters), I
always use CHAR. If it's longer, but most values use almost the full
maximum length, I also use CHAR. I use VARCHAR if long and short
strings are interspersed.
>How would a primary clustered key behave if it was built on a VARCHAR field
>vs. a CHAR field?
It depends. If the column has a maximum length of 250 characters, but
most values are a lot shorter, using VARCHAR would reduce the number
of index pages on the intermediate levels. The leaf level would of
course not be affected (apart from the effect of any CHAR vs VARCHAR
column decision on the data pages). For a nonclustered index, the leaf
level would need fewer pages as well. If the table holds many rows, I
think (I haven't tested it!) that the number of I/Os would be less,
hence the queries would return results faster. All this at the price
of the CPU having less idle time.
>And, please explain to me how the 100th VARCHAR fields byte position can be
>determined in a buffer of data without reading each of the prior 99 VARCHAR
>control fields? Without knowing the length of each of the prior 99 data
>elements in that row, how could the starting byte of the 100th element be
>determined?
Easy. Take the end position of the 99th element (stored at a fixed
position in the row, after the fixed-length data and before the
variable-length data) and add 1.
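Hugo's constant-time lookup can be sketched in a few lines (illustrative
Python, not SQL Server code; the function name and the example offsets are
invented). The row's variable-column offset array stores the end position of
each variable-length column, so the start and length of the i-th column come
straight from two array entries:

```python
def varchar_extent(var_offsets, var_data_start, i):
    """(start, length) of the i-th (0-based) variable-length column.

    var_offsets    - end positions of the variable-length columns, as
                     stored in the row's 2-byte-per-entry offset array
    var_data_start - byte position where variable-length data begins
    """
    end = var_offsets[i]
    start = var_offsets[i - 1] if i > 0 else var_data_start
    return start, end - start

# Example: three varchar values of lengths 5, 0, and 12, with the
# variable-length data starting at byte 40 of the row.
offsets = [45, 45, 57]
print(varchar_extent(offsets, 40, 2))  # -> (45, 12), no scan of columns 0-1
```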
> Since the length of the VARCHAR is limited to 8000 byte - that
>takes 14 of the 16 bits in the 2-byte control field. So, I can assume from
>that fact that the 2-byte control field is not storing the "position" of the
>element, but simply the length of each element.
You seem to forget that the length of the complete row may not exceed
8,060 bytes (check BOL, Maximum Capacity Specifications). The 2-byte
control field will hold the ending position of each element just fine.
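The capacity argument is easy to check mechanically (Python sketch; the
little-endian unsigned-16-bit encoding here is an assumption for
illustration, not a claim about the on-disk format): because a row may not
exceed 8,060 bytes, every end position fits in a 2-byte field with room to
spare.

```python
import struct

MAX_ROW_BYTES = 8060  # per-row limit (BOL, Maximum Capacity Specifications)

# The largest possible end offset round-trips through 2 bytes intact.
packed = struct.pack("<H", MAX_ROW_BYTES)  # "<H" = unsigned 16-bit
assert len(packed) == 2
assert struct.unpack("<H", packed)[0] == MAX_ROW_BYTES
print(MAX_ROW_BYTES <= 0xFFFF)  # -> True: 8060 is well under 65535
```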
>Inside MS SQL Server 2000, by Kalen Delany, on page 230 states that VARCHAR
>are most appropriate when you expect significant differences in the lengths
>of the data in the column and when the data length in the column wont
>frequently be changed.
Apart from the page number, this has apparently not changed from the
previous edition (Inside MS SQL Server 7.0, by Ron Soukup and Kalen
Delaney).
>Changing the length frequently would required
>rebuilding the data page - right?
>If changing an element to full size causes it to no longer fit on the data
>page, then page splits occur - that's alot of work for the engine to handle.
It's been a while since I read this part, but I think that's not
entirely correct. The row's data itself will have to be rebuilt, as
there can be no gaps between the contents of the varying length
columns. If the row is shortened, it will stay where it is. If the
row's size increases, it is replaced by a forwarding pointer and the
new row's content is stored somewhere else; the other rows on the same
page remain intact. If the row has to move again later, the original
forwarding pointer is changed (to prevent long chains of pointers).
>Page 233 indicates that "no one answer is right" and "If you understand the
>tradeoffs; you'll be able to make the best choice".
Yes - a terrific piece of advice that I support wholeheartedly!
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)

> >Changing the length frequently would require
> >rebuilding the data page - right?
> >
> >If changing an element to full size causes it to no longer fit on the
> >data page, then page splits occur - that's a lot of work for the engine
> >to handle.
> It's been a while since I read this part, but I think that's not
> entirely correct. The row's data itself will have to be rebuilt, as
> there can be no gaps between the contents of the varying length
> columns. If the row is shortened, it will stay where it is. If the
> row's size increases, it is replaced by a forwarding pointer and the
> new row's content is stored somewhere else; the other rows on the same
> page remain intact. If the row has to move again later, the original
> forwarding pointer is changed (to prevent long chains of pointers).
That is only correct for heaps, tables without a clustered index. In a table
with a clustered index this scenario will cause a page split, as the rows
have to stay in the same position in the clustered index.
--
Jacco Schalkwijk
SQL Server MVP
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:bl45a0tjo06s6npdrm9rf83mg8o4n80uik@.4ax.com...
Thanks Hugo - Your discussion is very helpful to me.
Prabhat
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:bl45a0tjo06s6npdrm9rf83mg8o4n80uik@.4ax.com...
> And, please explain to me how the 100th VARCHAR fields byte position can
> be determined in a buffer of data without reading each of the prior 99
> VARCHAR control fields? Without knowing the length of each of the prior 99
> data elements in that row, how could the starting byte of the 100th
> element be determined? Since the length of the VARCHAR is limited to 8000
> byte - that takes 14 of the 16 bits in the 2-byte control field. So, I can
> assume from that fact that the 2-byte control field is not storing the
> "position" of the element, but simply the length of each element.
It's a table from the "Inside MS SQL Server 2000" book; pay attention to the
row with mnemonic "VarOffset".
Table 6-7. Information stored in a table's data rows.

Information                                   Mnemonic   Size
Status Bits A                                 TagA       1 byte
Status Bits B (not used in SQL Server 2000)   TagB       1 byte
Fixed-length size                             Fsize      2 bytes
Fixed-length data                             Fdata      Fsize - 4
Number of columns                             Ncol       2 bytes
NULL bitmap (1 bit per column in the table;
  a 1 indicates the column is NULL)           Nullbits   Ceiling(Ncol / 8)
Number of variable-length columns             VarCount   2 bytes
Variable column offset array                  VarOffset  2 * VarCount
Variable-length data                          VarData    VarOff[VarCount] -
                                                         (Fsize + 4 +
                                                         Ceiling(Ncol / 8) +
                                                         2 * VarCount)

Then the assumption being bantered about here about the control field for a
varchar element being two bytes in size was actually right: the offset
array stores two bytes per varchar element, holding the end offset, and
each element's length is derived by subtracting the previous end offset.
Thanks for the table - I hadn't stumbled upon that info.
This will help the original poster of this question with gaining a full
understanding of impact of char vs. varchar.
"Alex Cieszinski" <nomail@.thanks> wrote in message
news:%23zq%23xKaOEHA.3380@.TK2MSFTNGP11.phx.gbl...
Hi Alex,
As I am not an advanced user of SQL Server, I could not follow that table.
Can you please elaborate: what is that info about and what does it mean?
As Steve said, "This will help the original poster of this question with
gaining a full understanding of impact of char vs. varchar." So for that I
wanted to know what that info is all about.
Thanks
Prabhat
"Alex Cieszinski" <nomail@.thanks> wrote in message
news:#zq#xKaOEHA.3380@.TK2MSFTNGP11.phx.gbl...
>|||The letter just lost its formatting - that's why you couldn't make out what
it really means.
Table 6-7. Information stored in a table's data rows.

Information                        Mnemonic   Size
Status Bits A                      TagA       1 byte
Status Bits B                      TagB       1 byte
Fixed-length size                  Fsize      2 bytes
Fixed-length data                  Fdata      Fsize - 4
Number of columns                  Ncol       2 bytes
NULL bitmap                        Nullbits   Ceiling(Ncol / 8)
Number of variable-length columns  VarCount   2 bytes
Variable column offset array       VarOffset  2 * VarCount
Variable-length data               VarData    VarOff[VarCount] - (Fsize + 4 + Ceiling(Ncol / 8) + 2 * VarCount)
This table describes the parts that every data row in the database consists
of. Now it should be clear what they are and how fixed-length data differs
from variable-length data.
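As a rough illustration of how those pieces add up, here is a small Python sketch that totals the per-row overhead from the sizes in Table 6-7. The byte counts come from the table; the example row itself is made up.

```python
import math

def row_overhead(fixed_data_bytes, ncol, varcount):
    """Bytes a SQL Server 2000 data row spends on everything except the
    variable-length data itself, per Table 6-7 (illustrative only)."""
    tag_bytes = 1 + 1                  # Status Bits A + Status Bits B
    fsize_field = 2                    # 2-byte fixed-length-size field
    ncol_field = 2                     # 2-byte column count
    nullbits = math.ceil(ncol / 8)     # 1 bit per column, rounded up
    varcount_field = 2 if varcount else 0  # omitted when no varchar columns
    var_offsets = 2 * varcount         # one 2-byte end offset per varchar
    return (tag_bytes + fsize_field + fixed_data_bytes
            + ncol_field + nullbits + varcount_field + var_offsets)

# Hypothetical row: 10 bytes of fixed data, 3 columns, 1 of them varchar:
# 2 + 2 + 10 + 2 + 1 + 2 + 2 = 21 bytes before the variable data itself.
print(row_overhead(10, 3, 1))  # → 21
```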
"Prabhat" <not_a_mail@.hotmail.com> wrote in message
news:OZYqA1cOEHA.1392@.TK2MSFTNGP09.phx.gbl...
> [snip]
>|||On Fri, 14 May 2004 10:33:44 -0400, Steve Z wrote:
>Then the assumption being bantered about here about the control field for a
>varchar element being two bytes in size was wrong. It's storing roughly
>two-bytes for each varchar element for the offset and another two-bytes for
>the element length.
Hi Steve,
No, you're mistaken.
It's storing exactly two bytes for each varchar element for the offset (OF
THE LAST BYTE, not the start!!), but not a single byte for the length. The
length is the difference between two successive end locations.
If one varchar value ends at position 100 and the next ends at position
108, the latter must surely be 8 bytes.
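The scheme Hugo describes can be sketched in a few lines of Python. The numbers are hypothetical, and `var_data_start` stands for the known, fixed position where the row's variable-length region begins.

```python
def varchar_lengths(var_data_start, end_offsets):
    """Recover each varchar column's length from the stored end offsets:
    length = this column's end minus the previous column's end (the first
    column starts where the variable-length region begins)."""
    starts = [var_data_start] + list(end_offsets[:-1])
    return [end - start for start, end in zip(starts, end_offsets)]

# One value ends at position 100, the next at 108 - the second is 8 bytes.
print(varchar_lengths(90, [100, 108]))  # → [10, 8]
```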
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||Thanks Alex,
But I am not clear on these terms, as I am a beginner to SQL Server, just
trying to learn how SQL Server stores data and manages edits to Char and
Varchar data, as well as how the indexes get updated for those types of
columns, how the data from those columns is retrieved, etc.
The entire discussion was really helpful for me. I can guess that your
reply (Table 6-7) answers how the data is stored in the DB, but to be
frank I am not clear on it. I will try to learn some basic internals of
SQL Server: database, data file, all the units of storage, how they are
managed, how data is saved and retrieved, and how a transaction occurs.
Oracle, for example, has things like user tablespaces and temporary
tablespaces.
Of course I am not an advanced SQL Server user/programmer, but as my
current project is using SQL Server, I am learning these database basics
(I am good in T-SQL). This newsgroup helps me a lot, and of course this
thread too. It will be helpful if I can get the above details.
Thanks
Prabhat
"Alex Cieszinski" <nomail@.1> wrote in message
news:O4M2r7fOEHA.3596@.tk2msftngp13.phx.gbl...
> [snip]
>|||I think the best solution to learn this is to read "Inside MS SQL Server
2000" by Kalen Delaney - this book contains a wealth of insider
information on how MSSQL functions.
"Prabhat" <not_a_mail@.hotmail.com> wrote in message
news:eQKNCrkOEHA.3028@.TK2MSFTNGP11.phx.gbl...
> [snip]
Char and Varchar
|||On the other hand, access to tables having CHAR fields may be much slower
than to the same tables with VARCHAR ones if the actual data rarely uses
all (or almost all) of the field's length - the more records fit on one
page, the fewer I/O operations SQL Server makes during group table reads.
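A back-of-the-envelope Python sketch of that I/O argument. The 8096-byte figure approximates the usable space on an 8 KB page, and the row overheads are assumed round numbers, not exact values.

```python
PAGE_DATA_BYTES = 8096          # approx. usable bytes on an 8 KB data page

def rows_per_page(row_bytes):
    # each row also needs a 2-byte slot in the page's row-offset table
    return PAGE_DATA_BYTES // (row_bytes + 2)

# A CHAR(200) column always stores 200 bytes; a VARCHAR(200) whose data
# averages 40 bytes stores ~40 bytes plus 2 bytes of offset bookkeeping.
char_rows = rows_per_page(200 + 10)     # 10 bytes assumed row overhead
var_rows = rows_per_page(40 + 2 + 10)
print(char_rows, var_rows)  # far more rows per page with VARCHAR
```

More rows per page means fewer pages to read for a scan, which is Alex's point.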
"Steve Z" <szlamany@.antarescomputing_no_spam.com> wrote in message
news:uCvQmy5NEHA.620@.TK2MSFTNGP10.phx.gbl...
> [snip]
|||On Tue, 11 May 2004 17:11:16 -0400, Steve Z wrote:
>[snip]
Hi Steve,
Almost true. In fact, all fixed-length columns are stored at the
beginning of the data block and all variable-length columns at the
end. Access to a varchar column does not imply looping through all
other varchar columns stored before the column you need (as might be
concluded from your post); the end position of whatever is stored
before the varchar column being sought and the end position of that
column itself can be found directly.
This does introduce a little bit of overhead in CPU use and reading
memory, but this is compensated many times by the reduced physical I/O
(see Alex's message).
Best, Hugo
(Remove _NO_ and _SPAM_ to get my e-mail address)
|||I agree. These kinds of considerations are pretty much irrelevant to
performance compared to proper database design, query writing and indexing.
Jacco Schalkwijk
SQL Server MVP
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:dbp3a0hd0808238thks1lrjabd12umgfqv@.4ax.com...
> [snip]
|||Hey guys, he was asking how it was stored - it's always good to know what
goes on behind the scenes.
Would you ever create a table with, let's say, 100 single-character VARCHAR
fields? Would there be value in that, if the columns were guaranteed to
have a character in them (i.e. not a space)?
How would a primary clustered key behave if it was built on a VARCHAR field
vs. a CHAR field?
And, please explain to me how the 100th VARCHAR field's byte position can be
determined in a buffer of data without reading each of the prior 99 VARCHAR
control fields? Without knowing the length of each of the prior 99 data
elements in that row, how could the starting byte of the 100th element be
determined? Since the length of the VARCHAR is limited to 8000 bytes - that
takes 14 of the 16 bits in the 2-byte control field. So, I can assume from
that fact that the 2-byte control field is not storing the "position" of the
element, but simply the length of each element.
Inside MS SQL Server 2000, by Kalen Delaney, on page 230 states that VARCHARs
are most appropriate when you expect significant differences in the lengths
of the data in the column and when the data length in the column won't
frequently be changed. Changing the length frequently would require
rebuilding the data page - right?
If changing an element to full size causes it to no longer fit on the data
page, then page splits occur - that's a lot of work for the engine to handle.
Page 233 indicates that "no one answer is right" and "If you understand the
tradeoffs, you'll be able to make the best choice".
"Jacco Schalkwijk" <NOSPAMjaccos@.eurostop.co.uk> wrote in message
news:e3a8hrAOEHA.3988@.tk2msftngp13.phx.gbl...
> [snip]
|||On Wed, 12 May 2004 09:36:31 -0400, Steve Z wrote:
>Hey guys, he was asking how it was stored - it's always good to know what
>goes on behind the scenes.
I agree. That's why I replied to your post, acknowledging that you
were almost completely right, correcting a small error and adding an
important consideration. I have seen too many examples of people using
only fixed-length character data in order to avoid the overhead of
having to find the start and end of the data. Much as I'd like to
believe that people will realise by themselves that this overhead is
nothing compared to the increased I/O, I can't.
>Would you ever create a table with lets say 100 single character VARCHAR
>fields? Would there be value to that, if the columns were guaranteed to
>have a character in them (ie. not a space).
Of course not. If the maximum length is short (<= 10 characters), I
always use CHAR. If it's longer, but most values use almost the full
maximum length, I also use CHAR. I use VARCHAR if long and short
strings are interspersed.
>How would a primary clustered key behave if it was built on a VARCHAR field
>vs. a CHAR field?
It depends. If the column has a maximum length of 250 characters, but
most values are a lot shorter, using VARCHAR would reduce the number
of index pages on the intermediate levels. The leaf level would of
course not be affected (apart from the effect of any CHAR vs VARCHAR
column decision on the data pages). For an unindexed key, the leaf
level would need fewer pages as well. If the table holds many rows, I
think (I haven't tested it!) that the number of I/O's would be less,
hence the queries would return results faster. All this at the price
of the CPU having less idle time.
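That fan-out effect can be sketched numerically in Python. The page size and per-entry overhead here are assumed round numbers, purely illustrative.

```python
import math

PAGE_BYTES = 8096   # approx. usable bytes per 8 KB index page

def intermediate_pages(nrows, key_bytes, entry_overhead=7):
    """Pages needed at one index level above the leaf: shorter keys mean
    more entries per page, hence fewer pages (illustrative only)."""
    entries_per_page = PAGE_BYTES // (key_bytes + entry_overhead)
    return math.ceil(nrows / entries_per_page)

# 1,000,000 rows: CHAR(250) key vs. a VARCHAR key averaging 30 bytes
print(intermediate_pages(1_000_000, 250))  # many more pages...
print(intermediate_pages(1_000_000, 30))   # ...than with short keys
```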
>And, please explain to me how the 100th VARCHAR fields byte position can be
>determined in a buffer of data without reading each of the prior 99 VARCHAR
>control fields? Without knowing the length of each of the prior 99 data
>elements in that row, how could the starting byte of the 100th element be
>determined?
Easy. Take the end position of the 99th element (stored at a fixed
position in the row, after the fixed-length data and before the
varying-length data) and add 1.
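In code, the lookup Hugo describes is constant-time. A Python sketch, with positions 1-based and inclusive as in his example:

```python
def nth_varchar_span(n, first_start, end_offsets):
    """Start and end positions of the nth varchar column, read directly
    from the offset array - no scan over the first n-1 columns needed."""
    start = first_start if n == 1 else end_offsets[n - 2] + 1
    return start, end_offsets[n - 1]

# Values ending at positions 100 and 108; the first starts at byte 91.
print(nth_varchar_span(2, 91, [100, 108]))  # → (101, 108)
```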
> Since the length of the VARCHAR is limited to 8000 byte - that
>takes 14 of the 16 bits in the 2-byte control field. So, I can assume from
>that fact that the 2-byte control field is not storing the "position" of the
>element, but simply the length of each element.
You seem to forget that the length of the complete row may not exceed
8,060 bytes (check BOL, Maximum Capacity Specifications). The 2-byte
control field will hold the ending position of each element just fine.
>Inside MS SQL Server 2000, by Kalen Delany, on page 230 states that VARCHAR
>are most appropriate when you expect significant differences in the lengths
>of the data in the column and when the data length in the column wont
>frequently be changed.
Apart from the page number, this has apparently not changed from the
previous edition (Inside MS SQL Server 7.0, by Ron Soukup and Kalen
Delaney).
>Changing the length frequently would required
>rebuilding the data page - right?
>If changing an element to full size causes it to no longer fit on the data
>page, then page splits occur - that's alot of work for the engine to handle.
It's been a while since I read this part, but I think that's not
entirely correct. The row's data itself will have to be rebuilt, as
there can be no gaps between the contents of the varying length
columns. If the row is shortened, it will stay where it is. If the
row's size increases, it is replaced by a forwarding pointer and the
new row's content is stored somewhere else; the other rows on the same
page remain intact. If the row has to move again later, the original
forwarding pointer is changed (to prevent long chains of pointers).
>Page 233 indicates that "no one answer is right" and "If you understand the
>tradeoffs; you'll be able to make the best choice".
Yes - a terrific piece of advice that I support wholeheartedly!
Best, Hugo
(Remove _NO_ and _SPAM_ to get my e-mail address)
|||> [snip]
> It's been a while since I read this part, but I think that's not
> entirely correct. The row's data itself will have to be rebuilt, as
> there can be no gaps between the contents of the varying length
> columns. If the row is shortened, it will stay where it is. If the
> row's size increases, it is replaced by a forwarding pointer and the
> new row's content is stored somewhere else; the other rows on the same
> page remain intact. If the row has to move again later, the original
> forwarding pointer is changed (to prevent long chains of pointers).
That is only correct for heaps, tables without a clustered index. In a table
with a clustered index this scenario will cause a page split, as the rows
have to stay in the same position in the clustered index.
Jacco Schalkwijk
SQL Server MVP
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:bl45a0tjo06s6npdrm9rf83mg8o4n80uik@.4ax.com... [vbcol=seagreen]
> On Wed, 12 May 2004 09:36:31 -0400, Steve Z wrote:
>
> I agree. That's why I replied to your post, acknowledging that you
> were almost completely right, correcting a small error and adding an
> important consideration. I have seen too many examples of people using
> only fixed-length charachter data in order to avoid the overhead of
> having to find the start and end of the data. Much as I'd like to
> believe that people will realise by themselves that this overhead is
> nothing compared to the increased I/O, I can't.
>
> Of course not. If the maximum length is short (<= 10 characters), I
> always use CHAR. If it's longer, but most values use almost the full
> maximum length, I also use CHAR. I use VARCHAR if long and short
> strings are interspersed.
>
field[vbcol=seagreen]
> It depends. If the column has a maximum length of 250 characters, but
> most values are a lot shorter, using VARCHAR would reduce the number
> of index pages on the intermediate levels. The leaf level would of
> course not be affected (apart from the effect of any CHAR vs VARCHAR
> column decision on the data pages). For an unindexed key, the leaf
> level would need less pages as well. If the table holds many rows, I
> think (I haven't tested it!) that the number of I/O's would be less,
> hence the queries would return results faster. All this at the price
> of the CPU having less idle time.
>
be[vbcol=seagreen]
VARCHAR[vbcol=seagreen]
> Easy. Take the end position of the 99th element (stored ar a fixed
> position in the row, after the fixed length and before the varying
> length data) and add 1.
>
from[vbcol=seagreen]
the[vbcol=seagreen]
> You seem to forget that the length of the complete row may not exceed
> 8,060 bytes (check BOL, Maximum Capacity Specifications). The 2-byte
> control field will hold the ending position of each element just fine.
>
VARCHAR[vbcol=seagreen]
lengths[vbcol=seagreen]
> Apart from the page number, this has apparently not changed from the
> previous edition (Inside MS SQL Server 7.0, by Ron Soukup and Kalen
> Delany).
>
data[vbcol=seagreen]
handle.[vbcol=seagreen]
> It's been a while since I read this part, but I think that's not
> entirely correct. The row's data itself will have to be rebuilt, as
> there can be no gaps between the contents of the varying length
> columns. If the row is shortened, it will stay where it is. If the
> row's size increases, it is replaced by a forwarding pointer and the
> new row's content is stored somewhere else; the other rows on the same
> page remain intact. If the row has to move again later, the original
> forwarding pointer is changed (to prevent long chains of pointers).
>
the
> Yes - a terrific piece of advice that I support wholeheartedly!
>
> Best, Hugo
> --
> (Remove _NO_ and _SPAM_ to get my e-mail address)
|||Thanks Hugo - Your discussion is very helpful to me.
Prabhat
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:bl45a0tjo06s6npdrm9rf83mg8o4n80uik@.4ax.com...
> [snip]
How does SQL Server internally store data in a CHAR and a VARCHAR field? I
think there will be some extra bytes stored for a VARCHAR, holding the
number of characters in the field. Is there something similar for CHAR?
Also, how does SQL Server retrieve records from both types of field?
Thanks
Prabhat
Buy "Inside SQL Server" by Kalen Delaney for the complete explanation.
The shortened version is that a VARCHAR uses 2 extra bytes (to store the
actual length of the data in the column, or more accurately, the end
position of the column).
Jacco Schalkwijk
SQL Server MVP
"Prabhat" <not_a_mail@.hotmail.com> wrote in message
news:OSfHSS3NEHA.644@.tk2msftngp13.phx.gbl...
> Hi All,
> How internally SQL Server Stores Data in a Char and Varchar field. Like I
> think there will be an extra bit of data which will store for Varchar -
> Which is for Number of Characters in the field. Is something similar in
> Char also?
> Also How Does SQL Server Retrive records from both types of field?
> Thanks
> Prabhat
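For illustration, the padding difference (though not the 2-byte overhead itself, which lives in the row's internal structure) can be seen with DATALENGTH. A quick sketch; the table variable and column names are made up:

```sql
-- CHAR pads to the declared length; VARCHAR stores only what you insert.
DECLARE @t TABLE (c CHAR(10), v VARCHAR(10));
INSERT INTO @t VALUES ('abc', 'abc');

SELECT DATALENGTH(c) AS char_bytes,    -- 10: padded with trailing spaces
       DATALENGTH(v) AS varchar_bytes  -- 3: actual data only
FROM @t;
-- The 2 extra bytes per VARCHAR column are row-format overhead and are
-- not reported by DATALENGTH.
```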
|||I believe that I've also read in this forum that all the CHAR columns are
physically arranged and stored at the beginning of the data block, regardless
of their position in the DDL.
After the CHAR columns, the VARCHAR columns come next - with the 2 extra
bytes to store the "end byte", or basically the starting offset of the
next VARCHAR column.
Access to the CHAR columns would then be faster than access to the VARCHAR
columns, as the byte positions are fixed for each record. Any access to
VARCHAR columns requires "byte-offset" checking for each column, one at a
time.
"Jacco Schalkwijk" <NOSPAMjaccos@.eurostop.co.uk> wrote in message
news:ezCg2x4NEHA.3348@.TK2MSFTNGP09.phx.gbl...
> Buy "Inside SQL Server" by Kalen Delaney for the complete explanation.
> The shortened version is that a VARCHAR uses 2 extra bytes (to store the
> actual length of the data in the column, or more accurately, the end
> position of the column).
> --
> Jacco Schalkwijk
> SQL Server MVP
|||On the other hand, access to tables having CHAR fields may be much slower
than to the same tables with VARCHAR ones if the actual data rarely uses all
(or almost all) of the field's declared length - because the more records fit
on one page, the fewer I/O operations SQL Server has to make during group
table reads.
"Steve Z" <szlamany@.antarescomputing_no_spam.com> wrote in message
news:uCvQmy5NEHA.620@.TK2MSFTNGP10.phx.gbl...
> I believe that I've also read in this forum that all the CHAR columns are
> physically arranged and stored at the beginning of the data block,
> regardless of their position in the DDL.
> After the CHAR columns, the VARCHAR columns come next - with the 2 extra
> bytes to store the "end byte", or basically the next starting offset of
> the next VARCHAR column.
> Access to the CHAR columns would then be faster than access to the VARCHAR
> columns, as the byte positions are fixed for each record. Any access to
> VARCHAR columns requires "byte-offset" checking for each column, one at a
> time.
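A rough way to observe this effect is to compare the space used by two tables that differ only in the column type. A sketch - table names are made up and exact page counts will vary:

```sql
-- Compare on-disk size of CHAR vs VARCHAR when the values are much
-- shorter than the declared maximum.
SET NOCOUNT ON;
CREATE TABLE FixedWidth (s CHAR(250) NOT NULL);
CREATE TABLE VarWidth   (s VARCHAR(250) NOT NULL);

DECLARE @i INT;
SET @i = 0;
WHILE @i < 10000
BEGIN
    INSERT INTO FixedWidth VALUES ('short value'); -- padded to 250 bytes
    INSERT INTO VarWidth   VALUES ('short value'); -- 11 bytes + 2 overhead
    SET @i = @i + 1;
END;

EXEC sp_spaceused 'FixedWidth'; -- needs many more data pages...
EXEC sp_spaceused 'VarWidth';   -- ...so scanning FixedWidth costs more I/O
```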
|||On Tue, 11 May 2004 17:11:16 -0400, Steve Z wrote:
>I believe that I've also read in this forum that all the CHAR columns are
>physically arranged and stored at the beginning of the data block, regardless
>of their position in the DDL.
>After the CHAR columns, the VARCHAR columns come next - with the 2 extra
>bytes to store the "end byte", or basically the next starting offset of the
>next VARCHAR column.
>Access to the CHAR columns would then be faster than access to the VARCHAR
>columns, as the byte positions are fixed for each record. Any access to
>VARCHAR columns requires "byte-offset" checking for each column, one at a
>time.
Hi Steve,
Almost true. In fact, all fixed-length columns are stored at the
beginning of the data block and all variable-length columns at the
end. Access to a varchar column does not imply looping through all
other varchar columns stored before the column you need (as might be
concluded from your post); the end position of whatever is stored
before the varchar column being sought and the end position of that
column itself can be found directly.
This does introduce a little bit of overhead in CPU use and memory reads,
but it is compensated many times over by the reduced physical I/O
(see Alex's message).
Best, Hugo
(Remove _NO_ and _SPAM_ to get my e-mail address)
|||I agree. These kinds of considerations are pretty much irrelevant for
performance compared to proper database design, query writing and indexing.
Jacco Schalkwijk
SQL Server MVP
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:dbp3a0hd0808238thks1lrjabd12umgfqv@.4ax.com...
> On Tue, 11 May 2004 17:11:16 -0400, Steve Z wrote:
> Hi Steve,
> Almost true. In fact, all fixed-length columns are stored at the
> beginning of the data block and all variable-length columns at the
> end. Access to a varchar column does not imply looping through all
> other varchar columns stored before the column you need (as might be
> concluded from your post); the end position of whatever is stored
> before the varchar column being sought and the end position of that
> column itself can be found directly.
> This does introduce a little bit of overhead in CPU use and reading
> memory, but this is compensated many times by the reduced physical I/O
> (see Alex' message)
> Best, Hugo
> --
> (Remove _NO_ and _SPAM_ to get my e-mail address)
|||Hey guys, he was asking how it was stored - it's always good to know what
goes on behind the scenes.
Would you ever create a table with, let's say, 100 single-character VARCHAR
fields? Would there be value in that, if the columns were guaranteed to
have a character in them (i.e. not a space)?
How would a primary clustered key behave if it was built on a VARCHAR field
vs. a CHAR field?
And, please explain to me how the 100th VARCHAR field's byte position can be
determined in a buffer of data without reading each of the prior 99 VARCHAR
control fields. Without knowing the length of each of the prior 99 data
elements in that row, how could the starting byte of the 100th element be
determined? Since the length of a VARCHAR is limited to 8000 bytes - that
takes 14 of the 16 bits in the 2-byte control field. So, I can assume from
that fact that the 2-byte control field is not storing the "position" of the
element, but simply the length of each element.
Inside MS SQL Server 2000, by Kalen Delaney, on page 230 states that VARCHARs
are most appropriate when you expect significant differences in the lengths
of the data in the column and when the data length in the column won't
frequently be changed. Changing the length frequently would require
rebuilding the data page - right?
If changing an element to full size causes it to no longer fit on the data
page, then page splits occur - that's a lot of work for the engine to handle.
Page 233 indicates that "no one answer is right" and "If you understand the
tradeoffs, you'll be able to make the best choice".
"Jacco Schalkwijk" <NOSPAMjaccos@.eurostop.co.uk> wrote in message
news:e3a8hrAOEHA.3988@.tk2msftngp13.phx.gbl...
> I agree. These kinds of considerations are pretty much irrelevant for
> performance compared to proper database design, query writing and
> indexing.
> --
> Jacco Schalkwijk
> SQL Server MVP
>
> "Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
> news:dbp3a0hd0808238thks1lrjabd12umgfqv@.4ax.com...
|||On Wed, 12 May 2004 09:36:31 -0400, Steve Z wrote:
>Hey guys, he was asking how it was stored - it's always good to know what
>goes on behind the scenes.
I agree. That's why I replied to your post, acknowledging that you
were almost completely right, correcting a small error and adding an
important consideration. I have seen too many examples of people using
only fixed-length character data in order to avoid the overhead of
having to find the start and end of the data. Much as I'd like to
believe that people will realise by themselves that this overhead is
nothing compared to the increased I/O, I can't.
>Would you ever create a table with lets say 100 single character VARCHAR
>fields? Would there be value to that, if the columns were guaranteed to
>have a character in them (ie. not a space).
Of course not. If the maximum length is short (<= 10 characters), I
always use CHAR. If it's longer, but most values use almost the full
maximum length, I also use CHAR. I use VARCHAR if long and short
strings are interspersed.
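As a concrete (made-up) illustration of that rule of thumb:

```sql
-- Hypothetical table applying the guideline above.
CREATE TABLE Customer (
    CountryCode CHAR(2)      NOT NULL, -- short: always CHAR
    PostalCode  CHAR(10)     NOT NULL, -- <= 10 characters: CHAR
    FullName    VARCHAR(100) NOT NULL, -- long, lengths vary widely: VARCHAR
    Remarks     VARCHAR(500) NULL      -- often empty or short: VARCHAR
);
```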
>How would a primary clustered key behave if it was built on a VARCHAR field
>vs. a CHAR field?
It depends. If the column has a maximum length of 250 characters, but
most values are a lot shorter, using VARCHAR would reduce the number
of index pages on the intermediate levels. The leaf level would of
course not be affected (apart from the effect of any CHAR vs VARCHAR
column decision on the data pages). For an unindexed key, the leaf
level would need fewer pages as well. If the table holds many rows, I
think (I haven't tested it!) that the number of I/O's would be less,
hence the queries would return results faster. All this at the price
of the CPU having less idle time.
>And, please explain to me how the 100th VARCHAR field's byte position can be
>determined in a buffer of data without reading each of the prior 99 VARCHAR
>control fields? Without knowing the length of each of the prior 99 data
>elements in that row, how could the starting byte of the 100th element be
>determined?
Easy. Take the end position of the 99th element (stored at a fixed
position in the row, after the fixed length and before the varying
length data) and add 1.
> Since the length of the VARCHAR is limited to 8000 byte - that
>takes 14 of the 16 bits in the 2-byte control field. So, I can assume from
>that fact that the 2-byte control field is not storing the "position" of the
>element, but simply the length of each element.
You seem to forget that the length of the complete row may not exceed
8,060 bytes (check BOL, Maximum Capacity Specifications). The 2-byte
control field will hold the ending position of each element just fine.
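For the curious, the row layout (including that array of 2-byte ending offsets) can be inspected directly with the undocumented DBCC PAGE command. A sketch; the database name and file/page numbers are placeholders you would take from DBCC IND or your own table:

```sql
-- Dump a data page, including each row's variable-column offset array.
DBCC TRACEON (3604);                 -- route DBCC output to the client
DBCC PAGE ('MyDatabase', 1, 143, 3); -- db, file id, page id, print option
-- With print option 3, each row shows its fixed-length data, the column
-- count, the NULL bitmap, and the 2-byte ending offsets of the
-- variable-length columns described above.
```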
>Inside MS SQL Server 2000, by Kalen Delaney, on page 230 states that VARCHARs
>are most appropriate when you expect significant differences in the lengths
>of the data in the column and when the data length in the column won't
>frequently be changed.
Apart from the page number, this has apparently not changed from the
previous edition (Inside MS SQL Server 7.0, by Ron Soukup and Kalen
Delaney).
>Changing the length frequently would require
>rebuilding the data page - right?
>If changing an element to full size causes it to no longer fit on the data
>page, then page splits occur - that's a lot of work for the engine to handle.
It's been a while since I read this part, but I think that's not
entirely correct. The row's data itself will have to be rebuilt, as
there can be no gaps between the contents of the varying length
columns. If the row is shortened, it will stay where it is. If the
row's size increases, it is replaced by a forwarding pointer and the
new row's content is stored somewhere else; the other rows on the same
page remain intact. If the row has to move again later, the original
forwarding pointer is changed (to prevent long chains of pointers).
>Page 233 indicates that "no one answer is right" and "If you understand the
>tradeoffs; you'll be able to make the best choice".
Yes - a terrific piece of advice that I support wholeheartedly!
Best, Hugo
(Remove _NO_ and _SPAM_ to get my e-mail address)
|||> >Changing the length frequently would require rebuilding the data
> >page - right?
> >If changing an element to full size causes it to no longer fit on the
> >data page, then page splits occur - that's a lot of work for the engine
> >to handle.
> It's been a while since I read this part, but I think that's not
> entirely correct. The row's data itself will have to be rebuilt, as
> there can be no gaps between the contents of the varying length
> columns. If the row is shortened, it will stay where it is. If the
> row's size increases, it is replaced by a forwarding pointer and the
> new row's content is stored somewhere else; the other rows on the same
> page remain intact. If the row has to move again later, the original
> forwarding pointer is changed (to prevent long chains of pointers).
That is only correct for heaps, tables without a clustered index. In a table
with a clustered index this scenario will cause a page split, as the rows
have to stay in the same position in the clustered index.
Jacco Schalkwijk
SQL Server MVP
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:bl45a0tjo06s6npdrm9rf83mg8o4n80uik@.4ax.com...
> On Wed, 12 May 2004 09:36:31 -0400, Steve Z wrote:
> [quoted text snipped]
|||Thanks Hugo - Your discussion is very helpful to me.
Prabhat
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:bl45a0tjo06s6npdrm9rf83mg8o4n80uik@.4ax.com...
> On Wed, 12 May 2004 09:36:31 -0400, Steve Z wrote:
> [quoted text snipped]