How does case-insensitive collation work?
The default collation type in SQL Server allows indexing against case-insensitive strings, yet the case of the data is persisted. How does this actually work? I'm looking for the actual nuts and bolts, bits and bytes, or a good resource that explains it in detail.
create table casetest (fruitnames nvarchar(50) not null);
create unique index IX_fruitnames on casetest(fruitnames);
insert into casetest values ('apples');
insert into casetest values ('Pears');
-- this insert fails
insert into casetest values ('pears');
-- this yields 'Pears' as a result
select * from casetest (forceseek) where fruitnames = 'PEARS'
update casetest set fruitnames = 'pears' where fruitnames = 'pEArs'
-- this yields 'pears' as a result
select * from casetest (forceseek) where fruitnames = 'PEARS'
Questions About SQL Server Collations You Were Too Shy to Ask by Robert Sheldon covers how to use collation. It does not cover how collation works. I'm interested in how an index can be efficiently created/queried not caring about case, while simultaneously storing case data.
sql-server collation
You can efficiently query (e.g. utilizing an index seek) case-insensitive strings against a case-sensitive field, but it's a little annoying.
– John Eisbrener
Sep 26 at 20:47
cocogorilla: please see note #1 that I just added to the end of my answer re: the "default" collation.
– Solomon Rutzky
Oct 2 at 16:42
asked Sep 26 at 18:35 by cocogorilla
edited Sep 26 at 20:51
2 Answers
indexing against case insensitive strings yet the case of the data is persisted. How does this actually work?
This is actually not a SQL Server specific behavior, it's just how these things work in general.
So, the data is the data. If you are speaking about an index specifically, the data needs to be stored as-is; otherwise it would require a look-up in the main table each time to get the actual value, and there would be no possibility of a covering index (at least not for string types).
The data, either in the table/clustered index or non-clustered index, does not contain any collation / sorting info. It is simply data. The collation (locale/culture rules and sensitivities) is just metadata attached to the column and used when a sort operation is called (unless overridden by a COLLATE clause), which would include the creation/rebuild of an index. The rules defined by a non-binary collation are used to generate sort keys, which are binary representations of the string (sort keys are unnecessary in binary collations). These binary representations incorporate all of the locale/culture rules and selected sensitivities. The sort keys are used to place the records in their proper order, but are not themselves stored in the index or table. They aren't stored (at least I haven't seen these values in the index and was told that they aren't stored) because:
- They aren't truly needed for sorting since they would merely be in the same order as the rows in the table or index anyway. But, the physical order of the index is just sorting, not comparison.
- While storing them might make comparisons faster, it would also make the index larger as the minimum size for a single character is 5 bytes, and that's just "overhead" (of the sort key structure). Most characters are 2 bytes each, plus 1 byte if there's an accent, plus 1 byte if it's upper-case. For example, "e" is a 7-byte key, "E" and "é" are both 8 bytes, and "É" is a 9-byte key. Hence, not worth storing these in the end.
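To make the point above concrete, here is a minimal Python sketch (not SQL Server's actual storage engine) of a unique index that stores only the original strings and derives a comparison key at comparison time. The class name `CaseInsensitiveUniqueIndex`, the helper `collation_key`, and the use of `str.casefold()` as a stand-in for a case-insensitive collation are all assumptions made for illustration.

```python
def collation_key(text):
    # Assumption: casefold() stands in for the collation rules (case only).
    return text.casefold()


class CaseInsensitiveUniqueIndex:
    """Toy unique index: rows are ordered by key, but only the text is stored."""

    def __init__(self):
        self._rows = []  # original strings, kept in collation-key order

    def _find(self, text):
        # Binary search; keys are computed on the fly, never stored.
        target = collation_key(text)
        lo, hi = 0, len(self._rows)
        while lo < hi:
            mid = (lo + hi) // 2
            if collation_key(self._rows[mid]) < target:
                lo = mid + 1
            else:
                hi = mid
        found = lo < len(self._rows) and collation_key(self._rows[lo]) == target
        return lo, found

    def insert(self, text):
        pos, found = self._find(text)
        if found:
            raise ValueError(f"duplicate key: {text!r}")
        self._rows.insert(pos, text)

    def seek(self, text):
        pos, found = self._find(text)
        return self._rows[pos] if found else None

    def update(self, old, new):
        pos, found = self._find(old)
        if not found:
            return False
        original = self._rows.pop(pos)
        try:
            self.insert(new)
        except ValueError:
            self._rows.insert(pos, original)  # put the row back on conflict
            raise
        return True


index = CaseInsensitiveUniqueIndex()
index.insert("apples")
index.insert("Pears")
print(index.seek("PEARS"))  # Pears
```

This mirrors the example in the question: inserting 'pears' after 'Pears' raises a duplicate error, while a seek for 'PEARS' returns the row exactly as it was stored.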
There are two types of collations: SQL Server and Windows.
SQL Server
SQL Server collations (those with names starting with SQL_) are the older, pre-SQL Server 2000 way of sorting/comparing (even though SQL_Latin1_General_CP1_CI_AS is still the installation default on US English OSes, quite sadly). In this older, simplistic, non-Unicode model, each combination of locale, code page, and the various sensitivities is given a static mapping of each of the characters in that code page. Each character is assigned a value (i.e. sort weight) to denote how it equates with the others. Comparisons in this model appear to do a two-pass operation:
- First, it removes all accents (such that " ü " becomes " u "), expands characters like " Æ " into " A " and " E ", then does an initial sort so that words are in a natural order (how you would expect to find them in a dictionary).
- Then, it goes character by character to determine equality based on these underlying values per each character. This second part is what mustaccio is describing in his answer.
The only sensitivities that can be adjusted in these collations are: "case" and "accent" ("width", "kana type" and "variation selector" are not available). Also, none of these collations support Supplementary Characters (which makes sense as those are Unicode-specific and these collations only apply to non-Unicode data).
This approach applies only to non-Unicode VARCHAR data. Each unique combination of locale, code page, case-sensitivity, and accent-sensitivity has a specific "sort ID", which you can see in the following example:
SELECT COLLATIONPROPERTY(N'SQL_Latin1_General_CP1_CI_AS', 'SortID'), -- 52
COLLATIONPROPERTY(N'SQL_Latin1_General_CP1_CS_AS', 'SortID'), -- 51
COLLATIONPROPERTY(N'Latin1_General_100_CI_AS', 'SortID'); -- 0
The only difference between the first two collations is the case-sensitivity. The third collation is a Windows collation and so does not have a static mapping table.
Also, these collations should sort and compare faster than the Windows collations due to being simple lookups for character to sort weight. However, these collations are also far less functional and should generally be avoided if at all possible.
Windows
Windows collations (those with names not starting with SQL_) are the newer (starting in SQL Server 2000) way of sorting/comparing. In this newer, complex, Unicode model, each combination of locale, code page, and the various sensitivities is not given a static mapping. For one thing, there are no code pages in this model. This model assigns a default sort value to each character, and then each locale/culture can re-assign sort values to any number of characters. This allows multiple cultures to use the same characters in different ways. This does have the effect of allowing multiple languages to be sorted naturally using the same collation if they do not use the same characters (and if one of them does not need to re-assign any values and can simply use the defaults).
The sort values in this model are not single values. They are an array of values that assign relative weights to the base letter, any diacritics (i.e. accents), casing, etc. If the collation is case-sensitive, then the "case" portion of that array is used, otherwise it's ignored (hence, insensitive). If the collation is accent-sensitive, then the "diacritic" portion of the array is used, otherwise it's ignored (hence, insensitive).
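As a rough illustration of the "array of values" idea, here is a small Python sketch. The `WEIGHTS` table and its numbers are made up for this example (they are not SQL Server's real weights), and the function names are hypothetical. Each character carries a (base, diacritic, case) triple, and an insensitive collation simply ignores the corresponding slot.

```python
# Hypothetical weights: (base letter, diacritic, case). Not real SQL Server values.
WEIGHTS = {
    "e": (5, 0, 0),
    "E": (5, 0, 1),
    "é": (5, 1, 0),
    "É": (5, 1, 1),
    "f": (6, 0, 0),
}


def char_weights(ch, case_sensitive, accent_sensitive):
    """Return the weight triple with insensitive slots zeroed out."""
    base, diacritic, case = WEIGHTS[ch]
    return (
        base,
        diacritic if accent_sensitive else 0,
        case if case_sensitive else 0,
    )


def equal(a, b, *, case_sensitive, accent_sensitive):
    """Compare two strings under the chosen sensitivities."""
    if len(a) != len(b):
        return False
    return all(
        char_weights(x, case_sensitive, accent_sensitive)
        == char_weights(y, case_sensitive, accent_sensitive)
        for x, y in zip(a, b)
    )


# CI_AS: case is ignored, accents still matter.
print(equal("e", "E", case_sensitive=False, accent_sensitive=True))  # True
print(equal("e", "é", case_sensitive=False, accent_sensitive=True))  # False
```

The stored characters never change; only the slots that take part in the comparison do.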
Comparisons in this model are a multi-pass operation:
- First, the string is normalized so that various ways of representing the same character will equate. For example, " ü " could be a single character / code point (U+00FC). You could also combine a non-accented " u " (U+0075) with a Combining Diaeresis " ̈ " (U+0308) to get: " ü ", which not only looks the same when rendered (unless there is a problem with your font), but is also considered to be the same as the single character version (U+00FC), unless using a binary collation (which compares bytes instead of characters). Normalization breaks the single character into the various pieces, which includes expansions for characters like " Æ " (as noted above for SQL Server collations).
- The comparison operation in this model goes character by character per each sensitivity. Sort keys for the strings are determined by applying the appropriate elements of each character's collation array of values based on which sensitivities are "sensitive". The sort key values are arranged by all of the primary sensitivities of each character (the base character), followed by all of the secondary sensitivities (diacritic weight), followed by the case weight of each character, and so on.
- Sorting is performed based on the calculated sort keys. With each sensitivity grouped together, you can get a different sort order than you would with an equivalent SQL Server collation when comparing strings of multiple characters, and accents are involved, and the collation is accent-sensitive (and even more so if the collation is also case-sensitive).
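The steps above can be sketched in Python using the standard `unicodedata` module. This is only an approximation under stated assumptions: it uses Unicode NFD decomposition as the normalization step, derives weights from the characters themselves instead of a real weight table, and does not handle expansions such as " Æ ". The function names `sort_key` and `per_character_key` are hypothetical, and the second one exists only as a contrast; it is not a claim about how any particular collation is implemented.

```python
import unicodedata


def decompose(text):
    """Split precomposed characters into base letter + combining marks (NFD)."""
    return unicodedata.normalize("NFD", text)


def sort_key(text, *, accent_sensitive=True, case_sensitive=True):
    """Level-grouped key: all base letters, then all accents, then all case."""
    primary, secondary, tertiary = [], [], []
    for ch in decompose(text):
        if unicodedata.combining(ch):
            if secondary:  # attach the mark to the preceding base letter
                secondary[-1] = secondary[-1] + (ord(ch),)
            continue
        primary.append(ch.lower())
        secondary.append(())
        tertiary.append(1 if ch.isupper() else 0)
    key = [tuple(primary)]
    if accent_sensitive:
        key.append(tuple(secondary))
    if case_sensitive:
        key.append(tuple(tertiary))
    return tuple(key)


def per_character_key(text):
    """Contrast: every sensitivity is decided one character at a time."""
    key = []
    for ch in decompose(text):
        if unicodedata.combining(ch):
            if key:
                base, marks, case = key[-1]
                key[-1] = (base, marks + (ord(ch),), case)
            continue
        key.append((ch.lower(), (), 1 if ch.isupper() else 0))
    return tuple(key)


# Precomposed U+00FC and "u" + U+0308 produce the same key.
print(sort_key("\u00fc") == sort_key("u\u0308"))  # True

words = ["eb", "\u00e9a"]
print(sorted(words, key=sort_key))           # ['éa', 'eb']
print(sorted(words, key=per_character_key))  # ['eb', 'éa']
```

With the levels grouped, the base letters "ea" versus "eb" decide the order before the accent is ever consulted, which is why the two orderings differ once accents and multi-character strings are involved.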
For more details on this sorting, I will eventually publish a post that shows the sort key values, how they are calculated, the differences between SQL Server and Windows collations, etc. But for now, please see my answer to: Accent Sensitive Sort (please note that the other answer to that question is a good explanation of the official Unicode algorithm, but SQL Server instead uses a custom, though similar, algorithm, and even a custom weight table).
All sensitivities can be adjusted in these collations: "case", "accent", "width", "kana type", and "variation selector" (starting in SQL Server 2017, and only for the Japanese collations). Also, some of these collations (when used with Unicode data) support Supplementary Characters (starting in SQL Server 2012). This approach applies to both NVARCHAR and VARCHAR data (even non-Unicode data). It applies to non-Unicode VARCHAR data by first converting the value to Unicode internally, and then applying the sort/comparison rules.
Please note:
- There is no universal default collation for SQL Server. There is an installation default which differs based on the current locale/language setting of the OS at time of installation (which is unfortunately SQL_Latin1_General_CP1_CI_AS for US English systems, so please vote for this suggestion). This can be changed during installation. This instance-level collation then sets the collation for the [model] DB which is the template used when creating new DBs, but the collation can be changed when executing CREATE DATABASE by specifying the COLLATE clause. This database-level collation is used for variable and string literals, as well as the default for new (and altered!) columns when the COLLATE clause is not specified (which is the case for the example code in the question).
- For more info on Collations / encodings / Unicode, please visit: Collations Info
Typically this is implemented using collation tables that assign a certain score to each character. The sorting routine has a comparator that uses an appropriate table, whether default or specified explicitly, to compare strings, character by character, using their collation scores. If, for example, a particular collation table assigns a score of 1 to "a" and 201 to "A", and a lower score in this particular implementation means higher precedence, then "a" will be sorted before "A". Another table might assign reverse scores: 201 to "a" and 1 to "A", and the sort order will consequently be reversed. Yet another table might assign equal scores to "a", "A", "Á", and "Å", which would lead to a case- and accent-insensitive comparison and sorting.
Similarly, such a collation table-based comparator is used when comparing an index key with the value supplied in the predicate.
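A minimal Python sketch of such a table-driven comparator, using the scores from the paragraph above. The table names and the `compare` function are made up for illustration, and the tables only cover the handful of characters mentioned here (any other character would raise a `KeyError`).

```python
from functools import cmp_to_key

# Scores taken from the examples in the text; lower score sorts first.
SCORES_LOWER_FIRST = {"a": 1, "A": 201}
SCORES_UPPER_FIRST = {"a": 201, "A": 1}
SCORES_INSENSITIVE = {"a": 1, "A": 1, "Á": 1, "Å": 1}


def compare(left, right, table):
    """Return -1, 0 or 1 by comparing collation scores character by character."""
    for l, r in zip(left, right):
        if table[l] != table[r]:
            return -1 if table[l] < table[r] else 1
    # All shared positions are equal: the shorter string sorts first.
    return (len(left) > len(right)) - (len(left) < len(right))


def sort_with(table, strings):
    return sorted(strings, key=cmp_to_key(lambda x, y: compare(x, y, table)))


print(sort_with(SCORES_LOWER_FIRST, ["A", "a"]))  # ['a', 'A']
print(sort_with(SCORES_UPPER_FIRST, ["a", "A"]))  # ['A', 'a']
print(compare("aA", "Åa", SCORES_INSENSITIVE))    # 0
```

The same `compare` call is what an index lookup would use to test a stored key against the value in the predicate.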
Just FYI: this info is only correct in terms of using SQL Server collations (i.e. those with names starting with SQL_) when used on VARCHAR data. This is not exactly true for NVARCHAR data or VARCHAR data when using a Windows collation (names not starting with SQL_).
– Solomon Rutzky
Sep 26 at 19:58
add a comment
|
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "182"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/4.0/"u003ecc by-sa 4.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f249715%2fhow-does-case-insensitive-collation-work%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
indexing against case insensitive strings yet the case of the data is persisted. How does this actually work?
This is actually not a SQL Server specific behavior, it's just how these things work in general.
So, the data is the data. If you are speaking about an index specifically, the data needs to be stored as it is else it would require a look-up in the main table each time to get the actual value, and there would be no possibility of a covering index (at least not for string types).
The data, either in the table/clustered index or non-clustered index, does not contain any collation / sorting info. It is simply data. The collation (locale/culture rules and sensitivities) is just meta data attached to the column and used when a sort operation is called (unless overridden by a COLLATE clause), which would include the creation/rebuild of an index. The rules defined by a non-binary collation are used to generate sort-keys, which are binary representations of the string (sort keys are unnecessary in binary collations). These binary representations incorporate all of the locale/culture rules and selected sensitivities. The sort-keys are used to place the records in their proper order, but are not themselves stored in the index or table. They aren't stored (at least I haven't seen these values in the index and was told that they aren't stored) because:
- They aren't truly needed for sorting since they would merely be in the same order as the rows in the table or index anyway. But, the physical order of the index is just sorting, not comparison.
- While storing them might make comparisons faster, it would also make the index larger as the minimum size for a single character is 5 bytes, and that's just "overhead" (of the sort key structure). Most characters are 2 bytes each, plus 1 byte if there's an accent, plus 1 byte if it's upper-case. For example, "e" is a 7-byte key, "E" and "é" are both 8 bytes, and "É" is a 9-byte key. Hence, not worth storing these in the end.
There are two types of collations: SQL Server and Windows.
SQL Server
SQL Server collations (those with names starting with SQL_) are the older, pre-SQL Server 2000 way of sorting/comparing (even though SQL_Latin1_General_CP1_CI_AS is still the installation default on US English OSes, quite sadly). In this older, simplistic, non-Unicode model, each combination of locale, code page, and the various sensitivities are given a static mapping of each of the characters in that code page. Each character is assigned a value (i.e. sort weight) to denote how it equates with the others. Comparisons in this model appear to do a two-pass operation:
- First, it removes all accents (such that " ü " becomes " u "), expands characters like " Æ " into " A " and " E ", then does an initial sort so that words are in a natural order (how you would expect to find them in a dictionary).
- Then, it goes character by character to determine equality based on these underlying values per each character. This second part is what mustaccio is describing in his answer.
The only sensitivities that can be adjusted in these collations are: "case" and "accent" ("width", "kana type" and "variation selector" are not available). Also, none of these collations support Supplementary Characters (which makes sense as those are Unicode-specific and these collations only apply to non-Unicode data).
This approach applies only to non-Unicode VARCHAR data. Each unique combination of locale, code page, case-sensitivity, and accent-sensitivity has a specific "sort ID", which you can see in the following example:
SELECT COLLATIONPROPERTY(N'SQL_Latin1_General_CP1_CI_AS', 'SortID'), -- 52
COLLATIONPROPERTY(N'SQL_Latin1_General_CP1_CS_AS', 'SortID'), -- 51
COLLATIONPROPERTY(N'Latin1_General_100_CI_AS', 'SortID'); -- 0
The only difference between the first two collations is the case-sensitivity. The third collation is a Windows collation and so does not have a static mapping table.
Also, these collations should sort and compare faster than the Windows collations due to being simple lookups for character to sort weight. However, these collations are also far less functional and should generally be avoided if at all possible.
Windows
Windows collations (those with names not starting with SQL_) are the newer (starting in SQL Server 2000) way of sorting/comparing. In this newer, complex, Unicode model, each combination of locale, code page, and the various sensitivities are not given a static mapping. For one thing, there are no code pages in this model. This model assigns a default sort value to each character, and then each locale/culture can re-assign sort values to any number of characters. This allows multiple cultures to use the same characters in different ways. This does have the affect of allowing for multiple languages to be sorted naturally using the same collation if they do not use the same characters (and if one of them does not need to re-assign any values and can simply use the defaults).
The sort values in this model are not single values. They are an array of values that assign relative weights to the base letter, any diacritics (i.e. accents), casing, etc. If the collation is case-sensitive, then the "case" portion of that array is used, otherwise it's ignored (hence, insensitive). If the collation is accent-sensitive, then the "diacritic" portion of the array is used, otherwise it's ignored (hence, insensitive).
Comparisons in this model are a multi-pass operation:
- First, the string is normalized so that various ways of representing the same character will equate. For example, " ü " could be a single character / code point (U+00FC). You could also combine a non-accented " u " (U+0075) with a Combining Diaeresis " ̈ " (U+0308) to get: " ü ", which not only looks the same when rendered (unless there is a problem with your font), but is also considered to be the same as the single character version (U+00FC), unless using a binary collation (which compares bytes instead of characters). Normalization breaks the single character into the various pieces, which includes expansions for characters like " Æ " (as noted above for SQL Server collations).
- The comparison operation in this model goes character by character per each sensitivity. Sort keys for the strings are determined by applying the appropriate elements of each characters collation array of values based on which sensitivities are "sensitive". The sort key values are arranged by all of the primary sensitivities of each character (the base character), followed by all of the secondary sensitivities (diacritic weight), followed by the case weight of each character, and so on.
- Sorting is performed based on the calculated sort keys. With each sensitivity grouped together, you can get a different sort order than you would with an equivalent SQL Server collation when comparing strings of multiple characters, and accents are involved, and the collation is accent-sensitive (and even more so if the collation is also case-sensitive).
For more details on this sorting, I will eventually publish a post that shows the sort key values, how they are calculated, the differences between SQL Server and Windows collations, etc. But for now, please see my answer to: Accent Sensitive Sort (please note that the other answer to that question is a good explanation of the official Unicode algorithm, but SQL Server instead uses a custom, though similar, algorithm, and even a custom weight table).
All sensitivities can be adjusted in these collations: "case", "accent", "width", "kana type", and "variation selector" (starting in SQL Server 2017, and only for the Japanese collations). Also, some of these collations (when used with Unicode data) support Supplementary Characters (starting in SQL Server 2012). This approach applies to both NVARCHAR and VARCHAR data (even non-Unicode data). It applies to non-Unicode VARCHAR data by first converting the value to Unicode internally, and then applying the sort/comparison rules.
Please note:
- There is no universal default collation for SQL Server. There is an installation default which differs based on the current locale/language setting of the OS at time of installation (which is unfortunately
SQL_Latin1_General_CP1_CI_ASfor US English systems, so please vote for this suggestion). This can be changed during installation. This instance-level collation then sets the collation for the[model]DB which is the template used when creating new DBs, but the collation can be changed when executingCREATE DATABASEby specifying theCOLLATEclause. This database-level collation is used for variable and string literals, as well as the default for new (and altered!) columns when theCOLLATEclause is not specified (which is the case for the example code in the question). - For more info on Collations / encodings / Unicode, please visit: Collations Info
add a comment
|
indexing against case insensitive strings yet the case of the data is persisted. How does this actually work?
This is actually not a SQL Server specific behavior, it's just how these things work in general.
So, the data is the data. If you are speaking about an index specifically, the data needs to be stored as it is else it would require a look-up in the main table each time to get the actual value, and there would be no possibility of a covering index (at least not for string types).
The data, either in the table/clustered index or non-clustered index, does not contain any collation / sorting info. It is simply data. The collation (locale/culture rules and sensitivities) is just meta data attached to the column and used when a sort operation is called (unless overridden by a COLLATE clause), which would include the creation/rebuild of an index. The rules defined by a non-binary collation are used to generate sort-keys, which are binary representations of the string (sort keys are unnecessary in binary collations). These binary representations incorporate all of the locale/culture rules and selected sensitivities. The sort-keys are used to place the records in their proper order, but are not themselves stored in the index or table. They aren't stored (at least I haven't seen these values in the index and was told that they aren't stored) because:
- They aren't truly needed for sorting since they would merely be in the same order as the rows in the table or index anyway. But, the physical order of the index is just sorting, not comparison.
- While storing them might make comparisons faster, it would also make the index larger as the minimum size for a single character is 5 bytes, and that's just "overhead" (of the sort key structure). Most characters are 2 bytes each, plus 1 byte if there's an accent, plus 1 byte if it's upper-case. For example, "e" is a 7-byte key, "E" and "é" are both 8 bytes, and "É" is a 9-byte key. Hence, not worth storing these in the end.
There are two types of collations: SQL Server and Windows.
SQL Server
SQL Server collations (those with names starting with SQL_) are the older, pre-SQL Server 2000 way of sorting/comparing (even though SQL_Latin1_General_CP1_CI_AS is still the installation default on US English OSes, quite sadly). In this older, simplistic, non-Unicode model, each combination of locale, code page, and the various sensitivities is given a static mapping of each of the characters in that code page. Each character is assigned a value (i.e. sort weight) to denote how it equates with the others. Comparisons in this model appear to do a two-pass operation:
- First, it removes all accents (such that "ü" becomes "u"), expands characters like "Æ" into "A" and "E", then does an initial sort so that words are in a natural order (how you would expect to find them in a dictionary).
- Then, it goes character by character to determine equality based on these underlying values per each character. This second part is what mustaccio is describing in his answer.
The only sensitivities that can be adjusted in these collations are: "case" and "accent" ("width", "kana type" and "variation selector" are not available). Also, none of these collations support Supplementary Characters (which makes sense as those are Unicode-specific and these collations only apply to non-Unicode data).
This approach applies only to non-Unicode VARCHAR data. Each unique combination of locale, code page, case-sensitivity, and accent-sensitivity has a specific "sort ID", which you can see in the following example:
SELECT COLLATIONPROPERTY(N'SQL_Latin1_General_CP1_CI_AS', 'SortID'), -- 52
       COLLATIONPROPERTY(N'SQL_Latin1_General_CP1_CS_AS', 'SortID'), -- 51
       COLLATIONPROPERTY(N'Latin1_General_100_CI_AS', 'SortID'); -- 0
The only difference between the first two collations is the case-sensitivity. The third collation is a Windows collation and so does not have a static mapping table.
Also, these collations should sort and compare faster than the Windows collations due to being simple lookups for character to sort weight. However, these collations are also far less functional and should generally be avoided if at all possible.
Windows
Windows collations (those with names not starting with SQL_) are the newer (starting in SQL Server 2000) way of sorting/comparing. In this newer, complex, Unicode model, each combination of locale, code page, and the various sensitivities is not given a static mapping. For one thing, there are no code pages in this model. This model assigns a default sort value to each character, and then each locale/culture can re-assign sort values to any number of characters. This allows multiple cultures to use the same characters in different ways. This does have the effect of allowing for multiple languages to be sorted naturally using the same collation if they do not use the same characters (and if one of them does not need to re-assign any values and can simply use the defaults).
The sort values in this model are not single values. They are an array of values that assign relative weights to the base letter, any diacritics (i.e. accents), casing, etc. If the collation is case-sensitive, then the "case" portion of that array is used, otherwise it's ignored (hence, insensitive). If the collation is accent-sensitive, then the "diacritic" portion of the array is used, otherwise it's ignored (hence, insensitive).
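As a rough illustration of "an array of values per character, with some portions ignored", here is a toy Python model. The `Weights` type, the `weights_for` and `equal` helpers, and the use of Unicode decomposition to derive the weights are all assumptions made for this sketch; SQL Server uses its own weight tables, which are not reproduced here.

```python
import unicodedata
from typing import NamedTuple


class Weights(NamedTuple):
    base: str       # the base letter
    diacritic: str  # any combining marks (accents)
    case: int       # 0 = not upper-case, 1 = upper-case


def weights_for(ch: str) -> Weights:
    """Split one character into toy base / diacritic / case weights."""
    decomposed = unicodedata.normalize("NFD", ch)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    marks = "".join(c for c in decomposed if unicodedata.combining(c))
    return Weights(base.lower(), marks, 1 if ch.isupper() else 0)


def equal(a: str, b: str, *, case_sensitive: bool, accent_sensitive: bool) -> bool:
    """Compare two strings, ignoring the portions the collation is insensitive to."""
    def project(s: str):
        projected = []
        for ch in unicodedata.normalize("NFC", s):
            w = weights_for(ch)
            projected.append((
                w.base,
                w.diacritic if accent_sensitive else "",  # ignored when insensitive
                w.case if case_sensitive else 0,          # ignored when insensitive
            ))
        return projected
    return project(a) == project(b)


resume_plain = "Resume"
resume_accented = "r\u00e9sum\u00e9"  # "résumé"

print(equal(resume_plain, resume_accented, case_sensitive=False, accent_sensitive=False))  # True
print(equal(resume_plain, resume_accented, case_sensitive=False, accent_sensitive=True))   # False
print(equal("a", "A", case_sensitive=True, accent_sensitive=True))                         # False
```

The stored strings never change in this model. Only the projection used for the comparison changes, which is the point made at the top of this answer.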
Comparisons in this model are a multi-pass operation:
- First, the string is normalized so that various ways of representing the same character will equate. For example, "ü" could be a single character / code point (U+00FC). You could also combine a non-accented "u" (U+0075) with a Combining Diaeresis " ̈ " (U+0308) to get "ü", which not only looks the same when rendered (unless there is a problem with your font), but is also considered to be the same as the single character version (U+00FC), unless using a binary collation (which compares bytes instead of characters). Normalization breaks the single character into the various pieces, which includes expansions for characters like "Æ" (as noted above for SQL Server collations).
- The comparison operation in this model goes character by character for each sensitivity. Sort keys for the strings are determined by applying the appropriate elements of each character's collation array of values based on which sensitivities are "sensitive". The sort key values are arranged by all of the primary sensitivities of each character (the base character), followed by all of the secondary sensitivities (diacritic weight), followed by the case weight of each character, and so on.
- Sorting is performed based on the calculated sort keys. With each sensitivity grouped together, you can get a different sort order than you would with an equivalent SQL Server collation when comparing strings of multiple characters where accents are involved and the collation is accent-sensitive (and even more so if the collation is also case-sensitive).
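The steps above can be sketched in Python. This is a toy model under stated assumptions: standard Unicode normalization (`unicodedata.normalize`) stands in for SQL Server's own normalization, the weights come from simple decomposition rather than a real weight table, and `per_character_key` is just a naive character-by-character key used for contrast, not a reproduction of any SQL Server collation. The helper names are hypothetical.

```python
import unicodedata


def char_weights(ch: str):
    """Toy (base, diacritic, case) weights for a single character."""
    d = unicodedata.normalize("NFD", ch)
    base = "".join(c for c in d if not unicodedata.combining(c)).lower()
    marks = "".join(c for c in d if unicodedata.combining(c))
    return base, marks, 1 if ch.isupper() else 0


def level_grouped_key(s: str):
    """All base weights first, then all diacritic weights, then all case weights."""
    w = [char_weights(c) for c in unicodedata.normalize("NFC", s)]
    return ([x[0] for x in w], [x[1] for x in w], [x[2] for x in w])


def per_character_key(s: str):
    """Naive alternative: each character's full weights, one character at a time."""
    return [char_weights(c) for c in unicodedata.normalize("NFC", s)]


composed = "\u00fc"     # "ü" as one code point
decomposed = "u\u0308"  # "u" followed by Combining Diaeresis

# Different code points (so a binary comparison sees them as different) ...
print(composed == decomposed)                                        # False
# ... but the same key once normalized.
print(level_grouped_key(composed) == level_grouped_key(decomposed))  # True

words = ["eb", "\u00e9a"]  # "eb" and "éa"
print(sorted(words, key=level_grouped_key))   # ['éa', 'eb']
print(sorted(words, key=per_character_key))   # ['eb', 'éa']
```

With the levels grouped, the accent on the first letter only matters if the base letters of the whole string tie, so "éa" sorts before "eb". The naive per-character key lets the accent on the first letter decide before the second letter is ever looked at.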
For more details on this sorting, I will eventually publish a post that shows the sort key values, how they are calculated, the differences between SQL Server and Windows collations, etc. But for now, please see my answer to: Accent Sensitive Sort (please note that the other answer to that question is a good explanation of the official Unicode algorithm, but SQL Server instead uses a custom, though similar, algorithm, and even a custom weight table).
All sensitivities can be adjusted in these collations: "case", "accent", "width", "kana type", and "variation selector" (starting in SQL Server 2017, and only for the Japanese collations). Also, some of these collations (when used with Unicode data) support Supplementary Characters (starting in SQL Server 2012). This approach applies to both NVARCHAR and VARCHAR data (even non-Unicode data). It applies to non-Unicode VARCHAR data by first converting the value to Unicode internally, and then applying the sort/comparison rules.
Please note:
- There is no universal default collation for SQL Server. There is an installation default which differs based on the current locale/language setting of the OS at time of installation (which is unfortunately SQL_Latin1_General_CP1_CI_AS for US English systems, so please vote for this suggestion). This can be changed during installation. This instance-level collation then sets the collation for the [model] DB which is the template used when creating new DBs, but the collation can be changed when executing CREATE DATABASE by specifying the COLLATE clause. This database-level collation is used for variable and string literals, as well as the default for new (and altered!) columns when the COLLATE clause is not specified (which is the case for the example code in the question).
- For more info on Collations / encodings / Unicode, please visit: Collations Info
edited Oct 2 at 16:41
answered Sep 26 at 20:18
Solomon Rutzky
Typically this is implemented using collation tables that assign a certain score to each character. The sorting routine has a comparator that uses an appropriate table, whether default or specified explicitly, to compare strings, character by character, using their collation scores. If, for example, a particular collation table assigns a score of 1 to "a" and 201 to "A", and a lower score in this particular implementation means higher precedence, then "a" will be sorted before "A". Another table might assign reverse scores: 201 to "a" and 1 to "A", and the sort order will consequently be reversed. Yet another table might assign equal scores to "a", "A", "Á", and "Å", which would lead to a case- and accent-insensitive comparison and sorting.
Similarly, such a collation table-based comparator is used when comparing an index key with the value supplied in the predicate.
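A minimal Python sketch of such a table-driven comparator follows. The scores 1 and 201 are the ones used in the example above; the table names, `make_comparator`, and the rule that a shorter string sorts first on a tie are assumptions for illustration, not any particular DBMS's implementation.

```python
from functools import cmp_to_key

# Three hypothetical collation tables (lower score = higher precedence).
TABLE_LOWER_FIRST = {"a": 1, "A": 201}
TABLE_UPPER_FIRST = {"a": 201, "A": 1}
TABLE_INSENSITIVE = {"a": 1, "A": 1, "\u00c1": 1, "\u00c5": 1}  # a, A, Á, Å


def make_comparator(table):
    """Build a comparator that walks two strings character by character."""
    def compare(left: str, right: str) -> int:
        for l, r in zip(left, right):
            if table[l] != table[r]:
                return -1 if table[l] < table[r] else 1
        # All shared positions tie: the shorter string sorts first.
        return (len(left) > len(right)) - (len(left) < len(right))
    return compare


print(sorted(["A", "a"], key=cmp_to_key(make_comparator(TABLE_LOWER_FIRST))))  # ['a', 'A']
print(sorted(["a", "A"], key=cmp_to_key(make_comparator(TABLE_UPPER_FIRST))))  # ['A', 'a']

# With equal scores the comparator reports equality, so a lookup for "a"
# would also match a stored "Å" while the stored value itself is unchanged.
print(make_comparator(TABLE_INSENSITIVE)("a", "\u00c5"))  # 0
```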
Just FYI: this info is only correct in terms of using SQL Server collations (i.e. those with names starting with SQL_) when used on VARCHAR data. This is not exactly true for NVARCHAR data or VARCHAR data when using a Windows collation (names not starting with SQL_).
– Solomon Rutzky
Sep 26 at 19:58
edited Sep 26 at 19:28
answered Sep 26 at 19:19
mustaccio
You can efficiently query (e.g. utilizing an index seek) case-insensitive strings against a case-sensitive field, but it's a little annoying.
– John Eisbrener
Sep 26 at 20:47
cocogorilla: please see note #1 that I just added to the end of my answer re: the "default" collation.
– Solomon Rutzky
Oct 2 at 16:42