SQL Server has encountered occurences of I/O requests taking longer than 15 secondsSlow checkpoint and 15 second I/O warnings on flash storageHelp on building a new Serverinnodb_flush_method=O_DIRECT vs O_DSYNC performance impact on ext3 with LVM disk partitionLog all than queries taking more than X secondsWhat frequency of hash/sort spills into tempdb is concerning?Lower performance on newest serversSQL Server Data File/Log File poor performanceSQL Server query on server taking longer than laptopIF EXISTS taking longer than embedded select statementBest Way To Migrate Existing Production Database To New Physical Storage StructureSlow checkpoint and 15 second I/O warnings on flash storage
Ubuntu 19.10: why is the "locate" command missing?
Why is Macron seemingly opposing an article 50 extension?
If I attempt to use the Teleport spell to teleport into an area warded by the Forbiddance spell, is my spell slot expended?
Etymology of Persian سبز (sabz), meaning "green"
Curious about definition of well-ordered set
what would allow for the use of cannons but not handheld guns
How can women avoid talking during dates while keeping men at ease?
Can I reuse old electrical wire?
Why does it seem everything I push moves at a constant velocity?
Regular expression grep -r 'emm*[a-f].[^ta]$'
Encrypt folder on Linux that can only be decrypted on that specific OS and device
Is it a Hamiltonian cycle on a grid?
Word or phrase for turning the tide against a rival in a competition in the last moments
Logic - How to say "Not only but also".
Why are there two King of Diamonds entries for the Curse of Strahd tarokka card readings?
What is the history of the Eldritch Knight as a class/character option?
Can I include an image covered by CC Attribution-ShareAlike 4.0 in a deck for a paid online course?
Circuit breaker door shocked me. Static?
Simple n-body class in C++
Do you make me up?
What are the correct pronouns for referring to someone whom I have never met in person when the gender is apparent?
How to call my own phone with a loud ringing tone (in order to find it in my house) even if it is in silent mode?
Ultra-Relativistic and Non-Relativistic cases for energy of a particle
Authentication versus Authorisation
SQL Server has encountered occurences of I/O requests taking longer than 15 seconds
Slow checkpoint and 15 second I/O warnings on flash storageHelp on building a new Serverinnodb_flush_method=O_DIRECT vs O_DSYNC performance impact on ext3 with LVM disk partitionLog all than queries taking more than X secondsWhat frequency of hash/sort spills into tempdb is concerning?Lower performance on newest serversSQL Server Data File/Log File poor performanceSQL Server query on server taking longer than laptopIF EXISTS taking longer than embedded select statementBest Way To Migrate Existing Production Database To New Physical Storage StructureSlow checkpoint and 15 second I/O warnings on flash storage
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty
margin-bottom:0;
On Production SQL Server, we have following config:
3 Dell PowerEdge R630 servers, combined into Availability Group
All 3 are connected to single Dell SAN storage unit which is a RAID array
From time to time, on PRIMARY we are seeing messages similar to below:
SQL Server has encountered 11 occurrence(s) of I/O requests taking longer
than 15 seconds to complete on file [F:DataMyDatabase.mdf] in database id 8.
The OS file handle is 0x0000000000001FBC.
The offset of the latest long I/O is: 0x000004295d0000.
The duration of the long I/O is: 37397 ms.
We are novice in performance troubleshooting
What are the most common ways or best practices in troubleshooting this particular issue related to storage ?
What performance counters, tools, monitors, apps, etc. must be used to narrow down to the root cause of such messages ?
Might be there is a Extended Events that can help, or some kind of audit / logging ?
performance performance-tuning sql-server-2017 storage
add a comment
|
On Production SQL Server, we have following config:
3 Dell PowerEdge R630 servers, combined into Availability Group
All 3 are connected to single Dell SAN storage unit which is a RAID array
From time to time, on PRIMARY we are seeing messages similar to below:
SQL Server has encountered 11 occurrence(s) of I/O requests taking longer
than 15 seconds to complete on file [F:DataMyDatabase.mdf] in database id 8.
The OS file handle is 0x0000000000001FBC.
The offset of the latest long I/O is: 0x000004295d0000.
The duration of the long I/O is: 37397 ms.
We are novice in performance troubleshooting
What are the most common ways or best practices in troubleshooting this particular issue related to storage ?
What performance counters, tools, monitors, apps, etc. must be used to narrow down to the root cause of such messages ?
Might be there is a Extended Events that can help, or some kind of audit / logging ?
performance performance-tuning sql-server-2017 storage
5
Related: Slow checkpoint and 15 second I/O warnings on flash storage
– Sean Gallardy
Jun 13 at 13:24
Is SQL Server running in a VM on those physical machines? If so, you need to ensure the hypervisor is setup correctly, and each VM is configured properly. For VMware, check vmware.com/content/dam/digitalmarketing/vmware/en/pdf/solutions/…
– Max Vernon
Jun 14 at 19:14
add a comment
|
On Production SQL Server, we have following config:
3 Dell PowerEdge R630 servers, combined into Availability Group
All 3 are connected to single Dell SAN storage unit which is a RAID array
From time to time, on PRIMARY we are seeing messages similar to below:
SQL Server has encountered 11 occurrence(s) of I/O requests taking longer
than 15 seconds to complete on file [F:DataMyDatabase.mdf] in database id 8.
The OS file handle is 0x0000000000001FBC.
The offset of the latest long I/O is: 0x000004295d0000.
The duration of the long I/O is: 37397 ms.
We are novice in performance troubleshooting
What are the most common ways or best practices in troubleshooting this particular issue related to storage ?
What performance counters, tools, monitors, apps, etc. must be used to narrow down to the root cause of such messages ?
Might be there is a Extended Events that can help, or some kind of audit / logging ?
performance performance-tuning sql-server-2017 storage
On Production SQL Server, we have following config:
3 Dell PowerEdge R630 servers, combined into Availability Group
All 3 are connected to single Dell SAN storage unit which is a RAID array
From time to time, on PRIMARY we are seeing messages similar to below:
SQL Server has encountered 11 occurrence(s) of I/O requests taking longer
than 15 seconds to complete on file [F:DataMyDatabase.mdf] in database id 8.
The OS file handle is 0x0000000000001FBC.
The offset of the latest long I/O is: 0x000004295d0000.
The duration of the long I/O is: 37397 ms.
We are novice in performance troubleshooting
What are the most common ways or best practices in troubleshooting this particular issue related to storage ?
What performance counters, tools, monitors, apps, etc. must be used to narrow down to the root cause of such messages ?
Might be there is a Extended Events that can help, or some kind of audit / logging ?
performance performance-tuning sql-server-2017 storage
performance performance-tuning sql-server-2017 storage
asked Jun 13 at 9:47
voodoo_shvoodoo_sh
4113 silver badges19 bronze badges
4113 silver badges19 bronze badges
5
Related: Slow checkpoint and 15 second I/O warnings on flash storage
– Sean Gallardy
Jun 13 at 13:24
Is SQL Server running in a VM on those physical machines? If so, you need to ensure the hypervisor is setup correctly, and each VM is configured properly. For VMware, check vmware.com/content/dam/digitalmarketing/vmware/en/pdf/solutions/…
– Max Vernon
Jun 14 at 19:14
add a comment
|
5
Related: Slow checkpoint and 15 second I/O warnings on flash storage
– Sean Gallardy
Jun 13 at 13:24
Is SQL Server running in a VM on those physical machines? If so, you need to ensure the hypervisor is setup correctly, and each VM is configured properly. For VMware, check vmware.com/content/dam/digitalmarketing/vmware/en/pdf/solutions/…
– Max Vernon
Jun 14 at 19:14
5
5
Related: Slow checkpoint and 15 second I/O warnings on flash storage
– Sean Gallardy
Jun 13 at 13:24
Related: Slow checkpoint and 15 second I/O warnings on flash storage
– Sean Gallardy
Jun 13 at 13:24
Is SQL Server running in a VM on those physical machines? If so, you need to ensure the hypervisor is setup correctly, and each VM is configured properly. For VMware, check vmware.com/content/dam/digitalmarketing/vmware/en/pdf/solutions/…
– Max Vernon
Jun 14 at 19:14
Is SQL Server running in a VM on those physical machines? If so, you need to ensure the hypervisor is setup correctly, and each VM is configured properly. For VMware, check vmware.com/content/dam/digitalmarketing/vmware/en/pdf/solutions/…
– Max Vernon
Jun 14 at 19:14
add a comment
|
3 Answers
3
active
oldest
votes
We have a similar setup and recently encountered these messages in the logs. We are using a DELL Compellent SAN. Here are some things to check when receiving these messages that helped us find a solution
- Review your windows performance counters for your disks that the warning messages are pointing to, specifically:
- Disk avg. read time
- Disk avg. write time
- Disk read bytes/sec
- Disk write bytes/sec
- Disk Transfers/sec
- Avg. disk queue length
- The above are averages. If you have many database files on one drive these averages can skew the result and mask a bottle neck on specific database files. Check out this query from Paul S. Randal which returns average latency for each file from the dmv
sys.dm_io_virtual_file_stats
. In our case the average latency reported was acceptable, but underneath the covers we had many files with > 200 ms average latency. - Check the timings. Is there any pattern? Does it happen more frequently at certain a time in the night? If so check if any maintenance jobs are running at that time or any scheduled activity which may increase disk activity and expose a bottle neck in your IO subsystem.
- Check the windows event viewer for errors. If your switch or SAN is being overloaded or not setup properly for your application you may find some messages in this log, and it is good to take this information to your SAN admin. In our case we were receiving iSCSI connection errors often throughout the day, hinting at the problem.
- Review your SQL Server code. When you receive these messages you shouldn't immediately think it is an IO subsystem issue and pass it to your SAN admin. You need to do your part and review the database. Do you have really bad queries being run often churning through tons of data? Bad indexing? Excessive transaction log writes? You can use some open source queries to get a health check on your database, an example for checking how your query plan looks is sp_blitzCache
- Don't ignore these. Today you may be receiving them a few times a day... then several months later when your workload increases and you forgot to monitor them they start to increase. Receiving lots of these messages can prevent SQL Server from accessing a certain file, and if it is tempdb, that is not good. In our case it got so bad that SQL Server shut itself down.
Our solution was upgrading our switch to a SAN switch. Yes, these are all points to cover within SQL Server. What led us to finding out it was the switch was that we were receiving about 1500 iSCSI pdu disconnect errors in the Windows application event viewer on the SQL Server every day. That prompted the investigation by our SAN admins into the switch.
Immediately after upgrading, the iSCSI errors were gone and average latency came down to around 50 ms for all the files, and that correlated to better performance in the application. With these points in mind hopefully you can find your solution.
1
So system events, not in SQL Server, led you to the resolution, correct? Can you offer any other encompassing troubleshooting help do narrow down if the issue is something internal to SQL Server, at the OS level, Filesystem level, or storage area networking level?
– Sean Gallardy
Jun 14 at 14:02
That is correct Sean. I may be able to add some more information as you suggest, I'll update my answer once I put that together.
– kevinnwhat
Jun 14 at 14:21
add a comment
|
This is far less often a disk issue, and far more often a networking issue. You know, the N in SAN?
If you go to your SAN team and start talking about the disks being slow, they're gonna show you a fancy graph with 0 millisecond latency on it and then point a stapler at you.
Instead, ask them about the network path to the SAN. Get speeds, if it's multipathed, etc. Get numbers from them about the speeds you should be seeing. Ask if they have benchmarks from when the servers were set up.
Then you can use Crystal Disk Mark or diskpd to validate those speeds. If they don't line up, again, it's most likely the networking.
You should also search your error log for messages that contain "FlushCache" and "saturation", because those can also be signs of network contention.
One thing you can do to avoid those things as a DBA is make sure that your maintenance and any other data-heavy tasks (like ETL) aren't going on at the same time. That can definitely put a lot of pressure on storage networking.
You may also want to check the answers here for more suggestions: Slow checkpoint and 15 second I/O warnings on flash storage
I blogged about a similar topic here: From The Server To The SAN
add a comment
|
Why storing the data on a SAN? What's the point? All database performance is tied to Disk I/O and you are using 3 servers with only one device for the I/O behind them. That makes no sense... and unfortunately so common.
I spend my life encountering poorly designed hardware platforms where people just try to design a large scale computer. All CPU power here, all disks there... hopefully there is not such a thing as remote RAM. And the saddest is they compensate the lack of efficiency of this design with huge servers that cost ten time more than they should. I saw $400k infra slower than a $1k laptop.
A SQL server software is a very advanced piece of software, it is designed to take advantage of any bits of hardware, CPU cores, CPU cache, TLB, RAM, disk controllers, hard drive cache... They almost include all filesystem logic. They are developed on regular computer and benchmarked on high end systems. Therfore a SQL server must have its own disks. Installing them on a SAN is like "emulating" a computer, you lose all performance optimisations. SANs are for storing backups, immutable files, and files you just append data to (logs).
Datacenter administrators tend to put all they can on SANs because this way they have only one pool of storage to manage, it's more easy than caring for storage on each server. It's a "I don't want to do my job" choice, and a very bad one, because then they have to deal with performance problems and all the company suffer from this. Just install software on the hardware it is designed for. Keep it simple. Care for I/O bandwidth, cache and context switch overhead, ressource jitter (happens when ressource is shared). You'll end up maintaining 1/10th of the devices for the same raw output power, save your ops team lot of headaches, gain performance that makes your end users happy and more productive, make your company a better place to work in, and save lot of energy (the planet will thank you).
You said in comments, you are considering to put SSD in your server. You will not recognize your setup with dedicated SSDs, compared to a SAN you'll get something like 500x improvement even with data and transaction log files on same drive. A state of the art SQL Server would have fast separate SSD for data and transaction log on different hardware controllers channels (most server motherboard have several). But compared to your current setup we are talking of sci-fi there. Just give SSD a try.
1
It makes me think again about idea of buying dedicated SSD drives for each replica (for data files, maybe also for log files), instead of all 3 using same SAN. I am gradually double checking all items other guys posted above, as well of course
– voodoo_sh
Jun 17 at 16:13
add a comment
|
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "182"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/4.0/"u003ecc by-sa 4.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f240469%2fsql-server-has-encountered-occurences-of-i-o-requests-taking-longer-than-15-seco%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
We have a similar setup and recently encountered these messages in the logs. We are using a DELL Compellent SAN. Here are some things to check when receiving these messages that helped us find a solution
- Review your windows performance counters for your disks that the warning messages are pointing to, specifically:
- Disk avg. read time
- Disk avg. write time
- Disk read bytes/sec
- Disk write bytes/sec
- Disk Transfers/sec
- Avg. disk queue length
- The above are averages. If you have many database files on one drive these averages can skew the result and mask a bottle neck on specific database files. Check out this query from Paul S. Randal which returns average latency for each file from the dmv
sys.dm_io_virtual_file_stats
. In our case the average latency reported was acceptable, but underneath the covers we had many files with > 200 ms average latency. - Check the timings. Is there any pattern? Does it happen more frequently at certain a time in the night? If so check if any maintenance jobs are running at that time or any scheduled activity which may increase disk activity and expose a bottle neck in your IO subsystem.
- Check the windows event viewer for errors. If your switch or SAN is being overloaded or not setup properly for your application you may find some messages in this log, and it is good to take this information to your SAN admin. In our case we were receiving iSCSI connection errors often throughout the day, hinting at the problem.
- Review your SQL Server code. When you receive these messages you shouldn't immediately think it is an IO subsystem issue and pass it to your SAN admin. You need to do your part and review the database. Do you have really bad queries being run often churning through tons of data? Bad indexing? Excessive transaction log writes? You can use some open source queries to get a health check on your database, an example for checking how your query plan looks is sp_blitzCache
- Don't ignore these. Today you may be receiving them a few times a day... then several months later when your workload increases and you forgot to monitor them they start to increase. Receiving lots of these messages can prevent SQL Server from accessing a certain file, and if it is tempdb, that is not good. In our case it got so bad that SQL Server shut itself down.
Our solution was upgrading our switch to a SAN switch. Yes, these are all points to cover within SQL Server. What led us to finding out it was the switch was that we were receiving about 1500 iSCSI pdu disconnect errors in the Windows application event viewer on the SQL Server every day. That prompted the investigation by our SAN admins into the switch.
Immediately after upgrading, the iSCSI errors were gone and average latency came down to around 50 ms for all the files, and that correlated to better performance in the application. With these points in mind hopefully you can find your solution.
1
So system events, not in SQL Server, led you to the resolution, correct? Can you offer any other encompassing troubleshooting help do narrow down if the issue is something internal to SQL Server, at the OS level, Filesystem level, or storage area networking level?
– Sean Gallardy
Jun 14 at 14:02
That is correct Sean. I may be able to add some more information as you suggest, I'll update my answer once I put that together.
– kevinnwhat
Jun 14 at 14:21
add a comment
|
We have a similar setup and recently encountered these messages in the logs. We are using a DELL Compellent SAN. Here are some things to check when receiving these messages that helped us find a solution
- Review your windows performance counters for your disks that the warning messages are pointing to, specifically:
- Disk avg. read time
- Disk avg. write time
- Disk read bytes/sec
- Disk write bytes/sec
- Disk Transfers/sec
- Avg. disk queue length
- The above are averages. If you have many database files on one drive these averages can skew the result and mask a bottle neck on specific database files. Check out this query from Paul S. Randal which returns average latency for each file from the dmv
sys.dm_io_virtual_file_stats
. In our case the average latency reported was acceptable, but underneath the covers we had many files with > 200 ms average latency. - Check the timings. Is there any pattern? Does it happen more frequently at certain a time in the night? If so check if any maintenance jobs are running at that time or any scheduled activity which may increase disk activity and expose a bottle neck in your IO subsystem.
- Check the windows event viewer for errors. If your switch or SAN is being overloaded or not setup properly for your application you may find some messages in this log, and it is good to take this information to your SAN admin. In our case we were receiving iSCSI connection errors often throughout the day, hinting at the problem.
- Review your SQL Server code. When you receive these messages you shouldn't immediately think it is an IO subsystem issue and pass it to your SAN admin. You need to do your part and review the database. Do you have really bad queries being run often churning through tons of data? Bad indexing? Excessive transaction log writes? You can use some open source queries to get a health check on your database, an example for checking how your query plan looks is sp_blitzCache
- Don't ignore these. Today you may be receiving them a few times a day... then several months later when your workload increases and you forgot to monitor them they start to increase. Receiving lots of these messages can prevent SQL Server from accessing a certain file, and if it is tempdb, that is not good. In our case it got so bad that SQL Server shut itself down.
Our solution was upgrading our switch to a SAN switch. Yes, these are all points to cover within SQL Server. What led us to finding out it was the switch was that we were receiving about 1500 iSCSI pdu disconnect errors in the Windows application event viewer on the SQL Server every day. That prompted the investigation by our SAN admins into the switch.
Immediately after upgrading, the iSCSI errors were gone and average latency came down to around 50 ms for all the files, and that correlated to better performance in the application. With these points in mind hopefully you can find your solution.
1
So system events, not in SQL Server, led you to the resolution, correct? Can you offer any other encompassing troubleshooting help do narrow down if the issue is something internal to SQL Server, at the OS level, Filesystem level, or storage area networking level?
– Sean Gallardy
Jun 14 at 14:02
That is correct Sean. I may be able to add some more information as you suggest, I'll update my answer once I put that together.
– kevinnwhat
Jun 14 at 14:21
add a comment
|
We have a similar setup and recently encountered these messages in the logs. We are using a DELL Compellent SAN. Here are some things to check when receiving these messages that helped us find a solution
- Review your windows performance counters for your disks that the warning messages are pointing to, specifically:
- Disk avg. read time
- Disk avg. write time
- Disk read bytes/sec
- Disk write bytes/sec
- Disk Transfers/sec
- Avg. disk queue length
- The above are averages. If you have many database files on one drive these averages can skew the result and mask a bottle neck on specific database files. Check out this query from Paul S. Randal which returns average latency for each file from the dmv
sys.dm_io_virtual_file_stats
. In our case the average latency reported was acceptable, but underneath the covers we had many files with > 200 ms average latency. - Check the timings. Is there any pattern? Does it happen more frequently at certain a time in the night? If so check if any maintenance jobs are running at that time or any scheduled activity which may increase disk activity and expose a bottle neck in your IO subsystem.
- Check the windows event viewer for errors. If your switch or SAN is being overloaded or not setup properly for your application you may find some messages in this log, and it is good to take this information to your SAN admin. In our case we were receiving iSCSI connection errors often throughout the day, hinting at the problem.
- Review your SQL Server code. When you receive these messages you shouldn't immediately think it is an IO subsystem issue and pass it to your SAN admin. You need to do your part and review the database. Do you have really bad queries being run often churning through tons of data? Bad indexing? Excessive transaction log writes? You can use some open source queries to get a health check on your database, an example for checking how your query plan looks is sp_blitzCache
- Don't ignore these. Today you may be receiving them a few times a day... then several months later when your workload increases and you forgot to monitor them they start to increase. Receiving lots of these messages can prevent SQL Server from accessing a certain file, and if it is tempdb, that is not good. In our case it got so bad that SQL Server shut itself down.
Our solution was upgrading our switch to a SAN switch. Yes, these are all points to cover within SQL Server. What led us to finding out it was the switch was that we were receiving about 1500 iSCSI pdu disconnect errors in the Windows application event viewer on the SQL Server every day. That prompted the investigation by our SAN admins into the switch.
Immediately after upgrading, the iSCSI errors were gone and average latency came down to around 50 ms for all the files, and that correlated to better performance in the application. With these points in mind hopefully you can find your solution.
We have a similar setup and recently encountered these messages in the logs. We are using a DELL Compellent SAN. Here are some things to check when receiving these messages that helped us find a solution
- Review your windows performance counters for your disks that the warning messages are pointing to, specifically:
- Disk avg. read time
- Disk avg. write time
- Disk read bytes/sec
- Disk write bytes/sec
- Disk Transfers/sec
- Avg. disk queue length
- The above are averages. If you have many database files on one drive these averages can skew the result and mask a bottle neck on specific database files. Check out this query from Paul S. Randal which returns average latency for each file from the dmv
sys.dm_io_virtual_file_stats
. In our case the average latency reported was acceptable, but underneath the covers we had many files with > 200 ms average latency. - Check the timings. Is there any pattern? Does it happen more frequently at certain a time in the night? If so check if any maintenance jobs are running at that time or any scheduled activity which may increase disk activity and expose a bottle neck in your IO subsystem.
- Check the windows event viewer for errors. If your switch or SAN is being overloaded or not setup properly for your application you may find some messages in this log, and it is good to take this information to your SAN admin. In our case we were receiving iSCSI connection errors often throughout the day, hinting at the problem.
- Review your SQL Server code. When you receive these messages you shouldn't immediately think it is an IO subsystem issue and pass it to your SAN admin. You need to do your part and review the database. Do you have really bad queries being run often churning through tons of data? Bad indexing? Excessive transaction log writes? You can use some open source queries to get a health check on your database, an example for checking how your query plan looks is sp_blitzCache
- Don't ignore these. Today you may be receiving them a few times a day... then several months later when your workload increases and you forgot to monitor them they start to increase. Receiving lots of these messages can prevent SQL Server from accessing a certain file, and if it is tempdb, that is not good. In our case it got so bad that SQL Server shut itself down.
Our solution was upgrading our switch to a SAN switch. Yes, these are all points to cover within SQL Server. What led us to finding out it was the switch was that we were receiving about 1500 iSCSI pdu disconnect errors in the Windows application event viewer on the SQL Server every day. That prompted the investigation by our SAN admins into the switch.
Immediately after upgrading, the iSCSI errors were gone and average latency came down to around 50 ms for all the files, and that correlated to better performance in the application. With these points in mind hopefully you can find your solution.
edited Jun 17 at 17:36
Paul White♦
60.2k16 gold badges311 silver badges490 bronze badges
60.2k16 gold badges311 silver badges490 bronze badges
answered Jun 14 at 1:50
kevinnwhatkevinnwhat
1,2791 gold badge1 silver badge14 bronze badges
1,2791 gold badge1 silver badge14 bronze badges
1
So system events, not in SQL Server, led you to the resolution, correct? Can you offer any other encompassing troubleshooting help do narrow down if the issue is something internal to SQL Server, at the OS level, Filesystem level, or storage area networking level?
– Sean Gallardy
Jun 14 at 14:02
That is correct Sean. I may be able to add some more information as you suggest, I'll update my answer once I put that together.
– kevinnwhat
Jun 14 at 14:21
add a comment
|
1
So system events, not in SQL Server, led you to the resolution, correct? Can you offer any other encompassing troubleshooting help do narrow down if the issue is something internal to SQL Server, at the OS level, Filesystem level, or storage area networking level?
– Sean Gallardy
Jun 14 at 14:02
That is correct Sean. I may be able to add some more information as you suggest, I'll update my answer once I put that together.
– kevinnwhat
Jun 14 at 14:21
1
1
So system events, not in SQL Server, led you to the resolution, correct? Can you offer any other encompassing troubleshooting help do narrow down if the issue is something internal to SQL Server, at the OS level, Filesystem level, or storage area networking level?
– Sean Gallardy
Jun 14 at 14:02
So system events, not in SQL Server, led you to the resolution, correct? Can you offer any other encompassing troubleshooting help do narrow down if the issue is something internal to SQL Server, at the OS level, Filesystem level, or storage area networking level?
– Sean Gallardy
Jun 14 at 14:02
That is correct Sean. I may be able to add some more information as you suggest, I'll update my answer once I put that together.
– kevinnwhat
Jun 14 at 14:21
That is correct Sean. I may be able to add some more information as you suggest, I'll update my answer once I put that together.
– kevinnwhat
Jun 14 at 14:21
add a comment
|
This is far less often a disk issue, and far more often a networking issue. You know, the N in SAN?
If you go to your SAN team and start talking about the disks being slow, they're gonna show you a fancy graph with 0 millisecond latency on it and then point a stapler at you.
Instead, ask them about the network path to the SAN. Get speeds, if it's multipathed, etc. Get numbers from them about the speeds you should be seeing. Ask if they have benchmarks from when the servers were set up.
Then you can use Crystal Disk Mark or diskpd to validate those speeds. If they don't line up, again, it's most likely the networking.
You should also search your error log for messages that contain "FlushCache" and "saturation", because those can also be signs of network contention.
One thing you can do to avoid those things as a DBA is make sure that your maintenance and any other data-heavy tasks (like ETL) aren't going on at the same time. That can definitely put a lot of pressure on storage networking.
You may also want to check the answers here for more suggestions: Slow checkpoint and 15 second I/O warnings on flash storage
I blogged about a similar topic here: From The Server To The SAN
add a comment
|
This is far less often a disk issue, and far more often a networking issue. You know, the N in SAN?
If you go to your SAN team and start talking about the disks being slow, they're gonna show you a fancy graph with 0 millisecond latency on it and then point a stapler at you.
Instead, ask them about the network path to the SAN. Get speeds, if it's multipathed, etc. Get numbers from them about the speeds you should be seeing. Ask if they have benchmarks from when the servers were set up.
Then you can use Crystal Disk Mark or diskpd to validate those speeds. If they don't line up, again, it's most likely the networking.
You should also search your error log for messages that contain "FlushCache" and "saturation", because those can also be signs of network contention.
One thing you can do to avoid those things as a DBA is make sure that your maintenance and any other data-heavy tasks (like ETL) aren't going on at the same time. That can definitely put a lot of pressure on storage networking.
You may also want to check the answers here for more suggestions: Slow checkpoint and 15 second I/O warnings on flash storage
I blogged about a similar topic here: From The Server To The SAN
add a comment
|
This is far less often a disk issue, and far more often a networking issue. You know, the N in SAN?
If you go to your SAN team and start talking about the disks being slow, they're gonna show you a fancy graph with 0 millisecond latency on it and then point a stapler at you.
Instead, ask them about the network path to the SAN. Get speeds, if it's multipathed, etc. Get numbers from them about the speeds you should be seeing. Ask if they have benchmarks from when the servers were set up.
Then you can use Crystal Disk Mark or diskpd to validate those speeds. If they don't line up, again, it's most likely the networking.
You should also search your error log for messages that contain "FlushCache" and "saturation", because those can also be signs of network contention.
One thing you can do to avoid those things as a DBA is make sure that your maintenance and any other data-heavy tasks (like ETL) aren't going on at the same time. That can definitely put a lot of pressure on storage networking.
You may also want to check the answers here for more suggestions: Slow checkpoint and 15 second I/O warnings on flash storage
I blogged about a similar topic here: From The Server To The SAN
This is far less often a disk issue, and far more often a networking issue. You know, the N in SAN?
If you go to your SAN team and start talking about the disks being slow, they're gonna show you a fancy graph with 0 millisecond latency on it and then point a stapler at you.
Instead, ask them about the network path to the SAN. Get speeds, if it's multipathed, etc. Get numbers from them about the speeds you should be seeing. Ask if they have benchmarks from when the servers were set up.
Then you can use Crystal Disk Mark or diskpd to validate those speeds. If they don't line up, again, it's most likely the networking.
You should also search your error log for messages that contain "FlushCache" and "saturation", because those can also be signs of network contention.
One thing you can do to avoid those things as a DBA is make sure that your maintenance and any other data-heavy tasks (like ETL) aren't going on at the same time. That can definitely put a lot of pressure on storage networking.
You may also want to check the answers here for more suggestions: Slow checkpoint and 15 second I/O warnings on flash storage
I blogged about a similar topic here: From The Server To The SAN
edited Jun 13 at 22:14
answered Jun 13 at 12:16
Erik DarlingErik Darling
28.1k13 gold badges86 silver badges144 bronze badges
28.1k13 gold badges86 silver badges144 bronze badges
add a comment
|
add a comment
|
Why storing the data on a SAN? What's the point? All database performance is tied to Disk I/O and you are using 3 servers with only one device for the I/O behind them. That makes no sense... and unfortunately so common.
I spend my life encountering poorly designed hardware platforms where people just try to design a large scale computer. All CPU power here, all disks there... hopefully there is not such a thing as remote RAM. And the saddest is they compensate the lack of efficiency of this design with huge servers that cost ten time more than they should. I saw $400k infra slower than a $1k laptop.
A SQL server software is a very advanced piece of software, it is designed to take advantage of any bits of hardware, CPU cores, CPU cache, TLB, RAM, disk controllers, hard drive cache... They almost include all filesystem logic. They are developed on regular computer and benchmarked on high end systems. Therfore a SQL server must have its own disks. Installing them on a SAN is like "emulating" a computer, you lose all performance optimisations. SANs are for storing backups, immutable files, and files you just append data to (logs).
Datacenter administrators tend to put all they can on SANs because this way they have only one pool of storage to manage, it's more easy than caring for storage on each server. It's a "I don't want to do my job" choice, and a very bad one, because then they have to deal with performance problems and all the company suffer from this. Just install software on the hardware it is designed for. Keep it simple. Care for I/O bandwidth, cache and context switch overhead, ressource jitter (happens when ressource is shared). You'll end up maintaining 1/10th of the devices for the same raw output power, save your ops team lot of headaches, gain performance that makes your end users happy and more productive, make your company a better place to work in, and save lot of energy (the planet will thank you).
You said in comments, you are considering to put SSD in your server. You will not recognize your setup with dedicated SSDs, compared to a SAN you'll get something like 500x improvement even with data and transaction log files on same drive. A state of the art SQL Server would have fast separate SSD for data and transaction log on different hardware controllers channels (most server motherboard have several). But compared to your current setup we are talking of sci-fi there. Just give SSD a try.
1
It makes me think again about idea of buying dedicated SSD drives for each replica (for data files, maybe also for log files), instead of all 3 using same SAN. I am gradually double checking all items other guys posted above, as well of course
– voodoo_sh
Jun 17 at 16:13
add a comment
|
Why storing the data on a SAN? What's the point? All database performance is tied to Disk I/O and you are using 3 servers with only one device for the I/O behind them. That makes no sense... and unfortunately so common.
I spend my life encountering poorly designed hardware platforms where people just try to design a large scale computer. All CPU power here, all disks there... hopefully there is not such a thing as remote RAM. And the saddest is they compensate the lack of efficiency of this design with huge servers that cost ten time more than they should. I saw $400k infra slower than a $1k laptop.
A SQL server software is a very advanced piece of software, it is designed to take advantage of any bits of hardware, CPU cores, CPU cache, TLB, RAM, disk controllers, hard drive cache... They almost include all filesystem logic. They are developed on regular computer and benchmarked on high end systems. Therfore a SQL server must have its own disks. Installing them on a SAN is like "emulating" a computer, you lose all performance optimisations. SANs are for storing backups, immutable files, and files you just append data to (logs).
Datacenter administrators tend to put all they can on SANs because this way they have only one pool of storage to manage, it's more easy than caring for storage on each server. It's a "I don't want to do my job" choice, and a very bad one, because then they have to deal with performance problems and all the company suffer from this. Just install software on the hardware it is designed for. Keep it simple. Care for I/O bandwidth, cache and context switch overhead, ressource jitter (happens when ressource is shared). You'll end up maintaining 1/10th of the devices for the same raw output power, save your ops team lot of headaches, gain performance that makes your end users happy and more productive, make your company a better place to work in, and save lot of energy (the planet will thank you).
You said in comments, you are considering to put SSD in your server. You will not recognize your setup with dedicated SSDs, compared to a SAN you'll get something like 500x improvement even with data and transaction log files on same drive. A state of the art SQL Server would have fast separate SSD for data and transaction log on different hardware controllers channels (most server motherboard have several). But compared to your current setup we are talking of sci-fi there. Just give SSD a try.
1
It makes me think again about idea of buying dedicated SSD drives for each replica (for data files, maybe also for log files), instead of all 3 using same SAN. I am gradually double checking all items other guys posted above, as well of course
– voodoo_sh
Jun 17 at 16:13
add a comment
|
Why storing the data on a SAN? What's the point? All database performance is tied to Disk I/O and you are using 3 servers with only one device for the I/O behind them. That makes no sense... and unfortunately so common.
I spend my life encountering poorly designed hardware platforms where people just try to design a large scale computer. All CPU power here, all disks there... hopefully there is not such a thing as remote RAM. And the saddest is they compensate the lack of efficiency of this design with huge servers that cost ten time more than they should. I saw $400k infra slower than a $1k laptop.
A SQL server software is a very advanced piece of software, it is designed to take advantage of any bits of hardware, CPU cores, CPU cache, TLB, RAM, disk controllers, hard drive cache... They almost include all filesystem logic. They are developed on regular computer and benchmarked on high end systems. Therfore a SQL server must have its own disks. Installing them on a SAN is like "emulating" a computer, you lose all performance optimisations. SANs are for storing backups, immutable files, and files you just append data to (logs).
Datacenter administrators tend to put all they can on SANs because this way they have only one pool of storage to manage, it's more easy than caring for storage on each server. It's a "I don't want to do my job" choice, and a very bad one, because then they have to deal with performance problems and all the company suffer from this. Just install software on the hardware it is designed for. Keep it simple. Care for I/O bandwidth, cache and context switch overhead, ressource jitter (happens when ressource is shared). You'll end up maintaining 1/10th of the devices for the same raw output power, save your ops team lot of headaches, gain performance that makes your end users happy and more productive, make your company a better place to work in, and save lot of energy (the planet will thank you).
You said in comments, you are considering to put SSD in your server. You will not recognize your setup with dedicated SSDs, compared to a SAN you'll get something like 500x improvement even with data and transaction log files on same drive. A state of the art SQL Server would have fast separate SSD for data and transaction log on different hardware controllers channels (most server motherboard have several). But compared to your current setup we are talking of sci-fi there. Just give SSD a try.
Why storing the data on a SAN? What's the point? All database performance is tied to Disk I/O and you are using 3 servers with only one device for the I/O behind them. That makes no sense... and unfortunately so common.
I spend my life encountering poorly designed hardware platforms where people just try to design a large scale computer. All CPU power here, all disks there... hopefully there is not such a thing as remote RAM. And the saddest is they compensate the lack of efficiency of this design with huge servers that cost ten time more than they should. I saw $400k infra slower than a $1k laptop.
A SQL server software is a very advanced piece of software, it is designed to take advantage of any bits of hardware, CPU cores, CPU cache, TLB, RAM, disk controllers, hard drive cache... They almost include all filesystem logic. They are developed on regular computer and benchmarked on high end systems. Therfore a SQL server must have its own disks. Installing them on a SAN is like "emulating" a computer, you lose all performance optimisations. SANs are for storing backups, immutable files, and files you just append data to (logs).
Datacenter administrators tend to put all they can on SANs because this way they have only one pool of storage to manage, it's more easy than caring for storage on each server. It's a "I don't want to do my job" choice, and a very bad one, because then they have to deal with performance problems and all the company suffer from this. Just install software on the hardware it is designed for. Keep it simple. Care for I/O bandwidth, cache and context switch overhead, ressource jitter (happens when ressource is shared). You'll end up maintaining 1/10th of the devices for the same raw output power, save your ops team lot of headaches, gain performance that makes your end users happy and more productive, make your company a better place to work in, and save lot of energy (the planet will thank you).
You said in comments, you are considering to put SSD in your server. You will not recognize your setup with dedicated SSDs, compared to a SAN you'll get something like 500x improvement even with data and transaction log files on same drive. A state of the art SQL Server would have fast separate SSD for data and transaction log on different hardware controllers channels (most server motherboard have several). But compared to your current setup we are talking of sci-fi there. Just give SSD a try.
edited Jun 18 at 7:41
answered Jun 14 at 16:43
bokanbokan
1772 bronze badges
1772 bronze badges
1
It makes me think again about idea of buying dedicated SSD drives for each replica (for data files, maybe also for log files), instead of all 3 using same SAN. I am gradually double checking all items other guys posted above, as well of course
– voodoo_sh
Jun 17 at 16:13
add a comment
|
1
It makes me think again about idea of buying dedicated SSD drives for each replica (for data files, maybe also for log files), instead of all 3 using same SAN. I am gradually double checking all items other guys posted above, as well of course
– voodoo_sh
Jun 17 at 16:13
1
1
It makes me think again about idea of buying dedicated SSD drives for each replica (for data files, maybe also for log files), instead of all 3 using same SAN. I am gradually double checking all items other guys posted above, as well of course
– voodoo_sh
Jun 17 at 16:13
It makes me think again about idea of buying dedicated SSD drives for each replica (for data files, maybe also for log files), instead of all 3 using same SAN. I am gradually double checking all items other guys posted above, as well of course
– voodoo_sh
Jun 17 at 16:13
add a comment
|
Thanks for contributing an answer to Database Administrators Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f240469%2fsql-server-has-encountered-occurences-of-i-o-requests-taking-longer-than-15-seco%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
5
Related: Slow checkpoint and 15 second I/O warnings on flash storage
– Sean Gallardy
Jun 13 at 13:24
Is SQL Server running in a VM on those physical machines? If so, you need to ensure the hypervisor is setup correctly, and each VM is configured properly. For VMware, check vmware.com/content/dam/digitalmarketing/vmware/en/pdf/solutions/…
– Max Vernon
Jun 14 at 19:14