SQL Server has encountered occurences of I/O requests taking longer than 15 secondsSlow checkpoint and 15 second I/O warnings on flash storageHelp on building a new Serverinnodb_flush_method=O_DIRECT vs O_DSYNC performance impact on ext3 with LVM disk partitionLog all than queries taking more than X secondsWhat frequency of hash/sort spills into tempdb is concerning?Lower performance on newest serversSQL Server Data File/Log File poor performanceSQL Server query on server taking longer than laptopIF EXISTS taking longer than embedded select statementBest Way To Migrate Existing Production Database To New Physical Storage StructureSlow checkpoint and 15 second I/O warnings on flash storage

Ubuntu 19.10: why is the "locate" command missing?

Why is Macron seemingly opposing an article 50 extension?

If I attempt to use the Teleport spell to teleport into an area warded by the Forbiddance spell, is my spell slot expended?

Etymology of Persian سبز (sabz), meaning "green"

Curious about definition of well-ordered set

what would allow for the use of cannons but not handheld guns

How can women avoid talking during dates while keeping men at ease?

Can I reuse old electrical wire?

Why does it seem everything I push moves at a constant velocity?

Regular expression grep -r 'emm*[a-f].[^ta]$'

Encrypt folder on Linux that can only be decrypted on that specific OS and device

Is it a Hamiltonian cycle on a grid?

Word or phrase for turning the tide against a rival in a competition in the last moments

Logic - How to say "Not only but also".

Why are there two King of Diamonds entries for the Curse of Strahd tarokka card readings?

What is the history of the Eldritch Knight as a class/character option?

Can I include an image covered by CC Attribution-ShareAlike 4.0 in a deck for a paid online course?

Circuit breaker door shocked me. Static?

Simple n-body class in C++

Do you make me up?

What are the correct pronouns for referring to someone whom I have never met in person when the gender is apparent?

How to call my own phone with a loud ringing tone (in order to find it in my house) even if it is in silent mode?

Ultra-Relativistic and Non-Relativistic cases for energy of a particle

Authentication versus Authorisation



SQL Server has encountered occurences of I/O requests taking longer than 15 seconds


Slow checkpoint and 15 second I/O warnings on flash storageHelp on building a new Serverinnodb_flush_method=O_DIRECT vs O_DSYNC performance impact on ext3 with LVM disk partitionLog all than queries taking more than X secondsWhat frequency of hash/sort spills into tempdb is concerning?Lower performance on newest serversSQL Server Data File/Log File poor performanceSQL Server query on server taking longer than laptopIF EXISTS taking longer than embedded select statementBest Way To Migrate Existing Production Database To New Physical Storage StructureSlow checkpoint and 15 second I/O warnings on flash storage






.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty
margin-bottom:0;









13

















On Production SQL Server, we have following config:



3 Dell PowerEdge R630 servers, combined into Availability Group
All 3 are connected to single Dell SAN storage unit which is a RAID array



From time to time, on PRIMARY we are seeing messages similar to below:



SQL Server has encountered 11 occurrence(s) of I/O requests taking longer 
than 15 seconds to complete on file [F:DataMyDatabase.mdf] in database id 8.
The OS file handle is 0x0000000000001FBC.
The offset of the latest long I/O is: 0x000004295d0000.
The duration of the long I/O is: 37397 ms.


We are novice in performance troubleshooting



What are the most common ways or best practices in troubleshooting this particular issue related to storage ?
What performance counters, tools, monitors, apps, etc. must be used to narrow down to the root cause of such messages ?
Might be there is a Extended Events that can help, or some kind of audit / logging ?










share|improve this question





















  • 5





    Related: Slow checkpoint and 15 second I/O warnings on flash storage

    – Sean Gallardy
    Jun 13 at 13:24












  • Is SQL Server running in a VM on those physical machines? If so, you need to ensure the hypervisor is setup correctly, and each VM is configured properly. For VMware, check vmware.com/content/dam/digitalmarketing/vmware/en/pdf/solutions/…

    – Max Vernon
    Jun 14 at 19:14

















13

















On Production SQL Server, we have following config:



3 Dell PowerEdge R630 servers, combined into Availability Group
All 3 are connected to single Dell SAN storage unit which is a RAID array



From time to time, on PRIMARY we are seeing messages similar to below:



SQL Server has encountered 11 occurrence(s) of I/O requests taking longer 
than 15 seconds to complete on file [F:DataMyDatabase.mdf] in database id 8.
The OS file handle is 0x0000000000001FBC.
The offset of the latest long I/O is: 0x000004295d0000.
The duration of the long I/O is: 37397 ms.


We are novice in performance troubleshooting



What are the most common ways or best practices in troubleshooting this particular issue related to storage ?
What performance counters, tools, monitors, apps, etc. must be used to narrow down to the root cause of such messages ?
Might be there is a Extended Events that can help, or some kind of audit / logging ?










share|improve this question





















  • 5





    Related: Slow checkpoint and 15 second I/O warnings on flash storage

    – Sean Gallardy
    Jun 13 at 13:24












  • Is SQL Server running in a VM on those physical machines? If so, you need to ensure the hypervisor is setup correctly, and each VM is configured properly. For VMware, check vmware.com/content/dam/digitalmarketing/vmware/en/pdf/solutions/…

    – Max Vernon
    Jun 14 at 19:14













13












13








13


3






On Production SQL Server, we have following config:



3 Dell PowerEdge R630 servers, combined into Availability Group
All 3 are connected to single Dell SAN storage unit which is a RAID array



From time to time, on PRIMARY we are seeing messages similar to below:



SQL Server has encountered 11 occurrence(s) of I/O requests taking longer 
than 15 seconds to complete on file [F:DataMyDatabase.mdf] in database id 8.
The OS file handle is 0x0000000000001FBC.
The offset of the latest long I/O is: 0x000004295d0000.
The duration of the long I/O is: 37397 ms.


We are novice in performance troubleshooting



What are the most common ways or best practices in troubleshooting this particular issue related to storage ?
What performance counters, tools, monitors, apps, etc. must be used to narrow down to the root cause of such messages ?
Might be there is a Extended Events that can help, or some kind of audit / logging ?










share|improve this question















On Production SQL Server, we have following config:



3 Dell PowerEdge R630 servers, combined into Availability Group
All 3 are connected to single Dell SAN storage unit which is a RAID array



From time to time, on PRIMARY we are seeing messages similar to below:



SQL Server has encountered 11 occurrence(s) of I/O requests taking longer 
than 15 seconds to complete on file [F:DataMyDatabase.mdf] in database id 8.
The OS file handle is 0x0000000000001FBC.
The offset of the latest long I/O is: 0x000004295d0000.
The duration of the long I/O is: 37397 ms.


We are novice in performance troubleshooting



What are the most common ways or best practices in troubleshooting this particular issue related to storage ?
What performance counters, tools, monitors, apps, etc. must be used to narrow down to the root cause of such messages ?
Might be there is a Extended Events that can help, or some kind of audit / logging ?







performance performance-tuning sql-server-2017 storage






share|improve this question














share|improve this question











share|improve this question




share|improve this question










asked Jun 13 at 9:47









voodoo_shvoodoo_sh

4113 silver badges19 bronze badges




4113 silver badges19 bronze badges










  • 5





    Related: Slow checkpoint and 15 second I/O warnings on flash storage

    – Sean Gallardy
    Jun 13 at 13:24












  • Is SQL Server running in a VM on those physical machines? If so, you need to ensure the hypervisor is setup correctly, and each VM is configured properly. For VMware, check vmware.com/content/dam/digitalmarketing/vmware/en/pdf/solutions/…

    – Max Vernon
    Jun 14 at 19:14












  • 5





    Related: Slow checkpoint and 15 second I/O warnings on flash storage

    – Sean Gallardy
    Jun 13 at 13:24












  • Is SQL Server running in a VM on those physical machines? If so, you need to ensure the hypervisor is setup correctly, and each VM is configured properly. For VMware, check vmware.com/content/dam/digitalmarketing/vmware/en/pdf/solutions/…

    – Max Vernon
    Jun 14 at 19:14







5




5





Related: Slow checkpoint and 15 second I/O warnings on flash storage

– Sean Gallardy
Jun 13 at 13:24






Related: Slow checkpoint and 15 second I/O warnings on flash storage

– Sean Gallardy
Jun 13 at 13:24














Is SQL Server running in a VM on those physical machines? If so, you need to ensure the hypervisor is setup correctly, and each VM is configured properly. For VMware, check vmware.com/content/dam/digitalmarketing/vmware/en/pdf/solutions/…

– Max Vernon
Jun 14 at 19:14





Is SQL Server running in a VM on those physical machines? If so, you need to ensure the hypervisor is setup correctly, and each VM is configured properly. For VMware, check vmware.com/content/dam/digitalmarketing/vmware/en/pdf/solutions/…

– Max Vernon
Jun 14 at 19:14










3 Answers
3






active

oldest

votes


















12


















We have a similar setup and recently encountered these messages in the logs. We are using a DELL Compellent SAN. Here are some things to check when receiving these messages that helped us find a solution



  • Review your windows performance counters for your disks that the warning messages are pointing to, specifically:

    • Disk avg. read time

    • Disk avg. write time

    • Disk read bytes/sec

    • Disk write bytes/sec

    • Disk Transfers/sec

    • Avg. disk queue length


  • The above are averages. If you have many database files on one drive these averages can skew the result and mask a bottle neck on specific database files. Check out this query from Paul S. Randal which returns average latency for each file from the dmv sys.dm_io_virtual_file_stats. In our case the average latency reported was acceptable, but underneath the covers we had many files with > 200 ms average latency.

  • Check the timings. Is there any pattern? Does it happen more frequently at certain a time in the night? If so check if any maintenance jobs are running at that time or any scheduled activity which may increase disk activity and expose a bottle neck in your IO subsystem.

  • Check the windows event viewer for errors. If your switch or SAN is being overloaded or not setup properly for your application you may find some messages in this log, and it is good to take this information to your SAN admin. In our case we were receiving iSCSI connection errors often throughout the day, hinting at the problem.

  • Review your SQL Server code. When you receive these messages you shouldn't immediately think it is an IO subsystem issue and pass it to your SAN admin. You need to do your part and review the database. Do you have really bad queries being run often churning through tons of data? Bad indexing? Excessive transaction log writes? You can use some open source queries to get a health check on your database, an example for checking how your query plan looks is sp_blitzCache

  • Don't ignore these. Today you may be receiving them a few times a day... then several months later when your workload increases and you forgot to monitor them they start to increase. Receiving lots of these messages can prevent SQL Server from accessing a certain file, and if it is tempdb, that is not good. In our case it got so bad that SQL Server shut itself down.

Our solution was upgrading our switch to a SAN switch. Yes, these are all points to cover within SQL Server. What led us to finding out it was the switch was that we were receiving about 1500 iSCSI pdu disconnect errors in the Windows application event viewer on the SQL Server every day. That prompted the investigation by our SAN admins into the switch.



Immediately after upgrading, the iSCSI errors were gone and average latency came down to around 50 ms for all the files, and that correlated to better performance in the application. With these points in mind hopefully you can find your solution.






share|improve this answer























  • 1





    So system events, not in SQL Server, led you to the resolution, correct? Can you offer any other encompassing troubleshooting help do narrow down if the issue is something internal to SQL Server, at the OS level, Filesystem level, or storage area networking level?

    – Sean Gallardy
    Jun 14 at 14:02











  • That is correct Sean. I may be able to add some more information as you suggest, I'll update my answer once I put that together.

    – kevinnwhat
    Jun 14 at 14:21


















21


















This is far less often a disk issue, and far more often a networking issue. You know, the N in SAN?



If you go to your SAN team and start talking about the disks being slow, they're gonna show you a fancy graph with 0 millisecond latency on it and then point a stapler at you.



Instead, ask them about the network path to the SAN. Get speeds, if it's multipathed, etc. Get numbers from them about the speeds you should be seeing. Ask if they have benchmarks from when the servers were set up.



Then you can use Crystal Disk Mark or diskpd to validate those speeds. If they don't line up, again, it's most likely the networking.



You should also search your error log for messages that contain "FlushCache" and "saturation", because those can also be signs of network contention.



One thing you can do to avoid those things as a DBA is make sure that your maintenance and any other data-heavy tasks (like ETL) aren't going on at the same time. That can definitely put a lot of pressure on storage networking.



You may also want to check the answers here for more suggestions: Slow checkpoint and 15 second I/O warnings on flash storage



I blogged about a similar topic here: From The Server To The SAN






share|improve this answer



































    6


















    Why storing the data on a SAN? What's the point? All database performance is tied to Disk I/O and you are using 3 servers with only one device for the I/O behind them. That makes no sense... and unfortunately so common.



    I spend my life encountering poorly designed hardware platforms where people just try to design a large scale computer. All CPU power here, all disks there... hopefully there is not such a thing as remote RAM. And the saddest is they compensate the lack of efficiency of this design with huge servers that cost ten time more than they should. I saw $400k infra slower than a $1k laptop.



    A SQL server software is a very advanced piece of software, it is designed to take advantage of any bits of hardware, CPU cores, CPU cache, TLB, RAM, disk controllers, hard drive cache... They almost include all filesystem logic. They are developed on regular computer and benchmarked on high end systems. Therfore a SQL server must have its own disks. Installing them on a SAN is like "emulating" a computer, you lose all performance optimisations. SANs are for storing backups, immutable files, and files you just append data to (logs).



    Datacenter administrators tend to put all they can on SANs because this way they have only one pool of storage to manage, it's more easy than caring for storage on each server. It's a "I don't want to do my job" choice, and a very bad one, because then they have to deal with performance problems and all the company suffer from this. Just install software on the hardware it is designed for. Keep it simple. Care for I/O bandwidth, cache and context switch overhead, ressource jitter (happens when ressource is shared). You'll end up maintaining 1/10th of the devices for the same raw output power, save your ops team lot of headaches, gain performance that makes your end users happy and more productive, make your company a better place to work in, and save lot of energy (the planet will thank you).



    You said in comments, you are considering to put SSD in your server. You will not recognize your setup with dedicated SSDs, compared to a SAN you'll get something like 500x improvement even with data and transaction log files on same drive. A state of the art SQL Server would have fast separate SSD for data and transaction log on different hardware controllers channels (most server motherboard have several). But compared to your current setup we are talking of sci-fi there. Just give SSD a try.






    share|improve this answer























    • 1





      It makes me think again about idea of buying dedicated SSD drives for each replica (for data files, maybe also for log files), instead of all 3 using same SAN. I am gradually double checking all items other guys posted above, as well of course

      – voodoo_sh
      Jun 17 at 16:13













    Your Answer








    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "182"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/4.0/"u003ecc by-sa 4.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );














    draft saved

    draft discarded
















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f240469%2fsql-server-has-encountered-occurences-of-i-o-requests-taking-longer-than-15-seco%23new-answer', 'question_page');

    );

    Post as a guest















    Required, but never shown


























    3 Answers
    3






    active

    oldest

    votes








    3 Answers
    3






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    12


















    We have a similar setup and recently encountered these messages in the logs. We are using a DELL Compellent SAN. Here are some things to check when receiving these messages that helped us find a solution



    • Review your windows performance counters for your disks that the warning messages are pointing to, specifically:

      • Disk avg. read time

      • Disk avg. write time

      • Disk read bytes/sec

      • Disk write bytes/sec

      • Disk Transfers/sec

      • Avg. disk queue length


    • The above are averages. If you have many database files on one drive these averages can skew the result and mask a bottle neck on specific database files. Check out this query from Paul S. Randal which returns average latency for each file from the dmv sys.dm_io_virtual_file_stats. In our case the average latency reported was acceptable, but underneath the covers we had many files with > 200 ms average latency.

    • Check the timings. Is there any pattern? Does it happen more frequently at certain a time in the night? If so check if any maintenance jobs are running at that time or any scheduled activity which may increase disk activity and expose a bottle neck in your IO subsystem.

    • Check the windows event viewer for errors. If your switch or SAN is being overloaded or not setup properly for your application you may find some messages in this log, and it is good to take this information to your SAN admin. In our case we were receiving iSCSI connection errors often throughout the day, hinting at the problem.

    • Review your SQL Server code. When you receive these messages you shouldn't immediately think it is an IO subsystem issue and pass it to your SAN admin. You need to do your part and review the database. Do you have really bad queries being run often churning through tons of data? Bad indexing? Excessive transaction log writes? You can use some open source queries to get a health check on your database, an example for checking how your query plan looks is sp_blitzCache

    • Don't ignore these. Today you may be receiving them a few times a day... then several months later when your workload increases and you forgot to monitor them they start to increase. Receiving lots of these messages can prevent SQL Server from accessing a certain file, and if it is tempdb, that is not good. In our case it got so bad that SQL Server shut itself down.

    Our solution was upgrading our switch to a SAN switch. Yes, these are all points to cover within SQL Server. What led us to finding out it was the switch was that we were receiving about 1500 iSCSI pdu disconnect errors in the Windows application event viewer on the SQL Server every day. That prompted the investigation by our SAN admins into the switch.



    Immediately after upgrading, the iSCSI errors were gone and average latency came down to around 50 ms for all the files, and that correlated to better performance in the application. With these points in mind hopefully you can find your solution.






    share|improve this answer























    • 1





      So system events, not in SQL Server, led you to the resolution, correct? Can you offer any other encompassing troubleshooting help do narrow down if the issue is something internal to SQL Server, at the OS level, Filesystem level, or storage area networking level?

      – Sean Gallardy
      Jun 14 at 14:02











    • That is correct Sean. I may be able to add some more information as you suggest, I'll update my answer once I put that together.

      – kevinnwhat
      Jun 14 at 14:21















    12


















    We have a similar setup and recently encountered these messages in the logs. We are using a DELL Compellent SAN. Here are some things to check when receiving these messages that helped us find a solution



    • Review your windows performance counters for your disks that the warning messages are pointing to, specifically:

      • Disk avg. read time

      • Disk avg. write time

      • Disk read bytes/sec

      • Disk write bytes/sec

      • Disk Transfers/sec

      • Avg. disk queue length


    • The above are averages. If you have many database files on one drive these averages can skew the result and mask a bottle neck on specific database files. Check out this query from Paul S. Randal which returns average latency for each file from the dmv sys.dm_io_virtual_file_stats. In our case the average latency reported was acceptable, but underneath the covers we had many files with > 200 ms average latency.

    • Check the timings. Is there any pattern? Does it happen more frequently at certain a time in the night? If so check if any maintenance jobs are running at that time or any scheduled activity which may increase disk activity and expose a bottle neck in your IO subsystem.

    • Check the windows event viewer for errors. If your switch or SAN is being overloaded or not setup properly for your application you may find some messages in this log, and it is good to take this information to your SAN admin. In our case we were receiving iSCSI connection errors often throughout the day, hinting at the problem.

    • Review your SQL Server code. When you receive these messages you shouldn't immediately think it is an IO subsystem issue and pass it to your SAN admin. You need to do your part and review the database. Do you have really bad queries being run often churning through tons of data? Bad indexing? Excessive transaction log writes? You can use some open source queries to get a health check on your database, an example for checking how your query plan looks is sp_blitzCache

    • Don't ignore these. Today you may be receiving them a few times a day... then several months later when your workload increases and you forgot to monitor them they start to increase. Receiving lots of these messages can prevent SQL Server from accessing a certain file, and if it is tempdb, that is not good. In our case it got so bad that SQL Server shut itself down.

    Our solution was upgrading our switch to a SAN switch. Yes, these are all points to cover within SQL Server. What led us to finding out it was the switch was that we were receiving about 1500 iSCSI pdu disconnect errors in the Windows application event viewer on the SQL Server every day. That prompted the investigation by our SAN admins into the switch.



    Immediately after upgrading, the iSCSI errors were gone and average latency came down to around 50 ms for all the files, and that correlated to better performance in the application. With these points in mind hopefully you can find your solution.






    share|improve this answer























    • 1





      So system events, not in SQL Server, led you to the resolution, correct? Can you offer any other encompassing troubleshooting help do narrow down if the issue is something internal to SQL Server, at the OS level, Filesystem level, or storage area networking level?

      – Sean Gallardy
      Jun 14 at 14:02











    • That is correct Sean. I may be able to add some more information as you suggest, I'll update my answer once I put that together.

      – kevinnwhat
      Jun 14 at 14:21













    12














    12










    12









    We have a similar setup and recently encountered these messages in the logs. We are using a DELL Compellent SAN. Here are some things to check when receiving these messages that helped us find a solution



    • Review your windows performance counters for your disks that the warning messages are pointing to, specifically:

      • Disk avg. read time

      • Disk avg. write time

      • Disk read bytes/sec

      • Disk write bytes/sec

      • Disk Transfers/sec

      • Avg. disk queue length


    • The above are averages. If you have many database files on one drive these averages can skew the result and mask a bottle neck on specific database files. Check out this query from Paul S. Randal which returns average latency for each file from the dmv sys.dm_io_virtual_file_stats. In our case the average latency reported was acceptable, but underneath the covers we had many files with > 200 ms average latency.

    • Check the timings. Is there any pattern? Does it happen more frequently at certain a time in the night? If so check if any maintenance jobs are running at that time or any scheduled activity which may increase disk activity and expose a bottle neck in your IO subsystem.

    • Check the windows event viewer for errors. If your switch or SAN is being overloaded or not setup properly for your application you may find some messages in this log, and it is good to take this information to your SAN admin. In our case we were receiving iSCSI connection errors often throughout the day, hinting at the problem.

    • Review your SQL Server code. When you receive these messages you shouldn't immediately think it is an IO subsystem issue and pass it to your SAN admin. You need to do your part and review the database. Do you have really bad queries being run often churning through tons of data? Bad indexing? Excessive transaction log writes? You can use some open source queries to get a health check on your database, an example for checking how your query plan looks is sp_blitzCache

    • Don't ignore these. Today you may be receiving them a few times a day... then several months later when your workload increases and you forgot to monitor them they start to increase. Receiving lots of these messages can prevent SQL Server from accessing a certain file, and if it is tempdb, that is not good. In our case it got so bad that SQL Server shut itself down.

    Our solution was upgrading our switch to a SAN switch. Yes, these are all points to cover within SQL Server. What led us to finding out it was the switch was that we were receiving about 1500 iSCSI pdu disconnect errors in the Windows application event viewer on the SQL Server every day. That prompted the investigation by our SAN admins into the switch.



    Immediately after upgrading, the iSCSI errors were gone and average latency came down to around 50 ms for all the files, and that correlated to better performance in the application. With these points in mind hopefully you can find your solution.






    share|improve this answer
















    We have a similar setup and recently encountered these messages in the logs. We are using a DELL Compellent SAN. Here are some things to check when receiving these messages that helped us find a solution



    • Review your windows performance counters for your disks that the warning messages are pointing to, specifically:

      • Disk avg. read time

      • Disk avg. write time

      • Disk read bytes/sec

      • Disk write bytes/sec

      • Disk Transfers/sec

      • Avg. disk queue length


    • The above are averages. If you have many database files on one drive these averages can skew the result and mask a bottle neck on specific database files. Check out this query from Paul S. Randal which returns average latency for each file from the dmv sys.dm_io_virtual_file_stats. In our case the average latency reported was acceptable, but underneath the covers we had many files with > 200 ms average latency.

    • Check the timings. Is there any pattern? Does it happen more frequently at certain a time in the night? If so check if any maintenance jobs are running at that time or any scheduled activity which may increase disk activity and expose a bottle neck in your IO subsystem.

    • Check the windows event viewer for errors. If your switch or SAN is being overloaded or not setup properly for your application you may find some messages in this log, and it is good to take this information to your SAN admin. In our case we were receiving iSCSI connection errors often throughout the day, hinting at the problem.

    • Review your SQL Server code. When you receive these messages you shouldn't immediately think it is an IO subsystem issue and pass it to your SAN admin. You need to do your part and review the database. Do you have really bad queries being run often churning through tons of data? Bad indexing? Excessive transaction log writes? You can use some open source queries to get a health check on your database, an example for checking how your query plan looks is sp_blitzCache

    • Don't ignore these. Today you may be receiving them a few times a day... then several months later when your workload increases and you forgot to monitor them they start to increase. Receiving lots of these messages can prevent SQL Server from accessing a certain file, and if it is tempdb, that is not good. In our case it got so bad that SQL Server shut itself down.

    Our solution was upgrading our switch to a SAN switch. Yes, these are all points to cover within SQL Server. What led us to finding out it was the switch was that we were receiving about 1500 iSCSI pdu disconnect errors in the Windows application event viewer on the SQL Server every day. That prompted the investigation by our SAN admins into the switch.



    Immediately after upgrading, the iSCSI errors were gone and average latency came down to around 50 ms for all the files, and that correlated to better performance in the application. With these points in mind hopefully you can find your solution.







    share|improve this answer















    share|improve this answer




    share|improve this answer








    edited Jun 17 at 17:36









    Paul White

    60.2k16 gold badges311 silver badges490 bronze badges




    60.2k16 gold badges311 silver badges490 bronze badges










    answered Jun 14 at 1:50









    kevinnwhatkevinnwhat

    1,2791 gold badge1 silver badge14 bronze badges




    1,2791 gold badge1 silver badge14 bronze badges










    • 1





      So system events, not in SQL Server, led you to the resolution, correct? Can you offer any other encompassing troubleshooting help do narrow down if the issue is something internal to SQL Server, at the OS level, Filesystem level, or storage area networking level?

      – Sean Gallardy
      Jun 14 at 14:02











    • That is correct Sean. I may be able to add some more information as you suggest, I'll update my answer once I put that together.

      – kevinnwhat
      Jun 14 at 14:21












    • 1





      So system events, not in SQL Server, led you to the resolution, correct? Can you offer any other encompassing troubleshooting help do narrow down if the issue is something internal to SQL Server, at the OS level, Filesystem level, or storage area networking level?

      – Sean Gallardy
      Jun 14 at 14:02











    • That is correct Sean. I may be able to add some more information as you suggest, I'll update my answer once I put that together.

      – kevinnwhat
      Jun 14 at 14:21







    1




    1





    So system events, not in SQL Server, led you to the resolution, correct? Can you offer any other encompassing troubleshooting help do narrow down if the issue is something internal to SQL Server, at the OS level, Filesystem level, or storage area networking level?

    – Sean Gallardy
    Jun 14 at 14:02





    So system events, not in SQL Server, led you to the resolution, correct? Can you offer any other encompassing troubleshooting help do narrow down if the issue is something internal to SQL Server, at the OS level, Filesystem level, or storage area networking level?

    – Sean Gallardy
    Jun 14 at 14:02













    That is correct Sean. I may be able to add some more information as you suggest, I'll update my answer once I put that together.

    – kevinnwhat
    Jun 14 at 14:21





    That is correct Sean. I may be able to add some more information as you suggest, I'll update my answer once I put that together.

    – kevinnwhat
    Jun 14 at 14:21













    21


















    This is far less often a disk issue, and far more often a networking issue. You know, the N in SAN?



    If you go to your SAN team and start talking about the disks being slow, they're gonna show you a fancy graph with 0 millisecond latency on it and then point a stapler at you.



    Instead, ask them about the network path to the SAN. Get speeds, if it's multipathed, etc. Get numbers from them about the speeds you should be seeing. Ask if they have benchmarks from when the servers were set up.



    Then you can use Crystal Disk Mark or diskpd to validate those speeds. If they don't line up, again, it's most likely the networking.



    You should also search your error log for messages that contain "FlushCache" and "saturation", because those can also be signs of network contention.



    One thing you can do to avoid those things as a DBA is make sure that your maintenance and any other data-heavy tasks (like ETL) aren't going on at the same time. That can definitely put a lot of pressure on storage networking.



    You may also want to check the answers here for more suggestions: Slow checkpoint and 15 second I/O warnings on flash storage



    I blogged about a similar topic here: From The Server To The SAN






    share|improve this answer
































      21


















      This is far less often a disk issue, and far more often a networking issue. You know, the N in SAN?



      If you go to your SAN team and start talking about the disks being slow, they're gonna show you a fancy graph with 0 millisecond latency on it and then point a stapler at you.



      Instead, ask them about the network path to the SAN. Get speeds, if it's multipathed, etc. Get numbers from them about the speeds you should be seeing. Ask if they have benchmarks from when the servers were set up.



      Then you can use Crystal Disk Mark or diskpd to validate those speeds. If they don't line up, again, it's most likely the networking.



      You should also search your error log for messages that contain "FlushCache" and "saturation", because those can also be signs of network contention.



      One thing you can do to avoid those things as a DBA is make sure that your maintenance and any other data-heavy tasks (like ETL) aren't going on at the same time. That can definitely put a lot of pressure on storage networking.



      You may also want to check the answers here for more suggestions: Slow checkpoint and 15 second I/O warnings on flash storage



      I blogged about a similar topic here: From The Server To The SAN






      share|improve this answer






























        21














        21










        21









        This is far less often a disk issue, and far more often a networking issue. You know, the N in SAN?



        If you go to your SAN team and start talking about the disks being slow, they're gonna show you a fancy graph with 0 millisecond latency on it and then point a stapler at you.



        Instead, ask them about the network path to the SAN. Get speeds, if it's multipathed, etc. Get numbers from them about the speeds you should be seeing. Ask if they have benchmarks from when the servers were set up.



        Then you can use Crystal Disk Mark or diskpd to validate those speeds. If they don't line up, again, it's most likely the networking.



        You should also search your error log for messages that contain "FlushCache" and "saturation", because those can also be signs of network contention.



        One thing you can do to avoid those things as a DBA is make sure that your maintenance and any other data-heavy tasks (like ETL) aren't going on at the same time. That can definitely put a lot of pressure on storage networking.



        You may also want to check the answers here for more suggestions: Slow checkpoint and 15 second I/O warnings on flash storage



        I blogged about a similar topic here: From The Server To The SAN






        share|improve this answer
















        This is far less often a disk issue, and far more often a networking issue. You know, the N in SAN?



        If you go to your SAN team and start talking about the disks being slow, they're gonna show you a fancy graph with 0 millisecond latency on it and then point a stapler at you.



        Instead, ask them about the network path to the SAN. Get speeds, if it's multipathed, etc. Get numbers from them about the speeds you should be seeing. Ask if they have benchmarks from when the servers were set up.



        Then you can use Crystal Disk Mark or diskpd to validate those speeds. If they don't line up, again, it's most likely the networking.



        You should also search your error log for messages that contain "FlushCache" and "saturation", because those can also be signs of network contention.



        One thing you can do to avoid those things as a DBA is make sure that your maintenance and any other data-heavy tasks (like ETL) aren't going on at the same time. That can definitely put a lot of pressure on storage networking.



        You may also want to check the answers here for more suggestions: Slow checkpoint and 15 second I/O warnings on flash storage



        I blogged about a similar topic here: From The Server To The SAN







        share|improve this answer















        share|improve this answer




        share|improve this answer








        edited Jun 13 at 22:14

























        answered Jun 13 at 12:16









        Erik DarlingErik Darling

        28.1k13 gold badges86 silver badges144 bronze badges




        28.1k13 gold badges86 silver badges144 bronze badges
























            6


















            Why storing the data on a SAN? What's the point? All database performance is tied to Disk I/O and you are using 3 servers with only one device for the I/O behind them. That makes no sense... and unfortunately so common.



            I spend my life encountering poorly designed hardware platforms where people just try to design a large scale computer. All CPU power here, all disks there... hopefully there is not such a thing as remote RAM. And the saddest is they compensate the lack of efficiency of this design with huge servers that cost ten time more than they should. I saw $400k infra slower than a $1k laptop.



            A SQL server software is a very advanced piece of software, it is designed to take advantage of any bits of hardware, CPU cores, CPU cache, TLB, RAM, disk controllers, hard drive cache... They almost include all filesystem logic. They are developed on regular computer and benchmarked on high end systems. Therfore a SQL server must have its own disks. Installing them on a SAN is like "emulating" a computer, you lose all performance optimisations. SANs are for storing backups, immutable files, and files you just append data to (logs).



            Datacenter administrators tend to put all they can on SANs because this way they have only one pool of storage to manage, it's more easy than caring for storage on each server. It's a "I don't want to do my job" choice, and a very bad one, because then they have to deal with performance problems and all the company suffer from this. Just install software on the hardware it is designed for. Keep it simple. Care for I/O bandwidth, cache and context switch overhead, ressource jitter (happens when ressource is shared). You'll end up maintaining 1/10th of the devices for the same raw output power, save your ops team lot of headaches, gain performance that makes your end users happy and more productive, make your company a better place to work in, and save lot of energy (the planet will thank you).



            You said in comments, you are considering to put SSD in your server. You will not recognize your setup with dedicated SSDs, compared to a SAN you'll get something like 500x improvement even with data and transaction log files on same drive. A state of the art SQL Server would have fast separate SSD for data and transaction log on different hardware controllers channels (most server motherboard have several). But compared to your current setup we are talking of sci-fi there. Just give SSD a try.






            share|improve this answer























            • 1





              It makes me think again about idea of buying dedicated SSD drives for each replica (for data files, maybe also for log files), instead of all 3 using same SAN. I am gradually double checking all items other guys posted above, as well of course

              – voodoo_sh
              Jun 17 at 16:13
















            6


















            Why storing the data on a SAN? What's the point? All database performance is tied to Disk I/O and you are using 3 servers with only one device for the I/O behind them. That makes no sense... and unfortunately so common.



            I spend my life encountering poorly designed hardware platforms where people just try to design a large scale computer. All CPU power here, all disks there... hopefully there is not such a thing as remote RAM. And the saddest is they compensate the lack of efficiency of this design with huge servers that cost ten time more than they should. I saw $400k infra slower than a $1k laptop.



            A SQL server software is a very advanced piece of software, it is designed to take advantage of any bits of hardware, CPU cores, CPU cache, TLB, RAM, disk controllers, hard drive cache... They almost include all filesystem logic. They are developed on regular computer and benchmarked on high end systems. Therfore a SQL server must have its own disks. Installing them on a SAN is like "emulating" a computer, you lose all performance optimisations. SANs are for storing backups, immutable files, and files you just append data to (logs).



            Datacenter administrators tend to put all they can on SANs because this way they have only one pool of storage to manage, it's more easy than caring for storage on each server. It's a "I don't want to do my job" choice, and a very bad one, because then they have to deal with performance problems and all the company suffer from this. Just install software on the hardware it is designed for. Keep it simple. Care for I/O bandwidth, cache and context switch overhead, ressource jitter (happens when ressource is shared). You'll end up maintaining 1/10th of the devices for the same raw output power, save your ops team lot of headaches, gain performance that makes your end users happy and more productive, make your company a better place to work in, and save lot of energy (the planet will thank you).



            You said in comments, you are considering to put SSD in your server. You will not recognize your setup with dedicated SSDs, compared to a SAN you'll get something like 500x improvement even with data and transaction log files on same drive. A state of the art SQL Server would have fast separate SSD for data and transaction log on different hardware controllers channels (most server motherboard have several). But compared to your current setup we are talking of sci-fi there. Just give SSD a try.






            share|improve this answer























            • 1





              It makes me think again about idea of buying dedicated SSD drives for each replica (for data files, maybe also for log files), instead of all 3 using same SAN. I am gradually double checking all items other guys posted above, as well of course

              – voodoo_sh
              Jun 17 at 16:13














            6














            6










            6









            Why storing the data on a SAN? What's the point? All database performance is tied to Disk I/O and you are using 3 servers with only one device for the I/O behind them. That makes no sense... and unfortunately so common.



            I spend my life encountering poorly designed hardware platforms where people just try to design a large scale computer. All CPU power here, all disks there... hopefully there is not such a thing as remote RAM. And the saddest is they compensate the lack of efficiency of this design with huge servers that cost ten time more than they should. I saw $400k infra slower than a $1k laptop.



            A SQL server software is a very advanced piece of software, it is designed to take advantage of any bits of hardware, CPU cores, CPU cache, TLB, RAM, disk controllers, hard drive cache... They almost include all filesystem logic. They are developed on regular computer and benchmarked on high end systems. Therfore a SQL server must have its own disks. Installing them on a SAN is like "emulating" a computer, you lose all performance optimisations. SANs are for storing backups, immutable files, and files you just append data to (logs).



            Datacenter administrators tend to put all they can on SANs because this way they have only one pool of storage to manage, it's more easy than caring for storage on each server. It's a "I don't want to do my job" choice, and a very bad one, because then they have to deal with performance problems and all the company suffer from this. Just install software on the hardware it is designed for. Keep it simple. Care for I/O bandwidth, cache and context switch overhead, ressource jitter (happens when ressource is shared). You'll end up maintaining 1/10th of the devices for the same raw output power, save your ops team lot of headaches, gain performance that makes your end users happy and more productive, make your company a better place to work in, and save lot of energy (the planet will thank you).



            You said in comments, you are considering to put SSD in your server. You will not recognize your setup with dedicated SSDs, compared to a SAN you'll get something like 500x improvement even with data and transaction log files on same drive. A state of the art SQL Server would have fast separate SSD for data and transaction log on different hardware controllers channels (most server motherboard have several). But compared to your current setup we are talking of sci-fi there. Just give SSD a try.






            share|improve this answer
















            Why storing the data on a SAN? What's the point? All database performance is tied to Disk I/O and you are using 3 servers with only one device for the I/O behind them. That makes no sense... and unfortunately so common.



            I spend my life encountering poorly designed hardware platforms where people just try to design a large scale computer. All CPU power here, all disks there... hopefully there is not such a thing as remote RAM. And the saddest is they compensate the lack of efficiency of this design with huge servers that cost ten time more than they should. I saw $400k infra slower than a $1k laptop.



            A SQL server software is a very advanced piece of software, it is designed to take advantage of any bits of hardware, CPU cores, CPU cache, TLB, RAM, disk controllers, hard drive cache... They almost include all filesystem logic. They are developed on regular computer and benchmarked on high end systems. Therfore a SQL server must have its own disks. Installing them on a SAN is like "emulating" a computer, you lose all performance optimisations. SANs are for storing backups, immutable files, and files you just append data to (logs).



            Datacenter administrators tend to put all they can on SANs because this way they have only one pool of storage to manage, it's more easy than caring for storage on each server. It's a "I don't want to do my job" choice, and a very bad one, because then they have to deal with performance problems and all the company suffer from this. Just install software on the hardware it is designed for. Keep it simple. Care for I/O bandwidth, cache and context switch overhead, ressource jitter (happens when ressource is shared). You'll end up maintaining 1/10th of the devices for the same raw output power, save your ops team lot of headaches, gain performance that makes your end users happy and more productive, make your company a better place to work in, and save lot of energy (the planet will thank you).



            You said in comments, you are considering to put SSD in your server. You will not recognize your setup with dedicated SSDs, compared to a SAN you'll get something like 500x improvement even with data and transaction log files on same drive. A state of the art SQL Server would have fast separate SSD for data and transaction log on different hardware controllers channels (most server motherboard have several). But compared to your current setup we are talking of sci-fi there. Just give SSD a try.







            share|improve this answer















            share|improve this answer




            share|improve this answer








            edited Jun 18 at 7:41

























            answered Jun 14 at 16:43









            bokanbokan

            1772 bronze badges




            1772 bronze badges










            • 1





              It makes me think again about idea of buying dedicated SSD drives for each replica (for data files, maybe also for log files), instead of all 3 using same SAN. I am gradually double checking all items other guys posted above, as well of course

              – voodoo_sh
              Jun 17 at 16:13













            • 1





              It makes me think again about idea of buying dedicated SSD drives for each replica (for data files, maybe also for log files), instead of all 3 using same SAN. I am gradually double checking all items other guys posted above, as well of course

              – voodoo_sh
              Jun 17 at 16:13








            1




            1





            It makes me think again about idea of buying dedicated SSD drives for each replica (for data files, maybe also for log files), instead of all 3 using same SAN. I am gradually double checking all items other guys posted above, as well of course

            – voodoo_sh
            Jun 17 at 16:13






            It makes me think again about idea of buying dedicated SSD drives for each replica (for data files, maybe also for log files), instead of all 3 using same SAN. I am gradually double checking all items other guys posted above, as well of course

            – voodoo_sh
            Jun 17 at 16:13



















            draft saved

            draft discarded















































            Thanks for contributing an answer to Database Administrators Stack Exchange!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid


            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.

            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f240469%2fsql-server-has-encountered-occurences-of-i-o-requests-taking-longer-than-15-seco%23new-answer', 'question_page');

            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown









            Popular posts from this blog

            Tamil (spriik) Luke uk diar | Nawigatjuun

            Align equal signs while including text over equalitiesAMS align: left aligned text/math plus multicolumn alignmentMultiple alignmentsAligning equations in multiple placesNumbering and aligning an equation with multiple columnsHow to align one equation with another multline equationUsing \ in environments inside the begintabularxNumber equations and preserving alignment of equal signsHow can I align equations to the left and to the right?Double equation alignment problem within align enviromentAligned within align: Why are they right-aligned?

            Training a classifier when some of the features are unknownWhy does Gradient Boosting regression predict negative values when there are no negative y-values in my training set?How to improve an existing (trained) classifier?What is effect when I set up some self defined predisctor variables?Why Matlab neural network classification returns decimal values on prediction dataset?Fitting and transforming text data in training, testing, and validation setsHow to quantify the performance of the classifier (multi-class SVM) using the test data?How do I control for some patients providing multiple samples in my training data?Training and Test setTraining a convolutional neural network for image denoising in MatlabShouldn't an autoencoder with #(neurons in hidden layer) = #(neurons in input layer) be “perfect”?