Is my Windows BSOD ready?

The blue screen of death (BSOD) is, to me, the most interesting part of system preparation when someone deploys a new system into a production environment. It amuses me how few engineers really understand why a system needs to be BSOD ready. Most of the time, I hear support engineers complaining that they did not get a MEMORY.DMP after a BSOD event, and there are 2 common root causes.

1. You did not have sufficient page file size to allow the RAM to page out
2. You did not have sufficient free disk space to write the MEMORY.DMP

If you drill into these 2 common reasons, you will start to see that several components have to work together to produce a successful MEMORY.DMP after a BSOD. They are as follows.

1. Advanced System Settings Configuration
1.1. PageFile.SYS Location
For any Windows version before Windows Server 2008, I recommend keeping it on the C:\ drive. The reason is that in a BSOD event, the system needs to access the system drive to page out the memory.
For Windows Server 2008 and later, PageFile.SYS can be placed in a location other than the C:\ drive. (KB969028)

1.2. PageFile Size
Microsoft recommends a page file size of 1.5 times RAM for systems with up to 1,373MB of RAM. If your system has more RAM than that, Microsoft recommends 2048MB plus 16MB for a Kernel dump on a 32-Bit system, and the total RAM size plus 128MB for a Kernel dump on a 64-Bit system. (KB307973)
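
As a rough illustration, the sizing rules above can be written down as a small Python helper. This is only a sketch of the numbers quoted in this post: the function name and the constant are my own, and treating 1,373MB as an inclusive threshold is an assumption.

```python
import math

# Threshold quoted in this post; treating it as inclusive is an assumption.
SMALL_RAM_LIMIT_MB = 1373


def kernel_dump_pagefile_mb(ram_mb: int, is_64bit: bool) -> int:
    """Page file size in MB for a Kernel dump, per the rules quoted above."""
    if ram_mb <= SMALL_RAM_LIMIT_MB:
        # Small systems: 1.5 times RAM, rounded up to a whole MB.
        return math.ceil(ram_mb * 1.5)
    if is_64bit:
        # 64-Bit: total RAM plus 128MB.
        return ram_mb + 128
    # 32-Bit: fixed 2048MB plus 16MB.
    return 2048 + 16


# Example: a 64-Bit server with 24GB of RAM.
print(kernel_dump_pagefile_mb(24 * 1024, is_64bit=True))  # 24704
```

On a 24GB 64-Bit server this already asks for roughly 24GB of page file, which is why the drive that holds PageFile.SYS matters so much.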

2. Disk
2.1. MEMORY.DMP Location
By default, it is %SystemRoot%\MEMORY.DMP, but you can move it to a different location provided the system can access that location during a BSOD event.

2.2. Disk Free Space Size
If you understand PageFile Size, you will understand how much free disk space is required. I recommend having at least 2048MB + 16MB of free space for a 32-Bit Kernel dump and 2048MB + 128MB of free space for a 64-Bit Kernel dump. For a Complete dump, I recommend having the total RAM size plus 1MB of free space.
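
A minimal Python sketch of that free-space check follows. The function names and the "kernel"/"complete" labels are hypothetical and only encode the numbers recommended above; `shutil.disk_usage` is the standard library call that reports free space for the drive containing a given path.

```python
import shutil


def required_free_mb(dump_type: str, ram_mb: int, is_64bit: bool) -> int:
    """Free space in MB recommended above for the given dump type."""
    if dump_type == "complete":
        # Complete dump: total RAM plus 1MB.
        return ram_mb + 1
    if dump_type == "kernel":
        # Kernel dump: 2048MB plus 16MB (32-Bit) or 128MB (64-Bit).
        return 2048 + (128 if is_64bit else 16)
    raise ValueError(f"unknown dump type: {dump_type!r}")


def has_room_for_dump(dump_dir: str, dump_type: str, ram_mb: int, is_64bit: bool) -> bool:
    """Compare free space on the drive holding dump_dir with the requirement."""
    free_mb = shutil.disk_usage(dump_dir).free // (1024 * 1024)
    return free_mb >= required_free_mb(dump_type, ram_mb, is_64bit)


# Example: a Complete dump on a 24GB server needs 24577MB free.
print(required_free_mb("complete", 24 * 1024, is_64bit=True))  # 24577
```

To check a real server, call `has_room_for_dump` with the folder that will hold MEMORY.DMP, for example the `%SystemRoot%` folder.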

2.3. Disk Driver
People usually ask me why the disk driver is on this list. The reason will become clear as you read further down.

In practice, I encountered a Windows Server 2008 R2 server with a 60GB hard disk and 24GB of RAM. A small C:\ drive on a physical server, with all the applications installed on C:\ by the project engineer, starts to complicate supportability. The engineer decided to use the Windows Server 2008 R2 feature to move the page file and memory dump location to another drive.

To see whether this type of configuration actually works, I usually crash the server manually to verify that the configuration complies with the theory. I opened the Command Prompt and executed the NotMyFault /Crash command to simulate an actual crash using the MyFault.SYS driver. (You can download the tool from here.)
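
After the server reboots from the test crash, the first thing to confirm is that a fresh dump file really exists. The Python sketch below is one hypothetical way to check this. It assumes you noted the time just before running the crash command and that you know the dump path configured on the server. The function name is my own.

```python
import os


def dump_written_since(dump_path: str, crash_started_at: float) -> bool:
    """True if dump_path exists, is non-empty, and was modified at or after
    crash_started_at (seconds since the epoch, noted before the test crash)."""
    try:
        info = os.stat(dump_path)
    except FileNotFoundError:
        # No dump file at all: the crash dump test failed.
        return False
    return info.st_size > 0 and info.st_mtime >= crash_started_at


# Example usage on the server after it comes back up, assuming the default
# dump location and a timestamp you recorded yourself before the crash:
#   dump_written_since(os.path.expandvars(r"%SystemRoot%\MEMORY.DMP"), noted_time)
```

A stale MEMORY.DMP left over from an older crash would fail the timestamp comparison, so the check does not pass by accident.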

So this is what happens when your page file and memory dump location are set up on another location that is a LUN/SAN drive. Windows Server 2008 or later will crash and prepare to page out the memory to generate a dump. But because PageFile.SYS and MEMORY.DMP are configured on a LUN/SAN drive, the system loses its connection to the fabric for the LUN/SAN drive.

As a result, the system is unable to load the crash dump driver and therefore cannot page out to the LUN/SAN drive. This is why the disk driver is on the list above.

Conclusion
1. Never place your PageFile.SYS on a LUN/SAN drive
2. Never place your MEMORY.DMP on a LUN/SAN drive
3. Always include a Crash Dump Test as part of the system preparation and QA process before signing off to the production environment

Personally, I recommend that engineers size PageFile.SYS for a Complete dump when building a physical server. That way, in the event of a BSOD, you will be able to get a Complete dump to debug both User Mode and Kernel Mode memory. Resizing later is a painful process on a physical server because you cannot change the hardware easily. For a virtual server, it is fine to size PageFile.SYS for a Kernel dump to start with, because you can simply add more virtual disk space and increase the PageFile.SYS size when you require a Complete dump to debug a truly unknown issue.