Zero in on the Problem!
Last Revised March 24, 2005 (Version 2.1)
When a problem suddenly crops up in Windows, how do you start trying to understand it and fix it? The following ten keys should help you zero in on the problem in most cases. (Before trying them, you might want to use my Computer Health page to tune up your system; often, these are the easiest fixes!)
Before we hit the “Top 10” list below, we should remember the real Rule No. 1 of troubleshooting: Most problems are user-originated.
If the network is up and the computer boots, 90% of all remaining computer errors start on the space bar side of the keyboard.
This isn’t just techy hubris. We computer support professionals count ourselves among the “users” mentioned in this rule. I am certain that I have been personally responsible for at least 90% of the problems that have cropped up on my own computers. Not the hardware, not the software, not Windows, and not Bill Gates’ insidious plan to take over the world — just li’l ol’ me. Realizing this up front (and dispensing with blame and excuses) gives that extra bit of mental clarity which will make it easier to find the real cause — and to learn something useful!
Now, on to the troubleshooting steps!
Some computer problems appear one time and then never appear again. I call these “transient quirks” or “hit-and-run gremlins.” Don’t worry about them. Stuff like this just happens when you’re using a computer. If a problem doesn’t occur twice, then it is no longer a problem.
Second cousins to these “transient quirks” are matters that were never problems in the first place. These are situations where computer users rely on some measurement or informational tool on the computer that seems to say something is wrong, when there never really was a problem. One example common in Windows 95, 98, and ME is the observation that System Resources are, say, around 60%, and the concern that this indicates a problem; yet, on questioning, we usually find that, no, there hasn’t been a problem with the computer (which is no surprise). Or, in Windows 2000 and XP, a user may decide to dig through the Event Viewer, find all sorts of things listed in the logs, and worry about what the various events mean. Instead, I recommend consulting the Event logs only when there is an actual problem you’re trying to track.
“A problem isn’t a problem unless it’s a problem.” Don’t make a mountain out of a molehill. And don’t let the hit-and-run gremlins mow you down!
Define the presenting problem carefully. Describe actual behavior more than technical speculation. “While I was doing such-and-so, the following happened” is usually the best way to zero in. If you are helping someone else fix a computer problem, having them describe to you the exact steps that reproduce it is the fastest way to catch user error! (And haven’t you noticed that as soon as you start explaining a question to somebody else, you suddenly understand the answer?)
Good problem definition includes specifying whether the problem can be reliably replicated on the original computer, and, if so, whether it can be replicated on other computers also. Walk through, step-by-step, and, if possible, discover every keystroke and mouse click that is necessary to cause the problem to recur.
In one sense, this is just a restatement of the two previous recommendations. Nonetheless, it is a step that is all too frequently skipped. When I am working with an end-user by telephone or email in particular, if the problem isn’t clear in my head from the original description, I often will ask them to “walk me through it, click-by-click and keystroke-by-keystroke.” Much of the time this will even let me discover a solution in a program I’ve never seen and know nothing about! Much of the rest of the time, it opens the door to further productive questions. Even if you end up having to pass the problem on to someone else, you have an exceptionally useful description.
If a problem suddenly appears (with Windows, the network connection, or an application), restart the computer. This one step takes care of so many things that it’s silly not to try it.
This reboot trick is so helpful, in fact, that a standard joke in the IT world is that a company’s IT helpdesk is the department with the job of advising end-users to restart their computers. (Sometimes this joke is good-natured, and sometimes not. Sometimes it just isn’t justified... but, sadly, sometimes it is.) I’m about two-thirds serious when I recommend that all businesses install a shortcut on every computer called “Helpdesk Recovery Tool #1,” which is nothing more than a shortcut to reboot the computer! (To create such a shortcut, see my Shortcuts to Shutdown & Restart Windows pages.) User instructions would be: When a computer problem appears, close everything you can close, then click the “Helpdesk Recovery Tool #1” shortcut and let the repair utility finish processing; if that doesn’t solve the problem, then call the helpdesk! <bg> (Helpdesk calls would drop by an estimated 30-50%.)
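The shortcut itself needs no programming: on Windows XP and later, a desktop shortcut whose target is `shutdown -r -t 0` restarts the machine. The Python sketch below only illustrates how that command line is assembled. It assumes the `shutdown.exe` that ships with Windows XP and later (Windows 95, 98, and ME have no such tool), and the `restart_command` and `run_recovery_tool` names are hypothetical. It defaults to a dry run so that trying it out doesn’t reboot anything.

```python
import subprocess
import sys
from typing import List


def restart_command(delay_seconds: int = 0,
                    comment: str = "Helpdesk Recovery Tool #1") -> List[str]:
    """Build the shutdown.exe arguments for a restart.

    Assumes the shutdown.exe shipped with Windows XP and later:
    -r = restart, -t = delay in seconds, -c = comment shown to the user.
    """
    if delay_seconds < 0:
        raise ValueError("delay_seconds must not be negative")
    return ["shutdown", "-r", "-t", str(delay_seconds), "-c", comment]


def run_recovery_tool(dry_run: bool = True) -> List[str]:
    """Return the restart command; execute it only when dry_run is False on Windows."""
    cmd = restart_command()
    if not dry_run and sys.platform == "win32":
        subprocess.run(cmd, check=True)  # this really restarts the machine
    return cmd


if __name__ == "__main__":
    # Dry run: print the command line (quoted the way a shortcut target needs it).
    print(subprocess.list2cmdline(run_recovery_tool()))
```

The quoted command line it prints is what you would paste into the shortcut’s Target box; the Python wrapper adds nothing beyond that.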
Reboot Windows in Safe Mode. If the problem persists, you’ve ruled out a dozen or so possible causes; and if Safe Mode resolves it, you have a clear path to further troubleshooting using clean boot troubleshooting techniques. The “clean boot” method involves process-of-elimination of a particular list of items. See the appropriate articles for your Windows version on my Knowledge Base Links: Troubleshooting Strategies page.
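The process of elimination can be organized as a half-split search: enable only half of the remaining startup items, reboot, and see whether the problem is still there. This sketch is one way to organize that work, not necessarily the procedure the Microsoft clean boot articles prescribe. It assumes a single startup item causes the problem on its own; the `find_culprit` function is hypothetical, and the `problem_occurs` callback stands in for “reboot with only these items enabled and check.”

```python
from typing import Callable, List, Optional, Sequence


def find_culprit(items: Sequence[str],
                 problem_occurs: Callable[[List[str]], bool]) -> Optional[str]:
    """Half-split search for the one startup item that triggers the problem.

    Assumes exactly one item causes the problem by itself. Each call to
    problem_occurs() represents a reboot with only the given items enabled.
    """
    candidates = list(items)
    # If the problem doesn't appear even with everything enabled, the
    # startup items aren't the cause (or the problem isn't reproducible).
    if not candidates or not problem_occurs(candidates):
        return None
    while len(candidates) > 1:
        first_half = candidates[: len(candidates) // 2]
        if problem_occurs(first_half):
            candidates = first_half                          # culprit is in the enabled half
        else:
            candidates = candidates[len(candidates) // 2:]   # culprit is in the other half
    return candidates[0]
```

For example, with hypothetical items `["AntivirusTray", "Hotbar", "PrinterMonitor", "QuickTimeTask", "SoundMixer"]` and a test that fails whenever `"Hotbar"` is enabled, `find_culprit` narrows the list to `"Hotbar"`. With n items this takes roughly log2(n) test reboots after the first, instead of up to n when disabling items one at a time.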
An old rule of thumb stated that if a problem persists in Safe Mode, it’s a hardware problem; otherwise, it’s a software problem. That rule is too simplistic, but often can nudge you in the right direction. It is more reliable to say that if a Windows problem persists in Safe Mode, it is most likely a hardware problem or fundamental damage to Windows itself; and if it does not occur in Safe Mode then it is probably a software or driver problem. There are still exceptions — but this is a clue — and knowing the differences between Safe Mode and Normal Mode will help you narrow the field considerably.
If Safe Mode is used, startup programs are not launched and a standard VGA video driver is used. (No protected mode drivers are launched.) For a list of startup items not launched in Safe Mode, see my article, Startup Program Loading. For a list of the more complex differences between Safe Mode and Normal Mode in Windows 9x, see Step 8 here.
TO REBOOT IN SAFE MODE you need to restart the computer and bring up the Boot Menu. How to do this varies slightly with different versions of Windows.
By now, most computer users know that they have to protect against viruses. You need a good antivirus program running on your computer in real time, checking files as they are accessed, as well as running periodic scans of all files. You need to use an up-to-date virus definition file with this AV program (sometimes these are updated almost every day, so automated updating is preferable). The virus protection on your computer should be so solid that there is rarely any doubt that you are virus-free; the only room for doubt should be whether a new virus snuck in before your antivirus vendor released a definition file that would catch it. If you are suspicious, run your AV program to check the system as part of zeroing in on a problem that suddenly develops on your computer. You can also try one or more of the free online virus scanners listed on my Parasites & Other Security Issues page.
But there are also nonviral invaders that have become as big a problem as viruses. In fact (perhaps because people are less aware of these and less careful to protect themselves against them), these parasites may be an even greater risk to your computer’s smooth operation. Adware, spyware, browser hijackers, automatic diallers, and other forms of nonviral malware (some intentionally, if misguidedly, installed by the user; some foisted on you without your awareness) are, at least in a few cases, as destructive as viruses.
And, since they are often badly written, they commonly announce themselves unintentionally by breaking some functionality on the computer. Therefore, checking for these is an important early step in troubleshooting computer problems, especially if the problems appear suddenly. If there is a serious browser or Windows Explorer/My Computer problem not related to a bad or damaged browser install, failing hardware, or user error, 90% of the time the problem will be the result of one of these parasites. Because Internet Explorer is tightly integrated with the Windows shell in all Windows versions after Win95, these “browser problems” can show up as general performance degradation or as error conditions in the Windows shell. If you’ve ruled out the obvious in troubleshooting browser failures, the eruption of many error messages, inability to launch programs, or sudden (as opposed to gradual) serious slowing of your computer, checking for parasites should probably be your next diagnostic step.
For an orderly seven-step approach for identifying and removing these parasitic invaders, see my Quick Fix Protocol page. For more in-depth reading on the subject, see my page The Parasite Fight!
Several of these parasites are intentionally added to the computer by the user because the program looks like a cool toy. For example, Hotbar is a popular browser add-on that causes big problems on most computers. Many people install Gator (now renamed Claria) to manage online passwords. People install the insidious and pernicious IEPlugin to get “faster, smarter web browsing,” and live to regret it. And so forth. Other parasites are often snuck onto your computer without your knowledge. An important early step in all troubleshooting of Windows problems, therefore, is the isolation and removal of such parasites.
In one sense, this step belongs at the beginning, with defining the problem. I list it here instead because the previous steps have done all the easy work and, if they haven’t solved the problem, it’s time to roll up our shirt sleeves. For that, we need to be sure that we have a good history!
When did the problem begin? Does it occur consistently or only sometimes? Is there a pattern? Does it occur if you try the same task in an alternate way? (Windows usually has four or five ways to do the same task.)
What changes (additions, removals, new configurations) were made to the computer hardware, software, or operating system soon before the problem began? (Consider user-installed utilities and other malware among these changes.)
Has this problem occurred in the past? Was a solution found?
If the problem can be traced to a specific point in time, and to a particular event (say, the installation of a patch, program, or driver), reverse the change and test to see if the problem resolves. If you are using Windows ME or Windows XP, System Restore is a powerful tool to take you back to a time just before the problem started, reversing changes to the Registry and many other kinds of changes on your system. (Some other Windows versions have other native tools for recovering an earlier version of the Registry, and various third-party utilities exist.)
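Answering “what changed soon before the problem began?” amounts to filtering a change history by date. This sketch assumes you have kept (or can reconstruct) such a history; the `Change` record, the `suspect_changes` function, and the default three-day window are hypothetical choices, not features of System Restore. The most recent change is listed first, since it is usually the first one to reverse and test.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Change:
    when: datetime       # when the install, removal, or configuration change was made
    description: str     # e.g. "Installed printer driver"


def suspect_changes(changes: List[Change], problem_started: datetime,
                    window: timedelta = timedelta(days=3)) -> List[Change]:
    """Return changes made within `window` before the problem began, newest first."""
    earliest = problem_started - window
    recent = [c for c in changes if earliest <= c.when <= problem_started]
    return sorted(recent, key=lambda c: c.when, reverse=True)
```

Changes made after the problem first appeared are deliberately excluded: they can’t have caused it, though they may have made it worse.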
Nobody wants pain; but most of the time, pain serves the very useful function of letting you know that something is wrong! The same is true of error messages. Don’t curse them. Praise them! Don’t say you hate error messages. Say you love ’em. You don’t want computer errors, but when errors do occur, error messages are your “new best friends.”
Often an error message is the only thing that can tell you what is going on. Therefore, you want to get the most information from them that you can. Windows doesn’t provide this by default. (I suppose Microsoft understands how much computer users dislike seeing error messages, so more attention has been put on having the computer correct itself than on providing the user with diagnostic information.) You have to make a couple of changes in Windows to get the best information on your errors.
First, run Dr. Watson: put a shortcut to it in your Startup folder so that it launches every time Windows starts. I recommend this for all Windows versions that have a Dr. Watson, but it will be especially helpful in Windows 98 and ME. In those two versions, Dr. Watson is mature enough to be a very helpful diagnostic program, and you won’t get all of your error message data without it. Click the Details button on an error message to get more information. Record the error message verbatim: exactly and completely.
In Windows XP, the default is for Windows to restart itself when a sufficiently serious problem occurs (or to restart a failing component, such as the Explorer shell). It doesn’t display the error message; it just reboots. Disabling the “restart on system failure” feature may let you isolate the exact cause of your problem: Right-click My Computer, click Properties, and click the Advanced tab. Under “Startup & Recovery,” click Settings. Under “System Failure,” uncheck the box in front of “Automatically restart.” Once you can see the Stop messages, analyze them using my Stop Messages page.
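The “Automatically restart” checkbox corresponds to the `AutoReboot` value under `HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl` (1 means restart automatically, 0 means stop and show the message). The Python sketch below only reads that value, so you can confirm the setting took effect; changing it is safer through the dialog described above. The function names are hypothetical, and on anything other than Windows the sketch simply reports the value as unavailable.

```python
import sys
from typing import Optional

CRASH_CONTROL_KEY = r"SYSTEM\CurrentControlSet\Control\CrashControl"


def read_auto_reboot() -> Optional[int]:
    """Return the AutoReboot DWORD (1 or 0), or None if it can't be read."""
    if sys.platform != "win32":
        return None                      # the winreg module exists only on Windows
    import winreg
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CRASH_CONTROL_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, "AutoReboot")
            return int(value)
    except OSError:                      # key or value missing, or access denied
        return None


def describe_auto_reboot(value: Optional[int]) -> str:
    """Translate the AutoReboot value into what you'll see after a Stop error."""
    if value is None:
        return "unknown"
    if value:
        return "restarts automatically; the Stop message flashes by"
    return "stops and leaves the Stop message on screen"


if __name__ == "__main__":
    print(describe_auto_reboot(read_auto_reboot()))
```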
For all versions of Windows, you can find assistance with the various classes of error messages on my Knowledge Base Links: Windows Error Messages pages.
Know and use any other error diagnostic information available to you. This will vary with different Windows versions. For example, in Windows 2000 and XP, use the Event Viewer to view event diagnostics. (Logged errors are marked by a red circle with a white X.) The fastest way to the Event Viewer is to launch EventVwr.msc from a Run box. You can also get there by right-clicking My Computer, selecting Manage, then picking System Tools | Event Viewer. Another example of useful logged data is the boot log, which all Windows versions can create and which often proves helpful in startup (and some shutdown) issues. For Windows 9x, you can make its information more comprehensible with the excellent freeware Bootlog Analyser.
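If you save a log from the Event Viewer as a comma-delimited file, a few lines of Python can tally which sources logged errors. This sketch assumes the saved file begins with a header row naming a `Type` column and a `Source` column; those column names (and the `error_sources` function) are assumptions, so adjust them to match your actual export. A source that logs the same error over and over is often a good place to start looking.

```python
import csv
from collections import Counter
from typing import TextIO


def error_sources(log_file: TextIO) -> Counter:
    """Count Error entries per Source in an exported event log.

    Assumes a CSV header row with (at least) "Type" and "Source" columns.
    """
    counts: Counter = Counter()
    for row in csv.DictReader(log_file):
        # DictReader fills missing fields with None, hence the `or ""`.
        if (row.get("Type") or "").strip().lower() == "error":
            counts[(row.get("Source") or "").strip()] += 1
    return counts
```

For example, `with open("system_log.csv", newline="") as f: print(error_sources(f).most_common(5))` lists the five sources with the most logged errors (the file name here is only an illustration).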
Especially if the issue seems to be related to the Windows file system, the integrity of one or more files, or your ability to access information on the hard drive, run ScanDisk in Windows 95/98/ME, or ChkDsk in Windows 2000/XP.
Computer hangs, error messages, and similar failures that occur randomly and unpredictably are usually hardware failures. The test for randomness is whether there is reliable replication: whether you can identify the circumstances where the error occurred and replicate them more or less at will. For example, the following presenting problem is probably a software problem: “If I’m running Program X while online and try to paste text, the program throws me out and I eventually have to reboot to get it to work right.” However, it is probably a hardware issue if the problem is, “Sometimes when I try to paste text from the clipboard, Windows hangs or throws an error message, but it only happens sometimes and doesn’t seem to be with any particular program or any specific circumstances I can see.”
Of course, some hardware problems are replicable and nonrandom. However, random problems are almost always hardware issues. If you are in a support position, try to identify patterns the user may have missed, and to rule out possible contributing factors to narrow the focus. Windows troubleshooting is often like basic police work: Build the biggest list of suspects you can, then eliminate as many as possible and see what remains on the list!
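The randomness test can be written down as a rough heuristic: collect every circumstance present in all of the failures, then check how reliably the failure occurs whenever those shared circumstances are recreated. This is a hypothetical sketch (the `classify` function, the 90% threshold, and the circumstance labels are all assumptions), and, like the Safe Mode rule of thumb, it only nudges you in a direction. It also honors the earlier rule that a problem seen only once isn’t yet a problem.

```python
from typing import Iterable, List, Set, Tuple

Attempt = Tuple[Set[str], bool]   # (circumstances, did the failure occur?)


def classify(attempts: Iterable[Attempt],
             threshold: float = 0.9) -> Tuple[str, Set[str]]:
    """Rough heuristic: replicable failures suggest software, random ones hardware."""
    records: List[Attempt] = [(set(c), failed) for c, failed in attempts]
    failures = [c for c, failed in records if failed]
    if len(failures) < 2:
        return ("not yet a problem (seen at most once)", set())
    # Circumstances shared by every failure.
    shared = failures[0].intersection(*failures[1:])
    if not shared:
        return ("random: suspect hardware", set())
    # How often does the failure occur when all shared circumstances are present?
    outcomes = [failed for c, failed in records if shared <= c]
    rate = sum(outcomes) / len(outcomes)
    if rate >= threshold:
        return ("replicable: suspect software", shared)
    return ("random: suspect hardware", shared)
```

Fed the “Program X while online” attempts, it finds that the failure happens every time those three circumstances coincide and calls it replicable; fed the “only happens sometimes” clipboard attempts, the only shared circumstance is pasting, which fails less than half the time, so it points toward hardware.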
More tips and resources are scattered all over this Windows Support Center site. In particular, I suggest you look at my page on Computer Health and, more generally, all of the “Optimizing Your Computer” articles on my Articles & FAQ Files portal page.
If none of this solves the problem, ask for help! Post your question on the AumHa Forums. Having worked through the above steps, you are now equipped to provide a clear statement of the issue, provide the history and other background information you have gathered, and to itemize what you have tried. Write up the problem, its history, your best assessment of the situation, etc., then ask our volunteer troubleshooters for help.
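One way to make sure a forum post covers everything is to fill in a fixed outline. The `write_up` function and its section headings below are a hypothetical template, not a format the AumHa Forums require; it simply strings together the problem definition, the replication steps, the history, what you’ve tried, and your assessment.

```python
from typing import List


def write_up(summary: str, steps: List[str], history: List[str],
             tried: List[str], assessment: str) -> str:
    """Assemble a plain-text problem report for a help forum post."""
    lines = [f"Problem: {summary}", "", "Steps to reproduce:"]
    lines += [f"  {n}. {step}" for n, step in enumerate(steps, start=1)]
    lines += ["", "History:"] + [f"  - {item}" for item in history]
    lines += ["", "What I have tried:"] + [f"  - {item}" for item in tried]
    lines += ["", f"My assessment: {assessment}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(write_up(
        "Program X closes when I paste text while online",
        ["Connect to the Internet", "Open Program X", "Press Ctrl+V"],
        ["Started after installing a browser toolbar"],
        ["Restarted", "Tested in Safe Mode: problem does not occur",
         "Scanned for viruses and parasites"],
        "Probably a software conflict with the toolbar",
    ))
```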
(Thanks go to MS-MVPs Alex Nichol and Ron Martell for valuable ideas I have incorporated into this page.)