When overclocking a chip, there are a few considerations to make. A lot of people know that an increase of the bus speed gives you higher CPU speed, higher bandwidth on RAM, and in fact an overall increase in bandwidth of your other peripherals as well. Many of you are also aware of the result of way-beyond-spec PCI and AGP speeds: corrupt data on your hard drive, even defective drives, and general system instability. Some of you probably also know that when you have a chip that "almost does 850MHz" you can increase the voltage a bit to compensate, and it will probably run at the desired speed -- for a while.
But what happens to a chip when you overclock it, run it above its rated speed and don't provide it with proper cooling? What is the long-term damage?
This is what will be discussed in depth in this article.
What is electromigration?
Harris Semiconductor Lexicon of technical terms puts it this way:
"Motion of ions of a metal conductor (such as aluminum) in response to the passage of high current through it. Such motion can lead to the formation of "voids" in the conductor, which can grow to a size where the conductor is unable to pass current. Electromigration is aggravated at high temperature and high current density and therefore is a reliability "wear-out" process. Electromigration is minimized by limiting current densities and by adding metal impurities such as copper or titanium to the aluminum."
Electromigration is an effect that occurs when an extremely dense electron flow knocks off atoms within the wire and moves them, leaving a gap at one end and high stress at the other. In a chip, the formation of such a void will cause an open circuit and result in a failure. At the other end, the increase of stresses can cause fracture of the insulator around the wire and shorting.
Fig. 1 Schematic diagram showing drifting of the cathode and mass accumulation at the anode.
What this amounts to is that when an electrical current passes through a conductor, some of the metal atoms are swept along with the flow of electrons.
Fig. 2 The scanning electron microscope (SEM) picture below shows the real-world effects of electromigration.
The smaller we make the conductor, the stronger the effect becomes. And the trend in CMOS technology is that conductors get smaller and smaller every year. As the cores shrink, the operating frequencies increase: CPU speeds have risen by 30% per year on average over the last 15 years, according to the Semiconductor Industry Association.
The increase will continue exponentially. In 1999, the Semiconductor Industry Association predicted shrinkage in the core from the current 0,18 micron to 0,13 micron by the end of 2003. Even as I type this, I read that Intel and AMD seem to be a little bit ahead of the SIA's predictions.
So far, it seems as if the problems regarding electromigration will increase at the same rate as the CPU frequencies.
In the mid-1990s, IBM found a way to use copper for interconnects instead of aluminum without the copper atoms migrating out of the wire into the surrounding chip material. This technique is known as dual Damascene Cu.
Most CPU designs nowadays are moving over to copper interconnect technology because of speed and price considerations, and this changeover carries a hidden bonus. Research has shown that dual Damascene Cu has a much higher resistance to electromigration than the previously used aluminum interconnect technology. This is clearly illustrated in the image below:
The two metals are exposed to the exact same conditions, yet the aluminum interconnect has lost almost 1/3 of its length while the copper interconnect is barely affected.
This is an extract from an IBM report comparing copper and aluminum interconnect wiring:
"The integrity of the damascene-copper process was evaluated by fabricating 288-Kb SRAMs and performing the standard package thermal cycle and temperature/humidity/bias stresses. In addition, we performed electromigration and stress-migration lifetime measurements of 300 nm wide / 400 nm high wires. For all stresses, subtractive-aluminum control wafers and chips were also fabricated as a reference. The yields for the packaging stresses were excellent, with 0% fails for the damascene-copper chips and 0.4% for the subtractive-aluminum-control chips. Figures 1 and 2 show electromigration and stress-migration data for the 300 nm multilevel wires. Compared with the subtractive-aluminum wires, the damascene-copper wires exhibited more than two orders of magnitude improved reliability. Based on these reliability data, we believe that damascene-copper wiring is fundamentally more reliable than subtractive-aluminum wiring."
2.3V 100°C 5hrs
2.7V 140°C 5000hrs
0 - 125°C thermal cycle 20X
-40 - 150°C thermal cycle 200X
-65 - 150°C thermal cycle 1000X
-160 - 300°C thermal cycle 200X
288-Kb SRAM Functional Stress Results for Subtractive-Aluminum and Damascene-Copper Interconnects.
Fig. 3 Electromigration Data at 295°C and 2.5 MA/cm2 for Damascene-Copper vs. Subtractive-Aluminum.
Fig. 4 Stress-Migration Test Structure Resistance vs. Time for Damascene-Copper and Subtractive-Aluminum
The important data for us is the electromigration data shown in Fig. 3. The mean time to failure of the copper interconnects is more than 100 times longer than that of the aluminum interconnects (150 hours compared to 1.3 hours)!
There have been a lot of scientific papers published on the effects of electromigration, although nearly all of them focus on forced electromigration for chip test purposes. There is almost no literature or documentation on the effects of electromigration either during normal chip use or in overclocked chips. But when reading about forced electromigration, one can clearly see the similarities with overclocked chips: the higher the temperature, the higher the probability that electromigration will occur.
Voltage and heat:
As mentioned earlier, when you have a stubborn CPU, you might want to increase the voltage a bit.
Fig. 5 The relationship between maximum operating frequency and supply voltage.
This figure shows us the relationship between maximum operating frequency and supply voltage. The maximum operating frequency is proportional to (V - Vth)^1.25 / V, where V is the supply voltage and we assume the threshold voltage Vth is 0.6V. Between 1V and 3V, the operating frequency is approximately proportional to the supply voltage, meaning that if you have a CPU that does 850MHz at 1.5V, you will most likely be able to run it at about 1.0GHz to 1.13GHz when you increase its core voltage to 2.0V.
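The 850MHz example can be checked with a few lines of Python. This is only a sketch of the scaling rule quoted above, using the exponent 1.25 and the assumed threshold voltage of 0.6V from the text; the function names are made up for this illustration, and a real chip will not follow the model exactly.

```python
V_TH = 0.6     # threshold voltage in volts, as assumed in the text
ALPHA = 1.25   # exponent of the scaling rule quoted in the text


def relative_fmax(v_core, v_th=V_TH, alpha=ALPHA):
    """Return a number proportional to the maximum operating frequency."""
    if v_core <= v_th:
        raise ValueError("core voltage must be above the threshold voltage")
    return (v_core - v_th) ** alpha / v_core


def scaled_frequency(f_old_mhz, v_old, v_new):
    """Estimate the new maximum frequency after changing the core voltage."""
    return f_old_mhz * relative_fmax(v_new) / relative_fmax(v_old)


if __name__ == "__main__":
    estimate = scaled_frequency(850.0, 1.5, 2.0)
    print(f"Estimated maximum at 2.0V: {estimate:.0f} MHz")
```

The estimate lands inside the 1.0GHz to 1.13GHz range given above; the upper end of that range is what you get if you treat frequency as strictly proportional to voltage.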
Increasing the core voltage automatically means higher power dissipation. Dynamic power scales roughly with the square of the voltage times the frequency, so doubling the voltage may let you double the frequency, but it also raises the total power output to about 800% of the original (2 x 2 x 2 = 8). A CPU that originally emits 25W will, at double voltage and double speed, radiate 200W of heat!
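The jump from 25W to 200W follows from the usual first-order rule that dynamic CMOS power is proportional to the voltage squared times the frequency. The sketch below applies that rule; it assumes the switched capacitance stays constant and ignores leakage, and the function name is invented for this example.

```python
def scaled_power(p_old_watts, v_old, v_new, f_old, f_new):
    """Scale dynamic power with P ~ V^2 * f (capacitance assumed constant)."""
    voltage_factor = (v_new / v_old) ** 2
    frequency_factor = f_new / f_old
    return p_old_watts * voltage_factor * frequency_factor


if __name__ == "__main__":
    # Double the voltage and double the frequency of a 25W chip.
    watts = scaled_power(25.0, v_old=1.0, v_new=2.0, f_old=1.0, f_new=2.0)
    print(f"Power at double voltage and speed: {watts:.0f} W")
```

Only the ratios matter here, so the voltages and frequencies can be given in any unit as long as old and new values use the same one.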
How to get rid of the heat:
Now that we have learned that higher voltage and frequencies result in higher temperatures, it is time to have a look at the countermeasures to it:
CPU cooling becomes an important issue when overclocking. In the last few years, we have seen a trend towards big and expensive heatsinks and other, even more sophisticated cooling devices on the commercial market. When I first entered the world of overclocking, there was no such thing as a copper in-laid heatsink with dual, high-CFM fans on it. Everything had to be hand made, and the first time I saw a slot1 Alpha heatsink, I was in awe.
Today, the different companies fight against each other to deliver the best heatsinks ever available; CPU cooling has become a science, and it has become big business.
Good cooling not only reduces the risk of electromigration, it also lets you run your CPU much more stably and most likely at a higher speed.
I realize that the average overclocker does it partly for an extra, and free, bang for the buck and maybe for the sport of it. That means it is very unlikely that those of you who are doing this for the first time will shell out $40-50 for a copper heatsink; you will more likely go for a more cost-effective solution.
A heatsink is supposed to dissipate the heat from the CPU to the surrounding air.
Many hours have been spent trying to find the "secret formula" for the ultimate heatsink. There are, however, a few common things to look for when you are going to put your greens into a new heatsink.
The larger its surface area, the better it dissipates the heat generated by your CPU. What you want is a heatsink with lots of fins or pins and a shroud to prevent the air from looping.
A heatsink's efficiency is expressed as its thermal resistance, measured in K/W or C/W, where K=Kelvin and C=Celsius: the number of degrees the temperature rises for every watt of heat. Lower K/W is better.
Let's assume we have a 0.4 K/W heatsink and we want to overclock our PIII 700 to 931. At 931, the PIII 700 puts out approximately 35W with the core at 1,8V:
0.4 x 35 = 14 degrees higher than your case temperature.
If you have a lot of high-RPM hard drives, DVD players and other heat-producing devices, and a case that is not vented properly, you might have case temps of 30C to 40C.
This gives us a CPU temperature at: 40C + 0.4 x 35 = 54C.
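The same arithmetic is easy to repeat for other heatsinks and case temperatures. The Python sketch below simply restates the calculation above; the function names are made up for this example, and the second function (which turns the formula around to find the thermal resistance you need for a target temperature) is an addition of mine, not something taken from a datasheet.

```python
def cpu_temperature(case_temp_c, k_per_watt, cpu_watts):
    """CPU temperature = case temperature + thermal resistance * power."""
    return case_temp_c + k_per_watt * cpu_watts


def required_k_per_watt(target_temp_c, case_temp_c, cpu_watts):
    """Highest thermal resistance that still keeps the CPU at the target."""
    if cpu_watts <= 0:
        raise ValueError("CPU power must be positive")
    return (target_temp_c - case_temp_c) / cpu_watts


if __name__ == "__main__":
    # The example from the text: 0.4 K/W heatsink, 35W CPU, 40C case.
    print(f"CPU temperature: {cpu_temperature(40.0, 0.4, 35.0):.1f} C")
    # What heatsink rating would hold the same CPU at 50C instead?
    print(f"Needed rating: {required_k_per_watt(50.0, 40.0, 35.0):.2f} K/W")
```

Note that lowering the case temperature helps just as much as buying a better heatsink: every degree taken off the case comes straight off the CPU.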
It is obvious that such a high temperature is not desirable, and what you must look for here are other solutions to bring the temperature down.
A lot of retail heatsinks nowadays are being sold with a thermal pad attached to their bases. Remove that one and use thermal paste instead. But keep in mind that certain CPUs have a fragile "slug". Apply a thin and even layer of thermal paste, and your CPU temps might drop by a few to several degrees.
Another highly recommended, and free, way to increase the efficiency of a cooler is to reduce the thermal resistance between the CPU and the heatsink by lapping the heatsink. I did this on a retail Intel heatsink just for fun, and I was shocked at how concave it really was. Not only did the temperatures drop by 3°C, but it also let me run my CeleronII566 at 918MHz instead of 892MHz!
When it comes to lapping the CPU core itself, I would be a little more careful, both because of ESD and, as I just mentioned, the vulnerability of certain CPUs.
Case cooling is the most effective way to achieve all round lower temperatures, and it's quite easy too.
A normal computer case comes with one fan in it: the one inside the power supply, which is normally located at or near the top of the case. First, check the direction of that fan. We all know that hot air rises. Therefore, it is important that the fan inside the power supply blows the already heated air out of the case, not the other way around.
If it doesn't, and if you don't mind doing so, I would recommend flipping the fan so that the airflow goes in the right direction. Keep in mind that this procedure will void any warranty on that particular part.
If your power supply fan does suck air into the case, you will end up with air that has passed through the heatsink inside your power supply being blown into the top of your case -- and it stays there.
Since hot air doesn't sink to the bottom of your case, all the hot air from the power supply and from all your other peripherals gets trapped in the top of your case, and your case temperature keeps rising.
Next, you should get a second fan and put it in the lower front of the case, sucking cold air in. I would recommend a big fan here; a 4,5" DC12V fan running at 5-6V provides you with all the airflow you need, plus it is super silent compared to its smaller counterparts. Doing these two things is guaranteed to lower your case temperatures, and in turn your CPU temperatures, by several degrees.
Finally, there are two easy and absolutely free ways of improving the airflow inside your case, and both come down to getting rid of whatever obstructs it. IDE, SCSI and power cables have a bad tendency to end up hanging around everywhere. My best advice is that you open your case and have a look inside.
Fig. 6 A tidy case improves your airflow.
If it looks like this, I would say you're on the safe side. If not, I would highly recommend that you think about cleaning up.
Fig. 7 Split UDMA33 cables.
Splitting the IDE cables is very easy, and it frees up a lot of space. A clean case is not only about aesthetics; it really helps cool your case.
A chip's design is tested for environmental stresses such as temperature and humidity before being released to production, and this testing includes electromigration testing. Because of this testing, electromigration is not really an issue for "normal" chip users.
Overclocked chips are not running within normal operating specifications and thus, because of the higher voltages and temperatures, run a higher risk of electromigration than chips running at normal specifications.
There is no data available to show exactly how much of an effect this will have on chip lifetime. But we can assume, taking into account all the additional stresses of running at increased clock frequencies, voltages and temperatures, that the lifetime will be reduced, perhaps by a factor of 10 to 100 compared with that of a "normal" chip.
For normal users of microchips, electromigration is not an issue, especially with the new copper chips that are being released to the market. However, for people who are overclocking their chips, one thing becomes clear: The higher the temperature and voltage within the conductor, the faster the metal atoms will move, and the faster the chip will fail due to electromigration. There is not much we can do about this, as there is really only one factor we can change - the temperature.
If we lower the temperature of the chip, we lower the energy of the atoms within the interconnects of the chip. This means that it takes a lot more energy to get the metal atoms to move, and hence the likelihood of electromigration occurring is significantly reduced.
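To get a feel for how strongly temperature matters, one commonly used model for electromigration lifetime is Black's equation, in which the mean time to failure is proportional to J^-n x exp(Ea / kT), with J the current density, T the absolute temperature and k Boltzmann's constant. The Python sketch below uses it to compare two operating points. The activation energy of 0.7 eV and the exponent n = 2 are placeholder values I have assumed purely for illustration; they depend on the metal and the process, they do not come from the IBM data quoted earlier, and the result should be read as a rough trend, not a prediction for any real chip.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def lifetime_ratio(temp_ref_c, temp_new_c, current_ratio=1.0,
                   activation_energy_ev=0.7, current_exponent=2.0):
    """Return MTTF(new) / MTTF(reference) according to Black's equation.

    current_ratio is J(new) / J(reference). The default activation energy
    and exponent are assumed example values, not measured ones.
    """
    t_ref_k = temp_ref_c + 273.15
    t_new_k = temp_new_c + 273.15
    thermal_term = math.exp(
        activation_energy_ev / BOLTZMANN_EV_PER_K
        * (1.0 / t_new_k - 1.0 / t_ref_k)
    )
    current_term = current_ratio ** (-current_exponent)
    return thermal_term * current_term


if __name__ == "__main__":
    hotter = lifetime_ratio(54.0, 74.0)
    cooler = lifetime_ratio(54.0, 44.0)
    print(f"Lifetime at 74C relative to 54C: {hotter:.2f}x")
    print(f"Lifetime at 44C relative to 54C: {cooler:.2f}x")
```

A ratio below 1 means the chip is expected to wear out sooner than at the reference temperature, and a ratio above 1 means it should last longer, which is the point made above: every degree you take off the core works in your favor.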
Thanks to Richard Cooper at Canon Europe N.V. for providing me with information.