|How SMP Really Works
By: Sverre Sjøthun, July 8, 2003
In dual processor x86 systems, the two processors operate independently of each other and are able to function simultaneously. But there are many myths and conflicting beliefs about how SMP works. Read on to find out how it really works.
In dual processor x86 systems, the two processors operate independently of each other and are able to function simultaneously. This is made possible through the use of a special hardware controller called an APIC (advanced programmable interrupt controller). Through this piece of hardware (a separate chip on P5-class Pentium systems, and integrated into the processor on the Pentium Pro and later) the operating system can schedule jobs on the processors in the system.
The Operating System
The operating system is responsible for communicating with an APIC (should one exist) to determine whether more than one processor is installed in the system. It can then schedule processes on those processors through memory-mapped I/O and interrupts to the APIC.
Linux, Windows NT, Windows 2000, BeOS, and OS/2 are the most common operating systems that support multiprocessing through the use of the APIC. In addition to their normal scheduling routine, they can schedule threads from various applications and system processes on the available processors and thus achieve better scalability.
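A minimal sketch (in Python, not from the article) of what this looks like from an application's point of view: the program asks the OS how many logical processors it exposes and fans work out across them, one worker thread per chunk. Which physical CPU each thread actually runs on is decided by the OS scheduler, via the APIC on x86.

```python
import os
import threading

def parallel_sum(chunks):
    # One worker thread per chunk; each writes its partial sum into
    # its own slot, so no lock is needed on the results list.
    results = [0] * len(chunks)

    def worker(i, chunk):
        results[i] = sum(chunk)

    threads = [threading.Thread(target=worker, args=(i, c))
               for i, c in enumerate(chunks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)

n_cpus = os.cpu_count() or 1          # logical processors visible to the OS
data = list(range(1000))
chunk_size = -(-len(data) // n_cpus)  # ceiling division
chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
total = parallel_sum(chunks)          # 499500, same as sum(data)
```

On a dual-processor machine the OS is free to run two of these workers truly simultaneously, one on each CPU.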
A Second Processor
Adding a second processor does not necessarily translate into a 100% performance improvement, for two reasons. The first is that most modern operating systems and applications do not lend themselves very well to multiprocessing. It's a rare case where a single application can achieve more than a 50 or 60% speedup through the use of multithreading. The processors simply run out of things to do. However, the margin between theoretical speedup and actual speedup decreases as the number of threads that use significant processor time increases.
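Amdahl's law (a standard result, not named in the article) makes this limit concrete: if a fraction p of the work can be parallelized across n processors, the overall speedup is 1 / ((1 - p) + p / n).

```python
def amdahl_speedup(p, n):
    # p: fraction of the work that can run in parallel (0..1)
    # n: number of processors
    return 1.0 / ((1.0 - p) + p / n)

# Even if 75% of an application's work can use both CPUs at once,
# a second processor yields only a 1.6x speedup -- in the same
# ballpark as the 50-60% figure above.
two_cpu = amdahl_speedup(0.75, 2)   # 1.6
```

As p approaches 1 (many independent, CPU-hungry threads), the speedup approaches n, which is why the gap narrows as more busy threads are added.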
Scalability on Different Platforms
But the second factor that limits performance is inescapable: both processors share the same bus, I/O, and memory. Competition between the two processors for regions of memory, or for locks on the PCI bus, limits the resources available to each processor. Cache coherency protocols, memory locking schemes like semaphores, and the distribution of interrupt requests between the processors all bring overhead to the party. As most of us know by now, AMD uses the Digital Alpha EV6 bus protocol. This is a lot better than Intel's GTL+, which shares one bus between the CPUs. In an Intel setup, the more CPUs, the less bandwidth for each processor. In an AMD setup, each CPU has a separate bus, and there is no decrease in bandwidth.
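A back-of-the-envelope sketch of the bus difference described above (the bandwidth figure is a round illustrative number, not a real chipset specification):

```python
def per_cpu_bandwidth(link_mb_s, n_cpus, shared):
    # Shared bus (Intel GTL+ style): all CPUs divide one bus.
    # Point-to-point (AMD/EV6 style): each CPU keeps its own full link.
    return link_mb_s / n_cpus if shared else link_mb_s

bus = 1600  # MB/s per link, illustrative only
shared_2way = per_cpu_bandwidth(bus, 2, shared=True)       # 800 MB/s each
point_to_point = per_cpu_bandwidth(bus, 2, shared=False)   # 1600 MB/s each
```

The gap widens with processor count: on a shared bus a quad setup leaves each CPU a quarter of the bandwidth, while point-to-point links keep the full figure per CPU.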
Dual systems or even quad systems are desirable not for a linear speedup across the board, but for their ability to handle large loads, such as servers that need to run hundreds or thousands of threads simultaneously. Each network connection a server accepts typically requires a new thread, and on multiprocessor systems, the extra processing power available for parallelization makes the system ideal for handling these tasks.
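The thread-per-connection model described above can be sketched with Python's standard socketserver module (an illustrative example, not from the article): the ThreadingTCPServer mix-in spawns a new thread for every accepted connection, so on an SMP machine independent connections can be serviced in parallel.

```python
import socket
import socketserver
import threading

class EchoHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # Each instance of this handler runs in its own thread,
        # created by ThreadingTCPServer when a connection arrives.
        data = self.request.recv(1024)
        self.request.sendall(data)

def run_echo_demo(payload):
    # Start the server on an ephemeral port, send one request,
    # read the echoed reply, then shut the server down.
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), EchoHandler)
    host, port = server.server_address
    t = threading.Thread(target=server.serve_forever, daemon=True)
    t.start()
    try:
        with socket.create_connection((host, port)) as sock:
            sock.sendall(payload)
            reply = sock.recv(1024)
    finally:
        server.shutdown()
        server.server_close()
    return reply
```

With one client the extra CPU sits idle, but with hundreds of concurrent connections the OS can spread the handler threads across all processors.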