The IBM processor's cache layout and organisation is as follows: each core has a 64 KB, four-way set-associative instruction cache and a 64 KB, eight-way set-associative data cache with a two-stage pipeline supporting two independent 32-bit reads or one 64-bit write per cycle. Each core also has a semi-private 4 MiB unified L2 cache; the cache is assigned to a specific core, but the other core has fast access to it. The two cores share a 32 MiB L3 cache, which sits off-die and is connected over an 80 GB/s bus.

The AMD Opteron has four cores, each containing its own individual L1 and L2 cache blocks, all tied together by a large shared L3 block and the DIMM controller. It is easy for simple single-threaded processes to swap data between the cores, because that L3 cache is essentially a bridge tying all of the cores together. The Phenom II from AMD shares a lot of organisational similarities with this design. But what happens when it comes to multi-threading? In concept, the data sharing is still easy via the shared L3, but L3 is slower than L2, and L2 is slower than L1, so in terms of efficiency this processor design is best at running lots of interconnected single-threaded processes; as more and more cores are brought into multi-threading, the L3 and its speed become the controlling factor in how well those processes can function. AMD is trying to push past this, following Microsoft and the new thread-management software that will be embedded in the Windows 8 kernel.

The Bulldozer architecture, like Intel's Sandy Bridge processor, arranges its cores in sets of two. Each core has its own L1 cache, but the L2 and L3 are shared by the two cores in a set. There is no common memory bank like the L3 in the older designs, but there is a common memory switch, allowing the L3 banks to communicate with each other without having to reach out to the comparatively slow main memory bus. What this relationship creates is an environment where the operating system has to be smart about how it spreads out multi-threaded tasks: a process should be assigned to cores 1 and 2, so that its threads can operate on the shared cache, rather than to cores 1 and 3, where data would need to be swapped across the L3 switch. The Windows 8 Developer's Preview has been shown to handle this very well, with benchmarkers seeing a 10-15% improvement in overall performance, where the Sandy Bridge and Phenom II showed a very similar 6-8% drop in performance. Recent Linux kernel releases have also picked up this kind of scheduled multi-threading, and the FX has seen similar performance increases with these as well.
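An application can also give the scheduler this kind of hint itself by pinning cooperating threads onto cores that share a cache level. Below is a minimal Linux sketch using pthread_setaffinity_np; the choice of cores 0 and 1 is only an assumption about which two logical CPUs sit in the same set and share a cache, and real code would read the machine's topology from the operating system before pinning.

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    /* Pin the calling thread to one core; the core numbers passed in from
       main() assume cores 0 and 1 share a cache level, which is only an
       illustrative guess about the machine's topology. */
    static void *worker(void *arg)
    {
        long core = (long)arg;
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET((int)core, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
        printf("thread pinned to core %ld\n", core);
        /* ... work on data shared with the sibling thread ... */
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, worker, (void *)0L);
        pthread_create(&b, NULL, worker, (void *)1L);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
    }

Compiled with gcc -pthread, the two threads stay on the chosen pair of cores, so the data they exchange can remain in the cache they share instead of being swapped across the L3 switch.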
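Going back to the cache geometries described at the start, a short sketch can show how an address breaks down into tag, set and offset for caches of those sizes. The 64 KB capacities and the four-way and eight-way associativities come from the description above; the 64-byte line size is an assumption made only for the example, since it is not stated.

    #include <stdio.h>
    #include <stdint.h>

    /* Decompose an address for a set-associative cache.
       sets = capacity / (ways * line size); the low bits select the byte
       within a line, the next bits select the set, and the rest form the tag. */
    static void decode(uint64_t addr, unsigned cache_bytes,
                       unsigned ways, unsigned line_bytes)
    {
        unsigned sets   = cache_bytes / (ways * line_bytes);
        unsigned offset = (unsigned)(addr % line_bytes);
        unsigned set    = (unsigned)((addr / line_bytes) % sets);
        uint64_t tag    = addr / ((uint64_t)line_bytes * sets);

        printf("addr 0x%llx -> tag 0x%llx, set %u of %u, offset %u\n",
               (unsigned long long)addr, (unsigned long long)tag,
               set, sets, offset);
    }

    int main(void)
    {
        uint64_t addr = 0x12345678;      /* arbitrary example address */
        decode(addr, 64 * 1024, 4, 64);  /* 64 KB, four-way instruction cache */
        decode(addr, 64 * 1024, 8, 64);  /* 64 KB, eight-way data cache */
        return 0;
    }

With a 64-byte line the four-way instruction cache has 256 sets and the eight-way data cache has 128, so the same address can index a different set number in each of the two caches.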
