Given:
• 20% of branches are unconditional branches
Of conditional branches,
• 66% branch forward & are evenly split between taken & not taken
And the rest
• branch backwards & are always taken

Example 2:
What is the contribution to CPI of conditional branch stalls, given:
• 15% branch frequency
• A BTB for conditional branches only, with a 10% miss rate and a 3-cycle miss penalty
• 92% prediction accuracy and a 7-cycle misprediction penalty
• Base CPI is 1
• What is the hit rate? 90%

BTB result   Prediction   Frequency (per instruction)   Penalty (cycles)   Stalls
Miss         -            .15 * .10 = .015              3                  0.045
Hit          correct      .15 * .90 * .92 = .124        0                  0
Hit          incorrect    .15 * .90 * .08 = .011        7                  0.076
Total contribution to CPI                                                  0.121
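
The table above can be checked numerically. The following is a minimal sketch, not part of the original slides; the function name `btb_stall_cpi` and its parameter names are hypothetical, chosen only for illustration. It assumes, as the table does, that a BTB hit with a correct prediction costs no stall cycles.

```python
def btb_stall_cpi(branch_freq, hit_rate, accuracy, miss_penalty, mispredict_penalty):
    """Stall cycles per instruction caused by conditional branches with a BTB.

    Assumption (from the table): a BTB hit with a correct prediction costs 0 cycles.
    """
    # BTB miss: the branch is not in the buffer, so the miss penalty is paid.
    miss_stalls = branch_freq * (1 - hit_rate) * miss_penalty
    # BTB hit but wrong prediction: the misprediction penalty is paid.
    wrong_stalls = branch_freq * hit_rate * (1 - accuracy) * mispredict_penalty
    return miss_stalls + wrong_stalls


# Example 2 values: 15% branches, 90% hit rate, 92% accuracy, penalties 3 and 7.
example2 = btb_stall_cpi(0.15, 0.90, 0.92, 3, 7)
print(f"Contribution to CPI: {example2:.4f}")  # 0.1206, shown as 0.121 in the table
```

Because the function is parameterized, other BTB configurations can be evaluated the same way by changing the arguments.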

Example 3:
• Suppose we have a deeply pipelined processor, for which we implemented a branch target buffer (BTB) for conditional branches only.
• Assume that the misprediction penalty is always 4 cycles and the BTB miss penalty is always 3 cycles.
• Assume 15% branch frequency, 90% hit rate, and 80% accuracy.
• How much faster is the processor with the branch target buffer versus a processor that has a fixed 2-cycle branch penalty? Compute CPI without BTB and CPI with BTB.

Example 4:
• Assume a processor with a standard five-stage pipeline (IF, ID, EX, MEM, WB) and a branch prediction unit (a branch history table) in the ID-stage. Branch resolution is performed in the EX-stage.
There are four cases for conditional branches:
• The branch is not taken and correctly predicted as not taken (NT/PNT)
• The branch is not taken and predicted as taken (NT/PT)
• The branch is taken and predicted as not taken (T/PNT)
• The branch is taken and correctly predicted as taken (T/PT)
• Suppose that the branch penalties with this design are:
  • NT/PNT: 0 cycles
  • T/PT: 1 cycle
  • NT/PT, T/PNT: 2 cycles

Example 4 continued:
a) Calculate the average CPI for the processor assuming a base CPI of 1.2. Assume 20% conditional branches and that 65% of these are taken on average. Assume further that the branch prediction unit mispredicts 12% of the conditional branches.
b) In order to increase the clock frequency from 500 MHz to 600 MHz, a designer splits the IF-stage into two stages, IF1 and IF2. This makes it easier for the instruction cache to deliver instructions in time. This also affects the branch penalties for the branch prediction unit as follows:
  • NT/PNT: 0 cycles
  • T/PT: 2 cycles
  • NT/PT, T/PNT: 3 cycles
c) How much faster is this new processor than the previous that runs on 500 MHz?
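
One possible way to work through Examples 3 and 4 numerically is sketched below. This is not the slides' own solution, and it rests on assumptions the slides do not state: for Example 3, a base CPI of 1 (borrowed from Example 2), and every branch paying the fixed 2-cycle penalty on the processor without a BTB; for Example 4, a 12% misprediction rate that applies equally to taken and not-taken branches. The helper name `average_cpi` and the variable names are hypothetical.

```python
# Example 3. Assumption: base CPI of 1, as in Example 2 (not stated in Example 3).
BASE_CPI_EX3 = 1.0
BRANCH_FREQ_EX3 = 0.15

# Without a BTB, every branch is assumed to pay the fixed 2-cycle penalty.
cpi_no_btb = BASE_CPI_EX3 + BRANCH_FREQ_EX3 * 2
# With a BTB: 10% misses cost 3 cycles; 90% hits are wrong 20% of the time (4 cycles).
cpi_btb = BASE_CPI_EX3 + BRANCH_FREQ_EX3 * ((1 - 0.90) * 3 + 0.90 * (1 - 0.80) * 4)
speedup_ex3 = cpi_no_btb / cpi_btb


# Example 4.
def average_cpi(base_cpi, branch_freq, taken, mispredict, penalties):
    """Average CPI given per-case branch penalties (dict: case name -> cycles).

    Assumption: mispredictions are independent of the branch direction.
    """
    case_prob = {
        "NT/PNT": (1 - taken) * (1 - mispredict),
        "NT/PT": (1 - taken) * mispredict,
        "T/PNT": taken * mispredict,
        "T/PT": taken * (1 - mispredict),
    }
    avg_penalty = sum(case_prob[c] * penalties[c] for c in case_prob)
    return base_cpi + branch_freq * avg_penalty


# Part a: original five-stage pipeline at 500 MHz.
cpi_a = average_cpi(1.2, 0.20, 0.65, 0.12,
                    {"NT/PNT": 0, "T/PT": 1, "NT/PT": 2, "T/PNT": 2})
# Part b: IF split into IF1/IF2, higher penalties, 600 MHz clock.
cpi_b = average_cpi(1.2, 0.20, 0.65, 0.12,
                    {"NT/PNT": 0, "T/PT": 2, "NT/PT": 3, "T/PNT": 3})
# Part c: compare time per instruction = CPI / clock frequency.
speedup_ex4 = (cpi_a / 500e6) / (cpi_b / 600e6)

print(f"Example 3: CPI {cpi_no_btb:.3f} vs {cpi_btb:.3f}, speedup {speedup_ex3:.3f}")
print(f"Example 4: CPI {cpi_a:.4f} vs {cpi_b:.4f}, speedup {speedup_ex4:.3f}")
```

Under these assumptions the sketch gives a speedup of roughly 1.13 for Example 3 and roughly 1.09 for Example 4. In Example 4 the independence assumption matters: the total misprediction cost is fixed by the 12% rate, but the share of T/PT branches (which still pay a penalty) depends on how mispredictions split between taken and not-taken branches.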