Andreas Herkersdorf
Name
Matriculation #
Seat
First, please fill in the title page with your name, matriculation number and seat number. Do not forget to sign the exam! Any extra sheets of paper you hand in must also carry your name and matriculation number. We will check your student ID and passport during the exam. The numbers in parentheses indicate the number of credits you can earn for a correct answer to that question. The maximum number of credits is 70. Subquestions that can be answered independently of the other parts are marked with an asterisk (*). Please note that in multiple-choice questions, wrong answers lead to negative credits.
No materials are allowed in the exam except: pen, non-programmable pocket calculator and one A4 sheet with your personal notes.
Good luck!
1) Right or wrong? (8)
Remark: correct answer +1, wrong answer -1, no answer 0 points; minimum for this question: 0 points.
For each statement, mark Yes or No:
- VLIW processors and superscalar processors both have more than one execution unit.
- DSP processors are usually RISC cores extended with HW multiply-accumulate units for signal processing tasks.
- Sonet/SDH frame synchronization may be achieved by analyzing inter-frame gaps.
- Sonet/SDH networks are typically installed in a star topology.
- In a shared medium LAN there are usually more collisions between communication partners than in switched LANs.
- The FIFOs in input buffered switches need a higher memory access bandwidth than those in shared output buffered switches.
- High-speed differential serial I/O is a prerequisite for implementing the L2 interface of single STM-4 framer devices.
- Thermal conduction dominates thermal radiation with respect to heat dissipation from the active area to the environment.
2) Explain two reasons for payload scrambling in a Sonet/SDH transmission system! (2)
3) Microprocessor Architecture: CISC vs. RISC
a) * Explain the conceptual difference between a RISC and a CISC processor! (2)
b) * Assume a pipelined implementation of a RISC architecture and of a CISC architecture; both implementations have five pipeline stages. Which architecture would achieve the higher clock frequency? Explain why you think so! (2)
4) Name the two fundamentally different switch architectures with respect to their port contention resolution strategy. Sketch their respective offered load vs. delay behavior in the two diagrams provided below! (4)
[Two diagrams, each headed "__________________ switch architecture": normalized delay (logarithmic axis, 1 to 64) over offered load (20% to 120%)]
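5) (7)
[Figure: cross-section of the die-package combination, from the junction (active area) of the die through the glue layer and the case cover to the ambient air]
Parameters: h_case-air = 7 W/(m²·K), k_case = 237 W/(m·K), k_glue = 0.8 W/(m·K), k_silicon = 149 W/(m·K)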
The die is 0.45 mm thick, the glue layer 0.05 mm and the case cover 0.7 mm. The die area is 125 mm² and the case surface area (relevant for convection only!) is 1200 mm². Formulas: R_cond = t/(k·A), R_conv = 1/(h·A).
a) Derive the conductive thermal resistance between the active area of the die and the top of the case. Further, derive the convective thermal resistance of the case and the total thermal resistance of the die-package combination! (4)
b) * Calculate the junction temperature for an ambient air temperature of 27 °C when the chip dissipates 800 mW! If you could not solve a), assume a thermal resistance of 125 K/W! (3)
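Parts a) and b) reduce to summing series thermal resistances and then applying T_j = T_ambient + P·R_th. The following is a minimal Python sketch of that arithmetic using the formulas given above. It rests on one assumption that the question does not state explicitly: every conductive layer (silicon, glue, case cover) uses the die area as its cross section. Function and variable names are illustrative only.

```python
# Sketch: series thermal resistance of the die-package stack, using
# R_cond = t/(k*A) and R_conv = 1/(h*A) as given in the question.
# Assumption: all conductive layers have the die area as cross section.

MM = 1e-3    # metres per millimetre
MM2 = 1e-6   # square metres per square millimetre

def r_cond(t_m: float, k: float, area_m2: float) -> float:
    """Conductive resistance (K/W) of a layer of thickness t_m."""
    return t_m / (k * area_m2)

def r_conv(h: float, area_m2: float) -> float:
    """Convective resistance (K/W) of a surface with coefficient h."""
    return 1.0 / (h * area_m2)

def junction_temp(t_ambient_c: float, power_w: float, r_th: float) -> float:
    """Steady-state junction temperature: T_j = T_ambient + P * R_th."""
    return t_ambient_c + power_w * r_th

die_area = 125 * MM2
layers = [               # (thickness in m, conductivity in W/(m*K))
    (0.45 * MM, 149.0),  # silicon die
    (0.05 * MM, 0.8),    # glue
    (0.7 * MM, 237.0),   # case cover
]

r_conduction = sum(r_cond(t, k, die_area) for t, k in layers)
r_convection = r_conv(7.0, 1200 * MM2)
r_total = r_conduction + r_convection

print(f"R_cond  = {r_conduction:.3f} K/W")
print(f"R_conv  = {r_convection:.1f} K/W")
print(f"R_total = {r_total:.1f} K/W")
print(f"T_j     = {junction_temp(27.0, 0.8, r_total):.1f} degC")
```

Under this assumption the conductive part is only about 0.55 K/W, so the convective resistance of roughly 119 K/W dominates the total.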
6) (25)
[Figure: NP-SoC block diagram including a DMA unit, a HW crypto core (CryptoHW), an SDRAM controller (SDRAM Ctrl.) and SDRAM modules (32 bit, 250 MHz, 7-1 access pattern)]
The figure above shows an NP-SoC with a run-to-completion architecture. It is designed to encrypt and forward IP packets (ignore any L2 protocol here!) at a rate of 2 Gbit/s. The IP packets are variable in length, with a fixed header size of 20 bytes and a payload of between 26 bytes and 1460 bytes. Before encryption, an additional header of 20 bytes is inserted after the IP header of each packet, as shown in the following diagram:
Original packet:             | IP header (20 B) | Payload (26 B to 1460 B) |
Packet after header insertion: | IP header (20 B) | Crypt. header (20 B) | Payload (26 B to 1460 B) |
The CPUs are single-issue pipelined RISC cores that run at 2 GHz and have a single level of cache. The SRAM, which serves as instruction memory, has a minimum access time of 2 ns and an internal memory width of 128 bits. The instruction bus runs at 250 MHz and is 256 bits wide. The packet memory consists of several SDRAM modules with a 32-bit data bus, a 250 MHz clock (single data rate) and a 7-1 access pattern.
a) * What is the optimum CPI_CPU of each of the given cores? Ignore pipeline hazards! (1)
b) * What is the L1 miss penalty for instruction accesses to the SRAM, given a cache line size of 512 bits? Assume that, on average, you additionally have to wait for half a cache line being read by another CPU. The cache miss is resolved after the entire cache line has been read. (4)
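One simple way to model the miss penalty is throughput-only: the line transfer time is set by the slower of the SRAM (128 bits per 2 ns access) and the instruction bus (256 bits per 4 ns cycle), and the waiting time for another CPU's half line is modeled the same way. The Python sketch below encodes that model. It assumes back-to-back SRAM accesses and bus cycles and ignores any additional arbitration or first-access latency, so treat it as illustrative rather than as the reference solution; the helper names are hypothetical.

```python
import math

def transfer_ns(bits: int, sram_width: int, sram_access_ns: float,
                bus_width: int, bus_freq_mhz: float) -> float:
    """Time to move `bits` from SRAM over the bus, limited by the slower side.

    Assumes SRAM accesses and bus cycles run back-to-back (pipelined).
    """
    sram_ns = math.ceil(bits / sram_width) * sram_access_ns
    bus_ns = math.ceil(bits / bus_width) * (1e3 / bus_freq_mhz)
    return max(sram_ns, bus_ns)

def miss_penalty_cycles(line_bits: int = 512, cpu_freq_ghz: float = 2.0) -> int:
    params = dict(sram_width=128, sram_access_ns=2.0,
                  bus_width=256, bus_freq_mhz=250.0)
    own_line = transfer_ns(line_bits, **params)
    # On average, another CPU's half-line read has to finish first.
    wait = transfer_ns(line_bits // 2, **params)
    # Convert nanoseconds to whole CPU clock cycles.
    return math.ceil((own_line + wait) * cpu_freq_ghz)

print(miss_penalty_cycles())
```

In this model SRAM and bus throughput happen to be matched (both deliver 512 bits in 8 ns), so neither side is the bottleneck.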
For the following questions c) to e), assume software-only processing without the HW crypto core!
c) * How large is the time budget for each CPU to complete packet processing in work-conserving operation if you assume (I) a continuous flow of shortest-size packets and (II) a continuous flow of longest-size packets? (4)
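A minimal Python sketch of the budget calculation, under two assumptions that the question does not state: the arrival rate is set by the incoming packets (20-byte IP header plus payload, before the crypto header is inserted), and the NP has four cores (`N_CPUS = 4` is hypothetical, because the block diagram is not recoverable here). In work-conserving run-to-completion operation, each core may then spend `N_CPUS` packet inter-arrival times on its packet.

```python
LINK_RATE_BPS = 2e9   # 2 Gbit/s line rate
IP_HEADER_BYTES = 20
N_CPUS = 4            # hypothetical core count (figure not recoverable)

def interarrival_ns(payload_bytes: int) -> float:
    """Time between back-to-back packets of this payload size at line rate."""
    bits = (IP_HEADER_BYTES + payload_bytes) * 8
    return bits / LINK_RATE_BPS * 1e9

def budget_per_cpu_ns(payload_bytes: int, n_cpus: int = N_CPUS) -> float:
    """Each core may use n_cpus arrival slots for one packet."""
    return n_cpus * interarrival_ns(payload_bytes)

for case, payload in (("I", 26), ("II", 1460)):
    print(case, round(interarrival_ns(payload), 1),
          round(budget_per_cpu_ns(payload), 1))
```

With these assumptions the per-CPU budgets come out at 736 ns (I) and 23,680 ns (II).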
d) * Derive the number of CPU instructions that need to be executed in the two cases above! Forwarding and header insertion require 120 instructions, and software encryption costs 24 instructions per payload byte. (2)
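The instruction count is a direct transcription of the cost model given in the question: a fixed 120 instructions per packet plus 24 instructions per payload byte. The constant and function names below are illustrative.

```python
FWD_AND_HEADER_INSTR = 120   # forwarding + header insertion, per packet
CRYPTO_INSTR_PER_BYTE = 24   # software encryption cost per payload byte

def sw_instructions(payload_bytes: int) -> int:
    """Instructions per packet for software-only processing."""
    return FWD_AND_HEADER_INSTR + CRYPTO_INSTR_PER_BYTE * payload_bytes

for case, payload in (("I", 26), ("II", 1460)):
    print(case, sw_instructions(payload))
```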
e) Is the given clock frequency of 2 GHz sufficient to process the packets in the two scenarios (I) and (II)? Assume an instruction cache miss rate of 4% and a data cache miss rate of 0%! If you could not solve b), assume a miss penalty of 25 cycles; if you could not solve c), assume time budgets of 750 ns (I) and 24000 ns (II) per CPU. (6)
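The check combines the instruction count with an effective CPI of 1 + miss rate × miss penalty and compares the resulting execution time against the per-CPU budget. The Python sketch below uses the fallback values stated in the question (25-cycle penalty, 750 ns and 24000 ns budgets). It assumes exactly one instruction fetch per instruction and a base CPI of 1 for a single-issue pipeline without hazards; substituting your own results from b) and c) shifts the numbers slightly.

```python
CPU_FREQ_GHZ = 2.0
BASE_CPI = 1.0  # single-issue pipeline, hazards ignored

def sw_instructions(payload_bytes: int) -> int:
    return 120 + 24 * payload_bytes

def effective_cpi(imiss_rate: float, miss_penalty_cycles: float) -> float:
    # One instruction fetch per instruction; the data cache never misses.
    return BASE_CPI + imiss_rate * miss_penalty_cycles

def exec_time_ns(payload_bytes: int, imiss_rate: float = 0.04,
                 miss_penalty_cycles: float = 25) -> float:
    cycles = sw_instructions(payload_bytes) * effective_cpi(imiss_rate, miss_penalty_cycles)
    return cycles / CPU_FREQ_GHZ

def fits(payload_bytes: int, budget_ns: float, **kwargs) -> bool:
    return exec_time_ns(payload_bytes, **kwargs) <= budget_ns

for case, payload, budget in (("I", 26, 750.0), ("II", 1460, 24000.0)):
    print(case, exec_time_ns(payload), budget, fits(payload, budget))
```

With these values scenario (I) fits just barely, while scenario (II) exceeds its budget by a wide margin because software encryption scales with payload size.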
Now we want to offload the encryption function from the CPU cores to a HW crypto core.
f) * The crypto core has an internal data path width of 16 bits, and each 16-bit word is encrypted in a single clock cycle. The longest logic path has 55 gate levels, and you know the following parameters from the CMOS library:
Register: t_SU = 350 ps, t_pd = t_hold = 200 ps
Logic: t_gate = 90 ps
What is the maximum frequency at which the crypto core may run? (2)
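For a register-to-register path, the clock period must cover the register's propagation delay, the logic delay and the setup time of the next register: T ≥ t_pd + n·t_gate + t_SU. The hold time constrains the minimum path delay, not the maximum frequency. This Python sketch assumes that t_pd is the register's clock-to-output delay and that there is no clock skew.

```python
def crypto_f_max_mhz(t_clk_to_q_ps: float = 200.0, gate_levels: int = 55,
                     t_gate_ps: float = 90.0, t_setup_ps: float = 350.0) -> float:
    """Maximum clock frequency of a register-logic-register path."""
    t_min_ps = t_clk_to_q_ps + gate_levels * t_gate_ps + t_setup_ps
    return 1e6 / t_min_ps  # 1/ps -> MHz

print(round(crypto_f_max_mhz(), 1))
```

The logic path dominates: 55 × 90 ps = 4950 ps of the 5500 ps minimum period.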
g) * Is a single crypto core sufficient to process the entire traffic, assuming that data transfers into and out of the core are at least as fast as the encryption itself? Derive the processing times for the two scenarios (I) and (II) and compare them to the figures obtained in c)! If you could not solve f), assume a frequency of 180 MHz! (3)
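If a single crypto core handles every packet in turn, its per-packet processing time has to fit into one packet inter-arrival time at 2 Gbit/s, which is shorter than the per-CPU budget spanning several arrivals. The Python sketch below uses the 180 MHz fallback from the question and assumes that only the payload is encrypted (not the inserted crypto header) and that transfers fully overlap with encryption. Names are illustrative.

```python
import math

WORD_BITS = 16
LINK_RATE_BPS = 2e9
IP_HEADER_BYTES = 20

def crypto_time_ns(payload_bytes: int, f_mhz: float = 180.0) -> float:
    """One 16-bit word per clock cycle; payload only (assumption)."""
    words = math.ceil(payload_bytes * 8 / WORD_BITS)
    return words * 1e3 / f_mhz

def interarrival_ns(payload_bytes: int) -> float:
    return (IP_HEADER_BYTES + payload_bytes) * 8 / LINK_RATE_BPS * 1e9

def core_throughput_gbps(f_mhz: float = 180.0) -> float:
    return WORD_BITS * f_mhz / 1e3

for case, payload in (("I", 26), ("II", 1460)):
    t = crypto_time_ns(payload)
    print(case, round(t, 1), round(interarrival_ns(payload), 1),
          t <= interarrival_ns(payload))
print("core throughput:", core_throughput_gbps(), "Gbit/s")
```

At 180 MHz the core processes 2.88 Gbit/s, which exceeds the 2 Gbit/s line rate under these assumptions.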
h) * As we offload the encryption to hardware, we gain a lot of headroom on our CPU cores. Could a pipelined operation of "single CPU + HW accelerator" perform the task of the NP? Consider the worst-case traffic from the CPU's perspective, and assume that the HW accelerator is work-conserving in any case! (3)
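With encryption offloaded, the remaining CPU work per packet is fixed, so the worst case for the CPU is a stream of shortest packets (the highest packet rate). The Python sketch below checks whether 120 instructions fit into one inter-arrival time. It assumes the fallback CPI of 1 + 0.04 × 25 = 2 from e) and that the CPU does no other per-packet work (for example, no descriptor or DMA handling), which is a simplification.

```python
CPU_FREQ_GHZ = 2.0
CPI = 1.0 + 0.04 * 25   # fallback values from e): 4% I-miss rate, 25-cycle penalty
HEADER_INSTR = 120      # forwarding + header insertion remain in software

def cpu_time_ns(instructions: int = HEADER_INSTR, cpi: float = CPI) -> float:
    return instructions * cpi / CPU_FREQ_GHZ

def interarrival_ns(payload_bytes: int) -> float:
    # 20-byte IP header + payload at 2 Gbit/s
    return (20 + payload_bytes) * 8 / 2e9 * 1e9

shortest = 26  # worst case for the CPU: most packets per second
print(cpu_time_ns(), interarrival_ns(shortest),
      cpu_time_ns() <= interarrival_ns(shortest))
```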
7) ATM Switch System (20)
In the following, we consider a 4x4 switch for ATM cells that originate from STM-4c SDH connections. There is one receive FIFO per port, and an on-chip bus system serves as the switch element. The cells are read out of the FIFO at the bus width and frequency and written into an output shift register that can hold an entire cell. The shift register adapts the data to the output port width and frequency. You have a CMOS library that allows you to implement all switch-internal components at a maximum frequency of 300 MHz, and your I/Os support a maximum of 200 MHz.
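[Figure: switch structure with a receive FIFO per input port, the on-chip bus and an output shift register (Shift Reg.) per output port]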
a) * Sketch the frame format (including dimensions) of STM-4c and derive the payload data rate! (6)
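An STM-4c frame is 9 rows by 4 × 270 = 1080 byte columns and repeats every 125 µs. The first 4 × 9 = 36 columns carry the section overhead and the AU pointers, and the remaining VC-4-4c contains one column of path overhead plus 3 columns of fixed stuff. The Python sketch below derives the rates from these dimensions. Which of the resulting figures the question means by "payload data rate" depends on the lecture's convention; ATM cells are carried in the C-4-4c container.

```python
ROWS = 9
COLS_PER_STM1 = 270
SOH_COLS_PER_STM1 = 9   # RSOH + AU pointer + MSOH columns
FRAMES_PER_S = 8000     # one frame every 125 us

def mbps(columns: int) -> float:
    """Bit rate carried by `columns` byte columns of a 9-row frame."""
    return ROWS * columns * 8 * FRAMES_PER_S / 1e6

def stm_nc_rates(n: int):
    total = COLS_PER_STM1 * n              # 1080 columns for STM-4
    vc = total - SOH_COLS_PER_STM1 * n     # VC-4-Nc
    container = vc - 1 - (n - 1)           # minus 1 POH column and N-1 fixed-stuff columns
    return mbps(total), mbps(vc), mbps(container)

line, vc4_4c, c4_4c = stm_nc_rates(4)
print(f"STM-4 line rate: {line:.3f} Mbit/s")
print(f"VC-4-4c:         {vc4_4c:.3f} Mbit/s")
print(f"C-4-4c payload:  {c4_4c:.3f} Mbit/s")
```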
b) * What are the data path width and operating frequency of the switch I/O ports? If you could not solve a), assume 620 Mbit/s! (3)
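The port width follows from dividing the port data rate by the maximum I/O frequency and rounding up to a usable width; the operating frequency is then the rate divided by that width. This Python sketch assumes widths are restricted to powers of two, which is a common design choice but not stated in the question.

```python
def narrowest_port(rate_mbps: float, f_max_mhz: float = 200.0):
    """Smallest power-of-two width whose clock rate stays within f_max."""
    width = 1
    while rate_mbps / width > f_max_mhz:
        width *= 2
    return width, rate_mbps / width

print(narrowest_port(620.0))    # -> (4, 155.0), fallback rate from the question
print(narrowest_port(599.04))   # C-4-4c payload rate
```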
c) * What effect related to output port contention do you expect in this type of switch? (1)
d) * To mitigate the negative effect on system performance, you want to run the bus with a speedup of four, i.e., at four times the speed necessary for the typical implementation. Calculate the bus width and operating frequency! (4)
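A minimal Python sketch of the bus dimensioning, assuming that the "typical implementation" is a shared bus carrying the aggregate rate of all input ports (number of ports × port rate) and that the bus width is a power of two. Other widths (for example, ones that divide the 424-bit cell evenly) are equally conceivable; the function name is illustrative.

```python
def bus_config(port_rate_mbps: float = 620.0, n_ports: int = 4,
               speedup: int = 4, f_max_mhz: float = 300.0):
    # Typical shared bus: carries all ports at once (n_ports * port rate).
    required_mbps = speedup * n_ports * port_rate_mbps
    width = 1
    while required_mbps / width > f_max_mhz:
        width *= 2
    return width, required_mbps / width

print(bus_config())  # -> (64, 155.0)
```

A 32-bit bus would need 310 MHz, exceeding the 300 MHz library limit, so under these assumptions the next power of two is required.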
e) Assume that only a single cell is switched from an input to an output. Derive the minimum cell latency, considering that the cell must first be fully received by the FIFO, is then transferred over the bus, and is retransmitted only after the entire cell sits in the output shift register. (3)
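The latency is the sum of three store-and-forward phases: receiving the whole 53-byte cell at the port, moving it over the bus, and shifting it out at the port. The Python sketch below uses assumed defaults of 4 bits at 155 MHz per port and 64 bits at 155 MHz on the bus, which correspond to the 620 Mbit/s fallback rate. It counts the latency until the last bit leaves the output port and ignores arbitration and pipeline register delays.

```python
import math

CELL_BITS = 53 * 8  # one ATM cell

def phase_ns(bits: int, width: int, f_mhz: float) -> float:
    """Time to move `bits` through a path of given width and clock."""
    return math.ceil(bits / width) * 1e3 / f_mhz

def cell_latency_ns(port_width: int = 4, port_mhz: float = 155.0,
                    bus_width: int = 64, bus_mhz: float = 155.0) -> float:
    receive = phase_ns(CELL_BITS, port_width, port_mhz)    # fill the input FIFO
    transfer = phase_ns(CELL_BITS, bus_width, bus_mhz)     # FIFO -> output shift register
    transmit = phase_ns(CELL_BITS, port_width, port_mhz)   # shift register -> output port
    return receive + transfer + transmit

print(round(cell_latency_ns(), 1))
```

The two port serialization phases dominate; the bus transfer takes only 7 cycles because of the wide data path.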
f) Calculate the total power consumption of the chip, given a power dissipation of 200 mW for the switch core logic (i.e., FIFOs, bus, registers, ...). Assume an I/O switching factor of 0.35 and a load capacitance of 50 pF. The I/Os of the switch operate at 3.3 V. Assume that you need a clock line and a data valid line in addition to the I/O data lines derived in b)! (3)
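The I/O contribution follows the dynamic power formula P = α·C·V²·f per switching line. The Python sketch below assumes 4 data lines per port at 155 MHz (the 620 Mbit/s fallback case), applies the switching factor uniformly to every line including the clock, and by default counts only the chip's output pins as driving the 50 pF load. If the convention also charges the input pins, `count_inputs=True` doubles the I/O term.

```python
def io_power_w(lines: int, alpha: float = 0.35, c_load_f: float = 50e-12,
               vdd: float = 3.3, f_hz: float = 155e6) -> float:
    """Dynamic I/O power: P = alpha * C * V^2 * f per line."""
    return lines * alpha * c_load_f * vdd ** 2 * f_hz

def total_power_w(core_w: float = 0.2, n_ports: int = 4,
                  data_lines: int = 4, count_inputs: bool = False) -> float:
    lines_per_port = data_lines + 2          # + clock + data valid
    directions = 2 if count_inputs else 1    # outputs only by default
    return core_w + io_power_w(n_ports * lines_per_port * directions)

print(round(total_power_w(), 3))                   # outputs only
print(round(total_power_w(count_inputs=True), 3))  # inputs and outputs
```

Each line dissipates roughly 29.5 mW under these assumptions, so the I/Os clearly dominate the 200 mW core logic.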