
Paper 3 Domain Specific Processors in Future SoC

ABSTRACT
Systems on Chip have inherent benefits for product development: they reduce time to market and provide cost-effective, improved designs. With the ever-changing landscape of computing today, new applications are evolving by the day. For most such applications, heterogeneous multicore architectures provide low-power, high-performance solutions. To design specialized processors that deliver high performance at low cost, it is necessary to categorize these applications into specific types, envisioned as domains. This paper predicts and validates the domain-specific processors to be used in future computers.

Keywords
Domain-Specific Processors, Multimedia Processors, Probability Processors, Mission Critical Processors, Automotive Processors.

1. INTRODUCTION

Integrating a complete electronic system on a chip has inherent benefits for system manufacturers. It reduces design cycle cost by reducing the area and power consumption of the whole system. Various lower-level issues of systems assembled on a PCB, such as signal integrity, electromagnetic compatibility, and component and module interfacing, are eliminated by integrating the whole system inside a chip. System reliability increases manyfold as a result. The present trend in computer applications is towards enhanced interaction with the physical environment to provide a natural user interface. The processing overhead of such applications and interfaces is fundamentally different from the traditional data processing needs of yesteryear, which were handled optimally by general purpose processors (GPP). Huge sets of data collected by a gamut of sensors need to be conditioned and processed simultaneously with high energy efficiency. A highly intelligent processor that handles such huge data sets with high efficiency is very common in nature - the human brain. A takeaway from analyzing the working of the human brain is that the processing of everything we humans hear, see, taste, smell, think or calculate is done by various specialized locations in the brain, which vary in size, properties and functioning. Taking a cue from this, we can argue that the processors of future devices, which have to support natural user interfaces, need non-uniform types of specialized data processors for the individual kinds of sensing that the system has to support. The most pressing problem is to decide the granularity with which the SoC needs to be designed - whether as a processor or as an Application Specific IC (ASIC). The processor approach renders the system highly programmable and low cost, but it is one of the slowest on the performance scale. On the other hand, an ASIC is non-programmable hardware and costlier than a processor, but it has the best performance among all hardware configurations, as shown in Figure 1. In my view, future processors need to be programmable as well as energy efficient and have to rank very high on the performance scale. Thus, we have to take an approach that is a mix of both.

Figure 1 - The Hardware-Software co-design space.

To achieve this, the applications that the future computing infrastructure may run need to be identified and classified according to their computational needs, and a specialized processor needs to be designed for each need. A gamut of processes, such as signal processing, need a huge amount of parallel processing and frequent access to memory; they justify the ASIC plus HDL side of the spectrum in Figure 1 due to the sheer necessity of high performance. On the other hand, a system has to be reconfigurable to keep its cost low, which demands programmability and justifies the RISC plus C side of Figure 1. In this paper we look into this in more detail. Section 2 describes the need to classify domain-specific processors and Section 3 provides a detailed discussion of each domain. Finally, a discussion is presented on the need for sub-domain and super-domain division.

2. PROBLEM STATEMENT

In the systems on chip used in personal computing devices, the workload is already partitioned according to a fundamental difference in the nature of the data processing needed, namely between CPU and GPU. This combination of CPU and GPU has taken the computing market by storm because of its enhanced graphics performance and the resulting increase in overall system performance. As depicted by Nvidia, the trend shows considerable growth prospects for this CPU plus GPU system on chip architecture. The curve shows that somewhere near 2013-14, we will have a console-level gaming experience on our smartphones. Indeed, GPU performance is scaling up. We also have to address the fact that most signal conditioning and number crunching tasks are done by the multicore CPU, and a performance hit is emerging there. Present-day and future applications such as natural user interfaces will be highly data intensive: a lot of sensory data will be incident on the CPU and will need to be conditioned and processed simultaneously. We need to address this by splitting the CPU again with respect to its data processing needs, as was done earlier in the case of the GPU. Identifying the classes of data incident on the processors is thus necessary to be able to classify the types of future SoCs.

Figure 2 - Graphics Performance trend plot by Nvidia.

3. PROPOSED SOLUTIONS
To start identifying the data processing requirements of future computers, we need to identify the types of applications that can run on these systems on chips. In other words, we need to decide on the types and scenarios in which the computing hardware may be used. We will discuss each of these domains along with a few possible commercial architectures in each category and their salient features. This paper proposes the following classification of processors for future systems: General Purpose Processors, Digital Signal Processors, Probability Processors, Mission Critical Processors, Graphics & Physics Processors, Multimedia Processors, Networking Processors and Automotive Control Processors.

3.1 General Purpose Processors

A computer without a processor makes no sense. A CPU in the archaic sense means a GPP. From ARM mobile cores to the Intel Pentium series, Intel Core iX series, AMD and MIPS, all fall under this category. Until recently these were single-core processors, but due to power limitations the frequency scaling of single-core processors reached its practical limits, and the trend is now moving towards homogeneous multicore processors. Re-programmability and ease of reconfiguration are the strong points of these processors.

3.2 Graphics and Physics Processors

With a booming gaming industry and high-resolution natural user interfaces predicted for the near future, graphics processors are an essential component of today's computing systems. There is enough evidence that graphics processors have firmly set foot in the processor market. Figure 2 shows the projected growth of graphics processor performance. Their mammoth parallel processing capability encourages a few other types of data and signal processing operations to be handled by them too. In fact, almost all computers today come with a GPP plus GPU unit in one chip. Physics processing units (PPU), or physics engines in gaming systems, are new additions to these GPUs. These processors make calculations that model a physical world obeying advanced laws of physics simple and fast. Calculations such as rigid and soft body dynamics, fluid dynamics, collision detection, hair and cloth simulation, and finite element analysis, to name a few, are highly computationally intensive tasks and become highly power consuming on a GPP. Quite a few PPUs have been marketed, such as the FPGA-based SPARTA and ASIC-based HELLAS from Penn State University and Georgia Tech, and PhysX from Ageia, which was acquired by Nvidia. Though to date these have mostly been used by the gaming industry, they may find new and huge applications in natural user interfaces and various educational animations.

3.3 Digital Signal Processors

These are the most essential parts of today's systems on chips. Present-day applications ranging from voice processing on any phone, voice-band modems and facsimile machines to medical imaging devices all require these processors at their core. They are modified architectures specialized to run signal processing algorithms such as FFT, FIR filtering, fast data sampling and various other similar tasks for the above-mentioned applications. The architectural requirements for these processors are fast floating-point multipliers and shifters, fast and large memory access capabilities, large instruction word issue capabilities, special addressing modes such as modulo addressing, and zero-overhead looping. These specific requirements cannot be fulfilled in a GPP. For example, a floating-point multiplication on a GPP takes several times longer than an addition. Such latency defeats the purpose when large amounts of data must be processed in near real time. The power budget for running DSP algorithms on a GPP is also very high, and for most computing infrastructure the power metric is a dominant determinant of success.

Figure 3c - Simplified Diagram of Analog Devices SHARC DSP.

As shown in Figure 3c, the simplified diagram of the Analog Devices SHARC DSP, there are dedicated multipliers and instruction and data caches, as well as some features omitted from the diagram, such as an 80-bit accumulator built into the multiplier to reduce the round-off errors associated with multiple fixed-point multiplications. Another highly useful feature is the use of shadow registers for all key registers, which helps in fast context switching, that is, the ability to handle interrupts quickly. This processor can run an efficient FIR filter with 100 coefficients within 105-110 cycles, as opposed to the many thousands of cycles required by a GPP.
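The modulo addressing and multiply-accumulate features discussed for DSPs can be illustrated with a short Python sketch of an FIR filter that keeps its delay line in a circular buffer. It is only a software model under assumed names (CircularFIR and step are illustrative and do not come from any vendor library); on a DSP the index wrap-around and the loop overhead would be handled by hardware rather than by the explicit % operations shown here.

```python
class CircularFIR:
    """FIR filter whose delay line is a circular buffer (software model
    of DSP modulo addressing). Names are illustrative assumptions."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        self.n = len(self.coeffs)
        self.buf = [0.0] * self.n   # delay line, initially silent
        self.head = 0               # index of the newest sample

    def step(self, x):
        """Push one input sample and return one output sample."""
        # Advance the write pointer with wrap-around (modulo addressing).
        self.head = (self.head + 1) % self.n
        self.buf[self.head] = x

        # Multiply-accumulate: coeffs[0] pairs with the newest sample,
        # coeffs[1] with the previous one, and so on.
        acc = 0.0
        idx = self.head
        for c in self.coeffs:
            acc += c * self.buf[idx]
            idx = (idx - 1) % self.n  # walk backwards through the history
        return acc


if __name__ == "__main__":
    # 4-tap moving average applied to a ramp.
    fir = CircularFIR([0.25] * 4)
    print([fir.step(x) for x in [4.0, 8.0, 12.0, 16.0, 20.0]])
```

With 100 coefficients, each output sample costs 100 multiply-accumulate operations. A processor that can issue one of these per cycle with zero-overhead looping would land close to the 105-110 cycle figure quoted above, whereas a processor that spends separate instructions on the multiply, the add, the pointer wrap and the loop branch needs many more.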

3.4 Multimedia Processors

The introduction of digital audio and video into the mainstream was the starting point for multimedia processors. Today their use has scaled to high-definition TV, mobile video streaming, video conferencing, MPEG video compression, voice and data over the internet, etc. Newer natural user interfaces are being developed and will be introduced in the near future. These interfaces incorporate a bunch of sensors that provide all kinds of raw data: touch sensors, bend sensors, pressure sensors, IR, accelerometer, gyroscope, Bluetooth and, in future, mobile medical imaging and diagnosis sensors. The processing needed for data from these sensors is similar in nature to media compression algorithms, which only reinforces the need for a mammoth number-crunching processor like a multimedia processor. The increasing market trend of mobile video users, shown in Figure 3d, supports this statement.

Figure 3d - Mobile Video Users Trend.

Digital audio and video require a tremendous amount of information bandwidth. To put things into perspective, high-definition television (HDTV) (1920x1080 pixels) is expected to be compressed into 20 Mbit/s, while an H.263 video-phone terminal using sub-quarter common intermediate format (128x96 pixels) at 7.5 frames/s is expected to need 10-20 Kbit/s. Real-time audio and video compression requires a lot of processing capability. Present-day DSP chips do have such capabilities, but they have MPEG conversion as a built-in hardware function in addition to their basic architecture, which differs a lot from chip to chip. Architectural changes have also occurred, such as the accommodation of very long instruction word (VLIW) controls. Some microprocessors incorporate enhancements in their instruction sets to accommodate media functionality: a long-word ALU is broken into several small-word ALUs that can process several short-word data items in a single instruction. This approach was used in graphics processors, then in RISC processors in servers to process MPEG video, and later in PC media accelerators too. But this enhancement makes programming the processor highly complicated. Efficiency can, however, be achieved by tuning the program in assembly language, similar to DSP programming. An alternative is to use the graphics accelerators or GPUs that are an inevitable part of today's computers, but for this option to work these GPUs have to lose their specificity towards graphics-related tasks and assume a more general-purpose (GPGPU) role, which is not desirable. There are various CPU and DSP architectures for such multimedia processing, but most of them use a lot of power because they operate at higher frequencies. Multimedia processing done on a GPP consumes more power because it runs at a higher frequency and is less parallel. A DSP-based multimedia processor is a highly parallel architecture running at a very low frequency and thus consuming about 10x less power. Philips Nexperia and ST Nomadik are two multimedia processors used for digital video entertainment platforms and mobile multimedia applications. The block diagram of the Philips Nexperia is shown in Figure 3e. The MIPS core runs the operating system, and the hardware accelerator block contains modules such as a 2-D rendering engine, MPEG-2 video decoder, image composition processor, video input processor, scaler and system processor.

Figure 3e - Architecture Diagram of Philips Nexperia.

3.5 Probability Processors

The probability processor is fundamentally different from traditional electronics right from its inception: it performs calculations using probability instead of binary logic and holds promise for banking calculations and software, flash memory in smartphones and error correction algorithms. Lyric Semiconductor (www.lyricsemiconductor.com) is a startup that is making these probability processors, which compute on chances. The processor implements statistical calculations in a simpler and more energy-efficient way. Applications that depend solely on probability calculations include Amazon's product recommendations, fraud checks on credit cards and e-mail spam filters, to name a few. Implementing the math for such applications is simpler on probability processors than on conventional logic processors, and hence these smaller and more energy-efficient processors produce results faster. The technology is still in its cradle, and Lyric is finding it difficult to prove the reliability of its technology. The building blocks of probability processors are called "Bayesian NAND" gates, in direct relation to Bayesian probability. The output of a Bayesian NAND gate represents the chances that the two input probabilities match. This makes it possible to perform calculations that use probabilities as their input and output. DARPA is looking for potential applications that suit such processors, where the information states are not clear, such as distorted radio signals or machine vision systems. Probability processors are finding the most use in companies producing flash memories. Conventional flash memories store data by storing charge on semiconductor surfaces. The difference between '1' and '0' is roughly 100 electrons, and about 1 in 1000 bits gets corrupted. Due to the miniaturization of semiconductors, this error rate will only increase with time. Error checking chips can correct such errors by generating unique codes each time something is written to the flash; the checksum can be used to detect any unintentional flip in the data and also to correct it. This requires the kind of statistical calculation that is difficult to implement in digital logic but ideal for probability processors. The data structures and programming languages for such processors are represented as chains, trees or grids. Microsoft is trying to develop a probability coding language called Infer.NET, and Google is using a language called R. Programs for such systems do not specify a solution directly; they set constraints and let the system solve within these constraints. With all such applications, this processor has the potential to usher in a new domain and class of processors.
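To make the idea of gates that take probabilities as input and output concrete, the following Python sketch treats each input as the probability that a bit is 1 and assumes the two inputs are statistically independent. The function names prob_nand and prob_match are hypothetical, and the formulas are the standard ones for independent events; the sketch does not claim to reproduce Lyric's actual circuit or its exact gate definition.

```python
def _check(p):
    """Reject values that are not valid probabilities."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")


def prob_nand(p, q):
    """P(NAND(a, b) = 1) for independent bits with P(a=1)=p, P(b=1)=q.

    NAND is 0 only when both bits are 1, which happens with probability p*q.
    """
    _check(p)
    _check(q)
    return 1.0 - p * q


def prob_match(p, q):
    """P(a == b) for independent bits: both are 1, or both are 0."""
    _check(p)
    _check(q)
    return p * q + (1.0 - p) * (1.0 - q)


if __name__ == "__main__":
    # With certain inputs the probabilistic NAND reduces to the Boolean one.
    print(prob_nand(1.0, 1.0), prob_nand(0.0, 1.0))
    # Two fair coin flips agree half of the time.
    print(prob_match(0.5, 0.5))
```

When the inputs are exactly 0 or 1, prob_nand reproduces the ordinary NAND truth table, which is one way to see such a gate as a generalization of binary logic rather than a replacement for it.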

3.6 Network Processors

Today, networks are synonymous with computers. Devices that are not networked do not serve their purpose. Networked devices have not only increased in number but also demand higher data rates. This puts a huge packet processing load on the core backhaul processor. In earlier days, when data rates were not as demanding, packet processing was implemented in software running on a GPP. With the huge demand on data rates and traffic volume, GPP clock rates have fallen far behind what is needed for faster and bulkier packet processing. With core backhaul networks running at 40 Gbps and being scaled to 80 Gbps or higher, and edge networks emerging at up to 1 Gbps, more complex networking protocols, QoS and security aspects are being added. These require processing power that surpasses the capabilities of the most advanced GPPs. Until recently ASICs served this space, which meant a loss of programmability and a longer time to market, rendering the networks rigid and stagnant. But with fast-evolving protocols and features, a flexible network that can adapt to these changes as quickly as possible is essential. FPGAs are a very good alternative that eliminates almost all the shortcomings of ASICs, but they fall behind on power consumption and cost. Due to these shortcomings of ASICs and FPGAs, a new kind of processor with complete re-programmability is needed.

Figure 3f - Generic NP Architecture.

Typically a network processor supports pattern matching to determine the type of packet, lookups of destination IP addresses, data manipulation such as recalculating CRC checks, packet segmentation or encryption, and queue management. However, the programmability offered by GPPs is not well suited to packet processing. Network processors employ multiple programmable processing engines (PPE) within a single processing device. Among the various flavors of NPs with different architectures, the commonality lies in the fact that they all employ PPEs. Some NP manufacturers use a RISC-based instruction set with multithreading while others use a VLIW-based architecture. Some of these architectures incorporate hardware co-processors to perform common networking tasks that do not require programming, such as CRC calculations, making them more efficient. Figure 3f shows a generic network processor architecture with several PPEs, separate lookup engines and a switch fabric interface. Intel's IXP2850 is a heterogeneous multiprocessor NP chip. It has an array of 16 multi-threaded micro-engines to handle packets, an XScale processor to handle control, 2 crypto engines, multiple ports for packet ingress/egress and PCIe ports. Intel provides an SDK for software simulation and real-time debugging.

3.7 Mission Critical Processors

Military and space applications need a new domain of processors that take care of special time-critical and safety-critical applications. With defense and aviation budgets swelling each year worldwide, this area is definitely a thrust for future processor architecture development. The Multi-Mission Mobile Processor (M3P) is a ground-based SBIRS (Space Based Infrared System) system that is designed to provide Direct Downlink of the SBIRS constellation (with HEO/GEO launches) once it has been fully implemented. These are next-generation missile detection systems and are still in the design phase. The same US Army report suggests the use of advanced architectures for intelligence, surveillance and reconnaissance activities too. DDC-I (www.ddci.com), a company working in the safety-critical application domain, holds the view that heterogeneous multicore processors are integral to the future of avionics and aerospace applications. The company is focused on enhancing a product to safely schedule multiple cores and to ensure that one core hitting the cache or other resources does not unnecessarily degrade performance and throw off the timing of another application running on another core. Cavium (www.cavium.com) is another company producing heterogeneous multicore processors; these enable next-generation routing and switching capability in tactical environments where data has to be securely maintained and transmitted among airborne, seaborne and terrestrial assets, supporting real-time critical interoperability of military communication devices. These are novel domains and a concrete working architecture is yet to evolve, but the need to address this new class of data processing, which GPPs are unable to address, is quite evident from the above examples.

3.8 Automotive Control Processors

Today's automobiles are complex digital networks of embedded processors controlling and monitoring virtually every aspect of the vehicle. Safety-critical systems such as engine control, automatic braking and air-bag control require processors with extreme reliability. Other non-critical tasks such as in-cabin navigation and infotainment also rely heavily on signal processing automotive processors. In future automobiles, new safety systems will incorporate video and radar processing, and engine and braking control systems will adopt more computationally demanding model-based approaches, in which complex run-time calculations replace the look-up table references that are prevalent today. Automotive on-chip integration is quite different from other computing requirements. Multichannel ADCs are a must for a processor in an automotive control system. An engine control system receives dozens of analog signals from sensors placed strategically throughout the chassis that sense throttle position, engine speed and temperature, intake air density and exhaust gas oxygen content. The controller generates updated fuel injection and ignition outputs in response. Integrated flash memory is a highly desirable feature because of the large look-up tables used for the calibration inputs of various control systems. These features are not present in GPPs. Also, electronic components for automotive applications need to withstand higher temperatures and should have higher reliability, which makes them costlier than commercial-grade parts. Controller Area Network (CAN) and Media Oriented Systems Transport (MOST) are network protocols used specifically for automotive systems, and they are of little use for GPPs and other computing hardware. Figure 3g shows Freescale's architecture block diagram of an automotive control processor.

Figure 3g - Freescale MPC566 Block Diagram.
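The calibration look-up tables mentioned for automotive control are commonly read with interpolation between stored breakpoints. The following is a minimal Python sketch, assuming a one-dimensional table with linear interpolation and clamping at both ends. The function name lookup, the engine-speed breakpoints and the table values are invented for illustration and are not taken from any real engine calibration.

```python
from bisect import bisect_right


def lookup(breakpoints, values, x):
    """Read a 1-D calibration table with linear interpolation.

    breakpoints must be sorted in ascending order and have the same
    length as values. Inputs outside the table are clamped to the ends.
    """
    if x <= breakpoints[0]:
        return values[0]
    if x >= breakpoints[-1]:
        return values[-1]
    # Index of the breakpoint at or just below x.
    i = bisect_right(breakpoints, x) - 1
    x0, x1 = breakpoints[i], breakpoints[i + 1]
    y0, y1 = values[i], values[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


if __name__ == "__main__":
    # Hypothetical table: engine speed (rpm) -> injector pulse width (ms).
    rpm = [1000.0, 2000.0, 3000.0]
    pulse_ms = [2.0, 4.0, 5.0]
    print(lookup(rpm, pulse_ms, 1500.0))
```

A table read costs one search and one multiply-add, which is why tables stored in on-chip flash have been attractive for engine control. The model-based approaches anticipated above trade this cheap lookup for heavier run-time arithmetic.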

4. DISCUSSION ON SOC DOMAINS


In this paper we have proposed different types of domain-specific processors. Some of these are integral to a lot of computing systems and some are very specific to a few application zones. Here we investigate grouping them further into sub-groups and super-groups. As seen in the discussion so far, there are some processors that inherently assume the presence of another basic processor for their working. All of these processors can be taken together to form a super-group, and the smaller domains can form the respective sub-groups. Applications handled by automotive processors and by aerospace and military mission critical processors fall under the broader ambit of time-critical and safety-critical data processing. These need to be highly reliable processors even at elevated temperatures and under various other rugged conditions. The components used for these systems are very different from commercial-grade components. These two surely form a super-domain; we name it the Safety Critical Specialized Computing Domain. Along similar lines are the domains of digital signal processors and multimedia processors. DSPs are integral to the functioning of a lot of processing infrastructure, such as automotive and mission critical applications and audio and video processing. In fact, multimedia processors encompass hardware accelerators and DSPs as integral parts. So we can have a super-domain here, which we name the Signal Processing and Media Domain. Another super-domain, which is overlooked completely in this paper, is the Server and Supercomputing Class, which will encompass hugely parallel architectures such as GPGPUs, GPPs and some other specialized architectures.

5. CONCLUSION
In this paper we see how our computing infrastructure can be classified into different domain-specific zones, which will help in delivering targeted high performance in a much more energy-efficient way while keeping the cost as low as possible.

6. REFERENCES
