Silicon Hive Technology Primer


Reconfigurable accelerators that bring computational efficiency (MOPS/W)
and programmability together, to displace ASIC and DSP co-processors in
Systems-on-Chips

Abstract

Standards and market uncertainties, non-recurring engineering costs, and price pressures will require the next generation of embedded computing platforms to be fully programmable and low-cost. Key enablers of such platforms are reconfigurable accelerators that replace currently-used ASIC co-processors. Silicon Hive currently provides two types of reconfigurable accelerators that address these market needs. Our block accelerators are reconfigurable cores with very high computational efficiency (MOPS/W) and a small silicon footprint, targeting the acceleration of critical digital signal processing kernels. Silicon Hive's streaming accelerators are also reconfigurable cores, targeting very high and/or dynamic data-rate functions such as the IF front-end of wireless devices. The key differentiators of our approach are: the robustness and reliability of our technology, our ability to customise our solutions to specific customer needs, the high-level programmability of our cores (ANSI C), and the fact that they can operate either as standard DSPs or as coarse-grained FPGAs.

Value proposition

Lower cost than ASICs, and fully programmable
Many of today's Systems-on-Chips (SoCs) deploy multiple ASIC cores to support the different applications or standards in the system. A single Silicon Hive reconfigurable core can replace a set of ASICs at similar computational efficiency (MOPS/W), allowing for a simpler solution that remains programmable after device fabrication. Since the same hardware is time-shared to implement functionality previously mapped onto multiple ASICs, the solution also has a smaller silicon footprint.

Superior performance and lower power
Other SoC design teams prefer to map most, if not all, functionality onto programmable solutions. However, the markets they can address are limited by the lower computational efficiency (MOPS/W) achievable with traditional processors and DSPs. For these teams, a Silicon Hive reconfigurable core can enable the SoC to support more, and higher-performance, applications, standards, and features.

Why programmability?

There are several reasons why programmability has become a priority in today's SoCs, and in systems in general.

Application IP protection
Application and/or service providers usually have proprietary algorithms whose secrecy they need to defend, and will not disclose implementation information to chip-makers. Although IC design is not at all their core capability, these companies are often forced into it, delivering only an encrypted net-list to a chip-maker. Reconfigurability offers application and service providers the more cost-effective option of mapping their algorithms themselves onto a pre-fabricated part.

Uncertainties
Uncertainty related to multiple or immature standards makes it harder to fix the functionality of a device at design time. Sometimes, sheer lack of in-house understanding of a standard may render an ASIC implementation impossible. In addition, today's market realities make it unclear what functionality a device should support: which applications will catch on, and which will soon become obsolete? Reconfigurability substantially reduces the risk of making an early choice.

Time-to-Market
This is another growing concern in markets with increasingly short windows of opportunity for product introduction. Often, one cannot afford the long design and fabrication cycles of ASICs, and would rather program a pre-designed, pre-fabricated, reconfigurable part.

Non-Recurring Engineering (NRE)
NRE costs have been much discussed recently, mainly in relation to the increasing cost of mask-sets. However, mask-sets are only a small part of a total chip design project's cost; the NRE cost of ever larger VLSI design teams and their associated infrastructure is a bigger issue in itself. Both can be controlled with the use of a reconfigurable part, as long as this part is straightforward to program and the recurring silicon-area penalty it implies is limited (in both respects, unlike FPGAs).

Logistics and inventory
The costs of managing a large variety of slightly different products can translate into expensive inefficiencies. These can be avoided when product variations are implemented in software.

Deploying reconfigurable accelerators
We enable the replacement of all ASIC accelerators in SoCs with reconfigurable cores, rendering the SoC fully programmable after fabrication. This way, flexibility is maintained throughout the product life-cycle. At the same time, our cores preserve a computational efficiency (MOPS/W) and a silicon footprint comparable to those achievable with ASIC cores.

Our vision is illustrated in Figure 1: Silicon Hive reconfigurable cores replace both the ASICs and the DSP accelerators. Since all accelerators are now reconfigurable, their resources can be time-multiplexed. This allows for better utilisation of the available hardware, reducing redundancies. Depending on the case, it can lead to a smaller silicon footprint than can be achieved with ASICs.

Figure 1. SoCs with Silicon Hive cores.

Our reconfigurable architectures
The basic component of the architecture of Silicon Hive's accelerators is the Processing and Storage Element (PSE); see Figure 2. A PSE is a VLIW-like data-path consisting of several interconnect networks (IN), one or more operation-issue slots (IS) with associated function units (FU), distributed register files (RF) and, optionally, local memory storage (MEM). PSEs are designed according to a template that ensures all PSEs are clean, easy data-paths for a compiler to handle, guaranteeing a high level of programmability by construction.

A matrix of one or more PSEs, together with a VLIW-like controller (CTRL) and configuration memory (CONFIG. MEM), makes up a cell. A cell is a fully-operational processor capable of computing complete algorithms. PSEs within a cell can communicate with each other via data communication lines (CL). Typically, one application function at a time is mapped onto the matrix of PSEs: the more PSEs are present, the more the function can be mapped in space, in a data-flow manner.

Figure 2. Silicon Hive's architecture design space overview.
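
For illustration only, the following C sketch models the hierarchy just described. The type and field names are invented for this primer; actual cores are generated from a hardware template, not from C declarations.

    /* Illustrative model of the architecture template (hypothetical names). */
    #include <stddef.h>

    typedef struct {
        int    num_issue_slots;     /* IS:  operation-issue slots             */
        int    num_function_units;  /* FU:  function units bound to the slots */
        int    num_register_files;  /* RF:  small, distributed register files */
        size_t local_mem_bytes;     /* MEM: optional local storage (0 = none) */
    } PSE;                          /* Processing and Storage Element         */

    typedef struct {
        PSE *pse_matrix;            /* matrix of one or more PSEs, driven by  */
        int  num_pses;              /* a VLIW-like controller (CTRL)          */
        int  config_mem_words;      /* CONFIG. MEM; a cell is a complete,     */
    } Cell;                         /* fully-operational processor            */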


An array of one or more cells, connected together via a data-driven communication mechanism, forms a streaming array. The communication across cells takes place through blocking FIFOs accessed from load/store (LD/ST) units within the cells. Multiple functions can be concurrently mapped onto the streaming array, each one occupying a non-overlapping sub-set of the cells.
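
The sketch below, a single-threaded software model written for this primer, illustrates the producer/consumer style this enables. In the real array the blocking behaviour is provided by the hardware FIFOs behind the LD/ST units, not by user code.

    #include <stdio.h>

    #define FIFO_DEPTH 8

    typedef struct {                 /* toy model of a blocking FIFO */
        int buf[FIFO_DEPTH];
        int head, tail, count;
    } Fifo;

    static int fifo_full(const Fifo *f)  { return f->count == FIFO_DEPTH; }
    static int fifo_empty(const Fifo *f) { return f->count == 0; }

    static void fifo_push(Fifo *f, int v)    /* hardware blocks when full */
    {
        f->buf[f->tail] = v;
        f->tail = (f->tail + 1) % FIFO_DEPTH;
        f->count++;
    }

    static int fifo_pop(Fifo *f)             /* hardware blocks when empty */
    {
        int v = f->buf[f->head];
        f->head = (f->head + 1) % FIFO_DEPTH;
        f->count--;
        return v;
    }

    int main(void)
    {
        Fifo q = {{0}, 0, 0, 0};
        int  i;
        for (i = 0; i < 16; i++) {
            if (!fifo_full(&q))              /* producer cell's LD/ST side */
                fifo_push(&q, i * i);
            if (!fifo_empty(&q))             /* consumer cell's LD/ST side */
                printf("%d\n", fifo_pop(&q));
        }
        return 0;
    }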

Many different trade-offs between the number of cells and the number of PSEs per cell can be made. In general, the closer the data-rate of a given function is to the clock rate of the processor, the more cells should be used. This is because the cycle budget available to a single cell is then very small, so the computation must be spread across potentially smaller cells. For example, at a 200 MHz clock, a 50 Msample/s stream leaves a budget of only four cycles per sample, whereas a 200 ksample/s stream leaves a thousand. For applications with data-rates far lower than the processor's clock rate, the cycle budget is higher, and a small number of cells with a relatively high number of PSEs may lead to better exploitation of instruction-level parallelism, higher modularisation, and ease of programming.

Silicon Hive's internal design methodology and tools allow for the very quick design of different PSE architectures, of cells with different numbers of PSEs interconnected in different ways, and of streaming arrays with different numbers of cells interconnected in a variety of patterns. This way, Silicon Hive can create, as market evolution dictates, a virtually unlimited variety of application-domain-specific cores in a timely fashion. Each core is reconfigurable after fabrication, and flexible enough to tackle all application needs within its target domain.

Reconfigurable accelerator products

Silicon Hive makes and licenses synthesisable cores to be integrated into third-party Systems-on-Chips (SoCs). The cores are reconfigurable accelerators designed to run alongside a host processor (e.g. an ARM) within a standard system-level infrastructure (buses, memories, I/O modules, etc.). Currently, we offer two types of products: stream accelerators and block accelerators.

Block accelerators typically consist of a single cell containing from one to several PSEs in a matrix organisation; see Figure 3. They target functions with a data-rate substantially lower than the accelerator's clock speed, and are made to process framed data in memory (e.g. MPEG blocks, or OFDM symbols). Typically, they process one application function at a time, time-multiplexing their data-path resources across different functions.

Figure 3. Diagram of Silicon Hive's block accelerators.
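
The kind of frame-based kernel a block accelerator targets can be as simple as the following, written in plain ANSI C for this primer (it is not taken from our application libraries): scaling and saturating one 8x8 coefficient block held in memory.

    #define BLOCK_SIZE 64   /* one 8x8 block, e.g. a block of MPEG coefficients */

    /* Scale a block by num/den and saturate to the 12-bit signed range. */
    void scale_block(short coef[BLOCK_SIZE], int num, int den)
    {
        int i, v;
        for (i = 0; i < BLOCK_SIZE; i++) {
            v = (coef[i] * num) / den;
            if (v >  2047) v =  2047;
            if (v < -2048) v = -2048;
            coef[i] = (short)v;
        }
    }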

Stream accelerators consist of multiple, small cells with typically one PSE each; see Figure 4. They target functions with extremely high and/or dynamic data-rates, approaching the accelerator's own clock speed (e.g. IF front-end filtering, or time-domain equalisers), and are made to process streaming data. Typically, several different functions are cascaded simultaneously across the streaming array, each occupying a different sub-set of cells, so as to implement complete sub-systems.

Figure 4. Diagram of Silicon Hive's stream accelerators.
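
A typical streaming kernel consumes one input sample and produces one output sample per invocation. The 16-tap FIR filter below, again written for this primer rather than taken from a product library, shows the sequential ANSI C form in which such a function is expressed.

    #define TAPS 16

    /* One filter step: shift the delay line, then accumulate taps. */
    int fir_step(int sample, const int coef[TAPS], int delay[TAPS])
    {
        int  i;
        long acc = 0;

        for (i = TAPS - 1; i > 0; i--)      /* shift the delay line */
            delay[i] = delay[i - 1];
        delay[0] = sample;

        for (i = 0; i < TAPS; i++)          /* multiply-accumulate  */
            acc += (long)coef[i] * delay[i];

        return (int)(acc >> 15);            /* assuming Q15 coefficients */
    }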

Programming flow

Silicon Hive provides high-level programmability for all its cores. Our strategy is to abstract details of the architecture, and of low-level scheduling and resource allocation, away from programmers, so that they can focus on optimising their applications in C. We offer three different tools that together cover all stages of programming our cores.

The Partitioning Compiler
The partitioning compiler is a powerful profiler and code transformation tool that helps programmers partition code between accelerator(s) and host processor(s), without the need for prior application knowledge. It also helps partition a kernel across the different cells of a streaming array. The criteria used by the tool for proper isolation of kernels include: the identification of functions and loops with high cycle counts, the minimisation of communication and synchronisation overheads with the host processor and system memory, and the identification of functions with high amounts of intrinsic parallelism.
The tool reads in ANSI C and produces ANSI C at the end, so it can be seamlessly integrated into virtually any existing programming flow.

The partitioning compiler is an optional tool to aid system-level partitioning, and is not required to program any of Silicon Hive's cores.
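
As a purely illustrative before/after pair (the real tool's output conventions are not reproduced here), source-to-source partitioning can be pictured as isolating a hot loop into a kernel function that is then compiled for an accelerator, while the host code keeps a simple call site.

    /* Before: profiling identifies this loop as dominating the cycle count. */
    void process(int *out, const int *in, int n)
    {
        int i;
        for (i = 0; i < n; i++)
            out[i] = (in[i] * 3) >> 1;
    }

    /* After (illustrative): the hot loop is isolated as a kernel in the
       ANSI C sub-set accepted by the spatial compiler, and the host code
       retains only a call, minimising communication and synchronisation. */
    void kernel_scale(int *out, const int *in, int n)
    {
        int i;
        for (i = 0; i < n; i++)
            out[i] = (in[i] * 3) >> 1;
    }

    void process_partitioned(int *out, const int *in, int n)
    {
        kernel_scale(out, in, n);   /* dispatched to an accelerator */
    }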

The Spatial Compiler
The spatial compiler reads in a kernel program written in a sub-set of ANSI C, and uses powerful, innovative constraint-analysis and scheduling techniques to extract the intrinsic Instruction-Level Parallelism (ILP). The constraint-analysis module turns the partial connectivity of the accelerator data-path into an advantage rather than a hindrance: it prunes schedules made infeasible by the lack of full connectivity from the scheduling space, thereby helping the scheduler converge on the optimal solution. The scheduling techniques used are deterministic, and therefore considerably more reliable than traditional scheduling heuristics such as list scheduling. Given sufficient time (typically of the order of a few minutes), the compiler will find the optimal solution.

The compiler automatically schedules all operations in time, a temporal-assignment task that is classical in compiler technology. However, the spatial compiler also allocates all resources of the architecture (function units, registers, and interconnect lines) in space, with the aim of maximising locality of reference. This is analogous to what hardware synthesis tools do, so it is fair to say that Silicon Hive's spatial compiler blurs the border between compilers and hardware synthesis tools, while preserving a sequential ANSI C input format.

The spatial compiler is the only tool needed to program Silicon Hive's block accelerators.
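
As an example of the intrinsic ILP the compiler extracts, consider the complex multiply below (an illustration written for this primer). The four products are mutually independent, so on a core with sufficient issue slots and multipliers they can be issued in the same cycle; only the final additions depend on their results.

    /* Complex multiply: (ar + j*ai) * (br + j*bi). */
    void cmul(int *cr, int *ci, int ar, int ai, int br, int bi)
    {
        int t0 = ar * br;   /* the four products have no mutual     */
        int t1 = ai * bi;   /* dependences, so the spatial compiler */
        int t2 = ar * bi;   /* is free to issue them in parallel    */
        int t3 = ai * br;   /* across the available issue slots     */

        *cr = t0 - t1;      /* only these two additions must wait   */
        *ci = t2 + t3;
    }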

SHAPE
The Silicon Hive Array Programming Environment (SHAPE) maps pre-compiled cell configurations onto the physical cells of a streaming array; see Figure 5. SHAPE automatically assigns programs to cells in the 2D grid. The process is analogous to placement and routing in FPGAs, and aims at fitting as many tasks as possible within the physical limits of the accelerator at hand, maximising Task-Level Parallelism (TLP).

In addition to program placement, SHAPE is a debugging and simulation environment that provides full visibility of the internal state of the cells as a cycle-true simulation is carried out.

Figure 5. Placing pre-compiled cell configurations onto a streaming array (tool snapshot).
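
The placement task can be pictured with the deliberately naive first-fit sketch below, written for this primer; SHAPE itself uses techniques analogous to FPGA place-and-route, which this toy code does not reproduce.

    #define ROWS 2
    #define COLS 4

    /* Assign each task to the first free cell of the 2D grid.
       grid[r][c] holds a task id, or -1 if the cell is free.
       Returns the number of tasks successfully placed. */
    int place_first_fit(int grid[ROWS][COLS], int num_tasks)
    {
        int t, r, c, placed = 0;
        for (t = 0; t < num_tasks; t++) {
            int done = 0;
            for (r = 0; r < ROWS && !done; r++)
                for (c = 0; c < COLS && !done; c++)
                    if (grid[r][c] == -1) {
                        grid[r][c] = t;    /* map task t onto cell (r,c) */
                        placed++;
                        done = 1;
                    }
        }
        return placed;
    }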

Computational efficiency

How can Silicon Hive cores achieve a computational efficiency (MOPS/W) comparable to that of ASICs?

This is like an ASIC data-path…
A typical ASIC data-path consists of a number of registers and function units geographically distributed across the architecture, at the locations where they are needed. This locality of reference allows for higher performance (short operand delays) and lower power dissipation (low capacitive load on interconnect lines). Just like an ASIC, a Silicon Hive core contains a number of function units and small, distributed register files. The register files are positioned geographically close to the function units that produce the operands they need to store, thereby exploiting locality of reference the same way ASICs do.

Moving pipeline control to the compiler
The pipeline control and forwarding logic of traditional processors and DSPs make the work of programmers and compilers easier by creating the illusion that each instruction executes in a single cycle. However, this comes at a high cost in silicon and efficiency, due mainly to the large multiplexers required. Silicon Hive cores have no pipeline control overhead: all pipeline management and operand forwarding is moved to the compiler, which explicitly schedules all pipeline stages. This pushes the computational efficiency of our cores beyond that of traditional processors and DSPs.
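
What compiler-managed pipelining means can be seen from a dependent multiply-add (the two-cycle multiplier latency below is invented for the sake of the example). On an exposed-pipeline core, the compiler itself must place each operation in a cycle where its operands are valid, instead of relying on interlock and forwarding hardware.

    /* y = a*b + c on a hypothetical core where a multiply takes two
       cycles, an add takes one, and there are no hardware interlocks.
       The compiler schedules explicitly, for example:

         cycle 0:  t = a * b     (result valid from cycle 2)
         cycle 1:  (free slot, filled with an independent operation)
         cycle 2:  y = t + c     (operand t is now valid)

       The programmer still writes plain C: */
    int mac(int a, int b, int c)
    {
        return a * b + c;
    }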


No centralised resources
The use of centralised resources in traditional processors and DSPs, such as a single register file fully connected to all function units in the architecture, again makes the work of the compiler a lot easier. However, this has highly detrimental effects on the cost and efficiency of the processor, due to the massive operand networks implied; such architectures cannot scale. Silicon Hive processors use distributed resources, with a focus on local interconnect. This implies very little operand-network overhead. When a direct interconnect line is not available between two resources, Silicon Hive's spatial compiler schedules the communication spatially, with a minimal number of local hops.

Silicon efficiency

How can Silicon Hive cores achieve a silicon footprint smaller than that of the equivalent ASIC solution?

Higher arithmetic density
Unlike traditional processors and DSPs, Silicon Hive cores have no hardware dedicated to pipeline management and operand-forwarding logic; these tasks are performed statically by the spatial compiler. In addition, because all data-path resources in a Silicon Hive core are distributed, the complexity of the operand networks is much reduced. As a result, most of the silicon in a Silicon Hive core is dedicated to actually crunching data, which substantially increases silicon efficiency compared with traditional processors or DSPs.

Hardware re-use
The feature of our reconfigurable cores that often allows them to surpass the silicon efficiency of ASICs is data-path time-sharing. The same function units, registers, and interconnect networks that are used to compute a certain function at one point in time are re-used moments later to compute an entirely different function. Unlike in an ASIC, this hardware re-use is accomplished automatically by the spatial compiler. Silicon Hive has made and delivered reconfigurable cores that beat an equivalent ASIC in silicon footprint, even when a single application is used as the basis for the comparison.
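
In software terms, time-sharing reads as the sequence below. The two functions and configuration objects are invented stand-ins for this primer, not Silicon Hive's actual interface: one physical core is reconfigured between two unrelated kernels, doing the work of two separate ASICs.

    /* Hypothetical stubs standing in for whatever mechanism loads a
       pre-compiled configuration and runs a kernel to completion. */
    static void load_config(const void *config) { (void)config; }
    static void run_kernel(void)                { }

    static const int fir_config  = 0;   /* stand-ins for pre-compiled */
    static const int idct_config = 0;   /* cell configurations        */

    void frame_pipeline(void)
    {
        load_config(&fir_config);    /* the same FUs, RFs and wires   */
        run_kernel();                /* first act as a filter...      */

        load_config(&idct_config);   /* ...and moments later as an    */
        run_kernel();                /* entirely different function   */
    }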

Key technology differentiators

In this section, we elaborate on the key characteristics of Silicon Hive's approach that distinguish us from other players in the field of reconfigurable computing.

Robust building blocks
Advanced new computing technologies and paradigms carry the promise of tremendous value: higher performance can be achieved, enabling new and exciting applications and opening new market opportunities. However, this value is often offset by the unreliability of immature technologies, yet to be pipe-cleaned by the realities of commercial SoC design and manufacturing.

Unlike many of its competitors, Silicon Hive bases its reconfigurable computing technology on building blocks that have been steadily used and improved in a commercial environment for several years. The architecture template of Silicon Hive's cores is fundamentally simple and robust, leveraging pragmatic, well-understood concepts derived from fields such as traditional VLIW processors and compilers. Our synthesisable IP cores will often require the least amount of attention from chip designers during SoC integration and validation.

Cost-effective, customised solutions
Another distinguishing characteristic of Silicon Hive's solutions is their domain-specific nature. Many reconfigurable or reprogrammable processors on the market today are catch-all solutions made for general-purpose use. As a consequence, their architectures carry considerable overhead in cost and efficiency for any specific application domain. Because of its internal design methodology, which allows new cores and compilers to be generated within days, Silicon Hive can tune its processors to the needs of particular application domains and/or customers. This enhances the ease and speed with which a customer can integrate and use our solution, and dramatically reduces the silicon overheads implied. Making cost-effective solutions for very cost-sensitive consumer markets is an inherited part of our culture and history within Philips.

High-level programmability
Unlike FPGAs and most other reconfigurable processors available today, Silicon Hive's cores are built with high-level programmability in mind. In fact, our internal design methodology enforces that each of our cores be programmable by construction. The cores are supported by the revolutionary spatial compiler, which automatically extracts the intrinsic application parallelism from a sequential ANSI C specification. No hardware expertise is needed to program our cores; Silicon Hive itself employs application developers with a purely software background to build its application libraries.

The best of FPGAs and DSPs
Data-intensive circuits mapped spatially across an FPGA have a flow-through computational model that enables very high throughput. However, the control of these circuits typically has to be crafted as complex state machines in an HDL, and is difficult to validate. DSPs, on the other hand, are more sequential in nature and more limited in throughput, but state machines can easily be implemented on them as sequential software, often generated by a compiler.

Silicon Hive cores operate in two modes. The first is a flow-through mode, wherein the configuration remains fixed and data ripples through the architecture as it is clocked. In the second mode, the same core operates as a normal VLIW DSP processor, executing one instruction per cycle and running arbitrary control code such as conditional branches. In both modes, the code is generated automatically by the spatial compiler, which also chooses the mode for each code segment. This way, our cores can switch seamlessly between a mode in which they operate in a way analogous to an FPGA and one in which they behave just like a classical VLIW DSP. Sounds complex? You would be surprised how simple and robust it actually is.


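As a final illustration (written for this primer, with the mode split inferred from the description above), a single C function can contain both kinds of code: a regular inner loop suited to the FPGA-like flow-through mode, and branchy adaptation logic suited to the VLIW DSP mode.

    /* Automatic gain control: the inner loop is regular data-flow code,
       a candidate for flow-through mode; the surrounding branches are
       classic sequential control code for DSP mode. */
    void agc(short *buf, int n, int *gain)
    {
        int  i;
        long peak = 0;

        for (i = 0; i < n; i++) {               /* flow-through candidate */
            long v = ((long)buf[i] * (*gain)) >> 8;
            if (v >  32767) v =  32767;         /* saturate to 16 bits    */
            if (v < -32768) v = -32768;
            buf[i] = (short)v;
            if (v > peak) peak = v;             /* track positive peak    */
        }

        if (peak > 28000 && *gain > 16)         /* control code suited to */
            *gain -= 16;                        /* VLIW DSP mode          */
        else if (peak < 8000 && *gain < 1024)
            *gain += 16;
    }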

Conclusions
Key enablers of truly programmable and cost-effective SoCs are reconfigurable accelerators that can replace currently-used ASIC co-processors at a comparable computational efficiency (MOPS/W) and with lower silicon overhead. Silicon Hive currently provides two types of reconfigurable accelerators that address these market needs. The key differentiators of our approach are: the robustness and reliability of our technology, our ability to customise our solutions to specific customer needs, the high-level programmability of our cores (ANSI C), and the fact that they can operate either as standard DSPs or as coarse-grained FPGAs.

 Philips Electronics N.V. 2003