
64bit SMP NetBSD OS Porting

for TILE-Gx VLIW Many-Core Processor


Toru Nishimura
Sanctum Networks, Pvt. Ltd.
nisimura@sanctumnetworks.com
Abstract
Many-core processors are an attractive platform for running a general purpose OS like NetBSD. We, a team at Sanctum Networks, ported NetBSD 6.0 to a 64bit VLIW many-core processor named TILE-Gx. In this paper we introduce the distinctive features of TILE-Gx in some depth, as the product remains little known in our engineering community. The NetBSD port went well, and more smoothly than anticipated. We realized that NetBSD 6.0 is a mature SMP OS which provides a streamlined kernel structure and offers a rich set of kernel APIs specifically designed for large degree SMP configurations, beyond 32 processors. As part of the conclusion we mention some of the TILE-Gx NetBSD application areas which we are willing to build.
1. Project outlook and time line
This porting project was initiated in mid June 2012 in Tokyo. The goal was to achieve full SMP capabilities on the 36 core TILE-Gx processor and to understand the suitability of the TILE architecture for various kinds of application development.
- Our porting target is the Tilera Empower 1U computer with a 36 core TILE-Gx processor.
- A development environment with a cross compiler was set up in early July 2012.
- Two geographically separated GIT repositories are kept in push-pull synchronization.
- The Japan side hosts are on Sakura VPS and Amazon EC2.
- Kernel image linking was completed in late July 2012. The image contained a lot of stub code guaranteed not to work.
- A single core kernel successfully ran the ramdisk sysinst program on 2012-10-30.
- Since then, kernel stability, the GbE driver and SMP have been pursued.
- As of March 2013 the porting project is active. A number of functionalities, in particular those for TILE-Gx unique features, are under development.
2. TILE-Gx features
The TILE-Gx design was invented by an MIT professor, Dr. Anant Agarwal. It is the latest incarnation of long-running research since the 1980s. TILE-Gx is the third generation product. There are two predecessors, TILE64 and TILEPro, which are based on the same TILE architecture approach. Following these two 32bit designs, the third generation model was made a 64bit processor.
The TILE architecture emphasizes scalability and low power with a unique on-chip inter-connection technology. The TILE-Gx product family has configurations from 9 to 100 cores. The cores are laid out in a tiles-on-wall fashion, and each core runs at a 1.2GHz clock. Under a workload with modest network activity, the 36 core TILE-Gx is said to achieve 25-30W total power consumption.
TILE-Gx is a 64bit processor. It has 64bit integer and 64bit floating point operations, 64bit registers and a 64bit address space with 64bit pointers. These simple characteristics are quite familiar to any plain old UNIX programmer since the time the MIPS R4000 was introduced in the early 1990s.
2.1. Instruction set feature
The TILE instruction set architecture is 3 way VLIW. Two or three instructions are packed into a single 64bit word, which is in turn called an "instruction bundle." TILE can run up to three instructions simultaneously.
TILE-Gx has a register file of sixty four 64bit registers. Table 2-1 shows the register definitions. 58 of them, including two hardwired zero registers, are general purpose. The register file contains a thread pointer register, tp, to facilitate thread programming. The table also shows the ABI, which defines the common register usage programs ought to follow.
register  mnemonic    type                            usage
0 - 9     r0 - r9     saved by caller                 arguments/return values
10 - 29   r10 - r29   saved by caller                 (temporary)
30 - 51   r30 - r51   saved by callee                 (safe across call)
52        r52         saved by callee                 frame pointer
53        tp          dedicated                       thread pointer
54        sp          dedicated                       stack pointer
55        lr          saved by callee                 return address
56        sn          -                               always zero
57        idn0        on-chip network communication   I/O Dynamic Network 0
58        idn1        on-chip network communication   I/O Dynamic Network 1
59        udn0        on-chip network communication   User Dynamic Network 0
60        udn1        on-chip network communication   User Dynamic Network 1
61        udn2        on-chip network communication   User Dynamic Network 2
62        udn3        on-chip network communication   User Dynamic Network 3
63        zero        -                               always zero

Table 2-1 register assignment for TILE-Gx ABI
Note that TILE passes a rather large number (10) of function arguments in registers. Return values are placed in the same registers used for arguments, which means the content of r0 is destroyed quite often when a function exits. This is worth remembering as a TILE debugging tip.
Six registers are reserved for inter-core communication via the on-chip networks named UDN and IDN. These registers work as FIFO ports, much like FSL (Fast Simplex Link) in the MicroBlaze processor. Reading from an empty register or writing to an occupied register may stall the processor until the condition is met.
Registers are a shared resource among the concurrent VLIW execution flows. Each subroutine has a single entry point and a single exit; no parallel subroutines are in action. A register conflict in parallel execution is considered a program error: it leaves undefined or unexpected values in the register file. Avoiding register conflicts is the programmer's responsibility.
TILE instructions have two different formats: X-format for 2 instructions in a bundle and Y-format for 3 instructions in a bundle. Basic arithmetic is done in 3 operand form. As the register file has 64 entries, a 3-operand instruction requires 18 bits (= 6 x 3) for register designation plus some more for the opcode. A 3-in-1 64bit bundle is therefore a rather tight encoding, but it contributes much to instruction density. 2 register arithmetic instructions take either a signed 8bit or a signed 16bit immediate value. The latter belong to the 2-in-1 X-format bundle, as they need a longer encoding. An unoccupied instruction slot in a bundle is filled with a NOP, which works as a flowing bubble in the execution pipeline.
The most notable difference from conventional CISC / RISC instruction sets is the lack of a "register indirect with offset" addressing mode. TILE has no "LW R3, 0x178(R2)" style memory access. This means that local variables on the stack and/or (C language) struct members must be accessed through an explicit pointer, computed into a temporary register, that refers to the target address. It is a stark contrast to architectures where the "register indirect with offset" mode performs a load / store at "base register + immediate offset" in one instruction, which is very handy for local variables and struct members. This is another tip to remember for TILE programming.
2.2. Other useful instructions
The TILE instruction set has an orthogonal set of atomic math / lock instructions.

fetchadd    atomic addition
fetchand    atomic logical AND
fetchor     atomic logical OR
cmpexch     compare-and-swap

Table 2-2 atomic instructions
All of them have two variations, for 8byte and 4byte operations. These instructions are comfortably suited to implement the NetBSD atomic_ops(3) routines, a well defined set of MP-safe atomic operations widely used in SMP NetBSD kernel constructs and in parallel programming libraries like pthread(3).
The cmpexch instruction serves for CAS (compare-and-swap) or TAS (test-and-set) operations. It works like the Intel cmpxchg instruction. Note that it is not based on the LL/SC synchronization model found in MIPS, Alpha, PowerPC and ARM64. TILE cmpexch works with an accompanying "CmpValue" SPR register. Locking primitives can be implemented with it in the usual manner.
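Because cmpexch is a plain compare-and-swap rather than an LL/SC pair, a lock reduces to the familiar CAS loop. The following is a minimal sketch in portable C11, not NetBSD code: atomic_compare_exchange_strong_explicit() stands in for cmpexch (on TILE-Gx the expected value would first be written to the CmpValue SPR, which is not modeled here) and atomic_fetch_add_explicit() stands in for fetchadd. The names spin_lock_t, spin_trylock() and counter_fetchadd() are hypothetical; inside the NetBSD kernel one would use atomic_ops(3) routines such as atomic_cas_ulong() instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical spin lock built on compare-and-swap (cmpexch style). */
typedef struct {
    atomic_uint_fast64_t word;   /* 0 = free, 1 = held */
} spin_lock_t;

static void
spin_init(spin_lock_t *l)
{
    atomic_init(&l->word, 0);
}

/* One CAS attempt: store 1 only if the word still holds 0. */
static int
spin_trylock(spin_lock_t *l)
{
    uint_fast64_t expected = 0;

    return atomic_compare_exchange_strong_explicit(&l->word, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

static void
spin_lock(spin_lock_t *l)
{
    while (!spin_trylock(l))
        continue;                /* busy-wait; a real kernel would back off */
}

static void
spin_unlock(spin_lock_t *l)
{
    atomic_store_explicit(&l->word, 0, memory_order_release);
}

/* fetchadd-style counter: returns the value held before the addition. */
static uint64_t
counter_fetchadd(atomic_uint_fast64_t *c, uint64_t v)
{
    return atomic_fetch_add_explicit(c, v, memory_order_relaxed);
}
```

A caller brackets its critical section with spin_lock() and spin_unlock(); the acquire / release orderings keep the protected accesses from moving outside the lock.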
TILE-Gx has a rich set of DSP and SIMD instructions. It also has some fancy instructions: a set of bit field operations, CLZ (count-leading-zeros), CRC32 polynomial math for hashing / checksums and so on. TILE-Gx floating point math does not have a dedicated register set; FP instructions use GP registers for source / destination operands.
2.3. Address space
TILE-Gx implements 42 effective address bits out of the 64bit VA (virtual address) pointer. The virtual address space is split into an upper 2TB space and a lower 2TB space, with a large void in between. VA[63:41] is either all-0 or all-1, that is, sign extended from the VA<41> value.
TILE-Gx has no address segments like MIPS KSEG0 / KSEG1 / XKPHYS to distinguish cache nature. Software is in charge of address layout and cache nature control with the help of smart TLB usage.
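The sign extension rule can be written down directly. This sketch assumes VA_BITS = 42 as stated above; va_is_canonical() and va_sign_extend() are hypothetical helper names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VA_BITS 42   /* implemented effective address bits on TILE-Gx */

/* True when VA[63:41] is all-0 or all-1, i.e. va equals its own
 * sign extension from bit 41. */
static bool
va_is_canonical(uint64_t va)
{
    uint64_t top = va >> (VA_BITS - 1);                        /* bits 63..41 */
    uint64_t ones = (UINT64_C(1) << (64 - (VA_BITS - 1))) - 1; /* 23 one bits */

    return top == 0 || top == ones;
}

/* Sign extend the low VA_BITS bits of va to a full 64bit address. */
static uint64_t
va_sign_extend(uint64_t va)
{
    uint64_t sign = UINT64_C(1) << (VA_BITS - 1);

    va &= (UINT64_C(1) << VA_BITS) - 1;
    return (va ^ sign) - sign;   /* unsigned wraparound sets bits 63..42 */
}
```

For example, 0x20000000000 (bit 41 set, upper bits clear) falls into the void and is rejected, while its sign extension 0xFFFFFE0000000000 is the first address of the upper 2TB space.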
2.4. Layered protection
The TILE architecture provides a four level protection scheme. Levels range from 0, least protected, to 3, most protected. This allows building layered protection domains which run protected programs at each level.
PL0   User applications
PL1   Guest OS
PL2   Tilera Hypervisor
PL3   ("virtual machine monitor") (who knows what it really is.)

Table 2-3 TILE protection level
A program running at a lower protection level is inhibited from touching resources at a higher level. Each core runs at one of the four protection levels; the current protection level of an individual core is called CPL. Control transfer is done by executing dedicated instructions: swint0, swint1, swint2 and swint3. NetBSD uses the swint1 instruction for system calls issued by application programs.
There is a large set of SPR registers, operated by the mfspr / mtspr instructions. The SPR number is encoded in 14 bits. Most SPR registers have their own MPL (minimal protection level) value to arbitrate which level (0 - 3) of program can access them. MPL is the basis of layered protection for the TILE runtime environment.
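The MPL rule boils down to a single comparison. In the sketch below the level values follow Table 2-3, while struct spr_desc and spr_access_ok() are hypothetical names; the real SPR numbering and the exact hardware behavior on a refused access are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Protection levels as listed in Table 2-3. */
enum pl { PL_USER = 0, PL_GUEST = 1, PL_HV = 2, PL_VMM = 3 };

/* Hypothetical SPR descriptor; only the MPL matters for this sketch. */
struct spr_desc {
    unsigned num;    /* SPR number (14 bits) */
    enum pl  mpl;    /* minimal protection level */
};

/* Access is permitted only when the core's current protection level
 * (CPL) is at least the SPR's MPL; otherwise it is refused. */
static bool
spr_access_ok(enum pl cpl, const struct spr_desc *spr)
{
    return cpl >= spr->mpl;
}
```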
2.5. TLB and TSB
The TLB plays a central role in the TILE architecture. There, the TLB does not just make VM (virtual memory) possible but also realizes chip-wide global cache coherency. The TILE TLB entry is designed to be multi-core aware: a TLB entry optionally holds the location of a core in the chip (as an X-Y coordinate), to track and identify how the VA-PA mapping the entry describes is tied to a specific core.
Like most modern processors, TILE has a software managed TLB. TILE-Gx has independent TLB stores for instruction and data: a 16 entry iTLB and a 32 entry dTLB. Note that the TLB is a shared resource among programs which run in different protection domains, and so is the 42bit VA space. Tilera Hypervisor reserves some of the TLB entries for its own use. The remaining entries are free for the guest OS and application programs to use.
The TLB management strategy is modeled after the SPARC processor. TILE uses the "TSB" and "TTE" nomenclature for the very same purposes.
TSB (translation storage buffer) is a software extension of the TLB. The TSB holds a superset of the TLB in main memory and works as a staging area to inject TLB entries into the processor's iTLB or dTLB. HV is in charge of TLB miss handling and always consults the TSB content when doing so. The guest OS can only operate on the TSB store: as the TLB is a highly sensitive shared resource among various programs, the guest OS cannot access the TLB directly. The TSB is normally reserved inside protected guest OS memory. The TILE TSB is a unified one holding both iTLB and dTLB entries. This approach differs from SPARC64, which has an iTSB and a dTSB in parallel. TTE (translation table entry) is the software defined intermediate format of a TLB entry.
On a TLB miss HV takes control to run the TLB refill operation. It first searches the TSB store for the offending TLB entry. If the target entry is found, HV injects it into either the iTLB or the dTLB and completes the refill. If HV finds that the TSB has no such entry, it posts a request for the guest OS to come in and resolve this "TSB miss" condition. The guest OS, in turn, responds to the TLB miss exception by identifying it as either a genuine access error or a recoverable fault condition. The rest of the operation is identical to popular processors with a software managed TLB. If the guest OS finds that the exception is a true TLB refill case, it adds the offending TLB entry to the TSB store and returns; HV takes care of the refilling. If the guest OS finds that the exception is an access error or a protection violation, it handles the case in its own way.
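The refill flow can be sketched as a hash table lookup shared by both sides. Everything here is hypothetical: the TSB size, the (vpn ^ asid) index hash and the struct tte fields are illustrative assumptions, not the Tilera TSB format; only the 64KB page size matches the one chosen for NetBSD/tile. tsb_lookup() plays the HV role and returns NULL on a "TSB miss", and tsb_insert() is what the guest OS does once it has classified the fault as a genuine refill.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TSB_ENTRIES 512   /* hypothetical size; must be a power of two */
#define PAGE_SHIFT  16    /* 64KB pages, as chosen for NetBSD/tile */

/* Hypothetical TTE: software defined format of one TLB entry. */
struct tte {
    bool     valid;
    uint8_t  asid;
    uint64_t vpn;         /* virtual page number */
    uint64_t pfn;         /* physical frame number */
};

/* Unified TSB: iTLB and dTLB entries share one table. */
struct tsb {
    struct tte e[TSB_ENTRIES];
};

static size_t
tsb_index(uint64_t vpn, uint8_t asid)
{
    return (size_t)((vpn ^ asid) & (TSB_ENTRIES - 1));
}

/* HV side: NULL means "TSB miss", to be posted to the guest OS. */
static const struct tte *
tsb_lookup(const struct tsb *t, uint64_t va, uint8_t asid)
{
    uint64_t vpn = va >> PAGE_SHIFT;
    const struct tte *p = &t->e[tsb_index(vpn, asid)];

    return (p->valid && p->vpn == vpn && p->asid == asid) ? p : NULL;
}

/* Guest OS side: after classifying the fault as a genuine refill,
 * store the translation and return; HV then refills the TLB. */
static void
tsb_insert(struct tsb *t, uint64_t va, uint8_t asid, uint64_t pfn)
{
    uint64_t vpn = va >> PAGE_SHIFT;
    struct tte *p = &t->e[tsb_index(vpn, asid)];

    p->valid = true;
    p->asid = asid;
    p->vpn = vpn;
    p->pfn = pfn;
}
```

In this sketch a colliding insert simply overwrites the slot, which is tolerable because the TSB only caches translations: a later TSB miss sends the fault back to the guest OS again.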
TILE has ASID (address space identifier). ASID improves the TLB hit ratio, that is, VA->PA translation efficiency, exactly as in other processors with ASIDs. The TILE ASID is 8bit, offering 256 individual address spaces to be distinguished in TLB lookup. Some literature incorrectly mentions that ASID is an extension of the VA, as if it realized a concatenated 8 + 42 bit address. Rather, ASID virtualizes the TLB: it creates imaginary multiple TLB stores which are numbered and selected by ASID. ASID demands a smarter VM to operate. This topic is discussed in a later section.
As other processors do, the TILE processor handles many kinds of interrupts / exceptions. Devices asynchronously post a variety of requests, and different types of exceptions happen while a processor is in action. TILE uses IPI (inter-processor interrupt) not only for pure inter-processor messaging but also for I/O device interrupt notification. As TILE integrated on-chip devices are located apart from the cores and their notifications come across the on-chip network, it is reasonable to use IPI, folding many kinds of notification into a single form.
2.6. iMesh on-chip inter-connect
Processing cores are laid out in a tiles-covering-wall fashion, with a mesh shaped inter-connect coupling them to each other. The inter-connect has an X-Y, street-avenue like layout. At each crossing is an independent switch processor which ties a computing node to the entire switch network. Tilera names it iMesh technology.
The switch processor is a 16bit RISC which runs low latency, high bandwidth switching functions through a limited number of signal connections. Besides the 4 paths in the N, E, S and W directions to neighboring switches, one switch data-path is coupled with the processor's L2 cache. A data stream travels through L2 first, then through either the L1 iCache or the dCache, to reach a processing core.
The inter-connect offers UDN (User Dynamic Network) and IDN (I/O Dynamic Network) for general purpose on-chip streaming and messaging communication. A total of 6 registers of the TILE-Gx processor are assigned to them for ease of programming.
It should be noted that iMesh neither implements nor enforces any kind of "smart network topology." A number of massively-parallel multi-processor super computers have been built over time, and all of them more or less pursued a smarter topology for the processor inter-connect to maintain low latency and high bandwidth. Notable examples are the Cray T3D and the SiCortex SC5832. The T3D had a 3-dimensional "torus" graph topology to tightly couple its Alpha processors. The SC5832 had a "Kautz graph" topology to inter-connect 6-core MIPS64 processors, with the help of a built-in DMA engine to talk with L2 caches and I/O devices.
In the TILE architecture the on-chip inter-connect is software defined. The switch processor can program the network topology to adapt to varying demands. In this way the TILE architecture maintains flexibility and scalability at the same time. It is unlikely that "topology optimized" super computers can achieve both in balance.
The iMesh API is provided for finer control over the on-chip network. Cores can be partitioned into groups which work in parallel as if they were islands. This feature is implemented by switch network programmability.
With the help of "topology-aware" and "cache attribute aware" TLB entries, iMesh plays a central role in cache coherency.
2.7. Cache design and feature
Each core has a 32KB iCache, a 32KB dCache and a 256KB i/d combined L2 cache. Both L1 caches are VIPT (Virtual Index and Physical Tag).
L1 iCache   32KB, 2 way associative, 64B line size.
L1 dCache   32KB, 2 way associative, 64B line size, write-through.
L2 cache    256KB, i/d combined, 8 way associative, 64B line size, write-back.

Table 2-4 cache characteristics
Some TILE processor literature mentions a "coherent L3 cache." That is somewhat imprecise: the L3 functionality is achieved by a group of L2 caches, in a scheme called "cache homing." Let us start the explanation.
The TILE L1 cache is inclusive to L2: L1 holds a subset of the L2 contents at any moment. An L2 miss happens when the offending cache line data is not found in L2. The core then asks the "neighboring cores", which are grouped by HV for a single OS instance, for the missing cache line data. If it is found there, the cache line data is transferred to the requester's L2 cache.
Foreign L2 caches work as an extension of the local L2 cache. In other words, a group of cores share their L2 cache contents with each other. This scheme is named "cache homing", and Tilera calls the group of L2 caches a "coherent L3" cache. The 36 core Gx processor has a "9MB coherent L3" = 36 x 256KB L2.
Cache lines can be populated sparsely among different L2 caches to improve cache efficiency. L3 cache homing is one of the page attributes, so it is controllable on a per-page basis.
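To illustrate how consecutive cache lines can be spread over the L2 caches of a group of cores, here is a hash-for-home sketch. The hash function is an assumption made up for illustration, since the actual TILE-Gx homing hash is not described in this paper; only the 64B line size comes from Table 2-4, and home_tile() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define L2_LINE 64   /* L2 line size, Table 2-4 */

/*
 * Hypothetical hash-for-home function: maps a physical cache line to the
 * tile whose L2 acts as its home.  Neighboring lines land on different
 * tiles, so one page is spread over the whole group of cores.
 */
static unsigned
home_tile(uint64_t pa, unsigned ntiles)
{
    uint64_t line = pa / L2_LINE;

    return (unsigned)((line ^ (line >> 6)) % ntiles);
}
```

With 36 tiles, the first 36 lines of a page map to 36 different homes, while every byte inside one line always maps to the same home.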
3. TILE-Gx on-chip devices
Integrated multiple DDR memory controllers
- 2 controllers in the 36 / 16 core models, 1 in the 9 core model.
- TILEPro, the predecessor of TILE-Gx, had four DDR2 memory controllers on chip in its 64 core model.
With a dual controller configuration, memory can be driven in an interleaved fashion.
mPIPE packet classifier
It is a programmable intelligent packet engine. It offers a "frame parse" function to run "sieve-to-forward" classification on the incoming Ethernet frame stream at line speed. mPIPE is tightly integrated with the GbE / 10G Ethernet network interfaces.
- 4x 10G ports are available in the 36 core model. Each port can be reprogrammed to host 4x GbE network interfaces.
- GbE-only ports are also available in the 16 / 9 core models.
mPIPE has local buffer memory to handle incoming and outgoing Ethernet frames, and can perform load balancing to distribute ingress frames to cores. A core binds the mPIPE device register set to a particular virtual address with a designated dTLB entry for control. mPIPE in turn holds an I/O TLB entry to access data which resides in the target (accelerating application or guest OS) address space, so that it can perform VA->PA translation for frame data and accompanying descriptors.
mPIPE has its own 32bit instruction set. A special GCC toolchain is provided to program it.
- Two or three operand instructions.
- 32x 32bit register file: 22 of them are general purpose.
- Private SPR registers used with mfspr and mtspr.
MiCA crypto and compression engine
It is a standalone computing processor populated inside TILE-Gx; multiple MiCA processors are on a single Gx. MiCA can copy data while encrypting and compressing it, as a streaming operation. A core binds the MiCA device register set to a particular virtual address with a designated dTLB entry for control. MiCA in turn holds an I/O TLB entry to access data which resides in the target address space, so that it can perform VA->PA translation for crypto / compression data.
Conventional I/O devices
There are some conventional I/O devices like PCIe, USB2.0 and I2C/SPI in our porting target computer. The PCIe controller works in either root-complex (host) or end-point (device) mode. USB2.0 is used for multiple purposes. It works as a virtual console during development and debugging. It can also inject a binary image into the Gx processor to run. The binary image consists of boot programs, the HV image and the guest OS in a predefined format.
4. Tilera Hypervisor
HV utilizes the TILE protection level feature. The guest OS has heavily limited access to SPR registers: only a handful of SPR registers are allowed to be used by the guest OS. HV is populated in a 16MB area in the upper 2TB space with hardwired TLB entries.
HV has great control over the entire TILE processor complex. HV groups cores into M x N rectangles, each forming an OS instance.
HV assigns I/O devices to particular instances with I/O TLB entries. Tilera calls this scheme MMIO (memory mapped IO), while SPARC names it "IOMMU."
HV allows several guest OSes to run simultaneously. Device and core grouping is defined by the HV configuration at machine startup. Because of this, HV has yet to become as flexible as what Xen can do these days.
Two serial ports are provided in the Gx processor. HV can dynamically bind one of the serial ports to a running OS instance as its console.
BME (bare metal environment)
BME is an API to build a "light weight monitor" which runs a special purpose "driver" on designated core(s) for data-plane processing. In general any BME program needs an accompanying fully-featured OS, like Linux, as a control plane to manage the whole software complex. The iMesh messaging facility API is used by the control OS to communicate with BME programs running on separate core(s).
Several code examples are provided by Tilera:
- One TILE core runs an "encryption server" on BME while Linux acts as a "client" which receives the results from BME. In this example data transfer is done through a shared page with the help of UDN messaging between the two.
- A number of Linux processes get private cores to run on and communicate with each other through UDN messaging and shared pages.
5. NetBSD/tile
This port is based on the NetBSD 6.0 STABLE code set. It is a 64bit SMP kernel with a 64bit userland. The kernel runs as a guest OS in conjunction with Tilera Hypervisor.
NetBSD/tile uses GCC 4.6.3 as ported by Tilera, and we have been using it as is. As GCC 4.5 is still in use in the NetBSD 6.0 code set, we integrated GCC 4.6.3 to start with.
The 64bit pmap was implemented from scratch, modeled after the Alpha pmap. Although TILE-Gx offers 13 different page sizes, HV employs a much humbler page size selection. We chose a 64KB page for NetBSD/tile, in parallel with the Tilera Linux VM implementation. The virtual address partitioning is "10 + 8 + 8 + 16."
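The "10 + 8 + 8 + 16" split can be illustrated by decomposing a VA into table indices and a page offset. The level names and va_split() are hypothetical; they only show how the 42 implemented VA bits divide into a 3 level table over 64KB pages, not how the NetBSD/tile pmap names its structures.

```c
#include <assert.h>
#include <stdint.h>

#define PGSHIFT 16   /* 64KB pages: 16 offset bits */
#define L3_BITS  8
#define L2_BITS  8
#define L1_BITS 10   /* 10 + 8 + 8 + 16 = 42 implemented VA bits */

/* Hypothetical decomposition of a VA into three table indices. */
struct va_index {
    unsigned l1, l2, l3;
    uint64_t off;
};

static struct va_index
va_split(uint64_t va)
{
    struct va_index ix;

    ix.off = va & ((UINT64_C(1) << PGSHIFT) - 1);
    ix.l3 = (unsigned)((va >> PGSHIFT) & ((1u << L3_BITS) - 1));
    ix.l2 = (unsigned)((va >> (PGSHIFT + L3_BITS)) & ((1u << L2_BITS) - 1));
    ix.l1 = (unsigned)((va >> (PGSHIFT + L3_BITS + L2_BITS)) &
        ((1u << L1_BITS) - 1));
    return ix;
}
```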
NetBSD/tile utilizes the SMP ready NetBSD kernel internals as much as possible. NetBSD 5 introduced many sophisticated kernel constructs and API sets which are effective and useful for a scalable SMP OS. Since then gradual streamlining has been done by the fore-running SMP NetBSD ports. Now NetBSD is a mature platform for jump starting a fresh SMP port.
The following is the typical set of useful SMP APIs:
- atomic_ops(3)
- kcpuset(9)
- xcall(9)
The first group must be implemented at an early kernel porting stage. In most cases these routines have to be written in assembler to best suit the particular processor nature. The latter two are pure software constructs written in plain C code.
The parallel programming model is NetBSD pthread. NetBSD pthread is well organized to adapt to various processors with minimum effort. We made no particular modification for TILE-Gx support. It works just like any other pthread implementation, such as the one in Tilera Linux.
A very limited number of assembler files has been written so far. There is only one for the kernel, "locore.S". The file contains 4 well defined major routines:
- CPU startup for the primary core and secondary cores.
- Exception entry / dispatch / return.
- CPU context switch.
- Fast software interrupt dispatch / return.
Other assembler files are for libraries and a few application programs like rtld(1). The following is the list of the major TILE-Gx dependent directories:
- src/common/lib/libc/arch/
- src/lib/libc/arch/
- src/libexec/rtld/arch/
5.1. Key design decisions
In this section we concisely describe some design decisions made to realize the port.
- struct trapframe, struct switchframe and struct pcb.
- USPACE to hold the kernel stack and struct pcb.
- pmap(9) to interface the processor with NetBSD VM.
- Exception and interrupt handling that complies with the target processor design intent.
- IPI (inter-processor interrupt), which is essential to make SMP possible.
struct trapframe is a snapshot image of the runtime context. One trapframe is always created at the high end address of USPACE, and the actual kernel stack starts just below it, growing downward. This reserved trapframe area is for the user process context: whenever a user process gets interrupted by an exception or a device notification, the trapframe records the user context to resume later. This area is also used for system calls. In kernel mode, the kernel gets interrupted for the same reasons as a user mode process does. On such an occasion, a trapframe is created and pushed on the kernel stack.
The TILE architecture has a 64 entry register file. 8 of them are not part of the process context and are excluded. We chose 64x 64bit = 512B as the size of struct trapframe anyway. In the vacant fields we place some extra context for the process to retain: the exception return address, the status register value at the time the exception happened, the offending exception type, and the value of a certain SPR, "CmpValue", for the cmpexch instruction.
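A possible layout consistent with the description above is sketched below. The field names and their order are assumptions (only the 56 context registers, the four extra values and the 512B total come from the text), and USPACE uses the 64KB size given for NetBSD/tile. uarea_user_trapframe() is a hypothetical helper showing where the user trapframe sits at the top of the u-area.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; field names are illustrative only. */
struct trapframe {
    uint64_t tf_regs[56];   /* r0..r55; r56..r63 (sn, idn*, udn*, zero) excluded */
    uint64_t tf_pc;         /* exception return address */
    uint64_t tf_ps;         /* status register at exception time */
    uint64_t tf_trapno;     /* offending exception type */
    uint64_t tf_cmpval;     /* CmpValue SPR used by cmpexch */
    uint64_t tf_spare[4];   /* remaining vacant slots */
};
static_assert(sizeof(struct trapframe) == 64 * 8, "trapframe is 64 x 8B = 512B");

#define USPACE (64 * 1024)  /* 64KB, the NetBSD/tile page size */

/* The user trapframe occupies the top of the u-area; the kernel stack
 * starts just below it and grows downward. */
static uintptr_t
uarea_user_trapframe(uintptr_t uarea)
{
    return uarea + USPACE - sizeof(struct trapframe);
}
```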
struct switchframe is for CPU context switch. NetBSD defines two context switch routines: cpu_switchto(9) and lwp_return(9) are the routines that perform a context switch. The TILE architecture has a large set of caller-saved registers. Our switchframe is 25x 8B = 200B in size.
struct pcb is one of the longest survivors among UNIX kernel primitives. It has become smaller than it used to be, since the way context switches are run has become smarter. Our struct pcb is just large enough to hold struct switchframe and a bit extra.
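One way the 25 switchframe slots could be accounted for, assuming the switchframe keeps the registers marked "saved by callee" in Table 2-1 (r30 - r51, r52 and lr) plus sp: 22 + 3 = 25. This is an illustrative guess, not the NetBSD/tile header, and the extra pcb member is likewise hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical switchframe: state kept across cpu_switchto(). */
struct switchframe {
    uint64_t sf_s[22];      /* r30..r51, saved by callee (Table 2-1) */
    uint64_t sf_fp;         /* r52, frame pointer */
    uint64_t sf_sp;         /* r54, stack pointer */
    uint64_t sf_lr;         /* r55, return address */
};
static_assert(sizeof(struct switchframe) == 25 * 8, "switchframe is 25 x 8B = 200B");

/* pcb: the switchframe plus "a bit extra" (the extra field is a guess). */
struct pcb {
    struct switchframe pcb_sf;
    uint64_t pcb_onfault;   /* hypothetical fault recovery address */
};
```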
USPACE size is 64KB, aligned with the NetBSD/tile page size.
5.2. ASID management
ASID management is modeled after the way used for NetBSD/alpha and NetBSD/mips. In this section we explain it in greater detail.
The kernel has a variable for the "ASID generation number" to make sure a unique ASID is assigned to the running process on the processor. It is the central idea. Our ASID management algorithm works in this way:
- pmap_activate(9), one of the NetBSD kernel APIs, switches the processor's current ASID value whenever a new process is ready to take control.
- Switching the current ASID is a light weight operation for the OS, as it eliminates the necessity of a TLB flush at every context switch. ASID-less processors need to perform a whole scale TLB invalidation, discarding all entries, at every context switch. As the TLB works as a cache for VM address translation, a TLB flush severely hurts the TLB hit ratio, spoiling VM performance. ASID-aware processors just need to switch the current ASID value. Changing the processor's current ASID can be considered switching to the imaginary TLB store which exists for each ASID value.
- Every newborn process has no ASID assigned. pmap_activate() chooses a new one which has never been allocated so far and assigns it to the process. pmap_activate() also records the current ASID generation number in the process's pmap store.
- ASID is a small number counting only up to 255. If pmap_activate() finds the 8bit space exhausted, it bumps the ASID generation number in a kernel variable by 1 and chooses a new ASID wrapped to the least available number (normally 1, as ASID 0 is reserved for the NetBSD kernel pmap). On this occasion, the kernel makes a full scale TLB invalidation to discard all TLB entries.
- Whenever pmap_activate() is about to switch the current ASID, it checks whether the ASID generation number in the kernel variable matches the process's generation number recorded at ASID creation. If they differ, the process's ASID is no longer valid, and pmap_activate() selects and assigns a fresh ASID for the process to run, recording the current ASID generation number too.
- At any given moment every running process has its own unique ASID. The generation number scheme greatly reduces the necessity of full scale TLB invalidation: a TLB flush only happens when the ASID range runs out and the ASID generation number has to be bumped.
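The ASID steps above can be condensed into a short sketch. It models a single core, uses hypothetical names (struct asid_state, struct pmap_asid, pmap_activate_asid()) and only counts full TLB invalidations instead of performing them; loading the ASID register is left out.

```c
#include <assert.h>
#include <stdint.h>

#define ASID_MAX 255   /* 8-bit ASID; ASID 0 is reserved for the kernel pmap */

/* Hypothetical per-core allocator state. */
struct asid_state {
    uint64_t generation;    /* current ASID generation number */
    unsigned next_asid;     /* next never-allocated ASID in this generation */
    unsigned tlb_flushes;   /* full TLB invalidations performed (for testing) */
};

/* Hypothetical per-pmap fields. */
struct pmap_asid {
    unsigned asid;          /* 0 = no ASID assigned yet */
    uint64_t generation;    /* generation recorded when asid was assigned */
};

static void
tlb_invalidate_all(struct asid_state *s)
{
    s->tlb_flushes++;       /* stands in for the real full TLB flush */
}

/* ASID part of a pmap_activate()-like routine. */
static void
pmap_activate_asid(struct asid_state *s, struct pmap_asid *pm)
{
    if (pm->asid != 0 && pm->generation == s->generation)
        return;             /* still valid: just load pm->asid (not modeled) */

    if (s->next_asid > ASID_MAX) {  /* 8-bit ASID space exhausted */
        s->generation++;            /* every ASID handed out so far is stale */
        s->next_asid = 1;           /* wrap to the least available ASID */
        tlb_invalidate_all(s);
    }
    pm->asid = s->next_asid++;
    pm->generation = s->generation;
}
```

With 255 usable ASIDs, a full TLB flush happens only once every 255 fresh allocations, while a process whose ASID is still current switches in with no TLB work at all.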
5.3. TLB shootdown
TLB shootdown is an essential operation in any SMP kernel. Like the processor cache, the TLB is a local resource of a processor core. Invalidating the local cache or the local TLB is provided by a certain mechanism, but in general invalidating a remote TLB is as hard to achieve as invalidating a remote cache.
In an SMP system, a TLB invalidate operation must be propagated to the multiple cores which have been running a particular process. The process's pmap must maintain a "processor set" to track which cores have run it. Here goes the explanation of remote TLB shootdown by ASID bump:
When pmap(9) detects the necessity to invalidate one or more TLB entries of a particular process, the kernel needs to run the invalidate operation for both:
- the "local" core which happens to run the kernel on behalf of pmap() at that very moment, and
- all the "remote" cores which the process's pmap() is aware of.
The latter operation is named "TLB shootdown." It is implemented with IPI, which triggers a remote core action by inter-core message. TLB shootdown logic can be built with the help of the xcall(9) "cross call" kernel API.
A smart ASID management can achieve remote TLB invalidation at a small cost:
- mark the ASID in the offending process's pmap() store "unassigned."
- broadcast an xcall(9) message to the remote cores, triggering IPI.
When one of the cores is about to run the process at the next scheduling, pmap_activate() will choose and assign a fresh ASID to the offending process. The stale TLB entries with the abandoned ASID are thereby invalidated at once.
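The two shootdown steps amount to very little code. In this sketch a 64 bit mask stands in for the pmap's processor set (kcpuset(9) in the real kernel), and coremask_t, struct pmap_sd and pmap_shootdown_by_asid() are hypothetical names; sending the xcall(9) is left to the caller, which receives the set of remote cores to notify.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t coremask_t;   /* hypothetical stand-in for a kcpuset(9) set */

/* Hypothetical pmap fields used by the shootdown. */
struct pmap_sd {
    unsigned   asid;           /* 0 = unassigned */
    coremask_t active;         /* cores that have run this pmap */
};

/*
 * Invalidate every TLB entry of the pmap at once by abandoning its ASID.
 * Returns the remote cores that still need an xcall(9)/IPI; the next
 * pmap_activate() on any core then assigns a fresh ASID.
 */
static coremask_t
pmap_shootdown_by_asid(struct pmap_sd *pm, unsigned self)
{
    pm->asid = 0;                               /* mark "unassigned" */
    return pm->active & ~((coremask_t)1 << self);
}
```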
5.4. Useful SMP facilities in NetBSD
The SMP NetBSD kernel provides ways to manage CPUs at a finer grain. There is a little known set of useful commands; let us mention them in brief.
- cpuctl(8) ... try "/usr/sbin/cpuctl list" on your modern Intel computer. It shows the list of CPU states, telling which CPUs are online / offline.
- psrset(8) ... try "/usr/sbin/psrset -p" on your modern Intel computer. It can create an arbitrary number of "processor sets" which can be bound to any process. CPU affinity is made possible by processor set binding. It would be possible to bind a processor set to a kthread (kernel thread) which runs a specific kernel subsystem like GbE and/or disk drivers.
- schedctl(8) ... try "/usr/sbin/schedctl -p 1" on your modern Intel computer. It assigns one of the predefined scheduling policies to a process, and replaces the nice(1) and renice(8) priority control commands. Three different scheduling policies are provided by NetBSD so far:
  - Time-sharing, which follows the traditional UNIX semantics used for a long time.
  - First-in, First-out
  - Round-robin
5.5. Future development
This project is active. Here we try to summarize missing functionalities and future development, in no particular order.
- Soon to use TILE-Gx native FP instructions. Currently the entire NetBSD including userland is built with the "-DSOFTFLOAT" compile option.
- Drivers for some conventional PCIe devices like SATA and/or 100M Ethernet NICs. Currently the whole system code image is injected with the USB debugging facility to run an NFS diskless configuration.
- An iMesh communication API for NetBSD. It remains under research; for now there is no provision to utilize iMesh programming.
- MiCA integration with a proper API. The NetBSD kernel has the pcu(9) "per-CPU-unit" framework for encapsulating a CPU's hardware context to save / restore. It handsomely covers the cases beyond the general purpose registers; the typical usage of pcu() is to manipulate the FPU register set. We are considering whether pcu() can integrate multiple MiCA units into the NetBSD kernel in a sane manner.
- NetBSD/xen allows dynamic attach / detach maneuvers while the kernel is up and running: cores and block devices can be attached / detached dynamically. We assume it would be somewhat difficult to implement similar functionality on TILE; however, it would be worth pursuing ways to make it possible. We are aware that Tilera HV has no provision to start up & tear down a "targeted" core while up and running. The HV source code is disclosed as part of the Tilera MDE development package, and it is said that HV can be extended for a customer's own needs.
- An LLVM transition from GCC4 is recognized as mandatory, as it would exploit the potential of the TILE VLIW nature.
6. NetBSD TILE-Gx applications
We focus on compute-intensive markets. We are considering engaging in SDN, VLDB search engines and desktop HPC.
SDN (Software Defined Network)
It is the third wave of virtualization technology: server virtualization, storage virtualization and then network virtualization. Industry trends predict that routers and firewalls will vanish soon as they morph into big smart switches.
The Bangalore team is now working on super fast frame forwarding algorithms. They are generalized as a "search-and-lookup" computational complexity reduction problem, and problem statements are now being defined. The implementation of the algorithms must be robust enough to handle the incoming frame stream as fast as it arrives, at wire-speed rate. It must also be robust against combinatorial explosions of matching rules.
VLDB search engine
These days Very Large scale DBs are directly connected with the Internet and work in real time. The typical case is an SNS like Facebook. "mem-caching" is now a common tactic to implement super fast search engines. We recognize that many-core processors and GPGPUs are now gathering industry attention, as they would be good vehicles, in the engineering sense, for VLDB search engines.
Desktop HPC (High Performance Computing)
It is a kind of forever human desire to own a super computer at hand. TILE-Gx can be a handy basis for a many-core 64bit general purpose computer. It is said that the next generation of Gx can be extended by wiring multiple processors with the Interlaken inter-connect. Today a pair of Tesla GPGPU 16x lane PCIe cards can achieve Tflops grade computing power. Then, how about making a twenty first century incarnation of the desktop personal computer, let's say, whose outlook is just like the SGI Indigo or NeXTcube?
7. Conclusion
Porting NetBSD 6.0 to TILE-Gx turned out to be easier than anticipated, since NetBSD 6.0 provides SMP ready kernel constructs and API sets to use. The number of lines written in assembler was very small, as the essential parts of the porting burden are well defined. The VLIW nature of the processor turned out not to be a hurdle.
Acknowledgment
Sanctum Networks wishes to express its gratitude to all members involved in this project, especially the members from Japan who contributed critically in the early stages.
