(19) United States
(12) Patent Application Publication (10) Pub. No.: US 2020/0242723 A1
Colenbrander (43) Pub. Date: Jul. 30, 2020
(54) SCALABLE GAME CONSOLE CPU/GPU DESIGN FOR HOME CONSOLE AND CLOUD GAMING
(22) Filed: Jan.
(51) Int. Cl.: G06T 1/20; G06T 1/60; A63F 13/355
(52) U.S. Cl.: CPC G06T 1/20 (2013.01); A63F 13/355 (2014.09); G06T 1/60 (2013.01)
Patent Application Publication Jul. 30, 2020 Sheet 1 of 7 US 2020/0242723 A1
FIG. 1 (drawing not reproduced; recoverable block labels): primary
display, speakers, processor, network interface, input device, camera,
port, medium, GPS, Bluetooth, near field communication element,
auxiliary sensor(s), OTA, infrared; consumer electronic devices with
similar blocks plus climate sensor(s) and biometric sensor(s); cable or
satellite source.
Sheet 2 of 7
FIG. 2 (drawing not reproduced; recoverable block labels): client game
console in two-way communication with a cloud gaming management server;
storage server 1 through storage server N, each with RAM.
Sheet 3 of 7
FIG. 3. Non-uniform memory access (drawing not reproduced): two CPU/GPU
pairs, each with its own memory controller and memory (RAM).
FIG. 4. Separate dies, all processors (drawing not reproduced): CPU,
GPU, GPU, CPU; memory controller 404; memory 406.
FIG. 5. Extra die with extra APU (drawing not reproduced): CPU, GPU;
memory controller; memory (RAM).
Sheet 4 of 7
FIG. 6 (drawing not reproduced; recoverable block labels): video
encoder, scan out unit, registers, buffer IDs.
Sheet 5 of 7
FIG. 7. NUMA technique A, GPUs render different frames (blocks 700-704):
assign memory regions as frame buffers (e.g., two buffers) -> program
registers of scanout unit to point to buffer managed by different GPU
-> cycle through buffers to output HDMI.
FIG. 8. NUMA technique B, GPUs render different frames (blocks 800-806):
assign memory regions as frame buffers -> program registers to point
only to local buffers -> receive frame from other GPU via Direct Memory
Access -> cycle through frames to output HDMI.
Sheet 6 of 7
FIG. 9. GPUs render different portions of each frame (NUMA technique 1):
generate N of M lines from buffer 1 -> generate M-N lines from buffer 2
-> output HDMI frame: lines 1-M.
FIG. 10. GPUs render different portions of each frame (NUMA technique 2;
blocks 1000-1004): generate N of M lines from buffer 1 -> receive M-N
lines from second GPU via DMA -> output HDMI frame lines 1-M.
FIG. 11. GPUs render different portions of each frame (shared memory
controller; blocks 1100-1104): GPU 1 renders N lines to frame buffer ->
GPU 2 renders M-N lines to same buffer -> output HDMI frame lines 1-M.
Sheet 7 of 7
FIG. 12. Which GPU controls HDMI output (technique 1): physically
connect HDMI port to particular GPU.
FIG. 13. Which GPU controls HDMI output (technique 2; blocks 1300,
1302): each GPU has its own output port -> multiplexer toggles between
ports.
FIG. 14 (drawing not reproduced; recoverable block labels): multiplexer,
GPU.
SCALABLE GAME CONSOLE CPU/GPU
DESIGN FOR HOME CONSOLE AND
CLOUD GAMING
FIELD
[0001] The application relates generally to scalable game
console CPU/GPU designs for home consoles and cloud
gaming.
BACKGROUND
[0002] Simulation consoles such as computer game consoles
typically use a single chip, referred to as a “system on
chip” (SoC), that contains a central processing unit (CPU)
and a graphics processing unit (GPU). Due to semiconductor
scaling challenges and yield issues, multiple small chips can
be linked by high-speed coherent busses to form big chips.
While such a scaling solution is slightly less optimal in
performance compared to building a huge monolithic chip,
it is less costly.
SUMMARY
[0003] As understood herein, SoC technology can be
applied to video simulation consoles such as game consoles,
and in particular a single SoC may be provided for a “light”
version of the console while plural SoCs may be used to
provide a “high-end” version of the console with greater
processing and storage capability than the “light” version.
The “high-end” system can also contain more memory such
as random-access memory (RAM) and other features and
may also be used for a cloud-optimized version using the
same game console chip with more performance.
[0004] As further understood herein, however, such
“high-end” multiple SoC designs pose challenges to the
software and simulation (game) design, which must scale
accordingly. As an example, challenges arise related to
non-uniform memory access (NUMA) and thread management,
as well as providing hints to software to use the
hardware in the best way. In the case of GPUs working in
concert, the framebuffer management and control of high
definition multimedia (HDMI) output may be addressed.
Other challenges as well may be addressed herein.
[0005] Accordingly, an apparatus includes at least a first
graphics processing unit (GPU), and at least a second GPU
communicatively coupled to the first GPU. The GPUs are
programmed to render respective portions of video, such
that the first GPU renders first portions of video and the
second GPU renders second portions of the video, with the
first and second portions being different from each other.
[0006] Stated differently, the first GPU may be programmed
for rendering first frames of video to provide a first
output, while the second GPU is programmed for rendering
some, but not all, frames of the video to provide a second
output. The frames rendered by the second GPU are different
from the frames rendered by the first GPU. The first and
second outputs may be combined to render the video. In
addition, or alternatively, the first GPU may be programmed
for rendering all of some, but not all, lines of a frame of
video to provide a first line output and the second GPU may
be programmed for rendering some, but not all, lines of the
frame of the video to provide a second line output. The lines
rendered by the second GPU are different from the lines
rendered by the first GPU. The first and second line outputs
can be combined to render the frame.
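
The two divisions of work described in this paragraph can be illustrated with a minimal Python sketch. The function names (`assign_frames`, `assign_lines`, `combine`) and the round-robin frame assignment are assumptions made for illustration; the text only requires that each GPU render some, but not all, frames or lines and that the two sets differ.

```python
def assign_frames(num_frames, num_gpus=2):
    """Map each GPU index to the frame numbers it renders (round-robin, assumed)."""
    return {g: [f for f in range(num_frames) if f % num_gpus == g]
            for g in range(num_gpus)}


def assign_lines(m_lines, n_split):
    """Split lines 1..M of one frame: GPU 0 gets 1..N, GPU 1 gets N+1..M."""
    if not 0 < n_split < m_lines:
        raise ValueError("N must satisfy 0 < N < M so each GPU renders some lines")
    return {0: range(1, n_split + 1), 1: range(n_split + 1, m_lines + 1)}


def combine(outputs):
    """Merge per-GPU outputs (dicts keyed by frame or line number) in order."""
    merged = {}
    for out in outputs:
        merged.update(out)
    return [merged[key] for key in sorted(merged)]
```

Because the per-GPU sets are disjoint, `combine` never has to choose between two versions of the same frame or line.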
[0007] In some embodiments, the first and second GPUs
are implemented on a common die. In other embodiments,
the first and second GPUs are implemented on respective
first and second dies. The first GPU may be associated with
a first central processing unit (CPU) and the second GPU
may be associated with a second CPU.
[0008] In some implementations, a first memory controller
and first memory are associated with the first GPU and a
second memory controller and second memory are associated
with the second GPU. In other implementations, the
GPUs share a common memory controller controlling a
common memory.
[0009] In some examples, each GPU is programmed to
render all of some, but not all, frames of video different from
frames of the video rendered by the other GPU to provide a
respective output. The outputs of the GPUs can be combined
to render the video. In other examples, each GPU is programmed
to render all of some, but not all, lines of a frame
of video, with lines of a frame of video rendered by a GPU
being different from lines of the frame rendered by the other
GPU. The outputs of the GPUs can be combined to render
the video.
[0010] In an example technique, the first GPU includes at
least one scanout unit pointing to at least one buffer managed
by the second GPU. The first GPU can be programmed to
cycle through buffers to output a complete sequence of
frames of the video. In another example, the first GPU
includes at least one scanout unit pointing only to buffers
managed by the first GPU and is programmed to receive
frames of the video from the second GPU via direct memory
access (DMA) to output a complete sequence of frames of
the video.
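
As a rough illustration of these two frame-level techniques, the following Python sketch models each GPU as an object holding named buffers. The `Gpu` class, the buffer IDs, and the helper names are hypothetical, and the "DMA" here is an ordinary in-process copy standing in for a hardware transfer.

```python
from dataclasses import dataclass, field


@dataclass
class Gpu:
    name: str
    buffers: dict = field(default_factory=dict)  # buffer ID -> frame data

    def render(self, buffer_id, frame_no):
        """Pretend to render a frame into one of this GPU's buffers."""
        self.buffers[buffer_id] = f"frame {frame_no} by {self.name}"


def scanout_remote(registers):
    """First technique: registers hold (gpu, buffer_id) pairs, so the
    scanout unit may read a buffer managed by the other GPU."""
    return [gpu.buffers[buffer_id] for gpu, buffer_id in registers]


def dma_copy(src, src_id, dst, dst_id):
    """Second technique, step 1: copy a finished frame into the first GPU."""
    dst.buffers[dst_id] = src.buffers[src_id]


def scanout_local(gpu, buffer_ids):
    """Second technique, step 2: registers point only to local buffers."""
    return [gpu.buffers[buffer_id] for buffer_id in buffer_ids]
```

Both paths produce the same frame sequence; they differ only in whether the scanning GPU reads the other GPU's memory directly or receives a copy first.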
[0011] In yet another example technique, the first GPU
includes at least one scanout unit pointing to at least a first
buffer managed by the first GPU and a second buffer
managed by the second GPU. In this technique, the first
GPU is programmed to cycle through buffers to output a
complete sequence of frames of video using 1-N lines
associated with the first buffer and (N+1)-M lines associated with
the second buffer. The 1-N lines are different lines of the
same frame associated with the (N+1)-M lines.
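
A minimal sketch of this line-level scanout follows. It assumes, purely for illustration, that each buffer is a Python list with one entry per line (index 0 holds line 1) and that both buffers are sized for a full frame of M lines, with only each GPU's own portion filled in; `scanout_split_frame` is a hypothetical name.

```python
def scanout_split_frame(first_buffer, second_buffer, n):
    """Build one output frame from lines 1..N of the first buffer and
    lines N+1..M of the second buffer, where M is the buffer length."""
    m = len(first_buffer)
    if len(second_buffer) != m:
        raise ValueError("both buffers must hold M lines")
    if not 0 < n < m:
        raise ValueError("N must satisfy 0 < N < M")
    return first_buffer[:n] + second_buffer[n:]
```

For example, with M = 5 and N = 3 the output takes its first three lines from the first GPU's buffer and its last two from the second GPU's buffer.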
[0012] Yet again, the first GPU can include at least one
scanout unit pointing to at least a first buffer managed by the
first GPU and not to a second buffer managed by the second
GPU. In this implementation, the first GPU may be programmed
to cycle through buffers to output a complete
sequence of frames of video using 1-N lines associated with
the first buffer and (N+1)-M lines associated with the second
buffer and received by the first GPU via direct memory
access (DMA). The 1-N lines and (N+1)-M lines are different
lines of the frame of video.
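
The offset arithmetic behind such a transfer can be sketched as follows. This is only an assumption-laden illustration: frames are modeled as flat byte buffers with a fixed `stride` (bytes per line), `dma_lines` is a hypothetical helper, and a slice copy stands in for a real DMA engine.

```python
def dma_lines(src, dst, n, m, stride):
    """Copy lines N+1..M (1-based) from src into the same position in dst.

    src is bytes-like, dst is a bytearray, and stride is the number of
    bytes per line. Returns the number of bytes transferred.
    """
    if not 0 < n < m:
        raise ValueError("N must satisfy 0 < N < M")
    start, end = n * stride, m * stride
    if len(src) < end or len(dst) < end:
        raise ValueError("both buffers must hold M full lines")
    dst[start:end] = src[start:end]
    return end - start
```

After the copy, the first GPU's buffer holds all M lines, so its scanout unit only ever reads local memory.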
[0013] In still another technique, the first GPU includes at
least one scanout unit pointing to at least a first buffer
communicating with the common memory controller. The
second GPU includes a second buffer communicating with
the common memory controller. The first GPU is programmed
for rendering 1-N lines associated with the first
buffer and the second GPU is programmed for rendering
(N+1)-M lines associated with the second buffer.
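
The shared-memory case can be sketched with two threads writing disjoint line ranges of one list. This is a simplification made for illustration: the two buffers are treated as adjacent regions of a single common memory, threads stand in for GPUs, and the function names are hypothetical.

```python
import threading


def render_lines(shared_frame, first_line, last_line, gpu_name):
    """Fill lines first_line..last_line (1-based, inclusive) of the frame."""
    for line in range(first_line, last_line + 1):
        shared_frame[line - 1] = f"{gpu_name}:line{line}"


def render_shared(m, n):
    """Render lines 1..N on 'GPU1' and N+1..M on 'GPU2' into common memory."""
    shared_frame = [None] * m  # stands in for the common memory
    workers = [
        threading.Thread(target=render_lines, args=(shared_frame, 1, n, "GPU1")),
        threading.Thread(target=render_lines, args=(shared_frame, n + 1, m, "GPU2")),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return shared_frame
```

Since the two line ranges never overlap, no locking is needed in this sketch, and no copy between GPUs is required before output.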
[0014] In some examples, the first GPU manages video
data output from the first and second GPUs. This may be
effected by physically connecting an HDMI port to the first
GPU. In other examples, the GPUs output video data to a
multiplexer that multiplexes the frames and/or lines
together to output video.
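
The multiplexer alternative can be sketched in Python as below. The `Multiplexer` class and `mux_frames` function are hypothetical, and toggling once per frame is an assumed policy; the text does not specify when the multiplexer switches.

```python
class Multiplexer:
    """Selects which GPU output port drives the single video output."""

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.selected = 0

    def toggle(self):
        """Switch to the next port, wrapping around."""
        self.selected = (self.selected + 1) % self.num_ports


def mux_frames(per_gpu_frames):
    """Interleave frames from each GPU's port, toggling after every frame."""
    mux = Multiplexer(len(per_gpu_frames))
    queues = [list(frames) for frames in per_gpu_frames]
    output = []
    while any(queues):
        queue = queues[mux.selected]
        if queue:
            output.append(queue.pop(0))
        mux.toggle()
    return output
```

With one GPU rendering even frames and the other odd frames, the multiplexed output restores the original frame order.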
[0015] In another aspect, in a multi-graphics processing
unit (GPU) simulation environment, a method includes
causing plural GPUs to render respective frames of video, or
to render respective portions of each frame of video, or both
to render respective frames and respective portions of
frames of video. The method includes controlling frame
output using a first one of the GPUs receiving frame
information from at least one other of the GPU(s), or
multiplexing outputs of the GPUs together, or both using a
first one of the GPUs receiving frame information from at
least one other of the GPU(s) and multiplexing outputs of the
GPUs together.
[0016] In another aspect, a computer simulation apparatus
includes at least a first graphics processing unit (GPU)
programmed for rendering a respective first portion of
simulation video, and at least a second GPU programmed for
rendering a respective second portion of simulation video.
At least the first GPU is programmed to combine the first
and second portions and to render an output establishing a
complete simulation video.
[0017] The details of the present application, both as to its
structure and operation, can best be understood in reference
to the accompanying drawings, in which like reference
numerals refer to like parts, and in which:
BRIEF DESCRIPTION OF THE DRAWINGS
[0018] FIG. 1 is a block diagram of an example system
including an example in accordance with present principles;
[0019] FIG. 2 is a schematic diagram of a cloud-based
gaming system;
[0020] FIG. 3 is a block diagram of an example non-uniform
memory access (NUMA) architecture, in which two
APUs are shown on a single fabric, it being understood that
the NUMA architecture may be implemented by APUs on
separate fabrics and that more than two APUs may be
implemented;
[0021] FIG. 4 is a block diagram of a shared memory
architecture in which two APUs are shown with each
processor being implemented on its own respective die, it
being understood that the architecture may be implemented
on fewer or even one die and that more than two APUs may
be implemented;
[0022] FIG. 5 is a block diagram of a shared memory
architecture in which two APUs are shown with each APU
being implemented on its own respective fabric and with the
shared memory controller being implemented on one of the
fabrics, it being understood that the architecture may be
implemented on one fabric and that more than two APUs
may be implemented on one or more dies;
[0023] FIG. 6 is a block diagram of an example GPU with
scanout unit;
[0024] FIG. 7 is a flow chart of example logic of a NUMA
embodiment in which each GPU renders complete frames
with each GPU rendering different frames of the same video
than the other GPU, with one of the GPUs having registers
pointing to buffers of the other GPU(s);
[0025] FIG. 8 is a flow chart of example logic of a NUMA
embodiment in which each GPU renders complete frames
with each GPU rendering different frames of the same video
than the other GPU, with one of the GPUs receiving frames
via DMA from the other GPU(s);
[0026] FIG. 9 is a flow chart of example logic of a NUMA
embodiment in which each GPU renders portions (e.g.,
lines) of frames with each GPU rendering different portions
of the same frame than the other GPU;
[0027] FIG. 10 is a flow chart of example logic of a
NUMA embodiment in which each GPU renders portions
(e.g., lines) of frames with each GPU rendering different
portions of the same frame than the other GPU, with one of
the GPUs receiving lines via DMA from the other GPU(s);
[0028] FIG. 11 is a flow chart of example logic of a shared
memory embodiment in which each GPU renders portions
(e.g., lines) of frames with each GPU rendering different
portions of the same frame than the other GPU;
[0029] FIG. 12 is a flow chart of example logic for
controlling video output using a single GPU connected to an
HDMI port;
[0030] FIG. 13 is a flow chart of example logic for
controlling video output using a multiplexer; and
[0031] FIG. 14 is a block diagram associated with FIG. 13.
DETAILED DESCRIPTION
[0032] This disclosure relates generally to computer
ecosystems including aspects of consumer electronics (CE)
device networks such as but not limited to distributed
computer game networks, video broadcasting, content delivery
networks, virtual machines, and machine learning
applications. A system herein may include server and client
components, connected over a network such that data may
be exchanged between the client and server components.
The client components may include one or more computing
devices including game consoles such as Sony PlayStation
and related motherboards, portable televisions (e.g., smart
TVs, Internet-enabled TVs), portable computers such as
laptops and tablet computers, and other mobile devices
including smart phones and additional examples discussed
below. These client devices may operate with a variety of
operating environments. For example, some of the client
computers may employ, as examples, Orbis or Linux
operating systems, operating systems from Microsoft, or a Unix
operating system, or operating systems produced by Apple
Computer or Google. These operating environments may be
used to execute one or more browsing programs, such as a
browser made by Microsoft or Google or Mozilla or other
browser program that can access websites hosted by the
Internet servers discussed below. Also, an operating
environment according to present principles may be used to
execute one or more computer game programs.
[0033] Servers and/or gateways may include one or more
processors executing instructions that configure the servers
to receive and transmit data over a network such as the
Internet. Or, a client and server can be connected over a local
intranet or a virtual private network. A server or controller
may be instantiated by a game console and/or one or more
motherboards thereof such as a Sony PlayStation®, a
personal computer, etc.
[0034] Information may be exchanged over a network
between the clients and servers. To this end and for security,
servers and/or clients can include firewalls, load balancers,
temporary storages, and proxies, and other network
infrastructure for reliability and security. One or more servers
may form an apparatus that implements methods of providing
a secure community such as an online social website to
network members.
[0035] As used herein, instructions refer to computer-implemented
steps for processing information in the system.
Instructions can be implemented in software, firmware or
hardware and include any type of programmed step undertaken
by components of the system.
[0036] A processor may be any conventional general-purpose
single- or multi-chip processor that can execute
logic by means of various lines such as address lines, data
lines, and control lines and registers and shift registers.
[0037] Software modules described by way of the flow
charts and user interfaces herein can include various
sub-routines, procedures, etc. Without limiting the disclosure,
logic stated to be executed by a particular module can be
redistributed to other software modules and/or combined
together in a single module and/or made available in a
shareable library.
[0038] Present principles described herein can be implemented
as hardware, software, firmware, or combinations
thereof; hence, illustrative components, blocks, modules,
circuits, and steps are set forth in terms of their functionality.
[0039] Further to what has been alluded to above, logical
blocks, modules, and circuits described below can be
implemented or performed with a general purpose processor, a
digital signal processor (DSP), a field programmable gate
array (FPGA) or other programmable logic device such as
an application specific integrated circuit (ASIC), discrete
gate or transistor logic, discrete hardware components, or
any combination thereof designed to perform the functions
described herein. A processor can be implemented by a
controller or state machine or a combination of computing
devices.
[0040] The functions and methods described below, when
implemented in software, can be written in an appropriate
language such as but not limited to Java, C# or C++, and can
be stored on or transmitted through a computer-readable
storage medium such as a random access memory (RAM),
read-only memory (ROM), electrically erasable programmable
read-only memory (EEPROM), compact disk read-only
memory (CD-ROM) or other optical disk storage such
as digital versatile disc (DVD), magnetic disk storage or
other magnetic storage devices including removable thumb
drives, etc. A connection may establish a computer-readable
medium. Such connections can include, as examples,
hard-wired cables including fiber optics and coaxial wires and