You are on page 1of 4

Maxwell (microarchitecture)

From Wikipedia, the free encyclopedia

Nvidia Maxwell
Transistors 28 nm
Predecessor Kepler
Successor Pascal

Maxwell is the codename for a GPU microarchitecture developed by Nvidia as the successor to
the Kepler microarchitecture. The Maxwell architecture was introduced in later models of the
GeForce 700 series and is also used in the GeForce 800M series, GeForce 900 series, and
Quadro Mxxx series, all manufactured in 28 nm.[1]

The very first Maxwell-based products to hit the market were the GeForce GTX 750 and the
GeForce GTX 750 Ti. Both were released on February 18, 2014, both with the chip code number
GM107. Earlier GeForce 700 series GPUs had used Kepler chips with the code numbers GK1xx.
The GM10x GPUs are also used in the GeForce 800M series and the Quadro Kxxx series. A
second generation of Maxwell-based products was introduced on September 18, 2014 with the
GeForce GTX 970 and GeForce GTX 980, followed by the GeForce GTX 960 on January 22,
2015, the GeForce GTX Titan X on March 17, 2015, and the GeForce GTX 980 Ti on June 1,
2015. These GPUs have GM20x chip code numbers.

Maxwell introduced an improved Streaming Multiprocessor (SM) design that increased power
efficiency,[2] the sixth and seventh generation PureVideo HD, and CUDA Compute Capability

The architecture is named after James Clerk Maxwell, the founder of the theory of
electromagnetic radiation.

1 First generation Maxwell (GM10x)

o 1.1 Polymorph-Engine 3.0

o 1.2 NVENC

o 1.3 PureVideo

o 1.4 Chips
2 Second generation Maxwell (GM20x)

o 2.1 Chips

3 Performance

4 Successor

5 See also

6 References

First generation Maxwell (GM10x)

First generation Maxwell GPUs (GM107/GM108) were released as GeForce GTX 745, GTX
750/750 Ti, GTX 850M/860M (GM107) and GeForce 830M/840M (GM108). These new chips
provided few consumer-facing additional features, as Nvidia instead focused more on GPU
power efficiency. The L2 cache was increased from 256 KiB on Kepler to 2 MiB on Maxwell,
reducing the need for more memory bandwidth. Accordingly, the memory bus was reduced from
192 bit on Kepler to 128 bit, further saving power.[3] The streaming multiprocessor design from
Kepler was also retooled and partitioned, renaming it to SMM for Maxwell. The structure of the
warp scheduler was inherited from Kepler, with the texture units and FP64 CUDA cores still
shared, but the layout of most execution units were partitioned so that each warp schedulers in an
SMM controls one set of 32 FP32 CUDA cores, one set of 8 load/store units and one set of 8
special function units. This is in contrast to Kepler, where each SMX has 4 schedulers that
schedule to a shared pool of execution units.[4] Prior to Kepler, these units were connected to a
crossbar that uses unnecessary power to allow them to be shared.[4] On Maxwell, the crossbar
was removed as it became redundant.[3][4] This allowed for a finer-grained and more efficient
allocation of resources than in Kepler, saving power when the workload isn't optimal for shared
resources. Nvidia claims a 128 CUDA core SMM has 90% of the performance of a 192 CUDA
core SMX while efficiency increases by a factor of 2.[3] Also, each Graphics Processing Cluster,
or GPC, contains up to 4 SMX units in Kepler, and up to 5 SMM units in first generation

GM107 also supports CUDA Compute Capability 5.0 compared to 3.5 on GK110/GK208 GPUs
and 3.0 on GK10x GPUs. Dynamic Parallelism and HyperQ, two features in GK110/GK208
GPUs, are also supported across the entire Maxwell product line.

Maxwell also provides native shared memory atomic operations for 32-bit integers and native
shared memory 32-bit and 64-bit compare-and-swap (CAS), which can be used to implement
other atomic functions.

Maxwell GPUs and later use tiled rendering.[5]

Polymorph-Engine 3.0
The brand Polymorph-Engine denotes the unit responsible for Tessellation. It corresponds
functionally with AMD's Geometric Processor. Maxwell features version 3.0.


Main article: Nvidia NVENC

Maxwell-based GPUs also contain the NVENC SIP block introduced with Kepler. Nvidia's video
encoder, NVENC, is 1.5 to 2 times faster than on Kepler-based GPUs, meaning it can encode
video at six to eight times playback speed.[3]


Main article: Nvidia PureVideo

Nvidia also claims an eight to ten times performance increase in PureVideo Feature Set E video
decoding due to the video decoder cache, paired with increases in memory efficiency. However,
H.265 is not supported for full hardware decoding, relying on a mix of hardware and software
decoding.[3] When decoding video, a new low power state "GC5" is used on Maxwell GPUs to
conserve power.[3]




Second generation Maxwell (GM20x)

Second generation Maxwell GPUs introduced several new technologies: Dynamic Super
Resolution,[6] Third Generation Delta Color Compression,[7] Multi-Pixel Programming Sampling,
Nvidia VXGI (Real-Time-Voxel-Global Illumination),[9] VR Direct,[10][11][12] Multi-Projection
Acceleration,[7] Multi-Frame Sampled Anti-Aliasing(MFAA)[13] (however, support for Coverage-
Sampling Anti-Aliasing(CSAA) was removed),[14] and Direct3D12 API at Feature Level 12_1.
HDMI 2.0 support was also added.[15][16]

The ROP to memory controller ratio was changed from 8:1 to 16:1.[17] However, some of the
ROPs are generally idle in the GTX 970 because there are not enough enabled SMMs to give
them work to do, reducing its maximum fill rate.[18]

Second generation Maxwell also has up to 4 SMM units per GPC, compared to 5 SMM units per

GM204 supports CUDA Compute Capability 5.2 (compared to 5.0 on GM107/GM108 GPUs,
3.5 on GK110/GK208 GPUs and 3.0 on GK10x GPUs).[7][17][19]
GM20x GPUs have an upgraded NVENC which supports HEVC encoding and adds support for
H.264 encoding resolutions at 1440p/60FPS & 4K/60FPS (compared to NVENC on Maxwell
first generation GM10x GPUs which only supported H.264 1080p/60FPS encoding).[12]

After consumer complaints,[20] Nvidia revealed that it is able to disable individual units, each
containing 256KB of L2 cache and 8 ROPs, without disabling whole memory controllers.[21] This
comes at the cost of dividing the memory bus into high speed and low speed segments that
cannot be accessed at the same time for reads, because the L2/ROP unit managing both of the
GDDR5 controllers shares the read return channel and the write data bus between the GDDR5
controllers. This makes simultaneous reading from both GDDR5 controllers or simultaneous
writing to both GDDR5 controllers impossible.[21] This is used in the GeForce GTX 970, which
therefore can be described as having 3.5 GB in a high-speed segment on a 224-bit bus and 512
MB in a low-speed segment on a 32-bit bus.[21] The peak speed of such a GPU can still be
attained, but the peak speed figure is only reachable if one segment is executing a read operation
while the other segment is executing a write operation.[21]





The theoretical single-precision processing power of a Maxwell GPU in GFLOPS is computed as
2 (operations per FMA instruction per CUDA core per cycle) number of CUDA cores core
clock speed (in GHz).

The theoretical double-precision processing power of a Maxwell GPU is 1/32 of the single
precision performance (which has been noted as being very low compared to the previous
generation Kepler).[22]

The successor to Maxwell is codenamed Pascal.[23] The Pascal architecture features High
Bandwidth Memory, Unified Memory, and NVLink.[23]

See also
List of Nvidia graphics processing units