
The Bits And Bytes Of The Machines Storage http://www.nextplatform.com/2016/01/25/the-bits...


The Bits And Bytes Of The Machine's Storage


January 25, 2016 Mark Funk

By now, as we have seen in other parts of this series, we have a pretty good sense for the basic topology of The Machine from Hewlett Packard Enterprise. In it are massive amounts of fabric memory, which any node and application can access, no matter where they are executing. In there somewhere, though, is your file, your object, your database table. You know it's yours and only you have the right to access it. So, what in The Machine is ensuring that only you get to access it? And, in doing so, how does it still allow you efficient access to it?

Folks designing The Machine speak absolutely correctly about the need to have integrated security and integrity throughout The Machine's design. So let's start by looking at one very low level aspect of that integrated security.

As you have seen, every processor on any node can access the contents of any byte of fabric memory, no matter the node on which it resides. You have also seen that only local processors can access a node's DRAM. You might also know that in more standard symmetric multi-processor systems, or SMPs, every byte of the physical memory is known by a unique real address. A program's instructions executing on a processor generate such real addresses, using each real address to uniquely request the data found there, and then working with that data.

Knowing that the DRAM is only locally accessible, you might picture both the volatile DRAM and the fabric memory as being mapped under a single real address space, as in the following figure:

In such a mapping, any processor could generate a real address into local DRAM and that address
would only access the memory from its own node, and no other. The hardware guarantees that
merely from the notion of private/local memory. However, with fabric memory being global, the
whole of the fabric memory would be spread out across the remainder of the real address space,
allowing every byte found there to be accessed with a unique real address, no matter the processor
using that real address.

Yes, that would work, but that is not the mental picture to have for The Machine. Indeed, suppose it was, and any processor could have concurrent access to hundreds of petabytes of persistent memory. Just to keep it simple, let's say The Machine's fabric memory size someday becomes 256 pebibytes; that is 256 × 1024^5, or 2^8 × 2^50 = 2^58 bytes, requiring at least 58 bits to span this real address space. If a processor were to need to concurrently access all of this memory, the processor would need to be capable of supporting this full 58-bit address. For comparison, the Intel Xeon Phi


supports a 40-bit physical address in 64-bit mode. It's not that it can't be done, but that is quite a jump. And from Keith Packard's presentation we find that The Machine did not make that jump:

In our current implementation, we are using an ARM-64 core; a multi-core processor. It has 48
bits of virtual addressing, but it has only 44 bits of physical addressing. . . . Out of that, we get 53
bits (real address), and that can address 8 petabytes of memory. . . and we translate those into actual
memory fabric addresses, which are 75 bits for an address space potential of 32 zettabytes.
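The address-width arithmetic in these figures is easy to check for yourself. A short Python sketch (the helper names are our own, purely illustrative) confirms that 256 PiB needs a 58-bit real address, and that the 53-bit and 75-bit widths Packard quotes cover 8 PiB and 32 ZiB respectively:

```python
import math

def addressable_bytes(bits):
    """Bytes reachable with an address of the given width."""
    return 2 ** bits

def bits_needed(capacity_bytes):
    """Smallest address width that spans the given capacity."""
    return math.ceil(math.log2(capacity_bytes))

PiB = 1024 ** 5   # pebibyte, 2^50 bytes
ZiB = 1024 ** 7   # zebibyte, 2^70 bytes

# 256 PiB of fabric memory needs a 58-bit real address (2^8 * 2^50 = 2^58).
print(bits_needed(256 * PiB))          # 58

# The widths quoted above line up the same way:
print(addressable_bytes(53) // PiB)    # 8  -> 53 bits covers 8 PiB
print(addressable_bytes(75) // ZiB)    # 32 -> 75 bits covers 32 ZiB
```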

Still, if such a huge global real address were supported, that means that any processor which can generate such a real address (which also means any thread in any process in any OS that can generate such a real address) has full access to the whole of The Machine's memory at any moment. If this were the way that all programs actually accessed memory, system security and integrity would have a real problem. There are known ways, used on most systems and The Machine as well (as we'll see in a subsequent article in this series), that this can be avoided; done right, today's systems tend to be quite secure. Even so, The Machine takes addressing-based security and integrity a step further, even at this low level of real addresses, as we will see next.

In the real addressing model used by The Machine, rather than the real address space being global (as in the above figure), picture instead a real address space per node, or more to the point, one scoped only to the processors of each node. Said differently, the processors of any given node have their own private real address space. Part of that is, of course, used to access the node-local DRAM. But now also picture regions of each node's real address space as being mapped securely by the hardware onto windows of various sizes into fabric memory, any part of fabric memory. The processors of each node could potentially generate arbitrary real addresses within their own real address space, but it is only those real-address regions securely mapped by the hardware onto physical memory that can actually be accessed. No mapping, no access. Even though the node's real address space may be smaller than the whole of fabric memory, those portions of fabric memory needed for concurrent access are nonetheless accessible.

For example, a file manager on your node wants access to some file residing in fabric memory, perhaps a single file spread out amongst a set of different regions on a number of different nodes. Your OS requests the right to access all of that file. Portions of the file are each known to reside physically on a particular set of nodes and, within those nodes, at particular regions within them. That fact alone does nothing for your program or the processor accessing it; the file is at well-defined locations in physical memory, but the processor proper can only generate real addresses. Said differently, the program and the processor could generate real addresses with the intent of accessing fabric memory, but that real address is not the physical tuple, something like Node ID::Media Controller ID::DIMM::Offset, where the file's bytes really reside.

To actually allow the access, your node's hardware must be capable of translating the processor's real address into such a physical representation. That real-to-physical region mapping is held and set securely in the hardware; your program knows nothing about it, only trusted code and the hardware do. Your processor generates the real address of the file as your OS perceives it, and the hardware supporting the fabric memory translates that real address to the actual location(s) of your file.
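The translation step can be pictured as a region table consulted on every fabric-memory access. The sketch below is a toy model with made-up names and granularity; the real mapping lives in hardware, set up by trusted code and invisible to your program. The key behaviors it illustrates are that each node resolves its own private real addresses, and that an unmapped region simply cannot be accessed:

```python
from collections import namedtuple

# Hypothetical physical location, echoing the article's
# Node ID::Media Controller ID::DIMM::Offset tuple.
Physical = namedtuple("Physical", "node media_controller dimm offset")

REGION_SIZE = 1 << 30  # pretend each mapped window is 1 GiB

# Region table: real-address region number -> base of a fabric-memory window.
# Only trusted code would populate this; the program never sees it.
region_table = {
    0: Physical(node=3, media_controller=1, dimm=0, offset=0),
    1: Physical(node=7, media_controller=0, dimm=2, offset=REGION_SIZE),
}

def translate(real_addr):
    """Translate a node-private real address to its fabric-memory location.

    No mapping, no access: an unmapped region raises instead of resolving.
    """
    region, within = divmod(real_addr, REGION_SIZE)
    base = region_table.get(region)
    if base is None:
        raise PermissionError(f"real address {real_addr:#x} is unmapped")
    return base._replace(offset=base.offset + within)

print(translate(0x100))        # resolves into the window on node 3
print(translate(REGION_SIZE))  # resolves into the window on node 7
try:
    translate(5 * REGION_SIZE)  # no window maps this region
except PermissionError as err:
    print("denied:", err)
```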


Of course, persistent or not, the fabric memory is also just memory; slower, yes, but there is a lot of it. Redrawing the previous figure slightly more abstractly as below (it's still the same four nodes), we can see that if your program needs more memory, it can ask for more; upon doing so, your program is provided a real address, a real address as understood by your node's processors, and that real address has been mapped onto some physical region of fabric memory (which can include the local node's fabric memory).

Additionally, the fabric memory may enable data persistence, but if it is only more memory that your program needs, your program need not manage it as persistent. As we saw earlier, blocks in fabric memory are cacheable, each block tagged in the cache using real addresses. Once such blocks are flushed from the cache, the real address is provided to this mapping hardware, which in turn identifies where in fabric memory the flushed block is to be stored. If your object does not actually need to be persistent, rather than explicitly forcing cached blocks out to memory, you can just allow such blocks to sooner or later be written back, even to fabric memory. Your program need not know when or even if; as with DRAM, the changed data sooner or later makes its way back to memory.
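This explicit-flush-versus-lazy-write-back distinction has a familiar analogy in ordinary systems programming: a memory-mapped file. Stores land in the (volatile) page cache, and the OS writes them back sooner or later on its own; an explicit flush is only needed when you care about persistence right now. The sketch below uses Python's `mmap` purely as an analogy, not as The Machine's actual API:

```python
import mmap
import os
import tempfile

# A small file standing in for a region of persistent fabric memory.
path = os.path.join(tempfile.mkdtemp(), "fabric.bin")
with open(path, "wb") as f:
    f.write(b"\0" * 4096)

with open(path, "r+b") as f:
    view = mmap.mmap(f.fileno(), 4096)
    view[0:5] = b"hello"   # like a store sitting in a processor cache
    view.flush()           # like explicitly flushing cache lines out to memory
    view.close()

# Without the explicit flush, the OS would still write the page back
# "sooner or later" -- fine for plain data, not for a persistence guarantee.
with open(path, "rb") as f:
    print(f.read(5))       # b'hello'
```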

Interestingly, as a different concept: even though every node shown here does have a processor (and its own DRAM), if one or more nodes are contributing only fabric memory to the system, then while only the persistent memory of such nodes is being used, the processors and DRAM on those nodes could conceivably be shut down, saving the cost of keeping them powered.

The Shared Versus Persistent Data Attributes Of Fabric Memory


As implied in the previous section, the topology of The Machine introduces an interesting side effect, perhaps even an anomaly, showing the two sides of fabric memory. The volatile DRAM is accessible only by the processors residing on the same node, so any sharing possible is only amongst the processors on that node. That is as far as that sharing goes. So if processor-based sharing is to occur amongst any of The Machine's processors and OSes, it's the non-volatile fabric memory that is being used for that purpose, not the volatile DRAM. Interestingly, for much of that sharing, the data being shared does not also require persistence.

See the point? The Machine's non-volatile fabric memory is being used for essentially two separate reasons:

1. For active data sharing, for data that does not need to be maintained as persistent, and
2. Separately, as memory which truly is persistent, and interestingly is likely being shared as a
result.

I did not actually say that the inter-node shared data does not need to be in fabric memory. It does.
Inter-node sharing cannot count on cache coherence. In order for another node to see the data
being shared, that shared data must be in fabric memory and invalidated from the cache of nodes
that want to see the changes.

Said differently, suppose processors of two nodes, Node A and Node B, want to share data. A processor on Node A has made the most recent change to the shared data, with the change still residing in the cache of that processor. If cache coherence spanned multiple nodes, a processor on Node B would be capable of seeing that changed data, even while it was still in Node A's processor's cache. But cache coherence does not span nodal boundaries. So if Node A's processor wants to make the change visible to the processors on Node B, Node A's processor must flush the changed cache lines back out to fabric memory. Additionally, in order for a processor on Node B to see this same data, that data block cannot then reside in a Node B processor's cache; if it does, that block (unchanged) must also be flushed from that cache to allow a Node B processor to see the change. This seems complex, and it is, but inter-node sharing of changing data is important, and The Machine provides APIs to enable such sharing.
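The flush-and-invalidate dance just described can be captured in a toy model. This is an illustration of the protocol only, not The Machine's actual API; the class and method names are ours. A shared dict stands in for fabric memory, and each node gets a private dict as its cache:

```python
class Node:
    """A node with a private cache over shared, non-coherent fabric memory."""

    def __init__(self, fabric):
        self.fabric = fabric   # shared dict standing in for fabric memory
        self.cache = {}        # private per-node cache

    def load(self, addr):
        # A cache hit returns the (possibly stale) cached copy.
        if addr not in self.cache:
            self.cache[addr] = self.fabric[addr]
        return self.cache[addr]

    def store(self, addr, value):
        self.cache[addr] = value             # change lives only in this cache

    def flush(self, addr):
        self.fabric[addr] = self.cache[addr]  # push the change out to fabric

    def invalidate(self, addr):
        self.cache.pop(addr, None)            # drop stale copy, force a re-fetch

fabric = {0x40: "old"}
a, b = Node(fabric), Node(fabric)

b.load(0x40)                  # B caches "old"
a.store(0x40, "new")          # A's change is invisible to B...
assert b.load(0x40) == "old"  # ...because coherence does not cross nodes
a.flush(0x40)                 # step 1: A flushes its changed line to fabric
b.invalidate(0x40)            # step 2: B discards its stale cached block
assert b.load(0x40) == "new"  # only now does B see the change
```

Note that both steps are required: flushing alone leaves B reading its stale cached copy, and invalidating alone leaves the new value stranded in A's cache.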

So, yes, we did need to make the shared data reside in fabric memory in order to allow it to be seen by another node, but we did not actually need it to be persistent. That item of data is in persistent fabric memory in order for it to be available to be seen by Node B and others, but actual persistence across failure is a bit more subtle than that. If, after successfully flushing the changed cache lines, fabric memory holds the changed/shared data, it will still be there after power cycling; but does there exist enough information for the restarted system to find the changed data? It's that which we'll try to explain in the next section.

Related Items

Drilling Down Into The Machine From HPE

The Intertwining Of Memory And Performance Of HPE's Machine

Weaving Together The Machine's Fabric Memory

The Bits And Bytes Of The Machine's Storage

Operating Systems, Virtualization, And The Machine

Future Systems: How HP Will Adapt The Machine To HPC

Categories: Compute, Enterprise, Store

Tags: HPE, The Machine


Copyright 2015 The Next Platform
