The Bits And Bytes Of The Machine's Storage
http://www.nextplatform.com/2016/01/25/the-bits...
As you have seen, every processor on any node can access the contents of any byte of fabric
memory, no matter the node on which it resides. You have also seen that only local processors can
access a node's DRAM. You might also know that in more standard symmetric multi-processor
systems, or SMPs, every byte of the physical memory is known by a unique real address. A
program's instructions executing on a processor generate such real addresses, use them to
uniquely request the data found there, and then work with that data.
Knowing that the DRAM is only locally accessible, you might picture both the volatile DRAM and
the fabric memory as being mapped under a single real address space, as in the following figure:
In such a mapping, any processor could generate a real address into local DRAM and that address
would only access the memory from its own node, and no other. The hardware guarantees that
merely from the notion of private/local memory. However, with fabric memory being global, the
whole of the fabric memory would be spread out across the remainder of the real address space,
allowing every byte found there to be accessed with a unique real address, no matter the processor
using that real address.
Yes, that would work, but that is not the mental picture to have for The Machine. Indeed, suppose it
was, and any processor could have concurrent access to hundreds of petabytes of persistent
memory. Just to keep it simple, let's say The Machine's fabric memory size someday becomes 256
pebibytes; that is 256 × 1024^5, or 2^8 × 2^50 = 2^58 bytes, requiring at least 58 bits to span this
real address space. If a processor were to need to concurrently access all of this memory, the
processor would need to be capable of supporting this full 58-bit address. For comparison, the
Intel Xeon Phi supports a 40-bit physical address in 64-bit mode. It's not that it can't be done, but
that is quite a jump. And from Keith Packard's presentation we find that The Machine did not make
that jump:
In our current implementation, we are using an ARM-64 core; a multi-core processor. It has 48
bits of virtual addressing, but it has only 44 bits of physical addressing. . . . Out of that, we get 53
bits (real address), and that can address 8 petabytes of memory. . . and we translate those into actual
memory fabric addresses, which are 75 bits for an address space potential of 32 zettabytes.
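Those bit widths are easy to sanity-check. A quick sketch in plain Python (assuming the binary prefixes, i.e. pebibytes and zebibytes, behind the quoted "petabytes" and "zettabytes"):

```python
# Relate address width in bits to the amount of byte-addressable memory.

def addressable_bytes(bits: int) -> int:
    """An n-bit address can name 2**n distinct bytes."""
    return 2 ** bits

def bits_needed(nbytes: int) -> int:
    """Minimum address width needed to span nbytes of memory."""
    return (nbytes - 1).bit_length()

PiB = 1024 ** 5  # pebibyte, 2**50 bytes
ZiB = 1024 ** 7  # zebibyte, 2**70 bytes

assert bits_needed(256 * PiB) == 58          # 256 PiB needs 58 address bits
assert addressable_bytes(53) == 8 * PiB      # 53-bit real address -> 8 PiB
assert addressable_bytes(75) == 32 * ZiB     # 75-bit fabric address -> 32 ZiB
```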
Still, if such a huge global real address were supported, that means that any processor can
generate such a real address, which also means that any thread in any process in any OS can
generate such a real address, and so has full access to the whole of The Machine's memory at
any moment. If this were the way that all programs actually accessed memory, system security and
integrity would have a real problem. There are known ways, used on most systems and The
Machine as well (as we'll see in a subsequent article in this series), that this can be avoided; done
right, today's systems tend to be quite secure. Even so, The Machine takes addressing-based
security and integrity a step further, even at this low level of real addresses, as we will see next.
In the real addressing model used by The Machine, rather than real address space being global (as
in the above figure), picture instead a real address space per node, or more to the point, one scoped
only to the processors of each node. Said differently, the processors of any given node have their
own private real address space. Part of that is, of course, used to access the node-local DRAM. But
now also picture regions of each node's real address space as being mapped securely by the
hardware onto windows of various sizes into fabric memory, any part of fabric memory. The
processors of each node could potentially generate arbitrary real addresses within their own real
address space, but it is only those real-address regions securely mapped by the hardware onto
physical memory that can actually be accessed. No mapping, no access. Even though the node's
real address space may be smaller than the whole of fabric memory, those portions of fabric
memory needed for concurrent access are nonetheless accessible.
For example, suppose a file manager on your node wants access to some file residing in fabric
memory, perhaps a single file spread out amongst a set of different regions on a number of
different nodes. Your OS requests the right to access all of that file. Portions of the file are each
known to reside physically on a particular set of nodes and, within those nodes, at particular
regions within them. That fact alone does nothing for your program or the processor accessing it;
the file is at well-defined locations in physical memory, but the processor proper can only generate
real addresses. Said differently, the program and the processor could generate real addresses with
the intent of accessing fabric memory, but that real address is not the physical tuple, something like
Node ID::Media Controller ID::DIMM::Offset, where the file's bytes really reside.
To actually allow the access, your node's hardware must be capable of translating the processor's
real address into such a physical representation. That real-to-physical region mapping is held and
set securely in the hardware; your program knows nothing about it, only trusted code and the
hardware do. Your processor generates the real address of the file as your OS perceives it, and the
hardware supporting the fabric memory translates that real address to the actual location(s) of
your file.
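A minimal sketch of that real-to-physical window mapping might look like the following. This is plain Python, and the region table, field names, and tuple layout are illustrative assumptions, not The Machine's actual hardware format:

```python
from typing import NamedTuple

class FabricLocation(NamedTuple):
    node_id: int           # node whose fabric memory holds the bytes
    media_controller: int  # media controller on that node (hypothetical ID)
    dimm: int              # NVM DIMM behind that controller
    offset: int            # byte offset within the DIMM

# Hypothetical per-node mapping table, set only by trusted code:
# (real_base, length) -> base FabricLocation of that window.
REGION_TABLE = {
    (0x1000_0000, 0x100_0000): FabricLocation(7, 2, 0, 0x4000_0000),
}

def translate(real_addr: int) -> FabricLocation:
    """Translate a node-private real address to a fabric location.
    Unmapped addresses fault: no mapping, no access."""
    for (base, length), loc in REGION_TABLE.items():
        if base <= real_addr < base + length:
            return loc._replace(offset=loc.offset + (real_addr - base))
    raise MemoryError(f"no window maps real address {real_addr:#x}")
```

A mapped address such as `0x1000_0042` resolves to node 7's fabric memory at offset `0x4000_0042`, while any address outside a mapped window faults, which is the point: the program never sees, and can never fabricate, the physical tuple itself.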
Of course, persistent or not, the fabric memory is also just memory; slower, yes, but there is a lot
of it. Redrawing the previous figure slightly more abstractly as below (it's still the same four
nodes), we can see that if your program needs more memory, it can ask for more; upon doing so,
your program is provided a real address (a real address as understood by your node's processors),
and that real address has been mapped onto some physical region of fabric memory (which can
include the local node's fabric memory).
Additionally, the fabric memory may enable data persistence, but if it is only more memory that
your program needs, your program need not manage it as persistent. As we saw earlier, blocks in
fabric memory are cacheable, each block tagged in the cache using real addresses. Once such
blocks are flushed from the cache, the real address is provided to this mapping hardware, which in
turn identifies where in fabric memory the flushed block is to be stored. If your object does not
need to actually be persistent, rather than explicitly forcing cached blocks out to memory, you can
just allow such blocks to sooner or later be written back, even to fabric memory. Your program
need not know when, or even if; as with DRAM, the changed data sooner or later makes its way
back to memory.
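By way of analogy only (this is ordinary memory-mapped file I/O, not The Machine's API), the same choice between explicit flushing and lazy write-back shows up with `mmap`: flush when persistence matters, or just let the change drift back on its own schedule.

```python
import mmap
import os
import tempfile

# A memory-mapped file stands in for a window onto fabric memory.
path = os.path.join(tempfile.mkdtemp(), "fabric_region")
with open(path, "wb") as f:
    f.write(b"\x00" * mmap.PAGESIZE)

with open(path, "r+b") as f:
    m = mmap.mmap(f.fileno(), mmap.PAGESIZE)
    m[0:5] = b"hello"  # the store lands in cache / page cache first
    # If the data must survive a failure, force it out explicitly,
    # analogous to flushing changed cache lines to fabric memory:
    m.flush()
    # If it is "just memory," skip the flush: the OS writes it back
    # sooner or later, and the program need not know when, or even if.
    m.close()

with open(path, "rb") as f:
    assert f.read(5) == b"hello"
```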
Interestingly, as a different concept, even though every node shown here does have a processor
(and its own DRAM), if one or more nodes are only contributing fabric memory to the system,
with only the persistent memory of such nodes being used, the processors and DRAM on such
nodes could conceivably be shut down, saving the cost of keeping them powered.
See the point? The Machine's non-volatile fabric memory is being used for essentially two separate
reasons:
1. For active data sharing, for data that does not need to be maintained as persistent, and
2. Separately, as memory which truly is persistent, and interestingly is likely being shared as a
result.
Note that I did not say that the inter-node shared data need not be in fabric memory; it does need
to be there. Inter-node sharing cannot count on cache coherence. In order for another node to see
the data being shared, that shared data must be in fabric memory and invalidated from the caches
of nodes that want to see the changes.
Said differently, suppose processors of two nodes, Node A and Node B, want to share data. A
processor on Node A has made the most recent change to the shared data, with the change still
residing in the cache of that processor. If cache coherence spanned multiple nodes, a processor on
Node B would be capable of seeing that changed data, even while it is still in Node A's processor's
cache. But cache coherence does not span nodal boundaries. So if Node A's processor wants to
make the change visible to the processors on Node B, Node A's processor must flush the changed
cache lines back out to fabric memory. Additionally, in order for a processor on Node B to see this
same data, that data block cannot then reside in a Node B processor's cache; if it does, that
(unchanged) block must also be flushed from that cache to allow a Node B processor to see the
change. This seems complex, but it is important to enable inter-node sharing of changing data, and
The Machine provides APIs to support such sharing.
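That flush-then-invalidate dance can be pictured with a toy simulation. This is plain Python modeling only the visibility rules; real hardware does this with cache-maintenance instructions, not dictionaries:

```python
# Toy model: fabric memory shared by all nodes; each node has a private,
# non-coherent cache. A node sees another node's change only after the
# writer flushes and the reader invalidates.

fabric = {"x": 0}          # shared fabric memory
cache_a, cache_b = {}, {}  # per-node private caches

def read(cache, key):
    if key not in cache:           # cache miss: fetch from fabric
        cache[key] = fabric[key]
    return cache[key]

def write(cache, key, value):
    cache[key] = value             # the change lands in the cache only

def flush(cache, key):
    fabric[key] = cache.pop(key)   # write back and drop the line

def invalidate(cache, key):
    cache.pop(key, None)           # discard a possibly stale line

read(cache_b, "x")                 # Node B caches x == 0
write(cache_a, "x", 42)            # Node A changes x in its cache
assert read(cache_b, "x") == 0     # B still sees the stale value

flush(cache_a, "x")                # A writes the change to fabric...
assert read(cache_b, "x") == 0     # ...but B STILL sees its cached copy
invalidate(cache_b, "x")           # B discards the stale line...
assert read(cache_b, "x") == 42    # ...and only now sees A's change
```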
So, yes, we did need to make the shared data reside in fabric memory in order to allow it to be seen
by another node, but we did not actually need it to be persistent. That item of data is in persistent
fabric memory in order for it to be available to be seen by Node B and others, but actual
persistence across failure is a bit more subtle than that. If, after successfully flushing the changed
cache lines, fabric memory holds the changed/shared data, it will still be there after power cycling,
but does there exist enough information for the restarted system to find the changed data? It's that
which we'll try to explain in the next section.