Sie sind auf Seite 1von 9

Data Storage

Computer data storage, often called storage or memory, is a technology consisting of


computer components and recording media used to retain digital data. It is a core function and
fundamental component of computers.

The central processing unit (CPU) of a computer is what manipulates data by performing
computations. In practice, almost all computers use a storage hierarchy, which puts fast but
expensive and small storage options close to the CPU and slower but larger and cheaper options
farther away. Often the fast, volatile technologies (which lose data when powered off) are
referred to as "memory", while slower permanent technologies are referred to as "storage", but
these terms are often used interchangeably. In the Von Neumann architecture, the CPU consists
of two main parts: control unit and arithmetic logic unit (ALU). The former controls the flow of
data between the CPU and memory; the latter performs arithmetic and logical operations on data.

Functionality
Without a significant amount of memory, a computer would merely be able to perform fixed
operations and immediately output the result. It would have to be reconfigured to change its
behavior. This is acceptable for devices such as desk calculators, digital signal processors, and
other specialised devices. Computers differ in having a memory in which they store their
operating instructions and data. Such computers are more versatile in that they do not need to
have their hardware reconfigured for each new program, but can simply be reprogrammed with
new in-memory instructions; they also tend to be simpler to design, in that a relatively simple
processor may keep state between successive computations to build up complex procedural
results.

Data organization and representation


A modern digital computer represents data using the binary numeral system. Text, numbers,
pictures, audio, and nearly any other form of information can be converted into a string of bits, or
binary digits, each of which has a value of 1 or 0. The most common unit of storage is the byte,
equal to 8 bits. A piece of information can be handled by any computer or device whose storage
space is large enough to accommodate the binary representation of the piece of information, or
simply data. For example, the complete works of Shakespeare, about 1250 pages in print, can be
stored in about five megabytes (40 million bits) with one byte per character.

Data is encoded by assigning a bit pattern to each character, digit, or multimedia object. Many
standards exist for encoding (e.g., character encodings like ASCII, image encodings like JPEG,
video encodings like MPEG).

Data compression methods allow in many cases to represent a string of bits by a shorter bit string
("compress") and reconstruct the original string ("decompress") when needed. This utilizes
substantially less storage (tens of percents) for many types of data at the cost of more
computation (compress and decompress when needed). Analysis of trade-off between storage

Introduction to Computers
cost saving and costs of related computations and possible delays in data availability is done
before deciding whether to keep certain data in a database compressed or not.

For security reasons certain types of data (e.g., credit-card information) may be kept encrypted in
storage to prevent the possibility of unauthorized information reconstruction from chunks of
storage snapshots.

The fundamental components of a general-purpose computer are arithmetic and logic unit,
control circuitry, storage space, and input/output devices. Generally, the lower a storage is in the
hierarchy, the lesser its bandwidth and the greater its access latency is from the CPU. This
traditional division of storage to primary, secondary, tertiary and off-line storage is also guided
by cost per bit. In contemporary usage, "memory" is usually semiconductor storage read-write
random-access memory, typically DRAM (Dynamic-RAM) or other forms of fast but temporary
storage.

"Storage" consists of storage devices and their media not directly accessible by the CPU
(secondary or tertiary storage), typically hard disk drives, optical disc drives, and other devices
slower than RAM but non-volatile (retaining contents when powered down). Historically,
memory has been called core, main memory, real storage or internal memory while storage
devices have been referred to as secondary storage, external memory or auxiliary/peripheral
storage.

Primary storage

Primary storage (or main memory or internal memory), often referred to simply as memory, is
the only one directly accessible to the CPU. The CPU continuously reads instructions stored
there and executes them as required. Any data actively operated on is also stored there in
uniform manner. It is small-sized, light, but quite expensive at the same time. (The particular
types of RAM used for primary storage are also volatile, i.e. they lose the information when not
powered).

Traditionally there are two more sub-layers of the primary storage, besides main large-capacity
RAM:

 Processor registers are located inside the processor. Each register typically holds a word
of data (often 32 or 64 bits). CPU instructions instruct the arithmetic and logic unit to
perform various calculations or other operations on this data (or with the help of it).
Registers are the fastest of all forms of computer data storage.
 Processor cache is an intermediate stage between ultra-fast registers and much slower
main memory. It's introduced solely to increase performance of the computer. Most
actively used information in the main memory is just duplicated in the cache memory,
which is faster, but of much lesser capacity. On the other hand, main memory is much
slower, but has a much greater storage capacity than processor registers. Multi-level
hierarchical cache setup is also commonly used—primary cache being smallest, fastest
and located inside the processor; secondary cache being somewhat larger and slower.

2
Introduction to Computers
Main memory is directly or indirectly connected to the central processing unit via a memory bus.
It is actually two buses an address bus and a data bus. The CPU firstly sends a number through
an address bus, a number called memory address that indicates the desired location of data. Then
it reads or writes the data itself using the data bus. Additionally, a memory management unit
(MMU) is a small device between CPU and RAM recalculating the actual memory address, for
example to provide an abstraction of virtual memory or other tasks.

As the RAM types used for primary storage are volatile (cleared at start up), a computer
containing only such storage would not have a source to read instructions from, in order to start
the computer. Hence, non-volatile primary storage containing a small startup program (BIOS) is
used to bootstrap the computer, that is, to read a larger program from non-volatile secondary
storage to RAM and start to execute it. A non-volatile technology used for this purpose is called
ROM, for read-only memory

Secondary storage

Secondary storage (also known as external memory or auxiliary storage), differs from primary
storage in that it is not directly accessible by the CPU. The computer usually uses its input/output
channels to access secondary storage and transfers the desired data using intermediate area in
primary storage. Secondary storage does not lose the data when the device is powered down—it
is non-volatile. Per unit, it is typically also two orders of magnitude less expensive than primary
storage. Modern computer systems typically have two orders of magnitude more secondary
storage than primary storage and data are kept for a longer time there.

In modern computers, hard disk drives are usually used as secondary storage. The time taken to
access a given byte of information stored on a hard disk is typically a few thousandths of a
second, or milliseconds. By contrast, the time taken to access a given byte of information stored
in random-access memory is measured in billionths of a second, or nanoseconds. This illustrates
the significant access-time difference which distinguishes solid-state memory from rotating
magnetic storage devices: hard disks are typically about a million times slower than memory.
Rotating optical storage devices, such as CD and DVD drives, have even longer access times.
With disk drives, once the disk read/write head reaches the proper placement and the data of
interest rotates under it, subsequent data on the track are very fast to access. To reduce the seek
time and rotational latency, data are transferred to and from disks in large contiguous blocks.

Some other examples of secondary storage technologies are: flash memory (e.g. USB flash
drives or keys), floppy disks, magnetic tape, paper tape, punched cards, standalone RAM disks,
and Iomega Zip drives.

The secondary storage is often formatted according to a file system format, which provides the
abstraction necessary to organize data into files and directories (folders), providing also
additional information (called metadata) describing the owner of a certain file, the access time,
the access permissions, and other information.

Most computer operating systems use the concept of virtual memory, allowing utilization of
more primary storage capacity than is physically available in the system. As the primary memory

3
Introduction to Computers
fills up, the system moves the least-used chunks (pages) to secondary storage devices (to a swap
file or page file), retrieving them later when they are needed. As more of these retrievals from
slower secondary storage are necessary, the more the overall system performance is degraded.

Tertiary storage

Tertiary storage or tertiary memory, provides a third level of storage. Typically it involves a
robotic mechanism which will mount (insert) and dismount removable mass storage media into a
storage device according to the system's demands; this data is often copied to secondary storage
before use. It is primarily used for archiving rarely accessed information since it is much slower
than secondary storage (e.g. 5–60 seconds vs. 1–10 milliseconds). This is primarily useful for
extraordinarily large data stores, accessed without human operators. Typical examples include
tape libraries and optical jukeboxes.

When a computer needs to read information from the tertiary storage, it will first consult a
catalog database to determine which tape or disc contains the information. Next, the computer
will instruct a robotic arm to fetch the medium and place it in a drive. When the computer has
finished reading the information, the robotic arm will return the medium to its place in the
library.

Off-line storage

Off-line storage is computer data storage on a medium or a device that is not under the control of
a processing unit. The medium is recorded, usually in a secondary or tertiary storage device, and
then physically removed or disconnected. It must be inserted or connected by a human operator
before a computer can access it again. Unlike tertiary storage, it cannot be accessed without
human interaction.

Off-line storage is used to transfer information, since the detached medium can be easily
physically transported. Additionally, in case a disaster, for example a fire, destroys the original
data, a medium in a remote location will probably be unaffected, enabling disaster recovery. Off-
line storage increases general information security, since it is physically inaccessible from a
computer, and data confidentiality or integrity cannot be affected by computer-based attack
techniques. Also, if the information stored for archival purposes is rarely accessed, off-line
storage is less expensive than tertiary storage.

In modern personal computers, most secondary and tertiary storage media are also used for off-
line storage. Optical discs and flash memory devices are most popular, and to much lesser extent
removable hard disk drives. In enterprise uses, magnetic tape is predominant. Older examples are
floppy disks, Zip disks, or punched cards.

Characteristics of storage

Storage technologies at all levels of the storage hierarchy can be differentiated by evaluating
certain core characteristics as well as measuring characteristics specific to a particular
implementation. These core characteristics are volatility, mutability, accessibility, and
4
Introduction to Computers
addressability. For any particular implementation of any storage technology, the characteristics
worth measuring are capacity and performance.

Volatility

Non-volatile memory
Will retain the stored information even if it is not constantly supplied with electric power.
It is suitable for long-term storage of information.
Volatile memory
Requires constant power to maintain the stored information. The fastest memory
technologies of today are volatile ones Since primary storage is required to be very fast, it
predominantly uses volatile memory.
Dynamic random-access memory
A form of volatile memory which also requires the stored information to be periodically
re-read and re-written, or refreshed, otherwise it would vanish.
Static random-access memory
A form of volatile memory similar to DRAM with the exception that it never needs to be
refreshed as long as power is applied. (It loses its content if power is removed.)

An uninterruptible power supply (UPS) can be used to give a computer a brief window of time to
move information from primary volatile storage into non-volatile storage before the batteries are
exhausted. Some systems have integrated batteries that maintain volatile storage for several
hours.

Mutability

Read/write storage or mutable storage


Allows information to be overwritten at any time. A computer without some amount of
read/write storage for primary storage purposes would be useless for many tasks. Modern
computers typically use read/write storage also for secondary storage.
Read only storage
Retains the information stored at the time of manufacture, and write once storage (Write
Once Read Many) allows the information to be written only once at some point after
manufacture. These are called immutable storage. Immutable storage is used for tertiary
and off-line storage. Examples include CD-ROM and CD-R.
Slow write, fast read storage
Read/write storage which allows information to be overwritten multiple times, but with
the write operation being much slower than the read operation. Examples include CD-
RW and flash memory.

Accessibility

Random access
Any location in storage can be accessed at any moment in approximately the same
amount of time. Such characteristic is well suited for primary and secondary storage.
Most semiconductor memories and disk drives provide random access.
5
Introduction to Computers
Sequential access
The accessing of pieces of information will be in a serial order, one after the other;
therefore the time to access a particular piece of information depends upon which piece
of information was last accessed. Such characteristic is typical of off-line storage.

Addressability

Location-addressable
Each individually accessible unit of information in storage is selected with its numerical
memory address. In modern computers, location-addressable storage usually limits to
primary storage, accessed internally by computer programs, since location-addressability
is very efficient, but burdensome for humans.
File addressable
Information is divided into files of variable length, and a particular file is selected with
human-readable directory and file names. The underlying device is still location-
addressable, but the operating system of a computer provides the file system abstraction
to make the operation more understandable. In modern computers, secondary, tertiary and
off-line storage use file systems.
Content-addressable
Each individually accessible unit of information is selected based on the basis of the
contents stored there. Content-addressable storage can be implemented using software
(computer program) or hardware (computer device), with hardware being faster but more
expensive option.

Capacity

Raw capacity
The total amount of stored information that a storage device or medium can hold. It is
expressed as a quantity of bits or bytes (e.g. 10.4 megabytes).
Memory storage density
The compactness of stored information. It is the storage capacity of a medium divided
with a unit of length, area or volume (e.g. 1.2 megabytes per square inch).

Performance

Latency
The time it takes to access a particular location in storage. The relevant unit of
measurement is typically nanosecond for primary storage, millisecond for secondary
storage, and second for tertiary storage. It may make sense to separate read latency and
write latency, and in case of sequential access storage, minimum, maximum and average
latency.
Throughput
The rate at which information can be read from or written to the storage. In computer
data storage, throughput is usually expressed in terms of megabytes per second or MB/s,
though bit rate may also be used. As with latency, read rate and write rate may need to be

6
Introduction to Computers
differentiated. Also accessing media sequentially, as opposed to randomly, typically
yields maximum throughput.
Granularity
The size of the largest "chunk" of data that can be efficiently accessed as a single unit,
e.g. without introducing more latency.
Reliability
The probability of spontaneous bit value change under various conditions, or overall
failure rate.

Energy use

 Storage devices that reduce fan usage, automatically shut-down during inactivity, and
low power hard drives can reduce energy consumption 90 percent.
 2.5 inch hard disk drives often consume less power than larger ones. Low capacity solid-
state drives have no moving parts and consume less power than hard disks. Also, memory
may use more power than hard disks.

Cloud storage
Cloud storage is a model of data storage where the digital data is stored in logical pools, the
physical storage spans multiple servers (and often locations), and the physical environment is
typically owned and managed by a hosting company. These cloud storage providers are
responsible for keeping the data available and accessible, and the physical environment protected
and running. People and organizations buy or lease storage capacity from the providers to store
end user, organization, or application data.

Cloud storage services may be accessed through a co-located cloud compute service, a web
service application programming interface (API) or by applications that utilize the API, such as
cloud desktop storage, a cloud storage gateway or Web-based content management systems.

Cloud storage typically refers to a hosted object storage service, but the term has broadened to
include other types of data storage that are now available as a service, like block storage.

Cloud storage is:

 Made up of many distributed resources, but still acts as one - often referred to as
federated storage clouds
 Highly fault tolerant through redundancy and distribution of data
 Highly durable through the creation of versioned copies
 Typically eventually consistent with regard to data replicas

Advantages

 Companies need only pay for the storage they actually use, typically an average of
consumption during a month. This does not mean that cloud storage is less expensive,
only that it incurs operating expenses rather than capital expenses.
7
Introduction to Computers
 Organizations can choose between off-premises and on-premises cloud storage options,
or a mixture of the two options, depending on relevant decision criteria that is
complementary to initial direct cost savings potential; for instance, continuity of
operations (COOP), disaster recovery (DR), security (PII, HIPAA, SARBOX, IA/CND),
and records retention laws, regulations, and policies.
 Storage availability and data protection is intrinsic to object storage architecture, so
depending on the application, the additional technology, effort and cost to add availability
and protection can be eliminated.
 Storage maintenance tasks, such as purchasing additional storage capacity, are offloaded
to the responsibility of a service provider.
 Cloud storage provides users with immediate access to a broad range of resources and
applications hosted in the infrastructure of another organization via a web service
interface.
 Cloud storage can be used for copying virtual machine images from the cloud to on-
premises locations or to import a virtual machine image from an on-premises location to
the cloud image library. In addition, cloud storage can be used to move virtual machine
images between user accounts or between data centers.
 Cloud storage can be used as natural disaster proof backup, as normally there are 2 or 3
different backup servers located in different places around the globe.

Potential concerns

Attack surface area

Outsourcing data storage increases the attack surface area.

1. When data is distributed it is stored at more locations increasing the risk of unauthorised
physical access to the data. For example, in cloud based architecture, data is replicated
and moved frequently so the risk of unauthorised data recovery increases dramatically.
(e.g. disposal of old equipment, reuse of drives, reallocation of storage space) The
manner that data is replicated depends on the service level a customer chooses and on the
service provided. Different cloud vendors offer different service levels. Risk of
unauthorized access to data can be mitigated through the use of encryption, which can be
applied to data as part of the storage service or by on-premises equipment that encrypts
data prior to uploading it to the cloud.
2. The number of people with access to the data who could be compromised (i.e. bribed, or
coerced) increases dramatically. A single company might have a small team of
administrators, network engineers and technicians, but a cloud storage company will have
many customers and thousands of servers and therefore a much larger team of technical
staff with physical and electronic access to almost all of the data at the entire facility or
perhaps the entire company. Encryption keys that are kept by the service user, as opposed
to the service provider limit the access to data by service provider employees.
3. It increases the number of networks over which the data travels. Instead of just a local
area network (LAN) or storage area network (SAN), data stored on a cloud requires a
WAN (wide area network) to connect them both.

8
Introduction to Computers
4. By sharing storage and networks with many other users/customers it is possible for other
customers to access your data. Sometimes because of erroneous actions, faulty
equipment, a bug and sometimes because of criminal intent. This risk applies to all types
of storage and not only cloud storage. The risk of having data read during transmission
can be mitigated through encryption technology. Encryption in transit protects data as it
is being transmitted to and from the cloud service. Encryption at rest protects data that is
stored at the service provider. Encrypting data in an on-premises cloud service on-ramp
system can provide both kinds of encryption protection.

Supplier stability

Companies are not permanent and the services and products they provide can change.
Outsourcing data storage to another company needs careful investigation and nothing is ever
certain. Contracts set in stone can be worthless when a company ceases to exist or its
circumstances change. Companies can:

1. Go bankrupt.
2. Expand and change their focus.
3. Be purchased by other larger companies.
4. Be purchased by a company headquartered in or move to a country that negates
compliance with export restrictions and thus necessitates a move.
5. Suffer an irrecoverable disaster.

Accessibility

 Performance for outsourced storage is likely to be lower than local storage, depending on
how much a customer is willing to spend for WAN bandwidth
 Reliability and availability depends on wide area network availability and on the level of
precautions taken by the service provider. Reliability should be based on hardware as
well as various algorithms used.

Other concerns

 Security of stored data and data in transit may be a concern when storing sensitive data at
a cloud storage provider
 Users with specific records-keeping requirements, such as public agencies that must
retain electronic records according to statute, may encounter complications with using
cloud computing and storage. For instance, the U.S. Department of Defense designated
the Defense Information Systems Agency (DISA) to maintain a list of records
management products that meet all of the records retention, personally identifiable
information (PII), and security (Information Assurance; IA) requirements
 Cloud storage is a rich resource for both hackers and national security agencies.
 Piracy and copyright infringement may be enabled by sites that permit files haring.
 The legal aspect, from a regulatory compliance standpoint, is of concern when storing
files domestically and internationally.

9
Introduction to Computers

Das könnte Ihnen auch gefallen