Ans 1

Input Stack

Purpose
Several modules are involved in tokenising, compacting and parsing a user statement into a parse tree that can be executed at a lower level. The Input Stack controls the processing of statements by passing them through these modules from the higher to the lower level, much as in a 7-layer OSI-style network stack, and by checking whether errors occur. It also provides methods to wrap and return the errors that may be produced by its modules.

Overview
The purpose of the Input Stack is to give the user a method for building a parse tree from a textual statement. To do so, several modules (currently the Tokeniser, Compactor & Parser) process the data sequentially. Each one takes the output of the previous module as input, performs its processing and returns a list of errors. A higher-level structure is therefore required to pass valid data between the modules and to manage the errors. The Input Stack can thus be seen as a module wrapper, analogous to the 7-layer OSI structure, because it is designed to take a character string as input and pass it down through lower-level modules until it can be understood and executed by the database. Its structure has to be simple, easily extensible and maintainable. Moreover, its process must be standardised in order to make its operation obvious and to emphasise its multi-layer structure. In this way modules can easily be added to and removed from the stack.

One of the main purposes of the Input Stack is also to provide a method for wrapping the errors from the modules, so that they can be reported to the GUI. To do so, a generalised error structure has been defined (see the Error Handling document). The Input Stack exploits it by providing functions that can properly wrap each type of module error in it.

This document deals with the Input Stack's specification, process and structure. It explains the error-wrapping method that has been defined, and it also provides information for the developer about the file structure of the stack's modules and some instructions about compiling.

Specification

Input & Output
The input of the Stack is that of its first module. It corresponds to the Tokeniser's input, which is a statement represented by a character string. The Input Stack returns the following output: 1. The output of its last module which corresponds currently
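To make the layered flow concrete, here is a minimal sketch of the stacked-module idea described above: each module consumes the previous module's output and returns a list of errors, and the stack wraps any errors into a single generalised structure before reporting them. Only the module names (Tokeniser, Compactor, Parser) come from the text; every class and function below is invented for illustration.

```python
# Minimal sketch of the Input Stack: modules run in sequence, each taking the
# previous module's output; errors are wrapped in one generalised structure.

class StackError(Exception):
    """Stand-in for the generalised error structure from the Error Handling document."""
    def __init__(self, module, details):
        super().__init__(f"{module}: {details}")
        self.module = module
        self.details = details

def tokenise(text):
    # Placeholder: split the statement into tokens.
    tokens = text.split()
    return tokens, ([] if tokens else ["empty statement"])

def compact(tokens):
    # Placeholder: normalise the token stream.
    return [t.lower() for t in tokens], []

def parse(tokens):
    # Placeholder: build a trivial "parse tree" from the tokens.
    return {"root": tokens}, []

def run_input_stack(statement):
    data = statement
    for name, module in (("Tokeniser", tokenise),
                         ("Compactor", compact),
                         ("Parser", parse)):
        data, errors = module(data)
        if errors:
            # Wrap and raise: only valid data flows to the next (lower) layer.
            raise StackError(name, "; ".join(errors))
    return data

print(run_input_stack("SELECT name FROM customers"))
```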

Ans 2

Since I clearly don't understand the difference between a SAN, a NAS, and a DAS, I thought I would post the Wikipedia definitions of the three. These are my understandings of what they are.

http://en.wikipedia.org/wiki/Direct-attached_storage - Direct-attached storage (DAS) refers to a digital storage system directly attached to a server or workstation, without a storage network in between. It is a retronym, mainly used to differentiate non-networked storage from SAN and NAS.

http://en.wikipedia.org/wiki/Network-attached_storage - Network-attached storage (NAS) is file-level computer data storage connected to a computer network, providing data access to heterogeneous clients. NAS not only operates as a file server but is specialized for this task by its hardware, software, or the configuration of those elements. NAS is often built as a computer appliance, a specialized computer designed from the ground up for storing and serving files rather than a general-purpose computer used for the role.
http://en.wikipedia.org/wiki/Storage_area_network - A storage area network (SAN) is a dedicated
network that provides access to consolidated, block level data storage. SANs are primarily used to
make storage devices, such as disk arrays, tape libraries, and optical jukeboxes, accessible to
servers so that the devices appear like locally attached devices to the operating system. A SAN
typically has its own network of storage devices that are generally not accessible through the local
area network by other devices.
Now I learned something new: a Direct Attached SAN. I don't know what that is or what it would look like if I saw one. I have tried researching this and don't get any hits.

Ans 3

Information Lifecycle Management

Improve system availability and performance by archiving data to readily accessible storage. Our information lifecycle management (ILM) software can help streamline your IT infrastructure by decommissioning legacy systems and automating data retention according to rules you define, minimizing risk by keeping you in control of your data.

 Archive obsolete data to enhance performance and reduce database size and administration costs
 Support data retention rules by creating separate archives based on varying data lifetimes
 Reduce IT management costs by consolidating systems and decommissioning legacy systems
 Preserve compliance auditing and reporting capabilities for decommissioned systems
 Reduce the cost and risk of legal discovery by automating data collection for legal cases

What is structured information?

It is information that is already structured in fields, such as date, title, subject, unit price, quantity, total price, commission percentage: typically, what you find in a record of a relational database table.

When information is structured, it is usually relatively easy to search, since you can easily tell a program: give me the list of record numbers in the table CUSTOMERS where total sales is greater than 1,000 and the name starts with the letter A.

The drawback is that such RDBMS systems usually require that the fields have a certain maximum size: a date can have at most 8 digits (yyyymmdd), a name at most 30 characters, etc. This is because the information must fit into columns and tables, and it is difficult for such systems to handle efficiently data that varies significantly from one row to the next.
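The CUSTOMERS example above maps almost directly onto a SQL query. Here is a minimal sketch using Python's built-in sqlite3 module; the table layout and values are invented purely to mirror the example in the text.

```python
import sqlite3

# Tiny in-memory CUSTOMERS table; columns and rows are invented for illustration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (id INTEGER, name TEXT, total_sales REAL)")
con.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                [(1, "Abel", 1500.0), (2, "Baker", 900.0), (3, "Adams", 2500.0)])

# "Record numbers in CUSTOMERS where total sales > 1,000 and name starts with A."
rows = con.execute(
    "SELECT id FROM customers WHERE total_sales > 1000 AND name LIKE 'A%'"
).fetchall()
print([r[0] for r in rows])  # -> [1, 3]
```

Because every field has a known type and maximum size, the engine can index and filter it efficiently; this is exactly what breaks down for free-form text.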

What is unstructured information?

Generally speaking, what people mean by that is text, such as can be found printed on a 2-page memo. Although there may be some visual structure for a human reader (it's easy to find the date, whether it's on the left or right side; it's easy to find the subject once you read a couple of paragraphs), for a program it is something else altogether. Contrary to popular belief, the amount of unstructured information in this day and age is several orders of magnitude larger than the amount of structured information. Unstructured information does not fit easily into the columns-and-rows concept of relational databases: the text of a memo may contain 1 paragraph or 100 (especially mine), a book may have chapters of varying lengths, a technical description of an Airbus plane requires a few hundred boxes of drawings and text pages, etc. Thus relational database engines have trouble handling that kind of data and must generally store it as blobs [1], in a different way than the usual columns and rows, which also makes it more difficult for such programs to handle.

Ans 4

Disk Drive

A disk drive is a device that reads and/or writes data to a disk. The most common type of disk drive is a hard drive (or "hard disk drive"), but several other types of disk drives exist as well. Some examples include removable storage devices, floppy drives, and optical drives, which read optical media such as CDs and DVDs.

While there are multiple types of disk drives, they all work in a similar fashion. Each drive operates by spinning a disk and reading data from it using a small component called a drive head. Hard drives and removable disk drives use a magnetic head, while optical drives use a laser. CD and DVD burners include a high-powered laser that can imprint data onto discs.

Since hard drives are now available in such large capacities, there is little need for removable disk drives. Instead of expanding a system's storage capacity with removable media, most people now use external hard drives. While CD and DVD drives are still common, they have become less used since software, movies, and music can now often be downloaded from the Internet. Therefore, internal hard drives and external hard drives are the most common types of disk drives used today.
Ans 5

RAID S (also known as Parity RAID): This is an alternate, proprietary method for striped parity RAID from EMC Symmetrix that is no longer in use on current equipment. It appears to be similar to RAID 5 with some performance enhancements, as well as the enhancements that come from having a high-speed disk cache on the disk array.

Downsides of using RAID


Nested RAID levels are more expensive to implement than traditional RAID levels
because they require a greater number of disks. The cost per GB of storage is also
higher for nested RAID because so many of the drives are used for redundancy.
Nested RAID has become popular in spite of its cost because it helps to overcome
some of the reliability problems associated with standard RAID levels.

Initially, all the drives in a RAID array are installed at the same time. This makes the
drives the same age, and subject to the same operating conditions and amount of wear.
But when a drive fails, there is a high probability that another drive in the array will
also soon fail.
Some RAID levels (such as RAID 5 and RAID 1) can only sustain a single drive
failure (although some RAID 1 implementations consist of multiple mirrors and can
therefore sustain multiple failures). The problem is that the RAID array and the data it
contains are left in a vulnerable state until a failed drive is replaced and the new disk
is populated with data.
Even if a second disk failure does not occur while the failed disk is being replaced,
there is a chance the remaining disks may contain bad sectors or unreadable data.
These types of conditions may make it impossible to fully rebuild the array.
Nested RAID levels address these problems by providing a greater degree of
redundancy, greatly decreasing the chances of an array-level failure due to
simultaneous disk failures.
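The single-failure tolerance of parity RAID levels such as RAID 5 comes from XOR parity: the parity block is the XOR of the data blocks in a stripe, so any one lost block can be recomputed from the survivors. A minimal sketch (byte strings stand in for per-disk blocks; this is an illustration, not a real controller):

```python
# Why RAID 5 survives one disk failure: XOR parity over a stripe.

def xor_blocks(blocks):
    """XOR a list of equal-length blocks byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]   # blocks on three data disks
parity = xor_blocks(data)            # block on the parity disk

# Disk 1 fails: XOR of the surviving data blocks and the parity rebuilds it.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```

Losing a second block before the rebuild completes leaves the equation unsolvable, which is exactly the vulnerability window described above and the motivation for nested RAID.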

Ans 6 A File System is a refinement of the more general abstraction of permanent storage. Databases and Object
Repositories are other examples. A file system defines the naming structure, characteristics of the files and the set of
operations associated with them.
The classification of computer systems and the corresponding file system requirements are given below. Each level
subsumes the functionality of the layers below in addition to the new functionality required by that layer.

Distributed File Systems constitute the highest level of this classification. Multiple users using multiple machines
connected by a network use a common file system that must be efficient, secure and robust.
The issues of file location and file availability become significant. Ideally, file location should be transparent, so that files are accessible by a path name in the naming structure that is independent of the physical location of the file. It is of course simpler to statically bind names or subtrees of the namespace to machine locations, which must then be provided by the user to access the relevant files. This is not satisfactory for users, as they are required to remember machine names, and when these names are hardwired into applications it becomes inconvenient to relocate parts of the file system during ordinary administration activities. Availability is achieved through replication, which introduces complications for maintaining consistency. Availability is important, as a user may not have control over the physical location where a file is stored and, when using that file at another site, should still have access irrespective of the status of the host where the file may be located.
File Systems and Databases
File Systems and Databases have many similar properties and requirements. However there are many conceptual
differences as well.
Encapsulation: File systems view data in files as a collection of uninterpreted bytes whereas databases contain
information about the type of each data item and relationships with other data items. As the data is typed the
database can constrain the values of certain items. A database therefore is a higher level of abstraction of permanent
storage as it subsumes a portion of the functionality that would be required by applications built on file systems. In
addition to enforcing constraints it provides query and indexing mechanisms on data.
Naming: File Systems organise files into directory hierarchies. The purpose of these hierarchies is to help humans
deal with a large number of named objects. However, for very large file namespaces this method of naming can still
be difficult for users to deal with. Databases allow associative access to data, that is, data is identified by content that
matches search criteria.

The ratio of search time to usage time is the factor that determines whether access by name is adequate. If an item is used very frequently, the ratio is low and a file system is adequate. Not surprisingly, the usage patterns of a file system exhibit considerable temporal locality of this kind. Database usage patterns exhibit very little temporal locality.

File Systems and Databases (Continued)


Dealing with usage patterns: Distributed file systems and databases therefore use two different strategies for enhancing performance, contrasted in the sketch below. Distributed file systems use data shipping, where data is brought to the point of use; it is cached and commonly reused. Distributed databases use function shipping, where the computation is shipped to the site of the data storage. Transporting the search function requires only a small amount of network bandwidth; the alternative would be to transport the entire body of data to be searched to wherever the search was to be done.
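A toy contrast of the two strategies, assuming the "server" is just an in-process list of records; every name here is invented for illustration.

```python
# Data shipping vs function shipping, in miniature. RECORDS plays the role of
# the data stored at a remote server.

RECORDS = [{"id": i, "value": i * 7 % 13} for i in range(1000)]

def data_shipping(predicate):
    # Data shipping: the whole dataset crosses the "network" to the client,
    # which filters it locally (and could cache it for reuse).
    shipped = list(RECORDS)          # entire dataset transferred
    return [r for r in shipped if predicate(r)]

def function_shipping(predicate):
    # Function shipping: only the small predicate crosses the "network";
    # filtering runs where the data lives and only matches come back.
    return [r for r in RECORDS if predicate(r)]

wanted = lambda r: r["value"] == 0
assert data_shipping(wanted) == function_shipping(wanted)
print(len(function_shipping(wanted)), "matching records returned")
```

Both return the same result; what differs is how many bytes move across the network and where the CPU time is spent.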
Granularity of Concurrency: Databases are typically used by applications that involve concurrent read and write
sharing of data at fine granularity by large numbers of users. In addition there are requirements for strict consistency
of this data and atomicity of groups of operations (transactions). With file systems, sharing is more coarse grained, at
the file level, and individual files are rarely shared for write access by multiple users although read sharing is quite
common, for example, system executable files. It is this combination of application characteristics that makes it a lot
more difficult to implement distributed databases than distributed file systems.
Empirical Observations
Many of the current techniques for designing distributed file systems have arisen from empirical studies of the usage
of existing centralised file systems.
 The size distribution of files is useful for determining the most efficient means of mapping files to disk blocks. It is found that most files are relatively small.
 The relative and absolute frequency of different types of file operations has influenced the design of cache fetch algorithms and concurrency mechanisms. Most file access is sequential rather than random, and read operations are more common than writes. It is also noted that users rarely share files for writing.
 Information on file mutability is also useful. It is found that most files that are written are overwritten often. Consider, for example, temporary files generated by a compiler, or files which we are editing and save repeatedly. This information can guide a cache write policy.
 The type of a file may substantially influence its properties. Type-specific file information is useful in file placement and in the design of replication mechanisms. Executable files are easily replicated as access is read-only.
 What is the size of the set of files referenced in a short period? This will influence the size of the cache and the cache fetch policy. Most applications exhibit temporal locality within the file namespace, and this set is found to be small in general.
Problems with Empirical Data
Gathering empirical data requires modifying an operating system to monitor these parameters. This process may itself have a small impact on system performance.
When interpreting the data, consideration must be given to the environment in which it was gathered. A study of a
file system used in an academic environment may not be sufficiently general for other kinds of environments.
Another concern relates to the interdependency of the design studied and the data observed. The nature of a
particular file system implementation may unduly influence the way in which users use the system. For example, if
the caching system used whole-file transfer there would be a disincentive to create very large files.

Another issue is that most empirical studies are carried out on existing centralised file systems under the assumption
that user behaviour and programming characteristics will not change significantly in a distributed system.
Mechanisms for Building Distributed File Systems
Mounting
Mount mechanisms allow the binding together of different file namespaces to form a single hierarchical namespace. The Unix operating system uses this mechanism. A special entry, known as a mount point, is created at some position in the local file namespace and bound to the root of another file namespace. From the user's point of view, a mount point is indistinguishable from a local directory entry and may be traversed using standard path names once mounted. File access is therefore location transparent for the user, but not for the system administrator. The kernel maintains a structure called a mount table, which maps mount points to the appropriate file systems. Whenever a file access path crosses a mount point, this is intercepted by the kernel, which then obtains the required service from the remote server.
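A minimal sketch of the mount-table idea: resolve an absolute path by finding its longest mount-point prefix, then hand the remainder of the path to the file system bound there. All mount points and file-system names below are invented; a real kernel resolves this component by component.

```python
# Toy mount table: map a path to (file system, path relative to its root)
# using a longest-prefix match on mount points.

MOUNT_TABLE = {
    "/":            "local-rootfs",
    "/home":        "local-homefs",
    "/home/shared": "nfs://server1/shared",   # a remotely served subtree
}

def resolve(path):
    """Return (file_system, relative_path) for an absolute path."""
    matches = [mp for mp in MOUNT_TABLE
               if path == mp or path.startswith(mp.rstrip("/") + "/")]
    best = max(matches, key=len)               # deepest mount point wins
    rel = path[len(best):].lstrip("/") or "."
    return MOUNT_TABLE[best], rel

# Crossing the /home/shared mount point hands the request to the remote server.
print(resolve("/home/shared/report.txt"))  # -> ('nfs://server1/shared', 'report.txt')
print(resolve("/home/alice/notes.txt"))    # -> ('local-homefs', 'alice/notes.txt')
```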
