

Enterprise Vault

TECHNICAL WHITE PAPER

A Technical Overview of the


Enterprise Vault™ 2007 Storage Layer

December 2007

TABLE OF CONTENTS

Introduction...................................................................................................................... 3
Purpose of This Whitepaper ............................................................................................... 3
Target Audience ................................................................................................................... 3
Virtualization of the storage layer—the Open Storage Layer ........................................ 4
The “logical” Storage concept ...................................................................................... 7
The highest level of abstraction: Vault Store ................................................................... 7
Defining Storage devices and locations: Vault Store Partitions .................................... 9
Logically separating a Vault Store: Archives ................................................................. 10
What is SIS? ........................................................................................................................ 12
Retention management ..................................................................................................... 13
The “physical” Storage architecture ......................................................................... 14
Inside a Vault Store Partition ........................................................................................... 14
Closing of Vault Store Partitions — Improving Backups ............................................... 15
What is a DVS file? ............................................................................................................. 15
DVS files for open standards ............................................................................................ 16
Using a flat file system ...................................................................................................... 17
Collection services — a more intelligent flat file system ............................................. 17
Migration service — migration as a central activity ...................................................... 18
Summary of the interaction of Enterprise Vault with an NTFS file system ................ 19
Storage hardware and systems .................................................................................. 20
Storing Enterprise Vault archives on Network Attached Storage................................ 21
NetApp storage systems...................................................................................................................21
Hitachi Content Archive Platform (HCAP) ......................................................................................22
Systems integrating with the Enterprise Vault Migrator .............................................. 22
Pegasus Disk Technologies – InveStore ........................................................................................23
Fujitsu Eternus ...................................................................................................................................24
Veritas NetBackup .............................................................................................................................24
Integration with Tivoli Storage Manager / IBM DR 550...............................................................25
Content Addressed Storage and other API-based storage ........................................... 25
EMC Centera........................................................................................................................................26
Read-only storage and retention management in Centera ........................................................26
Single Instance Storage and collections with Centera ...............................................................26
Centera summary ...............................................................................................................................28
Other supported storage systems ................................................................................... 29
Summary of the Open Storage Layer of Enterprise Vault 2007 .................................... 29

Introduction
Purpose of This Whitepaper

Customers implement archiving solutions to reduce the cost of storage for primary applications
such as email or file shares while allowing for long-term retention of the large amounts of
business-critical information produced by these systems.

As this paper will show, Enterprise Vault is architected to meet these two objectives and
specifically provides:

Reduced overall cost of storage

– Storage tiering: Keeping inactive or noncritical data on less expensive storage media
– Storage rationalization: Minimizing content duplication and size
– Backup optimization: Shrinking overall backup and disaster recovery (DR) time

Long-term data retention

– Data integrity: Ensuring that data is captured accurately and reliably


– Data resilience: Maintaining data across document formats and storage platforms
– Data fidelity: Optionally leveraging specialist storage to prevent data tampering

The intent of this paper is to show how, through its interaction with storage systems, Enterprise
Vault achieves the above goals.

In addition, we will show you how architectural decisions we have made in building Enterprise
Vault will enable the solution to continue to meet customer needs for the foreseeable future.

Target Audience

The primary target audiences for this white paper are Enterprise Vault partners or customers that
are looking for an introduction to the concepts that Enterprise Vault uses for storing archived
items, indices and metadata.

Virtualization of the storage layer—the Open Storage Layer

The Open Storage Layer (OSL) is the virtual layer that contains all functions that affect and
control the way Enterprise Vault interacts with physical storage systems and devices. End users
need not be aware that archiving is taking place, especially that an item has moved from the primary
application to the archive. In addition, they should not know that items may subsequently move to
other storage systems as they age, in a process called migration (storage tiering, or Information
Lifecycle Management). Furthermore, most users will not be interested in the particular
characteristics of a storage system (e.g., basic NTFS versus a WORM media store).

The Open Storage Layer allows Enterprise Vault to virtualize the underlying storage, so that
users of the archive are not aware of the storage system they are using today, and more
importantly, a new storage system can be introduced to the archive at any time.

At a very high level, the benefits that the OSL gives to the customer, and hence the advantages
that Enterprise Vault gives to the customer, are:

– Storage tiering:
Automated migration of data (by policy) from the primary tier of the archive to a
secondary tier—whether that second tier is disk, tape, or other media

– Storage rationalization:
Compression of archived data and Single Instance Storage to remove duplicate
copies of archived data

– Backup optimization:
Reduction of primary data to backup as well as an efficient format (leveraging archive
container files and/or partitions) to reduce the amount of archived data to backup

– Data integrity:
A “safety copy” feature that optionally ensures that archived items are backed up or
replicated before they are removed from the primary server, to reduce the risk of data
loss

– Data resilience:
The ability to move to new storage platforms in the future and an archive file format
that inherently preserves a “future-proofed” copy of every archived document (in
HTML, or alternatively, an XML rendition), while also retaining the original in a
transparent format that can be viewed independent of Enterprise Vault

– Data fidelity:
An approach that prevents data/metadata elements from being lost during archiving
and integration with underlying WORM or WORM-like technologies that ensure that
data is retained for the desired period and is not tampered with during that period

Figure 1 shows the OSL and its components. Each of these virtual and physical
components will be explained during the course of this paper, and this diagram will
serve as a convenient way to highlight all of the value points of the Enterprise Vault
solution’s advanced interaction with storage systems.

Figure 1. The Open Storage Layer contains all the functions that determine how Enterprise Vault interacts with
storage systems and devices

Storage for archiving requires a longer-term view

One important consideration when examining the correct “type” of storage in which to house an
archive is the length of time that items reside in the archive. Typically, IT purchase decision
cycles are based on the accounting write-off period (the interval at which, in accounting terms, the
item no longer has any value); a typical figure for this is three years.
Companies are finding that they often need, or are even obliged by law, to retain content
longer than this write-off period. An average figure is often seven years. This means that, on
average, we would expect any single item in the archive to “live” on more than two storage
systems during its life span. In an increasing number of cases, much longer retention periods are
required, so the ability of an archive system to evolve painlessly through multiple generations of
storage is mandatory rather than a luxury.

To this end, migrating between storage systems is a key facility of the Open Storage Layer and
allows the archive to consume new storage and new storage systems over time.
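The age-based migration just described can be sketched in a few lines. This is purely an illustration of the policy concept; the item structure and function name are assumptions, not Enterprise Vault APIs.

```python
from datetime import datetime, timedelta

# Illustrative sketch of an age-based migration policy such as the Open
# Storage Layer applies: items older than a policy threshold are selected
# for movement from the primary tier to a secondary tier.

def select_for_migration(items, age_days, now):
    """Return items whose archived date is older than age_days."""
    cutoff = now - timedelta(days=age_days)
    return [item for item in items if item["archived"] < cutoff]

items = [
    {"id": "A1", "archived": datetime(2007, 1, 10)},
    {"id": "A2", "archived": datetime(2007, 11, 20)},
]
# With a 180-day policy evaluated on 1 Dec 2007, only A1 qualifies.
to_migrate = select_for_migration(items, age_days=180, now=datetime(2007, 12, 1))
```

Running the same policy on a later date would pick up A2 as well, which is how the archive gradually drains onto the secondary tier.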

The key elements of Enterprise Vault

Before we take a look at the interaction of Enterprise Vault and storage systems, we should have
an overview of what elements are involved in an Enterprise Vault installation and what content is
stored in an archive.

Figure 2 shows that there are three main content sources that combine to make up Enterprise Vault.

Figure 2 - The three main content sources in Enterprise Vault

• The structured Metabase:


Enterprise Vault has a dependency on Microsoft® SQL Server to maintain a database of
structured information about archived items. This information could be the current location of the
item or extra meta tags describing the item. Use of the SQL Server database also means that
several Enterprise Vault servers can share information about the same item quickly and easily.
Archived items are not stored physically in the Microsoft SQL Server database.

• Full text index:


Every item that is stored in Enterprise Vault will undergo full text indexing so users can perform
rapid keyword searches. This indexing is not performed by the Microsoft SQL Server database but by the
AltaVista index engine. The index services store their items in flat file structures on disk.

• Physical storage:
The full text search and the Microsoft SQL Server database both aid management of the physical
storage. It is this area of Enterprise Vault that this paper will discuss. Individual items are added
to the archive as DVS files and later may be combined into larger collections.
Migration, rationalization of content, and partitioning are all actions that may happen against the
stored items and will be explained in this paper.

The “logical” Storage concept

The highest level of abstraction: Vault Store

Imagine two storage administrators discussing their daily problems in the company’s cafeteria:

– “I have real trouble tracking the information on all our different storage devices.
Wouldn’t it be great to have a database that tracks data across all our file-systems,
tapes and WORM storage devices? If I move a file to some other system, a simple
database query would still get me to the information.”

– “That’s a great idea! But the database probably shouldn’t track filenames; it
should use a hashing checksum so that a rename of a file does not break the
database.”

– “Indeed! Using a hash also allows us to check for identical files, so that we can
reduce the number of duplicate copies in the system.”

– “That’s going to save us a lot of storage. But if we also track the age of the
information and expire content, we can make sure that we keep only the data we
need to retain.”

– “Hey, hang on, if we also check who “owns” which objects, we could manage
permissions from a central point, without the need to drill down into all file-systems
and manually check the permissions.”

In fact, those two storage experts have just described the Enterprise Vault “Vault Store”.
It is essentially a SQL database that stores the following information about any item stored in
Enterprise Vault:

– Current storage location


– Hash-code (checksum)
– Number of Sharers (Single-Instance Storage)
– Archived and Modified Date
– Retention Category
– Permissions

Note that although it is called Vault Store, it is not referring to a storage device itself. At this level,
we only track in which of the “known” storage locations the item is residing at the moment.
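The attributes listed above can be pictured as one record per archived item. The sketch below is purely illustrative; the field names and values are assumptions, not the actual Vault Store database schema.

```python
from dataclasses import dataclass
from datetime import datetime

# Illustrative sketch only: one "row" of the Vault Store database as
# described above. Field names are assumptions, not the real schema.

@dataclass
class VaultStoreEntry:
    location: str            # current storage location (partition / path)
    checksum: str            # hash-code of the item content
    sharers: int             # number of sharers (Single-Instance Storage)
    archived: datetime       # archived date
    modified: datetime       # modified date
    retention_category: str  # drives expiry, not a stamped date
    permissions: list        # who may access the item

entry = VaultStoreEntry(
    location="PTN1/2007/12/01/09",
    checksum="9f86d081884c7d65",
    sharers=2,
    archived=datetime(2007, 12, 1, 9, 30),
    modified=datetime(2007, 11, 28),
    retention_category="Business7Years",
    permissions=["DOMAIN\\alice", "DOMAIN\\bob"],
)
```

A retrieval simply looks up `location` by checksum or item ID, which is why renames and migrations on the storage device do not break the archive.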

You can actually see each Vault Store in the Enterprise Vault Administration Console (VAC).
It is the top-level container referring to the storage sub-system within the Enterprise Vault product.

Figure 3 - The Vault Administration Console exposes the content for management via the Vault Store, a collection
of Vault Store partitions physically on disk.

The main benefit of the Vault Store is efficient management of the archive’s storage. It allows the
administrator to virtualize the physical storage of archived items across multiple devices,
locations, and tiers by tracking the location of each item in a SQL database.

For a user accessing the data, it is completely transparent whether the item is stored on CAS,
NAS, or DAS, whether it is still stored as a single file or in a container, or whether it has been
migrated to another storage tier – Enterprise Vault will look up the location of the requested item
and manage the retrieval accordingly.

The Vault Store consists of two structures:

• Vault Store Database:

This is the SQL database that acts as a directory to all items inside a Vault Store.

Each item, on average, uses between 500 and 3,000 bytes of storage in the Vault Store
Database (the size depends on what is being archived, i.e. File System data, Exchange
mailbox data, Domino mailbox data). This small amount of information is all that is
needed to identify an archived item’s storage location. Therefore, even a large installation
with hundreds of millions of objects will have a relatively small and manageable database
for each store.

• Vault Store Partitions:

These physical subdivisions of a Vault Store contain the “managed locations” where
archived information is stored. A Vault Store consists of one or more Vault Store
Partitions. These Partitions could be dubbed “Physical Storage locations” as they refer to
existing storage locations like a particular share on a File-Server or a mounted Volume
on a SAN.
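To put the Vault Store Database sizing figures above in perspective, a quick back-of-the-envelope calculation using the 500 to 3,000 bytes-per-item range quoted earlier shows that even a hundred million archived items produce a directory database in the tens to low hundreds of gigabytes, small relative to the archived content it describes. The item count is an arbitrary example.

```python
# Back-of-the-envelope Vault Store Database sizing, using the per-item
# range quoted above (500-3,000 bytes per archived item).

def db_size_gb(item_count, bytes_per_item):
    return item_count * bytes_per_item / 1024**3

low = db_size_gb(100_000_000, 500)    # roughly 47 GB
high = db_size_gb(100_000_000, 3000)  # roughly 280 GB
```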

One Enterprise Vault server can have multiple Vault Stores, but these stores cannot span
multiple Enterprise Vault servers.

When you create a new Vault Store in the Vault Administration Console, you select the Enterprise
Vault Server and storage service that will host the new Vault Store. The storage devices and
other storage-related settings will be configured at the partition level.

Defining Storage devices and locations: Vault Store Partitions

As shown above, the Vault Store is a high-level container, visible to the administrator as a major
subdivision of the archive, but without any reference to the storage itself.

Partitions in contrast are the representation of a specific storage location and policy.

Settings include:

– The device and technology used
– Settings for the “Volume” on the device
– Whether Single-Instance Storage should be used
– Whether the device is WORM capable
– A flag that NTFS ACLs are available on the device
– All settings regarding Collections
– The device and settings used for migrations to another tier of storage

Partitions within a Vault Store are usually (but do not have to be) located on specific physical
devices, and these devices can be mixed within a particular Vault Store, not only in their location
(e.g., a particular NTFS volume) but also in their storage technology. For example, a tape-based partition
could be closed and a new magnetic disk partition opened within the same Vault Store, or a NAS
or SAN device could be replaced by a CAS store. This means that an organization changing to a
new storage technology does not have to perform an immediate store migration before starting to
use the new device – all devices can be used in parallel.

While a Vault Store can contain more than one Vault Store Partition, only one of these partitions
will be open for writing at any one time. The only activity in the other partitions will be to delete
items as they expire, and no further items will be added to them. This means that different backup
and DR regimes can be applied to different partitions of a Vault Store depending on their status,
as described later.

In addition, Vault Stores can contain multiple types of data — for example, both the end-user
mailbox archives as well as the journal archives for a given set of users. As we will describe later
in this paper, this allows companies to achieve Single Instance Storage across instances of email
from journal and end-user mailbox archives.

The Vault Store Partition is the most critical item in the storage structure and, to this end, is
designed to be fully self-contained. This means that if a Vault Store Partition were recovered from
a backup tape, for example, then all content required to make that Vault Store operational (e.g.,
item security, retention periods, and original locations) is contained in the Vault Store.

Given the size and number of objects held in an average Enterprise Vault Partition,
administrators are urged to back up the SQL database and the AltaVista indices, as recreating
that information alone can literally take weeks. Still, it is good to know that there are tools to
recreate that information from the stored items in the partition as a “last line of defense” in a
complete disaster scenario.

By creating multiple Vault Store databases, each containing multiple partitions that each may
easily contain millions of individual archived items, you can easily scale Enterprise Vault to
manage huge amounts of data across dozens of completely different storage devices. This
approach is the key to effective business continuity and to ultimate scalability, even if the system
needs to service tens of thousands of users and contain many terabytes of data.

Logically separating a Vault Store: Archives

We have now seen how the Vault Administration Console helps the administrator by virtualizing
the view of the physical storage to aid in management of the archive. These container structures
not only aid the management of the archive, but they also improve the resiliency and scalability of
Enterprise Vault. When looking at the Vault Administration Console, you will see another logical
concept that should be mentioned: the Archives container shown in figure 4 below:

Figure 4 - The Archives container is used to manage and represent logically connected items inside Enterprise
Vault (e.g., a single user’s mailbox), regardless of their physical type or location.

Where the Vault Stores are the container structures used for managing the physical data on the
disk, Archives are used to manage logical collections of items (e.g., a single user’s mailbox, a file-
server share or a Sharepoint document library), regardless of the storage location of this archived
content. The Archives container in the Vault Administration Console also helps manage the
indexing services and other tasks on “logically connected items,” like the management of security.
(You can find more about the index concepts relating to archives in the white paper “Enterprise
Vault Technical White Paper - Indexing and Search”.)

There are several different types of archives available in Enterprise Vault 2007:

– Exchange Mailbox Archives

References all items archived from an Exchange user’s mailbox and keeps the
security and index settings accordingly. Mailbox archives are structured, meaning
that they keep information about the folder hierarchy inside the archive.

– Exchange Journal Archives

References all items that are archived from one or more Exchange journal mailboxes.
Journal archives are flat archives (without information about folder hierarchies).

– Exchange Public Folder Archives

References all items archived from one or more public folder root paths. Permissions
are synchronized from the Exchange permissions on the Public Folder.

– Domino Mailbox Archives

References all items stored from a single Domino user’s mailbox. Domino archives
are always flat, as Notes users use “views” instead of folders, so that items can
potentially belong to several different views.

– Domino Journal Archives

References all items archived from one or more Domino journal mailboxes. Domino
journal archives are also always flat.

– File Archives

References all items archived from one or more file system archive points.
Permissions are synchronized from the file system. These archives are also
structured.

– Sharepoint Archives

References all items archived from one or more SharePoint document libraries.
These archives are flat.

– Shared Archives

References an archive that contains archived information that can be used to help
users share information across sources and groups of users. Shared archives are flat
archives.

Over time, a single archive may grow very large, and the actions of the EV migration services may
mean that the content is physically located over several different storage systems. A single
archive can span multiple Vault Store partitions, but when an end user searches an archive,
he or she should be presented with a consistent set of results and need not be aware of the
location of the actual items.

WORM - Building a tamper-proof read-only archive

The default Enterprise Vault deployment, regardless of the underlying storage technology, will
create a read-only archive. Even if the archive is housed on a standard NTFS disk with read and
write permissions, the default behavior is to prevent users from modifying or deleting content
in the vault directly.

There are options that can be set to allow users to delete content (over which they have rights)
directly, but they can never modify previously archived content. Enterprise Vault is effectively a
read-only, or “fixed content,” store. Enterprise Vault does not provide any way for item content to
be changed once written to the archive, unlike, say, a regular file share.

If an item that has already been copied to the archive (e.g., from a file system) is subsequently
edited and then re-archived, this modified file will be treated as a new item and archived
separately. A search request would show multiple versions, distinguished by their time stamps.

Most of the time this is considered sufficient protection against tampering, but in certain regulatory
compliance situations, the additional protection of WORM storage may be required to defend
against illicit tampering at the media level. Enterprise Vault supports the latest WORM technologies
from all leading vendors. You will find a description of the available storage options later in this
document.

What is SIS?

If we have the same 10 MB email sent to every user in the organization, then why do we need to
store many copies of the email when we can store a single version of the document and
“describe” that the email is owned by a number of different people?

Using hashing algorithms, each file generates its own unique ID. If we determine that this unique
ID has been archived before, we do not store the item again, but instead simply add a second,
user-specific header to the existing saveset file containing the item already stored. This will
contain all the user-specific properties for the second user sharing the same item, but without the
need to store another copy of the main content.

To check for the SIS opportunity for a single file (e.g., a Word file), the metadata properties of the
file are separated out and then a full hash of the main file body is performed to create a unique ID
for that file. As for email messages, the “per user” attributes (e.g., read/unread status or follow-up
flags) are separated out, and then the potentially shareable part of the email (e.g., recipient lists,
subject, message body, and attachments) is examined and a unique hash created for SIS
checking.
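The separation of per-user attributes from the shareable content can be sketched as follows. This is a conceptual illustration only: the attribute names are assumptions, and SHA-256 stands in for whatever hashing checksum Enterprise Vault actually uses.

```python
import hashlib

# Conceptual sketch of the SIS check described above: per-user attributes
# are stripped out, and only the shareable part of the message (recipients,
# subject, body, attachments) is hashed to form the unique ID.

PER_USER_KEYS = {"read", "followup_flag"}  # assumed per-user attributes

def sis_id(message):
    shareable = {k: v for k, v in message.items() if k not in PER_USER_KEYS}
    canonical = repr(sorted(shareable.items())).encode()
    return hashlib.sha256(canonical).hexdigest()

copy_a = {"subject": "Q4 report", "body": "See attached.", "read": True}
copy_b = {"subject": "Q4 report", "body": "See attached.", "read": False}
# Same shareable content, so both copies produce the same ID: only one
# saveset is stored, and the second recipient merely gains a user-specific
# header on the existing saveset.
```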

By doing this, we not only maximize the effectiveness of SIS storage, but we also ensure that
when items are presented back to the end user, they exactly match “their” copy of the file. Note
that this is a key feature of the way that Enterprise Vault performs SIS. If an apparently identical
message was shared between recipients without taking into account that per-user properties
could differ, then we would lose, or worse, change information, such as the fact that a message
was unread or that the user had changed the title in their own mailbox copy.

Single Instance Storage operates between items within a single Vault Store Partition. When
Enterprise Vault is installed, a key design consideration is defining partitions in a way that
optimizes SIS benefits. The reasons for having more than one partition in a Vault Store will be
discussed later.

Retention management

So far we have spoken about putting items into the archive and how the system is both resilient
and easy to recover in the event of a disaster. However, another key activity any archive should
be able to perform is the deletion of content once it has been kept long enough to meet either
business or regulatory commitments.

Even though using Enterprise Vault means that the cost of ownership of older items can be
significantly lower than keeping everything in the front-line stores, it is still even more
cost-effective to remove items when they have no further use. From the pure storage point of view, it
saves direct storage costs as well as indirect storage ownership costs. From the regulatory point
of view, deleting items when they are no longer required to be retained can potentially save the
considerable cost of unnecessary discovery of those items in response to subsequent litigation
or a request from a regulator.

Every item that is added to the archive is assigned a retention category that indicates to
Enterprise Vault how long the item should be retained, giving the expiry service a date after which
it will delete the item from the archive. The retention category is not a physical date added to the
archive item, rather it is a category associated with the item. If a physical date was added to the
file, then there would be no efficient way to make wholesale changes to the retention dates
assigned to items. A retention category will allow the expiry service to determine on a file-by-file
basis if the expiry time has been reached and, at the same time, allow easy updates to the
retention category, for example, if a regulatory body extended a retention period. The expiry
service runs as a separate service, meaning that it can be run at a “quiet” time, for example,
when users are not accessing the system to any great extent.
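The advantage of a category over a stamped date can be sketched directly. The names and periods below are illustrative assumptions; the point is that the period lives in one central table, so extending a category extends every item that references it, with no per-item update.

```python
from datetime import datetime, timedelta

# Sketch of why items carry a retention *category* rather than a stamped
# expiry date: changing the category's period changes the effective expiry
# of every item referencing it in a single operation.

retention_periods = {"Business": timedelta(days=7 * 365)}

def is_expired(item, now):
    return item["archived"] + retention_periods[item["category"]] < now

item = {"archived": datetime(2000, 1, 1), "category": "Business"}
expired_before = is_expired(item, now=datetime(2007, 6, 1))   # past 7 years

# A regulator extends the period to ten years: one change, no per-item updates.
retention_periods["Business"] = timedelta(days=10 * 365)
expired_after = is_expired(item, now=datetime(2007, 6, 1))    # retained again
```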

Note that although automatic expiration is the most common way to remove items from the
archive, there are both end-user and administrator functions to delete items explicitly, but these
are subject to permissions.

Though not directly connected to the storage service, Enterprise Vault also has the ability to
manage the lifecycle of any shortcuts created in Microsoft Exchange. This means that not only
are items in the vault retained exactly for as long as needed, minimizing the risk and the amount
of storage consumed, but the same benefits also can be applied to the users’ shortcuts in a
centrally controlled fashion, so that shortcuts can be removed sooner than the item expiration
period. This is necessary as over time many shortcuts in a user’s mailbox, for example, could fill
the mailbox quota. An example of this is to allow shortcuts to be retained in a user’s mailbox for
one year, while the items themselves remain in the archive for a further five years. Once the
shortcuts have expired, these items can still be searched and retrieved by the various
search applications and Archive Explorer. The old shortcuts no longer clutter up the mailbox.
In any case, shortcuts are automatically deleted when archived items are expired.

The “physical” Storage architecture

Inside a Vault Store Partition


As we have stated before, the Vault Store Partition is the level where the “physical storage” is
added to the Enterprise Vault configuration. For most storage devices this will be a primary
storage path and (if needed) a second tier where the data should be migrated after a given period
of time. Only when using an EMC Centera will you add the IP addresses of a Centera frame,
without the option to further migrate the data.

In this chapter we will focus on file-system based storage devices such as NTFS volumes or CIFS
shares.

File structure on disk


Most archiving use cases are based on archiving data according to its age. When archiving from
mailboxes or file shares, this age might be a few weeks, while journal archiving of email
messages means that the item is only a few seconds old. Nevertheless, customers see the
archive build up as a timeline of information, so Enterprise Vault is designed to organize
the data in a flexible folder structure that represents the last modification date of the information.

This structure will automatically be created during the archive process and will grow in a very
predictable and efficient way, resulting in a folder hierarchy that never exceeds 4 levels of
subfolders and will split any day’s worth of information into a maximum of 25 folders (1 “day”
folder containing 24 “hour” subfolders).

This is illustrated below:

Figure 5 - Enterprise Vault stores content in a flat file structure and, initially, as separate items

As can be seen, the partition is divided into folders in the format


YEAR (YYYY) \ MONTH (MM) \ DATE (DD) \ HOUR (HH)

The lowest folder level shows items collected from the same time period, and by default, these
items will come from the same location in the primary application.
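The four-level folder naming above can be sketched directly. Forward slashes are used here for readability; on an NTFS partition the separators would be backslashes.

```python
from datetime import datetime

# Sketch of the partition folder layout described above:
# YEAR \ MONTH \ DATE \ HOUR, derived from the item's last-modified date.

def partition_path(modified: datetime) -> str:
    return modified.strftime("%Y/%m/%d/%H")

path = partition_path(datetime(2007, 12, 1, 14, 30))  # "2007/12/01/14"
```

Because the path depends only on the timestamp, the hierarchy grows predictably and never exceeds the four levels described above.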

Every single item that is archived will have the ability to create its own “Saveset” file; however, as
will be shown later when discussing Single Instance Storage, not every item will create a file, as it
can instead reference an already existing one (provided the contents are the same).

Closing of Vault Store Partitions — Improving Backups

As we have already seen, the Vault Store contains one or more Vault Store Partitions. At any one
time, only one Vault Store Partition is open and being written to. Organizations can take
advantage of this to further reduce the TCO of Enterprise Vault.

As described earlier, Enterprise Vault stores items in individual files rather than in a single large
structured file, which lends itself to the use of incremental file backups rather than having to back
up the entire archive store every time. However, with some backup solutions, even this approach
becomes inefficient as the time taken to detect the new or modified files can become prohibitive
when vast numbers of files are being targeted.

The concept of Vault Store Partitions addresses this problem. Partitions are given a maximum
theoretical size, and once this size is reached can be closed and another partition opened to
store all future data. Once a partition is closed, nothing new is written to that partition. Even items
that are called from that partition and are edited and re-archived will be stored in the current
“Open” partition.
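The rollover behaviour described above can be modelled in a few lines. The names and structure here are invented for illustration, not Enterprise Vault's implementation:

```python
class VaultStore:
    """Toy model of partition rollover: one open partition receives all
    writes; when it would exceed its size limit it is closed and a new
    one is opened. Closed partitions never receive new writes."""
    def __init__(self, partition_limit_gb: float):
        self.limit = partition_limit_gb
        self.partitions = [{"name": "Ptn1", "used_gb": 0.0, "open": True}]

    def archive(self, size_gb: float) -> str:
        current = self.partitions[-1]
        if current["used_gb"] + size_gb > self.limit:
            current["open"] = False                     # closed: never written again
            current = {"name": f"Ptn{len(self.partitions) + 1}",
                       "used_gb": 0.0, "open": True}
            self.partitions.append(current)
        current["used_gb"] += size_gb
        return current["name"]

store = VaultStore(partition_limit_gb=200)
for _ in range(3):
    store.archive(90)                                   # three 90 GB batches
print([p["name"] for p in store.partitions if p["open"]])  # ['Ptn2']
```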
Imagine an archive where 80%¹ of your corporate email content now resides in Enterprise Vault,
and that this equates to 5 TB of archive storage. If the Vault Store Partitions were limited to 200
GB, then at any one time only a maximum of 200 GB out of the 5 TB will be liable to change due
to newly archived items. This makes the archive very “backup-friendly,” as little of the overall
corporate content now changes, and entire areas of the archive that are now closed can be
backed up much less frequently, if at all.

In practice, the only changes made to a closed partition are deletions, which means that the
frequency of backups for these closed partitions can be greatly reduced without fear of data loss.
Even restoring an older copy of a closed partition may be acceptable in all but the most closely
regulated environments, since references to any items deleted in the interim will already have
been removed from the directory and the indices.

With large amounts of your corporate knowledge potentially residing in Enterprise Vault, it is
imperative that consideration is given to how to effectively back up the archive and, more
importantly, how to recover the archive in the event of a disaster. The self-contained nature of the
DVS file and the concept of closing Vault Store Partitions meet these goals to create a storage
system that is highly resilient to failure and, in addition, can be efficiently recovered from a
disaster.

Moving a Vault Store Partition

It is perfectly possible to move NTFS-based Vault Store partitions between volumes on the same
storage device or between NTFS storage devices. Please refer to the following technotes on the
Symantec support site for more information:

¹ It is estimated that 70–80% of all content in a typical messaging system is older than 30 days.

273271 – How to move a Vault Store partition or Vault Store on the same Enterprise Vault server
from one location to another

282880 – How to move a Vault Store and Vault Store partition to a different Enterprise Vault (EV)
server in the SAME site.

What is a DVS file?

The DVS file is a reminder of how long Enterprise Vault has been working in the enterprise
environment. DVS stands for Digital Vault SaveSet, and the “Digital” part of that name refers to
the company Digital Equipment Corporation, where Enterprise Vault was originally created. A
DVS file is often referred to as a "Saveset".

A DVS file is a single piece of archived content. Each DVS file consists of two main sections: the
first contains the actual archived content and its HTML (index) rendition in a compressed format;
the second describes where the content originally came from and who owns it (per-user
information). The DVS file derives its name from the following parameters:

<checksum><date><time><seconds><saveset_uniqueid>.dvs
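Purely as an illustration of how such a name is composed (the exact field encoding is internal to Enterprise Vault, and the checksum below is a stand-in), one might assemble it like this:

```python
import hashlib
from datetime import datetime

def dvs_filename(content: bytes, archived_at: datetime, saveset_id: str) -> str:
    """Illustrative composition of a Saveset file name from the
    parameters listed above. The real on-disk encoding is internal
    to Enterprise Vault; the checksum here is only a stand-in."""
    checksum = hashlib.md5(content).hexdigest()[:8]   # stand-in checksum field
    stamp = archived_at.strftime("%Y%m%d%H%M%S")      # date, time, seconds
    return f"{checksum}{stamp}{saveset_id}.dvs"

name = dvs_filename(b"archived message body",
                    datetime(2007, 12, 3, 14, 30, 5), "0000A1B2")
print(name)
```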

In addition to holding the original item, Enterprise Vault retains an HTML text version of the item
in the DVS file (provided that the content can be converted). This offers a degree of future-proofing,
as it means that there is a version of the item that can be read without the original
viewing application, such as Outlook or Notes. This is important because of the increasing
longevity of items held in the archive. Many companies are adhering to retention periods that could
span several decades, so they should ask themselves, "Will applications such as
Microsoft Office be available in 100 years' time, and if not, can I still access the stored content?" In
addition, maintaining an HTML rendition means that an item can be viewed rapidly from a
Web application without performing on-the-fly conversions. This saves considerable time and cost
during a large legal discovery, where highly paid legal staff do not want to wait for applications to
open or, even worse, to be installed on demand by IT staff.

DVS files for open standards

Imagine a situation where you walk into the office and find a discarded backup tape on the foyer
floor. On examination of the tape, you find that it contains a partial backup of Enterprise Vault with
lots of DVS files.

Can you open these files without Enterprise Vault?


We have already stated that this is a proprietary format, so surely this is impossible?

It should, of course, be stated that every effort is made during the design and implementation of
Enterprise Vault to ensure that the archive is secure, and that a discarded backup tape is an
extreme situation outside the control of Enterprise Vault; but the answer to the above question is
yes, you will be able to open the DVS file.

There are tools available from Enterprise Vault support and professional services that will allow
you to open these files and recover the content. The actual workings of this process are beyond
the scope of this paper, and if you require further information, please contact Symantec support.

Using a flat file system

By looking at the files on the disk, we can already determine a great deal about how content is
stored and arranged in Enterprise Vault to maximize the efficiency of storing the archive. We can
see that DVS files roughly equate to a single archived item, meaning that Enterprise Vault is a flat
file system. To see why Enterprise Vault benefits from storing items in a flat file manner, we
would have to look at some of the problems associated with the longer-term retention of content
within Microsoft Exchange.

Exchange stores its content in large database files (.EDB files). This suits the performance
requirements of a dynamic front-line system but is not ideal for long-term retention of large and
growing volumes of information. Each EDB file can contain millions of items, yet from the file
system only one huge file is visible. This illustrates the long-term storage scalability problem in
Exchange that Enterprise Vault solves.

The EDB file and Exchange itself scale very well, but the critical supporting applications
alongside Exchange will often fail long before Exchange reaches a limit. For example, backup
applications will struggle to recover large EDB files in a limited time period.

Enterprise Vault initially stores items in a flat file format rather than within a database or pseudo-
database structure. This means that there are far fewer scalability headaches, and in addition,
single items can be accessed without the need to load a large database file. For archive stores,
we find that there is usually no advantage in caching archived items, as the access pattern is one
of relatively infrequent and random retrieval from very large volumes of items.

However, while writing individual flat files is optimum for the short term for the reasons outlined
above and for good transactional integrity and single-instancing, this approach needs to be
balanced in the long term with certain inefficiencies in holding extremely large numbers of
discrete files. Our approach to this is discussed later in the “Collections” section.

A further advantage of the flat file approach is that a more granular transactional approach can be
taken to guarantee the integrity of an archived item. Enterprise Vault only deletes an item from
the target system (or replaces it with a shortcut) when it is assured of the safety of the archived
copy. This can simply be a case of only deleting it once the item has been successfully written to
the store, but it can be further enhanced by deferring deletion until the archived version has itself
been backed up. This is achieved by a separate “watcher” service that monitors the backup
status of archive files and only triggers the deletion of the original file once the archive file has
been backed up. This “safety copy” feature is a critical element for our customers in maintaining
data integrity in terms of the overall archive.
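The "safety copy" rule above amounts to a simple predicate: an original may only be removed once its archived copy is known to be backed up. A minimal sketch, with invented names (this is not an Enterprise Vault API):

```python
def items_safe_to_remove(archived: list[str], backed_up: set[str]) -> list[str]:
    """Sketch of the 'watcher' rule: an original item may be deleted
    (or replaced by a shortcut) only once its archived copy has been
    backed up. Names are illustrative, not product APIs."""
    return [item for item in archived if item in backed_up]

archived = ["msg-1", "msg-2", "msg-3"]
backed_up = {"msg-1", "msg-3"}          # msg-2's archive copy not yet backed up
print(items_safe_to_remove(archived, backed_up))  # ['msg-1', 'msg-3']
```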

The current location and status of the DVS file is stored in the Vault Store Database, held in
Microsoft SQL Server. It is important to note that only Enterprise Vault data, such as file
name and location, is stored in the SQL database; none of the archived content or metadata is
stored there. Maintaining an accurate record of both where a DVS file currently resides
and what content is stored within it is very important when we look further at the lifecycle
of a DVS file, as it moves to another storage system or becomes part of a larger container.
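The lifecycle record described above can be pictured as a small mapping that is updated at each step: the item keeps its identity while its container and tier change. The schema and names below are invented for illustration; the real records live in the SQL-based Vault Store Database and are not documented here:

```python
# Each archived item keeps a record of where its bytes currently live;
# the record is updated as the item is collected and later migrated.
# This dict is a toy stand-in for the Vault Store Database.
locations = {
    "saveset-0001": {"partition": "Ptn1",
                     "container": "saveset-0001.dvs",
                     "tier": "primary"},
}

def on_collected(saveset_id: str, cab_name: str) -> None:
    locations[saveset_id]["container"] = cab_name     # now inside a CAB container

def on_migrated(saveset_id: str, secondary_volume: str) -> None:
    locations[saveset_id]["tier"] = secondary_volume  # moved to the second tier

on_collected("saveset-0001", "2007-12.cab")
on_migrated("saveset-0001", "tape-pool-A")
print(locations["saveset-0001"])
```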

Collection services — a more intelligent flat file system

It was previously stated that Exchange and other primary applications were not designed for the
long-term storage of content as they store content in large file structures, unlike Enterprise Vault,
which stores its content in a flat file format or as single files.

Initially, storing items as single files has many advantages, most especially speed of access to
the most recent content. However, as items age, it becomes less and less favorable to store them
as single files. For example, most file-based backup software struggles to perform incremental
backups of file structures with huge numbers of files.

There is an ideal balance in the middle: initially storing items as single files for access
performance, then later collecting them for improved storage occupancy and backup optimization,
and Enterprise Vault does just this. The collector service works within a Vault Store Partition to
collect, by policy, many DVS files into larger containers (CAB files), which dramatically reduces
the number of files in a Vault Store Partition. The backup software is now presented with fewer,
larger items rather than many smaller ones, which ultimately improves backup performance.

The collector has a configurable policy, allowing the administrator to both define the age at which
collection starts and also the maximum size of each collection file. In addition, larger files can be
excluded from the collection process but treated as collected if their size alone equates to a
collected file.
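The collection policy described above (minimum age, maximum container size, large-file exclusion) can be sketched as a planning function. Thresholds and names are illustrative, not the product's defaults:

```python
def collection_plan(files, min_age_days, max_collection_mb, large_file_mb):
    """Sketch of the collector policy: files older than a threshold are
    gathered into CAB-sized containers; files already collection-sized
    are excluded but treated as collections of their own.
    All names and thresholds are illustrative."""
    collections, current, current_mb = [], [], 0.0
    for name, age_days, size_mb in files:
        if age_days < min_age_days:
            continue                       # too young to collect yet
        if size_mb >= large_file_mb:
            collections.append([name])     # counts as collected by size alone
            continue
        if current and current_mb + size_mb > max_collection_mb:
            collections.append(current)    # container full: start a new one
            current, current_mb = [], 0.0
        current.append(name)
        current_mb += size_mb
    if current:
        collections.append(current)
    return collections

files = [("a.dvs", 40, 1.0), ("b.dvs", 5, 1.0), ("c.dvs", 90, 9.5), ("d.dvs", 60, 2.0)]
print(collection_plan(files, min_age_days=30, max_collection_mb=10, large_file_mb=9))
```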

This behavior is completely transparent to end users and applications layered on Enterprise Vault.
The mapping of items to containers is maintained in the normal SQL-based Vault Store directory,
which is updated to reflect the new storage structure.

Migration service — migration as a central activity

We have already discussed how an average item in the archive will likely be stored for longer
than the expected life span of the storage system. We would then expect that an item will, at
some point, have to be moved to another storage system. This migration of content, which is built
into the core of Enterprise Vault, can happen in two dimensions. First, new storage technology
can replace the existing store as a primary archive repository, and the partition mechanism
described earlier allows this to be done rapidly and in a way that is transparent to the end user.
Second, as described below, a different storage technology can be introduced as a secondary
archive store sitting behind the primary archive store.

Not only does being able to quickly and easily migrate content allow new storage systems to be
brought on stream as and when needed, it also allows for the continued lowering of the TCO of
the archive store. For example, initially items may reside on a fast NAS system, but as they age
items can be moved to a slower but far less expensive storage system (e.g., magnetic tape).

This migration of content between tiers of storage systems is part of overall Information Lifecycle
Management.

The migration service is configured at the Vault Store Partition level, and the standard service can
copy content from NTFS volume to NTFS volume. The target volume could, for example, be
magnetic tape or an optical store fronted by software that presents it as an NTFS volume.
Alternatively, special migrators can be plugged into the mechanism to support non-NTFS stores.
For example, there is a Symantec NetBackup™ migrator that allows the same tape infrastructure
that is being used by NetBackup to also be used by Enterprise Vault.

Note that this is a further advantage of collections, since it is collections that are moved to the
secondary store: they are a much more efficient way of handling "slow" storage, such as tape,
than relatively small individual files.

Again, this migration behavior is transparent to users and layered applications.

Figure 6. Enterprise Vault is able to migrate content, by policy, from a Vault Store Partition

Note: The migration service discussed above is a policy-based service that allows for multiple
tiers of storage. Professional Services offers a migration consultancy service that allows
organizations to move completely to a new storage system and cease using the initial one.

This is not the same as the collector/migrator services outlined above: the consultancy approach
is likely to be used when migrating from NTFS to a new storage technology (e.g., EMC Centera),
where everything must be moved off the old store rather than left in place alongside the new
store.

Summary of the interaction of Enterprise Vault with an NTFS file system

Since the largest opportunity for Enterprise Vault has been, and will continue to be, archiving from
environments that are compatible with Active Directory security, the predominant file system
tends to be NTFS. In addition, the adoption of NTFS as an open-standard interface to other non-
disk-based systems (e.g., tape) means that Enterprise Vault support for NTFS is the base
standard from which all other storage support is derived.

There are other, non-NTFS-based storage systems available for use with Enterprise Vault, but
these need to be carefully checked against the Enterprise Vault certification and compatibility
tables (available on the Symantec support website).

Key benefits to the storage infrastructure maintained by Enterprise Vault are:

• Storage tiering:
Enterprise Vault inherently moves less active data out of primary applications such as Microsoft
Exchange and, as described above, can optionally migrate older archived data to a secondary tier
of archive storage, whether it is disk, tape, or other media.

• Storage rationalization:
The minimum amount of content is actually stored on the storage system. Not only is every item
in the archive compressed and single-instanced, but this is done without sacrificing data
reliability.

• Backup optimization:
Since Vault Store Partitions can be closed and DVS files are read-only, the archive is very
efficient to back up and recover. Enterprise Vault solves many of the problems of primary
applications that store content in massive data files or on relatively expensive storage solutions.
Enterprise Vault stores items as individual files where appropriate and combines those into larger
collections, managed by a centralized policy.

• Data integrity:
As described earlier, the optional "safety copy" functionality ensures that no archived data is lost,
by not deleting the original item until a backup or replica has been created.

• Data resilience:
From the DVS file all the way through to the Vault Store, each of the storage areas is self-
contained, meaning that it can be recovered without the need for the rest of the archive or any of
the support databases. As described, HTML copies “future-proof” archived data. In addition, while
so far we have only spoken about NTFS, the API infrastructure built into the Open Storage Layer
means that we are able to support a wide range of storage devices today and seamlessly bring
on stream a new storage technology in the future.

• Data fidelity:
As mentioned, we have never sacrificed data fidelity in the design of Enterprise Vault, ensuring
that no data or metadata—including per-user items such as the “read-receipt flag”—is lost during
the archiving process.

Storage hardware and systems

Figure 7 – Vault Store partition options

Enterprise Vault supports a large number of different storage devices and systems, covering all
relevant media, sizes, and vendors.

There are three different categories of systems:

- File System based storage (First Tier)
- Systems integrating with the Enterprise Vault Migrator (Second Tier)
- CAS and other API-based Storage (No Tiering)

This section focuses on the specific implementations and characteristics of the various storage
systems.

Storing Enterprise Vault archives on Network Attached Storage

We will now examine the interaction of Enterprise Vault with non-Windows NTFS storage. Many
vendors have written front ends that allow open access to other, non-disk-based storage
hardware (e.g., the Pegasus Disk Technologies NTFS front end to UDO storage). Later in the
paper we will examine some of the more specific integrations available between Enterprise Vault
and non-NTFS storage systems.

NetApp storage systems

NetApp has a wide range of solutions for customers, from high-end storage area networks to
lower-cost network attached storage. One of the key benefits that NetApp offers is that the same
storage infrastructure can be repurposed between these two offerings at any time. In general,
NetApp filers present themselves as NTFS volumes via the CIFS protocol, so they will work
normally with Enterprise Vault, the exception being SnapLock, where some special handling is
required.

NetApp SnapLock
There are many additions to the OnTap operating system that improve resiliency or help with
disaster recovery. One such addition is SnapLock, which enables the creation of WORM partitions
in a disk-based NetApp environment. Just as with standard NetApp volumes, these are fully
supported with Enterprise Vault, but there are a few considerations that mean special handling is
needed when implementing SnapLock. These storage volumes are an explicit administrator
choice as a Vault Store target:

1. If you are using a SnapLock volume, you are unable to take advantage of Single Instance
Storage. SnapLock turns a filer into a true WORM device, meaning that data can be neither
deleted nor changed.

2. Retention management remains the same from an Enterprise Vault administration point of
view, but it is mapped to the SnapLock WORM mechanism, where the retention period is written
as an attribute of the file.
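As an illustration of point 2, committing a file to WORM on a SnapLock volume is commonly described as setting the file's last-access time to the desired retention expiry date and then making the file read-only. Enterprise Vault performs this internally; the sketch below only demonstrates the mechanism and assumes an ordinary writable volume (where it simply sets metadata):

```python
import os
import stat
import time

def snaplock_commit(path: str, retain_until_epoch: float) -> None:
    """Illustrative SnapLock-style WORM commit: the file's atime is set
    to the retention expiry date, then the file is made read-only. On a
    real SnapLock volume this transition locks the file; on a normal
    volume it only changes metadata."""
    os.utime(path, (retain_until_epoch, os.path.getmtime(path)))  # atime = expiry
    os.chmod(path, stat.S_IREAD)                                  # read-only: commit

with open("demo.dvs", "wb") as f:
    f.write(b"archived item")
snaplock_commit("demo.dvs", time.time() + 7 * 365 * 24 * 3600)    # retain ~7 years
```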

NetApp summary

Because of the open nature of the OnTap operating system, NetApp file systems operate in much
the same way as NTFS volumes. The only difference arises if SnapLock has been added to any
part of the system. In addition (and the subject for another paper), Enterprise Vault 2007 and
NetApp OnTap now work together to provide placeholder support, so organizations that use
Enterprise Vault File System Archiving to do policy-driven archiving from NetApp filers can leave
transparent placeholders on the target filer.

Hitachi Content Archive Platform (HCAP)

The Hitachi Content Archive Platform is a storage system with a single file system that natively
supports the NFS, CIFS, HTTP, and WebDAV communication protocols and can support
hundreds of millions of objects on large amounts of capacity.

Enterprise Vault has been integrated with HCAP to use the CIFS protocol for storing information
on the HCAP system while utilizing the device’s WORM and retention management functionality.

HCAP provides a network-clustered architecture that allows customers to keep adding nodes,
processors, cache memory, host ports, and capacity as needed. This effectively allows larger
partitions than are normally feasible for CIFS file systems. A customer can start out with a four-
node archive and connect additional nodes, two or more at a time, with no theoretical limit.

Systems integrating with the Enterprise Vault Migrator

Enterprise Vault is designed to provide tiered storage across technologies, vendors, and devices.
While some devices are disk-based and provide fast, random access to the archived data, there
is another, more cost-effective group of systems that lack the performance of the primary store
but provide a viable platform for storing very old information that is only infrequently accessed.

Figure 8 – Vault Store partition migration options

These systems should be used for data that:

– is outside the scope of Offline Vault (an age limit for OV is set)
– is unlikely to be exported to PSTs
– has a low probability of being relevant to legal discovery and investigation cases

Also, it should be noted that re-indexing from slower storage devices can be very time-consuming,
so it is recommended to back up indices regularly if secondary storage devices are used.

Enterprise Vault 2007 supports the following secondary storage devices:

- Enterprise Vault File-System Migrator
- Fujitsu Eternus
- Veritas NetBackup
- IBM Tivoli Storage Manager / DR 550

Pegasus Disk Technologies – InveStore

Pegasus InveStore presents optical jukeboxes, such as DVD or UDO, to Enterprise Vault as a
fully compatible NTFS file system, so no proprietary integration is needed and the standard
Enterprise Vault Migrator is used. This has compelling advantages over a jukebox integration at
the driver level: the file system presented by Pegasus can also be used by other applications,
allowing the jukebox to be shared and the WORM storage to be unified without any driver or
hardware issues.

Fujitsu Eternus

Fujitsu Eternus archive storage support for NTFS partitions in EV 2007 enables EV Collection
files and large Saveset files to be migrated to a Fujitsu Eternus storage device. The Eternus
migrator connects to the Eternus device via Fujitsu's Content Archive Manager Client software.

Because of the potentially huge overhead of migrating and retrieving millions of very small files
(the underlying media could be tape!), only Collection files and large Saveset files are migrated,
making the Fujitsu Eternus an attractive second-tier storage device that complements a front-end
file-system solution for younger items.

To provide the necessary interfaces to Enterprise Vault, the Fujitsu Content Archive Manager
Server 1.4 or later needs to be installed and configured on a Fujitsu Eternus storage device and
the Fujitsu Content Archive Manager Client 1.4 or later needs to be installed on the Enterprise
Vault Server.

Figure 9 – Fujitsu Eternus migration options

After selecting the Fujitsu Eternus as an archiving target on the Migrations tab, additional
configuration properties need to be specified on the Advanced tab.

Refer to the Enterprise Vault documentation for further instructions.

Veritas NetBackup

Enterprise Vault 2007 integrates with the NetBackup media manager. This means that customers
who have already invested in NetBackup can reuse the same hardware management
infrastructure for storing Enterprise Vault archives alongside their backups. Usually the hardware
would be some type of offline storage, typically tape, but NetBackup can utilize many different
media types, including disk.

Figure 10 – Veritas NetBackup migration options

Typically this would be done in the context of secondary archive migration, where the NetBackup
infrastructure would be used for the second tier of archive storage. In this case, Enterprise Vault
will only migrate older, archived files to NetBackup as collections. It would be possible to use the
NetBackup infrastructure as the primary archive store, but there would still need to be a primary
disk store acting as a temporary cache until the collection process had completed.

Integration with Tivoli Storage Manager / IBM DR 550

Enterprise Vault is able to talk to either a Tivoli Storage Manager (TSM) or IBM DR 550 system
using the TSM Client API.

To enable this integration, the TSM Client needs to be installed on the Enterprise Vault server.
Another requirement is that the "Data Retention Manager" option, which controls the retention
and expiration of the content managed by the TSM environment, is licensed and available on the
TSM server (it is included in the DR550). The integration itself provides the same feature set as
the NetBackup solution.

Content Addressed Storage and other API-based storage

Enterprise Vault supports more than just NTFS-based file systems. As the archiving market has
matured, storage vendors have modified their offerings to meet this market or have created
completely new storage systems for the long-term storage of content, with reduced costs and
management overheads.

This section looks at the differences or additions we have made to support important
non-NTFS-based storage platforms.

EMC Centera

EMC Centera is an example of a storage system ideally suited to archiving. It is classified as a
content addressed storage (CAS) system. CAS systems create unique identifiers for the items
they store. These unique IDs are, in effect, very similar to the SIS "hash codes" that Enterprise
Vault creates.

Centera can optionally act as a WORM (Write Once Read Many) store, which makes it very
suitable for regulatory retention. In this mode, Centera has a concept of retention classes. Items
cannot be deleted from Centera until the time period defined by their retention class has expired;
any attempt to delete items before then will result in an error, and the Enterprise Vault expiration
service will delete expired items after this period. This has an advantage over classic WORM
devices, such as optical disks, in that individual items can be deleted once their retention periods
have expired. In addition, since it is magnetic disk–based, its fast recall times make extensive
searching and discovery feasible.

Enterprise Vault has to take four main considerations into account when interacting with Centera:

• Centera may act as a WORM storage device, and all content is read-only.

• Centera has Single Instance Storage built into its core.

• Enterprise Vault retention management capabilities have to be integrated with the Centera
capabilities.

• Centera has a replication mechanism that is used for data integrity and can reduce the need for
a separate backup.

When you create a new Vault Store Partition, you can define it as a Centera partition, and
Enterprise Vault will act accordingly, taking into account the points above. It is also important to
note that only archived items (Enterprise Vault Savesets) are committed to Centera, not indexing
information; Centera is not suitable for storing the index files because they contain dynamic,
constantly changing data.

Read-only storage and retention management in Centera

Centera is unusual in that it is a read-only, disk-based storage system. Since Enterprise Vault
storage is already read-only, the integration is very straightforward.

Retention management undertaken by Enterprise Vault is extended to Centera, which has a
retention policy mechanism very similar to that of Enterprise Vault. Enterprise Vault retention
categories are mapped to Centera "retention classes," and the Centera retention class name is
stored with the item. The retention periods specified in Centera retention classes can
subsequently be extended in much the same way as with Enterprise Vault retention categories.

Single Instance Storage and collections with Centera

Retention and read-only characteristics of Centera are really extensions to features that already
exist in Enterprise Vault. While Single Instance Storage already exists in the vault, it has to be
treated differently with Centera. In addition, the way items are “collected” is also changed when
dealing with Centera.

First, we need to discuss how applications like Enterprise Vault pass content to Centera, and why
this differs from NTFS, for example. When an item is passed to Centera, a hash is generated that
is unique to that piece of content, and two files are stored on Centera: a Clip file, which primarily
contains metadata about the item, and a "blob" file that contains the main data.

The unique ID (or Clip ID) is the reference to the stored item that Enterprise Vault retains in its
Vault Store directory. When a file is written to Centera, if the generated hash already exists
(meaning that the item has already been archived), the new Clip file for that item will simply
contain a pointer to the existing "blob." If this is the first time that this piece of content has been
archived to Centera, a new blob is created and the Clip file points to it.
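The Clip/blob behaviour can be modelled with a toy content-addressed store: identical content hashes to the same blob, while every write still gets its own clip of per-item metadata. This is an illustration of the concept only, not the Centera SDK:

```python
import hashlib

class ContentStore:
    """Toy content-addressed store mirroring the Clip/blob behaviour
    described above: identical content yields one shared blob; each
    write returns a clip of per-item metadata pointing at that blob."""
    def __init__(self):
        self.blobs = {}   # content hash -> content
        self.clips = {}   # clip id -> per-item metadata

    def write(self, content: bytes, owner: str) -> str:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in self.blobs:       # first copy: create the blob
            self.blobs[digest] = content
        clip_id = f"clip-{len(self.clips) + 1}"
        self.clips[clip_id] = {"owner": owner, "blob": digest}
        return clip_id

store = ContentStore()
a = store.write(b"quarterly report", "alice")
b = store.write(b"quarterly report", "bob")   # second copy shares the blob
print(len(store.blobs))  # 1
```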

The details of the actual items that are stored in Centera (header, body, and attachments) are
then referenced in the original Clip file. Enterprise Vault will read the Clip content to see how to
recover the original content. Along with all the information stored in the Clip are checksums
generated by Enterprise Vault. These checksums help Enterprise Vault guarantee that the
content committed to Centera is valid when recalled by cross-checking with the Clip ID held in the
vault’s directory.

After much experience with Centera, Enterprise Vault uses a variety of methods for storing items
to achieve the best balance of performance and store occupancy:

1. Small items are stored in the Clip file directly. Items that are smaller than 15 KB can be stored
in the Clip file without any associated blob file. For small files such as this, the disadvantage of
losing the potential for SIS is offset by the performance advantage of having just one file to read
(a file in a database) and the space saving achieved by avoiding “rounding up” due to the store
allocation unit size. In fact, doing SIS for such small files would cost more space than not doing it.

2. Larger items can be stored individually or in collections (see 3 below). Where these items are
messages with attachments that exceed a defined threshold size, those attachments are stored
separately. This leads to a different SIS model than for NTFS messages, as attachments will be
shared between different messages (assuming the attachment is identical) as well as between
multiple copies of the same message. In addition, if the same file is archived from a file system or
MS SharePoint, then it will again achieve SIS with the attachment version. As with NTFS SIS,
user-specific attributes are all retained separately in the Clip files and only the main blob will be
shared.

3. Similarly, collections are done differently than in NTFS. In this case, messages are gathered
into collections, but again, attachments over a certain size are stored separately. Collection size
is limited by the number of messages contained in it or the total size, whichever limit is arrived at
first. Collections are also organized so that only items with the same retention policy are stored in
the same collection. This optimizes store occupancy and minimizes the number of discrete files
stored, with a consequent performance advantage for replication and recovery.

The compromise is that the Clips for collections need to be read and traversed in order to retrieve
messages, but the items themselves are retrieved from the collection by a partial read, so the
effect is minimal. To the user this is transparent behavior, and collections will not be deleted until
all the contained items have expired; but in practice, since all items have the same retention
period and are gathered in a short time frame, this is not a problem.

4. In a similar way to NTFS-based partitions being configured to allow deletion of the original
item only once a complete and successful backup has taken place, Enterprise Vault can be
configured to wait for Centera replication to take place before deletion is done. In this case, the "watcher"
process only initiates the delete (or replacement by shortcut) when it detects that the replica

Page 27
exists on the secondary Centera store. An important point to note is that on a Centera the domain
of SIS is the whole Centera system, regardless of how Vault Stores are mapped to Centera.
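The "watcher" behavior can be sketched as a simple polling loop. This is not Enterprise Vault code; `replica_exists` and `delete_original` are hypothetical callbacks standing in for the replica check against the secondary Centera store and the delete (or shortcut-replacement) step.

```python
import time

def watch_and_delete(pending, replica_exists, delete_original,
                     poll_interval=60.0):
    """Delete each original item only after its replica is confirmed
    on the secondary store; items not yet replicated are retried."""
    while pending:
        still_waiting = []
        for item_id in pending:
            if replica_exists(item_id):     # replica confirmed -> safe to delete
                delete_original(item_id)
            else:                           # not replicated yet -> retry later
                still_waiting.append(item_id)
        pending = still_waiting
        if pending:
            time.sleep(poll_interval)
```

The key safety property is that `delete_original` is never called before `replica_exists` returns true for that item, mirroring the "safety copy" guarantee described in the text.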

Figure 11 highlights how Enterprise Vault determines how items are stored on Centera when
writing in collection or “non-collection” mode.

Figure 11 – Flow diagram for storing items in Centera
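The decision flow can be approximated in code as follows. This is a hypothetical reconstruction from the surrounding text, not the actual diagram or product logic: oversized attachments always become separate objects, and the message itself goes either into a retention-specific collection or into its own individual clip.

```python
def choose_storage(item, collection_mode, attachment_threshold=100 * 1024):
    """Return a list of (destination, payload) decisions for one item.
    Threshold and destination names are illustrative assumptions."""
    decisions = []
    # Large attachments are always stored separately so they can be
    # single-instanced across messages and sources.
    for att in item.get("attachments", []):
        if att["size"] > attachment_threshold:
            decisions.append(("separate_object", att["name"]))
    if collection_mode:
        # Collection mode: gather the message into the collection that
        # matches its retention policy.
        decisions.append(("collection:" + item["retention"], item["id"]))
    else:
        # Non-collection mode: the message becomes its own clip.
        decisions.append(("individual_clip", item["id"]))
    return decisions
```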

Centera summary

All of the features of Centera are already features of Enterprise Vault, but together the
hardware and software capabilities make a powerful combination for customers. The major area of
development for Enterprise Vault has been the Centera collections service, which takes
advantage of the hardware SIS offered by a CAS system, but does so in a way that maximizes
the throughput of the Centera.

Other supported storage systems

Enterprise Vault works with a wide variety of storage systems, and the list of supported platforms
changes all the time. Please consult the full Certification and Compatibility Matrix found on the
Symantec support website.

Summary of the Open Storage Layer of Enterprise Vault 2007

This paper has shown that there are unique considerations when deciding on your strategy for an
archiving application. To the business, there are many benefits in ensuring that all content is
stored (and expired) safely and reliably and that you have rapid access to it when needed. At the
core of all of these activities is the actual storage that is being used and how this effectively
enforces the core requirements of the archiving application.

We have seen how the Open Storage Layer in Enterprise Vault helps:

• Storage tiering: Enterprise Vault inherently moves less active data out of primary applications
such as Microsoft Exchange and, as described earlier, can optionally migrate older archived data
to a secondary tier of archive storage, whether it is disk, tape, or other media.

• Storage rationalization: The minimum amount of content is actually stored on the storage
system. Every item in the archive is compressed and single-instanced, and this is done without
compromising data reliability.

• Backup optimization: Since Vault Partitions can be closed and DVS files are read-only, the
archive is very efficient to back up and recover. Enterprise Vault solves many of the problems of
primary applications that store content in massive data files or on relatively expensive storage
solutions. Enterprise Vault stores items as individual files where appropriate and combines those
into larger collections, managed by a centralized policy.

• Data integrity: As described earlier, the optional “safety copy” functionality ensures that no
archived data is lost, by deleting the original item only once a backup or replica has been created.

• Data resilience: From the DVS file all the way through to the Vault Store, each of the storage
areas is self-contained, meaning that it can be recovered without the need for the rest of the
archive or any of the support databases. As described, HTML copies “future-proof” archived data.

• Data fidelity: As mentioned, we have never sacrificed data fidelity in the design of Enterprise
Vault, ensuring that no data or metadata—including per-user items such as the “read-receipt
flag”—is lost during the archiving process.
