The architecture consists of a Database Management System (DBMS) and files in managed file directories, called the Reservoir Input Output System (RIOS). All simulation input data and results are maintained by a Data Management System (DMS). The reservoir simulator reads input files written from the DBMS to RIOS and writes results to files in RIOS. The DBMS, RIOS and integrated management tools (DMS) make up the data management environment.

The environment has been in use inside ExxonMobil since late 2000 and now supports close to 500 users (85% of reservoir engineers). There are over 30 individual databases containing 2 TB of online data and about 6 TB of online RIOS data. The environment itself introduces some additional work. Support staff are required for maintenance of databases, RIOS areas and problem resolution. Direct user manipulation of data is not permitted, and additional tools are required to access and interpret data.

The environment provides many benefits. While it ensures data integrity, security and consistency, it also automatically updates defaults, limits, associations, types, etc. This allows running of older simulations and generation of aggregate statistics and usage audit trails.

During simulation, timestepping information, convergence parameters and well performance data can be logged and analyzed. Results, such as pressures and rates from wells and surface facilities and pressures and saturations from the simulation grid, can be monitored and recorded. The state of the simulator can be recorded at specified intervals to enable restart of a run at a later time.

This results in an abundance of data to analyze, visualize, summarize, report and archive. Over the years, many authors have tried to address one aspect or another of this data management problem, and many commercial and proprietary simulators have made allowances to simplify users' work in this area [1-3]. However, in general, data management has not been a widely investigated aspect of reservoir simulation.

Data management in reservoir simulation enables workflows and collaboration, ensures data integrity, security and consistency, and expedites access to results. In today's computing environment, data management is an enabler to meet the growing need for reservoir simulation and to make simulation available to a wider audience of professionals, including many kinds of engineers and geoscientists.
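The interval-based state recording and restart described above can be sketched as follows. All names here (SimulatorState, record_interval, the JSON restart format) are illustrative assumptions for the sketch, not the actual EMpower implementation.

```python
import json
from pathlib import Path

class SimulatorState:
    """Toy stand-in for the simulator's restartable state."""
    def __init__(self, time=0.0, pressures=None):
        self.time = time
        self.pressures = pressures or {}

    def to_dict(self):
        return {"time": self.time, "pressures": self.pressures}

    @classmethod
    def from_dict(cls, d):
        return cls(d["time"], d["pressures"])

def run(state, end_time, dt, record_interval, restart_dir):
    """Advance the simulation, writing a restart file every record_interval."""
    restart_dir = Path(restart_dir)
    restart_dir.mkdir(exist_ok=True)
    next_record = state.time + record_interval
    while state.time < end_time:
        state.time += dt
        # ... the timestep calculation would update pressures, rates, etc. ...
        if state.time >= next_record:
            path = restart_dir / f"restart_{state.time:010.2f}.json"
            path.write_text(json.dumps(state.to_dict()))
            next_record += record_interval
    return state

def restart_from(path):
    """Resume a run from a previously recorded state."""
    return SimulatorState.from_dict(json.loads(Path(path).read_text()))
```

The same pattern — record at fixed intervals, resume from the last record — is what allows a long run to be restarted at a later time without repeating the whole simulation.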
2 SPE 106075
With its EMpower™ reservoir simulator [4-5], ExxonMobil spent considerable time and effort in developing, deploying, supporting and maintaining a data management environment surrounding the reservoir simulator. These experiences - and not the computational aspects of the reservoir simulator - are the subject of this paper.

™ EMpower is a trademark owned by ExxonMobil Upstream Research Company.

Elements of the Data Management Environment
The data management environment encompasses all simulation input, results and restart data and a collection of software programs, tools and procedures for their management (DMS).

Simulation Data
The top-down view of the simulation data starts with a hierarchy of projects, models and cases (Figure 1). A project usually encompasses a particular reservoir study. Models are used to distinguish between different simulation approaches, which may require fundamentally different discretizations or fluid representations such as black-oil vs. compositional simulation, fractured vs. non-fractured, etc. Cases within a given model are generally expected to represent minor changes in the input data or facility network representation, with most of the data being shared among them. Currently, approximately 1,000 projects with 5,000 models and 20,000 cases are managed worldwide.

Figure 1: Subset of data model showing project/model/case hierarchy and their relationships. Projects can contain one or more models, each of which can contain one or more cases.

All data needed for and produced by a simulation fall within one of three broad categories: Arrays, Granules and Facility Network Data. Simulation cell and interface data such as pressure, mole fractions, fluxes, etc. fall into the first category. Granules are collections of parameters that are intended to be small in size while containing a variety of different data types. For instance, black-oil fluid parameters for a given domain comprise such a collection; solver parameters and timestep-control parameters are further examples. A Facility Network is a collection of physical facilities represented as nodes and connections. Example facilities are wells, platforms, separators, terminals and the pipelines that connect them. All facilities have attributes and constraints that describe them and their behavior. For example, all facilities have a name and active state, and all wells have a rate or pressure limit.

A key feature distinguishing ExxonMobil's current reservoir simulation system from its predecessors is its use of an extended surface facility network model that is fully integrated with the reservoir. This key feature contributes greatly to the complexity of the data model. All facilities in the network are directly accessible and can be manipulated by the reservoir engineer for maximum flexibility. In addition, users can add their own attributes and procedures to a given facility type. This capability is extremely important. Assume, for instance, that the reservoir engineer wants to model submersible pumps in a way that the current simulator version does not support. The needed variables and functionality can be added to the well facility type by the engineer and made part of the timestep calculation. This flexibility is very powerful and allows rapid prototyping of new functionality.

Well Management Logic
Facilities are the most dynamic part of reservoir simulation. In EMpower, they are managed at runtime with user-defined logic called Well Management Logic. This is part of the input data, but it is such a distinctive concept that it deserves a more detailed description. The timeline of a reservoir simulation is usually divided into two segments: the first is history matching, while the second is prediction. During history matching, the goal is to design a model that will match historical rates and pressures. During prediction, reservoir engineers want to experiment with various scenarios in order to approximate a good production profile for the field. For instance, the engineer may want to test whether it is sufficient to reduce production from high-GOR wells and increase production from low-GOR wells in order to maintain a given oil-production plateau while keeping the field's gas production in check, or whether it is necessary to work over some wells. While it is theoretically possible to hard-code scenarios like this, it is impossible to pre-conceive every possible strategy a reservoir engineer might want to try. Allowing the engineer to define such strategies using a programming environment greatly enhances the flexibility and utility of a reservoir simulator, while complicating the data management environment.

Data Management System
In EMpower, the DMS is the central work environment for the simulation engineer. It is the single point of entry for preparing, running and analyzing simulations, and therefore it has several distinguishing characteristics and requirements.

First, it is data driven; all dialogs work from data definitions. Some can display the three data types (arrays, granules and facility network data) without knowledge of actual data content. Second, user access is controlled by login, and data access is controlled by user, group and world permissions. It is possible to completely hide projects, models and cases from other users, and it is also possible to set up a project, model or case for use by a specific group of users. Third, the DMS ensures backward compatibility, interoperability and data integrity with tools that validate and upgrade data and check the integrity of arrays, granules and facility data. Finally, a set of administrative tools is supplied to test components of the data management environment, to support different access models (administrator, manager, user, etc.) and to provide functions like managing users, migration of data from one version to another and reporting of project, model and case statistics.

Simulation Workflow and Data Management
One of the great advantages of using a DMS is that it allows the definition of dependencies between input data, results data and simulation times. For instance, if a user changes input data at time t0, the system is able to determine what data becomes invalid at times t >= t0. Or assume that the user changes from black-oil to compositional simulation. The DMS is able to indicate what additional input data is needed and can provide appropriate defaults. Data validation options such as checking fluid property tables or timestep controls can prevent the user from wasting time by supplying ill-conditioned parameters to the simulator.

Figure 2: Subset of data model showing relationships of a variable to a case, domain and time. The relationships are managed with a variable use class, and each case has a hashed list of variables which is managed by a case-to-variable use class.

Data Sharing
The project/model/case hierarchy implies that the majority of the simulation input data is shared among cases within the same model. For instance, the user may try different permeability values during a history match or test different solver parameters to achieve better performance, and there is no need to create a complete new set of data that duplicates the simulation grid, input arrays, granules and facility network. However, as simple as the concept sounds, the data-sharing code within the DMS can be quite complicated, since almost all input data can be time dependent. Mathematically, data sharing is established via a unique relationship (data-item to case, time and domain for variables; data-item to case, time and facility for facility attributes and constraints), where a domain is a user-defined region inside the simulation model (Figure 2). An identical copy of a case does not duplicate any data, but triggers the creation of a second set of relationships. Selected data can then be unshared to accommodate differences between cases.

Variable Attribute Repository
Since the development of a reservoir simulator is an ongoing process, with new features being added on a regular basis, care must be taken to avoid frequent changes in the data layout, which are costly. Therefore, early in the development process, it was decided to create a meta-layer between the data model and the data layout. This meta-layer is called the Variable Attribute Repository (VAR) and describes data items of all three categories mentioned earlier: arrays, granules and facility network data. Assume a new array needs to be added to the system. From a data layout perspective this is just another generic array that can be linked to a case, time and domain. The VAR, however (whose layout is fixed), will have an additional entry detailing the purpose of the array, its description, default value, etc. Facility data description is even more versatile: not only is it possible to define any kind of attribute for a facility type, but new facility types can also be defined from a base set of facility types. For example, a separator node is similar to network nodes, but with some unique attributes of its own, such as temperature.

Data Mining
Potentially the greatest benefit of managing reservoir simulation data, though, is the capability for data mining. The amount of data generated for and by simulation is significant. It is not easy to analyze results just for one study, let alone across many. However, with well-defined data management, automated tools can scan and analyze data areas to generate overall statistics and trends; this capability is known as data mining. Data mining enables a quick overview of just what kind of models are being worked on, as well as providing insight into the type of problems users run into, etc. This improves quality control and opens the door to a self-learning system.

Architecture of the Data Management Environment
The simulation environment has been implemented as a heterogeneous, distributed, three-tier, client-server architecture. The DMS is the client software at the end-user workstation. All reservoir simulation data are stored in the second tier, consisting of a database and file directories in a mass storage area called RIOS. The simulator, running on different compute servers, is the third component and represents the server side. Figure 3 summarizes this architecture.
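The case-level data sharing and unsharing described under Data Sharing, together with the "input changed at t0 invalidates results at t >= t0" rule from the workflow section, can be sketched as copy-on-write. The class names and dict-based storage below are illustrative assumptions, not the EMpower data model.

```python
class Model:
    """Owns the data items shared by all cases in the model."""
    def __init__(self):
        self.shared = {}            # item name -> data

class Case:
    """References the model's shared items; unsharing overrides locally."""
    def __init__(self, model):
        self.model = model
        self.local = {}             # unshared (case-private) items
        self.result_times = []      # times with recorded results

    def get(self, name):
        # A local (unshared) value shadows the shared one.
        return self.local.get(name, self.model.shared.get(name))

    def unshare(self, name, value):
        # Copy-on-write: only this case sees the new value.
        self.local[name] = value

    def change_input_at(self, t0):
        # Results at times >= t0 are no longer consistent with the input.
        self.result_times = [t for t in self.result_times if t < t0]
```

A copied case starts with an empty `local` dict, so duplication costs nothing until something is actually unshared — mirroring the relationship-only copy the text describes.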
Figure 3: Diagram of the three-tier simulation environment architecture. The DMS is the client piece of the architecture and the simulator is the server piece. The database and RIOS comprise the second tier. Middleware (not detailed in this paper) manages communication between tiers.

This architecture explicitly decouples the simulator from the DMS and database. The simulator has completely different requirements that guide what platform it should run on. It has access to RIOS areas, reads its input from files in RIOS and writes all its output to files in RIOS.

Access control, security, data integrity and scalability features discussed above are inherently addressed by commercial databases. Databases are also ideal for managing and relating large sets of data. Therefore, for the second tier, a database was selected for managing primarily input data. When running a case, the DMS writes input data from the database to RIOS and launches the simulator. Results and restart data and runtime log files written by the simulator to RIOS are managed and used by the DMS as well.

Middleware
The three-tier architecture will not work without a middleware component. The middleware keeps track of running simulations, figures out where files are and enables communication between the DMS and simulator. It consists of one master network service per site, which manages services running on the different compute servers at that site. Each compute server service is aware of RIOS directory names and their mapping to simulation jobs, and passes simple commands and their return codes between the DMS and simulator. The details of the middleware are not discussed in this paper.

Database
There are several choices for the type of database, including relational, which is highly pervasive in many industries. However, for the needs of this project, which include compatibility with the object-oriented paradigm of the DMS, an object database was selected.

Object Database
The object database provides many desired features, including transaction-oriented, multi-user access with object locking and rollback functionality. It manages the schema and object relationships. It enables definition of the granularity of transactions based on user actions. Management of object relationships is probably its biggest strength; this is difficult and involved to implement with a relational or object-relational database. There are more than 100 unique object classes and approximately 150 distinct object relationships, some of which would have tens of millions of rows in a simple relational table implementation.

The database schema is a logical decomposition of the user's view into the data model. It stores parameters and works in parallel with locking. For the development team, a guiding principle was to minimize changes to the database schema, since each change requires migration of existing data, which is cumbersome and time consuming. Therefore, the database schema is kept relatively simple. The meta-schema, or VAR concept, is built on top of the database schema and enables definition of all granules, arrays, facility types and attributes without any database schema modification.

Relationship Management
Executing lookups of array, granule or facility data is a key performance issue; a quick response time is critical. The number of domains, arrays and granules in a case is on the order of hundreds of objects. Thousands of objects result when many cases in the same model share the same arrays and granules. Depending on the number of facilities and time-variant changes, the number of facility attribute and constraint objects managed by a case can reach into the tens of thousands. When multiple cases share the same facility network, the object count can reach hundreds of thousands and more. To simplify searches, facility data is looked up from a facility instead of a case, but this can still mean examining tens of thousands of relationships per facility. In a highly interactive environment, where hundreds to thousands of attributes and constraints may be looked up during a user action, slowness of this capability can be a major bottleneck. To maximize performance, a hashing technique based on cryptographic hashing keys was developed. With this technique, object-use lookups are reduced to an average of one or two searches into hundreds of thousands of elements.

Although the database schema is relatively simple, the quantity of relationships and the number of objects in each make the database environment quite complex. Test programs were developed to exercise and validate functionality at the unit level, and maintenance programs were written that correct inconsistencies and problems with the database. Although the VAR concept has minimized the need for schema changes, programs had to be written to manage upgrades of data when schema changes occur.
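The hashed lookup described under Relationship Management can be sketched as follows: a key derived from the lookup tuple leads to the relationship object in an average of one or two probes instead of a scan over tens of thousands of relationships. The key scheme and class names are assumptions for illustration.

```python
import hashlib

def use_key(case_id, time, facility, attribute):
    """Derive a lookup key from the tuple that identifies an object use."""
    raw = f"{case_id}|{time}|{facility}|{attribute}".encode()
    return hashlib.sha1(raw).hexdigest()

class RelationshipIndex:
    """Hash-indexed store of (case, time, facility, attribute) -> object."""
    def __init__(self):
        self._index = {}   # hash key -> data object

    def add(self, case_id, time, facility, attribute, obj):
        self._index[use_key(case_id, time, facility, attribute)] = obj

    def lookup(self, case_id, time, facility, attribute):
        # Average one probe, even with hundreds of thousands of entries.
        return self._index.get(use_key(case_id, time, facility, attribute))
```

The same idea applies to variable lookups keyed by (case, time, domain); the point is that lookup cost stays flat as the number of shared relationships grows.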
RIOS
The RIOS concept was developed to enable sharing of data
between the DMS and the simulator, as the simulator was
designed to be independent of the database. Every case has a
RIOS directory with a unique name. The database case
objects know about their RIOS directories. Every RIOS area
is associated with a specific business unit. Access can be
controlled in a fashion similar to database permissions, with system-level owner, group and world permissions. It is possible to completely hide a certain RIOS area by granting ownership and access only to a specific group. RIOS areas can be network accessible or local. When local, unless public access is granted explicitly by the user, the RIOS area is only accessible by local simulation jobs.
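The owner/group/world access model for RIOS areas can be sketched as a POSIX-style permission check. The dict-based encoding of an area and its permissions below is an assumption for illustration, not the actual implementation.

```python
def can_access(area, user, user_groups):
    """Resolve access the way POSIX-style permissions do: the most
    specific matching class (owner, then group, then world) decides."""
    if user == area["owner"]:
        return area["perms"]["owner"]
    if area["group"] in user_groups:
        return area["perms"]["group"]
    return area["perms"]["world"]

# A RIOS area hidden from everyone except one group (names hypothetical):
hidden_area = {
    "owner": "resengr1",
    "group": "north_sea_team",
    "perms": {"owner": True, "group": True, "world": False},
}
```

Setting the world permission to False while granting the group permission is what hides an area from everyone outside a specific team.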
Log Files
The log files record timestepping information, convergence parameters, well performance data and information on problem nodes. The presentation of log file data is extremely sophisticated, with a web-style interface that displays highly detailed tables, charts and graphs. The power of this interface is further enhanced by its ability to present user messages written from Well Management Logic in these formats as well. A screenshot of this tool is presented in Figure 4.
When a case is run, the DMS first deletes any existing RIOS files. When a case is restarted, all RIOS files are truncated to the restart time. The DMS accesses result arrays, granules and facility data directly from RIOS files.
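The run and restart file handling just described can be sketched as follows. The (time, values) record layout — and the choice to keep the record at exactly the restart time — are assumptions for the sketch, not the actual RIOS format.

```python
import shutil
from pathlib import Path

def prepare_fresh_run(rios_dir):
    """Delete any existing RIOS files before a case is run from scratch."""
    rios_dir = Path(rios_dir)
    if rios_dir.exists():
        shutil.rmtree(rios_dir)
    rios_dir.mkdir(parents=True)

def truncate_to_restart(records, restart_time):
    """Drop result records after the restart time; the simulator will
    recompute and append from there."""
    return [(t, v) for (t, v) in records if t <= restart_time]
```

Truncating rather than deleting on restart is what lets already-computed results up to the restart time remain usable while the run continues.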
Bigger Grids and Facility Networks
Simulation engineers would like to be able to build reservoir models with several million cells and manage several thousand wells with more flexibility. Currently, hardware and software limit the DMS to models of a few million unstructured grid cells and a few thousand wells for comfortable operation. As grids get larger and the number of facilities in the facility networks reaches the many thousands, both the database and the DMS will be taxed even further. To be able to handle this kind of load, they will continue to be the subject of continuous improvement. The computing environment is also changing to support this load: grid computing, high-end compute servers and 64-bit desktops for clients.

Movement of input data, at least large granules and arrays related to grid definition and properties, to the RIOS area is also being considered. This would eliminate duplicate storage of the same data in different forms and two separate sets of input and output routines (one to the database and one to RIOS), and it would allow greater flexibility for external programs to supply and/or modify simulator input data. A database would still be used to manage data relationships, but would not be burdened with having to manage huge arrays and millions of objects, which have been major bottlenecks to database performance.

Automated History Matching and Optimization
With the requirement to be able to run many slight variations of a base case, automated history matching and optimization add another dimension to the problem. Optimization and history matching are two areas of increasing popularity and research interest. Both need efficient management of tens to hundreds of cases that have little variation. Users have to be able to design experiments easily, many time dependencies must be managed behind the scenes, and results must be presented in new ways. Data sharing was a good start ten years ago, but now scenario management becomes very important. This requires substantial work on the architecture of the data management environment and will involve extension of the data sharing concept to the RIOS files.

Data Mining
More simulation runs produce more data. Data mining will come into greater use to extract useful information from all this data. With standardized files in RIOS directories and consistent databases, finding the right interpretation will be the key. This is a recent area of development for ExxonMobil. Tools have been developed to go through databases and RIOS files, extract information and generate statistics; however, much more work is necessary in this area, and it may possibly require use of another database to collect information for analysis.

Other applications want to talk to the database to get or change data with automated tools. Users want to be able to get to simulation data directly, to analyze it using their favorite tools.

To meet all of these challenges, the data management environment must be able to bring in new components. It must become more open and more easily communicate with other applications. It must provide simple interfaces for users to get to data quickly. This is an area of ongoing work.

Conclusions
The heterogeneous, distributed, multi-tier data management environment that has been described allows engineers to work on reservoir models using logically centralized, physically decentralized data sources where integrity, security, consistency, etc. are managed. The environment was designed, developed and distributed over a ten-year period and has gone through several versions.

The environment increases development work and support load and can have implementation issues that take time to resolve. Nevertheless, it has proven its value: (1) it has enabled penetration of reservoir simulation into a much wider audience than ever before, (2) it has exposed the volume and diversity of data in use and the need for good data management of simulation information, and (3) it has opened doors to new ways of analyzing simulation data, including data mining.

The environment must continuously improve and adapt to changing requirements and workflows. Initially designed as an all-inclusive, self-sufficient solution, it must now open up to enable integration, to handle ever bigger models with increasing numbers of facilities and to enable automated history matching and optimization workflows.

The benefits of good data management are not obvious until the system is in place. It requires the full backing and commitment of company management for success. The experiences discussed in this paper would not have been possible without such technology leadership.

Acknowledgments
The authors wish to acknowledge B.L. Beckner, B.A. Boyett, T.K. Eccles, J.D. Hindmon and C.J. Jett for their valuable assistance to this paper. The authors also acknowledge the management of ExxonMobil Upstream Research Company for permission to publish this paper.

Windows is a registered trademark of Microsoft Corporation in the United States and other countries.