ModeShape 3
Table of Contents
1 What's new
2 Migrating from 2.x
2.1 Public API
2.2 Storage vs. connectors
2.3 Federation
2.4 Binary storage
2.5 Sequencers
2.6 MIME type detection
2.7 Text extractors
2.8 Configuration and running the engine
2.9 Migrating content
2.10 JBoss AS
3 Getting Started
3.1 Complete Maven examples
3.2 Embedding ModeShape in application or library built with Maven
3.2.1 Prerequisites
3.2.2 Add ModeShape Dependencies
3.2.3 Configuring a ModeShape repository
3.2.4 Configuring the Infinispan Cache
3.2.5 Starting a ModeShape Repository
3.2.6 Using JCR's RepositoryFactory
3.3 ModeShape and JBoss AS7
This guide contains information about:
new features in ModeShape 3,
migration to ModeShape 3 and
getting started with ModeShape 3.
See the child pages for the relevant sections.
1 What's new
ModeShape 3 has changed quite significantly since the 2.x releases:
Use of Infinispan for all caching and storage - This gives a powerful and flexible foundation for
creating JCR repositories that are fast, scalable, and highly available. Infinispan offers several storage
options (via cache loaders), but can also be used as a distributed, multi-site, in-memory data grid.
Improved performance - ModeShape 3 is faster than 2.x - most operations are at least one if not
several orders of magnitude faster. We will publish performance and benchmarking results closer to
the final release.
Improved scalability - ModeShape 3 has been designed to store and access the content so that a
node can have hundreds of thousands of child nodes (even with same-name-siblings) yet still be
incredibly fast. Additionally, repositories can scale to millions of nodes and be deployed across many
processes.
Improved configuration - There is no more global configuration of the engine; instead, each
repository is configured with a separate JSON file, which must conform to a JSON Schema and can
be validated by ModeShape prior to use. Repository configurations can even be changed while the
repository is running (some restrictions apply), making it possible to add/change/remove sequencers,
authorization providers, and many other configuration options while the repository is in use.
Runtime management - Each repository can be deployed, started, stopped, and undeployed while
the engine and other repositories are still in use.
Deployment to JBoss AS7 - Our AS7 kit installs ModeShape as an AS7 service, allowing you to
configure and manage repositories using the AS7 tooling. Infinispan and JGroups are also built-in
services in AS7 that can be managed the same way. Plus, ModeShape clustering will work out of the
box using AS7's built-in clustering (domain management) mechanism. ModeShape and JBoss AS7
will be the easiest way to deploy, manage and operate enterprise-grade repositories. See this page
for details on how to install, configure and use.
Storage options - ModeShape continues to have great options for where it can store content.
Although ModeShape 2 had its own connector framework, ModeShape 3 simply uses Infinispan's
cache loaders with out-of-the-box support for storing content in:
In-memory (no cache loader)
BerkeleyDB
Relational databases (via JDBC), including in-memory, disk-based, or remote
File system
Cassandra
Cloud storage (e.g., Amazon's S3, Rackspace's Cloudfiles, or any other provider supported by
JClouds)
Remote Infinispan
JTA support - Clients (including EJBs) can use JCR Sessions within JTA transactions.
Visibility of changes - Sessions now immediately see all changes persisted/committed by other
sessions, although transient changes of the session always take precedence. When combined with
the new way node content is being stored it should reduce the potential for conflicts during session
save operations. This means that all of the Sessions using a given workspace can share the cache of
persisted content, resulting in faster performance. It also significantly reduces the overhead of each
session, which means ModeShape can handle more sessions.
Thread-safety - The JCR specification only requires that the Repository and
RepositoryFactory interfaces are thread-safe. ModeShape's implementations of many of the other
interfaces, including Session, Workspace, and NodeTypeManager, are also thread-safe. This
means that it is possible for multiple threads to share one Session for reading; note that a Session is
inherently stateful, so sharing a Session for writes is not recommended.
New monitoring API that allows accessing the history for over a dozen metrics using a variety of
windows and durations.
JCR-based sequencing API - sequencers now use the JCR API to access the content being
processed and to create/update the derived content. Sequencers can also dynamically register
namespaces and node types. This simplifies the process for creating custom sequencers. We've
already migrated most of our 2.x sequencers to this new API, and will be migrating the rest over the
next few weeks.
MIME type detector API - A simple API for implementing custom MIME type detectors. (Most of the
time, ModeShape's built-in detectors are sufficient.)
Text extractor API - A simple API for implementing custom code to extract searchable text from
binary values. ModeShape provides extractors for several MIME types, via the Tika library, but
custom extractors can be implemented by wrapping the code that parses the binary formats.
Improved Binary storage - A new facility was added to store binary values of all sizes, including
those that are larger than available memory (e.g., gigabytes). An optimization to store small binary
values with the rest of the content is available. We've started out with a file system store that will work
even in clustered environments, but we also plan to add stores that use Infinispan and DBMSes.
Deprecated APIs - A few API interfaces and methods were deprecated in 2.7.0.Final, and these have
been removed. Most of ModeShape's small public API remains the same or has only
backward-compatible changes.
Bug Fixes - Of course, we've incorporated quite a few bug fixes and other minor improvements, too.
We've been planning on adding support for some other features not outlined in the JCR API, but these are
likely going to be pushed to 3.1:
Federation - Access and manipulate data from other external systems as if it were within the
repository.
Map-Reduce - Use map-reduce operations to perform reporting and custom read-only operations in
parallel against the entire content of a repository. ModeShape will use this to enable validation of
repository content against the current set or a proposed set of node types, as well as optimizing the
storage format/layout of each node.
several of the methods and interfaces in ModeShape's public API were deprecated by version 2.8,
and these have been removed from the ModeShape 3 API.
ModeShape 3 also passes 100% of the unofficial JSR-283 (JCR 2.0) compatibility tests, as maintained by
the reference implementation. (The official TCK has quite a few bugs that have been fixed by the reference
implementation community. So although these compatibility tests are not official, we believe these tests are
a more accurate representation of the compliance with the intent of the specification. Plus, other
implementations use these same tests.)
ModeShape also provides several other APIs:
The RESTful API that was in 2.x is still supported, although the URLs have changed. ModeShape 3
adds a new RESTful API, and this is now the default. This API is cleaner and more capable. The
RESTful client library is capable of talking to both ModeShape 2.x and 3.x servers.
The WebDAV API is still supported and has been improved.
The JDBC Driver is still supported and is largely unchanged.
2.3 Federation
ModeShape 3.0 does not yet support using the JCR API to access information in external systems. That is
the most important feature for 3.1, and will reintroduce the concept of a connector as a mechanism to do
this. One major difference, however, will be that ModeShape 3.1 will no longer be able to create a repository
that consists entirely of federated content. Instead, every ModeShape 3 repository will store its own content,
but you'll also be able to federate and integrate into the repository the content from external systems.
Conceptually this is a bit different than in ModeShape 2, which seemed to allow a repository to be
configured such that all content was federated from external systems. Technically, even
ModeShape 2 required a storage connector to store the repository's system content, so it was
never actually possible to have a repository that consisted entirely of federated content.
2.5 Sequencers
ModeShape 3 sequencers work exactly the same way as they did in ModeShape 1.x and 2.x: they
automatically take new or updated content (matched by a path-based rule), generate additional structured
content, and write that new content into the repository (in a location determined by the configuration).
They are configured differently, most notably because each repository is configured with its own sequencers.
Implementing custom sequencers, however, is far easier in 3.0, since sequencers generate the additional
content by directly using the JCR API rather than the proprietary graph API in ModeShape 2. Sequencer
implementations are also able to register the node types programmatically, which simplifies the overall
configuration for a repository.
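To make the path-based rule a little more concrete, here is a much-simplified sketch of how a rule of the form "inputPath => outputPath" could be evaluated. It is an illustration only and not ModeShape's parser: the class and method names are hypothetical, and the sketch assumes no workspace prefix, no same-name-sibling index (such as "[*]"), that "//" matches zero or more intermediate path segments, and that exactly one capture group is referenced as "$1".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, simplified model of a sequencer path expression; not ModeShape's implementation. */
class PathExpressionSketch {

    /** Translates the simplified input-path syntax into a regular expression. */
    static String toRegex(String inputPath) {
        StringBuilder regex = new StringBuilder();
        int i = 0;
        while (i < inputPath.length()) {
            char c = inputPath.charAt(i);
            if (c == '/' && i + 1 < inputPath.length() && inputPath.charAt(i + 1) == '/') {
                regex.append("/(?:[^/]+/)*");   // "//" : zero or more intermediate segments (assumption)
                i += 2;
            } else if (c == '*') {
                regex.append("[^/]*");          // "*" : any characters within a single segment
                i++;
            } else if (c == '(' || c == ')') {
                regex.append(c);                // parentheses become regex capture groups
                i++;
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // everything else is literal
                i++;
            }
        }
        return regex.toString();
    }

    /** Returns the output path for a changed node and property, or null if the rule does not match. */
    static String outputFor(String expression, String changedNodePath, String changedProperty) {
        String[] sides = expression.split("=>");
        String input = sides[0].trim();
        String output = sides[1].trim();

        // The input side ends with a property predicate such as [@jcr:data].
        int at = input.lastIndexOf("[@");
        String nodePattern = input.substring(0, at);
        String property = input.substring(at + 2, input.length() - 1);
        if (!property.equals(changedProperty)) {
            return null;
        }

        Matcher matcher = Pattern.compile(toRegex(nodePattern)).matcher(changedNodePath);
        // The sketch assumes the rule contains exactly one capture group.
        return matcher.matches() ? output.replace("$1", matcher.group(1)) : null;
    }
}
```

With the rule "/files//(*.csv)/jcr:content[@jcr:data] => /sequenced/text/delimited/$1", a change to the "jcr:data" property of "/files/2012/q3/report.csv/jcr:content" yields the output location "/sequenced/text/delimited/report.csv" in this model, while a change under "/other" yields no match.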
Using a single configuration file for the engine seemed to make sense, but it was also confusing because a
single sequencer might be used in multiple repositories. It was also potentially problematic, because a single
source might be used by multiple repositories, even though this was not allowed. ModeShape 2 didn't allow
modifying the configuration while the engine was running, which meant it was not possible to dynamically
add or remove repositories without completely shutting down and restarting the engine. (In reality, very little
was shared between repositories.)
ModeShape 3 puts the configuration of each repository in its own file, and each configuration is "deployed"
to an engine:
As you can see, it's now possible to dynamically deploy and undeploy repositories even when the engine is
running and other repositories are in use. There are multiple ways of reading in the configuration, too:
read from a java.io.File
read from a resolved java.net.URL
read from a String containing a URL or a path to a file on the file system or classpath
read a string containing the configuration
You might have also noticed in the example above that ModeShape 3 configuration files are JSON files, not
XML files like in ModeShape 1 and 2. We thought that XML configuration files are noisy and make it difficult
to see the bigger picture. JSON files, on the other hand, are quite easy to read and edit. And ModeShape
does use a JSON Schema that dictates the allowed structure of the configuration files, so ModeShape can
even validate your configuration files:
ModeShape 3 configuration files also have sensible defaults for everything, so this file is actually a valid
configuration for a repository named "my-repo":
my-repo.json
{ "name" : "my-repo" }
Of course, you'll likely want to specify more options, so here is another example of a repository with most of
the available options specified:
my-repo-config.json
{
    "name" : "my-repo",
    "transactionMode" : "auto",
    "monitoring" : {
        "enabled" : true
    },
    "workspaces" : {
        "predefined" : ["otherWorkspace"],
        "default" : "default",
        "allowCreation" : true,
        "initialContent" : {
            "ws1" : "file1.xml",
            "ws2" : "file2.xml",
            "*" : "default.xml"
        }
    },
    "node-types" : ["file1.cnd", "file2.cnd"],
    "storage" : {
        "cacheName" : "Thorough",
        "cacheConfiguration" : "infinispan_configuration.xml",
        "transactionManagerLookup" : "org.infinispan.transaction.lookup.GenericTransactionManagerLookup",
        "binaryStorage" : {
            "type" : "file",
            "directory" : "Thorough/binaries",
            "minimumBinarySizeInBytes" : 4096
        }
    },
    "security" : {
        "anonymous" : {
            "username" : "<anonymous>",
            "roles" : ["readonly","readwrite","admin"],
            "useOnFailedLogin" : false
        },
        "providers" : [
            {
                "name" : "My Custom Security Provider",
                "classname" : "com.example.MyAuthenticationProvider"
            },
            {
                "classname" : "jaas",
                "policyName" : "modeshape-jcr"
            }
        ]
    },
    "query" : {
        "enabled" : true,
        "textExtracting": {
            "threadPool" : "test",
            "extractors" : {
                "customExtractor": {
                    "name" : "MyFileType extractor",
                    "classname" : "com.example.myfile.MyExtractor"
                },
                "tikaExtractor":{
                    "name" : "General content-based extractor",
                    "classname" : "tika"
                }
            }
        },
        "indexStorage" : {
            "type" : "filesystem",
            "location" : "Thorough/indexes",
            "lockingStrategy" : "native",
            "fileSystemAccessType" : "auto"
        },
        "indexing" : {
            "threadPool" : "modeshape-workers",
            "analyzer" : "org.apache.lucene.analysis.standard.StandardAnalyzer",
            "similarity" : "org.apache.lucene.search.DefaultSimilarity",
            "batchSize" : -1,
            "indexFormat" : "LUCENE_35",
            "readerStrategy" : "shared",
            "mode" : "sync",
            "rebuildOnStartup": {
                "when" : "if_missing",
                "includeSystemContent": false,
                "mode": "sync"
            },
            "asyncThreadPoolSize" : 1,
            "asyncMaxQueueSize" : 0,
            "backend" : {
                "type" : "lucene"
            },
            "hibernate.search.custom.overridden.property" : "value"
        }
    },
    "sequencing" : {
        "removeDerivedContentWithOriginal" : true,
        "threadPool" : "modeshape-workers",
        "sequencers" : {
            "zipSequencer" : {
                "classname" : "ZipSequencer",
                "pathExpressions" : ["default:/files(//)(*.zip[*])/jcr:content[@jcr:data] => default:/sequenced/zip/$1"]
            },
            "delimitedTextSequencer" : {
                "classname" : "org.modeshape.sequencer.text.DelimitedTextSequencer",
                "pathExpressions" : [
                    "default:/files//(*.csv[*])/jcr:content[@jcr:data] => default:/sequenced/text/delimited/$1"
                ],
                "splitPattern" : ","
            }
        }
    },
    "clustering" : {
    }
}
See our documentation about the ModeShape JSON configuration file format for more information.
It is also possible to access the configuration of a running repository, change the configuration, and then
update the running repository:
Many configuration changes can be applied to a repository while it is running, but not everything.
For example, changing where data is stored will apply only after the repository is shut down and
restarted.
2.10 JBoss AS
One other major change is that ModeShape 3 can be installed into JBoss AS7, which is a very fast and
lightweight application server. The integration is very good: ModeShape is a service (or subsystem) within
AS7, and is configured and managed using the regular AS7 configuration files or tooling. Managing a
ModeShape instance across a JBoss AS7 domain (cluster) is just as easy as with any other AS7 subsystem.
Plus, ModeShape just uses AS7's built-in support for Infinispan, JGroups, security, and data sources, which
means you configure these components using AS7's tools.
ModeShape 3 no longer provides integration with JBoss AS 5 and 6.
ModeShape 3 can of course be used with other application and web servers, including JBoss AS5
and 6. But just like with ModeShape 2, doing so basically just embeds ModeShape within your web
application or service, and no other integration with the server is provided.
3 Getting Started
We've published the ModeShape artifacts and JARs for this beta release only in the JBoss Maven repository.
The rest of this page shows how you can use ModeShape within your Maven-based projects. We've also
added several distributions on our project's download page:
a binary distribution with all the JARs, JavaDoc, and examples
a kit to install ModeShape into an EAP installation
a source distribution
So without further ado...
Complete Maven examples
Embedding ModeShape in application or library built with Maven
Prerequisites
Add ModeShape Dependencies
Logging
Use newer Infinispan and JGroups versions
Using a transaction manager
Using the JBoss Transaction Manager
Using other transaction managers
Configuring a ModeShape repository
Configuring the Infinispan Cache
Simple configuration
Cache with Cache Store
Starting a ModeShape Repository
Starting the ModeShape engine
Deploying our Repository
Using the Repository and the JCR API
Stopping the repository and engine
Using JCR's RepositoryFactory
ModeShape and JBoss AS7
The instructions on this page are for Java SE applications. If you're creating applications for
deployment onto JBoss AS7, see the specific documentation about how to install ModeShape into
AS7/EAP and use it with your web applications.
3.2.1 Prerequisites
Before you can use Maven to build an application that uses ModeShape, you'll need to have JDK 6 and
Maven 3 installed.
All ModeShape releases since 3.0.0.Final are now available directly from the Maven Central repository. It
takes a few hours (at least) after the artifacts are in the JBoss repository before they appear in Maven
Central. So if you don't see a recent release in Maven Central, just give it a bit of time - or use the JBoss
Maven repository.
Then include in the POM "<dependencies>" section the ModeShape modules that you will directly use.
Note that you don't need to specify any of the versions, since that's what the modeshape-bom-embedded
provides. The one module that you need to include is the primary JCR implementation:
Maven dependencies for the JCR API and ModeShape engine
<dependencies>
    ...
    <dependency>
        <groupId>org.modeshape</groupId>
        <artifactId>modeshape-jcr</artifactId>
    </dependency>
    ...
</dependencies>
But you should also include any other modules that you'll either directly use or optional modules that you
want to use. For example, if you're going to use any of ModeShape's public API (instead of just the JCR
API), then you should include this dependency:
Optional Maven dependencies for the ModeShape public API
<dependency>
    <groupId>org.modeshape</groupId>
    <artifactId>modeshape-jcr-api</artifactId>
</dependency>
If you want to use one of Infinispan's cache stores, then pick from ONE of the following:
Adding multiple cache stores may be necessary if you're using multiple Infinispan caches, each
with a different cache store. Adding a dependency on any cache stores that you're not using,
however, simply brings in more unnecessary (transitive) dependencies and should be avoided.
If you're going to use the JDBC Cache Store (e.g., "infinispan-cachestore-jdbc"), then you'll also
need to add a dependency on the JDBC driver or embeddable database. For example, here's the
dependency required to use the embeddable H2 database:
Maven Dependency for the H2 embeddable database
<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
    <version>1.1.117</version>
</dependency>
Logging
ModeShape is designed to use the same logging framework as your application, and it can dynamically bind
to Log4J, SLF4J, Logback and the JDK's logging system. Your application or library will probably already be
using one of these logging frameworks and will already have them in the dependencies.
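As a rough illustration of what "dynamically bind" can mean, the sketch below probes the classpath for a marker class of each framework and falls back to the JDK's logging when none is found. This is not ModeShape's actual code: the class name, the method names, and the order in which candidates are tried are all assumptions made for this example (Logback is reached through SLF4J here).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of classpath-based logging detection; not ModeShape's actual code. */
class LoggingProbeSketch {

    /** Returns the first framework whose marker class can be loaded, otherwise "jdk". */
    static String detect(Map<String, String> markerClassByFramework) {
        for (Map.Entry<String, String> entry : markerClassByFramework.entrySet()) {
            try {
                // Load without initializing; we only care whether the class is present.
                Class.forName(entry.getValue(), false, LoggingProbeSketch.class.getClassLoader());
                return entry.getKey();
            } catch (ClassNotFoundException | LinkageError e) {
                // Not on the classpath (or not loadable); try the next candidate.
            }
        }
        return "jdk"; // java.util.logging ships with the JDK
    }

    /** Candidate frameworks and marker classes; the order is an assumption of this sketch. */
    static Map<String, String> defaultCandidates() {
        Map<String, String> candidates = new LinkedHashMap<String, String>();
        candidates.put("slf4j", "org.slf4j.Logger");
        candidates.put("log4j", "org.apache.log4j.Logger");
        return candidates;
    }
}
```

Calling LoggingProbeSketch.detect(LoggingProbeSketch.defaultCandidates()) in an application that has none of these libraries on its classpath returns "jdk" in this model.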
<project>
    <!-- ... -->
    <properties>
        <infinispan.version>5.2.7.Final</infinispan.version>
        <jgroups.version>3.2.10.Final</jgroups.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.infinispan</groupId>
            <artifactId>infinispan-core</artifactId>
            <version>${infinispan.version}</version>
        </dependency>
        <dependency>
            <groupId>org.infinispan</groupId>
            <artifactId>infinispan-cachestore-jdbm</artifactId>
            <version>${infinispan.version}</version>
        </dependency>
        <dependency>
            <groupId>org.jgroups</groupId>
            <artifactId>jgroups</artifactId>
            <version>${jgroups.version}</version>
        </dependency>
        <dependency>
            <groupId>javax.jcr</groupId>
            <artifactId>jcr</artifactId>
        </dependency>
        <dependency>
            <groupId>org.modeshape</groupId>
            <artifactId>modeshape-jcr-api</artifactId>
        </dependency>
        <dependency>
            <groupId>org.modeshape</groupId>
            <artifactId>modeshape-jcr</artifactId>
        </dependency>
        <!-- Other dependencies ... -->
    </dependencies>
    <dependencyManagement>
        <dependencies>
            <!-- Import the ModeShape BOM for embedded usage. This adds to the "dependencyManagement"
                 section defaults for all of the modules we might need, but we still have to include in
                 the "dependencies" section the modules we DO need. The benefit is that we don't have to
                 specify the versions of any of those modules. -->
            <dependency>
                <groupId>org.modeshape.bom</groupId>
                <artifactId>modeshape-bom-embedded</artifactId>
                <version>3.7.2.Final</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>
</project>
Note that we're using properties to specify the versions of these artifacts. This makes it easy to change, but
also allows us to put the versions only in one (readable) location (since the "infinispan.version"
property is used in multiple places).
Be sure to pick one of the combinations of Infinispan and JGroups mentioned above.
Note that you don't need to (but can) specify the version, since our BOM already defines the default version.
The BOM also excludes a lot of the dependencies and components not necessary when using in a
non-clustered environment.
By default, the Infinispan configuration will automatically look for and find the transaction manager.
<transaction
    transactionManagerLookupClass="<specify your TransactionManagerLookup implementation class here>"
    transactionMode="TRANSACTIONAL"
    lockingMode="OPTIMISTIC"/>
{ }
That's not a mistake. An empty JSON document is a completely valid repository configuration. Everything
has a default value except for the repository's name, and the filename is used if one is not specified in the
file. In this case, the name of this repository will be "my_repository".
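The naming rule can be pictured with a few lines of Java. This is only a sketch of the rule as described here, with hypothetical names, and it assumes that "the filename" means the file's base name with its extension removed (which is consistent with "my_repository" above).

```java
import java.nio.file.Paths;

/** Hypothetical illustration of the default-name rule; not ModeShape's code. */
class RepositoryNameSketch {

    /**
     * Returns the explicit "name" field if present, otherwise the configuration
     * file's base name without its extension (an assumption of this sketch).
     */
    static String effectiveName(String nameFieldOrNull, String configFile) {
        if (nameFieldOrNull != null && !nameFieldOrNull.isEmpty()) {
            return nameFieldOrNull;
        }
        String fileName = Paths.get(configFile).getFileName().toString();
        int dot = fileName.lastIndexOf('.');
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }
}
```

For example, effectiveName(null, "conf/my_repository.json") gives "my_repository", while effectiveName("Test Repository", "conf/my_repository.json") gives "Test Repository".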
Of course, lots of other options can be specified in the configuration file, but typically only the non-default
values are specified. Since most of the defaults are sensible, many configurations will be pretty small.
Here's a configuration file that uses most of the available fields, most of which happen to be set to the same
values as the defaults. (This time we'll show line numbers so we can more easily describe what's going on.)
'my_repository.json'
{
    "name" : "Test Repository",
    "jndiName" : "jcr/Test Repository",
    "monitoring" : {
        "enabled" : true
    },
    "workspaces" : {
        "default" : "defaultWorkspace",
        "predefined" : ["otherWorkspace"],
        "allowCreation" : true
    },
    "storage" : {
        "cacheConfiguration" : "/path/to/infinispan/cache/configuration.xml",
        "cacheName" : "Test Repository",
        "transactionManagerLookup" : "org.infinispan.transaction.lookup.GenericTransactionManagerLookup",
        "binaryStorage" : {
            "minimumBinarySizeInBytes" : 4096,
            "minimumStringSize" : 4096,
            "type" : "file"
        }
    },
    "security" : {
        "jaas" : {
            "policyName" : "modeshape-jcr"
        },
        "anonymous" : {
            "roles" : ["readonly","readwrite","admin"],
            "username" : "<anonymous>",
            "useOnFailedLogin" : false
        },
        "providers" : [
            {
                "classname" : "org.example.MyAuthorizationProvider",
                "member1" : "value of instance member1"
            }
        ]
    },
    "query" : {
        "enabled" : true,
        "rebuildUponStartup" : "if_missing", // DEPRECATED: use indexing/rebuildOnStartup
        "indexStorage" : {
            "type" : "filesystem",
            "location" : "/path/on/filesystem",
            "lockingStrategy" : "simple",
            "fileSystemAccessType" : "auto"
        },
        "indexing" : {
            "rebuildOnStartup": {
                "when" : "if_missing",
                "includeSystemContent": true,
                "mode": "async"
            },
            "threadPool" : "modeshape-workers",
            "analyzer" : "org.apache.lucene.analysis.standard.StandardAnalyzer",
            "similarity" : "org.apache.lucene.search.DefaultSimilarity",
            "indexFormat" : "LUCENE_CURRENT",
            "readerStrategy" : "shared",
            "backend" : {
                "type" : "lucene"
            },
            "batchSize" : -1,
            "mode" : "sync",
            "asyncThreadPoolSize" : 1,
            "asyncMaxQueueSize" : 0
        },
        "extractors" : [MODE:ModeShape and JBoss AS7]
    },
    "sequencing" : {
        "removeDerivedContentWithOriginal" : true,
        "threadPool" : "modeshape-workers",
        "sequencers" : [MODE:ModeShape and JBoss AS7] => /ddl"
            },
            {
                "name" : "XSD sequencer",
                "classname" : "xsd",
                "pathExpressions" : [ "/(*.xsd)/jcr:content[@jcr:data]" ]
            }
        ]
    }
}
The repository can also store all STRING values equal to or larger than a specified number of
characters. In this case, all STRING values with 4096 or more characters (line 17) will be stored in the
binary store that uses the file system (line 18). Smaller STRING values are held in-memory or
persisted with the node information. By default, the minimumStringSize value will be set to the
explicit or default value of minimumBinarySizeInBytes.
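The threshold rule for STRING values can be sketched as follows. The class and method names are hypothetical and this is not ModeShape's code; the sketch also assumes that "characters" can be approximated by String.length(), and that an unset "minimumStringSize" falls back to "minimumBinarySizeInBytes".

```java
/** Hypothetical illustration of the STRING threshold rule; not ModeShape's code. */
class StringStorageSketch {

    /** An unset minimumStringSize (null) falls back to minimumBinarySizeInBytes. */
    static long effectiveMinimumStringSize(Long minimumStringSize, long minimumBinarySizeInBytes) {
        return minimumStringSize != null ? minimumStringSize : minimumBinarySizeInBytes;
    }

    /** True when the value is at least as long as the threshold, so it goes to the binary store. */
    static boolean storedInBinaryStore(String value, Long minimumStringSize, long minimumBinarySizeInBytes) {
        // String.length() counts UTF-16 code units; treating that as "characters" is an assumption.
        return value.length() >= effectiveMinimumStringSize(minimumStringSize, minimumBinarySizeInBytes);
    }
}
```

With both settings at 4096, a 4096-character STRING would go to the binary store, while a 4095-character one would stay with the node information.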
The repository will use several security providers for authentication and authorization. By default, only
the anonymous provider is used. The order of the providers is important: a caller will be authenticated
or authorized if any of the providers succeed for the caller:
The JAAS policy named "modeshape-jcr" will be used (lines 23-24). If the "jaas" nested
document is not specified, JAAS will not be used. If specified in this fashion, the JAAS security
provider will always be used first. The "modeshape-jcr" policy is used by default if JAAS is
enabled.
Any providers as configured by the "providers" nested array (lines 31-36), where each array
value is a nested document specifying the provider's name, description, and type (or
classname). Only the "type" (or "classname") field is required. The two built-in types are
"jaas" and "servlet", but any implementation of the
"org.modeshape.jcr.security.AuthorizationProvider" interface can be specified
instead. Any instance members on the implementation class can be set by specifying
additional fields of the same name, as long as the member type is String, a primitive boolean or
number, java.util.Map, or java.util.List.
The anonymous provider (lines 26-30) is enabled by default and (if enabled) always is the last
provider to be consulted. It authenticates all users with read and write permission by default,
although the exact roles (either "readonly", "readwrite", or "admin") can be configured with the
"roles" field; specify an empty "roles" array to completely disable the anonymous provider.
All sessions that are authenticated by this provider will be given the username given by the
"username" field (line 30), which defaults to the literal "<anonymous>" value (including the
angle brackets). Any user that fails to properly authenticate with another provider will not be
given an anonymous session unless the "useOnFailedLogin" field is set to true.
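The ordering rules for the providers can be modeled in a few lines. The sketch below is a hypothetical model, not ModeShape's API: the Provider interface and login method are invented for illustration, the ordered list is assumed to already have JAAS first (when configured) followed by the entries of "providers", and a null username stands in for a login without credentials.

```java
import java.util.List;

/** Hypothetical model of the provider ordering described above; not ModeShape's API. */
class ProviderChainSketch {

    /** Stand-in for a configured security provider (JAAS, servlet, or custom). */
    interface Provider {
        boolean authenticate(String username, String password);
    }

    /**
     * Returns the username for the new session, or null if the login is rejected.
     * A null username models a login without credentials (an assumption of this sketch).
     */
    static String login(List<Provider> orderedProviders, List<String> anonymousRoles,
                        String anonymousUsername, boolean useOnFailedLogin,
                        String username, String password) {
        if (username != null) {
            for (Provider provider : orderedProviders) {
                if (provider.authenticate(username, password)) {
                    return username; // the first provider that succeeds wins
                }
            }
        }
        // The anonymous provider is consulted last; an empty "roles" array disables it.
        boolean anonymousEnabled = !anonymousRoles.isEmpty();
        boolean failedLogin = username != null;
        if (anonymousEnabled && (!failedLogin || useOnFailedLogin)) {
            return anonymousUsername;
        }
        return null;
    }
}
```

In this model a caller with bad credentials is rejected unless "useOnFailedLogin" is true, in which case the session is created under the anonymous username.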
The query system (lines 38-67) is enabled by default but is explicitly enabled on line 39.
When the repository starts up, only the missing indexes will be rebuilt (lines 48-51), which is
also the default. The system content area (under /jcr:system) will be indexed as well (by default
the system area isn't re-indexed), and all of the re-indexing will be done asynchronously (by
default it is done synchronously).
The indexes will be stored on the file system under the directory "/path/on/filesystem"
and will use simple locking and automatically choose the kind of file system storage based
upon the operating system (lines 42-45). By default the indexes are stored in memory (with a
"type" value of "ram" and no other fields), so be sure to configure this carefully for your
application/environment.
The indexing system will use the "modeshape-workers" thread pool for re-indexing the
workspace content in the background (line 48), and will use the "StandardAnalyzer" for
tokenizing terms (line 49) and the "DefaultSimilarity" class for scoring (line 50). By
default the indexes will be stored using the current format (line 51), though it's recommended
to explicitly set the value matching the Lucene version you've started using (e.g.,
"LUCENE_34"). The readers will be shared (line 52) until index changes are discovered. The
"backend" nested document (lines 53-55) specifies how ModeShape is to handle changes to
the indexes; the default of "lucene" (line 54) means the changes will be written directly to the
local Lucene indexes, while other options allow using a JMS queue, JGroups, a "blackhole"
option for testing, or even a custom implementation. The other advanced properties (lines
56-59) specify the maximum node updates per transaction, whether the indexes are to be
written synchronously, and the thread pool size and queue size for asynchronous writes.
Text extractors (lines 61-70) are used to find the search terms from BINARY values. No text
extractors are used by default, but specifying the name, description, and type (or classname)
for one or more text extractor implementation classes enables this feature. Two text extractor
types are provided out of the box, and both are configured here with the required "type" fields
(e.g., "tika" and "vdb") and an optional description (useful for documentation and during
administration).
The configured sequencers (lines 72-87) specify the types of sequencers that should be run. Each
sequencer is configured with one or more path expressions that are matched against the paths of
changed nodes; when any changed path matches the expression, the sequencer is called on the
changed property/node and the generated output of the sequencer invocation is written to the location
specified in the path expression. Each sequencer is configured by specifying the required "type"
field, and an optional name and description. Custom implementations of the
"org.modeshape.jcr.api.sequencer.Sequencer" interface can be specified using the
"classname" field (instead of the "type" field), and any instance members on the implementation
class can be set by specifying additional fields of the same name, as long as the member type is
String, a primitive boolean or number, java.util.Map, or java.util.List. Several types of
sequencers are available out of the box:
"cnd" parses JCR CND files to generate a node structure describing the namespaces, node
types, property definitions, and child node definitions
"class" and "java" parse Java class files and source files (respectively) and generate a
node structure describing the encoded types, fields, methods, parameters, etc.
"ddl" parses the more important DDL statements from SQL-92, Oracle, Derby, and
PostgreSQL, and constructs a graph structure containing a structured representation of these
statements. The resulting graph structure is largely the same for all dialects, though some
dialects have non-standard additions to their grammar, and thus require dialect-specific
additions to the graph structure.
"image" extracts metadata from JPEG, GIF, BMP, PCX, PNG, IFF, RAS, PBM, PGM, PPM
and PSD image files. This sequencer extracts the file format, image resolution, number of bits
per pixel and optionally number of images, comments and physical resolution.
"model" parses the model files produced by the Teiid Designer to extract the structured
relational data model described by the XMI file, and outputs a node structure that represents
this model.
"vdb" parses the VDB archive files produced by the Teiid Designer to extract the virtual
database information and the structured relational data model described in each of the
contained XMI model files, and outputs a node structure that represents the VDB and these
models.
"wsdl" parses WSDL files that adhere to the W3C's Web Service Definition Language (WSDL)
1.1 specification, and output a representation of the WSDL file's messages, port types,
bindings, services, types (including embedded XML Schemas), documentation, and extension
elements (including HTTP, SOAP and MIME bindings). This derived information is intended to
mirror the structure and semantics of the actual WSDL files while also making it possible for
ModeShape users to easily navigate, query and search over this derived information. This
sequencer captures the namespace and names of all referenced components, and will resolve
references to components appearing within the same file.
"xsd" parses XML Schema Documents that adhere to the W3C's XML Schema Part 1 and Part
2 specifications, and output a representation of the XSD's attribute declarations, element
declarations, simple type definitions, complex type definitions, import statements, include
statements, attribute group declarations, annotations, other components, and even attributes
with a non-schema namespace. This derived information is intended to accurately reflect the
structure and semantics of the XSD files while also making it possible for ModeShape users to
easily navigate, query and search over this derived information. This sequencer captures the
namespace and names of all referenced components, and will resolve references to
components appearing within the same files.
"xml" parses XML files and extracts the element, attribute, namespace, DTD, entity, comments
and other information in the file, producing a node structure representative of this information.
"zip" extracts the files and folders contained in the ZIP archive file, extracting the files and
folders into the repository using JCR's nt:file and nt:folder built-in node types. The
structure of the output thus matches the logical structure of the contents of the ZIP file. Note
that the resulting files may then be sequenced.
"mp3" processes MP3 audio files added to a repository and extracts the ID3 metadata for the
file, including the track's title, author, album name, year, and comment, and then writes a node
structure representing this information
"fixedwidth" extracts rows and fixed-width columns from text streams and generates a node
structure representative of the rows and column values in each row.
"delimited" extracts rows and delimited columns from text streams and generates a node
structure representative of the rows and column values in each row.
Simple configuration
As with ModeShape, Infinispan's minimal configuration is a (basically) empty file:
Minimal Infinispan configuration
<infinispan />
This default configuration results in a basic, local-mode (neither replicated nor distributed), non-clustered,
in-memory cache. While this cache makes the ModeShape repository exceedingly fast, it is not the
most practical choice. More often than not, you'll want to configure Infinispan to persist information.
Keeping information in memory is fast, but sometimes it's desirable to also persist the information
somewhere. Perhaps all of the information is to be persisted, or perhaps only that which can't be kept in
memory is to be persisted. Either way, Infinispan's cache loaders provide a way for Infinispan to write out
the information to an external store. The cache loaders that can persist information are also called cache
stores.
The cache loader system also means that we can use Infinispan even when we don't have a cluster where
Infinispan can replicate or distribute the information. In other words, we can configure an Infinispan cache
store when we're running ModeShape as a single process, and we're still able to persist the information.
Even in this mode, Infinispan will still act as a cache by keeping the most recently used items in-memory.
Here is a simple configuration file for Infinispan that defines a single cache named "Test Repository"
that stores its contents in an Oracle/Sleepycat Berkeley DB database stored on the file system at
"/path/to/bdb":
Sample Infinispan configuration using Berkeley DB cache store
<infinispan
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:infinispan:config:5.1
        http://www.infinispan.org/schemas/infinispan-config-5.1.xsd"
    xmlns="urn:infinispan:config:5.1">
  <!-- Global settings shared by all caches managed by this cache container. -->
  <global>
  </global>
  <!-- The default configuration template for caches. -->
  <default>
  </default>
  <!-- Individually named caches. -->
  <namedCache name="Test Repository">
    <loaders
        passivation="false"
        shared="false"
        preload="false">
      <loader
          class="org.infinispan.loaders.bdbje.BdbjeCacheStore"
          fetchPersistentState="false"
          purgeOnStartup="false">
        <properties>
          <property name="location" value="/path/to/bdb"/>
        </properties>
      </loader>
    </loaders>
  </namedCache>
</infinispan>
When a ModeShape repository is configured to use this Infinispan cache, all the repository contents will be
persisted to disk (either in the binary store or in the Infinispan cache). Thus, the repository can be shut down
and restarted without loss of any information.
Of course, other cache stores are available. You can switch to one of them by replacing the
"org.infinispan.loaders.bdbje.BdbjeCacheStore" value in the Infinispan configuration with
another value. Again, see the Infinispan documentation for the details of how to properly configure cache
stores for your environment and needs.
- org.infinispan.loaders.file.FileCacheStore - A simple loader that stores information on
  the file system. This has severe limitations but is a simple cache loader for testing purposes. Note that
  it is not transactional, and it should not be used on NFS or Windows shares that do not properly
  implement file locking.
- org.infinispan.loaders.bdbje.BdbjeCacheStore - A very fast cache loader that is ideal
  when the client application or library can accept the license terms of Berkeley DB.
- org.infinispan.loaders.jdbm.JdbmCacheStore - A cache loader that uses JDBM, a free
  alternative to Berkeley DB.
- org.infinispan.loaders.jdbc.JdbcStringBasedCacheStore - A JDBC-based cache
  loader that stores each ModeShape node in a separate row in a simple 4-column table. This isn't as
  fast as some other cache loaders, but works very well when the repository content needs to be stored
  in a relational database. See the Infinispan documentation for details on configuring the JDBC store.
- org.infinispan.loaders.cloud.CloudCacheStore - A cache loader that stores repository
  content in Amazon S3, Rackspace Cloud Files, or any other provider supported by JClouds.
- org.infinispan.loaders.remote.RemoteCacheStore - A cache loader that can access a
  remote Infinispan data grid.
- org.infinispan.loaders.cassandra.CassandraCacheStore - A cache loader that can store
  repository content in an Apache Cassandra database. See the Infinispan documentation for the
  details on this cache loader.
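When swapping cache stores as described above, it can be handy to check which store class each named cache in a configuration file currently uses. The following is only a rough sketch using the JDK's built-in XML parser; it is not part of ModeShape or Infinispan, and the class name CacheStoreLister is hypothetical. It assumes the element and attribute names shown in the sample configuration ("namedCache", "loader", "class") and looks only at the first loader of each cache.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not a ModeShape or Infinispan class).
public class CacheStoreLister {

    /** Maps each named cache to the class of its first configured loader, or "" if it has none. */
    public static Map<String, String> cacheStores(String xml) {
        try {
            // Non-validating, non-namespace-aware parse: the schema is never fetched.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> result = new LinkedHashMap<>();
            NodeList caches = doc.getElementsByTagName("namedCache");
            for (int i = 0; i < caches.getLength(); i++) {
                Element cache = (Element) caches.item(i);
                NodeList loaders = cache.getElementsByTagName("loader");
                String storeClass = loaders.getLength() > 0
                        ? ((Element) loaders.item(0)).getAttribute("class")
                        : "";
                result.put(cache.getAttribute("name"), storeClass);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable Infinispan configuration", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<infinispan xmlns=\"urn:infinispan:config:5.1\">"
                + "<namedCache name=\"Test Repository\"><loaders>"
                + "<loader class=\"org.infinispan.loaders.bdbje.BdbjeCacheStore\"/>"
                + "</loaders></namedCache></infinispan>";
        System.out.println(cacheStores(xml));
    }
}
```

A cache that is reported with an empty store class has no loader configured and therefore keeps its content in memory only.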
This uses the org.modeshape.jcr.ModeShapeEngine class' no-argument constructor, and then calls
start(), which will block until the engine is running. Since the engine is extremely lightweight, this returns
almost immediately.
At this point we have a running ModeShape engine, but it doesn't contain any repositories. That's next.
Here, the name of the repository will either be defined in the file, or will be "my-repository-config" due
to the name of the file being read. Of course, we can also optionally change the name programmatically:
Optionally set the repository name programmatically
config = config.withName("My Repository");
Once we've read in the configuration, we can validate it to ensure it was constructed correctly. If not, we'll
print out the problems (which will have the line number and description for each error) and simply exit,
although you probably want to do something more useful.
Any errors at this point will absolutely prevent deploying a repository, and they need to be dealt with. That's
why the above sample code exits the process if there are errors. However, not everything in the
configuration can be validated at this time. For example, references to CND files or initial content files can
only be dereferenced within a running environment, something which the RepositoryConfiguration
does not have on its own.
So after we determine the configuration has no errors, the next step is to deploy it to our engine:
Deploy the repository to the engine
javax.jcr.Repository repository = engine.deploy(config);
If there are any catastrophic problems, the repository will not successfully deploy and the above method will
throw an exception. If the repository does successfully deploy, then the repository will be in a running state.
Starting with ModeShape 3.6, the repository will record warnings and errors that do not prevent deployment
but which otherwise may be significant problems:
Checking for deployment problems
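// getStartupProblems() is ModeShape-specific and is not declared on javax.jcr.Repository,
// so "repository" must be declared with the ModeShape repository type that deploy() returns.
Problems problems = repository.getStartupProblems();
if (problems.hasErrors() || problems.hasWarnings()) {
    System.err.println("Problems deploying the repository.");
    System.err.println(problems);
    System.exit(-1);
}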
Again, your application should handle such errors more gracefully than the sample code above.
After this, at any time we could shut down the repository and/or remove it from the engine. But let's
continue by getting a JCR Session.
And at this point, we can use the standard JCR API to obtain a Session and start using the repository:
Create and use a JCR Session
javax.jcr.Session session = repository.login("default");
// Get the root node ...
Node root = session.getRootNode();
assert root != null;
System.out.println("Found the root node in the \"" + session.getWorkspace().getName()
    + "\" workspace");
session.logout();
This entire section showed how to use ModeShape to start an engine, deploy a repository, obtain the
repository, create a Session, and then shut down the repository and the engine. This required the use of
ModeShape-specific classes, which isn't always desirable. In the next section, we'll see how this same
process can be done while only using the standard JCR API.
Note how simple this is, while under the covers it is doing exactly the same process we described above.
Here, the parameters contain implementation-specific properties, but your application can easily read them
from a file to keep all implementation-specific details out of your application code.
ModeShape requires one parameter:
Properties file for the ModeShape RepositoryFactory
org.modeshape.jcr.URL = file:path/to/my_repository.json
where the value of the property is a URL that resolves to the JSON configuration file. Other URLs
might point to a file on the file system using an absolute path (e.g.,
"file:///abs/path/to/my_repository.json") or even to a web server (or governance
repository!) that serves the configuration file (e.g., "http://www.example.com/repos/my_repository.json").
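Reading these parameters from a file, as suggested above, needs nothing beyond the JDK. The following is a minimal sketch; the class name RepositoryParameters is hypothetical, and the sample uses an in-memory string where a real application would open a properties file. It only builds the parameter map; the comment at the end shows where the standard javax.jcr.RepositoryFactory would take over once the JCR API and ModeShape are on the classpath.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper (not a ModeShape class).
public class RepositoryParameters {

    /** Reads key=value pairs and returns them as a parameter map for a RepositoryFactory. */
    public static Map<String, String> load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> parameters = new HashMap<>();
        for (String name : props.stringPropertyNames()) {
            parameters.put(name, props.getProperty(name));
        }
        return parameters;
    }

    public static void main(String[] args) {
        // A real application would read a properties file here instead of a string.
        Map<String, String> parameters = load(new StringReader(
                "org.modeshape.jcr.URL = file:path/to/my_repository.json\n"));
        System.out.println(parameters);
        // With the JCR API and ModeShape on the classpath, the map can then be handed to each
        // javax.jcr.RepositoryFactory discovered through java.util.ServiceLoader:
        //   Repository repository = factory.getRepository(parameters);
    }
}
```

Because the application code only sees a generic map, switching to a different configuration file means editing the properties file rather than recompiling.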
At this point using ModeShape just requires using the standard JCR API.
Oh, and if you want to shut down the ModeShape engine, you can (try to) cast the
javax.jcr.RepositoryFactory instance to an org.modeshape.jcr.api.RepositoryFactory
instance. If successful, you can call the "shutdown()" method that returns a Future<Boolean> like the
ModeShapeEngine's shutdown() method.
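Since shutdown() returns a Future<Boolean>, the caller decides how long to wait for it. The sketch below shows one way to wait with a timeout using only the JDK. The class name ShutdownWaiter is hypothetical, a CompletableFuture stands in for the Future that shutdown() would return, and treating a TRUE result as "shut down successfully" is an assumption about the meaning of the result rather than something stated here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not a ModeShape class).
public class ShutdownWaiter {

    /** Waits up to the given time; returns true only if the future completed with TRUE. */
    public static boolean awaitShutdown(Future<Boolean> shutdown, long timeout, TimeUnit unit) {
        try {
            return Boolean.TRUE.equals(shutdown.get(timeout, unit));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status for callers
            return false;
        } catch (ExecutionException | TimeoutException e) {
            return false; // shutdown failed or did not finish in time
        }
    }

    public static void main(String[] args) {
        // Stand-in for the Future<Boolean> returned by shutdown().
        Future<Boolean> shutdown = CompletableFuture.completedFuture(true);
        System.out.println(awaitShutdown(shutdown, 30, TimeUnit.SECONDS));
    }
}
```

Waiting with a timeout keeps an application's own shutdown sequence from hanging indefinitely if the engine takes longer than expected to stop.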