Action: An execution/computation task (Map-Reduce job, Pig job, a shell command). It can
also be referred to as a task or 'action node'.
Workflow: A collection of actions arranged in a control dependency DAG (Directed Acyclic
Graph). A "control dependency" from one action to another means that the second action can't
run until the first action has completed.
Workflow Definition: A programmatic description of a workflow that can be executed.
Workflow Definition Language: The language used to define a Workflow Definition.
Workflow Job: An executable instance of a workflow definition.
Workflow Engine: A system that executes workflow jobs. It can also be referred to as a DAG
engine.
1 Specification Highlights
A Workflow application is a DAG that coordinates the following types of actions: Hadoop, Pig,
and sub-workflows.
Flow control operations within the workflow applications can be done using decision, fork
and join nodes. Cycles in workflows are not supported.
Actions and decisions can be parameterized with job properties, action outputs (e.g. Hadoop
counters) and file information (file exists, file size, etc). Formal parameters are expressed in
the workflow definition as ${VAR} variables.
A Workflow application is a ZIP file that contains the workflow definition (an XML file) and all
the necessary files to run its actions: JAR files for Map/Reduce jobs, shell scripts for streaming
Map/Reduce jobs, native libraries, Pig scripts, and other resource files.
Before running a workflow job, the corresponding workflow application must be deployed in
Oozie.
Deploying workflow applications and running workflow jobs can be done via command line
tools, a WS API and a Java API.
Monitoring the system and workflow jobs can be done via a web console, command line
tools, a WS API and a Java API.
When submitting a workflow job, a set of properties resolving all the formal parameters in
the workflow definitions must be provided. This set of properties is a Hadoop configuration.
Possible states for a workflow job
are: PREP , RUNNING , SUSPENDED , SUCCEEDED , KILLED and FAILED .
In the case of an action start failure in a workflow job, depending on the type of failure, Oozie
will attempt automatic retries, request a manual retry, or fail the workflow job.
Oozie can make HTTP callback notifications on action start/end/failure events and workflow
end/failure events.
In the case of workflow job failure, the workflow job can be resubmitted skipping previously
completed actions. Before doing a resubmission the workflow application could be updated
with a patch to fix a problem in the workflow application code.
2 Workflow Definition
A workflow definition is a DAG with control flow nodes (start, end, decision, fork, join, kill)
and action nodes (map-reduce, pig, etc.); nodes are connected by transition arrows.
The workflow definition language is XML based and it is called hPDL (Hadoop Process
Definition Language).
Refer to Appendix A for the Oozie Workflow Definition XML Schema . Appendix B
has Workflow Definition Examples .
3 Workflow Nodes
Workflow nodes are classified into control flow nodes and action nodes:
Control flow nodes: nodes that control the start and end of the workflow and
workflow job execution path.
Node names and transitions must conform to the following pattern =[a-zA-Z][\-_a-zA-Z0-9]*=, of up to 20 characters long.
Syntax:
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
...
<end name="[NODE-NAME]"/>
...
</workflow-app>
The name attribute is the name of the transition to take to end the workflow job.
Example:
<workflow-app name="foo-wf" xmlns="uri:oozie:workflow:0.1">
...
<end name="end"/>
</workflow-app>
The name attribute in the kill node is the name of the Kill action node.
The content of the message element will be logged as the kill reason for the workflow job.
A kill node does not have transition elements because it ends the workflow job, as KILLED .
Example:
<workflow-app name="foo-wf" xmlns="uri:oozie:workflow:0.1">
...
<kill name="killBecauseNoInput">
<message>Input unavailable</message>
</kill>
...
</workflow-app>
Syntax:
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
...
<decision name="[NODE-NAME]">
<switch>
<case to="[NODE_NAME]">[PREDICATE]</case>
...
<case to="[NODE_NAME]">[PREDICATE]</case>
<default to="[NODE_NAME]"/>
</switch>
</decision>
...
</workflow-app>
The name attribute in the decision node is the name of the decision node.
Each case element contains a predicate and a transition name. The predicate ELs are
evaluated in order until one returns true and the corresponding transition is taken.
The default element indicates the transition to take if none of the predicates evaluates
to true .
All decision nodes must have a default element to avoid bringing the workflow into an error
state if none of the predicates evaluates to true.
Example:
<workflow-app name="foo-wf" xmlns="uri:oozie:workflow:0.1">
...
<decision name="mydecision">
<switch>
<case to="reconsolidatejob">
${fs:fileSize(secondjobOutputDir) gt 10 * GB}
</case>
<case to="rexpandjob">
${fs:fileSize(secondjobOutputDir) lt 100 * MB}
</case>
<case to="recomputejob">
${ hadoop:counters('secondjob')[RECORDS][REDUCE_OUT] lt 1000000 }
</case>
<default to="end"/>
</switch>
</decision>
...
</workflow-app>
The name attribute in the fork node is the name of the workflow fork node.
The start attribute of the path elements in the fork node indicates the name of the workflow
node that will be part of the concurrent execution paths.
The name attribute in the join node is the name of the workflow join node. The to attribute
in the join node indicates the name of the workflow node that will be executed after all
concurrent execution paths of the corresponding fork arrive at the join node.
Example:
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">
...
<fork name="forking">
<path start="firstparalleljob"/>
<path start="secondparalleljob"/>
</fork>
<action name="firstparalleljob">
<map-reduce>
<job-tracker>foo:8021</job-tracker>
<name-node>bar:8020</name-node>
<job-xml>job1.xml</job-xml>
</map-reduce>
<ok to="joining"/>
<error to="kill"/>
</action>
<action name="secondparalleljob">
<map-reduce>
<job-tracker>foo:8021</job-tracker>
<name-node>bar:8020</name-node>
<job-xml>job2.xml</job-xml>
</map-reduce>
<ok to="joining"/>
<error to="kill"/>
</action>
<join name="joining" to="nextaction"/>
...
</workflow-app>
By default, Oozie performs some validation that any forking in a workflow is valid and won't
lead to any incorrect behavior or instability. However, if Oozie is preventing a workflow from
being submitted and you are very certain that it should work, you can disable fork/join
validation so that Oozie will accept the workflow. To disable this validation just for a specific
workflow, set =oozie.wf.validate.ForkJoin= to false in the job.properties file. To
disable this validation for all workflows, set =oozie.validate.ForkJoin= to false in the
oozie-site.xml file. Disabling this validation is determined by the AND of both of these
properties, so it will be disabled if either or both are set to false and only enabled if both are
set to true (or not specified).
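As a sketch, the server-wide switch could be set in oozie-site.xml with an entry like the following (the property name is the one given above; the surrounding file layout is the standard Hadoop configuration format):

```xml
<!-- oozie-site.xml: disable fork/join validation for all workflows -->
<property>
    <name>oozie.validate.ForkJoin</name>
    <value>false</value>
</property>
```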
The workflow job has to wait until the computation/processing task completes before
transitioning to the following node in the workflow.
The exception is the fs action that is handled as a synchronous action.
Oozie can detect completion of computation/processing tasks by two different means,
callbacks and polling.
When a computation/processing task is started by Oozie, Oozie provides a unique callback
URL to the task; the task should invoke the given URL to notify its completion.
For cases where the task fails to invoke the callback URL for any reason (e.g. a transient
network failure), or when the type of task cannot invoke the callback URL upon completion,
Oozie has a mechanism to poll computation/processing tasks for completion.
3.2.1.3 Actions Have 2 Transitions, =ok= and =error=
If the failure is of a non-transient nature, Oozie will suspend the workflow job until a manual
or programmatic intervention resumes the workflow job and the action start or end is retried.
It is the responsibility of an administrator or an external managing system to perform any
necessary cleanup before resuming the workflow job.
If the failure is an error and a retry will not resolve the problem, Oozie will perform the error
transition for the action.
If the path is relative, the file or archive is expected to be within the application directory, in
the corresponding sub-path. If the path is absolute, the file or archive is expected in the
given absolute path.
Files specified with the file element will be symbolic links in the home directory of the
task.
If a file is a native library (an '.so' or a '.so.#' file), it will be symlinked as an '.so' file in the
task running directory, thus available to the task JVM.
To force a symlink for a file on the task running directory, use a '#' followed by the symlink
name. For example 'mycat.sh#cat'.
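As a sketch, the 'mycat.sh#cat' example above would appear in an action as follows (the path is illustrative):

```xml
<!-- illustrative: makes mycat.sh available in the task running directory as the symlink 'cat' -->
<file>/usr/tucu/bin/mycat.sh#cat</file>
```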
Refer to the Hadoop distributed cache documentation for more details on files and
archives.
3.2.2.2 Streaming
Some streaming jobs require files found on HDFS to be available to the mapper/reducer scripts.
This is done using the file and archive elements described in the previous section.
Pipe properties can be overridden by specifying them in the job-xml file
or configuration element.
3.2.2.4 Syntax
The prepare element, if present, indicates a list of paths to delete before starting the job. This
should be used exclusively for directory cleanup for the job to be executed. The delete
operation will be performed in the fs.default.name filesystem.
The job-xml element, if present, must refer to a Hadoop JobConf job.xml file bundled in
the workflow application. The job-xml element is optional and as of schema 0.4,
multiple job-xml elements are allowed in order to specify multiple Hadoop
JobConf job.xml files.
The configuration element, if present, contains JobConf properties for the Hadoop job.
Properties specified in the configuration element override properties specified in the file
specified in the job-xml element.
External Stats can be turned on/off by specifying the
property oozie.action.external.stats.write as true or false in the configuration element of
workflow.xml. The default value for this property is false .
The file element, if present, must specify the target symbolic link for binaries by separating
the original file and target with a # (file#target-sym-link). This is not required for libraries.
The mapper and reducer commands for streaming jobs should specify the executable command
with URL encoding; e.g. '%' should be replaced by '%25'.
Example:
<workflow-app name="foo-wf" xmlns="uri:oozie:workflow:0.1">
...
<action name="myfirstHadoopJob">
<map-reduce>
<job-tracker>foo:8021</job-tracker>
<name-node>bar:8020</name-node>
<prepare>
<delete path="hdfs://foo:8020/usr/tucu/output-data"/>
</prepare>
<job-xml>/myfirstjob.xml</job-xml>
<configuration>
<property>
<name>mapred.input.dir</name>
<value>/usr/tucu/input-data</value>
</property>
<property>
<name>mapred.output.dir</name>
<value>/usr/tucu/output-data</value>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>${firstJobReducers}</value>
</property>
<property>
<name>oozie.action.external.stats.write</name>
<value>true</value>
</property>
</configuration>
</map-reduce>
<ok to="myNextAction"/>
<error to="errorCleanup"/>
</action>
...
</workflow-app>
In the above example, the number of Reducers to be used by the Map/Reduce job has to be
specified as a parameter of the workflow job configuration when creating the workflow job.
Streaming Example:
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">
...
<action name="firstjob">
<map-reduce>
<job-tracker>foo:8021</job-tracker>
<name-node>bar:8020</name-node>
<prepare>
<delete path="${output}"/>
</prepare>
<streaming>
<mapper>/bin/bash testarchive/bin/mapper.sh
testfile</mapper>
<reducer>/bin/bash testarchive/bin/reducer.sh</reducer>
</streaming>
<configuration>
<property>
<name>mapred.input.dir</name>
<value>${input}</value>
</property>
<property>
<name>mapred.output.dir</name>
<value>${output}</value>
</property>
<property>
<name>stream.num.map.output.key.fields</name>
<value>3</value>
</property>
</configuration>
<file>/users/blabla/testfile.sh#testfile</file>
<archive>/users/blabla/testarchive.jar#testarchive</archive>
</map-reduce>
<ok to="end"/>
<error to="kill"/>
</action>
...
</workflow-app>
Pipes Example:
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">
...
<action name="firstjob">
<map-reduce>
<job-tracker>foo:8021</job-tracker>
<name-node>bar:8020</name-node>
<prepare>
<delete path="${output}"/>
</prepare>
<pipes>
<program>bin/wordcount-simple#wordcount-simple</program>
</pipes>
<configuration>
<property>
<name>mapred.input.dir</name>
<value>${input}</value>
</property>
<property>
<name>mapred.output.dir</name>
<value>${output}</value>
</property>
</configuration>
<archive>/users/blabla/testarchive.jar#testarchive</archive>
</map-reduce>
<ok to="end"/>
<error to="kill"/>
</action>
...
</workflow-app>
...
</configuration>
<script>[PIG-SCRIPT]</script>
<param>[PARAM-VALUE]</param>
...
<param>[PARAM-VALUE]</param>
<file>[FILE-PATH]</file>
...
<archive>[FILE-PATH]</archive>
...
</pig>
<ok to="[NODE-NAME]"/>
<error to="[NODE-NAME]"/>
</action>
...
</workflow-app>
The prepare element, if present, indicates a list of paths to delete before starting the job. This
should be used exclusively for directory cleanup for the job to be executed.
The job-xml element, if present, must refer to a Hadoop JobConf job.xml file bundled in
the workflow application. The job-xml element is optional and as of schema 0.4,
multiple job-xml elements are allowed in order to specify multiple Hadoop
JobConf job.xml files.
The configuration element, if present, contains JobConf properties for the underlying
Hadoop jobs.
Properties specified in the configuration element override properties specified in the file
specified in the job-xml element.
External Stats can be turned on/off by specifying the
property oozie.action.external.stats.write as true or false in the configuration element of
workflow.xml. The default value for this property is false .
The inline and job-xml configuration properties are passed to the Hadoop jobs submitted by
Pig runtime.
The script element contains the pig script to execute. The pig script can be templatized with
variables of the form ${VARIABLE} . The values of these variables can then be specified using
the params element.
NOTE: Oozie will perform the parameter substitution before firing the pig job. This is
different from the parameter substitution mechanism provided by Pig , which has a few
limitations.
The params element, if present, contains parameters to be passed to the pig script.
In Oozie schema 0.2: The arguments element, if present, contains arguments to be passed to
the pig script.
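For example, a script templatized with ${INPUT} could be paired with a param element like the following (the script name and path are illustrative):

```xml
<script>id.pig</script>
<param>INPUT=/usr/tucu/input-data</param>
```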
...
</workflow-app>
The delete command deletes the specified path; if it is a directory, it recursively deletes all
its content and then deletes the directory.
The mkdir command creates the specified directory; it creates all missing directories in the
path. If the directory already exists, it does a no-op.
In the move command the source path must exist. The following scenarios are addressed for
a move :
The file system URI (e.g. hdfs://{nameNode}) can be skipped in the target path.
It is understood to be the same as that of the source. But if the target path does
contain the system URI, it cannot be different from that of the source.
For the target path, if it is a file, then it must not already exist.
However, if the target path is an already existing directory, the move action will
place your source as a child of the target directory.
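A sketch of a move where the target omits the file system URI (the paths are illustrative):

```xml
<!-- illustrative: the target resolves against the source's file system (hdfs://foo:8020) -->
<move source='hdfs://foo:8020/usr/tucu/temp-data' target='/usr/tucu/archive'/>
```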
The chmod command changes the permissions for the specified path. Permissions can be
specified using the Unix symbolic representation (e.g. -rwxrw-rw-) or an octal representation
(755). When doing a chmod command on a directory, by default the command is applied to
the directory and the files one level within the directory. To apply the chmod command to the
directory, without affecting the files within it, the dir-files attribute must be set to false .
To apply the chmod command recursively to all levels within a directory, put
a recursive element inside the chmod element.
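A sketch of a recursive chmod (the path is illustrative):

```xml
<!-- illustrative: applies 755 recursively to all levels under the directory -->
<chmod path='hdfs://foo:8020/usr/tucu/data' permissions='755' dir-files='true'>
    <recursive/>
</chmod>
```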
The touchz command creates a zero length file in the specified path if none exists. If one
already exists, then touchz will perform a touch operation. Touchz works only for absolute
paths.
The chgrp command changes the group for the specified path. When doing
a chgrp command on a directory, by default the command is applied to the directory and the
files one level within the directory. To apply the chgrp command to the directory, without
affecting the files within it, the dir-files attribute must be set to false . To apply
the chgrp command recursively to all levels within a directory, put a recursive element
inside the chgrp element.
Example:
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">
...
<action name="hdfscommands">
<fs>
<delete path='hdfs://foo:8020/usr/tucu/temp-data'/>
<mkdir path='archives/${wf:id()}'/>
<move source='${jobInput}' target='archives/${wf:id()}/processed-input'/>
</fs>
<ok to="myotherjob"/>
<error to="errorcleanup"/>
</action>
...
</workflow-app>
In the above example, a directory named after the workflow job ID is created and the input of
the job, passed as workflow configuration parameter, is archived under the previously created
directory.
As of schema 0.4, if a name-node element is specified, then it is not necessary for any of the
paths to start with the file system URI as it is taken from the name-node element. This is also
true if the name-node is specified in the global section (see Global Configurations ).
As of schema 0.4, zero or more job-xml elements can be specified; these must refer to
Hadoop JobConf job.xml formatted files bundled in the workflow application. They can be
used to set additional properties for the FileSystem instance.
As of schema 0.4, if a configuration element is specified, then it will also be used to set
additional JobConf properties for the FileSystem instance. Properties specified in
the configuration element override properties specified in the files specified by any job-xml elements.
Example:
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.4">
...
<action name="hdfscommands">
<fs>
<name-node>hdfs://foo:8020</name-node>
<job-xml>fs-info.xml</job-xml>
<configuration>
<property>
<name>some.property</name>
<value>some.value</value>
</property>
</configuration>
<delete path='/usr/tucu/temp-data'/>
</fs>
<ok to="myotherjob"/>
<error to="errorcleanup"/>
</action>
...
</workflow-app>
NOTE: SSH actions are deprecated in Oozie schema 0.1, and removed in Oozie schema
0.2
The ssh action starts a shell command on a remote machine as a remote secure shell in
background. The workflow job will wait until the remote shell command completes before
continuing to the next action.
The shell command must be present in the remote machine and it must be available for
execution via the command path.
The shell command is executed in the home directory of the specified user in the remote host.
The output (STDOUT) of the ssh job can be made available to the workflow job after the ssh
job ends. This information could be used from within decision nodes. If the output of the ssh
job is made available to the workflow job the shell command must follow the following
requirements:
Syntax:
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
...
<action name="[NODE-NAME]">
<ssh>
<host>[USER]@[HOST]</host>
<command>[SHELL]</command>
<args>[ARGUMENTS]</args>
...
<capture-output/>
</ssh>
<ok to="[NODE-NAME]"/>
<error to="[NODE-NAME]"/>
</action>
...
</workflow-app>
The host indicates the user and host where the shell will be executed.
The command element indicates the shell command to execute.
The args element, if present, contains parameters to be passed to the shell command. If more
than one args element is present they are concatenated in order.
If the capture-output element is present, it indicates to Oozie to capture the STDOUT of
the ssh command execution. The ssh command output must be in Java
Properties file format and it must not exceed 2KB. From within the workflow definition, the
output of an ssh action node is accessible via the String action:output(String node,
String key) function (refer to section '4.2.6 Action EL Functions').
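For instance, a later decision node could branch on a key captured from the ssh action's output; this sketch uses the wf:actionData EL function described in section 4, and the node name 'myssjob' and key 'status' are hypothetical:

```xml
<decision name="check-upload">
    <switch>
        <case to="retryjob">${wf:actionData('myssjob')['status'] eq 'FAILED'}</case>
        <default to="end"/>
    </switch>
</decision>
```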
The configuration of the ssh action can be parameterized (templatized) using EL expressions.
Example:
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">
...
<action name="myssjob">
<ssh>
<host>foo@bar.com</host>
<command>uploaddata</command>
<args>jdbc:derby://bar.com:1527/myDB</args>
<args>hdfs://foobar.com:8020/usr/tucu/myData</args>
</ssh>
<ok to="myotherjob"/>
<error to="errorcleanup"/>
</action>
...
</workflow-app>
In the above example, the uploaddata shell command is executed with two
arguments, jdbc:derby://bar.com:1527/myDB and hdfs://foobar.com:8020/usr/tucu/
myData .
The uploaddata shell must be available in the remote host and available in the command
path.
The output of the command will be ignored because the capture-output element is not
present.
...
</configuration>
</sub-workflow>
<ok to="[NODE-NAME]"/>
<error to="[NODE-NAME]"/>
</action>
...
</workflow-app>
The child workflow job runs in the same Oozie system instance where the parent workflow
job is running.
The app-path element specifies the path to the workflow application of the child workflow
job.
The propagate-configuration flag, if present, indicates that the workflow job
configuration should be propagated to the child workflow.
The configuration section can be used to specify the job properties that are required to run
the child workflow job.
The configuration of the sub-workflow action can be parameterized (templatized) using EL
expressions.
Example:
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">
...
<action name="a">
<sub-workflow>
<app-path>child-wf</app-path>
<configuration>
<property>
<name>input.dir</name>
<value>${wf:id()}/second-mr-output</value>
</property>
</configuration>
</sub-workflow>
<ok to="end"/>
<error to="kill"/>
</action>
...
</workflow-app>
In the above example, the workflow definition with the name child-wf will be run on the
Oozie instance at http://myhost:11000/oozie . The specified workflow application must
already be deployed on the target Oozie instance.
A configuration parameter input.dir is being passed as job property to the child workflow
job.
The subworkflow can inherit the lib jars from the parent workflow by
setting oozie.subworkflow.classpath.inheritance to true in oozie-site.xml, or on a
per-job basis by setting oozie.wf.subworkflow.classpath.inheritance to true in a
job.properties file. If both are
specified, oozie.wf.subworkflow.classpath.inheritance has priority. If the
subworkflow and the parent have conflicting jars, the subworkflow's jar has priority. By
default, oozie.wf.subworkflow.classpath.inheritance is set to true.
As with map-reduce and pig actions, it is possible to add files and archives to be available to
the Java application. Refer to section [#FilesAchives][Adding Files and Archives for the Job].
The capture-output element can be used to propagate values back into the Oozie context,
which can then be accessed via EL functions. The values need to be written out as a Java
properties format file. The filename is obtained via a System property specified by the
constant JavaMainMapper.OOZIE_JAVA_MAIN_CAPTURE_OUTPUT_FILE .
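As a sketch, a Java action's main class could write its capture-output file like this. The system property name "oozie.action.output.properties" is assumed here to be the value of that constant, and the property key written is illustrative; verify both against your Oozie version:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.util.Properties;

public class MyMain {
    public static void main(String[] args) throws Exception {
        // Values written here become accessible to the workflow via EL functions
        // such as wf:actionData('node-name') (key name is illustrative)
        Properties props = new Properties();
        props.setProperty("processedRecords", "1000");

        // Oozie passes the capture-output file location as a system property;
        // "oozie.action.output.properties" is assumed to be the constant's value
        File outFile = new File(System.getProperty("oozie.action.output.properties"));
        try (OutputStream os = new FileOutputStream(outFile)) {
            props.store(os, "");
        }
    }
}
```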
IMPORTANT: In order for a Java action to succeed on a secure cluster, it must propagate
the Hadoop delegation token like in the following code snippet (this is benign on non-secure
clusters):
// propagate delegation related props from launcher job to MR job
if (System.getenv("HADOOP_TOKEN_FILE_LOCATION") != null) {
jobConf.set("mapreduce.job.credentials.binary",
System.getenv("HADOOP_TOKEN_FILE_LOCATION"));
}
IMPORTANT: Because the Java application is run from within a Map-Reduce job, from
Hadoop 0.20 onwards a queue must be assigned to it. The queue name must be specified as a
configuration property.
Syntax:
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
...
<action name="[NODE-NAME]">
<java>
<job-tracker>[JOB-TRACKER]</job-tracker>
<name-node>[NAME-NODE]</name-node>
<prepare>
<delete path="[PATH]"/>
...
<mkdir path="[PATH]"/>
...
</prepare>
<job-xml>[JOB-XML]</job-xml>
<configuration>
<property>
<name>[PROPERTY-NAME]</name>
<value>[PROPERTY-VALUE]</value>
</property>
...
</configuration>
<main-class>[MAIN-CLASS]</main-class>
<java-opts>[JAVA-STARTUP-OPTS]</java-opts>
<arg>ARGUMENT</arg>
...
<file>[FILE-PATH]</file>
...
<archive>[FILE-PATH]</archive>
...
<capture-output />
</java>
<ok to="[NODE-NAME]"/>
<error to="[NODE-NAME]"/>
</action>
...
</workflow-app>
The prepare element, if present, indicates a list of paths to delete before starting the Java
application. This should be used exclusively for directory cleanup for the Java application to
be executed.
The java-opts element, if present, contains the command line parameters which are to be
used to start the JVM that will execute the Java application. Using this element is equivalent
to using the mapred.child.java.opts configuration property.
The arg elements, if present, contain arguments for the main function. The value of
each arg element is considered a single argument and they are passed to the main method in
the same order.
All the above elements can be parameterized (templatized) using EL expressions.
Example:
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">
...
<action name="myfirstjavajob">
<java>
<job-tracker>foo:8021</job-tracker>
<name-node>bar:8020</name-node>
<prepare>
<delete path="${jobOutput}"/>
</prepare>
<configuration>
<property>
<name>mapred.queue.name</name>
<value>default</value>
</property>
</configuration>
<main-class>org.apache.oozie.MyFirstMainClass</main-class>
<java-opts>-Dblah</java-opts>
<arg>argument1</arg>
<arg>argument2</arg>
</java>
<ok to="myotherjob"/>
<error to="errorcleanup"/>
</action>
...
</workflow-app>
3.2.7.1 Overriding an action's Main class
This feature is useful for developers to change the Main classes without having to recompile
or redeploy Oozie.
For most actions (not just the Java action), you can override the Main class it uses by
specifying the following configuration property and making sure that your class is
included in the workflow's classpath. If you specify this in the Java action, the main-class
element has priority.
<property>
<name>oozie.launcher.action.main.class</name>
<value>org.my.CustomMain</value>
</property>
Note: Most actions typically pass information to their corresponding Main in specific ways;
you should look at the action's existing Main to see how it works before creating your own.
In fact, it's probably simplest to just subclass the existing Main and add/modify/overwrite any
behavior you want to change.
4 Parameterization of Workflows
Workflow definitions can be parameterized.
When a workflow node is executed by Oozie, all the EL expressions are resolved into concrete values.
The parameterization of workflow definitions is done using JSP Expression Language syntax
from the JSP 2.0 Specification (JSP.2.3), allowing not only variables as
parameters but also functions and complex expressions.
EL expressions can be used in the configuration values of action and decision nodes. They
can be used in XML attribute values and in XML element values.
They cannot be used in XML element and attribute names. They cannot be used in the name
of a node and they cannot be used within the transition elements of a node.
Using properties that are valid Java identifiers results in a more readable and compact
definition.
Example:
Parameterized Workflow definition:
<workflow-app name='hello-wf' xmlns="uri:oozie:workflow:0.1">
...
<action name='firstjob'>
<map-reduce>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.mapper.class</name>
<value>com.foo.FirstMapper</value>
</property>
<property>
<name>mapred.reducer.class</name>
<value>com.foo.FirstReducer</value>
</property>
<property>
<name>mapred.input.dir</name>
<value>${inputDir}</value>
</property>
<property>
<name>mapred.output.dir</name>
<value>${outputDir}</value>
</property>
</configuration>
</map-reduce>
<ok to='secondjob'/>
<error to='killcleanup'/>
</action>
...
</workflow-app>
When submitting a workflow job for the workflow definition above, 4 workflow job
properties must be specified:
jobTracker:
nameNode:
inputDir:
outputDir:
If the above 4 properties are not specified, the job will fail.
As of schema 0.4, a list of formal parameters can be provided which will allow Oozie to
verify, at submission time, that said properties are actually specified (i.e. before the job is
executed and fails). Default values can also be provided.
Example:
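A minimal sketch of such a parameters section, with the parameter names and the default value outdir taken from the surrounding prose:

```xml
<workflow-app name='hello-wf' xmlns="uri:oozie:workflow:0.4">
    <parameters>
        <property>
            <name>inputDir</name>
        </property>
        <property>
            <name>outputDir</name>
            <value>outdir</value>
        </property>
    </parameters>
    ...
</workflow-app>
```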
In the above example, if =inputDir= is not specified, Oozie will print an error message instead
of submitting the job. If =outputDir= is not specified, Oozie will use the default value, outdir .
String urlEncode(String s)
It returns the URL UTF-8 encoded value of the given string. A string with null value is
considered as an empty string.
String timestamp()
It returns the UTC current date and time in W3C format down to the second (YYYY-MM-DDThh:mm:ss.sZ). E.g.: 1997-07-16T19:20:30.45Z
String toJsonStr(Map) (since Oozie 3.3)
It returns an XML encoded JSON representation of a Map. This function is useful to encode
as a single property the complete action-data of an action, wf:actionData(String
actionName) , in order to pass it in full to another action.
String toPropertiesStr(Map) (since Oozie 3.3)
It returns an XML encoded Properties representation of a Map. This function is useful to
encode as a single property the complete action-data of an action, wf:actionData(String
actionName) , in order to pass it in full to another action.
String toConfigurationStr(Map) (since Oozie 3.3)
It returns an XML encoded Configuration representation of a Map. This function is useful to
encode as a single property the complete action-data of an action, wf:actionData(String
actionName) , in order to pass it in full to another action.
String wf:user()
It returns the user name that started the current workflow job.
String wf:group()
It returns the group/ACL for the current workflow job.
String wf:callback(String stateVar)
It returns the callback URL for the current workflow action node; stateVar can be a valid
exit state (=OK= or =ERROR= ) for the action, or a token to be replaced with the exit state by the
remote system executing the task.
String wf:transition(String node)
It returns the transition taken by the specified workflow action node, or an empty string if the
action has not been executed or has not completed yet.
String wf:lastErrorNode()
It returns the name of the last workflow action node that exited with an ERROR exit state, or an
empty string if no action has exited with ERROR state in the current workflow job.
String wf:errorCode(String node)
It returns the error code for the specified action node, or an empty string if the action node
has not exited with ERROR state.
Each type of action node must define its complete error code list.
String wf:errorMessage(String message)
It returns the error message for the specified action node, or an empty string if the action node
has not exited with ERROR state.
The error message can be useful for debugging and notification purposes.
int wf:run()
It returns the run number for the current workflow job; normally 0, unless the workflow job is
re-run, in which case it indicates the current run.
Map wf:actionData(String node)
This function is only applicable to action nodes that produce output data on completion.
The output data is in a Java Properties format and via this EL function it is available as
a Map .
int wf:actionExternalId(String node)
It returns the external Id for an action node, or an empty string if the action has not being
executed or it has not completed yet.
String wf:actionTrackerUri(String node)
It returns the tracker URI for an action node, or an empty string if the action has not been
executed or has not completed yet.
String wf:actionExternalStatus(String node)
It returns the external status for an action node, or an empty string if the action has not been
executed or has not completed yet.
The hadoop:counters(String node) EL function can also be used to access specific action
statistics information. Examples of action stats and their usage through EL functions
(referenced in the workflow XML) are given below.
Example of MR action stats:
{
"ACTION_TYPE": "MAP_REDUCE",
"org.apache.hadoop.mapred.JobInProgress$Counter": {
"TOTAL_LAUNCHED_REDUCES": 1,
"TOTAL_LAUNCHED_MAPS": 1,
"DATA_LOCAL_MAPS": 1
},
"FileSystemCounters": {
"FILE_BYTES_READ": 1746,
"HDFS_BYTES_READ": 1409,
"FILE_BYTES_WRITTEN": 3524,
"HDFS_BYTES_WRITTEN": 1547
},
"org.apache.hadoop.mapred.Task$Counter": {
"REDUCE_INPUT_GROUPS": 33,
"COMBINE_OUTPUT_RECORDS": 0,
"MAP_INPUT_RECORDS": 33,
"REDUCE_SHUFFLE_BYTES": 0,
"REDUCE_OUTPUT_RECORDS": 33,
"SPILLED_RECORDS": 66,
"MAP_OUTPUT_BYTES": 1674,
"MAP_INPUT_BYTES": 1409,
"MAP_OUTPUT_RECORDS": 33,
"COMBINE_INPUT_RECORDS": 0,
"REDUCE_INPUT_RECORDS": 33
    }
}
Below is the workflow that shows how to access specific information from the MR stats
using the hadoop:counters() EL function. Workflow XML:
<workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
<start to="mr-node"/>
<action name="mr-node">
<map-reduce>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>mapred.mapper.class</name>
<value>org.apache.oozie.example.SampleMapper</value>
</property>
<property>
<name>mapred.reducer.class</name>
<value>org.apache.oozie.example.SampleReducer</value>
</property>
<property>
<name>mapred.map.tasks</name>
<value>1</value>
</property>
<property>
<name>mapred.input.dir</name>
<value>/user/${wf:user()}/${examplesRoot}/input-data/text</value>
</property>
<property>
<name>mapred.output.dir</name>
<value>/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}</value>
</property>
<property>
<name>oozie.action.external.stats.write</name>
<value>true</value>
</property>
</configuration>
</map-reduce>
<ok to="java1"/>
<error to="fail"/>
</action>
<action name="java1">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<main-class>MyTest</main-class>
<arg>${hadoop:counters("mr-node")["FileSystemCounters"]["FILE_BYTES_READ"]}</arg>
<capture-output/>
</java>
<ok to="end" />
<error to="fail" />
</action>
<kill name="fail">
<message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
Example of Pig action stats:
{
"RECORD_WRITTEN": "33",
"BYTES_WRITTEN": "1410",
"HADOOP_VERSION": "0.20.2",
"SCRIPT_ID": "bbe016e9-f678-43c3-96fc-f24359957582",
"PROACTIVE_SPILL_COUNT_RECORDS": "0",
"PROACTIVE_SPILL_COUNT_OBJECTS": "0",
"RETURN_CODE": "0",
"ERROR_CODE": "-1",
"SMM_SPILL_COUNT": "0",
"DURATION": "36850",
"job_201112191708_0008": {
"MAP_INPUT_RECORDS": "33",
"MIN_REDUCE_TIME": "0",
"MULTI_STORE_COUNTERS": {},
"MAX_REDUCE_TIME": "0",
"NUMBER_REDUCES": "0",
"ERROR_MESSAGE": null,
"RECORD_WRITTEN": "33",
"HDFS_BYTES_WRITTEN": "1410",
"JOB_ID": "job_201112191708_0008",
"REDUCE_INPUT_RECORDS": "0",
"AVG_REDUCE_TIME": "0",
"MAX_MAP_TIME": "9169",
"BYTES_WRITTEN": "1410",
"Alias": "A,B",
"REDUCE_OUTPUT_RECORDS": "0",
"SMMS_SPILL_COUNT": "0",
"PROACTIVE_SPILL_COUNT_RECS": "0",
"PROACTIVE_SPILL_COUNT_OBJECTS": "0",
"HADOOP_COUNTERS": null,
"MIN_MAP_TIME": "9169",
"MAP_OUTPUT_RECORDS": "33",
"AVG_MAP_TIME": "9169",
"FEATURE": "MAP_ONLY",
"NUMBER_MAPS": "1"
}
}
Below is the workflow that shows how to access specific information from the Pig stats
using the hadoop:counters() EL function. Workflow XML:
<workflow-app xmlns="uri:oozie:workflow:0.2" name="pig-wf">
<start to="pig-node"/>
<action name="pig-node">
<pig>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/pig"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>mapred.compress.map.output</name>
<value>true</value>
</property>
<property>
<name>oozie.action.external.stats.write</name>
<value>true</value>
</property>
</configuration>
<script>id.pig</script>
<param>INPUT=/user/${wf:user()}/${examplesRoot}/input-data/text</param>
<param>OUTPUT=/user/${wf:user()}/${examplesRoot}/output-data/pig</param>
</pig>
<ok to="java1"/>
<error to="fail"/>
</action>
<action name="java1">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<main-class>MyTest</main-class>
<arg>${hadoop:counters("pig-node")["JOB_GRAPH"]}</arg>
<capture-output/>
</java>
<ok to="end" />
<error to="fail" />
</action>
<kill name="fail">
<message>Pig failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
Below is the workflow that shows how to access the Hadoop job information recorded in the
Pig action data using the wf:actionData() EL function. Workflow XML:
<workflow-app xmlns="uri:oozie:workflow:0.2" name="pig-wf">
    <start to="pig-node"/>
    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/pig"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
            </configuration>
            <script>id.pig</script>
            <param>INPUT=/user/${wf:user()}/${examplesRoot}/input-data/text</param>
            <param>OUTPUT=/user/${wf:user()}/${examplesRoot}/output-data/pig</param>
        </pig>
        <ok to="java1"/>
        <error to="fail"/>
    </action>
    <action name="java1">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <main-class>MyTest</main-class>
            <arg>${wf:actionData("pig-node")["hadoopJobs"]}</arg>
            <capture-output/>
        </java>
        <ok to="end" />
        <error to="fail" />
    </action>
    <kill name="fail">
        <message>Pig failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
long fs:dirSize(String path)
It returns the size in bytes of all the files in the specified path. If the path is not a directory,
or if it does not exist, it returns -1. It does not work recursively; it only computes the size of
the files directly under the specified path.
long fs:fileSize(String path)
It returns the size in bytes of the specified file. If the path is not a file, or if it does not exist,
it returns -1.
long fs:blockSize(String path)
It returns the block size in bytes of the specified file. If the path is not a file, or if it does not
exist, it returns -1.
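For example, these functions can be used in a decision node to route the workflow based on
file information. The following sketch checks whether a file is non-empty before processing
(the node names and the outputFile workflow parameter are hypothetical):
<decision name="check-output">
    <switch>
        <case to="process-node">${fs:fileSize(outputFile) gt 0}</case>
        <default to="end"/>
    </switch>
</decision>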
5 Workflow Notifications
Workflow jobs can be configured to make an HTTP GET notification upon start and end of a
workflow action node and upon the completion of a workflow job.
Oozie will make a best effort to deliver the notifications; in case of failure, it will retry the
notification a pre-configured number of times at a pre-configured interval before giving up.
See also Coordinator Notifications
If the URL contains any of the following tokens, they will be replaced with the actual values
by Oozie before making the notification:
$status : If the action has not completed yet, it contains the action status prefixed with 'S:'.
If the action has ended, it contains the action transition prefixed with 'T:'.
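For example, assuming the oozie.wf.action.notification.url job property is used to set the
action notification URL (the host and path below are hypothetical), a URL using the $status
token could be configured as:
<property>
    <name>oozie.wf.action.notification.url</name>
    <value>http://my-server:8080/oozie-notify?actionStatus=$status</value>
</property>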
6 User Propagation
When submitting a workflow job, the configuration must contain a user.name property. If
security is enabled, Oozie must ensure that the value of the user.name property in the
configuration matches the user credentials present in the protocol (web services) request.
When submitting a workflow job, the configuration may contain
the oozie.job.acl property (the group.name property has been deprecated). If
authorization is enabled, this property is treated as the ACL for the job; it can contain user
and group IDs separated by commas.
The specified user and ACL are assigned to the created job.
Oozie must propagate the specified user and ACL to the system executing the actions.
Map-reduce, pig, and fs actions are not allowed to override user/group information.
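For illustration, a submission configuration (job.properties) carrying the user and ACL
might contain the following; the user and group names are hypothetical:
user.name=alice
oozie.job.acl=alice,analytics-group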
A workflow application must contain at least the workflow definition, the workflow.xml file.
All configuration files and scripts (Pig and shell) needed by the workflow action nodes
should be under the application HDFS directory.
All the JAR files and native libraries within the application 'lib/' directory are automatically
added to the map-reduce and pig jobs classpath and LD_PATH .
Additional JAR files and native libraries not present in the application 'lib/' directory can be
specified in map-reduce and pig actions with the 'file' element (refer to the map-reduce and
pig documentation).
For Map-Reduce jobs (not including streaming or pipes), additional jar files can also be
included via an uber jar. An uber jar is a jar file that contains additional jar files within a "lib"
folder. To let Oozie know about an uber jar, simply specify it with
the oozie.mapreduce.uber.jar configuration property and Oozie will tell Hadoop
MapReduce that it is an uber jar. The ability to specify an uber jar is governed by
the oozie.action.mapreduce.uber.jar.enable property in oozie-site.xml . See Oozie
Install for more information.
<action name="mr-node">
<map-reduce>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>oozie.mapreduce.uber.jar</name>
<value>${nameNode}/user/${wf:user()}/my-uberjar.jar</value>
</property>
</configuration>
</map-reduce>
<ok to="end"/>
<error to="fail"/>
</action>
The config-default.xml file defines default values, if any, for the workflow job
parameters. This file must be in the Hadoop Configuration XML format. EL expressions are
not supported, and the user.name property cannot be specified in this file.
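A minimal config-default.xml could look as follows; the queueName parameter name
matches the examples above, and the value shown is hypothetical:
<configuration>
    <property>
        <name>queueName</name>
        <value>default</value>
    </property>
</configuration>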
Any other resources, like job.xml files, referenced from a workflow action node must
be included under the corresponding path; relative paths always start from the root of the
workflow application.
RUNNING:
When a CREATED workflow job is started it goes into RUNNING state; it will remain
in RUNNING state while it does not reach its end state, end in error, or get suspended.
SUCCEEDED:
When a RUNNING workflow job reaches the end node it ends, reaching
the SUCCEEDED final state.
FAILED:
When a RUNNING workflow job fails due to an unexpected error it ends, reaching
the FAILED final state.
12 Client API
Oozie provides a Java Client API that allows performing all common workflow job
operations.
The client API includes a LocalOozie class useful for testing a workflow from within an IDE
and for unit testing purposes.
The Client API is implemented as a client of the Web Services API.
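A minimal sketch of submitting a workflow job through the Java Client API; the Oozie
server URL and HDFS paths are hypothetical, and OozieClient is the client class from the
Oozie client library (this requires a running Oozie server):

import java.util.Properties;
import org.apache.oozie.client.OozieClient;

public class SubmitExample {
    public static void main(String[] args) throws Exception {
        // Point the client at the Oozie server (hypothetical URL)
        OozieClient wc = new OozieClient("http://localhost:11000/oozie");
        // Create a job configuration; properties resolve the workflow's formal parameters
        Properties conf = wc.createConfiguration();
        conf.setProperty(OozieClient.APP_PATH, "hdfs://foo:8020/user/alice/my-wf");
        conf.setProperty("jobTracker", "foo:8021");
        conf.setProperty("nameNode", "hdfs://foo:8020");
        // Submit and start the workflow job
        String jobId = wc.run(conf);
        System.out.println("Workflow job submitted: " + jobId);
    }
}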
14 Web UI Console
Oozie provides a read-only Web-based console that allows users to monitor Oozie system
status, workflow application status and workflow job status.
The Web-based console is implemented as a client of the Web Services API.
19 Global Configurations
Oozie allows a global section to reduce the redundant job-tracker and name-node declarations
for each action. The user can define a global section in the beginning of the workflow.xml .
The global section may contain the job-xml, configuration, job-tracker, or name-node that the
user would like to set for every action. If a user then redefines one of these in a specific
action node, Oozie will use the specific declaration instead of the global one for that
action.
Example of a global element:
<workflow-app xmlns="uri:oozie:workflow:0.4" name="wf-name">
<global>
<job-tracker>${job-tracker}</job-tracker>
<name-node>${name-node}</name-node>
<job-xml>job1.xml</job-xml>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
</global>
...
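If, for example, a single action needs a different queue than the global one, it can redefine the
property locally; the action name and queue value below are hypothetical:
<action name="important-node">
    <map-reduce>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>priority</value>
            </property>
        </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
</action>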
20 Suspend On Nodes
Specifying oozie.suspend.on.nodes in a job.properties file lets users specify a list of
actions that will cause Oozie to automatically suspend the workflow upon reaching any of
those actions; like breakpoints in a debugger. To continue running the workflow, the user
simply uses the -resume command from the Oozie CLI. Specifying a * will cause Oozie to
suspend the workflow on all nodes.
For example:
oozie.suspend.on.nodes=mr-node,my-pig-action,my-fork
Specifying the above in a job.properties file will cause Oozie to suspend the workflow when
any of those three actions are about to be executed.
Appendixes
Appendix A, Oozie XML-Schema
Oozie Schema Version 0.4.5
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:workflow="uri:oozie:workflow:0.4.5"
elementFormDefault="qualified"
targetNamespace="uri:oozie:workflow:0.4.5">
<xs:element name="workflow-app" type="workflow:WORKFLOW-APP"/>
<xs:simpleType name="IDENTIFIER">
<xs:restriction base="xs:string">
<xs:pattern value="([a-zA-Z_]([\-_a-zA-Z0-9])*){1,39}"/>
</xs:restriction>
</xs:simpleType>
<xs:complexType name="WORKFLOW-APP">
<xs:sequence>
<xs:element name="parameters" type="workflow:PARAMETERS"
minOccurs="0" maxOccurs="1"/>
<xs:element name="global" type="workflow:GLOBAL" minOccurs="0"
maxOccurs="1"/>
<xs:element name="credentials" type="workflow:CREDENTIALS"
minOccurs="0" maxOccurs="1"/>
<xs:element name="start" type="workflow:START" minOccurs="1"
maxOccurs="1"/>
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element name="decision" type="workflow:DECISION"
minOccurs="1" maxOccurs="1"/>
<xs:element name="fork" type="workflow:FORK" minOccurs="1"
maxOccurs="1"/>
<xs:element name="join" type="workflow:JOIN" minOccurs="1"
maxOccurs="1"/>
<xs:element name="kill" type="workflow:KILL" minOccurs="1"
maxOccurs="1"/>
<xs:element name="action" type="workflow:ACTION"
minOccurs="1" maxOccurs="1"/>
</xs:choice>
<xs:element name="end" type="workflow:END" minOccurs="1"
maxOccurs="1"/>
<xs:any namespace="uri:oozie:sla:0.1" minOccurs="0"
maxOccurs="1"/>
</xs:sequence>
<xs:attribute name="name" type="xs:string" use="required"/>
</xs:complexType>
<xs:element name="switch" type="workflow:SWITCH"/>
<xs:complexType name="SWITCH">
<xs:sequence>
<xs:sequence>
<xs:element name="case" type="workflow:CASE" minOccurs="1"
maxOccurs="unbounded"/>
<xs:element name="default" type="workflow:DEFAULT"
minOccurs="1" maxOccurs="1"/>
</xs:sequence>
</xs:sequence>
</xs:complexType>
<xs:complexType name="CASE">
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="to" type="workflow:IDENTIFIER"
use="required"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
<xs:complexType name="DEFAULT">
<xs:attribute name="to" type="workflow:IDENTIFIER" use="required"/>
</xs:complexType>
<xs:complexType name="FORK_TRANSITION">
<xs:attribute name="start" type="workflow:IDENTIFIER"
use="required"/>
</xs:complexType>
<xs:complexType name="FORK">
<xs:sequence>
<xs:element name="path" type="workflow:FORK_TRANSITION"
minOccurs="2" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="name" type="workflow:IDENTIFIER"
use="required"/>
</xs:complexType>
<xs:complexType name="JOIN">
<xs:attribute name="name" type="workflow:IDENTIFIER"
use="required"/>
<xs:attribute name="to" type="workflow:IDENTIFIER" use="required"/>
</xs:complexType>
<xs:element name="kill" type="workflow:KILL"/>
<xs:complexType name="KILL">
<xs:sequence>
<xs:element name="message" type="xs:string" minOccurs="1"
maxOccurs="1"/>
</xs:sequence>
<xs:attribute name="name" type="workflow:IDENTIFIER"
use="required"/>
</xs:complexType>
<xs:complexType name="ACTION_TRANSITION">
<xs:attribute name="to" type="workflow:IDENTIFIER" use="required"/>
</xs:complexType>
<xs:element name="map-reduce" type="workflow:MAP-REDUCE"/>
<xs:element name="pig" type="workflow:PIG"/>
<xs:element name="sub-workflow" type="workflow:SUB-WORKFLOW"/>
<xs:element name="fs" type="workflow:FS"/>
<xs:element name="java" type="workflow:JAVA"/>
<xs:complexType name="ACTION">
<xs:sequence>
<xs:choice minOccurs="1" maxOccurs="1">
<xs:element name="map-reduce" type="workflow:MAP-REDUCE"
minOccurs="1" maxOccurs="1"/>
<xs:element name="pig" type="workflow:PIG" minOccurs="1"
maxOccurs="1"/>
</xs:choice>
</xs:sequence>
</xs:complexType>
<xs:complexType name="DELETE">
<xs:attribute name="path" type="xs:string" use="required"/>
</xs:complexType>
<xs:complexType name="MKDIR">
<xs:attribute name="path" type="xs:string" use="required"/>
</xs:complexType>
<xs:complexType name="MOVE">
<xs:attribute name="source" type="xs:string" use="required"/>
<xs:attribute name="target" type="xs:string" use="required"/>
</xs:complexType>
<xs:complexType name="CHMOD">
<xs:sequence>
<xs:element name="recursive" type="workflow:FLAG" minOccurs="0"
maxOccurs="1"></xs:element>
</xs:sequence>
<xs:attribute name="path" type="xs:string" use="required"/>
<xs:attribute name="permissions" type="xs:string" use="required"/>
<xs:attribute name="dir-files" type="xs:string"/>
</xs:complexType>
<xs:complexType name="TOUCHZ">
<xs:attribute name="path" type="xs:string" use="required"/>
</xs:complexType>
<xs:complexType name="CHGRP">
<xs:sequence>
<xs:element name="recursive" type="workflow:FLAG" minOccurs="0"
maxOccurs="1"></xs:element>
</xs:sequence>
<xs:attribute name="path" type="xs:string" use="required"/>
<xs:attribute name="group" type="xs:string" use="required"/>
<xs:attribute name="dir-files" type="xs:string"/>
</xs:complexType>
<xs:complexType name="CREDENTIALS">
<xs:sequence minOccurs="0" maxOccurs="unbounded">
<xs:element name="credential" type="workflow:CREDENTIAL"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="CREDENTIAL">
<xs:sequence minOccurs="0" maxOccurs="unbounded" >
<xs:element name="property" minOccurs="1"
maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="name" minOccurs="1"
maxOccurs="1" type="xs:string"/>
<xs:element name="value" minOccurs="1"
maxOccurs="1" type="xs:string"/>
<xs:element name="description" minOccurs="0"
maxOccurs="1" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
<xs:attribute name="name" type="xs:string" use="required"/>
<xs:attribute name="type" type="xs:string" use="required"/>
</xs:complexType>
</xs:schema>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:workflow="uri:oozie:workflow:0.4"
elementFormDefault="qualified"
targetNamespace="uri:oozie:workflow:0.4">
<xs:element name="workflow-app" type="workflow:WORKFLOW-APP"/>
<xs:simpleType name="IDENTIFIER">
<xs:restriction base="xs:string">
<xs:pattern value="([a-zA-Z_]([\-_a-zA-Z0-9])*){1,39}"/>
</xs:restriction>
</xs:simpleType>
<xs:complexType name="WORKFLOW-APP">
<xs:sequence>
<xs:element name="parameters" type="workflow:PARAMETERS"
minOccurs="0" maxOccurs="1"/>
<xs:element name="global" type="workflow:GLOBAL" minOccurs="0"
maxOccurs="1"/>
<xs:element name="credentials" type="workflow:CREDENTIALS"
minOccurs="0" maxOccurs="1"/>
<xs:element name="start" type="workflow:START" minOccurs="1"
maxOccurs="1"/>
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element name="decision" type="workflow:DECISION"
minOccurs="1" maxOccurs="1"/>
<xs:element name="fork" type="workflow:FORK" minOccurs="1"
maxOccurs="1"/>
<xs:element name="join" type="workflow:JOIN" minOccurs="1"
maxOccurs="1"/>
<xs:element name="kill" type="workflow:KILL" minOccurs="1"
maxOccurs="1"/>
<xs:element name="action" type="workflow:ACTION"
minOccurs="1" maxOccurs="1"/>
</xs:choice>
<xs:element name="end" type="workflow:END" minOccurs="1"
maxOccurs="1"/>
<xs:any namespace="uri:oozie:sla:0.1" minOccurs="0"
maxOccurs="1"/>
</xs:sequence>
<xs:attribute name="name" type="xs:string" use="required"/>
</xs:complexType>
<xs:complexType name="PARAMETERS">
<xs:sequence>
<xs:element name="property" minOccurs="1"
maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="name" minOccurs="1" maxOccurs="1"
type="xs:string"/>
<xs:element name="value" minOccurs="0"
maxOccurs="1" type="xs:string"/>
<xs:element name="description" minOccurs="0"
maxOccurs="1" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
<xs:complexType name="GLOBAL">
<xs:sequence>
<xs:element name="job-tracker" type="xs:string" minOccurs="0"
maxOccurs="1"/>
<xs:element name="name-node" type="xs:string" minOccurs="0"
maxOccurs="1"/>
<xs:element name="job-xml" type="xs:string" minOccurs="0"
maxOccurs="unbounded"/>
<xs:element name="configuration" type="workflow:CONFIGURATION"
minOccurs="0" maxOccurs="1"/>
</xs:sequence>
</xs:complexType>
<xs:element name="kill" type="workflow:KILL"/>
<xs:complexType name="KILL">
<xs:sequence>
<xs:element name="message" type="xs:string" minOccurs="1"
maxOccurs="1"/>
</xs:sequence>
<xs:attribute name="name" type="workflow:IDENTIFIER"
use="required"/>
</xs:complexType>
<xs:complexType name="ACTION_TRANSITION">
<xs:attribute name="to" type="workflow:IDENTIFIER" use="required"/>
</xs:complexType>
<xs:element name="map-reduce" type="workflow:MAP-REDUCE"/>
<xs:element name="pig" type="workflow:PIG"/>
<xs:element name="sub-workflow" type="workflow:SUB-WORKFLOW"/>
<xs:element name="fs" type="workflow:FS"/>
<xs:element name="java" type="workflow:JAVA"/>
<xs:complexType name="ACTION">
<xs:sequence>
<xs:choice minOccurs="1" maxOccurs="1">
<xs:element name="map-reduce" type="workflow:MAP-REDUCE"
minOccurs="1" maxOccurs="1"/>
<xs:element name="pig" type="workflow:PIG" minOccurs="1"
maxOccurs="1"/>
<xs:element name="sub-workflow" type="workflow:SUB-WORKFLOW" minOccurs="1" maxOccurs="1"/>
<xs:element name="fs" type="workflow:FS" minOccurs="1"
maxOccurs="1"/>
<xs:element name="java" type="workflow:JAVA" minOccurs="1"
maxOccurs="1"/>
<xs:any namespace="##other" minOccurs="1" maxOccurs="1"/>
</xs:choice>
<xs:element name="ok" type="workflow:ACTION_TRANSITION"
minOccurs="1" maxOccurs="1"/>
<xs:element name="error" type="workflow:ACTION_TRANSITION"
minOccurs="1" maxOccurs="1"/>
<xs:any namespace="uri:oozie:sla:0.1" minOccurs="0"
maxOccurs="1"/>
</xs:sequence>
<xs:attribute name="name" type="workflow:IDENTIFIER"
use="required"/>
<xs:attribute name="cred" type="xs:string"/>
<xs:attribute name="retry-max" type="xs:string"/>
<xs:attribute name="retry-interval" type="xs:string"/>
</xs:complexType>
<xs:complexType name="MAP-REDUCE">
<xs:sequence>
<xs:element name="job-tracker" type="xs:string" minOccurs="1"
maxOccurs="1"/>
<xs:element name="name-node" type="xs:string" minOccurs="1"
maxOccurs="1"/>
<xs:element name="prepare" type="workflow:PREPARE"
minOccurs="0" maxOccurs="1"/>
<xs:choice minOccurs="0" maxOccurs="1">
<xs:element name="streaming" type="workflow:STREAMING"
minOccurs="0" maxOccurs="1"/>
<xs:element name="pipes" type="workflow:PIPES"
minOccurs="0" maxOccurs="1"/>
</xs:choice>
</xs:sequence>
</xs:complexType>
<xs:complexType name="JAVA">
<xs:sequence>
<xs:element name="job-tracker" type="xs:string" minOccurs="1"
maxOccurs="1"/>
<xs:element name="name-node" type="xs:string" minOccurs="1"
maxOccurs="1"/>
<xs:element name="prepare" type="workflow:PREPARE"
minOccurs="0" maxOccurs="1"/>
<xs:element name="job-xml" type="xs:string" minOccurs="0"
maxOccurs="unbounded"/>
<xs:element name="configuration" type="workflow:CONFIGURATION"
minOccurs="0" maxOccurs="1"/>
<xs:element name="main-class" type="xs:string" minOccurs="1"
maxOccurs="1"/>
<xs:element name="java-opts" type="xs:string" minOccurs="0"
maxOccurs="1"/>
<xs:element name="arg" type="xs:string" minOccurs="0"
maxOccurs="unbounded"/>
<xs:element name="file" type="xs:string" minOccurs="0"
maxOccurs="unbounded"/>
<xs:element name="archive" type="xs:string" minOccurs="0"
maxOccurs="unbounded"/>
<xs:element name="capture-output" type="workflow:FLAG"
minOccurs="0" maxOccurs="1"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="FLAG"/>
<xs:complexType name="CONFIGURATION">
<xs:sequence>
<xs:element name="property" minOccurs="1"
maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="name" minOccurs="1" maxOccurs="1"
type="xs:string"/>
<xs:element name="value" minOccurs="1"
maxOccurs="1" type="xs:string"/>
<xs:element name="description" minOccurs="0"
maxOccurs="1" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
<xs:complexType name="STREAMING">
<xs:sequence>
<xs:element name="mapper" type="xs:string" minOccurs="0"
maxOccurs="1"/>
<xs:element name="reducer" type="xs:string" minOccurs="0"
maxOccurs="1"/>
<xs:element name="record-reader" type="xs:string" minOccurs="0"
maxOccurs="1"/>
<xs:element name="record-reader-mapping" type="xs:string"
minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="env" type="xs:string" minOccurs="0"
maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="PIPES">
<xs:sequence>
<xs:element name="map" type="xs:string" minOccurs="0"
maxOccurs="1"/>
<xs:element name="reduce" type="xs:string" minOccurs="0"
maxOccurs="1"/>
<xs:element name="inputformat" type="xs:string" minOccurs="0"
maxOccurs="1"/>
<xs:element name="partitioner" type="xs:string" minOccurs="0"
maxOccurs="1"/>
<xs:element name="writer" type="xs:string" minOccurs="0"
maxOccurs="1"/>
<xs:element name="program" type="xs:string" minOccurs="0"
maxOccurs="1"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:workflow="uri:oozie:workflow:0.3"
elementFormDefault="qualified"
targetNamespace="uri:oozie:workflow:0.3">
<xs:element name="workflow-app" type="workflow:WORKFLOW-APP"/>
<xs:simpleType name="IDENTIFIER">
<xs:restriction base="xs:string">
<xs:pattern value="([a-zA-Z_]([\-_a-zA-Z0-9])*){1,39}"/>
</xs:restriction>
</xs:simpleType>
<xs:complexType name="WORKFLOW-APP">
<xs:sequence>
<xs:element name="credentials" type="workflow:CREDENTIALS"
minOccurs="0" maxOccurs="1"/>
<xs:element name="start" type="workflow:START" minOccurs="1"
maxOccurs="1"/>
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element name="decision" type="workflow:DECISION"
minOccurs="1" maxOccurs="1"/>
<xs:element name="fork" type="workflow:FORK" minOccurs="1"
maxOccurs="1"/>
<xs:element name="join" type="workflow:JOIN" minOccurs="1"
maxOccurs="1"/>
<xs:element name="kill" type="workflow:KILL" minOccurs="1"
maxOccurs="1"/>
<xs:element name="action" type="workflow:ACTION"
minOccurs="1" maxOccurs="1"/>
</xs:choice>
<xs:element name="end" type="workflow:END" minOccurs="1"
maxOccurs="1"/>
<xs:any namespace="uri:oozie:sla:0.1" minOccurs="0"
maxOccurs="1"/>
</xs:sequence>
<xs:attribute name="name" type="workflow:IDENTIFIER"
use="required"/>
</xs:complexType>
<xs:complexType name="START">
<xs:attribute name="to" type="workflow:IDENTIFIER" use="required"/>
</xs:complexType>
<xs:complexType name="END">
<xs:attribute name="name" type="workflow:IDENTIFIER"
use="required"/>
</xs:complexType>
<xs:complexType name="DECISION">
<xs:sequence>
<xs:element name="switch" type="workflow:SWITCH" minOccurs="1"
maxOccurs="1"/>
</xs:sequence>
<xs:attribute name="name" type="workflow:IDENTIFIER"
use="required"/>
</xs:complexType>
<xs:element name="switch" type="workflow:SWITCH"/>
<xs:complexType name="SWITCH">
<xs:sequence>
<xs:sequence>
<xs:element name="case" type="workflow:CASE" minOccurs="1"
maxOccurs="unbounded"/>
<xs:element name="default" type="workflow:DEFAULT"
minOccurs="1" maxOccurs="1"/>
</xs:sequence>
</xs:sequence>
</xs:complexType>
<xs:complexType name="CASE">
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="to" type="workflow:IDENTIFIER"
use="required"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
<xs:complexType name="DEFAULT">
<xs:attribute name="to" type="workflow:IDENTIFIER" use="required"/>
</xs:complexType>
<xs:complexType name="FORK_TRANSITION">
<xs:attribute name="start" type="workflow:IDENTIFIER"
use="required"/>
</xs:complexType>
<xs:complexType name="FORK">
<xs:sequence>
<xs:element name="path" type="workflow:FORK_TRANSITION"
minOccurs="2" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="name" type="workflow:IDENTIFIER"
use="required"/>
</xs:complexType>
<xs:complexType name="JOIN">
<xs:attribute name="name" type="workflow:IDENTIFIER"
use="required"/>
<xs:attribute name="to" type="workflow:IDENTIFIER" use="required"/>
</xs:complexType>
<xs:element name="kill" type="workflow:KILL"/>
<xs:complexType name="KILL">
<xs:sequence>
<xs:element name="message" type="xs:string" minOccurs="1"
maxOccurs="1"/>
</xs:sequence>
<xs:attribute name="name" type="workflow:IDENTIFIER"
use="required"/>
</xs:complexType>
<xs:complexType name="WORKFLOW-APP">
<xs:sequence>
<xs:element name="start" type="workflow:START" minOccurs="1"
maxOccurs="1"/>
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element name="decision" type="workflow:DECISION"
minOccurs="1" maxOccurs="1"/>
<xs:element name="fork" type="workflow:FORK" minOccurs="1"
maxOccurs="1"/>
<xs:element name="join" type="workflow:JOIN" minOccurs="1"
maxOccurs="1"/>
<xs:element name="kill" type="workflow:KILL" minOccurs="1"
maxOccurs="1"/>
<xs:element name="action" type="workflow:ACTION"
minOccurs="1" maxOccurs="1"/>
</xs:choice>
<xs:element name="end" type="workflow:END" minOccurs="1"
maxOccurs="1"/>
<xs:any namespace="uri:oozie:sla:0.1" minOccurs="0"
maxOccurs="1"/>
</xs:sequence>
<xs:attribute name="name" type="workflow:IDENTIFIER"
use="required"/>
</xs:complexType>
<xs:complexType name="START">
<xs:attribute name="to" type="workflow:IDENTIFIER" use="required"/>
</xs:complexType>
<xs:complexType name="END">
<xs:attribute name="name" type="workflow:IDENTIFIER"
use="required"/>
</xs:complexType>
<xs:complexType name="DECISION">
<xs:sequence>
<xs:element name="switch" type="workflow:SWITCH" minOccurs="1"
maxOccurs="1"/>
</xs:sequence>
<xs:attribute name="name" type="workflow:IDENTIFIER"
use="required"/>
</xs:complexType>
<xs:element name="switch" type="workflow:SWITCH"/>
<xs:complexType name="SWITCH">
<xs:sequence>
<xs:sequence>
<xs:element name="case" type="workflow:CASE" minOccurs="1"
maxOccurs="unbounded"/>
<xs:element name="default" type="workflow:DEFAULT"
minOccurs="1" maxOccurs="1"/>
</xs:sequence>
</xs:sequence>
</xs:complexType>
<xs:complexType name="CASE">
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="to" type="workflow:IDENTIFIER"
use="required"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
<xs:complexType name="DEFAULT">
<xs:attribute name="to" type="workflow:IDENTIFIER" use="required"/>
</xs:complexType>
<xs:complexType name="FORK_TRANSITION">
<xs:attribute name="start" type="workflow:IDENTIFIER"
use="required"/>
</xs:complexType>
<xs:complexType name="FORK">
<xs:sequence>
<xs:element name="path" type="workflow:FORK_TRANSITION"
minOccurs="2" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="name" type="workflow:IDENTIFIER"
use="required"/>
</xs:complexType>
<xs:complexType name="JOIN">
<xs:attribute name="name" type="workflow:IDENTIFIER"
use="required"/>
<xs:attribute name="to" type="workflow:IDENTIFIER" use="required"/>
</xs:complexType>
<xs:element name="kill" type="workflow:KILL"/>
<xs:complexType name="KILL">
<xs:sequence>
<xs:element name="message" type="xs:string" minOccurs="1"
maxOccurs="1"/>
</xs:sequence>
<xs:attribute name="name" type="workflow:IDENTIFIER"
use="required"/>
</xs:complexType>
<xs:complexType name="ACTION_TRANSITION">
<xs:attribute name="to" type="workflow:IDENTIFIER" use="required"/>
</xs:complexType>
<xs:element name="map-reduce" type="workflow:MAP-REDUCE"/>
<xs:element name="pig" type="workflow:PIG"/>
<xs:element name="sub-workflow" type="workflow:SUB-WORKFLOW"/>
<xs:element name="fs" type="workflow:FS"/>
<xs:element name="java" type="workflow:JAVA"/>
<xs:complexType name="ACTION">
<xs:sequence>
<xs:choice minOccurs="1" maxOccurs="1">
<xs:element name="map-reduce" type="workflow:MAP-REDUCE"
minOccurs="1" maxOccurs="1"/>
<xs:element name="pig" type="workflow:PIG" minOccurs="1"
maxOccurs="1"/>
<xs:element name="sub-workflow" type="workflow:SUB-WORKFLOW" minOccurs="1" maxOccurs="1"/>
<xs:element name="fs" type="workflow:FS" minOccurs="1"
maxOccurs="1"/>
<xs:element name="java" type="workflow:JAVA" minOccurs="1"
maxOccurs="1"/>
<xs:any namespace="##other" minOccurs="1" maxOccurs="1"/>
</xs:choice>
<xs:element name="ok" type="workflow:ACTION_TRANSITION"
minOccurs="1" maxOccurs="1"/>
<xs:element name="error" type="workflow:ACTION_TRANSITION"
minOccurs="1" maxOccurs="1"/>
<xs:any namespace="uri:oozie:sla:0.1" minOccurs="0"
maxOccurs="1"/>
</xs:sequence>
<xs:attribute name="name" type="workflow:IDENTIFIER"
use="required"/>
</xs:complexType>
<xs:complexType name="MAP-REDUCE">
<xs:sequence>
<xs:element name="job-tracker" type="xs:string" minOccurs="1"
maxOccurs="1"/>
<xs:element name="name-node" type="xs:string" minOccurs="1"
maxOccurs="1"/>
<xs:element name="prepare" type="workflow:PREPARE"
minOccurs="0" maxOccurs="1"/>
<xs:choice minOccurs="0" maxOccurs="1">
<xs:element name="streaming" type="workflow:STREAMING"
minOccurs="0" maxOccurs="1"/>
<xs:element name="pipes" type="workflow:PIPES"
minOccurs="0" maxOccurs="1"/>
</xs:choice>
<xs:element name="job-xml" type="xs:string" minOccurs="0"
maxOccurs="1"/>
<xs:element name="configuration" type="workflow:CONFIGURATION"
minOccurs="0" maxOccurs="1"/>
<xs:element name="file" type="xs:string" minOccurs="0"
maxOccurs="unbounded"/>
<xs:element name="archive" type="xs:string" minOccurs="0"
maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="PIG">
<xs:sequence>
<xs:element name="job-tracker" type="xs:string" minOccurs="1"
maxOccurs="1"/>
<xs:element name="name-node" type="xs:string" minOccurs="1"
maxOccurs="1"/>
<xs:element name="prepare" type="workflow:PREPARE"
minOccurs="0" maxOccurs="1"/>
<xs:element name="job-xml" type="xs:string" minOccurs="0"
maxOccurs="1"/>
<xs:element name="configuration" type="workflow:CONFIGURATION"
minOccurs="0" maxOccurs="1"/>
<xs:element name="script" type="xs:string" minOccurs="1"
maxOccurs="1"/>
<xs:element name="param" type="xs:string" minOccurs="0"
maxOccurs="unbounded"/>
<xs:element name="argument" type="xs:string" minOccurs="0"
maxOccurs="unbounded"/>
<xs:element name="file" type="xs:string" minOccurs="0"
maxOccurs="unbounded"/>
<xs:element name="archive" type="xs:string" minOccurs="0"
maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
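<!-- A minimal, hypothetical pig action conforming to the PIG type above;
the script name and parameter values are examples only: -->
<!--
<action name="pigjob">
<pig>
<job-tracker>${jobtracker}</job-tracker>
<name-node>${namenode}</name-node>
<script>id.pig</script>
<param>INPUT=/usr/foo/input</param>
<param>OUTPUT=/usr/foo/output</param>
</pig>
<ok to="end"/>
<error to="kill"/>
</action>
-->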
<xs:complexType name="SUB-WORKFLOW">
<xs:sequence>
<xs:element name="app-path" type="xs:string" minOccurs="1"
maxOccurs="1"/>
<xs:element name="propagate-configuration" type="workflow:FLAG"
minOccurs="0" maxOccurs="1"/>
<xs:element name="configuration" type="workflow:CONFIGURATION"
minOccurs="0" maxOccurs="1"/>
</xs:sequence>
</xs:complexType>
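<!-- A hypothetical sub-workflow action body conforming to the SUB-WORKFLOW
type above; the application path and property values are illustrative: -->
<!--
<sub-workflow>
<app-path>hdfs://foo:9000/usr/bar/childwf</app-path>
<propagate-configuration/>
<configuration>
<property>
<name>input.dir</name>
<value>/usr/foo/input</value>
</property>
</configuration>
</sub-workflow>
-->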
<xs:complexType name="FS">
<xs:sequence>
<xs:element name="delete" type="workflow:DELETE" minOccurs="0"
maxOccurs="unbounded"/>
<xs:element name="mkdir" type="workflow:MKDIR" minOccurs="0"
maxOccurs="unbounded"/>
<xs:element name="move" type="workflow:MOVE" minOccurs="0"
maxOccurs="unbounded"/>
<xs:element name="chmod" type="workflow:CHMOD" minOccurs="0"
maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="SLA-DEFINE">
<xs:sequence>
<xs:element name="app-name" type="xs:string" minOccurs="1"
maxOccurs="1" />
<xs:element name="nominal-time" type="xs:string" minOccurs="1"
maxOccurs="1" />
<xs:element name="should-start" type="xs:string" minOccurs="1"
maxOccurs="1" />
<xs:element name="should-end" type="xs:string" minOccurs="1"
maxOccurs="1" />
<xs:element name="parent-client-id" type="xs:string"
minOccurs="0" maxOccurs="1" />
<xs:element name="parent-sla-id" type="xs:string"
minOccurs="0" maxOccurs="1" />
<xs:element name="notification-msg" type="xs:string"
minOccurs="0" maxOccurs="1" />
<xs:element name="alert-contact" type="xs:string"
minOccurs="1" maxOccurs="1" />
<xs:element name="dev-contact" type="xs:string"
minOccurs="1" maxOccurs="1" />
<xs:element name="qa-contact" type="xs:string"
minOccurs="1" maxOccurs="1" />
<xs:element name="se-contact" type="xs:string"
minOccurs="1" maxOccurs="1" />
<xs:element name="alert-frequency" type="sla:alert-frequencyType"
minOccurs="0" maxOccurs="1" />
<xs:element name="alert-percentage" type="sla:alert-percentageType"
minOccurs="0" maxOccurs="1" />
</xs:sequence>
</xs:complexType>
<xs:complexType name="DEFAULT">
<xs:attribute name="to" type="workflow:IDENTIFIER" use="required"/>
</xs:complexType>
<xs:complexType name="FORK_TRANSITION">
<xs:attribute name="start" type="workflow:IDENTIFIER"
use="required"/>
</xs:complexType>
<xs:complexType name="FORK">
<xs:sequence>
<xs:element name="path" type="workflow:FORK_TRANSITION"
minOccurs="2" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="name" type="workflow:IDENTIFIER"
use="required"/>
</xs:complexType>
<xs:complexType name="JOIN">
<xs:attribute name="name" type="workflow:IDENTIFIER"
use="required"/>
<xs:attribute name="to" type="workflow:IDENTIFIER" use="required"/>
</xs:complexType>
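<!-- An illustrative fork/join pair conforming to the FORK and JOIN types
above; all node and path names are hypothetical: -->
<!--
<fork name="forking">
<path start="firstparalleljob"/>
<path start="secondparalleljob"/>
</fork>
<join name="joining" to="nextaction"/>
-->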
<xs:element name="kill" type="workflow:KILL"/>
<xs:complexType name="KILL">
<xs:sequence>
<xs:element name="message" type="xs:string" minOccurs="1"
maxOccurs="1"/>
</xs:sequence>
<xs:attribute name="name" type="workflow:IDENTIFIER"
use="required"/>
</xs:complexType>
<xs:complexType name="ACTION_TRANSITION">
<xs:attribute name="to" type="workflow:IDENTIFIER" use="required"/>
</xs:complexType>
<xs:element name="map-reduce" type="workflow:MAP-REDUCE"/>
<xs:element name="pig" type="workflow:PIG"/>
<xs:element name="ssh" type="workflow:SSH"/>
<xs:element name="sub-workflow" type="workflow:SUB-WORKFLOW"/>
<xs:element name="fs" type="workflow:FS"/>
<xs:element name="java" type="workflow:JAVA"/>
<xs:complexType name="ACTION">
<xs:sequence>
<xs:choice minOccurs="1" maxOccurs="1">
<xs:element name="map-reduce" type="workflow:MAP-REDUCE"
minOccurs="1" maxOccurs="1"/>
<xs:element name="pig" type="workflow:PIG" minOccurs="1"
maxOccurs="1"/>
<xs:element name="ssh" type="workflow:SSH" minOccurs="1"
maxOccurs="1"/>
<xs:element name="sub-workflow" type="workflow:SUB-WORKFLOW"
minOccurs="1" maxOccurs="1"/>
<xs:element name="fs" type="workflow:FS" minOccurs="1"
maxOccurs="1"/>
<xs:element name="java" type="workflow:JAVA" minOccurs="1"
maxOccurs="1"/>
<xs:any namespace="##other" minOccurs="1" maxOccurs="1"/>
</xs:choice>
<xs:element name="ok" type="workflow:ACTION_TRANSITION"
minOccurs="1" maxOccurs="1"/>
<xs:element name="error" type="workflow:ACTION_TRANSITION"
minOccurs="1" maxOccurs="1"/>
</xs:sequence>
<xs:attribute name="name" type="workflow:IDENTIFIER"
use="required"/>
</xs:complexType>
<xs:complexType name="SUB-WORKFLOW">
<xs:sequence>
<xs:element name="app-path" type="xs:string" minOccurs="1"
maxOccurs="1"/>
<xs:element name="propagate-configuration" type="workflow:FLAG"
minOccurs="0" maxOccurs="1"/>
<xs:element name="configuration" type="workflow:CONFIGURATION"
minOccurs="0" maxOccurs="1"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="FS">
<xs:sequence>
<xs:element name="delete" type="workflow:DELETE" minOccurs="0"
maxOccurs="unbounded"/>
<xs:element name="mkdir" type="workflow:MKDIR" minOccurs="0"
maxOccurs="unbounded"/>
<xs:element name="move" type="workflow:MOVE" minOccurs="0"
maxOccurs="unbounded"/>
<xs:element name="chmod" type="workflow:CHMOD" minOccurs="0"
maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
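<!-- A hypothetical fs action body conforming to the FS type above; all
paths and the permission value are illustrative: -->
<!--
<fs>
<delete path="hdfs://foo:9000/usr/bar/temp"/>
<mkdir path="hdfs://foo:9000/usr/bar/out"/>
<move source="hdfs://foo:9000/usr/bar/data" target="hdfs://foo:9000/usr/bar/archive"/>
<chmod path="hdfs://foo:9000/usr/bar/out" permissions="755" dir-files="true"/>
</fs>
-->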
<xs:complexType name="JAVA">
<xs:sequence>
<xs:element name="job-tracker" type="xs:string" minOccurs="1"
maxOccurs="1"/>
<xs:element name="name-node" type="xs:string" minOccurs="1"
maxOccurs="1"/>
<xs:element name="prepare" type="workflow:PREPARE"
minOccurs="0" maxOccurs="1"/>
<xs:element name="job-xml" type="xs:string" minOccurs="0"
maxOccurs="1"/>
<xs:element name="configuration" type="workflow:CONFIGURATION"
minOccurs="0" maxOccurs="1"/>
<xs:element name="main-class" type="xs:string" minOccurs="1"
maxOccurs="1"/>
<xs:element name="java-opts" type="xs:string" minOccurs="0"
maxOccurs="1"/>
<xs:element name="arg" type="xs:string" minOccurs="0"
maxOccurs="unbounded"/>
<xs:element name="file" type="xs:string" minOccurs="0"
maxOccurs="unbounded"/>
<xs:element name="archive" type="xs:string" minOccurs="0"
maxOccurs="unbounded"/>
<xs:element name="capture-output" type="workflow:FLAG"
minOccurs="0" maxOccurs="1"/>
</xs:sequence>
</xs:complexType>
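<!-- A hypothetical java action body conforming to the JAVA type above;
the class name, JVM options and argument are examples only: -->
<!--
<java>
<job-tracker>${jobtracker}</job-tracker>
<name-node>${namenode}</name-node>
<main-class>org.apache.example.MyMain</main-class>
<java-opts>-Xmx512m</java-opts>
<arg>argument1</arg>
<capture-output/>
</java>
-->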
<xs:complexType name="FLAG"/>
<xs:complexType name="CONFIGURATION">
<xs:sequence>
<xs:element name="property" minOccurs="1"
maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="name" minOccurs="1" maxOccurs="1"
type="xs:string"/>
<xs:element name="value" minOccurs="1"
maxOccurs="1" type="xs:string"/>
<xs:element name="description" minOccurs="0"
maxOccurs="1" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
<xs:complexType name="STREAMING">
<xs:sequence>
<xs:element name="mapper" type="xs:string" minOccurs="0"
maxOccurs="1"/>
<xs:element name="reducer" type="xs:string" minOccurs="0"
maxOccurs="1"/>
<xs:element name="record-reader" type="xs:string" minOccurs="0"
maxOccurs="1"/>
<xs:element name="record-reader-mapping" type="xs:string"
minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="env" type="xs:string" minOccurs="0"
maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
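<!-- A hypothetical streaming block conforming to the STREAMING type above,
usable inside a map-reduce action; the mapper and reducer commands are
illustrative: -->
<!--
<streaming>
<mapper>/bin/cat</mapper>
<reducer>/usr/bin/wc</reducer>
</streaming>
-->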
<xs:complexType name="PIPES">
<xs:sequence>
<xs:element name="map" type="xs:string" minOccurs="0"
maxOccurs="1"/>
<xs:element name="reduce" type="xs:string" minOccurs="0"
maxOccurs="1"/>
<xs:element name="inputformat" type="xs:string" minOccurs="0"
maxOccurs="1"/>
<xs:element name="partitioner" type="xs:string" minOccurs="0"
maxOccurs="1"/>
<xs:element name="writer" type="xs:string" minOccurs="0"
maxOccurs="1"/>
<xs:element name="program" type="xs:string" minOccurs="0"
maxOccurs="1"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="PREPARE">
<xs:sequence>
<xs:element name="delete" type="workflow:DELETE" minOccurs="0"
maxOccurs="unbounded"/>
<xs:element name="mkdir" type="workflow:MKDIR" minOccurs="0"
maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
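<!-- A hypothetical prepare block conforming to the PREPARE type above; a
common pattern is deleting and recreating the output directory so the
action can be safely retried. The path is an example: -->
<!--
<prepare>
<delete path="hdfs://foo:9000/usr/foo/output"/>
<mkdir path="hdfs://foo:9000/usr/foo/output"/>
</prepare>
-->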
<xs:complexType name="DELETE">
<xs:attribute name="path" type="xs:string" use="required"/>
</xs:complexType>
<xs:complexType name="MKDIR">
<xs:attribute name="path" type="xs:string" use="required"/>
</xs:complexType>
<xs:complexType name="MOVE">
<xs:attribute name="source" type="xs:string" use="required"/>
<xs:attribute name="target" type="xs:string" use="required"/>
</xs:complexType>
<xs:complexType name="CHMOD">
<xs:attribute name="path" type="xs:string" use="required" />
<xs:attribute name="permissions" type="xs:string" use="required" />
<xs:attribute name="dir-files" type="xs:string" />
</xs:complexType>
</xs:schema>
</fork>
<action name="secondjob">
<map-reduce>
<job-tracker>${jobtracker}</job-tracker>
<name-node>${namenode}</name-node>
<configuration>
<property>
<name>mapred.mapper.class</name>
<value>org.apache.hadoop.example.IdMapper</value>
</property>
<property>
<name>mapred.reducer.class</name>
<value>org.apache.hadoop.example.IdReducer</value>
</property>
<property>
<name>mapred.map.tasks</name>
<value>1</value>
</property>
<property>
<name>mapred.input.dir</name>
<value>/usr/foo/${wf:id()}/temp1</value>
</property>
<property>
<name>mapred.output.dir</name>
<value>/usr/foo/${wf:id()}/temp2</value>
</property>
</configuration>
</map-reduce>
<ok to="join" />
<error to="kill" />
</action>
<action name="thirdjob">
<map-reduce>
<job-tracker>${jobtracker}</job-tracker>
<name-node>${namenode}</name-node>
<configuration>
<property>
<name>mapred.mapper.class</name>
<value>org.apache.hadoop.example.IdMapper</value>
</property>
<property>
<name>mapred.reducer.class</name>
<value>org.apache.hadoop.example.IdReducer</value>
</property>
<property>
<name>mapred.map.tasks</name>
<value>1</value>
</property>
<property>
<name>mapred.input.dir</name>
<value>/usr/foo/${wf:id()}/temp1</value>
</property>
<property>
<name>mapred.output.dir</name>
<value>/usr/foo/${wf:id()}/temp3</value>
</property>
</configuration>
</map-reduce>
<ok to="join" />
<error to="kill" />
</action>