Parallel Concurrent Processing
Mike Swing
TruTek
mswing@trutek.com
RMOUG 2009
1

Conclusions
You don't need RAC to use Parallel Concurrent Processing (PCP)!
If you have PCP enabled, secondary nodes must be defined during the upgrade to R12.
Tuning of TCP, SQL*Net and PMON parameters can minimize PCP failover time.
Implement Failover Sensitive Workshifts.

Concurrent Processing Server


Allows scheduling of batch jobs, or Requests in Oracle terms.
Processes concurrent programs as Requests.
Requests can be grouped together into Request Sets.
Different types of concurrent managers handle different types of requests.
A concurrent program can be assigned to a responsibility, and that responsibility can be assigned to users, giving them permission to run the concurrent program.
Concurrent managers may have limits on the concurrent programs that can be run, and the times that they can be started.
Requests have priorities, a status, and log and out files.
3

Definitions

CP => Concurrent Processing


DCD => Dead Connection Detection
ICM => Internal Concurrent Manager
IM => Internal Monitor
CRM => Conflict Resolution Manager
PCP => Parallel Concurrent Processing
PMON => Process Monitor for ICM
4

Concurrent Request

Phase and Status of Concurrent Requests

Phase | Status | Description - Action
Pending | Normal | The request is waiting to be picked up by the next available manager.
Pending | Standby | Waiting for CRM to resolve conflict. CRM could be slow or an incompatible program is running.
Running | Normal | The request is running normally.
Completed | Normal | The request has finished successfully.
Completed | Error | The request has finished with an error. Check the logs.
Completed | Warning | The request has finished with a warning. Check the logs.
Inactive | No Manager | The request won't run without a manager. Specialization rules aren't configured properly.
6

PCP Failover
[Diagram: database and database listener on DB node RH8; PCP nodes RH7, RH8 and RH9, each with a SQL*Net client; sqlnet.ora.]


TCP_KEEPALIVE takes 240 seconds before issuing DCD

Concurrent Managers

Manager Type | Service Instance | Program
Internal Concurrent Manager | Internal Manager | FNDLIBR
Conflict Resolution Manager | Conflict Resolution Manager | FNDCRM
Internal Monitor | Internal Monitor:Node | FNDIMON
 | Service Manager: Node | FNDSM
Concurrent Manager | Standard Manager | FNDLIBR
Concurrent Manager | Inventory Manager | INVLIBR
Concurrent Manager | Session History Cleanup | FNDLIBR
Concurrent Manager | PA Streamline Manager | PALIBR
Transaction Manager | CRP Inquiry Manager | CYQLIB
Transaction Manager | FastFormula Transaction Manager | FFTM
Transaction Manager | PO Document Approval Manager | POXCON
Transaction Manager | Transaction Manager | FNDTMTST
 | Scheduler/Prerelease Manager | FNDSVC
 | OAM Generic Collection Service:Node | FNDSVC
9

Concurrent Processing
1. The Concurrent Processing server communicates with the database using Oracle SQL*Net.
2. The concurrent program log or output file from a request is passed back as a report to the Report Review Agent.
3. The Report Review Agent passes a file containing the entire report to the forms server.
4. The Forms Services component passes the report back to the user's browser one page at a time. Profile options can be used to control the size of the files and pages passed, to suit report volume and available network capacity.

[Diagram: Web Browser with an HTML interface (Web Server) and a Java interface (JInitiator, Forms Server); Reports Server; Report Review Agent; ICM (FNDLIBR), Service Manager (FNDSM), Internal Monitor (FNDIMON), FNDCRM and Standard Manager (FNDLIBR) connected over SQL*Net; Requests, Log and Out (.rdx) files.]


10

Internal Concurrent Manager


The Internal Concurrent Manager (ICM) starts, sets the
number of active processes, monitors, and terminates all
other concurrent processes through requests made to
the Service Manager, including restarting any failed
processes.
The ICM also starts, stops, and restarts the Service
Manager for each node.
The ICM will perform process migration during an
instance or node failure.
The ICM will be active on a single node.
This is also true in a PCP environment, where the ICM
will be active on at least one node at all times.
11

Internal Concurrent Manager


The ICM really does not have any scheduling
responsibilities. It has NOTHING to do with scheduling
requests, or deciding which manager will run a particular
request. The function of the ICM is to run 'queue control'
requests; requests to startup or shutdown other
managers.
The ICM is responsible for startup and shutdown of the
whole concurrent processing facility, and it monitors the
other managers periodically, and restarts them if they
should go down. It can also take over the Conflict
Resolution manager's job, and resolve incompatibilities.
If the ICM itself should go down, requests will continue to
run normally, except for 'queue control' requests. Restart
the ICM with 'startmgr'; no need to kill the other
managers first.
12

Internal Concurrent Manager

13

Service Manager
FNDSM process - Communicates with the Internal Concurrent
Manager, Concurrent Manager, and non-Manager Service
processes.
The Service Manager (SM) spawns, and terminates manager and
service processes (these could be Forms, or Apache Listeners,
Metrics or Reports Server, and any other process controlled through
Generic Service Management).
When the ICM terminates, the SM that resides on the same node
as the ICM will also terminate.
The SM is chained to the ICM. The SM will only reinitialize after
termination when there is a function it needs to perform (start, or
stop a process), so there may be periods of time when the SM is not
active, and this would be normal.

14

Service Manager
All processes initialized by the SM inherit the
same environment as the SM.
The SM's environment is set by the APPSORA.env
file and the gsmstart.sh script.
The apps_<sid> listener must be active on each
CP node to support the SM connection to the
local instance.
There should be a Service Manager active on
each node where a Concurrent or non-Manager
service process will reside.
15

FNDSM Failure
FNDSM failover as noted in the concurrent manager log:
Could not contact Service Manager FNDSM_RH8_VIS. The TNS
alias could not be located, the listener process on RH8 could not
be contacted, or the listener failed to spawn the Service
Manager process.
Found dead process: spid=(962754), cpid=(2259578), Service
Instance=(1045)
CONC-SM TNS FAIL
Call to PingProcess failed for WFMAILER
CONC-SM TNS FAIL
Call to StopProcess failed for WFMAILER
CONC-SM TNS FAIL
Call to PingProcess failed for FNDCPGSC
16

FNDSM Failover
Found dead process: spid=(716870), cpid=(2259580), Service Instance=(2009)
Found dead process: spid=(1442020), cpid=(2259579), Service Instance=(2010)
Starting WFMGSMD Concurrent Manager : 15-AUG-2008 13:28:56
Starting WFMGSMDB Concurrent Manager : 15-AUG-2008 13:28:56
Starting WFALSNRSVCB Concurrent Manager : 15-AUG-2008 13:28:57
Starting STANDARD Concurrent Manager : 15-AUG-2008 13:30:31
Starting Internal Concurrent Manager Concurrent Manager : 15-AUG-2008 13:30:32
17

Internal Monitor
(FNDIMON process) - Communicates with the Internal Concurrent
Manager.
This manager/service is used to implement Parallel Concurrent
Processing.
You do not need to run this manager/service unless you are using
Parallel Concurrent Processing.
The Internal Monitor (IM) monitors the Internal Concurrent Manager,
and restarts any failed ICM on the local node. It monitors whether
the ICM is still running, and if the ICM crashes, it will restart it on
another node.
During a node failure in a PCP environment the IM will restart the
ICM on a surviving node (multiple ICM's may be started on multiple
nodes, but only the first ICM started will eventually remain active, all
others will gracefully terminate).
There should be an Internal Monitor defined on each node where
the ICM may migrate.
18

Standard Manager
(FNDLIBR process) - Communicates with
the Service Manager and any client
application process.
The Standard Manager is a worker
process that initiates, and executes client
requests on behalf of Applications batch,
and OLTP clients.

19

Standard Manager

20

Standard Manager - OAM

The Standard Manager is active on RH9, even though no primary node is defined.
Since no secondary node is defined, the Standard Manager will not fail over.
Failover Processes in the Work Shifts definition are the number of processes that will run (3) when the Standard Manager fails over to the secondary node.

21

Transaction Manager
A Transaction Manager communicates with the Service
Manager, and any user process initiated on behalf of
Forms, or a Standard Manager request.
A Transaction Manager:
Supports synchronous processing of requests from a client program.
Gets a request from a client program to run a server-side program synchronously.
Returns a status/results to the client program.
At runtime, it starts a number of these managers as defined.
Doesn't poll the concurrent request table for new requests.
Only 1 transaction manager is needed per database, not 1 per instance.
22

Transaction Managers

Some of the Transaction Managers in R12

23

Configuring Transaction Managers for RAC
R11i Transaction Managers use DBMS_PIPE
This does not work across RAC instances
RAC users must perform additional configuration
Requires complicated configuration or additional hardware

R12 Transaction Managers use AQ

Works across RAC Instances


Simplifies configuration
Reduces complexity
Profile Option can switch between mechanisms
DBMS_PIPE can be used for non-RAC users if performance
becomes an issue
24

Configuring Transaction Managers for RAC

Edit $ORACLE_HOME/dbs/<context_name>_ifile.ora and add these parameters:

_lm_global_posts=TRUE
_immediate_commit_propagation=TRUE

Change the profile option 'Concurrent: TM Transport Type' to 'QUEUE', and verify that the transaction manager works across the RAC instances. ATG RUP3 (4334965) or higher provides an option to use AQs in place of Pipes.
Profile Concurrent:TM Transport Type
Set to QUEUE
Pipes are more efficient but require a Transaction Manager to be
running on each DB Instance.
Navigate to Concurrent > Manager > Define screen, and set up
the primary and secondary node names for transaction managers.
25

Configuring Transaction Managers for RAC

Transaction Managers allow a client to make a request for a program to be run on the server immediately. The client then waits for the program to complete and can receive program results from the server. As the client and server are two separate database sessions, the communication between them has been handled using the DBMS_PIPE package.
Unfortunately the DBMS_PIPE package does not extend to
communications between sessions on different RAC instances. On
an Applications instance using RAC, the client and server are very
likely to be on different instances, causing transactions to time out
for long periods or fail completely. The current workaround is to
manually set up Transaction managers to connect to all RAC
instances, which not only takes up additional resources, it may
require additional middle-tier hardware or a complicated
configuration that is difficult to maintain.

26

R12 Transaction Managers


In R12, the Transaction Managers use the AQ
mechanism; the Transaction Managers work on
RAC when connected to either instance.
This greatly simplifies the configuration and
reduces the complexity for RAC administrators.
A Profile Option has been introduced to allow
users to switch between the two transports,
DBMS_PIPE or AQ.

27

Concurrent:PCP Instance Check


Concurrent processing provides database instance-sensitive
failover capabilities. When an instance is down,
all managers connecting to it switch to a secondary
middle-tier node.
However, if you prefer to handle instance failover
separately from such middle-tier failover (for example,
using TNS connection-time failover mechanism instead),
use the profile option Concurrent:PCP Instance Check.
When this profile option is set to OFF, Parallel
Concurrent Processing will not provide database
instance failover support; however, it will continue to
provide middle-tier node failover support when a node
goes down.
28

Conflict Resolution Manager

Concurrent managers read requests to start concurrent programs.


The Conflict Resolution Manager checks concurrent program
definitions for incompatibility rules.
If a program is identified as Run Alone, then the Conflict Resolution
Manager prevents the concurrent managers from starting other
programs in the same conflict domain.
When a program lists other programs as being incompatible with it, the
Conflict Resolution Manager prevents the program from starting until
any incompatible programs in the same domain have completed
running.
To enable or disable the Conflict Resolution Manager, use the system
profile option 'Concurrent: Use ICM'. Setting this to 'No' (the default) allows
the CRM to be started.
Setting it to 'Yes' causes the CRM to be shutdown and the Internal
Manager (ICM) will take over the conflict resolution duties.
If the CRM will not start (it is started automatically by the ICM), check
this profile option.
29

Conflict Resolution Manager


Use the system profile option 'Concurrent:
Use ICM'. 'No' allows the CRM to be started.
Setting it to 'Yes' causes the CRM to shut down.
The Internal Manager (ICM) will take over the
conflict resolution duties.
Using the ICM to resolve conflicts is not
recommended.
The CRM's sole purpose is to resolve conflicts,
while the ICM has other functions to perform as
well.
Setting this option to 'YES' is not recommended.
30

Generic Service Management

An E-Business Suite system depends on a variety of services, such
as Forms Listeners, HTTP Servers, Concurrent Managers, and
Workflow Mailers. These services are composed of one or more
processes. In the past, many of these processes had to be
individually started and monitored by system administrators.
Management of these processes is complicated, since these
services can be distributed across multiple host machines.
The introduction of Generic Service Management in Release 11i
helped simplify the management of these processes by providing a
fault tolerant service framework and a central management console
built into Oracle Applications Manager.
Service Management is an extension of Concurrent Processing, and
provides a framework for managing processes on multiple host
machines. With Service Management, virtually any application tier
service can be integrated into this framework.
Patch 2221688 introduces GSM.
31

GSM

32

Generic Services

33

GSM and Multiple Nodes


GSM enables users to manage Applications
services across multiple middle-tier nodes.
This includes services on Web/Forms nodes that
previously have had no concurrent processing
footprint.
Users configuring GSM in a multiple-node
system should be sure to have followed the
instructions for Parallel Concurrent Processing.
This includes setting the environment variable
APPLDCP=ON and assigning a primary node for
all defined managers and services (if not already
defined.)
34

Seeded GSM Services


When configuring GSM the following GSM
Services are seeded automatically:

Forms Listener
Metrics Server
Metrics Client
Reports Server
Apache Listener

LINUX users should not activate the Reports Server under GSM.
35

Starting GSM
Apps Listener:
listener.ora
gsmstart.sh
exec FNDSM

36

adcmctl.sh
adcmctl.sh calls:
startmgr.sh
batchmgr.sh
CONCSUB
FNDSVCRG

37

FNDSVCRG Service Controller Utility
FNDSVCRG is an executable introduced as a
part of the Seeded GSM Services. It provides
improved coordination between the GSM
monitoring of these services and their
command-line control scripts.
The $FND_TOP/bin/FNDSVCRG executable is
called from the adcmctl.sh control script before and
after the script starts or stops the service.
FNDSVCRG connects to the database using
JDBC and validates the configuration of the
Seeded GSM Service.
38

Verify GSM
To verify GSM is working, start the concurrent
managers.
Once GSM is enabled, the ICM uses Service
Managers to start all concurrent managers and
activated services.
If the ICM is successfully starting the managers,
then GSM has been configured properly.
If managers and/or services fail to start, errors
should appear in the ICM log file.
39

Service Manager Log


Each Service Manager maintains its own
log file named FNDSMxxxx.mgr, located in
the same directory as concurrent manager
log files.
If you cannot locate the Service Manager
log file, it is likely that the Service
Managers are not starting properly and
there is a configuration issue that needs
troubleshooting.
40

Test: Kill services and see if GSM restarts them

Kill FNDSM
applvis 9007 1 0 11:53 ? 00:00:00 FNDSM
applvis 9159 9155 0 11:55 ? 00:00:00 FNDLIBR
applvis 9161 5683 0 11:55 pts/3 00:00:00 grep FND
[applvis@rh9 scripts]$ kill -9 9007
[applvis@rh9 scripts]$ ps -ef |grep FND
applvis 9159 9155 0 11:55 ? 00:00:00 FNDLIBR
applvis 9169 1 0 11:55 ? 00:00:00 FNDSM
applvis 9249 5683 0 11:57 pts/3 00:00:00 grep FND

Kill FNDCRM
[applvis@rh9 scripts]$ ps -ef |grep FNDCRM
applvis 8886 1 0 11:52 ? 00:00:00 FNDCRM APPS/ZGA13053E1E1B7BA773417089054DA88F194EAC0D687728CC2551870E6B78C4B439EADB287342795115A88DBC85788CCB4 FND FNDCRM N 10 c LOCK Y RH9 1302318
[applvis@rh9 scripts]$ kill -9 8886
[applvis@rh9 scripts]$ ps -ef |grep FNDCRM
applvis 9457 9392 0 12:09 ? 00:00:00 FNDCRM APPS/ZG26430816FA3570354BC57DE47FF105D145F8DE226EFE58CE04B416633DCB901267BFECFA7585114F7090060EFE1147BE FND FNDCRM N 10 c LOCK Y RH9 1302343

Both of these services were started before I could enter the grep command to find the corresponding process.
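
The before/after comparison in this test can also be scripted. The following is a minimal Python sketch, not part of any Oracle tooling: the helper names `pids_for` and `was_respawned` are hypothetical, and the sample text is the `ps -ef` output from the FNDSM test above. It treats a service as respawned when the command is still present but under a different PID.

```python
def pids_for(ps_output, command):
    """Return PIDs of `ps -ef` rows whose executable name is exactly `command`."""
    pids = []
    for line in ps_output.splitlines():
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        fields = line.split(None, 7)
        if len(fields) == 8 and fields[7].split()[0] == command:
            pids.append(int(fields[1]))
    return pids


def was_respawned(before, after, command):
    """True if `command` is running afterwards under PIDs not seen before."""
    old = set(pids_for(before, command))
    new = set(pids_for(after, command))
    return bool(new) and old.isdisjoint(new)


# Sample data: the ps output captured before and after `kill -9 9007`.
BEFORE = """\
applvis 9007 1 0 11:53 ? 00:00:00 FNDSM
applvis 9159 9155 0 11:55 ? 00:00:00 FNDLIBR
applvis 9161 5683 0 11:55 pts/3 00:00:00 grep FND
"""
AFTER = """\
applvis 9159 9155 0 11:55 ? 00:00:00 FNDLIBR
applvis 9169 1 0 11:55 ? 00:00:00 FNDSM
applvis 9249 5683 0 11:57 pts/3 00:00:00 grep FND
"""

if __name__ == "__main__":
    print("FNDSM respawned:", was_respawned(BEFORE, AFTER, "FNDSM"))
    print("FNDLIBR respawned:", was_respawned(BEFORE, AFTER, "FNDLIBR"))
```

Matching on the first word of the command column keeps the `grep FND` row from being counted as a manager process.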
41

11i - Defining PCP Details

In Release 11i, the Secondary Node doesn't need to be filled in for failover to occur.

42

R12 PCP Details

In Release 12, failover won't occur if there is no Secondary Node defined.

43

R12 PCP Setup


The only manager set up to fail over is the Standard Manager.

44

R12 Manager Failover

45

Parallel Concurrent Processing


Parallel concurrent processing allows distribution of
concurrent managers across multiple nodes.
The benefits are better performance, availability and
scalability (load balancing).
Parallel Concurrent Processing (PCP) is activated along
with Generic Service Management (GSM); it cannot be
activated independently of GSM.
With parallel concurrent processing implemented with
GSM, the Internal Concurrent Manager (ICM) tries to
assign valid nodes for concurrent managers and other
service instances.

47

Parallel Concurrent Processing


There should be only one ICM and CRM,
at any given time, although the ICM and
CRM could be configured to run on
several of the nodes.
Concurrent Managers migrate to the
surviving node when one of the concurrent
nodes goes down.

48

Parallel Concurrent Processing


[Diagram: Web Browser with an HTML interface (Web Server) and a Java interface (JInitiator, Forms Server); Reports Server; two concurrent processing nodes, each drawn with ICM (FNDLIBR), Standard Manager (FNDLIBR), Internal Monitor (FNDIMON), FNDCRM, Service Manager (FNDSM), Report Review Agent, and Requests, Logs and Out (.rdx) files; both connected over SQL*Net to the database.]

What's wrong with this picture?

49

APPLDCP Profile Option


Starting with Release 11.5.10, FND.H, the APPLDCP environment
variable is ignored. R12 GSM requires the value of APPLDCP to be
set to ON. The value is hard-coded in afpcsq.lpc version 115.35,
thereby ignoring the value of APPLDCP.
As per ATG Development:
As of file "afpcsq.lpc" version 115.35 or higher, APPLDCP is internally
hard-coded to "ON" when the Generic Service Management (GSM) is
enabled--"keeping in mind, use of the GSM is required".
In short, at "afpcsq.lpc" version 115.35 or higher with the GSM enabled,
the setting of the APPLDCP environment variable is ignored--this is the
"default behavior on all R12 releases."
NOTE: As per ARU, "Patch 11i.FND.H" (3262159) and "Oracle
Applications Release 11.5.10" (3140000) contain "afpcsq.lpc" version
115.37.

From Note: 753678.1

50

PCP Failover Mechanisms

TCP keepalive
PMON ICM Process Monitor
Dead Connection Detection
Connection Failure Recovery R12
10g Timeout Parameters (untested)
sqlnet.inbound_connect_timeout (server)
sqlnet.send_timeout (client and/or server)
sqlnet.recv_timeout (client and/or server)
51

11i PCP Failure


TCP Failure
ICM Lock is released, FNDIMON pings
ICM node, if ping fails, check PMON
PMON detects a dead process, crashed
ICM
reviver.sh
DCD
52

R12 PCP Failure


TCP Failure
PMON detects a dead process
ICM Shutdown
Look for error messages ORA-3113, ORA-3114 or ORA-1041

reviver.sh
DCD
53

Reviver
[Flowchart: the ICM runs until it receives a shutdown request or loses its database connection. On a lost connection it spawns the reviver and starts to shut down. The reviver attempts to get a database connection, sleeping and retrying until it succeeds; it then kills the previous database session, starts the ICM if it has not already started, and exits.]

From the CM log file:
The ICM has lost its database connection and is shutting down.
Spawning reviver process to restart the ICM when the database becomes available again.
Spawned reviver process 10910.


54

reviver.log
The ICM has lost its database connection
and is shutting down.
Spawning reviver process to restart the ICM
when the database becomes available
again.
Spawned reviver process 10910.

55

TCP
TCP/IP is a connection-oriented protocol; TCP
implements packet timeout and retransmission
in an effort to guarantee the safe and sequenced
order of data packets.
If a timely acknowledgement is not received in
response to the probe packet, the TCP/IP stack
will retransmit the packet some number of times
before timing out.
After TCP/IP gives up, SQL*Net receives
notification that the probe failed.
56

TCP Keepalive
At this time, client side SQL*Net connections do not enable
keepalive for TCP connections by default.
However, it is possible to enable this by adding the
ENABLE=BROKEN parameter to the SQL*Net connect
descriptor, for example in the TNS alias in the tnsnames.ora file.
**WARNING** Keepalive intervals are typically set to 2
hours or more (i.e., it can take more than 2 hours to
notice a dead server even if keepalive is enabled). To
make keepalive useful for PCP and TAF, the keepalive
interval needs to be reduced to a smaller value (such as
2 minutes).
If there are a lot of IDLE connections on your network, then
reducing keepalive can increase network traffic
significantly.
57

ENABLE=BROKEN
Sample TNS alias to enable keepalive (notice the
ENABLE=BROKEN clause)
VIS_BALANCE =
  (DESCRIPTION =
    (ENABLE = BROKEN)
    (ADDRESS_LIST =
      (LOAD_BALANCE = ON)
      (FAILOVER = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = rh8)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = rh6)(PORT = 1521))))
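
A hand-edited alias is easy to get wrong, since one missing parenthesis changes how the descriptor nests. The following is a small Python sketch of a sanity check; `check_descriptor` is a hypothetical helper, not an Oracle utility, and it only tests parenthesis balance and the presence of the ENABLE=BROKEN clause, not full TNS syntax.

```python
def check_descriptor(text):
    """Return (balanced, has_enable_broken) for a connect descriptor string."""
    depth = 0
    balanced = True
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ")" with no matching "("
                balanced = False
                break
    balanced = balanced and depth == 0
    # Ignore whitespace and case when looking for the keepalive clause.
    compact = "".join(text.split()).upper()
    return balanced, "(ENABLE=BROKEN)" in compact


# Illustrative descriptors using the host names from the sample alias.
GOOD = """(DESCRIPTION =
  (ENABLE = BROKEN)
  (ADDRESS_LIST =
    (LOAD_BALANCE = ON)
    (FAILOVER = ON)
    (ADDRESS = (PROTOCOL = TCP)(HOST = rh8)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = rh6)(PORT = 1521))))"""

# Same descriptor with the "(" before the first ADDRESS dropped and no ENABLE clause.
BAD = """(DESCRIPTION =
  (ADDRESS_LIST =
    ADDRESS = (PROTOCOL = TCP)(HOST = rh8)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = rh6)(PORT = 1521))))"""

if __name__ == "__main__":
    print(check_descriptor(GOOD))
    print(check_descriptor(BAD))
```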

58

TCP Keepalive
**WARNING** Keepalive intervals are
typically set to 2 hours or more (i.e., it can
take more than 2 hours to notice a dead
server even if keepalive is enabled).
To make keepalive useful for TAF, the
keepalive interval would need to be
reduced to a smaller value (such as 2
minutes). Note: 249213.1
59

TCP KeepAlive Parameters for Linux
tcp_keepalive_time: the time between the last data packet sent and the first keepalive probe
tcp_keepalive_intvl: the time between keepalive probes
tcp_keepalive_probes: the number of probes to be sent before declaring the connection dead

Default Settings
tcp_keepalive_time = 7200 seconds
tcp_keepalive_intvl = 75 seconds
tcp_keepalive_probes = 9

A total of 7875 seconds (7200 + 9 x 75), or 2 hours 11 minutes and 15 seconds.
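
The total above is the idle time plus one probe interval per probe. A short Python sketch of that arithmetic follows; the function names are illustrative only. It reproduces the default total and evaluates the tuned values of 200 / 20 / 2 used in this deck.

```python
def keepalive_detection_seconds(time_s, intvl_s, probes):
    """Worst-case seconds before TCP declares an idle connection dead."""
    return time_s + intvl_s * probes


def as_hms(seconds):
    """Split a number of seconds into (hours, minutes, seconds)."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return hours, minutes, secs


if __name__ == "__main__":
    default_total = keepalive_detection_seconds(7200, 75, 9)
    tuned_total = keepalive_detection_seconds(200, 20, 2)
    print("default:", default_total, "seconds =", as_hms(default_total))
    print("tuned:  ", tuned_total, "seconds =", as_hms(tuned_total))
```

The tuned result matches the 240 seconds quoted on the PCP Failover slide.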


60

TCP Keepalive
Initial Settings
tcp_keepalive_time = 200 secs
tcp_keepalive_intvl = 20
tcp_keepalive_probes = 2

After 200 seconds of idle time, TCP sends the first of 2 probes, 20 seconds apart.
If neither probe is answered, TCP notifies SQL*Net of the failure after about 240 seconds (200 + 2 x 20), and
SQL*Net removes the offending connection.
61

TCP Retries
tcp_retries1 (default: 3) The number of times TCP will
attempt to retransmit a packet on an established
connection normally, without the extra effort of getting
the network layers involved.
tcp_retries2 (default: 15) The maximum number of times
a TCP packet is retransmitted in established state before
giving up
tcp_syn_retries (default: 5) The maximum number of
times initial SYNs for an active TCP connection attempt
will be retransmitted. The default value of 5 corresponds
to approximately 180 seconds.
62

TCP Retries
Now let's consider changing the following
TCP parameters from their default values:
tcp_retries1 = 2
tcp_retries2 = 2
tcp_syn_retries = 2

In this example, the time to initiate the PCP failover was an average of 8 seconds after
changing these TCP parameters.
63

Disconnect TCP Connection from RH9
From the ICM log:
The Internal Concurrent Manager has encountered an error.
Review concurrent manager log file for more detailed information. : 12-JAN-2009 15:22:55
Shutting down Internal Concurrent Manager : 12-JAN-2009 15:22:55
12-JAN-2009 15:22:55
The ICM has lost its database connection and is shutting down.
Spawning reviver process to restart the ICM when the database
becomes available again.
Spawned reviver process 1541.
The VIS_0112@VIS internal concurrent manager has terminated with
status 1 - giving up.
Found dead process: spid=(17963), cpid=(1302176), ORA pid=(26),
manager=(0/1)
64

PMON & fnd_concurrent_queues

PMON updates the work_start column in the
fnd_concurrent_queues table every 4 PMON cycles.
fdpsrp() (running_processes correction):
ICM cannot obtain exclusive lock on
FND_CONCURRENT_QUEUES
Oracle error code returned: 1
This message is information and does not indicate a
problem with CP functionality.
remote call function (FNDIMON)
15-AUG-2008 10:06:02 - Function to call: PingProcess
65

PMON ICM Lock 11i


If the ICM lock is not available, FNDIMON will
now ping the node of the ICM.
If the ping succeeds, we conclude that the ICM is
fine.
What????
If the ping fails, we further check whether it has been over
quesiz pmon cycles since the ICM updated the
work_start column of fnd_concurrent_queues.
If it has been more than four pmon cycles, we
conclude that the ICM is dead.
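
The decision described on this slide can be written down as a small truth function. This Python sketch is only a model of the slide's wording, not FNDIMON's actual code; the function name and parameters are hypothetical, and the threshold defaults to the four cycles mentioned in the text (the slide also refers to it as quesiz).

```python
def icm_presumed_dead(ping_ok, cycles_since_work_start_update, max_cycles=4):
    """Model of the 11i check: is the ICM concluded to be dead?

    ping_ok: whether the ping of the ICM node succeeded.
    cycles_since_work_start_update: PMON cycles since the ICM last
        updated work_start in fnd_concurrent_queues.
    max_cycles: threshold from the slide (assumed to be 4 here).
    """
    if ping_ok:
        # A successful ping short-circuits everything else.
        return False
    return cycles_since_work_start_update > max_cycles


if __name__ == "__main__":
    # Node answers ping, but work_start has not been updated for 100 cycles.
    print(icm_presumed_dead(True, 100))
    # Node does not answer ping and work_start is stale.
    print(icm_presumed_dead(False, 5))
```

The first call shows the weakness flagged by the "What????" remark: a reachable node hides a stalled ICM, because the work_start check is never consulted once the ping succeeds.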
66

PMON found dead process


On RH9 the PMON found a dead process. The
PMON takes about 1 second to run, then sleeps for
2 minutes:
Process monitor session started : 18-JAN-2009 21:46:05
Found dead process: spid=(16977), cpid=(1321475), Service
Instance=(36543)
Process monitor session ended : 18-JAN-2009 21:46:06
The Internal Concurrent Manager has encountered an error.
Review concurrent manager log file for more detailed
information. : 18-JAN-2009 22:02:01
67

PMON node RH9 is down


From the ICM log:
Process monitor session started : 12-JAN-2009 15:18:27
Internal Concurrent Manager found node RH9 to be down. Adding it to the list of unavailable nodes.
CONC-SM TNS FAIL
Call to PingProcess failed for XDPCTRLS
68

PMON
Process monitor session started : 18-JAN-2009 22:38:57
CONC-SM TNS FAIL
Call to PingProcess failed for OAMGCS
18-JAN-2009 22:38:58 - Node:(RH7), Service Manager:(FNDSM_RH7_VIS) currently unreachable by TNS
Found dead process: spid=(11234), cpid=(1321563), ORA pid=(167), manager=(0/4)
Process monitor session ended : 18-JAN-2009 22:38:58
69

PMON
Shutting down Internal Concurrent Manager : 18-JAN-2009 22:02:01
18-JAN-2009 22:02:01
The ICM has lost its database connection and is
shutting down.
Spawning reviver process to restart the ICM when
the database becomes available again.
Spawned reviver process 10910.

70

PMON runs every 2 minutes


Process monitor session ended : 18-JAN-2009 21:49:05
Process monitor session started : 18-JAN-2009 21:51:05
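
The two-minute gap can be confirmed from the log timestamps themselves. A minimal Python sketch follows, assuming the DD-MON-YYYY HH:MM:SS format shown in these excerpts and an English locale for the month abbreviation; `parse_cm_timestamp` is a hypothetical helper name.

```python
from datetime import datetime


def parse_cm_timestamp(text):
    """Parse a concurrent manager log timestamp such as '18-JAN-2009 21:49:05'."""
    # %b matches the month abbreviation case-insensitively (English locale assumed).
    return datetime.strptime(text, "%d-%b-%Y %H:%M:%S")


def seconds_between(end_of_previous, start_of_next):
    """Seconds from one PMON session ending to the next one starting."""
    delta = parse_cm_timestamp(start_of_next) - parse_cm_timestamp(end_of_previous)
    return delta.total_seconds()


if __name__ == "__main__":
    gap = seconds_between("18-JAN-2009 21:49:05", "18-JAN-2009 21:51:05")
    print("PMON sleep between sessions:", gap, "seconds")
```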

71

Edit ICM Runtime Parameters

72

Edit PMON Parameters

73

Edit PMON Parameters

ICM parameters are read from batchmgr.sh when adcmctl.sh runs. Changing these parameters here does not change batchmgr.sh!

74

$FND_TOP/bin/batchmgr.sh
Make sure the PMON changes are made in the $FND_TOP/bin/batchmgr.sh file.
# FILENAME
#   batchmgr
# DESCRIPTION
#   fire up Internal Concurrent Manager process
# USAGE
#   batchmgr arg1=val1 arg2=val2 ...
#
#   Parameters may be sent via the environment.
#
# ARGUMENTS                                    DEFAULT
#   [appmgr|sysmgr]=username/password
#   [sleep=sleep_seconds]                      15
#   [mgrname=manager_name]                     icm
#   [logfile=log_filename]                     $FND_TOP/$APPLLOG/$mgrname.mgr
#   [restart=N|mim minutes between restarts]   N
#   [mailto="user1 user2..."]                  current user
#   [PRINTER=printer_name]
#   [pmon=iterations]                          4
#   [quesiz=pmon_iterations]                   1
#   [diag=Y|N]                                 N
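
How these parameters combine can be sketched with simple arithmetic. The Python below rests on an assumption that the header does not spell out: that `pmon` counts ICM sleep cycles between process monitor sessions, and that `quesiz` counts PMON cycles between queue checks. The function names are illustrative only.

```python
def pmon_interval_seconds(sleep_seconds, pmon_iterations):
    """Seconds between PMON sessions, assuming pmon counts ICM sleep cycles."""
    return sleep_seconds * pmon_iterations


def queue_check_interval_seconds(sleep_seconds, pmon_iterations, quesiz):
    """Seconds between queue checks, assuming quesiz counts PMON cycles."""
    return pmon_interval_seconds(sleep_seconds, pmon_iterations) * quesiz


if __name__ == "__main__":
    # Defaults listed in the header above: sleep=15, pmon=4, quesiz=1.
    print("defaults:", pmon_interval_seconds(15, 4), "seconds between PMON sessions")
    # A 30-second sleep with pmon=4 gives the two-minute PMON cycle seen in the logs.
    print("sleep=30:", pmon_interval_seconds(30, 4), "seconds between PMON sessions")
    print("queue check:", queue_check_interval_seconds(30, 4, 1), "seconds")
```

Under that assumption, lowering either `sleep` or `pmon` shortens the time before a dead process is noticed, at the cost of more frequent polling.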

75

reviver.log
reviver.sh starting up...
[ Mon Jan 12 20:02:15 MST 2009 ] - Read APPS username/password.
[ Mon Jan 12 20:02:45 MST 2009 ] - Attempting database connection...
[ Mon Jan 12 20:02:45 MST 2009 ] - Successful database connection.
[ Mon Jan 12 20:02:45 MST 2009 ] - Killing previous ICM session...
1 row updated.
Commit complete.
[ Mon Jan 12 20:02:45 MST 2009 ] - Looking for a running ICM process...
[ Mon Jan 12 20:02:45 MST 2009 ] - ICM now running, reviver.sh complete.

77

reviver.sh
reviver.sh code summary
Sleep 30
Test_connection
Kill_old_icm
Get session
Alter system kill session
Check_running_icm
Fnd_conc.ecm_alive
start_icm
startmgr.sh
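
The control flow in this summary can be sketched as a retry loop. The Python below is an illustration of the logic only, not a translation of reviver.sh: the function name and all four callables are hypothetical stand-ins for the script's connection test, session kill, ICM check and `startmgr.sh` call, and `max_attempts` is an addition so the sketch can terminate in a test.

```python
import time


def revive_icm(can_connect, kill_old_session, icm_is_running, start_icm,
               sleep_seconds=30, sleep=time.sleep, max_attempts=None):
    """Wait for the database, clean up the old ICM session, restart the ICM.

    Returns True once the ICM is known to be running, or False if
    max_attempts connection tests all failed.
    """
    attempts = 0
    while True:
        sleep(sleep_seconds)          # "Sleep 30" before each connection test
        attempts += 1
        if can_connect():             # "Test_connection"
            break
        if max_attempts is not None and attempts >= max_attempts:
            return False
    kill_old_session()                # "Kill_old_icm"
    if not icm_is_running():          # "Check_running_icm"
        start_icm()                   # "start_icm"
    return True


if __name__ == "__main__":
    # Simulated run: the database comes back on the third attempt.
    answers = iter([False, False, True])
    ok = revive_icm(
        can_connect=lambda: next(answers),
        kill_old_session=lambda: print("killing previous ICM session"),
        icm_is_running=lambda: False,
        start_icm=lambda: print("starting ICM"),
        sleep=lambda s: print("sleeping", s, "seconds"),
    )
    print("revived:", ok)
```

Passing the sleep function as a parameter keeps the sketch testable without real delays.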
78

Dead Connection Detection


Dead Connection Detection (DCD) is a
feature of SQL*Net 2.1 and later, including
Oracle Net8. DCD detects when a partner
in a SQL*Net V2 client/server or
server/server connection has terminated
unexpectedly, and releases the resources
associated with it.

79

Implement DCD
Implement by:
adding SQLNET.EXPIRE_TIME = 1 (Minutes)
to the sqlnet.ora file
If the connection is idle for the time interval
specified in minutes by the
SQLNET.EXPIRE_TIME parameter, the server-side
process sends a small 10-byte packet to the
client. The packet is sent using TCP/IP.
80

DCD ICM Lock


ICM and IM can use the DCD functionality
of the Network (TCP sqlnet).
ICM is a client process connected to a
DCD enabled DB dedicated server
process.
ICM holds the named PL/SQL Lock, the
ICM lock.
IM is continuously trying to check whether
it can get the same named PL/SQL Lock.
81

DCD ICM Lock


As soon as the ICM lock is released by the DB / DCD,
FNDIMON pings the ICM node, and the IM deduces that
the ICM has crashed.
If the ping succeeds, we conclude that the ICM is fine.
This is bad logic: the ICM can obviously be down even if TCP is working.

If the ping fails, FNDIMON determines whether it has been over four
pmon cycles since the ICM updated the work_start column
of fnd_concurrent_queues.
If it has been more than four pmon cycles, FNDIMON concludes
the ICM is dead.

DCD comes into the picture here, after the ICM has crashed
and the DB needs to identify that the ICM is gone.
The DB needs to clean up the dedicated server process
resources corresponding to the ICM client process.
82

FNDIMON has the ICM Lock


Check whether the ICM updated the work_start column of fnd_concurrent_queues.

Be aware that if a TCP failure is not detected, failover will not occur.
The following excerpt from a concurrent manager log shows:
fdpsrp() (running_processes correction):
ICM cannot obtain exclusive lock on FND_CONCURRENT_QUEUES
Oracle error code returned: 1
This message is information and does not indicate a problem with CP
functionality.
remote call function (FNDIMON)
15-AUG-2008 10:06:02 - Function to call: PingProcess

The PingProcess continues until the CP processes resume, or a TCP failure is detected and failover begins.

83

11i PCP Failure


TCP Failure
ICM lock is released; FNDIMON pings the ICM node; if the ping fails, check PMON
PMON detects a dead process (crashed ICM)
reviver.sh
DCD
84

R12 PCP Failure


TCP Failure
PMON detects a dead process
ICM Shutdown
Look for error messages ORA-3113, ORA-3114 or ORA-1041

reviver.sh
DCD
85

Test PCP Failover Parameters


Test to explore the effect of the DCD, PMON and TCP
failover methods.
Variables: sqlnet.expire_time, PMON sleep time and
number of cycles, and the following TCP
keepalive and retry parameters:
tcp_keepalive_time
tcp_keepalive_intvl
tcp_keepalive_probes
tcp_retries1 (default: 3, new value 2)
tcp_retries2 (default: 15, new value 2)
tcp_syn_retries (default: 5, new value 2)
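
To see why the keepalive settings matter, it helps to work out how long an idle, broken connection can go unnoticed. The sketch below assumes Linux keepalive behaviour: the first probe is sent after tcp_keepalive_time seconds of idleness, then one probe every tcp_keepalive_intvl seconds until tcp_keepalive_probes probes have gone unanswered. The function name and the "tuned" values are hypothetical and are not the values used in the tests.

```python
def keepalive_detection_secs(keepalive_time, keepalive_intvl, keepalive_probes):
    """Approximate seconds before an idle dead connection is dropped."""
    return keepalive_time + keepalive_intvl * keepalive_probes

# Common Linux defaults: 7200 secs idle, 75 secs between probes, 9 probes
print(keepalive_detection_secs(7200, 75, 9))

# Hypothetical tuned values, for illustration only
print(keepalive_detection_secs(60, 10, 3))
```

With the defaults the result is over two hours, which is why these parameters are candidates for tuning when failover time matters.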
86

Failover Test Results


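Columns: Failover time / Failback time, Expire_time, PMON Sleep, PMON Cycles,
tcp KA time, tcp KA intvl, tcp KA probes, tcp retries, tcp retries2, tcp syn retries

Failover / failback: 241 secs / ; Expire_time: 1 minute; PMON Sleep: 30 secs; remaining values: 200, 20, 15
Failover / failback: 250 secs / 50 secs; Expire_time: 5 minutes; PMON Sleep: 30 secs; remaining values: 200, 20, 15
Failover / failback: 262 secs / 100 secs; Expire_time: 10 minutes; PMON Sleep: 30 secs; remaining values: 200, 20, 15
Failover / failback: 300 secs / 75 secs; Expire_time: 1 minute; PMON Sleep: 15 secs; remaining values: 200, 20, 15
Failover / failback: 285 secs / 35 min; Expire_time: 10 minutes; PMON Sleep: 30 secs; remaining values: 1000, 60, 10, 15
Failover / failback: 8 secs / 105 secs; Expire_time: 1 minute; PMON Sleep: 30 secs; remaining values: 1000, 60, 10
Failover / failback: 10 secs / 42 secs; Expire_time: 1 minute; PMON Sleep: 30 secs; remaining values: 200, 20
Failover / failback: 7 secs / 40 secs; Expire_time: 10 minutes; PMON Sleep: 30 secs; remaining values: 200, 20
Failover / failback: 6 secs / 34 secs; Expire_time: 1 minute; PMON Sleep: 15 secs; remaining values: 200, 20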

87

All Services are UP

88

Concurrent Managers

Processes - Actual = 1 and Target = 1, manager is running


Processes - Actual = 0 and Target = 1, manager is running
89

Actual Processes = 0

Example of Actual Processes = 0;
in this example the CRM is not running.

90

PCP Setup

PCP setup: this screen is continued on the next slide


91

Primary and Secondary Nodes


Any concurrent programs not assigned to the Standard Manager will not fail over.
The CRM, ICM and Standard Manager will fail over.

92

TCP Failure

TCP disconnected at 2:57:25


10 seconds after the TCP connection was pulled, OAM reported the status above.
That is, it took OAM 10 seconds to register the failure of services on RH9.
93

CRM is DOWN

If any of the subordinate services fails, the failure rolls up to the
Dashboard.

94

CRM Failure

CRM has failed, Actual Processes = 0

95

PCP Failover from RH9 to RH7

Adding Node:(RH9), to unavailable list


Found dead process: spid=(9696), cpid=(1321449), ORA pid=(80), manager=(0/0)
Found dead process: spid=(9784), cpid=(1321458), ORA pid=(114), manager=(0/0)
Found dead process: spid=(9783), cpid=(1321457), ORA pid=(104), manager=(0/0)
Found running request 4413565 attached to dead manager process.
Attempting to restart request.
Internal Concurrent Manager found node RH9 to be down. Adding it to the list of
unavailable nodes.
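
When many managers die at once, it can be easier to pull the process IDs out of the log than to read them by eye. The following is a small sketch that assumes the "Found dead process" lines always have the layout shown above; the function name is hypothetical.

```python
import re

# Layout taken from the log lines shown on this slide
DEAD_PROCESS = re.compile(
    r"Found dead process: spid=\((\d+)\), cpid=\((\d+)\), "
    r"ORA pid=\((\d+)\), manager=\((\d+)/(\d+)\)"
)

def parse_dead_processes(log_text):
    """Return one dict per 'Found dead process' line in the log text."""
    results = []
    for m in DEAD_PROCESS.finditer(log_text):
        spid, cpid, ora_pid = (int(g) for g in m.group(1, 2, 3))
        results.append({"spid": spid, "cpid": cpid, "ora_pid": ora_pid})
    return results

log = """Found dead process: spid=(9696), cpid=(1321449), ORA pid=(80), manager=(0/0)
Found dead process: spid=(9784), cpid=(1321458), ORA pid=(114), manager=(0/0)
Found running request 4413565 attached to dead manager process."""

dead = parse_dead_processes(log)
print(len(dead), dead[0]["spid"], dead[1]["ora_pid"])
```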

96

GSM tries to restart the services


TCP and TNS are unavailable:
Starting STANDARD Concurrent Manager
: 18-JAN-2009 21:43:42
CONC-SM TNS FAIL
Routine AFPEIM encountered an error while starting concurrent manager STANDARD
with library /d01/oracle/VIS/apps/apps_st/appl/fnd/12.0.0/bin/FNDLIBR.
Check that your system has enough resources to start a concurrent manager process.
Contac : 18-JAN-2009 21:43:42
Starting STANDARD Concurrent Manager
: 18-JAN-2009 21:43:42
CONC-SM TNS FAIL
Routine AFPEIM encountered an error while starting concurrent manager STANDARD
with library /d01/oracle/VIS/apps/apps_st/appl/fnd/12.0.0/bin/FNDLIBR.
Check that your system has enough resources to start a concurrent manager process.
Contac : 18-JAN-2009 21:43:42
Starting STANDARD Concurrent Manager
: 18-JAN-2009 21:43:42
CONC-SM TNS FAIL
Routine AFPEIM encountered an error while starting concurrent manager STANDARD
with library /d01/oracle/VIS/apps/apps_st/appl/fnd/12.0.0/bin/FNDLIBR.

97

ICM and CRM are DOWN

98

RH9 is DOWN

Not really down, just not on the network

99

PCP is DOWN

This is momentary as
GSM figures out what to
do

100

Failover to Secondary Node

The ICM and CRM failed over to RH7 in about 1
minute and 30 seconds

101

Failover from RH9 to RH7


Starting Internal Concurrent Manager Concurrent
Manager : 18-JAN-2009 21:51:23
: Started ICM on Target RH7.
Process monitor session ended : 18-JAN-2009 21:52:53
: Migration of ICM has completed.
Shutting down Internal Concurrent Manager : 18-JAN-2009 21:53:23
The VIS_0118@VIS internal concurrent manager
has terminated successfully - exiting.
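
The failover time can be read off the log by subtracting timestamps. The sketch below parses the DD-MON-YYYY HH:MM:SS format used in the concurrent manager log; treating "Starting Internal Concurrent Manager" as the start and "Process monitor session ended" as the end of the migration is an assumption, and the helper name is hypothetical.

```python
from datetime import datetime

MONTHS = {"JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
          "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12}

def parse_cm_timestamp(text):
    """Parse a timestamp such as '18-JAN-2009 21:51:23'."""
    date_part, time_part = text.split()
    day, mon, year = date_part.split("-")
    hour, minute, second = (int(p) for p in time_part.split(":"))
    return datetime(int(year), MONTHS[mon.upper()], int(day), hour, minute, second)

started = parse_cm_timestamp("18-JAN-2009 21:51:23")
migrated = parse_cm_timestamp("18-JAN-2009 21:52:53")
shutdown = parse_cm_timestamp("18-JAN-2009 21:53:23")

print((migrated - started).total_seconds())   # 90.0
print((shutdown - started).total_seconds())   # 120.0
```

The 90-second gap matches the "about 1 minute and 30 seconds" quoted for the ICM and CRM failover.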
102

ICM Failover to RH7


Starting Internal Concurrent Manager Concurrent
Manager : 18-JAN-2009 21:51:23
: Started ICM on Target RH7.
Process monitor session ended : 18-JAN-2009 21:52:53
: Migration of ICM has completed.
Shutting down Internal Concurrent Manager : 18-JAN-2009 21:53:23
The VIS_0118@VIS internal concurrent manager
has terminated successfully - exiting.
103

RH9 not available

104

Request Failover

105

Standard Manager Failover


Configuration

Note that the Inventory Manager, MRP Manager and OAM
Metrics Collection Manager are not set up to fail over.
106

Managers with a Secondary Node

Note that the Inventory Manager, MRP Manager and OAM
Metrics Collection Manager are not set up to fail over.
107

Failback

FAILBACK tcp connected at 31:40


The host, RH9, becomes available in OAM about 2
minutes later.
108

RH9 available

109

ICM Failback

110

Concurrent Manager Log


Starting Internal Concurrent Manager Concurrent
Manager : 18-JAN-2009 22:53:33
: Started ICM on Target RH9.
Process monitor session ended : 18-JAN-2009 22:55:03
: Migration of ICM has completed.
Shutting down Internal Concurrent Manager : 18-JAN-2009 22:55:33
The VIS_0118@VIS internal concurrent manager
has terminated successfully - exiting.
111

112

Failback Complete

Total Failback Time: 3 minutes and 45 seconds


113

Standard Manager before Failover

The Standard Manager has 3 Actual and Target
processes.

114

Standard Manager is DOWN

115

Standard Manager has 2 Processes on Failover

After 3 minutes and 30 seconds, the Standard Manager started on RH7


116

Shutdown of CP

117

Concurrent Processing Load Balancing

Two types of Load Balancing:
Load Balancing with both nodes running (no failover)
Load Balancing during failover

118

PCP Load Balancing


Benefits that Parallel Concurrent Processing provides:
failover in case of node failure
maintained throughput, keeping the business running during
node failures

When a node fails, the processes that were running on the failed node are
restarted on secondary nodes.
However, a resource-intensive node may
overload the secondary node when it fails over.
119

PCP Load Balancing


If too many processes are running on the secondary
node when the primary node fails over, the secondary
node may not have the capacity to process the requests
from additional concurrent managers.
R12 introduces Failover Sensitive Workshifts. This
enhancement allows the System Administrator to
configure how many processes fail over for each
workshift. With this added control, System Administrators
can enjoy the benefits of PCP failover without risking
performance problems from overloaded resources.

120

R12 Failover Sensitive Workshifts

121

Failover Sensitive Workshifts

122

Failover Sensitive Workshifts

Conversely, if a failover occurs from node 1 to
node 2, we may want to reduce the failover
processes; however, this doesn't work.
The failover processes take effect only if the node fails.
123

Failover Processes

PO Document Approval Manager and the Standard Manager will reduce the number of
processes when RH7 fails. When RH9 fails, the number of failover processes for managers
that run on RH7 is not reduced.

124

Failover Sensitive Workshifts


It's clear: to run an R11i or R12 system during
a failover, there are two choices:
Run the servers at 35% or less utilization
Reduce the number of processes that are
allowed during failover
For most businesses the second option is
the more practical.
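
The two choices can be compared with some rough arithmetic. The sketch below is a simplification that assumes two equally sized nodes and a load that scales linearly with the number of processes that fail over; the function name and the 60% figures are hypothetical.

```python
def utilization_after_failover(primary_pct, secondary_pct, failover_pct=100):
    """Estimate secondary-node utilization (%) after it absorbs the primary's work.

    failover_pct is the share of the primary's processes allowed to fail over.
    """
    return secondary_pct + primary_pct * failover_pct / 100

# Choice 1: both servers run at 35% before the failure
print(utilization_after_failover(35, 35))       # 70.0

# Busier servers with every process failing over: over capacity
print(utilization_after_failover(60, 60))       # 120.0

# Choice 2: same servers, but only half the processes fail over
print(utilization_after_failover(60, 60, 50))   # 90.0
```

Under these assumptions, limiting the failover processes keeps the surviving node below 100% without leaving both servers mostly idle in normal operation.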
125

References

249213.1 - Performance problems with Failover when TCP Network goes down
364171.1 - TAF Session Hangs, Select Fails To Complete W/ Loss Of NIC: Tune TCP Keepalive
211362.1 - Process Monitor Session Cycle Repeats Too Frequently
291201.1 - How To Remove a Dead Connection to the Target Database
362135.1 - Configuring Oracle Applications Release 11i with Oracle10g Release 2 Real Application Clusters and Automatic Storage Management
Optimizing the E-Business Suite with Real Application Clusters (RAC) - Ahmed Alomari
240818.1 - Concurrent Processing: Transaction Manager Setup and Configuration Requirement in an 11i RAC Environment
R12 ATG - Concurrent Processing Functional Overview - Aaron Weisberg
210062.1 - Generic Service Management (GSM) in Oracle Applications 11i
271090.1 - Parallel Concurrent Processing Failover/Failback Expectations
241370.1 - Concurrent Manager Setup and Configuration Requirements in an 11i RAC Environment
602899.1 - Some More Facts On How to Activate Parallel Concurrent Processing

126
