Processing
Mike Swing
TruTek
mswing@trutek.com
RMOUG 2009
Conclusions
You don't need RAC to use Parallel Concurrent Processing (PCP)!
If you have PCP enabled, secondary nodes must be defined during the upgrade to R12.
Tuning TCP, SQL*Net and PMON parameters can minimize PCP failover time.
Implement Failover Sensitive Workshifts.
Definitions
Concurrent Request

Phase       Status       Description - Action
Pending     Normal
Pending     Standby
Running     Normal
Completed   Normal
Completed   Error
Completed   Warning
Inactive    No Manager
PCP Failover
[Diagram labels: DB Node RH8; Database; Database Listener; sqlnet.ora; nodes RH7, RH8 and RH9, each with PCP, SQL*Net and Client.]
Concurrent Managers

Manager Type                  Service Instance        Program
Internal Manager                                      FNDLIBR
                                                      FNDCRM
Internal Monitor              Internal Monitor:Node   FNDIMON
                                                      FNDSM
Concurrent Manager            Standard Manager        FNDLIBR
Concurrent Manager            Inventory Manager       INVLIBR
Concurrent Manager                                    FNDLIBR
Concurrent Manager            PA Streamline Manager   PALIBR
Transaction Manager                                   CYQLIB
Transaction Manager                                   FFTM
Transaction Manager                                   POXCON
Transaction Manager
Transaction Manager                                   FNDTMTST
Scheduler/Prerelease Manager                          FNDSVC
                                                      FNDSVC
Concurrent Processing
1. The Concurrent Processing server communicates with the database using Oracle SQL*Net.
2. The concurrent program log or output file from a request is passed back as a report to the Report Review Agent.
3. The Report Review Agent passes a file containing the entire report to the forms server.
4. The Forms Services component passes the report back to the user's browser one page at a time. Profile options can be used to control the size of the files and pages passed, to suit report volume and available network capacity.
[Diagram labels: Web Browser; HTML Interface; Web Server; Forms Server; JAVA Interface; JInitiator; Reports Server; SQL*Net; ICM (FNDLIBR); Service Manager (FNDSM); Internal Monitor (FNDIMON); Standard Manager (FNDLIBR); FNDCRM; Report Review Agent; .rdx; Requests, Log, Out.]
Service Manager
FNDSM process - Communicates with the Internal Concurrent
Manager, Concurrent Manager, and non-Manager Service
processes.
The Service Manager (SM) spawns and terminates manager and
service processes (these could be Forms or Apache Listeners,
Metrics or Reports Server, and any other process controlled through
Generic Service Management).
When the ICM terminates, the SM that resides on the same node
as the ICM also terminates.
The SM is chained to the ICM. The SM will only reinitialize after
termination when there is a function it needs to perform (start, or
stop a process), so there may be periods of time when the SM is not
active, and this would be normal.
Service Manager
All processes initialized by the SM inherit the
same environment as the SM.
The SM's environment is set by the APPSORA.env
file and the gsmstart.sh script.
The apps_<sid> listener must be active on each
CP node to support the SM connection to the
local instance.
There should be a Service Manager active on
each node where a Concurrent or non-Manager
service process will reside.
FNDSM Failure
FNDSM failover as noted in the concurrent manager log:
Could not contact Service Manager FNDSM_RH8_VIS. The TNS
alias could not be located, the listener process on RH8 could not
be contacted, or the listener failed to spawn the Service
Manager process.
Found dead process: spid=(962754), cpid=(2259578), Service
Instance=(1045)
CONC-SM TNS FAIL
Call to PingProcess failed for WFMAILER
CONC-SM TNS FAIL
Call to StopProcess failed for WFMAILER
CONC-SM TNS FAIL
Call to PingProcess failed for FNDCPGSC
FNDSM Failover
Found dead process: spid=(716870), cpid=(2259580), Service
Instance=(2009)
Found dead process: spid=(1442020), cpid=(2259579), Service
Instance=(2010)
Starting WFMGSMD Concurrent Manager : 15-AUG-2008 13:28:56
Starting WFMGSMDB Concurrent Manager : 15-AUG-2008 13:28:56
Starting WFALSNRSVCB Concurrent Manager : 15-AUG-2008 13:28:57
Starting STANDARD Concurrent Manager : 15-AUG-2008 13:30:31
Starting Internal Concurrent Manager Concurrent Manager : 15-AUG-2008 13:30:32
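When reviewing a failover, it can help to pull the "Found dead process" entries out of the manager log. The following is a minimal Python sketch, assuming the log lines have the same shape as the excerpts above; the function name find_dead_processes is hypothetical and is not part of any Oracle tool.

```python
import re

# Matches the "Found dead process" entries shown in the log excerpts above.
DEAD_PROCESS = re.compile(
    r"Found dead process:\s*spid=\((\d+)\),\s*cpid=\((\d+)\),\s*"
    r"Service\s+Instance=\((\d+)\)"
)

def find_dead_processes(log_text):
    """Return (spid, cpid, service_instance) tuples found in log_text."""
    # Entries may wrap across lines, so collapse whitespace before matching.
    flat = " ".join(log_text.split())
    return [tuple(int(g) for g in m.groups()) for m in DEAD_PROCESS.finditer(flat)]

sample = """Found dead process: spid=(716870), cpid=(2259580), Service
Instance=(2009)
Found dead process: spid=(1442020), cpid=(2259579), Service
Instance=(2010)"""

print(find_dead_processes(sample))
```

Entries with a different layout (for example, the variant that reports "ORA pid" and "manager") are not matched by this pattern and would need their own expression.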
Internal Monitor
(FNDIMON process) - Communicates with the Internal Concurrent
Manager.
This manager/service is used to implement Parallel Concurrent
Processing.
You do not need to run this manager/service unless you are using
Parallel Concurrent Processing.
The Internal Monitor (IM) monitors the Internal Concurrent Manager,
and restarts any failed ICM on the local node. It monitors whether
the ICM is still running, and if the ICM crashes, it will restart it on
another node.
During a node failure in a PCP environment, the IM will restart the
ICM on a surviving node (multiple ICMs may be started on multiple
nodes, but only the first ICM started will eventually remain active; all
others will terminate gracefully).
There should be an Internal Monitor defined on each node where
the ICM may migrate.
Standard Manager
(FNDLIBR process) - Communicates with
the Service Manager and any client
application process.
The Standard Manager is a worker
process that initiates and executes client
requests on behalf of Applications batch
and OLTP clients.
Standard Manager
Since no secondary node is defined, the Standard Manager will not fail over.
Failover Processes in the Work Shifts definition is the number of processes that will run (3)
when the Standard Manager fails over to the secondary node.
Transaction Manager
A Transaction Manager communicates with the Service
Manager and any user process initiated on behalf of
Forms or a Standard Manager request.
A Transaction Manager:
Supports synchronous processing of requests from a client program.
Gets a request from a client program to run a server-side program synchronously.
Returns a status/results to the client program.
At runtime, it starts a number of these managers as defined.
Doesn't poll the concurrent request table for new requests.
Only 1 transaction manager is needed per database, not 1 per instance.
Transaction Managers
_lm_global_posts=TRUE
_immediate_commit_propagation=TRUE
GSM
Generic Services
Forms Listener
Metrics Server
Metrics Client
Reports Server
Apache Listener
Starting GSM
Apps Listener:
listener.ora
gsmstart.sh
exec FNDSM
adcmctl.sh
adcmctl.sh calls:
startmgr.sh
batchmgr.sh
CONCSUB
FNDSVCRG
Verify GSM
To verify GSM is working, start the concurrent
managers.
Once GSM is enabled, the ICM uses Service
Managers to start all concurrent managers and
activated services.
If the ICM is successfully starting the managers,
then GSM has been configured properly.
If managers and/or services fail to start, errors
should appear in the ICM log file.
Kill FNDSM
Kill FNDCRM
[applvis@rh9 scripts]$ ps -ef |grep FNDCRM
applvis 8886 1 0 11:52 ? 00:00:00 FNDCRM APPS/ZGA13053E1E1B7BA773417089054DA88F194EAC0D687728CC2551870E6B78C4B439EADB287342795115A88DBC85788CCB4 FND FNDCRM N 10 c LOCK Y RH9 1302318
[applvis@rh9 scripts]$ kill -9 8886
[applvis@rh9 scripts]$ ps -ef |grep FNDCRM
applvis 9457 9392 0 12:09 ? 00:00:00 FNDCRM APPS/ZG26430816FA3570354BC57DE47FF105D145F8DE226EFE58CE04B416633DCB901267BFECFA7585114F7090060EFE1147BE FND FNDCRM N 10 c LOCK Y RH9 1302343
Both of these services were started before I could enter the grep command to find the corresponding
process.
In Release 11i, the Secondary Node doesn't need to be filled in for failover to occur.
In Release 12, failover won't occur if there is no Secondary Node defined.
PCP Failover
[Diagram labels: DB Node RH8; Database; Database Listener; sqlnet.ora; nodes RH7, RH8 and RH9, each with PCP, SQL*Net and Client.]
[Diagram labels: Web Server; HTML Interface; Forms Server; JInitiator; JAVA Interface; Reports Server; Data; Database. Shown twice, once per concurrent processing node: Internal Monitor (FNDIMON); FNDCRM; ICM (FNDLIBR); Standard Manager (FNDLIBR); Service Manager (FNDSM); Report Review Agent; SQL*Net; .rdx; Requests, Logs, Out.]
TCP keepalive
PMON ICM Process Monitor
Dead Connection Detection
Connection Failure Recovery R12
10g Timeout Parameters (untested)
sqlnet.inbound_connect_timeout (server)
sqlnet.send_timeout (client and/or server)
sqlnet.recv_timeout (client and/or server)
reviver.sh
DCD
Reviver
[Flowchart with two lanes. ICM lane: Start; Receive Shutdown? (Yes/No); Starts to Shutdown; Lost DB Connection? (Yes/No); Spawn Reviver; Exit. REVIVER lane: Attempt to Get DB Connection (Yes/No); Sleep; Kill Previous DB Session; ICM Started? (Yes/No); Start ICM; Exit.]
reviver.log
The ICM has lost its database connection
and is shutting down.
Spawning reviver process to restart the ICM
when the database becomes available
again.
Spawned reviver process 10910.
TCP
TCP/IP is a connection-oriented protocol; TCP
implements packet timeout and retransmission
in an effort to guarantee the safe and sequenced
order of data packets.
If a timely acknowledgement is not received in
response to the probe packet, the TCP/IP stack
will retransmit the packet some number of times
before timing out.
After TCP/IP gives up, SQL*Net receives
notification that the probe failed.
TCP Keepalive
At this time, client-side SQL*Net connections do not enable
keepalive for TCP connections by default.
However, it is possible to enable it by adding the
ENABLE=BROKEN parameter to the SQL*Net connect
string (the connect descriptor of the TNS alias).
**WARNING** Keepalive intervals are typically set to 2
hours or more (i.e., it can take more than 2 hours to
notice a dead server even if keepalive is enabled). To
make keepalive useful for PCP and TAF, the keepalive
interval needs to be reduced to a smaller value (such as
2 minutes).
If there are a lot of idle connections on your network,
reducing the keepalive interval can increase network traffic
significantly.
ENABLE=BROKEN
Sample TNS alias to enable keepalive (notice the
ENABLE=BROKEN clause)
VIS_BALANCE = (DESCRIPTION =
  (ENABLE=BROKEN)
  (ADDRESS_LIST =
    (LOAD_BALANCE = ON)
    (FAILOVER = ON)
    (ADDRESS = (PROTOCOL = TCP)(HOST = rh8)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = rh6)(PORT = 1521))))
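Connect descriptors are easy to break with a misplaced parenthesis, and a plain parenthesis count does not catch a KEY = entry that has lost its opening parenthesis. The following is a minimal Python sketch of a sanity check, not an Oracle utility; the function name descriptor_problems and the host names hosta and hostb are hypothetical.

```python
import re

def descriptor_problems(descriptor):
    """Return a list of problems found in a TNS connect descriptor string."""
    problems = []

    # 1. Parentheses must balance and never close more than was opened.
    depth = 0
    for ch in descriptor:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                break
    if depth != 0:
        problems.append("unbalanced parentheses")

    # 2. Every "KEY =" should directly follow an opening parenthesis.
    for m in re.finditer(r"(\(?)\s*([A-Za-z_]+)\s*=", descriptor):
        if not m.group(1):
            problems.append("missing '(' before " + m.group(2))

    # 3. Keepalive is only requested if ENABLE=BROKEN is present.
    if "(ENABLE=BROKEN)" not in re.sub(r"\s+", "", descriptor).upper():
        problems.append("ENABLE=BROKEN not set")
    return problems

good = ("(DESCRIPTION = (ENABLE=BROKEN) (ADDRESS_LIST = (LOAD_BALANCE = ON) "
        "(FAILOVER = ON) (ADDRESS = (PROTOCOL = TCP)(HOST = hosta)(PORT = 1521)) "
        "(ADDRESS = (PROTOCOL = TCP)(HOST = hostb)(PORT = 1521))))")
bad = ("(DESCRIPTION = (ADDRESS_LIST = ADDRESS = (PROTOCOL = TCP)"
       "(HOST = hosta)(PORT = 1521)))")

print(descriptor_problems(good))
print(descriptor_problems(bad))
```

The check takes only the descriptor (the part after the alias name), and it does not validate parameter names or values.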
TCP Keepalive
**WARNING** Keepalive intervals are
typically set to 2 hours or more (i.e., it can
take more than 2 hours to notice a dead
server even if keepalive is enabled).
To make keepalive useful for TAF, the
keepalive interval would need to be
reduced to a smaller value (such as 2
minutes). Note: 249213.1
Default Settings
TCP Keepalive
Initial Settings
tcp_keepalive_time = 200 secs
tcp_keepalive_intvl = 20
tcp_keepalive_probes = 2
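The three keepalive settings combine into an approximate worst-case time to notice a dead peer on an idle connection: the idle time before the first probe, plus one interval per unanswered probe. The following Python sketch works through that arithmetic, assuming this simple model; the function name is hypothetical, and 7200/75/9 are the usual Linux defaults, included only for comparison.

```python
def keepalive_detection_seconds(time_s, intvl_s, probes):
    """Approximate seconds before an idle, dead connection is reported."""
    # Idle period before the first probe, then one interval per failed probe.
    return time_s + intvl_s * probes

# Initial settings from this slide: 200 s idle, 20 s interval, 2 probes.
print(keepalive_detection_seconds(200, 20, 2))    # 240

# Usual Linux defaults: a little over 2 hours.
print(keepalive_detection_seconds(7200, 75, 9))   # 7875
```

Under this model the slide's settings give about 4 minutes, against more than 2 hours with the defaults.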
TCP Retries
tcp_retries1 (default: 3) The number of times TCP will
attempt to retransmit a packet on an established
connection normally, without the extra effort of getting
the network layers involved.
tcp_retries2 (default: 15) The maximum number of times
a TCP packet is retransmitted in established state before
giving up
tcp_syn_retries (default: 5) The maximum number of
times initial SYNs for an active TCP connection attempt
will be retransmitted. The default value of 5 corresponds
to approximately 180 seconds.
TCP Retries
Now let's consider changing the following
TCP parameters from their default values:
tcp_retries1 = 2
tcp_retries2 = 2
tcp_syn_retries = 2
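To see roughly what lowering tcp_syn_retries buys, the connection-attempt timeout can be estimated with exponential backoff. The following Python sketch assumes an initial retransmission timeout of 3 seconds that doubles after every attempt, which is typical of older Linux kernels; the real timers depend on the kernel version and measured round-trip times, so the numbers are estimates only. The function name is hypothetical.

```python
def syn_timeout_seconds(syn_retries, initial_rto=3.0):
    """Estimate seconds until a connection attempt gives up.

    Assumes the initial SYN plus syn_retries retransmissions, with the
    wait doubling each time (initial_rto is an assumption, not a setting).
    """
    return sum(initial_rto * 2 ** i for i in range(syn_retries + 1))

print(syn_timeout_seconds(5))   # 189.0, in line with "approximately 180 seconds"
print(syn_timeout_seconds(2))   # 21.0
```

Under these assumptions, dropping tcp_syn_retries from 5 to 2 shortens a failed connection attempt from about three minutes to about twenty seconds.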
PMON
Process monitor session started : 18-JAN-2009 22:38:57
CONC-SM TNS FAIL
Call to PingProcess failed for OAMGCS
18-JAN-2009 22:38:58 - Node:(RH7), Service Manager:(FNDSM_RH7_VIS) currently unreachable by TNS
Found dead process: spid=(11234), cpid=(1321563), ORA pid=(167), manager=(0/4)
PMON
Shutting down Internal Concurrent Manager : 18-JAN-2009 22:02:01
18-JAN-2009 22:02:01
The ICM has lost its database connection and is
shutting down.
Spawning reviver process to restart the ICM when
the database becomes available again.
Spawned reviver process 10910.
$FND_TOP/bin/batchmgr.sh
Make sure the PMON changes are made in the $FND_TOP/bin/batchmgr.sh file.
# FILENAME
#   batchmgr
# DESCRIPTION
#   fire up Internal Concurrent Manager process
# USAGE
#   batchmgr arg1=val1 arg2=val2 ...
#
#   Parameters may be sent via the environment.
#
# ARGUMENTS                                     DEFAULT
#   [appmgr|sysmgr]=username/password
#   [sleep=sleep_seconds]                       15
#   [mgrname=manager_name]                      icm
#   [logfile=log_filename]                      $FND_TOP/$APPLLOG/$mgrname.mgr
#   [restart=N|mim minutes between restarts]    N
#   [mailto="user1 user2..."]                   current user
#   [PRINTER=printer_name]
#   [pmon=iterations]                           4
#   [quesiz=pmon_iterations]                    1
#   [diag=Y|N]                                  N
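The sleep and pmon arguments together set how often the ICM runs its process monitor. The following Python sketch shows the arithmetic, assuming that pmon counts ICM sleep cycles between process monitor sessions and that the DEFAULT values 15 and 4 in the header above belong to sleep and pmon respectively. Both are readings of the header rather than documented facts, and the function name is hypothetical.

```python
def pmon_interval_seconds(sleep_seconds, pmon_iterations):
    """Approximate seconds between ICM process monitor (PMON) sessions."""
    # One PMON session every pmon_iterations sleep cycles (assumed semantics).
    return sleep_seconds * pmon_iterations

print(pmon_interval_seconds(15, 4))   # 60
```

On that reading, a dead manager process would be noticed within about a minute of ICM time, in addition to whatever the TCP layer needs to report the failure.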
Reviver
[Flowchart with two lanes. ICM lane: Start; Receive Shutdown? (Yes/No); Starts to Shutdown; Lost DB Connection? (Yes/No); Spawn Reviver; Exit. REVIVER lane: Attempt to Get DB Connection (Yes/No); Sleep; Kill Previous DB Session; ICM Started? (Yes/No); Start ICM; Exit.]
reviver.log
reviver.sh starting up...
[ Mon Jan 12 20:02:15 MST 2009 ] - Read APPS username/password.
[ Mon Jan 12 20:02:45 MST 2009 ] - Attempting database connection...
[ Mon Jan 12 20:02:45 MST 2009 ] - Successful database connection.
[ Mon Jan 12 20:02:45 MST 2009 ] - Killing previous ICM session...
1 row updated.
Commit complete.
[ Mon Jan 12 20:02:45 MST 2009 ] - Looking for a running ICM process...
[ Mon Jan 12 20:02:45 MST 2009 ] - ICM now running, reviver.sh complete.
reviver.sh
reviver.sh code summary
  Sleep 30
  Test_connection
  Kill_old_icm
    Get session
    Alter system kill session
  Check_running_icm
    Fnd_conc.ecm_alive
  start_icm
    startmgr.sh
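The summary above can be read as a simple retry loop. The following Python sketch illustrates that control flow only; it is not the contents of reviver.sh. The four callables stand in for the script's steps and are supplied by the caller, and the function name run_reviver and its return values are hypothetical.

```python
import time

def run_reviver(test_connection, kill_old_icm, check_running_icm, start_icm,
                sleep_seconds=30, sleep=time.sleep, log=print):
    """Wait for the database, clean up the old ICM session, restart the ICM."""
    # Keep sleeping and retrying until a database connection succeeds.
    while True:
        sleep(sleep_seconds)
        log("attempting database connection")
        if test_connection():
            break

    # Remove the database session left behind by the failed ICM.
    log("killing previous ICM session")
    kill_old_icm()

    # Another node may already have started an ICM; if so, do nothing more.
    log("looking for a running ICM")
    if check_running_icm():
        return "already_running"

    start_icm()
    return "started"

# Example with stand-in steps: the connection succeeds on the third try.
attempts = iter([False, False, True])
result = run_reviver(
    test_connection=lambda: next(attempts),
    kill_old_icm=lambda: None,
    check_running_icm=lambda: False,
    start_icm=lambda: None,
    sleep=lambda seconds: None,   # skip real waiting in the example
)
print(result)   # started
```

Passing the steps in as functions keeps the sketch testable without a database. In the real script they correspond to SQL*Plus calls and startmgr.sh.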
Implement DCD
Implement by adding SQLNET.EXPIRE_TIME = 1 (minutes)
to the sqlnet.ora file.
If the connection is idle for the time interval
specified in minutes by the
SQLNET.EXPIRE_TIME parameter, the server-side
process sends a small 10-byte packet to the
client. The packet is sent using TCP/IP.
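A quick way to confirm that DCD is configured on a node is to read the value back from sqlnet.ora. The following Python sketch assumes simple NAME = value lines with # comments; the function name read_expire_time and the sample file contents are hypothetical.

```python
def read_expire_time(sqlnet_ora_text):
    """Return SQLNET.EXPIRE_TIME in minutes, or None if it is not set."""
    for line in sqlnet_ora_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blanks
        if "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        if name.upper() == "SQLNET.EXPIRE_TIME":
            return int(value)
    return None

sample = """# hypothetical sqlnet.ora fragment
SQLNET.EXPIRE_TIME = 1
"""
print(read_expire_time(sample))   # 1
```

A result of None means the parameter is absent, in which case the server does not send DCD probes.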
DCD comes into the picture after the ICM has crashed
and the DB needs to identify that the ICM is gone.
The DB needs to clean up the dedicated server process
resources corresponding to the ICM client process.
Be aware that if a TCP failure is not detected, failover will not occur.
The following excerpt is from a concurrent manager log:
fdpsrp() (running_processes correction):
ICM cannot obtain exclusive lock on FND_CONCURRENT_QUEUES
Oracle error code returned: 1
This message is information and does not indicate a problem with CP
functionality.
remote call function (FNDIMON)
15-AUG-2008 10:06:02 - Function to call: PingProcess
reviver.sh
DCD
Columns: Expire_time | PMON Sleep | PMON Cycles | tcp_KA time | tcp KA intvl | tcp KA probes | tcp retries | tcp retries2 | tcp syn retries

241 secs / | 1 minute | 30 secs | 200 | 20 | 15
5 minute | 30 secs | 200 | 20 | 15
10 minutes | 30 secs | 200 | 20 | 15
1 minute | 15 secs | 200 | 20 | 15
10 minute | 30 secs | 1000 | 60 | 10 | 15
1 minute | 30 secs | 1000 | 60 | 10
10 secs / 42 secs | 1 minute | 30 secs | 200 | 20
7 secs / 40 secs | 10 minutes | 30 secs | 200 | 20
6 secs / 34 secs | 1 minute | 15 secs | 200 | 20
Concurrent Managers
Actual Processes = 0
PCP Setup
TCP Failure
CRM is DOWN
CRM Failure
RH9 is DOWN
PCP is DOWN
This is momentary as
GSM figures out what to
do
Request Failover
Failback
RH9 available
ICM Failback
Failback Complete
Shutdown of CP
Failover Processes
The PO Document Approval Manager and the Standard Manager will reduce the number of
processes when RH7 fails. When RH9 fails, the number of failover processes for managers
that run on RH7 is not reduced.
References
249213.1 - Performance problems with Failover when TCP Network goes down
364171.1 - TAF Session Hangs, Select Fails To Complete W/ Loss Of NIC: Tune TCP Keepalive
211362.1 - Process Monitor Session Cycle Repeats Too Frequently
291201.1 - How To Remove a Dead Connection to the Target Database
362135.1 - Configuring Oracle Applications Release 11i with Oracle10g Release 2 Real Application Clusters and Automatic Storage Management
Optimizing the E-Business Suite with Real Application Clusters (RAC) - Ahmed Alomari
240818.1 - Concurrent Processing: Transaction Manager Setup and Configuration Requirement in an 11i RAC Environment
R12 ATG - Concurrent Processing Functional Overview - Aaron Weisberg
210062.1 - Generic Service Management (GSM) in Oracle Applications 11i
271090.1 - Parallel Concurrent Processing Failover/Failback Expectations
241370.1 - Concurrent Manager Setup and Configuration Requirements in an 11i RAC Environment
602899.1 - Some More Facts On How to Activate Parallel Concurrent Processing