Angelique De Vos
Customer Support Engineer
March 2017
Agenda
Problem Isolation
Live Data
HA Failover
Finesse
Look at UCCX JTAPI cause codes on the web to get a list of all possible failure
codes.
UCCX Database and reporting
Logical overview of databases
UCCX
Configuration Datastore
Historical Datastore
CCX DB
Repository Datastore
Platform
Platform database instance
Database | What does it store? | How to query it?
ccm | Cluster and platform info, configuration | run sql select name,nodeid from processnode
cuic_data | CUIC application data | run sql select name from cuic_data:cuicuser
db_phx_config | Finesse config | No connect permissions via CLI; this needs a TAC remote account
db_cra | Historical datastore: historical data. Configuration datastore: configuration data from CCX admin (CSQs, applications) | run uccx sql db_cra select count(*) from resource where active (COUNT(*))
db_cra_repository | Repository datastore: scripts, prompts, documents,… | run uccx sql db_cra_repository select * from DocumentsFiletbl
UCCX database instance
• To check replication use CLI command “utils uccx dbreplication status” or the UCCX
serviceability pages. Active/Connected is a good replication status, Dropped/Timeout indicates a
replication issue
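The rule of thumb above can be written down as a small check. This is a minimal Python sketch, assuming the state strings are copied by hand from the CLI output or the serviceability page. The function name is hypothetical, and the real output format of the CLI command is not reproduced here.

```python
# States named on the slide: Active/Connected is good, Dropped/Timeout is bad.
GOOD_STATES = {"active", "connected"}
BAD_STATES = {"dropped", "timeout"}


def classify_replication_state(state: str) -> str:
    """Return 'ok', 'issue' or 'unknown' for a state string such as 'Active/Connected'."""
    parts = {p.strip().lower() for p in state.split("/") if p.strip()}
    if not parts:
        return "unknown"
    if parts & BAD_STATES:
        return "issue"      # any bad component means replication needs attention
    if parts <= GOOD_STATES:
        return "ok"         # every component is a known good state
    return "unknown"        # a state this sketch does not know about
```

Anything reported as 'issue' or 'unknown' is a reason to run the checks on the next slide.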
DBReplication commands to check configuration
• utils ntp status
• utils diagnose test
• utils network connectivity <secondary node>
• utils uccx dbreplication dump configfiles
Configuration changes when subscriber is down
• Both nodes must be available to commit changes to the CDS (Configuration Datastore)
CCX Administration (MADM) Logs:
%MADM-ADM_CFG-7-UNK:ICDServlet :: Exception occurred
The SUBSCRIBER node of the HA is not available
%MADM-ADM_CFG-3-ADM_EXCEPTION:Unknown ADM Exception:
Exception=java.lang.RuntimeException: The SUBSCRIBER node of the HA is not available
• You can temporarily disable CDS on the subscriber to make changes on the publisher
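To spot this condition in collected MADM logs, the entries can be scanned for the message shown above. This is a minimal Python sketch. The header pattern is inferred only from the sample lines on this slide, and the function name is hypothetical.

```python
import re

# Header shape inferred from the samples above: %FACILITY-SUBFACILITY-SEVERITY-MNEMONIC:text
HEADER = re.compile(
    r"^%(?P<facility>[A-Z]+)-(?P<sub>[A-Z_]+)-(?P<severity>\d)-(?P<mnemonic>[A-Z_]+):"
)
NEEDLE = "The SUBSCRIBER node of the HA is not available"


def subscriber_unavailable_events(lines):
    """Return (severity, mnemonic) for each log entry that reports the subscriber as unavailable."""
    results = []
    current = None       # header of the entry currently being read
    reported = False     # avoid counting the same entry twice
    for line in lines:
        match = HEADER.match(line.strip())
        if match:
            current = (int(match["severity"]), match["mnemonic"])
            reported = False
        if NEEDLE in line and current is not None and not reported:
            results.append(current)
            reported = True
    return results


sample = [
    "%MADM-ADM_CFG-7-UNK:ICDServlet :: Exception occurred",
    "The SUBSCRIBER node of the HA is not available",
    "%MADM-ADM_CFG-3-ADM_EXCEPTION:Unknown ADM Exception:",
    "Exception=java.lang.RuntimeException: The SUBSCRIBER node of the HA is not available",
]
events = subscriber_unavailable_events(sample)
```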
UCCX DB replication reset
• Replication can be broken if Subscriber is unavailable for too long and send queues buffer is
exceeded
• Typically 3-4 days (can vary with load)
• A DBReplicationStopped alert will be raised
[Diagram: replication reset. Overwrite target = local: the local CDS, HDS and RDS are overwritten from the remote DB. The Subscriber's CUIC also uses the Subscriber's db_cra.]
Historical reporting – no DB available
• If no Database is available, historical data records are written to flat file and alert is raised
DB: online.uccx.log file for issues related to the database service
DB REPLICATION: uccx_repl_output_util.log file for issues related to database replication
At a minimum, gather these logs for TAC
Reporting and database performance
• Historical reports are running slow
• Overall high CPU
Message received on socket::{"id":"agent1","operation":"UPDATE","ResourceIAQStats":{"resourceId":"agent1","resourceName":"agent1","resourceState":3,"durationInStateMillis":264720,"nHandledContacts":6,"nPresentedContacts":8,"avgTalkDuration":1165394,"longestTalkDuration":2266049,"avgHoldDuration":0,"longestHoldDuration":0,"avgHandleDuration":0,"avgWorkDuration":0,"totalTalkTime":6992368,"totalHoldTime":0,"maxReadyTime":10199923,"avgReadyTime":1156099,
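The payload after "socket::" is JSON, so it can be inspected with a few lines of Python. This is a minimal sketch. The sample line is a hand-shortened and hand-closed copy of the message above (the original is truncated), and the function name is hypothetical.

```python
import json


def parse_socket_message(log_line: str) -> dict:
    """Return the JSON payload that follows the 'socket::' marker in a log line."""
    _, marker, payload = log_line.partition("socket::")
    if not marker:
        raise ValueError("no 'socket::' marker in line")
    return json.loads(payload)


# Shortened copy of the message above, closed by hand so that it is valid JSON.
sample = (
    'Message received on socket::{"id":"agent1","operation":"UPDATE",'
    '"ResourceIAQStats":{"resourceId":"agent1","resourceState":3,'
    '"nHandledContacts":6,"avgTalkDuration":1165394,"totalTalkTime":6992368}}'
)

message = parse_socket_message(sample)
stats = message["ResourceIAQStats"]
# In this sample the average is consistent with total / handled (integer division).
average_from_totals = stats["totalTalkTime"] // stats["nHandledContacts"]
```

Comparing such derived values with the reported ones is a quick sanity check when a report looks wrong.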
HA Failover
High Availability
CCX Component Redundancy
[Diagram: Primary CCX Server (M) and Secondary CCX Server (S), each with Engine, Database, Reporting and Finesse]
Engine
• First started becomes Master
• Primary is preferred Master
• Other node will be Standby
Databases
• Comprised of Historical, Agent, Repository and Config Datastores (HDS, RDS and CDS)
• DB mastership prefers Engine Master (unless DB is down on Engine Master)
• Data is read from / written to DB Master
• HDS/RDS has Publisher/Subscriber and syncs using Informix Enterprise Replication
• CDS uses distributed transactions (both databases must be operational for updates)
Reporting
• CUIC reporting always queries the Slave DB for reporting.
• This is to prevent load on the current Master DB for large queries.
Finesse
• Finesse always talks to the Master Engine.
• Finesse Failover follows Engine Failover
• Finesse service Restart does not initiate FAILOVER
HA – Failover detection
• Cluster View Daemon (CVD) sends and monitors heartbeats between CCX nodes
TCP Heartbeats
HA – Lost Connectivity
[Diagram: Primary CCX Server and Secondary CCX Server, each with CVD, Engine and Databases; the Secondary changes from S to M]
Secondary CVD detects missing keep-alives from Primary CVD
%MCVD-CVD-5-HEARTBEAT_MISSING_HEARTBEAT:
CVD does not receive heartbeat from node for a long period: nodeId=1,dt=2049
.....
%MCVD-CVD-5-HEARTBEAT_MISSING_HEARTBEAT:
CVD does not receive heartbeat from node for a long period: nodeId=1,dt=10245
%MCVD-CVD-4-HEARTBEAT_SUSPECT_NODE_CRASH:CVD suspects node crash:
state=Heartbeat State,nodeInfo=Node id=1 ip=192.168.12.5
%MCVD-CVD-4-HEARTBEAT_SUSPECT_NODE_CRASH:CVD suspects node crash:
state=Convergence State,nodeInfo=Node id=1 ip=192.168.12.5
• Failover completed
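When reading CVD logs it helps to see how far the heartbeat gap grew per node. This is a minimal Python sketch that pulls nodeId and dt out of the missing-heartbeat messages shown above. The function name is hypothetical, and the unit of dt is not stated on the slide, so the values are only compared with each other.

```python
import re

# Matches the tail of the missing-heartbeat message, e.g. "nodeId=1,dt=10245"
GAP = re.compile(r"nodeId=(\d+),dt=(\d+)")


def max_heartbeat_gap(lines):
    """Map each nodeId to the largest dt value reported for it."""
    gaps = {}
    for line in lines:
        match = GAP.search(line)
        if match:
            node, dt = int(match.group(1)), int(match.group(2))
            gaps[node] = max(dt, gaps.get(node, 0))
    return gaps


sample = [
    "%MCVD-CVD-5-HEARTBEAT_MISSING_HEARTBEAT:",
    "CVD does not receive heartbeat from node for a long period: nodeId=1,dt=2049",
    "%MCVD-CVD-5-HEARTBEAT_MISSING_HEARTBEAT:",
    "CVD does not receive heartbeat from node for a long period: nodeId=1,dt=10245",
]
gaps = max_heartbeat_gap(sample)
```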
HAoWAN – Island mode
• WAN network failure
[Diagram: Primary CCX Server and Secondary CCX Server, each with CVD, Engine, Databases, Reporting and Finesse; both nodes end up as Master]
Primary CVD detects missing Heartbeats and assumes Secondary has failed
%MCVD-CVD-5-HEARTBEAT_MISSING_HEARTBEAT:
CVD does not receive heartbeat from node for a long period: nodeId=2,dt=10197
%MCVD-CVD-4-HEARTBEAT_SUSPECT_NODE_CRASH:CVD suspects node crash:
state=Convergence State,nodeInfo=Node id=2 ip=192.168.13.5
Secondary CVD detects missing Heartbeats and assumes Primary has failed
%MCVD-CVD-5-HEARTBEAT_MISSING_HEARTBEAT:
CVD does not receive heartbeat from node for a long period: nodeId=1,dt=10242
%MCVD-CVD-4-HEARTBEAT_SUSPECT_NODE_CRASH:CVD suspects node crash:
state=Heartbeat State,nodeInfo=Node id=1 ip=192.168.12.5
Secondary CVD performs Master Election
%MCVD-CLUSTER_MGR-7-UNK:Post Convergence Event: CONVERGENCE_STARTED, name=Cisco Unified CCX Engine
%MCVD-CLUSTER_MGR-7-UNK:JavaService66: Cisco Unified CCX Engine on node 1 change master from true to false
%MCVD-CLUSTER_MGR-7-UNK:Post Master Event: MASTER_DROPPED, name=Cisco Unified CCX Engine, node=1
• Failover completed
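Island mode differs from a plain failover in that each node suspects the other. This is a minimal Python sketch of that comparison across the CVD logs of both nodes. It is a reading aid and not how CVD itself decides. The function names and the default node ids (1 for Primary, 2 for Secondary, as in the samples above) are assumptions.

```python
import re

# Matches the node named in a suspect-node-crash message, e.g. "nodeInfo=Node id=2 ip=..."
SUSPECT = re.compile(r"nodeInfo=Node id=(\d+)")


def suspected_nodes(lines):
    """Return the set of node ids that a node's CVD log reports as suspected crashed."""
    return {int(m.group(1)) for line in lines for m in SUSPECT.finditer(line)}


def looks_like_island_mode(primary_lines, secondary_lines, primary_id=1, secondary_id=2):
    """True when the Primary suspects the Secondary AND the Secondary suspects the Primary."""
    return (secondary_id in suspected_nodes(primary_lines)
            and primary_id in suspected_nodes(secondary_lines))


primary_log = ["state=Convergence State,nodeInfo=Node id=2 ip=192.168.13.5"]
secondary_log = ["state=Heartbeat State,nodeInfo=Node id=1 ip=192.168.12.5"]
island = looks_like_island_mode(primary_log, secondary_log)
plain_failover = looks_like_island_mode([], secondary_log)
```

If only the Secondary suspects the Primary, the pattern matches the lost-connectivity case rather than island mode.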
HA Failover issue – further analysis
From Real-Time Monitoring Tool (RTMT) collect logs for Cisco Unified CCX Cluster View
Daemon
System monitoring tools
CUIC – Utilization monitoring
CUIC provides stock reports that let admins view license utilization and monitor system usage:
• License Utilization Hourly report: hourly breakdown of the number of inbound and outbound ports used, as well as agent seats
• Aborted and Rejected Call Detail Report: provides information about each call that is aborted or rejected by the system
CUIC – Aborted and rejected report
• Incoming call contacts are classified as Aborted if an exception occurs in the workflow that is processing a call.
• Incoming call contacts are classified as Rejected when UCCX application system resources reach maximum capacity.
• Database Contains a Cause Code for Abort/Reject that Can Alert Admins to Problems
• Reject - TRIGGER_MAX_SESSION
• Reject - REMOTE_TIMEOUT
• Reject - CHANNELS BUSY
• Aborted – Too many transfer failures
• Aborted – Max Steps Executed
Reports give helpful contact
details and exact timestamps!
RTMT – System summary
RTMT – Alert Central
Catch Problems Before
Users or Customers Do!
RTMT – Schedule log collection
Intermittent Issue?
Finesse
Finesse health
• Finesse depends on the following services for its normal
functioning :
• Cisco Unified CCX Engine Service
• Cisco Unified CCX Notification Service
• Cisco Finesse Tomcat (Tomcat Service exclusive to Finesse)
Finesse Service OOS | No Failover | Finesse goes Out of Service | Sessions closed | Finesse unavailable until issue fixed
Island Mode | Both HA nodes become Master | Both Finesse servers In Service | Clients connect to either | Clients reconnect to Master node
Finesse issues – further analysis
• From Real-Time Monitoring Tool (RTMT) collect logs for Cisco Finesse, Cisco
Unified CCX Engine, Cisco Unified CCX Notification Service
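Monitoring overview
[Diagram: silent monitoring request path on the UCCX Server]
• Finesse Server (Tomcat) to UCCX Engine (RmCm), over CTI: SUPERVISE_CALL_REQ with Supervisory Action field of SUPERVISOR_MONITOR
• UCCX Engine to CUCM Server, over JTAPI: startMonitor method call with agent device info
• CUCM Server to phones: SIP/SCCP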
Cisco Finesse Server Receives Silent Monitor request from Finesse client browser
Finesse Web Services log
CCX Engine Runs a Check to see if the agent to be monitored is already being monitored
CCX Engine log:
CCX Engine Invokes Monitoring Request via JTAPI but displays error
CCX Engine log:
%MIVR-SS_RM-3-Initiating silent monitor request to JTAPI server...:Undefined mnemonic 'Initiating silent monitor request to JTAPI server...'
…..
%MIVR-SS_RM-7-UNK: JTAPI error code:88046
Monitoring troubleshooting
Problem: Monitoring Fails to Start
#3
Supervisor is monitoring agent call and clicks Barge In
A delay is observed, followed by Error
Finesse Server Requests Barge-In to UCCX Engine
Finesse Desktop WebServices log and UCCX Engine log:
%CCBU_http-bio-8082-exec-11-6-API_REQUEST: %[REQUEST_URL=User/agent3/Dialogs][agent_id=agent3][requestId=d4bd0140-6be5-4e75-9e00-7a20e47e2e1d][request_method=user.POST][request_parameters= fromAddress:1004 toAddress:2002 requestedAction:BARGE_CALL associatedDialogUri: /finesse/api/Dialog/16779268]: Request from client to webservice api
10 seconds later, Finesse Server Notifies Client of Failure via XMPP (OpenFire)
Finesse WebServices log:
%CCBU_pool-12-thread-5-3-API_ERROR: %[error_code=1][error_message=Generic Barge Call Error]: Error processing REST API request
5 seconds later, CCX Engine Notifies Finesse Server of New Incoming Call to Supervisor
Finesse WebServices log:
%CCBU_CTIMessageEventExecutor-0-6-DECODED_MESSAGE_FROM_CTI_SERVER: %[cti_message=com.cisco.ccbu.finesse.adapter.acmi.message.CTICallOriginatedEvent@dd1de[peripheralId=1,connectionDeviceIdType=0,callId=16779284,skillGroupNumber=1,callingDeviceType=76,calledDeviceType=76,localConnectionState=1,eventCause=-1,connectionDeviceId=2002,callingDeviceId=2002,calledDeviceId=2001,fltReqMaskNumMasks=1,fltReqMaskInstrumentList=[2002],fltReqMaskExtensionList=[2002],fltReqMaskCallIDList=[16779284],fltReqMaskFlagsList=[0000000000000000],fltReqCallMaskList=[0804000000000000],fltReqAgentMaskList=[0000000000000000],invokeID=<null>,msgID=15,timeTracker={"id":"CallOriginatedEvent","CTI_MSG_RECEIVED":1456263592134,"CTI_MSG_DISPATCH":1456263592135},msgName=CallOriginatedEvent,deploymentType=CCX]][cti_response_time=1]: Decoded Message to Finesse from backend cti server
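The Finesse WebServices entries above carry their details as [key=value] fields, which makes them easy to pull apart when correlating a request with its error. This is a minimal Python sketch with a hypothetical function name. It handles only flat fields, so the nested cti_message block of the last entry is skipped on purpose.

```python
import re

# A flat field looks like [key=value] with no brackets inside the value.
FIELD = re.compile(r"\[(\w+)=([^\[\]]*)\]")


def bracket_fields(line: str) -> dict:
    """Return the flat [key=value] fields of a Finesse log entry as a dict."""
    return {key: value.strip() for key, value in FIELD.findall(line)}


request = (
    "%CCBU_http-bio-8082-exec-11-6-API_REQUEST: %[REQUEST_URL=User/agent3/Dialogs]"
    "[agent_id=agent3][requestId=d4bd0140-6be5-4e75-9e00-7a20e47e2e1d]"
    "[request_method=user.POST][request_parameters= fromAddress:1004 toAddress:2002 "
    "requestedAction:BARGE_CALL associatedDialogUri: /finesse/api/Dialog/16779268]: "
    "Request from client to webservice api"
)
error = (
    "%CCBU_pool-12-thread-5-3-API_ERROR: %[error_code=1]"
    "[error_message=Generic Barge Call Error]: Error processing REST API request"
)

request_fields = bracket_fields(request)
error_fields = bracket_fields(error)
```

Here request_fields["agent_id"] identifies the supervisor's request and error_fields["error_message"] gives the failure reported back to the client.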
Barge In - Troubleshooting
The supervisor later goes to a Reserved State
Finesse Server Notifies Finesse Client that Supervisor is receiving a new call
Finesse WebServices log:
2. Verify that this variable is mapped to an ECC variable named exactly --PostCallTreatment--
Post Call Treatment - Logs
UCCX Engine Script Debugging Shows Script Variable Set Correctly
UCCX Engine Log (MIVR):
Most importantly, UCCX Engine Shows the PostCallTreatment variable attached to the call is set correctly
UCCX Engine Log:
Instead of immediately going to JTAPI to disconnect the call on the agent phone, run PostCallSurvey checks:
UCCX Engine Log:
[Diagram: recording call flow between PSTN/WAN, CUCM, the phone (BIB Subscription) and the MediaSense Call Control Service and API Service, over SIP, MGCP/H323 and SIP/SCCP. Surviving step labels:]
3. CUCM knows that the phone has a Recording Profile attached
7. CUCM communicates RTP ports for the forking stream to the phone
9. Call Control Service sends metadata to API Service
10. API Service writes to DB
Analyze CUCM +
MS logs
Is CUCM Automatic Call Recording working?
• Isolate the issue by taking UCCX and Finesse out of the flow: on the CUCM agent line configuration, enable automatic call recording
1. Closed_Error status
CLOSED_ERROR
Cause can be identified from logs or metadata
Error Detail Code in metadata and logs | Closed Error Reason in Search and Play | Description
MEDIA_SERVER_TIMEOUT | Record Response Timeout | Call control service timed out waiting for response from the Media (recording) server for the open or close request.
SIP_SIGNALING_ERROR | SIP Signal Error | Call control service detected a SIP signaling error, for example a missing ACK.
SIP_CANCEL_RECEIVED | Record Cancelled | Recording was canceled by call control service, such as CANCEL or premature BYE received from CUCM.
NO_MEDIA_RECEIVED | Zero Size Tracks | Session was successfully closed, but ALL tracks have zero size.
ORPHANED | Orphaned Session | Session was orphaned, e.g., forcibly closed after service restart.
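When many sessions have to be triaged, the table above can be kept as a small lookup. This is a minimal Python sketch with hypothetical names. It contains only the five codes listed on these slides, so any other code is reported as not covered.

```python
# Error detail code -> (reason shown in Search and Play, description), as listed on the slides.
CLOSED_ERROR_CODES = {
    "MEDIA_SERVER_TIMEOUT": ("Record Response Timeout",
                             "Call control service timed out waiting for the media server."),
    "SIP_SIGNALING_ERROR": ("SIP Signal Error",
                            "Call control service detected a SIP signaling error."),
    "SIP_CANCEL_RECEIVED": ("Record Cancelled",
                            "Recording was canceled by call control service."),
    "NO_MEDIA_RECEIVED": ("Zero Size Tracks",
                          "Session closed successfully, but all tracks have zero size."),
    "ORPHANED": ("Orphaned Session",
                 "Session was orphaned, e.g. forcibly closed after service restart."),
}


def describe_closed_error(code: str) -> str:
    """Return a one-line explanation for an error detail code from the session metadata."""
    entry = CLOSED_ERROR_CODES.get(code.strip().upper())
    if entry is None:
        return f"{code}: not covered by this table"
    reason, description = entry
    return f"{code.strip().upper()}: {reason}. {description}"
```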
Test call to MediaSense
• Dial the Route Pattern that points to the MediaSense SIP trunk from an IP Phone
• A valid recording, with just a single participant (the calling phone) should be
successfully created:
Verify MS service status
• Check if all services are running from Control Center – Feature services
Verify MS service status
Show Tech in CLI
Use 'show tech call_control_service' from each node in the CLI
Recording profile
MediaSense Troubleshooting
[Flowchart boxes: Is CUCM Automatic recording working? (Yes) · CLOSED_ERROR status · No recording found · Analyze CUCM + MS logs]
Verify device and line configuration
• Ensure that a supported device is being used and that Built-in Bridge is enabled in the device configuration
Review Finesse workflow
• Verify if CUCM has selective recording selected
Applications/logs for troubleshooting
Application/log | Details | Where to find
Cisco IdS log | Cisco IdS logger will log any error that happened in Cisco IdS | RTMT, collect "Cisco Identity Service" logs
Fedlet logs | Fedlet logs will give more details about any SAML errors that happen in Cisco IdS | RTMT, same location as IdS logs, they have the suffix "fedlet-"
Cisco IdS API metrics | API metrics can be used to look into and validate any errors that Cisco IdS APIs may have returned and the number of requests that are processed by Cisco IdS | RTMT: under Cisco Identity Service you'll find a folder "metrics"; saml_metrics.csv and authorize_metrics.csv are the relevant metrics for this document.
• Cause:
From ADFS Event viewer log:
Encountered error during federation passive request. Additional Data Exception details:
Microsoft.IdentityServer.Configuration.ReadServiceConfigFailedException: MSIS2001: Configuration service
URL is not configured. ---> Microsoft.IdentityServer.PolicyModel.Client.PolicyStoreConnectionException:
ADMIN0017: An exception occurred while connecting to the configuration service. The configuration service
URL 'net.tcp://localhost:1500/policy' may be incorrect or the AD FS 2.0 Windows Service is not
running. ---> System.ServiceModel.EndpointNotFoundException: Could not connect to
net.tcp://localhost:1500/policy. The connection attempt lasted for a time span of 00:00:02.0001144. TCP error
code 10061: No connection could be made because the target machine actively refused it 127.0.0.1:1500. --->
System.Net.Sockets.SocketException: No connection could be made because the target machine actively
refused it 127.0.0.1:1500…
SAML request processing by ADFS #2
• Error Message:
From AD FS Event Viewer
The Federation Service encountered an error while processing the SAML authentication request.
Additional Data
• Possible cause:
Relying party trust is not established, or the Cisco IdS certificate has changed but the new certificate has not been uploaded to AD FS.
SAML request processing by ADFS #2
• Recommended action:
Establish trust between AD FS and Cisco IdS with the latest Cisco IdS certificate.
Ensure that the Cisco IdS certificate has not expired. You can check its status on the dashboard in Cisco Identity
Service Management. If it has expired, regenerate the certificate in the Settings page.
SAML Response Sending by ADFS
• Problem:
Browser shows NTLM login, but after successful log in, it fails with many redirects.
• Possible cause:
Cisco IdS supports only form-based authentication, and Form authentication is not enabled in AD FS.
• Recommended action:
For more details on how to enable Form authentication see:
ADFS 2.0 Form Authentication Setting
ADFS 3.0 Form Authentication Setting
SAML response processing by Cisco IdS
• Error Message:
SAML response processing by Cisco IdS
• ADFS event viewer
• IdS logs:
2017-03-13 21:37:53.189 CET(+0100) [IdSEndPoints-SAML-54] ERROR com.cisco.ccbu.ids IdSSAMLAsyncServlet.java:299
- SAML response processing failed with exception com.sun.identity.saml2.common.SAML2Exception: Invalid Status code in
Response….
SAML response processing by Cisco IdS
• Possible cause:
AD FS is configured to use SHA-256.
• Recommended action:
1. RDP to the AD FS system.
2. Open AD FS console.
3. Select the Relying Party Trust and click Properties
4. Select the Advanced tab.
5. Select SHA-1 from the drop-down list.
SAML response processing by Cisco IdS #2
• Error Message:
SAML response processing by Cisco IdS #2
• ADFS Event Viewer:
SAML response processing by Cisco IdS #2
• IdS logs:
2017-03-13 21:52:50.161 CET(+0100) [IdSEndPoints-SAML-55] INFO com.cisco.ccbu.ids
SAML2SPAdapter.java:76 - SSO failed with code: 1. Response status: <samlp:Status>
<samlp:StatusCode Value="urn:oasis:names:tc:SAML:2.0:status:Responder">
</samlp:StatusCode>
</samlp:Status> for AuthnRequest: n/a
2017-03-13 21:52:50.161 CET(+0100) [IdSEndPoints-SAML-55] ERROR com.cisco.ccbu.ids
IdSSAMLAsyncServlet.java:299 - SAML response processing failed with exception
com.sun.identity.saml2.common.SAML2Exception: Invalid Status code in Response.
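The StatusCode in the IdS log is the quickest pointer to which side rejected the login. This is a minimal Python sketch that extracts it from collected log text. The function names are hypothetical. In SAML 2.0 the top-level value "Responder" means the error was raised on the identity provider side, which fits the AD FS claim rule cause discussed next.

```python
import re

# Matches the Value attribute of a <samlp:StatusCode> element as it appears in the IdS log.
STATUS_CODE = re.compile(r'<samlp:StatusCode\s+Value="([^"]+)"')


def saml_status_codes(log_text: str) -> list:
    """Return every SAML status code URN found in the log text, in order of appearance."""
    return STATUS_CODE.findall(log_text)


def short_status(urn: str) -> str:
    """Return the last URN component, e.g. 'Responder'."""
    return urn.rsplit(":", 1)[-1]


sample = (
    "SAML2SPAdapter.java:76 - SSO failed with code: 1. Response status: <samlp:Status>\n"
    '<samlp:StatusCode Value="urn:oasis:names:tc:SAML:2.0:status:Responder">\n'
    "</samlp:StatusCode>\n"
    "</samlp:Status> for AuthnRequest: n/a"
)
codes = saml_status_codes(sample)
```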
SAML response processing by Cisco IdS #2
• Possible cause:
Custom claim rule is not configured correctly.
• Recommended action:
1. Under AD FS claim rules, ensure that attributes mapping for "user_principal" and "uid" are defined
correctly
2. RDP to AD FS system.
3. Edit the Claim Rules for custom claim rules.
4. Verify that the AD FS and Cisco IdS fully qualified domain names are given.
SAML response processing by Cisco IdS #2
Useful links
• Configure the Identity Provider for UCCX based on SSO
http://www.cisco.com/c/en/us/support/docs/customer-collaboration/unified-contact-center-express/200612-Configure-the-Identity-Provider-for-UCCX.html
• ADFS/IdS Troubleshooting and Common Problems
http://www.cisco.com/c/en/us/support/docs/customer-collaboration/unified-contact-center-express/200662-ADFS-IdS-Troubleshooting-and-Common-Prob.html
• Troubleshooting docwiki
http://docwiki.cisco.com/wiki/Troubleshooting_Tips_for_Unified_CCX_11.5
• Troubleshooting Tech Notes UCCX
http://www.cisco.com/c/en/us/support/customer-collaboration/unified-contact-center-express/products-tech-notes-list.html
If all of the above fails…
Open a TAC case
Information that is useful to include from the start:
• Versions
• Topology
• Known changes
• Exact error (with screenshots/logs and your analysis)
• Steps you’ve tried already
• Impact
Questions?