
APM Solution Tips CA Services

2008 CA

Document Title: WELLS FARGO APM Consulting
Customer: WELLS FARGO
Project: APM best practice
Last Saved Date: 12/6/13
Version: 1.00

Document Properties
Customer Name: WELLS FARGO
Customer Engineer: Oscar C. Huerta
Document Status: Open
Authors: Amit Sheth (Aspire CA Consultant)

Copyright 2008 CA.

All rights reserved. All trademarks, trade names, service marks and logos referenced herein belong to their respective companies. This document is for your informational purposes only. To the extent permitted by applicable law, CA provides this document "As Is" without warranty of any kind, including, without limitation, any implied warranties of merchantability or fitness for a particular purpose, or non-infringement. In no event will CA be liable for any loss or damage, direct or indirect, from the use of this document including, without limitation, lost profits, business interruption, goodwill or lost data, even if CA is expressly advised of such damages.

2008 CA Inc. All rights reserved.

Saved Date: 12/6/13 Page 2 of 12

Enterprise Manager and Cluster guidelines:

Title: Troubleshooting Common EM Performance Problems (Legacy KB ID WLY 1300)
ISSUE:


Troubleshooting common EM performance problems

RESOLUTION:

Memory Tips
Memory problems typically manifest as a poorly performing EM or, more rarely, as OutOfMemory (OOM) errors. Check the following for memory-related problems.

Load Within Sizing Guidelines
From the perflog, check that the TotalMemory, NumberOfAgents, NumberOfMetrics, and NumInsertsPerInterval values are within the sizing guidelines for your environment. For example, a single EM cannot handle more than 250k live metrics or 300 agents.

Also check the following:
1. The total number of applications reporting to the EM does not exceed 1500 (on average, about five applications per agent).
2. There are fewer than 5k metrics per agent.
3. There are fewer than 500 event inserts per timeslice into the traces database (these include traces, errors, and What's Interesting events from the Application Overview).
4. There are no more than 20 connected Workstations.
5. There are no more than 5000 metrics from aggregate agents.
If boundary blame is disabled on 7.x agents, all maximum loads should be reduced by 50%.
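As a quick cross-check, the limits above can be written down as a small shell function. This is a minimal sketch, not a CA tool: check_em_sizing and its argument order are hypothetical, the figures are values you read yourself from the perflog and the Investigator, and each limit is treated as a maximum (a value equal to the limit passes).

```shell
#!/bin/sh
# Hypothetical helper (not part of CA APM): compares EM load figures that you
# have read from the perflog / Investigator against the sizing limits above.

# Prints a line and returns 1 when value $2 exceeds limit $3, after dividing
# the limit by $_divisor (2 when boundary blame is disabled). $1 is a label.
_check_limit() {
  _limit=$(( $3 / _divisor ))
  if [ "$2" -gt "$_limit" ]; then
    echo "OVER: $1 = $2 (limit $_limit)"
    return 1
  fi
  return 0
}

# Usage: check_em_sizing LIVE_METRICS AGENTS APPLICATIONS METRICS_PER_AGENT \
#          EVENT_INSERTS_PER_TIMESLICE WORKSTATIONS AGGREGATE_AGENT_METRICS \
#          BOUNDARY_BLAME_DISABLED   (1 = disabled on 7.x agents, else 0)
# Returns 0 when every figure is within its limit, 1 otherwise.
check_em_sizing() {
  _divisor=1
  if [ "${8:-0}" -eq 1 ]; then
    _divisor=2   # all maximum loads reduced by 50%
  fi
  _status=0
  _check_limit "live metrics" "$1" 250000 || _status=1
  _check_limit "agents" "$2" 300 || _status=1
  _check_limit "applications" "$3" 1500 || _status=1
  _check_limit "metrics per agent" "$4" 5000 || _status=1
  _check_limit "event inserts per timeslice" "$5" 500 || _status=1
  _check_limit "workstations" "$6" 20 || _status=1
  _check_limit "aggregate agent metrics" "$7" 5000 || _status=1
  return "$_status"
}
```

For example, check_em_sizing 180000 220 900 4000 300 12 3000 0 prints nothing and returns 0; with the last argument set to 1, the same figures are flagged on all seven lines because every limit is halved.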

Heap Space
Too much memory (JVM heap space) can also be a problem. On a 32-bit JVM, do not use more than 1.5 GB of heap. When deciding how much memory to allocate on the machine, remember to leave enough room for the operating system.
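The heap guidance can be expressed the same way. In this sketch, check_em_heap is a hypothetical helper, 1.5 GB is taken as 1536 MB, and the amount of memory to keep free for the operating system is an input you choose, since this document does not give a figure for it.

```shell
#!/bin/sh
# Hypothetical helper (not part of CA APM): sanity-checks a planned EM heap
# (-Xmx) against the heap guidance above.
# Usage: check_em_heap JVM_BITS HEAP_MB PHYSICAL_RAM_MB OS_RESERVE_MB
# OS_RESERVE_MB is your own estimate of what the operating system needs.
check_em_heap() {
  # 32-bit JVMs: no more than 1.5 GB (1536 MB) of heap.
  if [ "$1" -eq 32 ] && [ "$2" -gt 1536 ]; then
    echo "heap of $2 MB exceeds the 1.5 GB (1536 MB) ceiling for a 32-bit JVM"
    return 1
  fi
  # Heap plus the OS reserve must fit in physical memory.
  if [ $(( $2 + $4 )) -gt "$3" ]; then
    echo "heap of $2 MB leaves less than $4 MB of $3 MB for the operating system"
    return 1
  fi
  echo "ok"
}
```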

In the EM lax file, check whether the GC flags are set too high or too low. Too many flags, added in an effort to micromanage memory, can actually lead to memory problems. On the other hand, check that you are not missing flags that are good to have on a multiprocessor system, such as UseConcMarkSweepGC and UseParNewGC.

At a steady load within the sizing guideline limits, a FreeMemory value that keeps decreasing may indicate a memory leak. You can also check SmartStor data for indications of memory leaks. If GC does not reclaim space, that indicates a memory leak, and CA Wily Support can assist you in analyzing it.

CPU Tips
1. A good indicator that an EM is running on a box with high CPU usage is a high HarvestDuration value in the perflog.
2. EM - Connections - Metrics Queued starts to increase if harvest is slow.
3. A high percentage of time spent in GC can also lead to high CPU. This GC time percentage appears in the 3rd column of the perflog. High GC time can result from a metric load that is high relative to the memory allocated, from a load spike that demands additional memory, or from poorly configured GC flags.
4. One way to determine the CPU usage of the EM is to create a summing calculator over the CPU usage of all the EM threads shown under the supportability node.
5. If you have the customer's SmartStor data, clicking the top level of the node Custom Metric Agent - Enterprise Manager - Internal - Threads gives a tabular view of all the EM threads with the CPU usage of each thread, so you can find out which thread is busiest.
6. On Windows, perfmon can be used to determine the box's CPU load; on *nix systems, commands such as "top" can be used.
7. The Application Overview grid is fairly CPU intensive and can be turned off, if not needed, with the property introscope.enterprisemanager.appliactionoverview=false.
8. CPU-related problems may also be due to poor network performance, which leaves the EM dealing with accumulated metrics from previous timeslices. Look at the values of MetricDataPending (ideally 0; a consistently large value that is a good percentage of the metric load may indicate a problem) and MetricDataRate (which should be around the same as the total metric count) for clues.

SmartStor Tips
1. Top-of-the-hour problems are typically due to SmartStor spooling problems, most likely caused by a lack of disk file cache or by too little physical memory left for the OS after the JVM has been allocated its share. We recommend at least 3 GB of physical memory for the box, preferably 4 GB. We also recommend at least a 2-CPU box for the EM, as uniprocessor machines perform very poorly.
2. The total number of metrics that have metadata entries, including historical metrics (not just live metrics), is another factor in the EM's general performance and in SmartStor performance. It can be found from the "Enterprise Manager - Data Store - SmartStor - Metadata - Metrics with Data" metric. Ideally it should not be more than 500k. Metadata can be cleaned to remove unneeded metrics as follows:
a) Print out the current metric list using:
java -Xms512m -Xmx512m -cp "IntroscopeServices.jar;EnterpriseManager.jar" com.wily.introscope.server.enterprise.entity.fsdb.MetadataFile metrics.metadata -dump sort > metrics.metadata.dump.sort

b) Next, use SmartStorTools to do things such as:
i) Pruning (removing dead metrics):
D:\Introscope72\lib>java -Xms1024m -Xmx1024m -cp SmartStorTools.jar;EnterpriseManager.jar;IntroscopeServices.jar;IntroscopeClient.jar Prune -src d:\Introscope72\data -backup d:\Introscope72\data\metrics.metadata.bkup
ii) Removing selected unwanted metrics, for example socket metrics:
D:\Introscope72\lib>java -Xms1024m -Xmx1024m -cp SmartStorTools.jar;EnterpriseManager.jar;IntroscopeServices.jar;IntroscopeClient.jar RemoveMetrics -src d:\Introscope72\data -dest c:\metrics_removed -metrics .*Sockets.*
3. SmartStor duration, as seen in the perflog, should typically be under 7.5 seconds; anything above that indicates a poorly performing disk subsystem.
4. On Windows, perfmon can be used to measure the disk subsystem's performance; on *nix systems, iostat can be used.
5. SmartStor and persistent collections should not be combined.

Network Tips
1. One sign of a poorly performing network is that either MetricDataRate in the perflog or the "Enterprise Manager - Connections - Number of Metrics Handled" value in the Investigator is consistently less than the total number of metrics.
2. Poor network performance can lead to an overloaded EM and poor EM performance. In a cluster, the ping time on the MOM indicates whether network times between the MOM and the collectors are poor; a high ping time could also be due to overloaded collectors that cannot respond to the ping request.
3. Poor network performance could also be due to improper AutoNegotiate settings on the EM's NIC. Ideally, it should be set to 1000/full duplex.

**********************************************************
The log messages below are symptoms of an overloaded Enterprise Manager.

[WARN] [Manager.Clock] Timeslice processing delayed due to system activity. Combining data from timeslices x to y
[WARN] [Manager.TransactionTracer] The Enterprise Manager cannot keep up with incoming event data. Some of the incoming events will be dropped.
[WARN] [Manager.SmartStor] Cannot keep up with data persistence - dropping data from timeslice x:y:z
[VERBOSE] [Manager] com.wily.introscope.spec.server.beans.baseliningengine.BaseliningException: Time series data received out of order. (Please note that the logging level of this message will be changed to DEBUG in the future)
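To see how often these symptoms occur, the messages can be counted in the EM log. A minimal sketch: em_overload_symptoms is a hypothetical helper that takes the log path as its argument (logs/IntroscopeEnterpriseManager.log under <EM_Home> is the usual location, but verify it for your installation) and matches fixed substrings of the messages quoted above, printing one count per symptom.

```shell
#!/bin/sh
# Hypothetical helper (not part of CA APM): counts each overload symptom in
# an EM log file and prints "<count><TAB><message fragment>" per symptom.
# Usage: em_overload_symptoms /path/to/IntroscopeEnterpriseManager.log
em_overload_symptoms() {
  _log=$1
  # Fixed-string (-F) fragments of the warning messages listed above.
  for _pattern in \
    'Timeslice processing delayed due to system activity' \
    'cannot keep up with incoming event data' \
    'Cannot keep up with data persistence' \
    'Time series data received out of order'
  do
    # grep -c prints 0 (and exits 1) when nothing matches; the count is still printed.
    printf '%s\t%s\n' "$(grep -cF "$_pattern" "$_log")" "$_pattern"
  done
}
```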


2 How to Maintain SmartStor

2.1 Using the test_regex command to find a SmartStor data problem

In this example, you use the test_regex command to discover that JMX/SQL metric data has exploded.
1. On each collector, run the test_regex command to produce a readable version of metrics.metadata.

For example:
<EM_Home>\tools>SmartStorTools.bat test_regex -metrics -src <EM_Home>\data
The result is a list of the metrics in the metadata.
2. Review the metadata metrics list to determine the probable source of the metrics leak or explosion.

For example, you may find a large number of metadata entries for JMX/SQL metrics.
3. (Optional) Run a script to sort and group the metrics.

For example, to verify that JMX metrics are the source of the metrics explosion, you can pipe the test_regex output through this script:
grep '|' | awk -F'|' '{print $1}' | sort | uniq -c
If you are using a UNIX system, this would be the full test_regex command with the sort script:
SmartStorTools.sh test_regex -metrics <agents regexp> -src ../data | grep '|' | awk -F'|' '{print $1}' | sort | uniq -c
For example:
SmartStorTools.sh test_regex -metrics ".*" -src ../data | grep '|' | awk -F'|' '{print $1}' | sort | uniq -c
4. Review the sort results.

For example, you might see a list of metric counts and types that looks like this:
1054327 JMX
1564 JSP
1051 OlamWebApp
889 EJB
569 CEF
496 Struts
173 Edocs
167 Servlets
36 CPU
29 Sockets
In this case, SmartStor is storing over a million metrics, and most are JMX metrics. This indicates a JMX metrics explosion that can be cleaned up using the SmartStor tools. For more information on metrics explosions, see the CA APM Sizing and Performance Guide.

2.2 Trimming SmartStor data and metadata

Use the SmartStor tools to trim excess SmartStor data.
Important! CA Technologies recommends shutting down the Enterprise Manager before running the SmartStor tools. A running Enterprise Manager keeps writing SmartStor data, so running the tools against that data can produce unpredictable results. Also see the topics about metrics leaks and explosions in the CA APM Sizing and Performance Guide.
To trim SmartStor data and metadata:
1. Shut down the Enterprise Manager (each collector, one at a time).
2. From a command prompt, run the test_regex command to print the current metric list.

For example, if Introscope is installed in the <EM_Home> directory and SmartStor is configured to save data in the <EM_Home>\data directory, run this command from the <EM_Home>\tools directory:
SmartStorTools.sh test_regex -metrics <agents regexp> -src ../data
For example, to list all metrics:
SmartStorTools.sh test_regex -metrics ".*" -src ../data


The result is a list of fully qualified metric names, including the host name, agent name, metric name, and metric IDs. The total line count gives an indication of the total number of metrics. However, the line count is not as accurate as the metric count from the supportability metrics in the Investigator. By counting the lines that match specific regular expressions for an agent or metric name, you can get an idea of which metrics occupy most of the space. This also helps identify the metrics you are not interested in and would like to remove.
3. From the list of metrics, pick metrics from specific agents to be removed.
4. Run the SmartStor tools remove_metrics command to remove a set of excess metrics.

For example, if you have high SQL metric counts, use this command to remove all the SQL metrics:
<EM_Home>\tools>SmartStorTools.bat remove_metrics -dest c:\sql_metrics_removed -metrics ".*SQL.*" -src <EM_Home>\data

When the command executes successfully, the destination directory contains all the metrics except the SQL metrics, and the source directory still has all the metrics intact. In the example above, the new data without the SQL metrics (specified by the -dest option) is in the c:\sql_metrics_removed directory.
Note: This command needs roughly as much free target space as the original data occupies (specified by the -src option), although the result is typically a few GB smaller. This is a long-running operation and can run for a few hours.
5. If necessary, repeat the remove_metrics command on a different set of excess metrics.

Sockets are another common source of leaked metrics. After running the first remove_metrics command, use this command to remove socket metrics:
<EM_Home>\tools>SmartStorTools.bat remove_metrics -dest c:\sql_and_sockets__metrics_removed -metrics ".*Sockets.*" -src c:\sql_metrics_removed
This command uses the output of the first command (the data in the c:\sql_metrics_removed directory) as its source data. The data with both SQL and socket metrics removed is then in the location specified by the -dest option: c:\sql_and_sockets__metrics_removed. Again, this command needs almost as much extra space as the original data.
6. Check the metrics data value for the cleaned metrics data and metadata.

   1. Open the apm-events-thresholds-config.xml file in the <EM_Home>\config directory.
   2. Make a note of the current setting, then set introscope.enterprisemanager.metrics.historical.limit to 500000.
   3. Save and close the file.
   4. Open the IntroscopeEnterpriseManager.properties file in the <EM_Home>\config directory.
   5. Make a note of the current setting, then set the property introscope.enterprisemanager.smartstor.directory to point to the destination location of the metrics removal.

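   In this example, the destination location is c:\sql_and_sockets__metrics_removed, so the property setting would be:
   introscope.enterprisemanager.smartstor.directory=c:/sql_and_sockets__metrics_removed
   (Java .properties files treat the backslash as an escape character, so use forward slashes or doubled backslashes in Windows paths.)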
   6. Save and close the file.
   7. Restart the Enterprise Manager.
   8. Check the value of the following metric:
   Enterprise Manager | Data Store | SmartStor | Metadata | Metrics with Data

The value should be far below the original value. If it is not, repeat the trimming steps using suitable SmartStor tools commands against agent or metric names that are not needed. Once the historical metric count is below 300K, the Enterprise Manager runs more efficiently.
7. If needed, reset the changed property values back to their original settings. Before resetting the SmartStor directory, make sure the original SmartStor directory holds the cleaned data; otherwise the Enterprise Manager reads the untrimmed data again.
   1. Open the apm-events-thresholds-config.xml file in the <EM_Home>\config directory.
   2. Reset the introscope.enterprisemanager.metrics.historical.limit property back to the original setting.
   3. Save and close the file.
   4. Shut down the Enterprise Manager.
   5. Open the IntroscopeEnterpriseManager.properties file in the <EM_Home>\config directory.
   6. Reset the introscope.enterprisemanager.smartstor.directory property back to the original setting.
   7. Save and close the file.
   8. Restart the Enterprise Manager.
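After trimming, it helps to rerun the grouping from section 2.1 against the cleaned directory and compare it with the earlier output. The sketch below wraps that pipeline in two hypothetical functions: count_metric_groups adds a final sort -rn so the largest groups come first, and count_metric_lines gives the rough line count mentioned in step 2. Like the original pipeline, it assumes the group name is the text before the first "|" of each test_regex output line.

```shell
#!/bin/sh
# Hypothetical helpers (not part of CA APM). Both read test_regex output on stdin.

# Grouping pipeline from section 2.1, step 3, sorted largest group first.
count_metric_groups() {
  grep '|' | awk -F'|' '{ print $1 }' | sort | uniq -c | sort -rn
}

# Number of lines that look like metric entries (contain a "|").
# As step 2 of section 2.2 notes, this only approximates the real metric count.
count_metric_lines() {
  grep -c '|'
}
```

For example, SmartStorTools.sh test_regex -metrics ".*" -src <cleaned data directory> | count_metric_groups | head -n 10 shows the ten largest groups remaining after the remove_metrics runs.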


CA Application Performance Management Top Solution Documents EM

CA Application Performance Management Top Solution Documents - CEM

Last Updated: July 27, 2013


Doc ID #: Category: Technical Support Document: Cluster running slow, poor response time, can't navigate, can't do a transaction trace, load balancing not working properly. Where should I look next? Agent metrics are not getting reported in the Investigator tree due to breaching the agent metric limit Workstation or Enterprise Manager OutOfMemory due to SOA Dependency Map keeps growing Introscope EM - Resource recommendations Changing Agent Domain older agent data are lost in Investigator/dashboards Fixing Corrupt Transaction Trace Database Troubleshooting Common EM Performance Problems CA APM Workstation Logon very slow OutOfMemoryException in MOM Last Modified: 07/23/13

TEC595953 Performance

TEC595877 Error

07/22/13

TEC595381 Error TEC533725 Performance TEC533582 Error TEC533771 Error TEC533903 Performance TEC595655 Performance TEC565842 Performance TEC595024 Error

07/19/13 11/29/12 11/28/12 11/27/12 11/27/12 07/18/13 03/10/12

Not able to load Agent.dll - .NET Agent fails to monitor a .NET application 07/08/13
.NET agent installation Microsoft Avicode profiler preventing instrumentation but perfmon metrics are reporting 07/12/13

TEC595292 Error

TEC594851

Installation/Troubleshooting Troubleshooting a 9.1.x .Net Agent Installation Getting "Failed to CoCreate profiler" message in Windows Event Viewer

07/03/13 11/29/12 11/23/12

TEC533809 Error TEC534035

Installation/Troubleshooting .NET Agent Troubleshooting and Manual Installation

TEC575344 Configuration

Scenario: Configure the Enterprise Manager and Agent 08/09/12



Connection TEC534460 Performance TEC534043 Errors TEC534283 Troubleshooting TEC593253 Configuration Agent Sending Data Too Slowly to Enterprise Manager Troubleshooting APM database connections error messages Troubleshooting a metric explosion problem How to Set a Workstation Alert for ConnectionStatus 12/03/12 11/26/12 11/09/12 06/04/13
