
IT Operation Manual

COSMIC

Exported on 06 Jun 2019


COSMIC – IT Operation Manual

Table of Contents

1 Introduction
1.1 Management of IT Operations-Relevant Information
1.1.1 How relevant information is managed
1.2 Content and Responsibility
1.3 Scope: Brief Description of the Scope to Be Operated
1.4 Overarching Steering Principles
2 Organization of Operations
2.1 Meetings / Operations Meetings
2.1.1 Daily meetings
2.1.2 L1
2.1.3 CCB
2.1.4 CAB
3 IT Processes
3.1 Manage Changes
3.1.1 The change process within the BizDevOps organization of COSMIC
3.2 Manage Incidents
3.3 Manage IT Compliance
3.4 Manage Problems
3.5 Manage Releases
3.5.1 Maintenance hours on COSMIC platform
4 Support Concept
5 System Overview
5.1 System Architecture
6 System Components
7 Installation Instructions
7.1 Preparatory Work
7.1.1 SAP LogOn Installation
7.2 Migration / Upgrade
7.3 Update / Patch / Hotfix
8 Administration
8.1 Interfaces start and stop
8.2 Server Start on Reboot
8.3 Log Files
8.4 Start/Stop Application
8.5 Administration Instructions
8.5.1 Server Start on Reboot
8.5.2 Manual Server Start and Restart
8.5.3 Server Stop
8.5.4 Checking the Server Processes
8.5.5 Log Files
8.5.6 Start/Stop Application
9 Regular Activities
9.1 Tools
9.2 Batch / Jobs
10 Monitoring
11 Backup / Archiving
12 Troubleshooting
12.1 Debugging
12.2 Troubleshooting Aids
12.3 Typical Error Messages
12.4 Known Errors
12.5 FAQ / ResolveIT Database
13 Approvals
14 Emergency Concept
14.1 General
14.1.1 Result of Risk Analysis and Risk Reducing Measures
14.1.2 Malfunction / Emergency Scenarios
14.1.3 High Availability and disaster recovery in general
14.2 Organizational and Technical Preventive Measures
14.2.1 Still to be added:
14.2.2 Organizational and technical prevention measures
14.3 Recovery
14.3.1 Return to normal operation
15 Open Items
16 Control of the Document Template


1 Introduction
The application COSMIC (Central SOlution for Stock Management, Invoicing & Cost of Retail)
replaces and retires the existing AG invoicing solution based on PFIFF, Fakturensammler,
ZBF, BFFZ (until 2018). COSMIC enables invoicing for AG including DE for new business
models (e.g. direct sales).
Furthermore, COSMIC was tasked with creating a second, new integrated SAP platform
covering processes such as invoice collection and inventory management. This
solution is intended to replace the following legacy systems:
- PFIFF (invoicing system of BMW AG)
- FW (collector of invoices)
- BFFZ (inventory and stock management system)
- ZFB (interface data preparation of vehicle master data for SAP systems)

The PFS application (price finding service) is closely connected to COSMIC. It delivers all
prices for invoices and is the single source of vehicle prices within BMW AG.
In addition, the CoR application sends all relevant cost of retail measures to COSMIC.

1.1 Management of IT Operations-Relevant Information


The master document of this operation manual is stored and updated here.

1.1.1 How relevant information is managed


The content of this document is checked and updated regularly by the Accenture operations
support team and the FG-540 DevOps team. Updates are usually made with every
sprint/release of COSMIC and/or every six months (see standard operations terms according to
SAO II contract). The operations manual will describe more and more functionality across
the different performance levels.
Further documents

Document                              Reference
Process charts                        Prozessdokumentation
Product team                          COSMIC Organisation
Functions and add-on specification    Add-on specifications
Interfaces                            Schnittstellenverträge

1.2 Content and Responsibility


COSMIC Contact Persons

Contact list for all operation topics: DL.Operations.COSMIC@list.bmw.com (DevOps Team)


1.3 Scope: Brief Description of the Scope to Be Operated

A system description, a complete overview of all components, contact persons,
system responsibilities and interfaces can be found in ConnectIT.

1.4 Overarching Steering Principles

No special regulations applicable. Steering follows the BMW standard processes.


2 Organization of Operations

2.1 Meetings / Operations Meetings

The following regular meetings take place to ensure stable application operation and/or to
quickly address any open issues or problems.

2.1.1 Daily meetings


The COSMIC teams hold daily meetings to discuss issues, defects and solutions for running
the COSMIC application and developing new functionality.

2.1.2 L1
The L1 meeting takes place weekly in a 30-minute time slot between the service provider
(Accenture) and the BMW person responsible for the application. The goal is to clarify any
incidents/problems that have occurred in the past month.

2.1.3 CCB
The CCB is held on demand under the coordination of the Product Owners of vehicle invoicing AG or
COSMIC.

2.1.4 CAB
See Change Management


3 IT Processes
The product COSMIC follows the standard IT process of the agile working model.
CMDB ID:
ConnectIT/Application ID:

3.1 Manage Changes


Changes to the IT live environment are everyday tasks in the IT community. On the
one hand, problems need to be fixed; on the other hand, business requirements need to be
implemented, such as function upgrades, cost reductions and efficiency improvements
(functional releases for application operations, infrastructure releases for infrastructure
operations). The goal of IT Change Management is to define consistent procedures and
processes that ensure efficient and prompt handling of all changes. The number of incidents
caused by changes shall be minimized and service quality shall be improved, in order to
improve day-to-day IT operations.
In general, a change and requirement user story can be created in the COSMIC JIRA board.
If a user opens an incident ticket to the service fin-cosmic:global for an enhancement in the
application or for fixing a bug in the code, the incident will be closed and a user story has to
be created in JIRA instead.
The user story of the change, which describes the change in detail (along with functional and
technical modifications), must be specified by the user and the product owner. The change will
be closed only after successful deployment into production within a release, post-deployment
checks and completion of all tasks.
Changes are all prepared, technically feasible modifications that the COSMIC DevOps team
implements in the production environment and that might affect operations.
The purpose of the Change Management process is to implement changes in the most efficient
manner while minimizing the negative impact on customers.
The COSMIC Change Management process is delivered in JIRA and is part of the overall AWM
process for BMW AG:
https://atc.bmwgroup.net/confluence/display/AWM/RM2.10.06.07+Manage+Changes

3.1.1 The change process within the BizDevOps organization of COSMIC


3.2 Manage Incidents


In general, an incident is a user request or a reported issue that calls for restoring a
service or function.
Whenever there is an incident in the application, the support team is responsible for analysing it,
informing the user and the IT FG-540 COSMIC DevOps team, and fixing the issue within the
timelines stipulated in the SLA.
If any approvals are required, e.g. for deletions from or insertions into the database, the support
team must request them from the FG-540 operations team (DL.Operations.COSMIC@list.bmw.com).
The objective of Incident Management is to restore normal operations as quickly as possible
with the least possible impact on either the business or the user, at a cost-effective price.

The standard Incident Management process is delivered as part of the ITSM Suite.

1. Main COSMIC service: fin-cosmic:global
Group: ao-fin-sap04:global:2nd

2. All topics that, after analysis by 2nd level, concern code changes or
missing data which need to be maintained by the business are routed to the
service: fin-project-cosmic:global
Group: ao-fin-cosmic-projektbetrieb:global:2nd

3. Platform issues must be handled by FG-553 and raised to the service: fin-platform-cosmic:global
Group: prod-platform:global:2nd (for production issues on the CMP system)
Group: non-prod-platform:global:2nd (for issues concerning the test and development
systems CME/CMQ/CMI)

4. For reporting and BI related topics: mgmt-cosmic-bi:global
Group: ao-mgmt-projekt-enabling-int99:global:2nd
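
The routing rules above can be sketched as a small lookup. This is an illustration only: the category keys (application, project, platform, reporting) and the function name route_incident are hypothetical labels introduced here, while the service and group names are copied from the list above.

```python
# Illustrative lookup of ITSM service and assignment group per issue category.
# Category keys and the function name are hypothetical; the service and group
# names are taken from the routing list in this chapter.
ROUTING = {
    "application": ("fin-cosmic:global", "ao-fin-sap04:global:2nd"),
    "project": ("fin-project-cosmic:global", "ao-fin-cosmic-projektbetrieb:global:2nd"),
    "reporting": ("mgmt-cosmic-bi:global", "ao-mgmt-projekt-enabling-int99:global:2nd"),
}

PLATFORM_SERVICE = "fin-platform-cosmic:global"

# Platform issues use a different group for production and non-production systems.
PLATFORM_GROUPS = {
    "CMP": "prod-platform:global:2nd",
    "CME": "non-prod-platform:global:2nd",
    "CMQ": "non-prod-platform:global:2nd",
    "CMI": "non-prod-platform:global:2nd",
}


def route_incident(category, system=None):
    """Return (service, group) for an issue category and, for platform issues, a system ID."""
    if category == "platform":
        if system not in PLATFORM_GROUPS:
            raise ValueError(f"no platform group listed for system {system!r}")
        return PLATFORM_SERVICE, PLATFORM_GROUPS[system]
    return ROUTING[category]


if __name__ == "__main__":
    print(route_incident("platform", "CMP"))
```

Keeping the mapping in data rather than in branching logic means a changed group name only has to be edited in one place.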


SLA ticket priority solution timeframe

KPI                                                          Target
Complete solution time met percentage for priority high      <= 4 hours
Complete solution time met percentage for priority medium    <= 9 hours
Complete solution time met percentage for priority low       <= 27 hours
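
As a minimal sketch, the solution deadline for a ticket can be derived from these targets. The function names are hypothetical, and the sketch assumes that the target is counted in plain elapsed hours; service-hour calendars (nights, weekends, holidays) are not modelled.

```python
from datetime import datetime, timedelta

# Solution-time targets in hours per ticket priority, taken from the SLA table.
SLA_HOURS = {"high": 4, "medium": 9, "low": 27}


def solution_deadline(opened_at, priority):
    """Latest point in time by which the ticket must be solved.

    Assumption: plain elapsed hours, no service-hour calendar.
    """
    return opened_at + timedelta(hours=SLA_HOURS[priority.lower()])


def sla_met(opened_at, solved_at, priority):
    """True if the ticket was solved within its target timeframe."""
    return solved_at <= solution_deadline(opened_at, priority)


if __name__ == "__main__":
    opened = datetime(2019, 6, 6, 8, 0)
    print(solution_deadline(opened, "high"))  # 2019-06-06 12:00:00
```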

SLA ticket priority classification


Impact A = extensive
Impact B = significant
Impact C = moderate
Impact D = minor


Ticket Priority Reduction


The ticket priority can be reduced only in coordination with the user or BV. The approval email
or the conversation with the user has to be attached to the work log of the incident before
the priority is changed. Refer to the IM Process handbook for more
information.

3.3 Manage IT Compliance


Please reference or document information here if there are specific additions to the
standard process. See also chapter "Management of IT Operations-Relevant Information".
Legal aspects that must be observed in particular in the present context,
especially for regulated areas (e.g. BMW Bank),
e.g. IT IKS (internal control system), data protection, MaRisk, accounting according to GoBD, ...
How and in which cycle is compliance with the relevant requirements
checked?

From an operations point of view, support engineers comply with the standard procedures that
they have to follow for critical-priority or high-priority issues. They also adhere to the
approval procedures set for making changes in the database or in the production file
system.
Information security is controlled by granting access only to the right users: only the relevant
support staff are given access to the application.

Once identified, a security incident must be passed on to the Security Incident Handling
Process (SIHP) immediately by the member of staff concerned. IT security incident
reports are captured through ASZ or the ITSM service it-security-soc:global. This ITSM service
must be known by the operations team and documented in the operation manual.


Vulnerability management is a defined process that must be observed by everyone involved.


The purpose of the process is to avert technical threats to vulnerable areas in IT systems (e.g.
a weak encryption mechanism or a defective door to a security zone). IT vulnerability
reports are captured through ASZ or the ITSM service it-security-sem:global. This ITSM service
must be known by the operations team and documented in the operation manual.

3.4 Manage Problems


When an incident occurs multiple times with similar symptoms, the application may have
a bug/defect in the code, in the database or in the interfacing systems. In
this case, a complete analysis is required to find the root cause of the problem.
The support team creates a problem ticket and shares a complete analysis of the problem
with the FG-540 COSMIC DevOps team. Once the root cause is identified, the PM ticket
is closed either by creating a user story in the COSMIC JIRA board or by creating a
solution document in case it is resolved by a change.
The objective of Problem Management is to minimize the impact of problems on the
organization. Problem Management plays an important role in detecting problems, providing
solutions to them (workarounds and known errors) and preventing their recurrence.
The scope of the Problem Management process is limited to problems that can be identified
using the registered incident request information and problems that have been identified by the
DevOps team.
Problem Management within IT is an ITIL process and is handled using the ITSM
suite.

3.5 Manage Releases


The release management process is handled in the COSMIC JIRA board under the section
"Releases". For each release or bug fix, a ticket is created, described, implemented, tested and
documented in either a user story or a defect.
All deployed developments are documented in the transport list and can also be checked
in the production system (CMP).

3.5.1 Maintenance hours on COSMIC platform


Responsible for this process: <cosmic_platform_management@bmwgroup.com>
Weekly unavailability / downtime window for the following COSMIC systems, each
Thursday 21:00 - 22:00:
- CME
- CMQ
- CMI
- CMD

For the production system CMP, the defined maintenance window is 01:00 - 05:00.
Special job processing during this time:
All CMP020 jobs are fully stopped from 01:00 to 05:00 every Sunday.
The run times of the following jobs are changed for Sundays only:
.BMW.CSI_I4042_GAMMA - (start at 05:10)
.BMW.CSI_A4073_POSTPROCESS_0300 - (start at 06:00)
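
A minimal sketch of how a script could check whether a point in time falls into one of these windows. The function name is hypothetical and times are treated as local time. The text does not state the weekday of the CMP window explicitly; the sketch assumes Sunday, matching the Sunday job stop, which should be verified with platform management.

```python
from datetime import datetime, time

NON_PROD_SYSTEMS = {"CME", "CMQ", "CMI", "CMD"}


def in_maintenance_window(system, moment):
    """Return True if `moment` (local time) lies in the maintenance window of `system`."""
    t = moment.time()
    if system in NON_PROD_SYSTEMS:
        # Thursday (weekday() == 3), 21:00 to 22:00
        return moment.weekday() == 3 and time(21, 0) <= t < time(22, 0)
    if system == "CMP":
        # Assumption: the 01:00 to 05:00 window applies on Sundays (weekday() == 6).
        return moment.weekday() == 6 and time(1, 0) <= t < time(5, 0)
    raise ValueError(f"unknown system: {system}")


if __name__ == "__main__":
    # 6 June 2019 was a Thursday.
    print(in_maintenance_window("CMQ", datetime(2019, 6, 6, 21, 30)))  # True
```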


4 Support Concept
Whenever users face an issue, they raise a request in the ITSM tool, which is called a
ticket/incident. This is how issues are recorded and tracked in daily operations; see also
Manage Incidents.

LEVEL: Customer
DESCRIPTION: A customer has raised an incident regarding the system COSMIC.
NEXT STEPS: The customer is asked to consult the defined key users and to check the information in
wiki.muc or the ResolveIT solutions provided by the 2nd level support. If the incident remains,
the customer creates a ticket by calling ASZ (call 55555) or by mail to asz.hotline@bmw.de.

LEVEL: 1st-Level-Support (ASZ)
DESCRIPTION: The incident ticket is checked against known solutions in the ResolveIT database.
NEXT STEPS: If no solution can be found in the ResolveIT database, the ticket is passed on to
2nd-Level-Support.

LEVEL: 2nd-Level-Support (Operations Management)
DESCRIPTION: The incident ticket is validated and, in case of technical incidents, actions to find a
solution and solve the issue are taken.
NEXT STEPS: The 2nd level support team fixes the incident, informs the user and sets the incident to
"resolved". If no technical solution is possible, the ticket is forwarded to 3rd-Level-Support of the
Business Application. In case of business/maintenance incidents, a ticket is opened to the 3rd level
support, which means a user story or defect is created in the COSMIC JIRA board.

LEVEL: 3rd-Level-Support (Business and Maintenance Support)
DESCRIPTION: The incident ticket is validated and a solution is provided.
NEXT STEPS: Complex coding analysis is supported by 3rd-Level-Support of the Business Application,
and the necessary code changes are made.

Once a solution for a problem or a recurring incident is found, a solution guide is
created and documented on the operations Confluence page:
https://atc.bmwgroup.net/confluence/display/COSMIC/COSMIC+Operations


5 System Overview
Central Solution for Stock Management Invoicing (COSMIC).
Vehicle invoicing of BMW AG for new and used vehicles, including inventory management,
accounting and CoR settlements for sales promotion measures.

Link ConnectIT: COSMIC in Connect IT

5.1 System Architecture
The original document of the system architecture is located here:
\\Europe.bmw.corp\winfs\Panama\PLW_F\COSMIC\A_Phasenergebnisse\05_IT-Konzept\01_Architektur\01_MetA_Models\01_Core

Example picture


6 System Components
Application (SAP BW)                  SAP Logon Entry                            Application Server      System-ID
BWE (development system)              BMW AG Contr. + RechnW. i. BWE             tbwecs20.bmwgroup.net   BWE
BWI (Integration system)              BMW AG Contr. + RechnW. j. BWI             ibwics10.bmwgroup.net   BWI
BWP (Production system)               BMW AG Contr. + RechnW. k. BWP             pbwpcs00.bmwgroup.net   BWP
CME (Project Development System)      COSMIC Invoicing Development Project       tcmecs20.bmwgroup.net   CME
CMQ (Maintenance Test System)         COSMIC Invoicing Integration Project       icmqcs10.bmwgroup.net   CMQ
CMD (Maintenance Development System)  COSMIC Invoicing Development Maintenance   tcmdcs20.bmwgroup.net   CMD
CMI (Project Integration System)      COSMIC Invoicing Integration               icmics10.bmwgroup.net   CMI
CMP (Production System)               COSMIC Invoicing CLNT 010 Production       pcmpcs00.bmwgroup.net   CMP


7 Installation Instructions

7.1 Preparatory Work

The Worker User Self Service Portal (WUSS) is available in the BMW network under the following link:
Link to WUSS

7.1.1 SAP LogOn Installation


Use the search bar on the WUSS page.
Type "SAP Logon" and always choose the most recent version of the client available.
Press Install and follow the installation instructions. Afterwards, the SAP Logon client is listed
among your Windows software when you press the Windows start button.

7.2 Migration / Upgrade

COSMIC follows the standard BMW operations for SAP Basis platforms and will be included if an
update is necessary for infrastructure components.

7.3 Update / Patch / Hotfix


COSMIC follows the standard BMW operations for SAP Basis platforms and will be included in
necessary updates, patches or hotfixes regarding infrastructure topics.


8 Administration

2nd level support is not allowed to restart any server infrastructure.

8.1 Interfaces start and stop


Starting and stopping interfaces depends on the type of interface:
- ALE interfaces (e.g. IDoc-based SAP-SAP communication) need to be handled via
ALE Administration WE20
- EAI-based interfaces (e.g. COP CSI-I-4004) should be stopped on EAI level, i.e. EAI
stops forwarding files to the exchange directories and should buffer incoming/outgoing files
- RFC calls can be stopped by locking the technical user in the partner system

8.2 Server Start on Reboot


A change ticket with the respective task needs to be created for SAP Basis.

8.3 Log Files


Please answer the following questions here:
Is there logging for the system? How can the log data be evaluated? What do
individual entries mean?

8.4 Start/Stop Application

8.5 Administration Instructions

The following subchapters give an overview of the administration activities for
operations.
Some commonly described topics are listed below. Describe here
all activities required for ongoing operations.


8.5.1 Server Start on Reboot

8.5.2 Manual Server Start and Restart

8.5.3 Server Stop

8.5.4 Checking the Server Processes

8.5.5 Log Files


Please answer the following questions here:
Is there logging for the system? How can the log data be evaluated? What do
individual entries mean?

8.5.6 Start/Stop Application


9 Regular Activities

9.1 Tools
Relevant content:
- Purpose
- Location of the application (path/link)
- Description of how the application is started/triggered
- Data source of the application

9.2 Batch / Jobs


The COSMIC job overview can be found here: Job Schedule


10 Monitoring

The daily monitoring checks and activities are described here: Operations 2nd level -
how to


11 Backup / Archiving

The archiving concept can be found here: Data Archiving Concept [CSI-A-4092]
Regarding backup rules, COSMIC follows the BMW SAP backup guidelines implemented
by the SAP Basis team.
To enable consistent backups for the SAP HANA database, the following parameters have to be
set:
- enable_auto_log_backup=yes
- log_mode=normal
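
A minimal sketch of how the two parameters above could be verified against the text of an ini-style configuration file. The function name and the sample text are hypothetical, and the section name in the sample ([persistence]) is an assumption; the check itself scans all sections, so it does not depend on it.

```python
import configparser

# Required values, as stated above.
REQUIRED = {"enable_auto_log_backup": "yes", "log_mode": "normal"}


def check_backup_parameters(ini_text):
    """Return the parameters that deviate from the required values.

    The result maps each deviating parameter to the value found
    (None if the parameter is missing). An empty dict means all is fine.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    found = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            if key in REQUIRED:
                found[key] = value.strip().lower()
    return {k: found.get(k) for k, v in REQUIRED.items() if found.get(k) != v}


if __name__ == "__main__":
    # Hypothetical sample; the section name is an assumption.
    sample = "[persistence]\nenable_auto_log_backup = yes\nlog_mode = overwrite\n"
    print(check_backup_parameters(sample))  # {'log_mode': 'overwrite'}
```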
The SAP HANA database offers two supported backup mechanisms:
- File-based backups: To perform a file-based backup, a file system (/hana/backup/<SID>) that is
accessible (read/write) from all database nodes is required. SAP supports cluster file
systems as well as NFS file systems. As the BMW Linux operation team does not yet
support cluster file systems, an NFS file system provided by a NAS filer is used for the
backup file system (see [HANABACKUP03]). In any case, the log and data file systems
have to be separated from the backup file system. At BMW, file-based backups were used
until master solution 1.5. Starting with master solution release 2.0, however, backups are
performed using the backint interface ([HANABACKUP01]). Although file-based
backups are not used regularly, it remains possible to create file-based backups
in extraordinary situations such as an unavailability of the NetBackup infrastructure
([HANABACKUP07]).
- Backups using the backint interface: In this case, the HANA system provides the backint
interface. A backup agent from the backup vendor has to be provided to communicate
between the backint interface and the backup software. Backups are transferred via Unix
pipes from the SAP HANA database to the 3rd party backup tool. This 3rd party backup
tool must be a product certified for use with the SAP HANA database, as
documented in SAP note 2031547 - Overview of SAP-certified 3rd party backup tools and
associated support process. As of master solution release 2.0, the backint interface is
used with Veritas NetBackup (see [HANABACKUP01]).
NetBackup, which uses the backint interface, is designed as a 3-tier client-server infrastructure.
It consists of three major components:
- Master server: The master server manages backups, archives and restores. It contains the
NetBackup catalog, which is an internal database with information about the backups,
system configuration and available backup resources. Furthermore, the master server uses
the backup policy in order to run and operate the backups. The backup policy defines
certain parameters of backup jobs, such as which clients to back up and where the
backup files should be stored.
- Media server: The major function of the media server is the data movement between the
client and the storage, which is attached to the media server. It reads data from the
NetBackup Client and then writes the backup data to the designated backup media. A
NetBackup media server owns one or more backup devices such as tape drives, tape
libraries etc. In addition, a NetBackup media server can be located either together with a
NetBackup master server on a single server or on its own separate server.
- NetBackup Client: The NetBackup Client contains the Veritas backint implementation and
provides an interface to HANA. It is installed on the SAP HANA database server and, in
case of a multi-node environment, on each node.
BMW runs two independent dedicated NetBackup environments for SAP HANA backups. Every
SAP HANA server is configured to use one of the two environments actively. However, all SAP
HANA servers are also configured to use the other NetBackup environment. Hence, if one
NetBackup environment is unavailable, the SAP HANA servers assigned to it use the other
environment. This switch-over requires manual intervention: the bp.conf file must be
adapted with the corresponding environment parameters.
On the SAP HANA database server, using NetBackup with backint requires some configuration:
The SAP HANA database expects the backup agent executable (hdbbackint) at the path
/usr/sap/<SID>/SYS/global/hdb/opt/hdbbackint. As the NetBackup Client executable is located
at /usr/openv/netbackup/bin/hdbbackint_script, a symbolic link must be created. Moreover,
the SAP HANA database requires the data and log parameter file initSAP.utl, which can be
stored in any location. On multi-node SAP HANA systems, this file must be stored on each
node of the multi-node system. The initSAP.utl file is a text file that contains comments,
parameters and parameter values. These parameters determine the backup and restore
procedure between NetBackup and the SAP tools. At BMW, two initSAP.utl files are used: The
initSAP_db.utl file includes the required parameters for data backups, whereas the
initSAP_log.utl file contains the required parameters for log backups. The absolute path of both
files must be specified in the HANA parameters "data_backup_parameter_file" and
"log_backup_parameter_file". Once these configuration steps have been performed, the SAP
HANA database is set up for performing backups using NetBackup.
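
The symbolic link described above could be created as in the following sketch. The function name and the root parameter are hypothetical; root exists only so that the sketch can be tried in a scratch directory instead of on a real database server. On a productive system, this step belongs to the SAP Basis team.

```python
from pathlib import Path


def link_backint_agent(sid, root=Path("/")):
    """Create the hdbbackint symlink that SAP HANA expects for system `sid`.

    Both paths are taken from the description above. An existing symlink at
    the expected location is replaced.
    """
    target = root / "usr/openv/netbackup/bin/hdbbackint_script"
    link = root / "usr/sap" / sid / "SYS/global/hdb/opt/hdbbackint"
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        link.unlink()
    link.symlink_to(target)
    return link


if __name__ == "__main__":
    import tempfile

    # Try the sketch in a temporary directory with a hypothetical SID "ABC".
    with tempfile.TemporaryDirectory() as tmp:
        print(link_backint_agent("ABC", Path(tmp)))
```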
When a backup is started, the NetBackup Client contacts the master server. The master
server validates whether the client is authorized to perform the requested backup. If the
validation succeeds, the master server chooses a media server and informs the client about
the selected media server. The backint interface then creates several Unix pipes, one pipe for
each SAP HANA service with a persistence layer being backed up. In a scale-out system
environment, there are four pipes (one pipe for each service) on the master node and one pipe
for the indexserver on each worker node. These Unix pipes are always created in the
unchangeable default location /usr/sap/<SID>/SYS/global/hdb/backint. Using these pipes, the
NetBackup Client then passes the data to the media server, which stores the backup in the
location specified by the backup policy. Once the backup process is completed, the NetBackup
Client backint communicates the backup status back to the SAP HANA database, including
relevant information such as the backup ID.
The following backup types are used in the backup strategy at BMW:
• Full data backup: A full data backup contains all the data that is required to recover the
database to a consistent state. As it is a full copy of the entire data set, it is normally the
longest-running type of backup.
• Differential backup: A differential backup contains only data that has changed since the
last full data backup. It can only be created if a full data backup is available. "Changed
data" refers to the physical representation of the data in the SAP HANA persistent storage,
which is not always the data that an application has actually changed. For instance, an
internal reorganization can change the physical representation without changing the
actual data. As a differential backup only saves the data changed since the last full data
backup, it normally completes faster than a full data backup.
• Log backup: By default, log backups are enabled (enable_auto_log_backup=yes) and SAP
HANA automatically creates backups of the log segments. During a log backup, only the
actual data of the log segments for each service with persistence is written from the log
area to service-specific log backups in the file system or to a third-party backup tool
(NetBackup at BMW). A log backup is performed independently of data and differential
backups at regular intervals, specified in the global.ini parameter log_backup_timeout_s.
If a log segment becomes full before the log backup timeout interval has elapsed, a log
backup is triggered automatically.
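The two triggers for a log backup described above can be expressed as a simple predicate. This is only an illustration of the rule as stated in this manual; the function name and arguments are hypothetical, and the value 900 is an example rather than a documented BMW setting.

```python
def log_backup_due(seconds_since_last_backup: float,
                   log_backup_timeout_s: int,
                   log_segment_full: bool) -> bool:
    """Return True when a log backup should be written."""
    # A full log segment triggers a backup regardless of the timeout.
    if log_segment_full:
        return True
    # Otherwise the backup is due once the configured interval has elapsed.
    return seconds_since_last_backup >= log_backup_timeout_s


print(log_backup_due(120, 900, log_segment_full=False))  # False: interval not reached
print(log_backup_due(120, 900, log_segment_full=True))   # True: segment is full
print(log_backup_due(900, 900, log_segment_full=False))  # True: interval elapsed
```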
At least one full data backup is performed per week. It serves as the basis for the
differential backups, which are performed once a day on each of the other days of the week
([HANABACKUP06]).
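The weekly rhythm can be sketched as a small scheduling rule. The choice of Sunday for the full backup is an assumption made only for this example; the manual does not state which weekday is used.

```python
import datetime


def backup_type_for(day: datetime.date, full_backup_weekday: int = 6) -> str:
    """Return the data backup type planned for the given calendar day.

    full_backup_weekday follows datetime.date.weekday(): Monday is 0,
    Sunday is 6. Sunday is a hypothetical default.
    """
    if day.weekday() == full_backup_weekday:
        return "full"
    return "differential"


if __name__ == "__main__":
    start = datetime.date(2019, 6, 3)  # a Monday
    for offset in range(7):
        day = start + datetime.timedelta(days=offset)
        print(day.isoformat(), backup_type_for(day))
```

Over any seven consecutive days this yields exactly one full backup and six differential backups, matching the rule stated above.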
Backups in SAP HANA are controlled through the nameserver process. This guarantees the
global synchronization of the backup processes, which are executed on the worker nodes for
each persisting process. The backup file system must sustain a data backup throughput of at
least 200 GB/h as the minimum measured over the last week and 300 GB/h on average for all
backups over the last week. These values are defined in the minichecks (SAP note 1999993 -
SAP HANA Mini Checks, SAP note 1969700 - SQL statement collection for SAP HANA) and lead
to alerts 915 and 916 (see SAP note 1999930 - FAQ: SAP HANA I/O Analysis). The expected data
throughput should be higher than 200 MB/s to reach reasonable backup times.
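To relate the two units used above, the following sketch converts MB/s to GB/h and estimates a backup duration. It assumes decimal units (1 GB = 1000 MB); the function names and the 1440 GB example size are illustrative only.

```python
def mb_per_s_to_gb_per_h(mb_per_s: float) -> float:
    """Convert a throughput from MB/s to GB/h (decimal units assumed)."""
    return mb_per_s * 3600 / 1000


def backup_hours(size_gb: float, throughput_mb_per_s: float) -> float:
    """Estimate how long a backup of size_gb takes at a constant throughput."""
    return size_gb / mb_per_s_to_gb_per_h(throughput_mb_per_s)


def meets_minicheck(min_gb_per_h: float, avg_gb_per_h: float) -> bool:
    """Check the weekly minimum (200 GB/h) and average (300 GB/h) thresholds."""
    return min_gb_per_h >= 200 and avg_gb_per_h >= 300


print(mb_per_s_to_gb_per_h(200))   # 720.0
print(backup_hours(1440, 200))     # 2.0
print(meets_minicheck(250, 280))   # False: weekly average is below 300 GB/h
```

Under this assumption, the expected 200 MB/s corresponds to 720 GB/h, well above both minicheck thresholds.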
Backups can be started from different tools:
• SAP HANA Studio: The backup console only allows backups to be started manually and is
therefore not selected as the solution for automated backups.
• Transaction DBACOCKPIT from SAP NetWeaver application server: This solution is not
selected for automation because an external client (here the NetWeaver application server)
coordinates the backups, and possible future projects with a standalone SAP HANA
database would require a different solution.
• hdbsql: Backups should be started via the HANA interactive terminal hdbsql, which
allows the execution of scripts.
The hdbsql solution is selected for automating the backups (see [HANABACKUP02]) from the
Linux side. All database nodes use a common framework to detect the status of the nodes
before executing the backups. The backup is executed only on the node that currently holds
the nameserver role MASTER. The hdbsql backup solution also covers future projects that
deploy SAP HANA databases without SAP NetWeaver stacks.
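A minimal sketch of this rule, assuming the node role has already been determined by the common framework (which is not described here): the function only builds the hdbsql command line and returns nothing for nodes that are not MASTER. The function name, the user store key BACKUPKEY and the backup prefix are hypothetical; `hdbsql -U` and the `BACKUP DATA ... USING BACKINT` statements are standard SAP HANA syntax, but the exact statements used at BMW are not given in this manual.

```python
from typing import List, Optional


def build_backup_command(nameserver_role: str,
                         backup_type: str,
                         userstore_key: str,
                         prefix: str) -> Optional[List[str]]:
    """Return the hdbsql command for a backint backup, or None on non-master nodes."""
    # Only the node with the current nameserver role MASTER runs the backup.
    if nameserver_role.upper() != "MASTER":
        return None
    statements = {
        "full": f"BACKUP DATA USING BACKINT ('{prefix}')",
        "differential": f"BACKUP DATA DIFFERENTIAL USING BACKINT ('{prefix}')",
    }
    if backup_type not in statements:
        raise ValueError(f"unknown backup type: {backup_type}")
    # -U selects a key from the HANA secure user store (hdbuserstore).
    return ["hdbsql", "-U", userstore_key, statements[backup_type]]


if __name__ == "__main__":
    print(build_backup_command("MASTER", "full", "BACKUPKEY", "WEEKLY_FULL"))
    print(build_backup_command("SLAVE", "full", "BACKUPKEY", "WEEKLY_FULL"))
```

The returned list could be passed to `subprocess.run` by a scheduler script; the sketch deliberately stops before executing anything.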
SAP HANA databases used for disaster recovery can only be backed up on the secondary site
when the database is not registered to the primary site (see [HANABACKUP04]).
For the implementation of backup routines, follow the documentation in [Oper01] and/or
[Admin01].
For further information, see chapter 10 of the SAP Netweaver on HANA document under
the following link: SAP Netweaver on HANA


12 Troubleshooting

12.1 Debugging
The following transactions are relevant for analyzing and correcting errors in data loading
processes:
Transaction  Transaction Name                 Task

RSA3         Extractor Checker                Check if the extractor runs correctly
RSA7         BW Delta Queue Monitor           Check if the delta queue is filled
SE11         ABAP Dictionary Maintenance      Check if the extract structures are valid
SE16         Data Browser                     Compare data between A-BW and source system
SM37         Overview of job selection        Check if report RMBWV302 (Delta Queue Update) is
                                              running.
ZRSPC        Process chain schedule board     Allows the planning and control of process chains as
                                              well as their dependencies on each other.
ZRSPCM       Process chain monitor            Overview of all executed process chains in the
                                              selected period.
RSPC         Process chain view and modeling  Enables modeling of process chains and access to
                                              protocols of process chain runs.
SM37         Job selection                    Displays all SAP jobs that were planned or executed
                                              during the selected period. Planned jobs can be
                                              changed.

12.2 Troubleshooting Tools

Which tools and utilities are available for identifying error symptoms and causes?

12.3 Typical Error Messages

12.4 Known Errors

See also subchapter "Manage Incidents".
Relevant content:
• Special considerations for system stop/start
• Procedure during/after a system outage
• Announcement of downtimes
• Data loss


12.5 FAQ / ResolveIT Database

Available solutions in resolveIT belong to the owning group ‘AO_ACN_Finanzplanung’ and,
below it, the subgroup ‘fin-cosmic:global’:
Link to resolveIT

Responsibility for managing resolveIT solutions is defined in SAO II as an operational task
of the application's specific 2nd Level Support team.
Documentation on how to use resolveIT can be found here:
https://resolveit.bmwgroup.net/Lists/ResolveITDocumentation/Forms/AllItems.aspx


13 Authorizations
To request the required workplace and roles, follow the link to the IBV (Integrated User
Authorisation): IBV - Access
The firefighter concept and authorisation requests can be found
here: https://atc.bmwgroup.net/confluence/display/COSMIC/COSMIC+Authorizations
Detailed information about roles and authorizations can be found in the COSMIC Authorization
Concept [RAC].


14 Emergency Concept
Reference to or description of procedures in emergency situations. See also the specifications
and requirements of the "Manage IT Continuity" process.

14.1 General
High availability and disaster recovery for databases are employed to ensure continuity of
business processes. The goal of the concept is to ensure that the application, in this case SAP
NetWeaver, can connect to the database and access all required data with no or only minimal
interruption. To avoid unplanned downtimes of the application server ABAP, the concepts
described in 13.1 have to be met. This chapter focuses on the concept for the SAP HANA
database only. At BMW, the following solutions have been implemented for high availability
and disaster recovery of SAP HANA:
BWoH scale-out systems use the host auto-failover feature for high availability and HSR for
disaster recovery, whereas BWoH single-node systems use HSR only [HANAHA03]. Table 32
lists failure scenarios and indicates which of the two solutions can handle them:

14.1.1 Result of Risk Analysis and Risk Reducing Measures


This chapter includes a reference to the risk management processes and problem
management.

14.1.2 Malfunction / Emergency Scenarios


Scenario                       Handled by host    Handled by    Remarks
                               auto-failover      HSR
                               (automatically)    (manually)

Network failure of one HANA    no                 no            A host auto-failover can be triggered by
host (not Interconnect                                          stopping the affected HANA node.
network)

Failure of network interface   yes                no            A failover is initiated if the issue persists
for the interconnect network                                    (checked 3 times every 30 seconds).
on one host

Network failure between the    no                 no            HSR takeover cannot be initiated without data
data centers                                                    loss until changes have been transferred to
                                                                the secondary site successfully.

Issues with HANA shared        no                 yes           An HSR takeover might only be triggered if the
storage                                                         issues are DC-related.

Issues with HANA persistence   no                 yes           An HSR takeover might only be triggered if the
layer (data or log area)                                        issues are DC-related.

Failure of single HANA         no                 no            This is handled by the HANA daemon, which
processes, e.g. index server                                    restarts missing services. If the daemon
                                                                failed, the other processes are terminated and
                                                                a failover is therefore initiated.

Linux related issues of one    yes                no            In case of issues that do not lead to an
host                                                            automated failover, a failover can be
                                                                triggered manually.

14.1.3 High Availability and disaster recovery in general


High availability of the HANA database can be ensured by using the SAP HANA host auto-failover
functionality for scale-out systems. This functionality is a built-in solution provided by SAP
HANA that ensures access to all data if one host fails. If one of the nodes of an SAP HANA
scale-out system fails, the database is no longer fully functional due to its shared-nothing
architecture. To avoid that, additional standby hosts can be added to the database (see
[HANAHA01]). A standby host does not have its own persistence layer (data and log area) but
can take over the role of a failed host, meaning it takes over the missing services and the
persistence area of the failed host. This failover functionality is handled internally by HANA
and does not require any manual action by an administrator. For BWoH single-node systems
and SoH systems, the host auto-failover functionality is not applied [HANAHA03].
Disaster Recovery (DR) for SAP HANA is realized by using the built-in HANA System
Replication (HSR) functionality for BWoH and SoH. The DR solution is implemented to ensure
business continuity after major issues or double failures in the primary datacenter. HANA
System Replication is based on a complete installation of two independent databases. After
the secondary database has been registered to the primary database, the primary database
sends all changes to the secondary, which acknowledges them once the data has been received
in memory (see Figure 19). Only after the acknowledgement has been received is the
transaction completely committed.
To ensure that this concept works, both sites need to communicate frequently. In case of a
network failure between the sites or an unavailability of the secondary system, the primary
site waits for a timeout defined in the parameter logshipping_timeout (default value 30
seconds) and then continues without the acknowledgement of the secondary system. This way,
continuity of productive operations is ensured in case of a network issue. Once the secondary
site is available again, the replication restarts automatically with a delta or full
replication, whichever is more efficient.
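The acknowledgement rule can be illustrated with a small model. It is a simplification of the behaviour described above, not HANA's implementation: the function name is hypothetical, and a missing acknowledgement (network failure or unavailable secondary) is represented by None.

```python
from typing import Optional, Tuple


def commit_wait(ack_delay_s: Optional[float],
                logshipping_timeout_s: float = 30.0) -> Tuple[float, bool]:
    """Return (seconds the primary waits, whether the secondary acknowledged).

    ack_delay_s is the time until the secondary's acknowledgement arrives,
    or None if it never arrives.
    """
    if ack_delay_s is not None and ack_delay_s <= logshipping_timeout_s:
        # Normal case: the commit completes once the acknowledgement arrives.
        return ack_delay_s, True
    # Timeout case: the primary continues without the acknowledgement.
    return logshipping_timeout_s, False


print(commit_wait(0.002))  # (0.002, True)
print(commit_wait(None))   # (30.0, False)
print(commit_wait(45.0))   # (30.0, False)
```

In the timeout case the change exists only on the primary until replication resumes, which is why a takeover during a network failure between the data centers cannot be guaranteed to be free of data loss.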
While the secondary database is registered to the primary site, it runs in recovery mode,
meaning that it does not accept any client connections. It normally operates in preload mode,
meaning that row and column store tables are loaded into memory. This enables faster takeover
times, but is only allowed if the system is used exclusively as the HSR secondary site. If a
non-productive system is running on the secondary site, preload mode has to be disabled.
For further information, see chapter 13 of the SAP Netweaver on HANA document under
the following link: SAP Netweaver on HANA

14.2 Organizational and Technical Precautionary Measures

14.2.1 Still to be added:


A concrete description of the actions operations must take in the event of a (total) outage is
missing, e.g.:
• Who (basis operations or application operations) must perform which activities in which
emergency scenario (outage of the system, interfaces/connected systems, the platform,
the application or the DB)?
Speaking as an outsider and brainstorming a little, for example: backing up data (description
of how and by whom), restarting the individual components in the correct order (which role is
authorized to do so?), having authorized persons restore the correct backup data (of
configurations, data, etc.).
• Who must be informed/alerted, when and via which channels (email, phone, SMS, etc.)
(HAL, SF6, dealers or other users, etc.), if the system, the platform, the application or
the DB, etc. fail? Similar to chapter 5.1 COSMIC Support Services (in the operation
manual, BHB).
• From what point does the firefighter concept apply, i.e. when must the firefighter be
involved, etc.?

14.2.2 Organizational and technical prevention measures


14.2.2.1 Arrangements with external providers
Only necessary if deviations or individual agreements (no standard sourcing contract) are
arranged. If applicable, this chapter should contain a reference to the defined arrangement.

14.2.2.2 Dependencies
This chapter should define all necessary dependencies on sub-applications. In addition, a
visualized dependency matrix should be provided and, if necessary, license requirements have
to be defined within this chapter.
Information in accordance with the given interface contracts should be defined, including
source and target applications. References to other emergency concepts (apps) should also be
included within this chapter.

14.3 Recovery
This chapter should describe all technical and organizational measures required to perform
the recovery to normal operations.
Example: to perform a recovery and return the application to normal operation, it is
necessary to ensure that the required subsystems / infrastructure (servers, DB, OS, interfaces)
are running on a stable system infrastructure and provide unrestricted user access.

14.3.1 Return to normal operation


This chapter should contain all technical and organizational measures required to return to
normal operation (e.g. backlog processing, replacement of manual workarounds).
Example: To reduce the backlog, manual steps have to be performed. Furthermore, it is
necessary to implement batch jobs for efficient data recovery.


15 Open Points
The open points are maintained here: 2nd level open points


16 Control of the Document Template

Template on which this document is based
Please do not change this entry

BMW Group                   Document class: 4.2    No.: -
Scope: BMW Group IT         Version: 2.3
IT Operation Manual (DE)    Status: Released, valid from 25.10.2017
Persons/departments/committees involved: -

Change history

Version  Content                                       Author                    Reviewer                  Approver
                                                       (initials, date)          (initials, date)          (initials, date)

2.3      Footer adjusted                               Eckhard Ruetz, FG-120,    Andrea Weber, PT-330,     Heid Hasenkopf, FG-120,
                                                       24.10.2017                25.10.2017                25.10.2017

2.2      Adjustment of service information             Ismail Yildiz, FG-120,                              ITPM CCB,
         (Leistungsinformation)                        22.06.2017                                          22.06.2017

2.1      Additional information added to the           I. Rauscher, FG-S-26,     T. Wagner, FG-85,         ITPM CCB,
         chapter "Emergency Concept"                   14.03.2017                K. Dudek, FG-901,         16.03.2017
                                                                                 16.03.2017

2.0      Merged the Infra. and Appl. templates;        FG-900, FG-801, FG-901,   M. Klug, FG-120,          ITPM CCB,
         introduced the new sections "Management       01.12.2016                09.12.2016                09.12.2016
         of IT operation-relevant information" and
         "Control of the document". Revision and
         clean-up, based among other things on the
         current ITPM content.

1.2      Integration of new document control           M. Klug, FG-120,          C. Teich, FG-120,         ITPM CCB,
         plugin (DRP). Change of the naming            26.09.2016                26.09.2016                26.09.2016
         Operation Handbook to Operation Manual

1.1      Control information added.                    ITPM Responsible,                                   ITPM CCB,
                                                       22.09.2015                                          22.09.2015

1.0      Version published in ITPM.                    ITPM                                                ITPM CCB