
Documentation on Hadoop Operation Tasks
=======================================

For basic commands in Hadoop, please follow the link given below:

https://www.youtube.com/watch?v=PI1Xm1Zoj78

1. Manual conf file deployment.


As per the Hadoop standard, the development team does not store conf files in SVN for
information security reasons, because files in SVN are visible to anyone with the SVN URL.
The application team stores conf files at /appl/conf/<HADOOP_BATCH_USER_ID>.
The Hadoop operation team has sudo access for all Hadoop batch user IDs and creates
(copies) new conf files or updates existing conf files as per the application requirement.

New Conf File


Switch to the specific Hadoop batch ID using its sudo command
Create the new property file as per the succeed and CCMDB request

sudo -u hdshc -i
cd /appl/conf/hdshc/
touch ApplicationName.properties
vi ApplicationName.properties
copy the property content into ApplicationName.properties

e.g.

export RUBY_PATH="/usr/local/ruby19/bin/ruby -E ascii-8bit"


export INC_DIR='/incoming/marketing/promotional'
export WORK_DIR='/work/marketing/promotional'
export GOLD_DIR='/gold/marketing/promotional'

Update Existing Conf File


Switch to the specific Hadoop batch ID using its sudo command
Open the conf file in the vi editor and change the values as per the succeed and CCMDB
requirement
Note: for security purposes, operation team resources receive the username and password
in a direct mail.

sudo -u hdshc -i
cd /appl/conf/hdshc/
vi ApplicationName.properties
copy the property content into ApplicationName.properties

e.g.
export RUBY_PATH="/usr/local/ruby19/bin/ruby -E ascii-8bit"
export INC_DIR='/incoming/marketing/promotional'
export WORK_DIR='/work/marketing/promotional'
export GOLD_DIR='/gold/marketing/promotional'
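
These conf files are plain shell-style export lines. A minimal sketch of how a job wrapper might consume one is shown here; the wrapper and the Ruby script name are hypothetical and not part of the standard:

#!/bin/bash
# hypothetical job wrapper: source the deployed conf file, then run the application script
. /appl/conf/hdshc/ApplicationName.properties
# RUBY_PATH is left unquoted on purpose because it also carries the -E flag
$RUBY_PATH process_promotions.rb "$INC_DIR" "$WORK_DIR" "$GOLD_DIR"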

2. Hadoop code migration using SVNStage user (manual activity)


Hadoop support has an automated code deployment process integrated with CCMDB. This
process works at the application directory level, not at the level of application
sub-directories (sub-applications). Not all of Pricing's sub-applications are available
in SVN, so the automated process does not work for Pricing SVN code deployment. The
Hadoop operation team therefore performs manual code deployment for Pricing.

There are two types of code deployment:


New sub-application deployment
Create the sub-application directory and provide 755 recursive access at the Unix level
(done by the SHC Hadoop infra team)
Execute the sudo command - sudo -u svnstage -i
Go to /appl/<HADOOP_BATCH_USER_ID>/<sub_application_name>
Execute the svn checkout command:
svn checkout https://ushofsvpsvn1.intra.searshc.com/svn/hadoop_repo/hadoop/<batch_user>/branches/prod/hdp_r1_prod_b1/appl/<batch_user>

Update existing code deployment


Execute the sudo command - sudo -u svnstage -i
Go to /appl/<HADOOP_BATCH_USER_ID>/<sub_application_name>
Execute the svn update command:
svn update -r 30 (updating the SVN code to the revision required by the CCMDB request)

3. Providing support for mail forwarded to the Hadoop support on-call page / Hadoop
support DL

The operation team is the first point of contact for the Hadoop support on-call page and
the Hadoop support DL.

Typical issues are jobs stuck in the last reducer, tasks failing due to Java heap size,
tasks failing due to no-route-to-host exceptions, or other infrastructure issues.
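
A minimal first-level triage sketch for such a page, assuming the MRv1 mapred CLI and the hdshc batch user from the earlier examples; the job ID is only a placeholder:

sudo -u hdshc -i
mapred job -list                              # jobs currently running on the cluster
mapred job -status job_201306081603_990673    # progress, counters and the tracking URL
# follow the tracking URL into the JobTracker UI (port 50030) and read the failed/stuck
# task attempt logs for heap-size or "No route to host" errors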

4. Increasing the priority of a job:

Sometimes there is a user request to increase the priority of jobs on production.


This is done so that the running job can get mappers and reducers before other jobs
running at the same time.

Command: mapred job -set-priority <job-id> <priority>

E.g.: mapred job -set-priority job_201306081603_990673 VERY_HIGH

Valid priority values are: VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW

No special permission is required to carry out this task.

5. Job hang on integration/production:

While monitoring the JobTracker we also check for hung jobs.

We identify them when their counters are not increasing and the job has made no progress
for a long time. This can happen due to connectivity issues, data skew, etc.

For these we drop a mail to the corresponding teams as per the batch user IDs.
Usually the reporting teams ask us to kill the job.
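
A minimal sketch of how such a hung job can be checked and, after the owning team confirms, killed; the job ID is only a placeholder and the MRv1 mapred CLI is assumed:

sudo -u hdshc -i
mapred job -status job_201306081603_990673    # run a few minutes apart; unchanged counters suggest a hang
mapred job -kill job_201306081603_990673      # only after the owning team asks for the kill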

6. Data migration from production to integration using the distcp command (succeed
request)

Sometimes application teams require production data on the integration cluster for
developing or testing code. The Hadoop operation team works on such succeed requests.

Execute the sudo command - sudo -u hdshc -i

# hadoop distcp <source_url> <destination_url>

hadoop distcp hftp://<source_namenode_name>:<port>/<path> hdfs://<destination_namenode_name>:<port>/<path>

hadoop distcp hdfs://trphadn01:8020/gold/netwatch/traffic_guest/2013/07 hdfs://trihadn01:8020/gold/netwatch/traffic_guest/2013/07

Or

hadoop distcp -overwrite hdfs://trphadn01/gold/location/ims/master/000000_0 hdfs://trihadn01/gold/location/ims/master/000000_0

7. Data migration from integration to production using the distcp command (CCMDB
request)

Sometimes application teams need development data on the production cluster for a
production application requirement (small master tables). The Hadoop operation team
works on such CCMDB requests.

Execute the sudo command - sudo -u hdshc -i

# hadoop distcp <source_url> <destination_url>

hadoop distcp hftp://<source_namenode_name>:<port>/<path> hdfs://<destination_namenode_name>:<port>/<path>

hadoop distcp hdfs://trihadn01:8020/gold/netwatch/traffic_guest/2013/07 hdfs://trphadn01:8020/gold/netwatch/traffic_guest/2013/07

8. Providing application log files from the production server (using a succeed request)

Sometimes application users/support users request application log files via a share
request. We provide the log files as per the request in the batch user's home directory.

Log in to the production access node
Use the batch user's sudo access
Browse to the specific application log directory
Copy the log files into the batch user's home directory

sudo -u hdshc -i
cd /logs/hdshc/impact/
mkdir /home/auto/hdshc/log_bkp_20140810
cp *20140810.log /home/auto/hdshc/log_bkp_20140810/
11. Using the svnstage user to provide permissions to files with succeed requests:

Files deployed via svnstage should always have 755 permissions.

It may happen during deployment / svn update that new files/scripts end up with other
permissions and the job will not execute, or that they have excess permissions (777).

We need to set the file permissions to 755 to meet the standards.

It does not require any succeed request.

Command:

sudo -u svnstage -i
chmod -R 755 /<path>

12. Monitoring the Hadoop integration and production JobTracker and DFS-Health portals

JobTracker monitoring: http://trphadj01:50030/jobtracker.jsp

We basically check that all the nodes in the cluster are responsive. Blacklisted nodes
are nodes that are no longer taking any requests; we need to restart their TaskTrackers
manually.
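
A minimal sketch of a manual TaskTracker restart on a blacklisted node, assuming SSH access to the node, the mapred service user and the stock hadoop-daemon.sh location; the node name and the script path are assumptions, not taken from this runbook:

ssh trphadd01                                                         # hypothetical blacklisted node
sudo -u mapred /usr/lib/hadoop/bin/hadoop-daemon.sh stop tasktracker
sudo -u mapred /usr/lib/hadoop/bin/hadoop-daemon.sh start tasktracker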
14. Monitoring and acting on long-running jobs on the integration and production Hadoop
clusters, analysing delayed jobs affecting the cluster, keeping track of the number of
reducers used by jobs, and keeping track of users running jobs with their own IDs on
production.

All actions and screenshots are the same as for JobTracker monitoring (point no. 12).
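
For a quick command-line view of long-running jobs and the users running them, a minimal sketch assuming the MRv1 mapred CLI; the column position of UserName in the -list output is an assumption and may differ by version:

mapred job -list                                                       # JobId, State, StartTime, UserName, Priority per running job
mapred job -list | awk 'NR>2 {print $4}' | sort | uniq -c | sort -rn   # rough count of running jobs per user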

15. Monitoring DFS health and space issues, and asking users to clear unnecessary space
on DFS. The hdfs-rpt.sh script is used to monitor space on DFS.
Monitor via http://trphadn01.hadoop.searshc.com:50070/dfshealth.jsp

If DFS Used% is above 90%, check which users/batch IDs are consuming the most data and
immediately inform all users to delete unwanted data from production.

Ex: hdfs dfs -du -s -h /user/a or hdfs dfs -du -s -h /user/

If dead nodes are found in the monitoring tool, inform the Hadoop infra team members
for action.
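
The hdfs-rpt.sh script itself is not included in this runbook; a minimal sketch of the kind of per-user space check it performs, assuming the hdfs dfs CLI and the /user, /gold, /work and /incoming layout used elsewhere in this document:

sudo -u hdshc -i
hdfs dfs -du /user | sort -nr | head -20      # largest user home directories first, sizes in bytes
hdfs dfs -du -s -h /gold /work /incoming      # space used by the main data areas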

16. If the integration or production server is slow during maximum load, we are provided
access to commands like "htop" to update, follow up on and kill the user's long-running
Hadoop processes (hadoop infra).

If a job runs on production for longer than its usual time, we update the user, and after
the user's request we kill the job.
Ex: mapred job -kill <job_id>

Sometimes we need to set the job priority, as per the share request raised by the user.
Ex: mapred job -set-priority <job-id> <priority>

17. If any application team assigns a job-failure incident to our queue
SHC_INF_HADOOP_SUPPORT, the points below should be mentioned by the respective
application team:

1) How did your team determine that this is related to an infrastructure issue?
2) Have you reviewed the log files and done a first level of analysis on your end for the
failed jobs, and where are they located?
3) What are the failed job names and numbers in the JobTracker?

In the general scenario, when a job fails in Control-M, the respective application team
needs to ask the Control-M team to rerun/resubmit their failed jobs. Mostly the job
completes successfully on rerun, and then the ticket should be closed either by the
application team or the Control-M team. If the rerun fails, the three points above should
be considered.

Any tickets assigned to Hadoop infrastructure that are not related to it will be redirected to the respective team.

For historical job information, please go through http://hadoopperf.intra.shldcorp.com/perf-reports/job-tracking.html
