Operations Tasks
===============================
For basic commands in Hadoop, please follow the link given below:
https://www.youtube.com/watch?v=PI1Xm1Zoj78
sudo -u hdshci
cd /appl/conf/hdshc/
touch ApplicationName.properties
vi ApplicationName.properties
Copy the property content into ApplicationName.properties, e.g.:
export RUBY_PATH="/usr/local/ruby19/bin/ruby -E ascii-8bit"
export INC_DIR='/incoming/marketing/promotional'
export WORK_DIR='/work/marketing/promotional'
export GOLD_DIR='/gold/marketing/promotional'
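The job scripts can then pick up these variables by sourcing the properties file; a quick check, assuming only the path created above (the actual scripts may source it differently):
. /appl/conf/hdshc/ApplicationName.properties
echo "$INC_DIR $WORK_DIR $GOLD_DIR"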
We are the first point of contact for Hadoop support via the on-call page or the Hadoop support queue.
Issues are typically related to jobs stuck in the last reducer, tasks failing due to Java heap size,
tasks failing due to "no route to host" exceptions, or infrastructure issues.
Valid values for job priorities are: VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW.
For these we drop a mail to the corresponding teams as per the batch user IDs.
Usually the reporting teams ask us to kill the job.
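Before killing anything, the job in question can be identified from the command line as well as from the Job-Tracker UI; a minimal sketch using the standard MapReduce client (the job ID is a placeholder):
mapred job -list
mapred job -kill <job_id>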
To copy data between clusters, use distcp:
# hadoop distcp <source_url> <destination_url>
hadoop distcp hftp://<source_namenode_name>:<port>/<path> hdfs://<destination_namenode_name>:<port>/<path>
hadoop distcp hdfs://trphadn01:8020/gold/netwatch/traffic_guest/2013/07 hdfs://trihadn01:8020/gold/netwatch/traffic_guest/2013/07
Or, in the other direction:
hadoop distcp hdfs://trihadn01:8020/gold/netwatch/traffic_guest/2013/07 hdfs://trphadn01:8020/gold/netwatch/traffic_guest/2013/07
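If a copy has to be re-run after a partial failure, DistCp's standard -update flag copies only files that are missing or changed on the destination; for example, reusing the paths above:
hadoop distcp -update hdfs://trphadn01:8020/gold/netwatch/traffic_guest/2013/07 hdfs://trihadn01:8020/gold/netwatch/traffic_guest/2013/07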
8. Providing application log files from the production server (using succeed request)
Sometimes application/support users request application log files via a share request. We provide the log files in the batch user's home directory as per the request.
sudo -u hdshci
cd /logs/hdshc/impact/
mkdir /home/auto/hdshc/log_bkp_20140810
cp *20140810.log /home/auto/hdshc/log_bkp_20140810/
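If the requesting user cannot read the copied logs, the permissions on the backup directory may need to be relaxed; the mode below is only a suggestion, not a fixed standard:
chmod -R 755 /home/auto/hdshc/log_bkp_20140810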
11. Using svn stage to provide permissions to files with succeed requests:
It may happen during deployment / svn up that new files/scripts have incorrect permissions so the job does not execute, or have excess permissions (777).
Command:
sudo -u svnstage -i
chmod -R 755 /<path>
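The result can be verified before closing the request, e.g.:
ls -lR /<path>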
12. Monitoring hadoop integration and production Job-Tracker and DFS-Health portal
We basically check that all nodes in the cluster are responsive. Blacklisted nodes are not accepting any requests; we need to restart their TaskTrackers manually.
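Restarting a TaskTracker is done on the affected node itself; the sketch below uses the Hadoop 1.x daemon script, and the install path is an assumption that depends on the cluster's distribution (some clusters use service scripts instead):
/usr/lib/hadoop/bin/hadoop-daemon.sh stop tasktracker
/usr/lib/hadoop/bin/hadoop-daemon.sh start tasktracker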
14. Monitoring and taking action on long-running jobs on the integration and production Hadoop clusters, analysing delayed jobs affecting the cluster, keeping track of the number of reducers used by jobs, and keeping track of users running jobs with their own IDs on production.
All actions and screenshots are the same as for Job-Tracker monitoring (point no. 12).
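The map/reduce completion of a long-running job can also be checked from the command line instead of the UI (the job ID is a placeholder):
mapred job -status <job_id>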
15. Monitoring DFS health and space issues, and asking users to clear unnecessary space on DFS. The hdfs-rpt.sh script is used to monitor space on DFS.
Monitor via http://trphadn01.hadoop.searshc.com:50070/dfshealth.jsp
If dead nodes are found in the monitoring tool, inform the Hadoop Infra team members for action.
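Overall DFS usage, dead nodes, and per-directory space can also be checked from the command line; on older releases the equivalent hadoop dfsadmin / hadoop fs forms apply (the /gold path is just an example):
hdfs dfsadmin -report
hdfs dfs -du -h /gold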
16. If the integration or production server is slow during maximum load, we are provided access to commands like "htop" to track, follow up on, and kill the user's long-running Hadoop processes (Hadoop infra).
If a job runs on production longer than its usual time, we update the user; after the user's request, we kill the job.
Ex: mapred job -kill <job_id>
Sometimes we need to set the job priority, as per the share request raised by the user.
Ex: mapred job -set-priority <job_id> <priority>
17. If any application team assigns a job-failure incident to our queue SHC_INF_HADOOP_SUPPORT, the below points should be mentioned by the respective application team.
In a general scenario, when any job fails in Control-M, the respective application team needs to ask the Control-M team to rerun/resubmit their failed jobs. Mostly the job completes successfully on rerun, and then the ticket should be closed either by the application team or the Control-M team. If the rerun fails, the above 3 points should be considered.
Any tickets assigned to Hadoop infrastructure that are not related to us will be redirected to the respective team.