Hadoop Developer
PROFESSIONAL SUMMARY:
Overall 8 years of professional experience in Software Development and Requirement Analysis in Agile work
environments, with 4+ years of Big Data ecosystem experience in the ingestion, storage, querying, processing and
analysis of Big Data.
Experience in dealing with Apache Hadoop components like HDFS, MapReduce, Hive, HBase, Pig, Sqoop,
Oozie, Mahout, Python, Spark, Cassandra and MongoDB.
Good understanding/knowledge of Hadoop architecture and various components such as HDFS, JobTracker,
TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
Experienced in managing NoSQL databases on large Hadoop distributions such as Cloudera, Hortonworks
HDP, and MapR M series.
Experienced in developing Hadoop integrations for data ingestion, data mapping and data processing capabilities.
Worked with various data sources such as flat files and RDBMS (Teradata, SQL Server 2005, Netezza and
Oracle). Extensive work in ETL processes consisting of data transformation, data sourcing, mapping and conversion.
Hands on experience in installing, configuring, and using Hadoop ecosystem components like Hadoop
MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Zookeeper, Storm, Spark, Kafka and Flume.
Strong understanding of Data Modeling and experience with Data Cleansing, Data Profiling and Data
analysis.
Experience in ETL (DataStage) analysis, design, development, testing and implementation of ETL processes,
including performance tuning and query optimization of databases.
Experience in extracting source data from Sequential files, XML files, Excel files, transforming and loading it
into the target data warehouse.
Strong experience with Java/J2EE technologies such as Core Java, JDBC, JSP, JSTL, HTML, JavaScript and
JSON.
Strong experience in Scala programming for developing Spark applications.
Extensive coding experience in Java and Mainframes - COBOL, CICS and JCL
Experience working in all phases of software development across various methodologies.
Strong background in writing test plans and performing Unit, User Acceptance, Integration and System
Testing.
Proficient in software documentation and technical report writing.
Worked collaboratively with multiple teams; conducted peer reviews and organized and participated in
knowledge transfer (technical and domain) sessions.
Experience working in an Onsite-Offshore model.
Developed various UDFs in MapReduce and Python for Pig and Hive (a minimal UDF sketch follows at the end of this summary).
Solid experience and knowledge of other SQL and NoSQL databases such as MySQL, MS SQL Server, MongoDB,
HBase, Accumulo, Neo4j and Cassandra.
Good Data Warehouse experience in MS SQL.
Proficiency in programming with different IDEs such as Eclipse and NetBeans.
Involved in database design, creating Tables, Views, Stored Procedures, Functions, Triggers and Indexes.
Good understanding of service-oriented architecture (SOA) and web services standards such as XML, XSD, WSDL and SOAP.
Good knowledge of scalable, secure cloud architecture based on Amazon Web Services (leveraging AWS
cloud services such as EC2, CloudFormation, VPC, S3, etc.).
Good knowledge of Hadoop cluster architecture and monitoring the cluster.
In-depth understanding of Data Structure and Algorithms.
Experience in managing and troubleshooting Hadoop related issues.
Expertise in setting up standards and processes for Hadoop based application design and implementation.
Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and
vice-versa.
Experience in managing Hadoop clusters using Cloudera Manager.
Hands-on experience with VPN, PuTTY, WinSCP, etc.
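To illustrate the UDF work mentioned above, here is a minimal sketch of a Java Hive UDF; the class name and the normalization rule are hypothetical, and the Python streaming route mentioned in the summary is an equally valid alternative.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical example UDF: normalizes a raw string column
    // (trim, lower-case, collapse whitespace) before analysis.
    public final class NormalizeString extends UDF {
        public Text evaluate(final Text input) {
            if (input == null) {
                return null; // pass Hive NULLs through unchanged
            }
            String cleaned = input.toString()
                    .trim()
                    .toLowerCase()
                    .replaceAll("\\s+", " ");
            return new Text(cleaned);
        }
    }

Once the jar is added to the Hive session (ADD JAR) and registered with CREATE TEMPORARY FUNCTION, the UDF can be called like any built-in function in a query.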
EDUCATION:
Bachelor of Technology

TECHNICAL SKILLS:
Hadoop/Big Data: MapReduce, Hive, Pig, Impala, Sqoop, Flume, HDFS, Oozie, Hue, HBase, Zookeeper, Spark
Operating Systems: Windows, Ubuntu, RedHat Linux, Unix
PROFESSIONAL EXPERIENCE:
Environment: Hadoop, CDH4, MapReduce, HDFS, Pig, Hive, Impala, Oozie, Java, Spark, Kafka, Flume, Storm, Knox,
Linux, Scala, Maven, JavaScript, Oracle 11g/10g, SVN
Client: Coremetrics, Inc., San Mateo, CA                                Jan 2012 - Nov 2014
Role: Hadoop Developer
Description:
Coremetrics, Inc., a software-as-a-service provider, offers digital marketing optimization solutions. Its offerings include
the Coremetrics Continuous Optimization Platform, which gives insight into the behavior of customers and prospects;
Coremetrics Analytics, which delivers intuitive collaboration capabilities for sharing performance insights; Coremetrics
for Mobile, which provides access to marketing metrics on handheld devices; and Coremetrics Explore, which provides
a complete picture of visitor and customer behavior.
Responsibilities:
Involved in review of functional and non-functional requirements.
Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data
cleaning and pre-processing (a sketch of such a job follows this responsibilities list).
Imported and exported data into HDFS and Hive using Sqoop.
Supported MapReduce programs running on the cluster.
Involved in loading data from UNIX file system to HDFS.
Involved in creating Hive tables, loading them with data and writing Hive queries that internally run as
MapReduce jobs.
Set up and benchmarked Hadoop/HBase clusters for internal use.
Involved in loading data into the Hadoop Distributed File System and using Pig to preprocess it.
Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box
(such as MapReduce, Pig, Hive, Sqoop and Flume) as well as system-specific jobs such as shell scripts.
Involved in data modeling sessions to develop models for Hive tables.
Imported and exported large data sets to and from HDFS using Sqoop.
Transferred log files from the log generating servers into HDFS.
Installed Hive, Pig, Flume, Sqoop and Oozie on the Hadoop cluster.
Created tasks, workflows and sessions using Workflow Manager; worked with the scheduling team to come up
with the production schedule.
Worked with Hive partitioning and bucketing concepts and created Hive external and internal tables with
partitions (a DDL sketch follows this list).
Assisted the project manager in problem solving with Big Data technologies for integrating Hive with
HBase and Sqoop with HBase.
Solved performance issues in Hive and Pig with an understanding of joins, grouping and aggregation and how
they translate to MapReduce jobs.
Moved data from traditional databases such as MySQL, MS SQL Server and Oracle into Hadoop.
Worked on integrating Talend and SSIS with Hadoop and performed ETL operations.
Used Flume to collect, aggregate and push log data from different log servers.
Performed unit testing using the JUnit framework and used Log4j to monitor error logs (a test sketch follows this list).
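A minimal sketch of the kind of data-cleaning MapReduce job described above: a map-only job (zero reducers) that drops malformed records and trims fields. The class names, field count and comma delimiter are hypothetical.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Hypothetical map-only cleaning job: filters malformed rows, trims fields.
    public class CleanRecordsJob {

        // Pure helper so the cleaning rule is unit-testable (see test sketch below).
        static String cleanRecord(String line) {
            String[] fields = line.split(",", -1); // keep trailing empty fields
            if (fields.length != 5) {
                return null; // malformed row: wrong column count
            }
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < fields.length; i++) {
                if (i > 0) out.append(',');
                out.append(fields[i].trim());
            }
            return out.toString();
        }

        public static class CleanMapper
                extends Mapper<LongWritable, Text, Text, NullWritable> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String cleaned = cleanRecord(value.toString());
                if (cleaned != null) {
                    context.write(new Text(cleaned), NullWritable.get());
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "clean-records");
            job.setJarByClass(CleanRecordsJob.class);
            job.setMapperClass(CleanMapper.class);
            job.setNumReduceTasks(0); // map-only: no aggregation needed
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(NullWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }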
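A sketch of the unit-testing approach from the last bullet: JUnit 4 tests exercising the hypothetical cleanRecord helper from the job above, with a Log4j 1.x logger for diagnostics.

    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertNull;
    import org.apache.log4j.Logger;
    import org.junit.Test;

    // Unit tests for the record-cleaning rule, with Log4j for diagnostics.
    public class CleanRecordsJobTest {
        private static final Logger LOG = Logger.getLogger(CleanRecordsJobTest.class);

        @Test
        public void trimsFieldsInWellFormedRecords() {
            String cleaned = CleanRecordsJob.cleanRecord(" a , b ,c, d ,e");
            LOG.debug("cleaned record: " + cleaned);
            assertEquals("a,b,c,d,e", cleaned);
        }

        @Test
        public void rejectsRecordsWithWrongColumnCount() {
            assertNull(CleanRecordsJob.cleanRecord("a,b,c"));
            LOG.info("malformed record correctly dropped");
        }
    }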
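A sketch of the partitioned, bucketed external Hive table work described above, issued here through the HiveServer2 JDBC driver to stay in Java; the table name, columns, bucket count, connection URL and HDFS paths are all hypothetical placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    // Creates a partitioned, bucketed external Hive table over HDFS data
    // and registers one day's partition directory.
    public class CreatePartitionedTable {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:hive2://localhost:10000/default", "", "");
                 Statement stmt = conn.createStatement()) {
                stmt.execute(
                    "CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (" +
                    "  user_id STRING, url STRING, ts BIGINT)" +
                    " PARTITIONED BY (dt STRING)" +
                    " CLUSTERED BY (user_id) INTO 32 BUCKETS" +
                    " ROW FORMAT DELIMITED FIELDS TERMINATED BY ','" +
                    " LOCATION '/data/web_logs'");
                // Point the dt='2014-01-01' partition at its HDFS directory.
                stmt.execute(
                    "ALTER TABLE web_logs ADD IF NOT EXISTS" +
                    " PARTITION (dt='2014-01-01')" +
                    " LOCATION '/data/web_logs/dt=2014-01-01'");
            }
        }
    }

Keeping the table external means dropping it removes only the metadata, not the underlying HDFS data, which suits ingestion pipelines that land files independently of Hive.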
Environment: Hadoop, MapReduce, HDFS, Hive, Sqoop, HBase, UNIX Shell Scripting