Extensive experience in administration and development of Hadoop-ecosystem technologies, including HDFS, Hive, Pig, Flume, MongoDB, Sqoop, ZooKeeper, Spark, MapReduce v2, YARN, HBase, Tez, Kafka, and Storm.
Able to benchmark systems, analyze bottlenecks, and propose solutions to eliminate them.
Experience in planning, designing, and strategizing the Big Data roadmap to meet the organization's data analytics objectives.
Experience owning end-to-end responsibility for the Hadoop life cycle within the organization.
Build distributed, reliable, and scalable data pipelines to ingest and process data in batch and in real time using Java, Python, or Scala.
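For illustration, a minimal sketch of such a batch ingestion step in Scala with Spark; the application name, HDFS paths, and filter condition are hypothetical:

    import org.apache.spark.sql.SparkSession

    object BatchIngest {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("batch-ingest").getOrCreate()

        // Read raw text records from HDFS (path is hypothetical)
        val raw = spark.read.text("hdfs:///data/raw/events")

        // Drop empty lines and persist as Parquet for downstream jobs
        raw.filter("length(value) > 0")
          .write.mode("overwrite").parquet("hdfs:///data/curated/events")

        spark.stop()
      }
    }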
Experience in implementing, managing, and administering the overall Hadoop infrastructure.
Able to clearly articulate the pros and cons of various Big Data technologies and platforms.
Able to perform detailed analysis of business problems and technical environments, and apply that analysis when designing Big Data solutions.
Good understanding of and experience with the Lambda Architecture, including its advantages and drawbacks.
Experience in enforcing security, data privacy, and data encryption with related components and technologies such as Kerberos, Sentry, and KTS/KMS.
Experience with Spark programming.
Experience building stream-processing systems using solutions such as Storm or Spark Streaming.
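As an example of the streaming side, a minimal Spark Streaming word count in Scala; the socket source, host, port, and batch interval are hypothetical stand-ins for a real Kafka or Flume feed:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object StreamWordCount {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("stream-wordcount")
        val ssc = new StreamingContext(conf, Seconds(5)) // 5-second micro-batches

        // Hypothetical socket source; a production job would read from Kafka or Flume
        val lines = ssc.socketTextStream("localhost", 9999)
        lines.flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
          .print()

        ssc.start()
        ssc.awaitTermination()
      }
    }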
Write producer and consumer applications for the Kafka framework.
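A minimal Scala sketch of such producer and consumer applications; the broker address, topic name, group id, and payload are hypothetical:

    import java.time.Duration
    import java.util.{Collections, Properties}
    import org.apache.kafka.clients.consumer.KafkaConsumer
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object KafkaDemo {
      def main(args: Array[String]): Unit = {
        // Producer: publish one record to a hypothetical "events" topic
        val pProps = new Properties()
        pProps.put("bootstrap.servers", "localhost:9092")
        pProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        pProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        val producer = new KafkaProducer[String, String](pProps)
        producer.send(new ProducerRecord[String, String]("events", "key-1", "hello"))
        producer.flush()
        producer.close()

        // Consumer: read the topic back and print each record
        val cProps = new Properties()
        cProps.put("bootstrap.servers", "localhost:9092")
        cProps.put("group.id", "demo-group")
        cProps.put("auto.offset.reset", "earliest")
        cProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
        cProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
        val consumer = new KafkaConsumer[String, String](cProps)
        consumer.subscribe(Collections.singletonList("events"))
        val records = consumer.poll(Duration.ofSeconds(1))
        records.forEach(r => println(s"${r.key} -> ${r.value}"))
        consumer.close()
      }
    }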
Experience in fine-tuning applications and systems for high performance and higher-volume throughput.
Experience in Hadoop cluster planning, screening, and maintenance.
Experience in ETL/ELT processes such as translating, loading, and presenting disparate data sets in various formats and from sources such as JSON, text files, Kafka queues, and log data.
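For instance, a minimal Scala sketch of loading newline-delimited JSON and publishing it to a Hive table; the paths, table name, and na.drop cleanup rule are hypothetical:

    import org.apache.spark.sql.SparkSession

    object JsonToHive {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("json-etl")
          .enableHiveSupport()
          .getOrCreate()

        // Load newline-delimited JSON; Spark infers the schema (path is hypothetical)
        val events = spark.read.json("hdfs:///data/raw/events.json")

        // Drop rows with null fields, then publish as a Hive table for analysts
        events.na.drop().write.mode("overwrite").saveAsTable("analytics.events")

        spark.stop()
      }
    }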