Hadoop Developer

Location: Washington, DC
Date Posted: 08-25-2017
Hadoop Developer
Washington, DC
Direct Hire
Phone Interview, then In-Person
Active Secret Clearance Required


We are looking for a hands-on, seasoned Hadoop Developer to work on a mission-critical information sharing platform at the Department of Homeland Security (DHS). The platform enables the sharing of secure, accurate, and privacy-controlled data with approved stakeholders while protecting sensitive data and preserving privacy across DHS components and other national security agencies. The system is deployed across DHS's classified and unclassified domains and housed in DHS data centers. The developer will work closely with customers and infrastructure teams to design and implement Big Data analytic solutions on a Hadoop-based platform.

Responsibilities:
  • Optimize HDFS infrastructure using MapReduce- and Spark-based jobs
  • Test and refine data throughput within the cluster using Spark and MapReduce jobs
  • Create custom analytic jobs to help extract knowledge and meaning from vast stores of data.
  • Refine a data processing pipeline focused on unstructured and semi-structured data. Support both quick-turn, rapid implementations and larger-scale, longer-duration analytic capability implementations.
  • Ingest data from various structured and unstructured data sources into Hadoop and other distributed Big Data systems.
  • Support the sustainment and delivery of an automated ETL pipeline using a suite of COTS, GOTS, and other tools.
  • Validate data extracted from structured and unstructured inputs, databases, and other repositories using scripts, logs, queries, and other automated capabilities. Enrich and transform extracted data as required.
  • Monitor and report on data flow through the ETL process.
  • Perform data extractions, data purges, or data fixes in accordance with current internal procedures and policies.
  • Track development via user stories and decomposed technical tasks in provided issue-tracking software, such as JIRA.
  • Test and validate integration points with downstream columnar databases.
 
Required Qualifications:
  • 3+ years of experience with distributed, scalable Big Data systems and/or NoSQL databases, including Hadoop, Accumulo, and HBase
  • CDH-certified Hadoop Developer
  • Experience in MapReduce and Spark programming within the Hadoop Distributed File System (HDFS) and with processing large data stores (minimum of 20 data nodes)
  • Experience with the design and development of multiple object-oriented systems (2+ years of experience with software development throughout the SDLC)
  • Experience with open-source software or COTS products
  • Experience with Linux, including CentOS and Red Hat
  • Experience working with Scrum or other Agile methodologies
  • Ability to show flexibility, initiative, and innovation when dealing with ambiguous and fast-paced situations
  • Ability to obtain a security clearance
  • BS degree in CS or equivalent

Additional Qualifications:
  • Experience with Hadoop
  • Experience with R or Python
  • Experience with using repository management solutions
  • Experience with deploying applications in a Cloud environment
  • Experience with designing and developing automated analytic software and techniques