Data Engineer
Employment Type: Full-Time
JOB SUMMARY
The Data Engineer is responsible for processing structured and unstructured data, validating data quality, and developing and supporting data products. The Data Engineer also supports the expansion and maintenance of HFI’s data analytics solutions, gathering requirements and translating them into technical specifications and design documents.
ESSENTIAL FUNCTIONS & RESPONSIBILITIES
- Create and maintain optimal data pipeline architecture.
- Assemble large, complex data sets that meet functional and non-functional business requirements.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL, AWS ‘big data’ services, Google Cloud, and BigQuery.
- Work closely with software developers and analytics experts to build analytics tools that use the data pipeline to provide clients with actionable insights into operational efficiency and other key business performance metrics.
- Work with stakeholders, including managers and the Product, Data, and Design teams, to assist with data-related technical issues and support their data infrastructure needs.
- Create data tools that help analytics and data science team members build and optimize our product into an innovative industry leader.
- Work with data and analytics experts to strive for greater functionality in our data systems.
- Support and develop web applications alongside software developers and other data science team members to build scalable analytics reporting systems and portals.
- Support other data engineers and data analysts in developing machine learning pipelines that data scientists use to build complex machine learning systems and models.
- Adapt and migrate to new technologies as needed for various projects to help HFI remain an industry leader.
- Work with DevOps to roll out production-grade systems and deploy highly scalable, client-facing applications, ensuring that industry-standard security protocols are in place.
- Build dynamic analytics dashboards and reports using Tableau, SQL, Python, and other analytics tools, or build web applications to deliver such reports to operations, finance, and other teams, including executives.
- Ensure that all deliverables are thoroughly documented.
- Other duties may be assigned.
MINIMUM REQUIREMENTS
- Master’s degree in Computer Science, Statistics, or Applied Math preferred, or an equivalent combination of education and experience.
- 2+ years of experience in a Data Engineer role.
- In-depth knowledge of the Tableau reporting tool, required.
- Experience developing scalable analytics reports and dashboards and building custom reporting systems using web technologies, required.
- Experience with relational and non-relational databases (like MySQL, Postgres, MongoDB and Cassandra), required.
- Experience building and optimizing ‘big data’ pipelines, architectures, data sets, and software applications (like Hive, MapReduce, Spark, etc.).
- Experience with data pipeline and workflow management tools (like Airflow, Azkaban, Luigi, etc.).
- Experience with AWS cloud services (like EC2, EMR, RDS, Redshift, Lambda, etc.) and Google Cloud Platform (GCP) services (like BigQuery, Firestore, etc.).
- Experience with stream-processing systems (like Storm, Spark Streaming, etc.).
- Experience with object-oriented and functional programming or scripting languages (like Python, Java, C++, Scala, shell scripting, etc.).
- Experience with big data tools (like Hadoop, Spark, Kafka, etc.).
- Experience with statistical and machine learning libraries (like NumPy, pandas, scikit-learn, Spark, Keras, etc.).
- Experience developing automation tools (using Python, shell scripting, cron jobs, PHP, etc.).
- Experience with version control and code management tools (like Git, GitHub, GitLab, Jira, Bitbucket, etc.).
- Experience with software and system deployment practices (like penetration testing, unit testing, CI/CD, SDLC, etc.).
- Understanding of software application development using Django, Flask, APIs, JavaScript (React.js, Express.js, Node.js, etc.), jQuery, AJAX, HTML5, CSS, Bootstrap, and related web technologies is a plus.
- A successful history of manipulating, processing and extracting value from large disconnected datasets and developing scalable systems to serve such data.
- Strong analytical skills related to working with structured and unstructured datasets.
- Strong project management and organizational skills.
- Excellent written communication and presentation skills.
WORKING CONDITIONS / WORK ENVIRONMENT
Moderate noise level associated with an open-office work environment.
Physical demands:
While performing the duties of this job, the employee is regularly required to talk or hear; stand, walk, sit, use hands to finger, handle or feel objects, and reach with hands and arms. The employee occasionally will lift and/or move up to 25 pounds.