Employment Type: Full-Time
Senior Cloud DevOps Engineer, Monitoring
Location: San Diego, CA
Teradata is growing our Cloud Operations team and we’re looking for individuals who exemplify our principle of Customer Obsession through operational excellence, leadership, and a passion to continually be the voice of the customer. This is a unique opportunity to join our team in a period of fast growth and expansion. If you are interested in working in a dynamic and fast-paced environment where you can directly influence the future of cloud-based analytics solutions and services, then this is the place for you. You will actively develop and implement state-of-the-art technical solutions, including capabilities to support elastic scalability, on-demand self-service, disaster recovery, and usage-based consumption, to enable customers to solve their most complex data analytics challenges.
Teradata Cloud seeks a Sr Staff DevOps Engineer to lead in building and operating highly scalable, fault-tolerant, and secure systems in a highly distributed and dynamic Hybrid Cloud environment.
Responsibilities
Provide architectural leadership for developing and building highly available systems and software in large distributed and Hybrid Cloud environments
Promote a culture of continuous improvement for technology and processes
Lead in-depth analysis for improving the deployment of cloud-native applications, monitoring, securing, and supporting a large-scale public cloud environment
Analyze and improve existing provisioning processes for automation opportunities and improvements
Drive the improvement of proactive alerting using modern monitoring tools such as Datadog, New Relic, and Nagios
Improve system monitoring and observability through log analysis, dashboard creation, and automated alerts based on established service level objectives (SLO) and service level agreements (SLA)
Mentor team members in the use of industry-standard best practices
Work with the development teams to clarify runtime infrastructure requirements
Collaborate with other teams to gather requirements and decompose large tasks into small, testable commits
Understand performance and security considerations for the code we deploy
Collaborate with distributed, global teams to achieve common goals
Build automation frameworks and systems to improve time to delivery through the use of modern CI/CD systems
Participate in on-call for escalated support of production customers and systems
Perform and improve SRE / operational functions, such as monitoring and maintenance of production systems
Qualifications
5+ years of relevant job experience
Expert-level, hands-on system administration experience on at least one of the big three public cloud platforms: Google Cloud, Azure, or AWS (Google Cloud and Azure highly preferred)
Expert coding skills
Proven experience with Configuration Management tools such as Ansible, Puppet, Chef
Strong experience with test and build systems such as Jenkins, Maven, and Ant
Experience with monitoring and reporting tools such as Datadog, New Relic, Nagios, and Graphite
Strong experience with Linux operating systems
Experience working with database systems, network topologies, and hardware
Experience working with virtualization software such as VMware and OpenStack preferred
Experience working in a hybrid environment preferred
Bachelor’s Degree in computer science or related field preferred