Senior Site Reliability Engineer
We are seeking a bright, energetic, and highly motivated Sr. Site Reliability Engineer to join our technology team. If you’re a Systems Engineer who loves automation, or a Software Developer who loves infrastructure, this job is for you! The Site Reliability Engineer will be an integral part of our IT Operations team, and this individual should have a passion for technology and open source software.
Responsibilities
- Help build tools and systems that assist GoHealth's Engineering, QA, and Operations teams in delivering high-quality software
- Work with development teams and leadership to help evolve our continuous delivery process
- Monitor and respond to alerts from technology infrastructure to ensure SLAs are met
- Work throughout the technology stack to design, build, and monitor solutions that allow for continued scalability
- Document troubleshooting work, and apply problem-solving skills from the early stages of design all the way through identifying production issues
- Remain flexible and take strong ownership of uptime and system performance
- Be experienced and comfortable with programming and software development
Qualifications
- BS in Computer Science (or equivalent experience) and a minimum of 7 years of overall experience, including experience with open source technologies, automated configuration, DevOps, or cloud automation development
- Expert-level knowledge of managing a Linux environment at scale (RHEL, CentOS)
- Exceptional skills with one or more scripting languages (Bash, Python, Perl, Ruby, or similar)
- Proven experience using configuration management tools such as Puppet or Chef
- Experience developing software with a wide variety of open source technologies to scale, automate, and monitor systems
- Experience supporting and troubleshooting the following technology components (or similar): Docker/Origin/Kubernetes, Jenkins, Nagios/Elasticsearch/Splunk, Apache/Tomcat/Nginx, MySQL/Couchbase, LDAP, and Kerberos
- Hands-on experience with Nagios and with creating custom scripts to monitor all aspects of application infrastructure
- Bonus points for working knowledge of building and maintaining IP-based networks (Cisco), and for experience with F5 LTM, DNS, and/or iRule development
- Experience with VMware/vCenter or OpenStack is preferred, but not required
- Experience managing a 24/7 SaaS infrastructure
- Detail-oriented individual with strong communication skills for collaborating and keeping others informed
- Ability to produce results while working both independently and in a team environment