You take pride in your work and in helping others learn, and you look for the same from everyone on your team. Your past experience shows an eye for detail and a desire to talk about that new thing you just learned or that new skill you picked up.
You feel passionate about automating repetitive tasks. You don’t just think “there has to be a better way”; you find the better way and can’t help but show it off to your teammates.
In this role, you will work with a team of engineers responsible for the overall health of our production environment running on Google Cloud Platform. You will help maintain the Kubernetes clusters and GCE instances that run all of our sites and services, along with the routing and processes that glue them together.
- Performing day-to-day operational tasks on public-facing infrastructure (keep existing things running and get new things going).
- Owning configuration management and deployment tools.
- Assisting in the architectural design of new services and making them operate at scale.
- Monitoring and analyzing systems, services, and service clusters, and optimizing performance and resource utilization.
- Tracing and troubleshooting misbehaving servers or services and assisting with diagnosis or resolution (servers are cattle, not pets).
- Assisting in or leading incident response, diagnosis, and follow-up on system outages or alerts.
- Keeping supporting systems and platforms up to date and providing recommendations on needed upgrades or migrations.
- Balancing security and risk assessment with business needs and processes.
- Building awesome tools and processes that help us achieve more together.
- Participating in a flexible on-call rotation schedule.
- 3+ years of experience in an SRE/Operations/DevOps role as part of a team.
- Experience managing high-traffic, customer-facing websites and servers.
- Familiarity with open source configuration management and orchestration tools.
- Comfort with shell and scripting languages relevant to the role (Python and Bash required; any others are welcome).
- Experience with cloud-based solutions (AWS, Google Cloud, Azure, etc.), including networking, IAM, provisioning, and monitoring/tracing.
- Experience with software and service deployments built on Docker and Kubernetes.
- Desire to automate tasks and assume ownership of production infrastructure.