Sr. Site Reliability Engineer
Lead small-team initiatives to continuously refine our AWS deployment practices for improved reliability, repeatability, and security. You’ll create plans, collaborate with other DevOps team members, and coordinate with development and business teams. These high-visibility initiatives will help increase service levels, lower costs, and deliver features more quickly.
Write code and scripts to automate the provisioning and configuration of AWS services, using tools and languages including the AWS CLI / API, Terraform, Ansible, Python, Bash, and Git.
Design effective monitoring and alerting (for conditions such as application errors and high memory usage) and log aggregation approaches (to quickly access logs for troubleshooting or to generate reports for trend analysis), using tools including AWS CloudWatch, Sumo Logic, and New Relic. You’ll work closely with business stakeholders to proactively notify them of issues and communicate metrics.