Site Reliability Engineer
Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. The SRE team is responsible for the availability, performance, monitoring, and incident response of STATS' internally critical and customer-facing systems.
What You'll Do:
- Engage in and improve the whole life cycle of services, from inception and design through deployment, operation and refinement
- Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews
- Maintain services once they are live by measuring and monitoring availability, latency and overall system health
- Scale systems through sustainable mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity
- Practice sustainable incident response and blameless postmortems
- Establish a mindset and a set of engineering approaches for running better production systems, with a focus on optimizing existing systems, building infrastructure and eliminating work through automation
- Establish a culture of diversity, intellectual curiosity, problem solving and openness to ensure team success
- Create an environment that provides the support and mentorship needed to learn and grow
What You'll Need:
- B.S. in Computer Science or equivalent experience
- Minimum of 3 years of experience with technical operations and software development
- Solid understanding/experience of containerization services such as Docker
- Working knowledge of open source tools such as Prometheus, Grafana, Logstash, Elasticsearch
- Solid understanding/experience of web services, databases and related infrastructure/architectures
- Ability to manage systems using a preferred scripting language
- Solid understanding of IT infrastructure
- Excellent troubleshooting skills
- DevOps experience a plus
- System administration experience a plus
- AWS cloud experience a plus
- Experience supporting an enterprise-level SaaS environment a plus
- Security experience a plus
- Kubernetes experience a plus
STATS provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, sexual orientation, national origin, age, disability or genetics.