Senior Site Reliability Engineer
Position Overview
We are looking for a Senior SRE who is interested in building something from the ground up on our new and exciting cloud platform. In this role you will contribute to a newly formed SRE team. You will participate in architectural and design discussions and in efforts to avoid and reduce toil, and, of course, help provide a scalable, reliable platform for the success of our customers and organization.
Key Responsibilities
- Participate in Agile Sprints and associated ceremonies
- Drive innovation and platform evolution
- Scale cloud infrastructure to support our growing ecosystem based on Docker and Mesos
- Provide reliable, predictable deployment and maintenance of distributed systems
- Adhere to security best practices
- Design and write automation, monitoring, diagnostic, and debugging tooling
- Participate in production support and on-call rotations
- Conduct incident management and contribute to the associated retrospectives/post-mortems as needed
Requirements
- 3+ years in an SRE role
- Working knowledge of configuration management (SCM) tools such as Ansible, Puppet, Chef, or Salt; Salt and/or Ansible preferred
- Experience with IaC (Infrastructure as Code) concepts and tooling - Terraform preferred
- Solid understanding of Git and the Gitflow branching workflow
- Knowledge of Docker engine and ecosystem
- Can troubleshoot and debug container issues at any level, including container networking
- Understanding of Docker networking, including different network plugins and frameworks such as Calico
- Experience with the Mesos/Marathon ecosystem; Kubernetes experience a plus
- Strong knowledge and understanding of microservices-based architectures
- Good understanding of networking, including Layer 2 and Layer 3 concepts
- Strong background in administering and maintaining Linux-based systems
- Strong scripting skills, including the ability to write scripts from scratch in Python and/or Bash
- Can identify and mitigate reliability risks
- Excellent communication and troubleshooting skills
- Experience with Continuous Integration and Continuous Delivery, including Blue/Green and Canary release strategies, is a plus
- Desired skills: experience with HashiCorp Vault, Consul, and Terraform; experience provisioning Mesos or Kubernetes clusters; and knowledge of network architecture, VMware, KVM, and OpenStack
About iManage
iManage combines artificial intelligence with content and email management to free, secure, and understand information. Over 3,000 companies and 1 million users worldwide rely on our market-leading software to share and protect their most valuable data. Our work is not always easy, but it is ambitious and rewarding.
So we’re looking for people who love a challenge. People who are happiest when they’re solving problems and collaborating with the industry’s best and brightest. In exchange, we’ll make sure the work you do here is worth doing. That’s the iManage way. It’s how we do things that might appear impossible. How we develop our employees’ strengths and unlock their potential. It’s how we find meaning in everything we do.
Whoever you are, whatever you do, however you work. Make it mean something at iManage.