Site Reliability Engineer
About The Role
As Yello’s Site Reliability Engineer, you are responsible for ensuring Yello continues to meet its SLA obligations and keeps all services highly available, reliable, secure, and scalable. As part of the Operations team, under Infrastructure Architecture, you’ll work closely with the engineering and security teams to accomplish infrastructure and organizational goals.
The successful candidate will possess strong knowledge of Linux systems, AWS cloud services, and web application frameworks; have a strong automation mindset; and drive continuous improvement and innovation.
How You'll Make An Impact
- Maintain and run a world-class, fully automated cloud infrastructure on AWS and other IaaS providers
- Implement proactive monitoring, alerting, and trend analysis, and build self-healing systems
- Help build a cloud operations strategy to deliver high availability and performance
- Work closely with engineering managers and our production support team to move Yello cloud operations to the next level
- Troubleshoot failures and performance issues across services and put preventive measures in place
- Own core services like MySQL, PostgreSQL, and Elasticsearch and build company-wide standards for using these services
- Participate in on-call rotations and serve as primary coordinator of incident communications and efforts across teams
- Oversee the change management process: plan infrastructure changes and evaluate other teams' requests
What We're Looking For
- 5+ years of professional Linux systems administration or DevOps experience
- Practical experience with MySQL, PostgreSQL, and Elasticsearch is required
- Proficient in Ruby, Python, or Go
- Experience with configuration management tools like Ansible, Puppet, or Chef is required
- Experience managing distributed SaaS systems in public and private cloud environments; AWS experience preferred
- Experience building and running a high-transaction, 24x7 production environment
- Ability to troubleshoot and resolve challenging technical issues related to cloud infrastructure and/or Linux
- Expertise in specifying, designing, and implementing traffic routing and health and performance monitoring
- Experience with applications that span multiple geographical locations
Additional Information
We are the trailblazers in our space and we continually strive to learn and grow, but there is always time to celebrate a colleague's birthday or a recent success. We dress casually, have one of the best views in the city, and the whole team sports Apple laptops. Our CEO Jason Weingarten and President Dan Bartfield always have their office doors open. And with opportunities for professional advancement, medical, dental, and vision insurance, and a 401(k) match, Yello has you covered.
- Yello is an Equal Opportunity Employer. All applicants will receive consideration for employment without regard to race, color, religion, sex, pregnancy, sexual orientation, gender identity, national origin, age, protected veteran status, or disability status.
- Candidates local to Chicago are preferred.
- You must be authorized to work in the United States.
- You must be able to sit or stand for continuous periods of time.
- This role involves frequent communication and interaction with others, so strong written and oral communication skills are required.
- Yello reserves the right to assign or reassign the responsibilities and requirements of this job at any time.