Site Reliability Engineer
About The Role
The Site Reliability Engineer is responsible for the support and maintenance of our SaaS platforms. This position will help Yello scale our cloud infrastructure while increasing efficiency, speed of delivery, and reliability at every level of service. For this role, solutions are code-based and priorities are automation-centric, with a focus on building Yello’s data-driven, secure infrastructure. This role will enhance the availability of all of our services and advance our security initiatives. In all, this role will support our team’s development, pre-production, and production environments with expertise in system scalability.
How You'll Make An Impact
- Proactively improve key application metrics, such as uptime, application performance, time to issue resolution, time spent resolving incidents, and other key operational SLAs
- Investigate and resolve complex and multifaceted issues, spanning the entire technology stack, which require effective collaboration across team and technology boundaries
- Operate as the primary point of contact for production incidents, perform root cause analysis, and identify and resolve underlying problem patterns while driving the development of automated, self-healing solutions
- Drive continuous improvement to supported applications, in areas such as monitoring, operational task automation, continuous integration, deployments and performance tuning
- Collaborate and coordinate with software engineering and product management teams to improve operability and supportability of critical customer-facing SaaS applications (DevOps)
- Maintain up-to-date documentation on deployments, processes and standard operating procedures/run-books
What We're Looking For
- BA or BS in a technical discipline or equivalent work experience
- 3+ years of experience as an engineer supporting a high-transaction-volume cloud infrastructure
- Experience with Linux system administration
- Experience managing and deploying Elasticsearch clusters
- Intermediate to advanced level database experience with indexes, queries, joins, etc.
- Experience scripting in any open-source language; Python or Ruby preferred
- Strong understanding of AWS (VPC, EC2, RDS, S3, ElastiCache) and cloud resources, including experience monitoring, alerting on, and configuring these resources
- Experience securing and hardening cloud resources and Linux systems
- Strong written and oral communication skills; this role frequently communicates and interacts with others
- Must be able to sit or stand for continuous periods of time
- Yello reserves the right to assign or reassign the responsibilities and requirements to this job at any time
Additional Information
We are the trailblazers in our space and we continually strive to learn and grow, but there is always time to celebrate a colleague's birthday or a recent success. We dress casually, have one of the best views in the city, and the whole team sports Apple laptops. Our CEO Jason Weingarten and President Dan Bartfield always have their office doors open. And with opportunities for professional advancement, medical, dental and vision insurance, and a 401(k) match – Yello has you covered.
- Yello is an Equal Opportunity Employer. All applicants will receive consideration for employment without regard to race, color, religion, sex, pregnancy, sexual orientation, gender identity, national origin, age, protected veteran status, or disability status.
- Candidates local to Chicago are preferred.
- You must be authorized to work in the United States.