Senior Site Reliability Engineer

Sorry, this job was removed at 8:59 a.m. (CST) on Monday, June 21, 2021

Location: Chicago HQ (preferred) / US

Phenix is seeking an experienced Site Reliability Engineer who will be responsible for the availability, latency, efficiency, change management, monitoring, emergency response, and capacity planning of our large-scale distributed real-time network. As a member of the Phenix team, you will be building the future of video communications.

We Are Looking For Someone Who:

  • Is experienced in areas such as automating infrastructure monitoring, release engineering, and continuous delivery
  • Has developed automated processes in support of the availability, performance, security, and maintainability of 24/7 systems
  • Understands the inherent tradeoff between frequently delivering features to customers and operating a reliable system
  • Has a passion for system-wide continuous improvement
  • Operates at a high level of effectiveness in a fast-paced startup environment

Responsibilities:

  • Proactively manage the risk associated with feature delivery
  • Develop service level objectives and determine indicators for platform reliability
  • Reduce the toil of standard operating procedures through automation
  • Participate in system design discussions, platform management, and capacity planning
  • Design and conduct load tests and analyze the results to better understand the limits of our system and how it performs under load
  • Improve our ability to monitor indicators for platform reliability and performance
  • Manage software releases from the planning stage, through certification in staging, to production release across global PoPs, coordinating with Engineering and Product teams
  • Lead the operational incident response team
  • Troubleshoot incidents through analysis of system logs
  • Contribute to Root Cause Analysis (RCA) investigations
  • Contribute to operations playbooks and documentation
  • Communicate clearly and openly with internal stakeholders regarding progress, roadblocks, and timelines

Requirements:

  • MS/BS in Computer Science or a related technical field preferred
  • 4+ years of experience as a Site Reliability and/or DevOps Engineer
  • Experience with high-level languages, such as Python, C/C++, and/or JavaScript
  • Experience with the bash scripting language
  • Experience with SQL database queries
  • Experience managing container-based apps using Docker
  • Experience with git
  • Experience with large-scale cloud-based operations
  • Experience with build management technologies
  • Experience using CI/CD server technologies
  • Experience with test-driven development
  • Strong problem-solving ability
  • Ability to troubleshoot issues in complex distributed software environments
  • Relentless focus on results and details

Bonus Points:

  • Familiarity with video streaming: WebRTC, RTP, RTMP, HLS, DASH
  • Experience with mobile audio/video development
  • Experience integrating with Slack
  • Familiarity with HTML5
  • Familiarity with Node.js

Perks:

  • Competitive benefits package
  • Collaborating with and learning from a world-class team of business professionals and technologists
  • Working with a global and diverse customer base

Location

Our newly renovated HQ is located in the Loop in downtown Chicago. We are surrounded by restaurants and are an easy walk to metro, subway, and bus stops.
