Backblaze

Site Reliability Engineer II

Posted 20 Days Ago
In-Office or Remote
Hiring Remotely in San Mateo, CA
Junior
The Site Reliability Engineer II role involves ensuring the stability and reliability of services, automating operational tasks, and collaborating with teams for system design while promoting reliability practices.
The summary above was generated by AI

About Backblaze
Backblaze is the object storage leader in the open cloud movement, fueling customer success with cloud storage built purposefully to unlock budgets, unburden administrators, and unleash innovators. Together with our partners, we’re helping customers break free from the restrictive, overpriced legacy solutions that hold them back, and blaze forward with the full power of the open cloud in their hands.

Founded in 2007, we scaled the business with less than $3 million in outside funding until 2021, when we completed a traditional IPO on the Nasdaq stock exchange. Today, Backblaze generates over $100M in revenue and is the leading specialized storage cloud, managing over three billion gigabytes of data storage for 500K+ customers in 175+ countries, including businesses, developers, IT professionals, and individuals.

About the Role

We are seeking a Site Reliability Engineer II (SRE II) to help ensure the stability, scalability, and reliability of our services and infrastructure. This role focuses on building automation, maintaining observability, and supporting incident response to keep customer-facing systems performing at their best. The SRE will collaborate with engineering, product, and operations teams to embed reliability practices into day-to-day development and operations while contributing to tools and processes that improve efficiency and reduce manual effort.

Key Responsibilities

Service Reliability & Operations
  • Support the availability and durability of critical services across production environments.
  • Monitor service health using SLIs, SLOs, and error budgets, and escalate issues when thresholds are at risk.
  • Participate in on-call rotations, incident response, and post-incident reviews to drive service improvements.
  • Follow established ITIL/OSS processes (incident, change, problem, and capacity management).
Automation & Tooling
  • Develop automation for common operational tasks, reducing manual intervention and toil.
  • Contribute to monitoring, logging, and alerting frameworks (e.g., Prometheus, Grafana, Catchpoint, ELK).
  • Work with CI/CD pipelines, configuration management, and infrastructure as code tools (Terraform, Ansible, Jenkins).
  • Write scripts (Bash, Python, Go, etc.) to improve system reliability and efficiency.
Collaboration
  • Partner with engineering, product, and operations teams to support resilient system design and operations.
  • Assist in capacity planning and disaster recovery exercises.
  • Work with vendors and service providers to troubleshoot service issues and track SLA performance.
  • Document systems, share learnings, and help grow a reliability-minded engineering culture.
Continuous Improvement
  • Contribute to playbooks, runbooks, and operational documentation.
  • Identify recurring issues and propose long-term improvements.
  • Promote reliability-focused practices within development and operations teams.
Qualifications

Education & Experience
  • Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience).
  • 2–4 years of experience in site reliability, systems engineering, or operations.
  • Exposure to large-scale, production-grade systems.
Technical Skills
  • Solid Linux systems administration and troubleshooting skills.
  • Familiarity with service reliability concepts: monitoring, alerting, incident response, and root cause analysis.
  • Proficiency in at least one scripting language (Python, Bash, or Go).
  • Understanding of containers (Kubernetes, Docker) and microservices concepts.
  • Knowledge of incident response and operational best practices.
Preferred Attributes
  • Experience in a SaaS, service provider, or distributed systems environment.
  • Familiarity with ITIL/OSS practices and SLOs/SLAs.
  • Strong problem-solving skills and willingness to learn new technologies.
  • Experience with cloud platforms (AWS, GCP, or Azure).
  • Ability to work independently, take ownership, and drive projects from problem discovery through resolution. 

At this point, we hope you're feeling excited about the job description you're reading. Even if you don't meet every requirement, we still encourage you to apply. Learning, developing, and growing are key parts of our culture. We're eager to meet people who believe in our mission and can contribute to our team in various ways. We want people to feel comfortable expressing their true selves and to come, stay, and do their best work here.
At Backblaze, we value being fair and good to our customers, partners, and employees. That’s why diversity, equity, and inclusion are at the core of our values. We are committed to fostering a workforce where all employees feel a sense of belonging regardless of race, ethnicity, nationality, gender, sexual orientation, age, religion, socio-economic status, ability, veteran status, and education. We believe that our dedication to cultivating a diverse workspace not only allows us to better serve our customers in over 175 countries, but further reinforces our commitment to doing the right thing. We are proud to be an Equal Opportunity Employer.

To understand more about the data we collect and process as part of your application, please view our Backblaze Employee Privacy Notice.
