Lead technical services engineer guiding and training engineers, designing IT architecture, troubleshooting network security and third-party control integrations, coordinating projects, providing customer training and field support, and managing personnel and resources.
The Site Reliability Engineer will be responsible for ensuring the availability, reliability, and performance of our customer-facing software applications. This role combines planning, engineering, monitoring, incident response, and administration to create highly scalable and fault-tolerant systems.
Responsibilities:
- Ensure the high availability and reliability of the production environment by monitoring system health and performance
- Provide primary operational support for large-scale distributed software applications
- Facilitate incident resolution via triage, communication, engagement, escalation, and documentation
- Partner with platform administration (both internal and external) to define and achieve stability and scalability objectives
- Collaborate with technical and quality teams to improve services by identifying areas of risk and helping to define and proactively implement solutions
- Drive continual improvement in system performance by setting service level objectives in collaboration with a performance center of practice and/or product development teams
- Participate in system design, capacity planning, and platform management
- Analyze and publish metrics from operating systems and applications to assist in performance tuning and fault finding
- Pursue opportunities for automation and process improvements
Qualifications:
- Bachelor’s degree (or demonstrable equivalent work experience) in information technology
- Experience providing first-level incident response and troubleshooting with technical teams to resolve end-user issues
- Proficiency with enterprise system monitoring software (examples: Datadog, New Relic, Nagios, SolarWinds, Azure Monitor, Splunk)
- Experience with performance tuning and fault finding in large-scale distributed systems
- Experience with cloud-based infrastructure, databases, and applications
- Experience with designing, implementing, and managing performance testing practices, including specific tools and frameworks
- Knowledge of disaster recovery planning and execution
- Ability to effectively work in a highly matrixed organization
About Us
This amount is what we reasonably believe we will pay for the position; however, offer amounts may vary based on factors such as geographic location, relevant education, experience, qualifications, skills, shift, or any collective bargaining agreements. In addition, Wesco offers a benefits program for eligible employees, which may include paid time off, medical, dental, and vision coverage, and retirement savings plans. Additional details about benefits are available here.
This posting is for a current, active vacancy intended for immediate hire.