We're building a world-class team to provide polished, professional technical support for an emerging company and product. Ocient is looking for ultimate team players who are dedicated to improving our product and customer experience with every interaction. The Ocient team consists of highly technical engineers, and understanding the end-to-end development, sales, and delivery cycle is key to adding unique value to the organization. Ocient is searching for an experienced Site Reliability Engineer with a solid infrastructure background and a passion for solving hard problems. This individual will help maintain and expand Ocient's "as a service" offering of its cutting-edge data warehouse.
- Assist with the design and operation of our hosted offerings of the Ocient DB and related services, such as message queues and storage systems, ensuring all services are highly available, high-performance, and efficient.
- Design and maintain monitoring, log centralization, and alerting for all services to facilitate observability and incident management.
- Automate deployment and configuration of Linux-based servers, including the OS and the numerous applications that compose our hosted offerings.
- Develop and maintain rigorous security practices to protect our applications and customer data.
- Assist with automation of testing pipelines for the Ocient DB and monitoring of test infrastructure.
- 4+ years of experience in system administration or service operations in production environments.
- Scripting experience with Bash, Python, or other languages.
- Experience with system and software monitoring and alerting tools, such as the ELK stack, Graylog, InfluxDB, Prometheus, Zabbix, Grafana, Dynatrace, or others.
- Knowledge of and experience with Infrastructure as Code tools and practices.
- Experience with data archiving, backup, and disaster recovery.
- Continuous Integration / Continuous Deployment experience with Jenkins, GitLab CI, or others.
- Experience with source control tools like Git.
- Ability to work flexible hours and serve in on-call rotations.
- Broad background in infrastructure across compute, storage, network, and security with proven troubleshooting skills.
- Knowledge of OWASP principles for application security.
- Experience with server / system virtualization and containerization technologies (e.g., Proxmox, KVM, VMware).
- Experience with SQL and database administration.
- Experience managing and operating cloud infrastructure (e.g., AWS, GCP, Azure).
- Experience with SSAE 18 SOC 2 compliance.
- Experience with network administration, including VPN, proxy, DNS, and firewall configuration.