GitLab, 20.04.2026

Site Reliability Engineer, Environment Automation

Remote

Responsibilities

01. Contribute to automating operational tasks across many GitLab environments, from initial provisioning and configuration updates to upgrades and routine maintenance, helping reduce manual work and improve reliability at scale under the guidance of senior team members
02. Help build and refine the observability stack for multi-tenant GitLab environments so we monitor the right signals across Kubernetes, cloud services, and GitLab applications, supporting early issue detection and basic capacity tracking
03. Assist in responding to platform alerts and incidents, collaborating with Environment Automation SREs and engineering teams to troubleshoot production issues across multiple tenants and document findings
04. Support planning and implementation of infrastructure changes, capacity expansions, and new service rollouts for Dedicated and other managed GitLab environments, contributing to efforts that improve resource efficiency and environment isolation
05. Develop and maintain scripts, automation tools, and infrastructure-as-code workflows that manage parts of the GitLab environment lifecycle, enabling more repeatable, self-service operations over time
06. Apply and help implement best practices for running GitLab on Kubernetes and cloud platforms, focusing on day-to-day reliability, performance, and security while learning how to keep environments consistent
07. Participate in the on-call rotation for production GitLab environments with appropriate support, helping triage and mitigate incidents across clusters and cloud providers and contributing to post-incident reviews
08. Document operational tasks, runbooks, and lessons learned so they become clear, repeatable processes and can be candidates for future automation, improving shared knowledge and reducing manual toil across the team

Requirements

01. Experience working as an SRE or in a similar role operating production infrastructure, with an interest in automating the lifecycle of many environments or tenants in parallel, even if you have not yet done so at large scale
02. Hands-on experience with backend programming languages such as Golang, with the ability to read, understand, and modify infrastructure tools
03. Hands-on experience running Kubernetes-based workloads in production, including basic understanding of deployments, rollouts, and debugging common issues like crash loops, failed health checks, and scheduling problems
04. Familiarity with infrastructure automation and configuration management tools such as Terraform and Ansible, including experience working with modules, variables, and managing state safely for multiple environments
05. Solid understanding of Git-based workflows and infrastructure-as-code practices

Similar vacancies

Senior Site Reliability Engineer, Tenant Services: Geo

Staff Backend Engineer, GitLab Delivery: Upgrades

Sr. IT Site Reliability Software Engineer

Site Reliability Engineer

Staff Site Reliability Engineer - Kubernetes

Site Reliability Engineer, Cloud Cost Utilization