Databricks, 03.03.2026

Sr. Staff Technical Program Manager - Reliability

Bellevue

Responsibilities

01. Lead the strategy, execution, and continuous improvement of critical Reliability initiatives across infrastructure and product engineering teams
02. Lead cross-company programs to enhance reliability, performance, and operational excellence of multi-cloud infrastructure
03. Partner with senior engineering leaders to define Reliability strategy and set long-term goals
04. Execute multi-quarter programs to build the most reliable cloud platform
05. Anticipate risks, shape technical direction, and deliver complex programs across product, engineering, SRE, and cloud partner teams
06. Lead Reliability Strategy and Multi-Quarter Roadmaps with senior engineering leadership
07. Ensure clarity and alignment on priorities across engineering teams (Platform Engineering, Compute Fleet Management, SRE, Security, Cloud Partnerships)
08. Own program execution end-to-end: planning, risk management, dependency mapping, trade-off decisions, status reporting, and delivery
09. Identify gaps in process or architecture and work with TLs to drive organizational or technical improvements
10. Partner deeply with engineering to influence technical direction and facilitate alignment between cross-functional teams
11. Bring systems thinking to diagnose reliability bottlenecks and drive improvements to scalability, fault tolerance, automation, and operational tooling
12. Drive adoption of reliability best practices across engineering teams (error budgets, incident reviews, design-for-resilience patterns, operational readiness)
13. Define and implement program governance, repeatable processes, metrics, and documentation to scale reliability efforts
14. Evangelize reliability expectations and engineer-empowering processes to reduce operational load and improve incident preparedness

Requirements

01. 10+ years of experience managing and delivering large-scale technical programs in cloud infrastructure, distributed systems, SRE, or platform engineering environments
02. Experience developing infrastructure at two or more hyperscale cloud providers (AWS, Azure, GCP), with knowledge of cloud primitives, multi-AZ/region architecture, and control plane/data plane patterns
03. Demonstrated success leading Reliability Programs at scale (availability, failover, operational excellence, incident reduction, dependency hardening)
04. Strong understanding of infrastructure, distributed systems, or SRE practices; previous engineering or SRE experience is highly preferred
05. Experience partnering directly with senior engineering leadership to define strategy and drive large, multi-team initiatives
06. Ability to translate ambiguous goals into actionable program plans with clear milestones, KPIs, and success metrics
07. Demonstrated ability to manage complex cross-organizational dependencies, technical risks, and multi-quarter timelines
08. Experience delivering programs across multiple clouds and/or large-scale cloud-native services
09. Experience building and scaling engineering processes, operational frameworks, and stakeholder alignment mechanisms
10. Background in distributed systems engineering, SRE, platform infrastructure, or cloud services
11. Experience with large-scale compute fleets, container orchestration, autoscaling, or control-plane architecture
12. Familiarity with reliability methodologies (SLOs, error budgets, chaos engineering, failure mode analysis, incident management frameworks)
13. Expertise using Jira or equivalent tools for program tracking and execution
14. Bachelor’s degree in Computer Science, Engineering, or related technical field; advanced degree preferred

Conditions

01. Pay range transparency with expected salary range for non-commissionable roles or on-target earnings for commissionable roles
02. Total compensation package may include eligibility for annual performance bonus
