Databricks, 03.03.2026

Sr. Staff Technical Program Manager - Reliability

Mountain View

Responsibilities

01 Lead the strategy, execution, and continuous improvement of Reliability initiatives across infrastructure and product engineering teams
02 Partner with senior engineering leadership to define the long-term Reliability roadmap and influence technical direction
03 Ensure clarity and alignment on priorities across engineering teams including Platform Engineering, Compute Fleet Management, SRE, Security, and Cloud Partnerships
04 Own program execution end-to-end: planning, risk management, dependency mapping, trade-off decisions, status reporting, and delivery
05 Identify gaps in process or architecture and work with TLs to proactively drive organizational or technical improvements
06 Partner deeply with engineering teams to influence technical direction and facilitate alignment between cross-functional teams
07 Bring systems thinking to diagnose reliability bottlenecks and drive improvements to scalability, fault tolerance, automation, and operational tooling
08 Drive adoption of reliability best practices across engineering teams including error budgets, incident reviews, design-for-resilience patterns, and operational readiness
09 Define and implement program governance, repeatable processes, metrics, and documentation to scale reliability efforts across teams
10 Evangelize reliability expectations and engineer-empowering processes that reduce operational load and improve incident preparedness

01 10+ years of experience managing and delivering large-scale technical programs in cloud infrastructure, distributed systems, SRE, or platform engineering environments
02 Experience developing infrastructure at two or more hyperscale cloud providers (e.g., AWS, Azure, GCP), with knowledge of cloud primitives, multi-AZ/region architecture, and control plane/data plane patterns
03 Demonstrated success leading Reliability Programs at scale including availability, failover, operational excellence, incident reduction, or dependency hardening
04 Strong understanding of infrastructure, distributed systems, or SRE practices; previous engineering or SRE experience is highly preferred
05 Experience partnering directly with senior engineering leadership to define strategy and drive large, multi-team initiatives
06 Ability to translate ambiguous goals into actionable program plans with clear milestones, KPIs, and success metrics
07 Demonstrated ability to manage complex cross-organizational dependencies, technical risks, and multi-quarter timelines
08 Experience delivering programs across multiple clouds and/or large-scale cloud-native services
09 Experience building and scaling engineering processes, operational frameworks, and stakeholder alignment mechanisms

01 Pay range transparency: The pay range(s) for this role is listed below and represents the expected salary range for non-commissionable roles or on-target earnings for commissionable roles
02 Actual compensation packages are based on several factors that are unique to each candidate, including but not limited to job-related skills, depth of experience, relevant certifications and training, and specific work location
03 Based on the factors above, Databricks anticipates utilizing the full width of the range
04 The total compensation package for this position may also include eligibility for an annual performance bonus