Databricks · 03.03.2026

Sr. Staff Technical Program Manager - Reliability

Mountain View

Responsibilities

  • Lead the strategy, execution, and continuous improvement of Reliability initiatives across infrastructure and product engineering teams
  • Partner with senior engineering leadership to define the long-term Reliability roadmap and influence technical direction
  • Ensure clarity and alignment on priorities across engineering teams including Platform Engineering, Compute Fleet Management, SRE, Security, and Cloud Partnerships
  • Own program execution end-to-end: planning, risk management, dependency mapping, trade-off decisions, status reporting, and delivery
  • Identify gaps in process or architecture and work with tech leads (TLs) to proactively drive organizational or technical improvements
  • Partner deeply with engineering teams to influence technical direction and facilitate alignment between cross-functional teams
  • Bring systems thinking to diagnose reliability bottlenecks and drive improvements to scalability, fault tolerance, automation, and operational tooling
  • Drive adoption of reliability best practices across engineering teams including error budgets, incident reviews, design-for-resilience patterns, and operational readiness
  • Define and implement program governance, repeatable processes, metrics, and documentation to scale reliability efforts across teams
  • Evangelize reliability expectations and engineer-empowering processes that reduce operational load and improve incident preparedness

Requirements

  • 10+ years of experience managing and delivering large-scale technical programs in cloud infrastructure, distributed systems, SRE, or platform engineering environments
  • Experience developing infrastructure on two or more hyperscale cloud providers (e.g., AWS, Azure, GCP), with knowledge of cloud primitives, multi-AZ/multi-region architecture, and control-plane/data-plane patterns
  • Demonstrated success leading Reliability Programs at scale including availability, failover, operational excellence, incident reduction, or dependency hardening
  • Strong understanding of infrastructure, distributed systems, or SRE practices; previous engineering or SRE experience is highly preferred
  • Experience partnering directly with senior engineering leadership to define strategy and drive large, multi-team initiatives
  • Ability to translate ambiguous goals into actionable program plans with clear milestones, KPIs, and success metrics
  • Demonstrated ability to manage complex cross-organizational dependencies, technical risks, and multi-quarter timelines
  • Experience delivering programs across multiple clouds and/or large-scale cloud-native services
  • Experience building and scaling engineering processes, operational frameworks, and stakeholder alignment mechanisms

Terms

  • Pay range transparency: The pay range(s) for this role is listed below and represents the expected salary range for non-commissionable roles or on-target earnings for commissionable roles
  • Actual compensation packages are based on several factors that are unique to each candidate, including but not limited to job-related skills, depth of experience, relevant certifications and training, and specific work location
  • Based on the factors above, Databricks anticipates utilizing the full width of the range
  • The total compensation package for this position may also include eligibility for an annual performance bonus