Databricks, 03.03.2026
Sr. Staff Technical Program Manager - Reliability
Mountain View
Responsibilities
- Lead the strategy, execution, and continuous improvement of Reliability initiatives across infrastructure and product engineering teams
- Partner with senior engineering leadership to define the long-term Reliability roadmap and influence technical direction
- Ensure clarity and alignment on priorities across engineering teams including Platform Engineering, Compute Fleet Management, SRE, Security, and Cloud Partnerships
- Own program execution end-to-end: planning, risk management, dependency mapping, trade-off decisions, status reporting, and delivery
- Identify gaps in process or architecture and work with TLs to proactively drive organizational or technical improvements
- Partner deeply with engineering teams to influence technical direction and facilitate alignment between cross-functional teams
- Bring systems thinking to diagnose reliability bottlenecks and drive improvements to scalability, fault tolerance, automation, and operational tooling
- Drive adoption of reliability best practices across engineering teams including error budgets, incident reviews, design-for-resilience patterns, and operational readiness
- Define and implement program governance, repeatable processes, metrics, and documentation to scale reliability efforts across teams
- Evangelize reliability expectations and engineer-empowering processes that reduce operational load and improve incident preparedness
Requirements
- 10+ years of experience managing and delivering large-scale technical programs in cloud infrastructure, distributed systems, SRE, or platform engineering environments
- Experience developing infrastructure at two or more hyperscale cloud providers (e.g., AWS, Azure, GCP), with knowledge of cloud primitives, multi-AZ/region architecture, and control plane/data plane patterns
- Demonstrated success leading Reliability Programs at scale including availability, failover, operational excellence, incident reduction, or dependency hardening
- Strong understanding of infrastructure, distributed systems, or SRE practices; previous engineering or SRE experience is highly preferred
- Experience partnering directly with senior engineering leadership to define strategy and drive large, multi-team initiatives
- Ability to translate ambiguous goals into actionable program plans with clear milestones, KPIs, and success metrics
- Demonstrated ability to manage complex cross-organizational dependencies, technical risks, and multi-quarter timelines
- Experience delivering programs across multiple clouds and/or large-scale cloud-native services
- Experience building and scaling engineering processes, operational frameworks, and stakeholder alignment mechanisms
Terms
- Pay range transparency: The pay range for this role is listed below and represents the expected salary range for non-commissionable roles or on-target earnings for commissionable roles
- Actual compensation packages are based on several factors that are unique to each candidate, including but not limited to job-related skills, depth of experience, relevant certifications and training, and specific work location
- Based on the factors above, Databricks anticipates utilizing the full width of the range
- The total compensation package for this position may also include eligibility for an annual performance bonus