Grafana Labs, 07.05.2026

Senior Software Engineer - Grafana Databases, Managed Services | Ireland | Remote

Republic of Ireland (Remote)

Responsibilities

  • Operating and evolving 100+ multi-cloud streaming clusters and related database infrastructure
  • Diagnosing and eliminating cross-layer failure modes (e.g., object storage latency, noisy neighbors, control-plane bottlenecks, query performance regressions)
  • Designing safe upgrade and rollout strategies at scale
  • Improving observability, automation, and operational ergonomics
  • Partnering closely with database and platform teams to ensure safe scaling, partitioning, consumer fan-out, and query performance
  • Working directly with distributed systems behavior, Kubernetes scheduling dynamics, storage engines, compression trade-offs, and similar concerns
  • Serving as a primary escalation point and on-call engineer for relevant incidents
  • Owning the relationship with all system vendors, including WarpStream Labs
  • Reviewing and defining SLOs for shared database infrastructure, and proactively reducing error budget consumption through improvements to monitoring, automation, scaling strategies, and system design
  • Improving the diagnosability of core streaming and database systems in production, where possible
  • Implementing solutions that ensure the reliability, scalability, and performance of high-throughput, multi-cloud infrastructure
  • Developing fault-tolerant patterns that account for distributed system realities such as storage latency, partition imbalance, noisy neighbors, and control-plane dependencies
  • Planning and executing safe upgrades and rollouts across dozens of production clusters
  • Collaborating with database and platform engineering leaders to influence architecture, roadmap priorities, and long-term strategy
  • Participating in PR review and contributing to design documents, automation, tooling, and code improvements that reduce operational risk
  • Sharing best practices and distributed systems knowledge with partner teams
  • Participating in incident response, from investigation through resolution and post-incident review (PIR)

Requirements

  • 6+ years of engineering experience, including meaningful time in SRE, platform engineering, production engineering, infrastructure engineering, or distributed systems roles
  • Experience operating distributed systems in production (e.g., streaming systems, analytical databases, large-scale storage backends)

Conditions

  • Remote opportunity, open only to applicants living in the Ireland time zone
  • Remote-first company with guidance and regular video calls
  • An independent attitude and good communication skills are a must
  • Access to modern AI coding assistants (e.g., GPT-Codex 5/3, Claude Opus 4.6, Gemini 3 Pro) with a company-funded usage budget
  • On-call component; the team hires globally so that on-call stays healthy and is aligned to approximately 12 daylight hours per day
  • Regular 1:1s with your manager and close collaboration with teammates across regions