Grafana Labs | 27.03.2026

Staff AI Engineer | Canada | Remote

Canada (Remote)

Responsibilities

  • Own end-to-end development of multi-agent AI systems, from architecture and implementation through testing, deployment, and ongoing operation
  • Build modular, composable agentic systems using orchestration frameworks (LangChain, CrewAI, Anthropic MCP, or similar) that operate 24/7 across teams
  • Develop reusable agentic skills that agents invoke across interfaces (Slack, dashboards, internal apps, CLIs)
  • Implement observability and feedback loops including logging, performance metrics, prompt iteration, model evaluation, and cost management
  • Establish governance and compliance standards for AI workflows including access controls, audit trails, PII handling, and human-in-the-loop escalation paths
  • Build MCP servers, APIs, CLIs, and microservices connecting AI models to business systems (BigQuery, Slack, CRMs, email, calendars, analytics tools)
  • Architect data flows for retrieval-augmented generation (RAG), connecting LLMs to internal knowledge bases, customer data, and real-time business context
  • Build serverless or containerized services (GCP Cloud Functions, Cloud Run) that scale with usage and integrate with Grafana's cloud infrastructure
  • Partner with RevOps, Demand Generation, Regional Marketing, and SDR teams to scope high-impact automation problems, identify bottlenecks, and build solutions with measurable business outcomes
  • Design and deploy workflows using orchestration tools (n8n, Workato, or custom platforms) with CI/CD, testing, and production reliability standards
  • Build systems designed for self-service with documentation, playbooks, and enablement materials that let partner teams operate independently

Requirements

  • 8+ years of software engineering experience with depth in backend development, systems integration, or data/analytics engineering
  • 2+ years of hands-on experience applying LLMs/AI to production workflows, not just prototypes
  • Strong proficiency in Python and JavaScript/Node.js with Git-based workflows, code review practices, and testing discipline
  • Hands-on experience with LLM frameworks and patterns including prompt engineering, RAG, function calling/tool use, structured output parsing, and evaluation
  • Experience building and operating multi-agent systems at scale including agent decomposition, orchestration patterns (sequential chains, router/dispatcher, parallel fan-out), state management, and production monitoring
  • You diagnose business problems before writing code. You think in workflows and outcomes, not just functions
  • Deep familiarity with Google Cloud Platform, BigQuery, and serverless/containerized services (Cloud Functions, Cloud Run)
  • Understanding of LLM failure modes and production mitigations including confidence thresholds, fallback logic, human escalation, and cost/latency management
  • Proven ability to identify high-leverage problems, push back on low-impact requests, and deliver end-to-end with minimal direction
  • Fluent with AI-assisted development tools (GitHub Copilot, Cursor, Claude Code). You use AI to build AI systems
  • Clear technical communicator: you can explain complex systems in simple terms to both engineers and business stakeholders

Conditions

  • This is a remote opportunity
  • At this time, we are only considering applicants based in Canadian time zones
  • We invest heavily in developer productivity
  • You'll have access to AI coding assistants (Claude Code, Gemini CLI, OpenAI Codex, and others of your choice within security guidelines)