Poolside, 18.05.2026

Member of Engineering (Pre-training / Data Acquisition)

Full-time, Remote

Responsibilities

  • Design, build, and operate a large-scale web crawler responsible for acquiring all openly accessible data on the internet
  • Develop specialized deep crawlers targeting high-value sources to improve recall and coverage
  • Own a long-term roadmap for data acquisition in collaboration with data researchers
  • Build observability, monitoring, and debugging tooling to ensure reliability and transparency across crawl infrastructure
  • Collaborate with pre-training, post-training, and evaluations teams to align data acquisition priorities with model training needs
  • Build high-throughput ingestion pipelines for rapidly onboarding partner data and evaluating its quality

Requirements

  • Strong distributed systems background with proven experience building and operating large-scale infrastructure such as data pipelines or web crawlers
  • Proficiency in Python, and comfort with optimizing performance and debugging complex systems under production conditions
  • Hands-on experience with web crawling or large-scale data extraction, including an understanding of HTTP protocols, distributed job queues, and data parsing at scale
  • Familiarity with cloud platforms (AWS) and container orchestration (Kubernetes, Docker) for deploying and managing high-throughput workloads
  • Awareness of the non-technical dimensions of internet-scale crawling: data privacy, robots.txt adherence, and responsible crawl practices

Benefits

  • Fully remote work & flexible hours
  • 37 days/year of vacation & holidays
  • 16 weeks of flexible, full-pay parental leave
  • Health insurance allowance for you & dependents
  • Company-provided equipment
  • Well-being, always-be-learning & home office allowances
  • Frequent team get-togethers
  • Diverse & inclusive people-first culture