Anthropic, 18.04.2026

Research Engineer, Safeguards Labs

San Francisco

Responsibilities

01. Lead and contribute to research projects investigating new methods for detecting misuse of Claude, identifying malicious organizations and accounts, strengthening model safeguards, and addressing other safety needs
02. Design and run offline analyses over model usage data to surface abuse patterns, build classifiers and detection systems, and evaluate their effectiveness
03. Develop and iterate on prototypes that could eventually feed signals into the real-time safeguards path, partnering with engineers on tech transfer
04. Contribute to a broader research portfolio investigating methods for detecting abusive behavior in chat-based or agentic workflows, and for training the model to robustly refrain from dangerous responses or behaviors without over-refusing
05. Build evaluations and methodologies for measuring whether safeguards actually work, including in agentic settings
06. Write up findings clearly so they inform decisions across Trust & Safety, research, and product teams

Requirements

01. Have a track record of independently driving research projects from ambiguous problem statements to concrete results, ideally in AI, ML, security, integrity, or a related technical field
02. Are comfortable scoping your own work and switching between research, engineering, and analysis as a project demands
03. Have working familiarity with how large language models operate (sampling, prompting, training), even if LLMs aren't your primary background
04. Are proficient in Python and comfortable working with large datasets
05. Care about the societal impacts of AI and want your work to directly reduce real-world harm
06. Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience
07. Required field of study: A field relevant to the role as demonstrated through coursework, training, or professional experience

Compensation and logistics

01. The annual compensation range for this role is listed below. For sales roles, the range provided is the role’s On Target Earnings ("OTE") range, meaning that it includes both the sales commissions/sales bonuses target and the annual base salary for the role. Annual salary: $350,000 to $850,000 USD
02. Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices
03. Visa sponsorship: We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this