Anthropic
18.04.2026
Research Engineer, Safeguards Labs
San Francisco
Responsibilities
- Lead and contribute to research projects investigating new methods for detecting misuse of Claude, identifying malicious organizations and accounts, strengthening model safeguards, and addressing other safety needs
- Design and run offline analyses over model usage data to surface abuse patterns, build classifiers and detection systems, and evaluate their effectiveness
- Develop and iterate on prototypes that could eventually feed signals into the real-time safeguards path, partnering with engineers on tech transfer
- Contribute to a broader research portfolio investigating methods for detecting abusive behavior in chat-based or agentic workflows, and for training the model to robustly refrain from dangerous responses or behaviors without over-refusing
- Build evaluations and methodologies for measuring whether safeguards actually work, including in agentic settings
- Write up findings clearly so they inform decisions across Trust & Safety, research, and product teams
Requirements
- You have a track record of independently driving research projects from ambiguous problem statements to concrete results, ideally in AI, ML, security, integrity, or a related technical field
- You are comfortable scoping your own work and switching between research, engineering, and analysis as a project demands
- You have working familiarity with how large language models operate (sampling, prompting, training), even if LLMs aren't your primary background
- You are proficient in Python and comfortable working with large datasets
- You care about the societal impacts of AI and want your work to directly reduce real-world harm
- Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience
- Required field of study: A field relevant to the role as demonstrated through coursework, training, or professional experience
Terms
- The annual compensation range for this role is listed below. For sales roles, the range is the role’s On Target Earnings ("OTE") range, meaning it includes both the sales commission/bonus target and the annual base salary. Annual salary: $350,000–$850,000 USD
- Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.
- Visa sponsorship: We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.