Posted Jun 6, 2026

Software Engineer, Data Processing & Privacy - 26-00382

Additional Notes: Data Privacy and legal environments, working in Python & with Claude, handling/processing PII; Soft skills: attention to detail, reliable, good with reviews/audits.

About the role
• Client is seeking a detail-oriented Software Engineer on a contract basis to build and run data processing pipelines for datasets used in our research. You'll take raw, heterogeneous inputs (text, code, documents, structured exports) and turn them into clean, well-structured, privacy-safe outputs ready for downstream use.
• The work spans ingestion, format normalization, data quality, privacy handling (including PII de-identification), and the supporting tooling that makes the pipeline reliable and self-serve. You'll iterate closely with internal teams on QA findings and harden the pipeline so each new dataset is cheaper than the last.

Responsibilities
• Build and extend per-source processing for new data types as they arrive
• Ingest and normalize raw exports across many formats into consistent, well-structured outputs
• Handle privacy requirements (for example, PII detection and de-identification) to meet our internal compliance bar
• Run data quality QA: automated checks plus LLM-assisted review to flag gaps, malformed inputs, and incompleteness
• Iterate on internal feedback: root-cause issues, fix, re-run, re-deliver
• Build supporting tools: auditing, data exploration, monitoring, simple search over processed data
• Land cleaned data with the right storage layout and access controls
• Document and harden the pipeline so each new dataset is cheaper than the last

You may be a good fit if you
• Have 4+ years of software engineering experience, with substantial time on data pipelines
• Are a proficient user of Claude / Claude Code for day-to-day engineering and know when to verify its output
• Are genuinely detail-oriented
• Have high integrity and take handling real people's personal data seriously
• Are comfortable with sustained, careful data work and find satisfaction in getting it right
• Can work independently, ship reliably, and communicate clearly about progress and edge cases
• Are proficient in Python and comfortable working across many heterogeneous, semi-structured formats (JSON, NDJSON, code, HTML/XML dumps, archives)

Strong candidates may also have experience with
• PII detection and anonymization techniques
• Working with large, messy, semi-structured text and code corpora
• Data quality monitoring and validation
• Cloud storage and access-control patterns (S3/GCS, IAM)
• Building internal tools or self-serve data platforms for researchers
• Information retrieval, search, or RAG systems