Job Description:
• Design, implement, and evaluate algorithms suited to long-horizon, sparse-reward sequential decision-making in healthcare.
• Frame member decisioning problems as Markov Decision Processes (MDPs) or Partially Observable MDPs (POMDPs).
• Manage exploration-exploitation tradeoffs appropriate for a production healthcare environment.
• Build simulation and backtesting environments to evaluate policy or decision quality before production promotion.
• Own the nightly Databricks training workflow involving feature engineering from upstream clinical and operational data sources.
• Apply multi-agent decision-making concepts where coordination at the member-household or population level is required.
Requirements:
• 8+ years of software engineering or quantitative research experience building and operating large-scale production systems, with emphasis on data-intensive platforms, recommendation systems, optimization engines, or simulation frameworks serving millions of users.
• 3+ years of hands-on experience implementing reinforcement learning, operations research methods, or simulation-driven decision systems in production.
• Relevant backgrounds include policy gradient and value-based RL (PPO, A3C, DQN, CQL), stochastic dynamic programming, discrete-event simulation, or large-scale combinatorial or constrained optimization.
• Deep familiarity with Markov Decision Processes, Bellman-equation-based value estimation, reward or objective shaping, exploration-exploitation tradeoffs, and constraint formulation in real-world decision systems.
• Demonstrated ability to diagnose failure modes in learned or optimized policies: instability, poor credit assignment across long horizons, and distributional shift across large populations.
• Proficiency in Python 3.x; experience with PyTorch or TensorFlow for policy network or learned model implementation.
• Experience with Ray RLlib or equivalent distributed computation frameworks for large-scale training or optimization.
• Experience with Databricks, PySpark, and Delta Lake for large-scale ML or data pipelines processing tens of millions of records.
• Experience with MLflow for experiment tracking, model registry, and artifact management.
• Experience shipping systems that operate reliably under production load, not just research or prototype work.
Benefits:
• Medical, dental, and vision benefits
• 401(k) retirement savings plan
• Time off (including paid time off, company and personal holidays, volunteer time off, and paid parental and caregiver leave)
• Short-term and long-term disability
• Life insurance