Company Overview
Finding quality mental health care can be a challenge for those in crisis. At Charlie Health, we are committed to providing comprehensive, effective, and accessible solutions. Our treatment programs integrate personalized care with peer support, creating a nurturing environment that promotes lasting healing and recovery.
Responsibilities
- Design, build, and maintain AWS cloud infrastructure using Terragrunt and OpenTofu/Terraform.
- Develop and optimize CI/CD pipelines in GitHub Actions to streamline deployments and enhance developer efficiency.
- Automate infrastructure provisioning, monitoring, and scaling using Bash, Golang, and Python.
- Implement security best practices for IAM and networking across infrastructure and pipelines, and ensure compliance with HIPAA regulations.
- Collaborate with product engineering teams to support services requiring new platform capabilities.
- Enhance system reliability, scalability, observability, security, and cost-efficiency across deployed services.
- Break down complex initiatives into clear, achievable milestones to deliver incremental value.
- Mentor and support engineers while continuously improving architecture and processes.
- Boost developer productivity by identifying and abstracting common functionality based on developer feedback.
- Participate in on-call rotations to ensure system stability and rapid incident resolution.
Requirements
- 4+ years of experience in DevOps, Platform, SRE, or Cloud Engineering roles.
- 3+ years of expertise with AWS services (e.g., ECS Fargate, Lambda, S3, Aurora, SQS, EventBridge, EC2, ECR).
- Strong proficiency in Terraform/OpenTofu for infrastructure-as-code.
- Hands-on experience in Linux system administration, networking, and security best practices.
- Proficiency in Bash, Python, or Golang for automation.
- Experience with CI/CD tools such as GitHub Actions, GitLab CI, or Jenkins.
- Solid understanding of containerization (Docker) and orchestration tools.
- Familiarity with observability and monitoring tools such as Datadog, Honeycomb, Prometheus, Grafana, Loki, Sumo Logic, and Opsgenie.
- Proven ability to break down large-scale projects and deliver incremental improvements while balancing long-term scalability.
- Passion for automation, security, scalability, and cost-effective, self-healing infrastructure.
- Experience handling security incidents, system outages, and performance degradations, including root cause analysis and post-mortems.
- Prior experience in an on-call rotation, ensuring timely resolution of critical infrastructure and application issues.
- Strong problem-solving, communication, and collaboration skills, with the ability to thrive in a fast-paced startup environment.
Note: Team members living within 45 minutes of a Charlie Health office are expected to follow a hybrid work schedule.
Preferred Qualifications
- Experience leading large cloud migrations or transitioning from monolithic to microservices architectures.
- Hands-on experience with MLOps and Amazon SageMaker.
- Knowledge of backend development in Python, TypeScript, or Golang.
- Familiarity with AI-powered developer productivity tools.
- Background as a Software Development Engineer in Test (SDET) or in Quality Assurance (QA).