Site Reliability Engineer
Headquartered in New York City, Dataiku was founded in Paris in 2013 and achieved unicorn status in 2019. Now, more than 1,000 employees work across the globe in our offices and remotely. Backed by a renowned set of investors and partners including CapitalG, Tiger Global, and ICONIQ Growth, we’ve set out to build the future of AI.
Dataiku is looking for a Site Reliability Engineer (SRE) to join our Cloud team developing and operating the Dataiku managed offering.
The role covers a wide variety of tasks, from deployment to monitoring, with a strong focus on cloud operations. As an SRE, you are responsible for building and operating a reliable, secure, and cost-efficient infrastructure to support the Dataiku SaaS offering.
This role is an opportunity to be part of a project that is central to our company’s vision, with a strong and direct impact on the final outcome. In this role, you will get your hands on the most promising cloud technologies and receive valuable mentorship from experts from our core team.
The position is either remote or based at the company HQ in New York.
Design, deploy and maintain a cloud infrastructure to support the Dataiku SaaS offering
Continuously improve the infrastructure, deployment and configuration to deliver more reliable, resilient, scalable and secure services
Automate all technical operations as much as possible
Troubleshoot cloud infrastructure, systems, network, and application stacks
Set up monitoring, logging and tracing tools to detect and fix potential issues
Hands-on expertise working with Docker and Kubernetes
Strong experience leveraging cloud resources from different providers
Hands-on experience with Infrastructure as Code tools (Terraform, Ansible, Helm)
Knowledge of distributed systems like Hadoop and Spark
Solution-oriented, automation-first mindset