Platform Engineer

Job Description

Job Title: Platform Engineer
Location: Remote
Job Type: Full-time


About Us

Zylon is a cutting-edge tech company dedicated to providing a truly secure AI platform to highly regulated enterprises and organizations for which data privacy is key. To do so, we deploy our platform entirely on-premise, without any third-party dependencies, using a self-contained software package that provides all the necessary layers: GPU and inference management, models (LLM, embeddings, etc.), document ingestion, data pipelines, cache, databases, backend, and APIs.


On top of this Private AI platform, we have built an all-in-one collaborative workspace that helps knowledge workers increase their productivity with AI-powered features such as conversational querying of their knowledge base, summarization, assisted document creation, and report generation. All of this is complemented by enterprise-grade features such as roles and permissions, audit logs, and usage metrics.


As creators and maintainers of the popular open-source project PrivateGPT (54,000 GitHub stars and 6,000 members in our Discord community), we are committed to leveraging the latest AI technologies to drive innovation and deliver exceptional value to both our customers and the open-source community.


As we expand after raising $3.2 million in our Pre-Seed round in late 2023, we’re looking for a talented Platform Engineer to join our team and play a critical role in designing, building, and maintaining the infrastructure that powers our AI systems and client deployments. You will grow with us in a company that celebrates diversity and is committed to creating an inclusive environment for all employees.


What You’ll Do

  • Platform Development & Maintenance: Develop and maintain a reliable, scalable, and efficient platform for deploying AI models and services. Perform capacity planning, disaster recovery, and backup management. Troubleshoot infrastructure-related issues across a variety of systems, from networking to applications.

  • Infrastructure as Code: Design, implement, and manage infrastructure using tools like Terraform, Pulumi, or similar.

  • Monitoring & Observability: Implement robust monitoring, logging, and alerting solutions to ensure system reliability and performance.

  • Cloud & Containerization: Manage cloud resources (AWS, GCP, Azure) and container and orchestration tooling (e.g., Docker, Kubernetes, Helm).

  • Collaboration: Work closely with software engineers, AI engineers, and product teams to ensure smooth deployment and operation of AI models and applications.

  • Security: Implement best practices for security, ensuring systems and data are protected.

  • On-Premise Deployments: Support our clients during their on-premise Zylon deployments and work with them to ensure a smooth onboarding.

What You Bring

  • Experience: 5+ years in platform engineering, SRE, SysOps, DevOps, or related roles, ideally within startups or AI-focused organizations.

  • Technical Skills:

    • Strong programming skills (Python, Go, Bash, or similar).

    • Proficiency in containerization and orchestration tools (Docker, Kubernetes).

    • Proven experience with cloud platforms (AWS, GCP, Azure).

  • Infrastructure: Deep understanding of infrastructure-as-code tools (Terraform, Pulumi, CloudFormation).

  • Bare Metal: Experience with bare metal deployments, high availability, scalability, and security best practices.

  • Networking: Solid understanding of networking, firewalls, load balancers, and DNS management.

  • Monitoring & Observability: Experience with tools like Prometheus, Grafana, Datadog, or similar.

  • Problem-Solving: Excellent analytical and troubleshooting skills, with a proactive mindset.

  • Communication: Strong interpersonal and communication skills to collaborate across teams and with clients.


Bonus Points

  • Knowledge of MLOps workflows or experience deploying AI/ML models in production.

  • Hands-on experience working with GPU clusters and HPC environments.

  • Experience creating VMware images or AWS AMIs.

  • Experience with data pipelines and ETL processes.


Why Join Us?

  • Impactful Work: Be at the forefront of AI innovation and contribute to transformative solutions for real-world challenges.

  • Collaborative Culture: Join a team that values transparency, learning, and mutual respect.

  • Growth Opportunities: Grow with a fast-paced startup, taking on new challenges and responsibilities.

  • Flexibility: Enjoy flexible work arrangements that prioritize work-life balance in a fully remote environment.