Mid-level GPU Cloud Support Engineer

Posted 8 hours 7 minutes ago by Hays Specialist Recruitment

Permanent
Not Specified
Dorset, Bournemouth, United Kingdom, BH1 1
Job Description

Your new company
I'm excited to partner with a trailblazing company that's revolutionising the future of cloud infrastructure! Their cutting-edge, high-performance, GPU-optimised platform is not only pushing the boundaries of AI and HPC but also making strides towards a greener, more sustainable world.
This is a fully remote position, so you can work from anywhere without ever needing to step into an office. Plus, you'll love the fantastic perk of unlimited holiday, giving you the freedom to recharge and thrive whenever you need it.

Your new role
As a Mid-level GPU Cloud Support Engineer, you'll provide top-notch support to customers on a GPU cloud platform and customer-dedicated GPU clusters. You'll collaborate closely with cross-functional teams, external vendors, and partners to uphold SLA commitments and maintain operational excellence.

Key Responsibilities:

  • Incident Management: Handle support enquiries and investigate complex issues related to storage (e.g., Vast, Weka), networking (e.g., InfiniBand, RoCE), and GPU optimisation.
  • GPU Cloud Support: Resolve issues promptly, adhering to SLAs for critical incidents, including system outages and performance problems.
  • Cluster Monitoring: Perform health checks on multi-node clusters, ensuring optimal node performance, GPU utilisation, and service availability.
  • Documentation: Keep detailed records of incidents, troubleshooting steps, resolutions, and root cause analyses.
  • Collaboration: Work in real time with internal and external stakeholders.
  • User Assistance: Provide best-effort guidance on interactive tools.

What you'll need to succeed

  • Shift Flexibility: Willingness to work on either a -8 or +8 shift pattern.
  • Support Background: 2+ years of experience in IT support, preferably in GPU cloud environments.
  • Linux Skills: Proficiency in Linux system administration from the command line.
  • Scripting and Automation: Skilled in scripting languages (e.g., Bash, Python).
  • Tools and Platforms: Familiarity with ITSM tools (e.g., ServiceNow, Jira Service Management) and monitoring solutions.

What you'll get in return

  • Share options.
  • Unlimited holiday policy.
  • 100% remote working.
  • Fantastic opportunities for career development with a strong internal promotion culture.
  • A collaborative team passionate about working together.
  • Enhanced family-friendly policies.
  • A truly flexible workplace.

What you need to do now
If you're interested in this role, click 'apply now' to forward an up-to-date copy of your CV, or call us now.

Hays Specialist Recruitment Limited acts as an employment agency for permanent recruitment and as an employment business for the supply of temporary workers. By applying for this job you accept the T&Cs, Privacy Policy and Disclaimers, which can be found on our website.