Site Reliability Engineer

4 days ago


Dubai, Dubai, United Arab Emirates Khazna Data Centers Full time
Khazna was founded in 2012 and has grown rapidly to become the leading and most trusted wholesale data center provider in the Middle East and North Africa region. Through our data centers, we provide industry-benchmark levels of power supply and cooling services to serve the growing need for data center operations in the UAE and the wider region.

We are seeking a Site Reliability Engineer to support the reliability engineering program across multiple data centers in our fleet. Reporting to the Reliability Manager, you will be responsible for monitoring system performance, driving preventative and predictive maintenance initiatives, leading root cause analysis efforts, and collaborating with cross-functional teams to minimize downtime and enhance infrastructure resilience.

Key Accountabilities:

Monitor real-time and historical performance metrics for critical power, cooling, and IT systems.
Analyse system data to identify trends, failure modes, and reliability risks.
Execute Root Cause Analyses (RCA) and Failure Mode & Effects Analyses (FMEA), then drive corrective and preventive actions.
Develop and maintain condition-based and predictive maintenance routines, leveraging IoT, data analytics, and machine learning tools.
Support preventive maintenance programs: schedule, document, and validate maintenance activities.
Assist in asset lifecycle planning, including upgrades, decommissioning, and end-of-life strategies.
Contribute to capacity runway assessments to forecast infrastructure needs.
Implement and enforce availability management plans, risk assessments, and mitigation strategies.
Ensure data collection and reporting processes for reliability KPIs (e.g., MTBF, MTTR, availability) are standardized and accurate.
Prepare reliability reports and dashboards; present findings and recommendations to site leadership.
Respond to and lead failure-response efforts during site incidents, ensuring rapid recovery and root-cause follow-through.
Maintain compliance with industry standards and regulations (Uptime Institute, ISO, ASHRAE).
Collaborate with Operations, Engineering, Facilities, and Vendors to integrate reliability best practices into day-to-day workflows.
Propose continuous-improvement initiatives and pilot emerging reliability technologies.
The job holder may be required to undertake additional duties that may reasonably be expected and that form part of the function of the job.

Minimum Qualifications:

Bachelor's degree in Mechanical, Electrical, Reliability, or a related Engineering discipline.

Minimum Experience:

3+ years of experience in reliability engineering, maintenance engineering, or a data center operations environment.
Hands-on experience with RCA, FMEA, and predictive maintenance methodologies.
Proficiency with monitoring platforms, data-analytics tools, and scripting (e.g., Python, R).
Familiarity with IoT sensors, machine-learning frameworks, and condition-based monitoring systems.
Knowledge of industry reliability standards and regulations (ISO, ASHRAE, Uptime Institute).

Job-Specific Skills (Generic / Technical):

Strong analytical and problem-solving skills, with acute attention to detail.
Effective communicator, able to present technical findings to diverse audiences.
Project coordination skills and the ability to manage multiple reliability initiatives.
Collaborative mindset, comfortable working in cross-functional teams.
Self-starter with a continuous-improvement attitude and commitment to resilience.

  • Dubai, Dubai, United Arab Emirates IGT Solutions Full time

    Job Title: Site Reliability Engineer. Location: Dubai, UAE. Experience: 8+ years. Domain: Airlines / Airports / Travel Technology. Role Overview: As a Site Reliability Engineer, you will play a pivotal role in driving DevOps and automation practices for products and programmes. You will design, implement, and optimise CI/CD pipelines, cloud infrastructure, and...


  • Dubai, Dubai, United Arab Emirates SECUWALL Full time

    We are seeking a Site Reliability Engineer (SRE) to ensure the reliability, performance, and security of our distributed systems across hybrid cloud environments (AWS + on-prem). This role focuses on operational excellence, automation, and implementing DevSecOps practices. You will work closely with development teams to improve system resilience, deploy...


  • Dubai, Dubai, United Arab Emirates DICETEK LLC Full time

    Job Summary: We are looking for a Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of our production systems. The SRE will work closely with engineering, DevOps, and product teams to build highly available systems, automate operations, and improve system observability while maintaining service level objectives (SLOs). Key...


  • Dubai, Dubai, United Arab Emirates ManpowerGroup Middle East Full time

    Site Reliability Engineer / AMS Support Engineer - Digital Healthcare. Our client, a leading global healthcare technology company, is looking for an experienced Site Reliability Engineer / Application Management Services (AMS) Support Engineer to join their innovative team. The company is at the forefront of digital health, partnering with healthcare and care...


  • Dubai, Dubai, United Arab Emirates MultiBank Group Full time

    Welcome to MultiBank Group, a global financial pioneer established in 2005 in California and now proudly headquartered in Dubai, UAE. We specialize in delivering cutting-edge trading technology, unparalleled liquidity, and exceptional customer service. Our extensive range of financial products includes Forex, Metals, Shares, Indices, Commodities, and...


  • Dubai, Dubai, United Arab Emirates Styli Full time

    Role: Head of Site Reliability Engineering. Location: Dubai. About Styli Marketplace: Launched in 2019 by Landmark Group, Styli Marketplace is the first eCommerce-only fashion venture of the Group, quickly becoming a leading online destination for fashion and lifestyle across the GCC, including Saudi Arabia, UAE, Kuwait, Bahrain, and beyond. Styli connects global...

