Site Reliability Engineer
4 days ago
We are seeking a Site Reliability Engineer to support the reliability engineering program across multiple data centers in our fleet. Reporting to the Reliability Manager, you will be responsible for monitoring system performance, driving preventative and predictive maintenance initiatives, leading root cause analysis efforts, and collaborating with cross-functional teams to minimize downtime and enhance infrastructure resilience.
Key Accountabilities:
Monitor real-time and historical performance metrics for critical power, cooling, and IT systems.
Analyse system data to identify trends, failure modes, and reliability risks.
Execute Root Cause Analyses (RCA) and Failure Mode & Effects Analyses (FMEA), then drive corrective and preventive actions.
Develop and maintain condition-based and predictive maintenance routines, leveraging IoT, data analytics, and machine learning tools.
Support preventive maintenance programs: schedule, document, and validate maintenance activities.
Assist in asset lifecycle planning, including upgrades, decommissioning, and end-of-life strategies.
Contribute to capacity runway assessments to forecast infrastructure needs.
Implement and enforce availability management plans, risk assessments, and mitigation strategies.
Ensure data collection and reporting processes for reliability KPIs (e.g., MTBF, MTTR, availability) are standardized and accurate.
Prepare reliability reports and dashboards; present findings and recommendations to site leadership.
Respond to and lead failure-response efforts during site incidents, ensuring rapid recovery and root-cause follow-through.
Maintain compliance with industry standards and regulations (Uptime Institute, ISO, ASHRAE).
Collaborate with Operations, Engineering, Facilities, and Vendors to integrate reliability best practices into day-to-day workflows.
Propose continuous-improvement initiatives and pilot emerging reliability technologies.
The job holder may be required to undertake additional duties that can reasonably be expected and that form part of the function of the job.
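For readers less familiar with the KPIs and FMEA scoring named above, here is a minimal Python sketch. Python is one of the scripting languages listed under Minimum Experience. The sketch assumes a hypothetical `Incident` record that holds each failure's downtime in hours. It computes MTBF as uptime divided by the number of failures, MTTR as total downtime divided by the number of failures, and availability as MTBF / (MTBF + MTTR). It also computes the classic FMEA Risk Priority Number, RPN = severity × occurrence × detection, assuming the common 1 to 10 rating scales. The names `Incident`, `reliability_kpis`, and `rpn` are illustrative only. They are not part of any tool referenced by the role.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """Hypothetical record of one failure on a repairable asset."""
    asset: str
    downtime_hours: float  # time from failure until the asset was restored


def reliability_kpis(incidents: list[Incident], observation_hours: float) -> dict[str, float]:
    """Compute MTBF, MTTR (both in hours) and availability for one reporting period.

    observation_hours is the total time the asset was expected to be in service.
    """
    if observation_hours <= 0:
        raise ValueError("observation window must be positive")
    failures = len(incidents)
    if failures == 0:
        # No failures in the window: MTBF cannot be estimated, so report it as unbounded.
        return {"mtbf": float("inf"), "mttr": 0.0, "availability": 1.0}
    total_downtime = sum(i.downtime_hours for i in incidents)
    if total_downtime > observation_hours:
        raise ValueError("downtime cannot exceed the observation window")
    uptime = observation_hours - total_downtime
    mtbf = uptime / failures
    mttr = total_downtime / failures
    # A = MTBF / (MTBF + MTTR); with these definitions it equals uptime / observation_hours.
    availability = mtbf / (mtbf + mttr)
    return {"mtbf": mtbf, "mttr": mttr, "availability": availability}


def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Classic FMEA Risk Priority Number, assuming 1-10 rating scales."""
    for score in (severity, occurrence, detection):
        if not 1 <= score <= 10:
            raise ValueError("FMEA ratings are expected on a 1-10 scale")
    return severity * occurrence * detection


if __name__ == "__main__":
    # Illustrative data only: three failures of one hypothetical cooling unit over an 8,760-hour year.
    year = [Incident("CRAH-01", 2.0), Incident("CRAH-01", 4.0), Incident("CRAH-01", 6.0)]
    print(reliability_kpis(year, 8760.0))
    print(rpn(severity=8, occurrence=4, detection=3))
```

Take three hypothetical outages of 2, 4 and 6 hours over an 8,760-hour year. The sketch gives an MTBF of 2,916 hours, an MTTR of 4 hours and an availability of about 99.86%. The example failure mode scores an RPN of 96.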
Minimum Qualifications:
Bachelor's degree in Mechanical, Electrical, or Reliability Engineering, or a related engineering discipline.
Minimum Experience:
3+ years of experience in reliability engineering, maintenance engineering, or a data center operations environment.
Hands-on experience with RCA, FMEA, and predictive maintenance methodologies.
Proficiency with monitoring platforms, data-analytics tools, and scripting (e.g., Python, R).
Familiarity with IoT sensors, machine-learning frameworks, and condition-based monitoring systems.
Knowledge of industry reliability standards and regulations (ISO, ASHRAE, Uptime Institute).
Job-Specific Skills (Generic / Technical):
Strong analytical and problem-solving skills, with acute attention to detail.
Effective communicator, able to present technical findings to diverse audiences.
Project coordination skills and the ability to manage multiple reliability initiatives.
Collaborative mindset, comfortable working in cross-functional teams.
Self-starter with a continuous-improvement attitude and commitment to resilience.
-
Site Reliability Engineer
1 week ago
Dubai, Dubai, United Arab Emirates | IGT Solutions | Full time
Job Title: Site Reliability Engineer
Location: Dubai, UAE
Experience: 8+ years
Domain: Airlines / Airports / Travel Technology
Role Overview
As a Site Reliability Engineer, you will play a pivotal role in driving DevOps and automation practices for products and programmes. You will design, implement, and optimise CI/CD pipelines, cloud infrastructure, and...
-
Site Reliability Engineer
2 weeks ago
Dubai, Dubai, United Arab Emirates | IGT Solutions | Full time
Job Title: Site Reliability Engineer
Location: Dubai, UAE
Experience: 8+ years
Domain: Airlines / Airports / Travel Technology
Role Overview
As a Site Reliability Engineer, you will play a pivotal role in driving DevOps and automation practices for products and programmes. You will design, implement, and optimise CI/CD pipelines, cloud infrastructure, and automation...
-
Site Reliability Engineer
2 weeks ago
Dubai, Dubai, United Arab Emirates | SECUWALL | Full time
We are seeking a Site Reliability Engineer (SRE) to ensure the reliability, performance, and security of our distributed systems across hybrid cloud environments (AWS + on-prem). This role focuses on operational excellence, automation, and implementing DevSecOps practices. You will work closely with development teams to improve system resilience, deploy...
-
Site Reliability Engineer
2 weeks ago
Dubai, Dubai, United Arab Emirates | DICETEK LLC | Full time
Job Summary
We are looking for a Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of our production systems. The SRE will work closely with engineering, DevOps, and product teams to build highly available systems, automate operations, and improve system observability while maintaining service level objectives (SLOs).
Key...
-
Site Reliability Engineer
1 week ago
Dubai, Dubai, United Arab Emirates | ManpowerGroup Middle East | Full time
Site Reliability Engineer / AMS Support Engineer - Digital Healthcare
Our client, a leading global healthcare technology company, is looking for an experienced Site Reliability Engineer / Application Management Services (AMS) Support Engineer to join their innovative team. The company is at the forefront of digital health, partnering with healthcare and care...
-
Site Reliability Engineer
1 week ago
Dubai, Dubai, United Arab Emirates | MultiBank Group | Full time
Welcome to MultiBank Group, a global financial pioneer established in 2005 in California and now proudly headquartered in Dubai, UAE. We specialize in delivering cutting-edge trading technology, unparalleled liquidity, and exceptional customer service. Our extensive range of financial products includes Forex, Metals, Shares, Indices, Commodities, and...
-
Site Reliability Engineering Manager
2 weeks ago
Dubai, Dubai, United Arab Emirates | Styli | Full time
Role: Head of Site Reliability Engineering
Location: Dubai
About Styli Marketplace:
Launched in 2019 by Landmark Group, Styli Marketplace is the first eCommerce-only fashion venture of the Group, quickly becoming a leading online destination for fashion and lifestyle across the GCC, including Saudi Arabia, UAE, Kuwait, Bahrain, and beyond. Styli connects global...
-
Site Reliability Engineer
2 weeks ago
Dubai, Dubai, United Arab Emirates | ManpowerGroup Middle East | Full time
Site Reliability Engineer / AMS Support Engineer - Digital Healthcare
Our client, a leading global healthcare technology company, is looking for an experienced Site Reliability Engineer / Application Management Services (AMS) Support Engineer to join their innovative team. The company is at the forefront of digital health, partnering with healthcare and care...
-
Site Reliability Engineer
2 weeks ago
Dubai, Dubai, United Arab Emirates | MultiBank Group | Full time
Welcome to MultiBank Group, a global financial pioneer established in 2005 in California and now proudly headquartered in Dubai, UAE. We specialize in delivering cutting-edge trading technology, unparalleled liquidity, and exceptional customer service. Our extensive range of financial products includes Forex, Metals, Shares, Indices, Commodities, and...