Senior Site Reliability Engineer

7 days ago


United Arab Emirates Xenon7 Full time
About us:

Where elite tech talent meets world-class opportunities

At Xenon7, we work with leading enterprises and innovative startups on exciting, cutting-edge projects that leverage the latest technologies across various domains of IT including Data, Web, Infrastructure, AI, and many others. Our expertise in IT solutions development and on-demand resources allows us to partner with clients on transformative initiatives, driving innovation and business growth. Whether it's empowering global organizations or collaborating with trailblazing startups, we are committed to delivering advanced, impactful solutions that meet today's most complex challenges.

About the Client:

Join one of Egypt's premier financial institutions, renowned for its extensive suite of banking services, including Institutional Banking, Personal Banking, and Islamic Banking. With a global presence through over 50 branches and correspondents, we serve a diverse and dynamic clientele. As we embark on a groundbreaking digital transformation journey, we are committed to leveraging the latest technologies to establish a state-of-the-art data architecture that will redefine our performance and service delivery.

Position Overview

The Senior Site Reliability Engineer is a technical leadership role responsible for designing, implementing, and maintaining highly available, scalable, and secure on-premise infrastructure for critical banking applications, including the Mobile Banking and Internet Banking platforms. The role leads SRE initiatives, mentors junior engineers, drives continuous improvement in production support, and directs the observability strategy using OpenShift, Kubernetes, Prometheus, Grafana, and the ELK Stack in the on-premise data center.

Key Responsibilities

·       Design and architect highly available, scalable OpenShift/Kubernetes infrastructure on on-premise servers for banking applications

·       Lead and implement comprehensive monitoring and observability strategy using Prometheus and Grafana

·       Design and oversee centralized logging infrastructure using ELK Stack (Elasticsearch, Logstash, Kibana)

·       Lead SRE best practices implementation and adoption of production support standards across teams

·       Mentor and coach junior SRE and DevOps engineers on OpenShift, Kubernetes, monitoring, and production support

·       Define and implement Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs) with measurable metrics

·       Lead incident response strategy and post-incident reviews, and drive continuous improvement in production stability

·       Architect and implement advanced alerting, monitoring dashboards, and visualization strategies using Prometheus and Grafana

·       Design automation frameworks and tools to reduce operational toil and improve production efficiency

·       Lead OpenShift/Kubernetes cluster upgrades, security patches, and infrastructure modernization on-premise

·       Establish production support procedures, on-call rotation policies, and escalation frameworks

·       Optimize system performance, cost, and resource utilization across containerized on-premise infrastructure

·       Conduct capacity planning, performance optimization, and infrastructure scaling initiatives

·       Lead technical architecture reviews and infrastructure design decisions for banking applications

·       Manage on-premise data center resources and infrastructure planning

·       Participate in 24/7 on-call rotation and escalation for critical production incidents

·       Ensure compliance, security hardening, and disaster recovery procedures for financial systems

Qualifications

·       BSc in Computer Science, Information Technology, Software Engineering, or related field

· years of hands-on SRE, DevOps, or Production Engineering experience

· years of experience leading SRE teams or managing production support operations

· years of hands-on experience managing OpenShift and Kubernetes on on-premise infrastructure

·       Expert-level experience with Prometheus for monitoring and alerting in production

·       Expert-level experience with Grafana for creating comprehensive monitoring dashboards

·       Advanced experience with ELK Stack (Elasticsearch, Logstash, Kibana) for logging and log analysis

·       Proven experience designing and scaling production systems for high-traffic banking applications

·       Deep expertise in Linux/Unix system administration and container networking

·       Advanced knowledge of CI/CD automation and deployment strategies

·       Hands-on experience with database management, tuning, and optimization on-premises

·       Strong experience with infrastructure automation and Infrastructure as Code

·       Proven 24/7 production support experience in mission-critical environments

·       Experience managing on-premise data center infrastructure

·       Proven leadership skills and ability to mentor junior engineers

·       Excellent communication skills and ability to present to executive stakeholders

·       Experience in financial services or banking sector is highly preferred


