Site Reliability Engineer
3 days ago
We are seeking a Site Reliability Engineer (SRE) to ensure the reliability, performance, and security of our distributed systems across hybrid cloud environments (AWS + on-prem). This role focuses on operational excellence, automation, and implementing DevSecOps practices. You will work closely with development teams to improve system resilience, deploy secure infrastructure, and monitor production workloads.
Key Responsibilities
- Operate and maintain Kubernetes clusters (EKS + on-prem), ensuring scalability, service discovery, ingress, and network policy compliance.
- Build and maintain hybrid networking solutions that provide secure, low-latency connectivity between AWS and on-prem environments.
- Implement reliability-focused practices including autoscaling, load balancing, disaster recovery, and fault-tolerant system designs.
- Define and enforce DevSecOps policies, including secrets management, RBAC, Pod Security Standards, and secure container runtimes.
- Build, monitor, and maintain CI/CD pipelines with automated testing gates for security, performance, and reliability.
- Deploy and manage observability stacks (Prometheus, Grafana, Loki, OpenTelemetry), and use them to define SLOs/SLIs, alerts, and dashboards.
- Lead incident response, root cause analysis, postmortems, and continuous reliability improvements.
- Participate in on-call rotations to ensure 24/7 system availability and quick resolution of production issues.
- Optimize cost, performance, and security across AWS services and on-prem resources.
- Harden Linux servers and infrastructure with security best practices (firewalls, SELinux/AppArmor, TLS/mTLS).
- Integrate vulnerability scanning for containers, IaC, and dependencies into operational workflows.
- Collaborate with developers to improve system reliability, operational efficiency, and secure application design.
Your Skills and Experience
Must-Have
- Strong experience operating Kubernetes clusters (EKS + on-prem), container orchestration, and service meshes.
- Expertise in AWS services (VPC, EC2, IAM, CloudWatch, EKS, ElastiCache, MSK).
- Proficiency in Terraform, Ansible, and infrastructure-as-code (IaC) principles.
- Solid knowledge of CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins) with a focus on operational reliability and security.
- Deep understanding of Linux internals, networking (iptables, nftables, routing), and security hardening.
- Strong observability skills: Prometheus, Grafana, Loki, OpenTelemetry.
- Experience supporting high-availability, high-concurrency production systems.
- Experience in incident management, postmortems, and continuous improvement of reliability metrics.
- Willingness to participate in on-call rotations to ensure 24/7 system uptime.
Nice-to-Have
- Experience with Redis and Kafka in production at scale.
- Familiarity with secure networking automation and compliance frameworks.
- Knowledge of DevSecOps practices, Vault, IAM policy enforcement, and vulnerability management.
Qualifications
- Bachelor's degree in Computer Science, Information Systems, or equivalent experience.
- 5+ years in SRE, DevOps, or infrastructure engineering roles.
-
Site Reliability Engineer
6 days ago
Dubai, Dubai, United Arab Emirates
DICETEK LLC · Full time
Job Summary
We are looking for a Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of our production systems. The SRE will work closely with engineering, DevOps, and product teams to build highly available systems, automate operations, and improve system observability while maintaining service level objectives (SLOs).
Key...
-
Site Reliability Engineering Manager
1 week ago
Dubai, Dubai, United Arab Emirates
Styli · Full time
Role: Head of Site Reliability Engineering
Location: Dubai
About Styli Marketplace: Launched in 2019 by Landmark Group, Styli Marketplace is the first eCommerce-only fashion venture of the Group, quickly becoming a leading online destination for fashion and lifestyle across the GCC, including Saudi Arabia, UAE, Kuwait, Bahrain, and beyond. Styli connects global...
-
Site Reliability Engineer
1 week ago
Dubai, Dubai, United Arab Emirates
ManpowerGroup Middle East · Full time
Site Reliability Engineer / AMS Support Engineer - Digital Healthcare
Our client, a leading global healthcare technology company, is looking for an experienced Site Reliability Engineer / Application Management Services (AMS) Support Engineer to join their innovative team. The company is at the forefront of digital health, partnering with healthcare and care...
-
Site Reliability Engineer
1 week ago
Dubai, Dubai, United Arab Emirates
Dicetek LLC · Full time
Job Summary
We are looking for a Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of our production systems. The SRE will work closely with engineering, DevOps, and product teams to build highly available systems, automate operations, and improve system observability while maintaining service level objectives (SLOs).
Key...
-
Head of Site Reliability
1 week ago
Dubai, Dubai, United Arab Emirates
Styli · Full time
Company Description
Launched in 2019 by Landmark Group, Styli is the first eCommerce-only fashion venture of the Group, quickly becoming a leading online destination for fashion and lifestyle across the GCC, including Saudi Arabia, UAE, Kuwait, Bahrain, and beyond. Styli connects global sellers and creators with millions of fashion-forward customers, offering...
-
Site Reliability Engineer
6 days ago
Dubai, Dubai, United Arab Emirates
Vend Tech Group · Full time
Job Title: Payments SRE (Site Reliability Engineer)
Department: Service Line
Location: Dubai
Reporting to: Test Architect
We're supporting the hiring of a Payments SRE to own the stability, performance, and cost optimisation of our Payments Testing-as-a-Service (TaaS) platform. You'll manage the cloud-hosted Iliad T3 instances for assigned regions, ensuring service...
-
Site Reliability Engineering Lead
4 days ago
Dubai, Dubai, United Arab Emirates
Edison Smart® · Full time
SRE Leader JD
What you'll do
Strategy and Governance
- Formulate and implement the company-level reliability strategy and SLO/error budgeting mechanism, and establish a reliability measurement system centered on business impact.
- Establish release and change governance (access control, canary, rollback, freeze window), and promote the quantification and...
-
Reliability Engineer
1 week ago
Dubai, Dubai, United Arab Emirates
DUNCAN AND ROSS MANAGEMENT CONSULTANCIES · Full time
Reliability Engineer (STP) Responsibilities
Job Purpose
- Asset Optimization: Develop and enhance the reliability and performance of critical Sewage Treatment Plant (STP) assets.
- Operational Efficiency: Minimize unplanned downtime, improve equipment availability, and optimize overall lifecycle performance.
- Data-Driven Strategy: Support safe, compliant, and...
-
Reliability Engineer
2 weeks ago
Dubai, Dubai, United Arab Emirates
Duncan & Ross · Full time
Purpose of the Job:
- Develop and enhance the reliability and asset performance of critical Sewage Treatment Plant (STP) assets.
- Minimize unplanned downtime, improve equipment availability, and optimize lifecycle performance.
- Support safe, compliant, and cost-effective plant operations through data-driven reliability engineering.
Job Responsibilities:
- Develop,...
-
Site Engineer
2 weeks ago
Dubai, Dubai, United Arab Emirates
Al Manar Metal Doors & Windows Fixing · Full time
Job Description:
We are hiring a qualified Site Engineer with strong experience in Aluminium Systems, ACP Cladding, and Façade Works. The ideal candidate must have 3–5 years of relevant experience in the Gulf and hold a valid UAE Driving License. This role requires excellent technical skills and the ability to manage and supervise site activities...