Sr. Infrastructure Reliability Engineer, Infrastructure Reliability & Quality

1 week ago


Abu Dhabi, Abu Dhabi, United Arab Emirates Amazon Full time

Job ID: | Amazon Asia-Pacific Resources Private Limited (Singapore)

AWS Infrastructure Services owns the design, planning, delivery, and operation of all AWS global infrastructure. In other words, we're the people who keep the cloud running. We support all AWS data centers and all of the servers, storage, networking, power, and cooling equipment that ensure our customers have continual access to the innovation they rely on. We work on the most challenging problems, with thousands of variables impacting the supply chain — and we're looking for talented people who want to help.

You'll join a diverse team of software, hardware, and network engineers, supply chain specialists, security experts, operations managers, and other vital roles. You'll collaborate with people across AWS to help us deliver the highest standards for safety and security while providing seemingly infinite capacity at the lowest possible cost for our customers. And you'll experience an inclusive culture that welcomes bold ideas and empowers you to own them to completion.

Our Amazon Web Services Infrastructure Reliability & Quality (IRQ) engineering team provides engineering support for our data center infrastructure equipment (air handling units, switchgear, breakers, panel boards, UPS, transformers, generators, ATS, etc.) as well as infrastructure security equipment (cameras, access control, etc.) throughout its lifecycle. A critical part of this support is ensuring the highest level of initial quality and ongoing support from our suppliers.

Our Senior Reliability Engineers have experience using a physics-of-failure approach to develop and apply both analytical and empirical methods for identifying and assessing product quality and reliability risks during the design, manufacturing, and deployment stages. They drive AWS application-specific requirements into lifecycle environmental and operational stress-driven risk analysis, covering thermal, electrical, chemical, and mechanical stresses, to identify overstress and fatigue-related product weaknesses. They also evaluate product design quality and reliability risks and assess quality and reliability issues related to electronics manufacturing processes.

They lead critical component identification and define the associated vendor selection and qualification requirements. Drawing on their knowledge of process capability in electronic component production and of system-level performance requirements, they establish critical-to-quality and critical-to-reliability metrics. They also develop data center system-level reliability models and carry out the related reliability quantification and risk analysis to optimize data center configurations.

During the sustaining stage, you will monitor product performance in the field and drive root cause analysis of any critical failures, along with the associated corrective and preventive actions. You will also run effective vendor audits and a quarterly review process to drive continuous improvement in data center availability.

As a subject matter expert (SME) in reliability engineering and product reliability leadership, with strengths in business negotiation and program management, you will analyze and solve problems and communicate with vendors.

In this role, you will be required to travel within APAC and internationally.

Key job responsibilities

1. Proactively drive reliability risk identification, assessment, and mitigation for critical data center infrastructure equipment
2. Conduct comprehensive root cause analysis of any critical equipment failures in the field
3. Collaborate cross-functionally with internal and external partners to influence product specification, design, and reliability qualification
4. Develop and maintain data center infrastructure reliability models and quantify reliability risks
5. Monitor field performance and drive ongoing reliability improvements
6. Serve as a subject matter expert and provide technical leadership on reliability engineering best practices

BASIC QUALIFICATIONS

1. Bachelor's or Master's degree in Reliability Engineering, Physics, Electrical, Mechanical or Materials Engineering, or a related field
2. 8+ years of Reliability Engineering work experience in a high-reliability industry
3. 5+ years of experience with failure analysis activities and root cause analysis
4. 5+ years of experience with accelerated life testing, stress analysis and finite element analysis

PREFERRED QUALIFICATIONS

1. Ph.D. in Reliability Engineering, Physics, Electrical, Mechanical or Materials Engineering or a related field.
2. 10+ years of work experience in reliability risk identification and assessment from component to system level, applying analytical, experimental and statistical approaches to evaluate product design and manufacturing quality/reliability levels.
3. Experience applying proactive, effective and cost-effective reliability approaches throughout the product design, manufacturing and deployment stages
4. Proven experience in working with external design and manufacturing supply chain partners.
5. Familiarity with the reliability performance of major data center infrastructure equipment
6. Ability to manage multiple qualification activities and development schedules.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner.

Amazon is an Equal Opportunity Employer – Minority / Women / Disability / Veteran / Gender Identity / Sexual Orientation / Age.

  • Abu Dhabi, Abu Dhabi, United Arab Emirates Amazon Full time

    Sr. Infrastructure Reliability Engineer, Infrastructure Reliability & Quality. Job ID: 2804990 | Amazon Asia-Pacific Resources Private Limited (Singapore). AWS Infrastructure Services owns the design, planning, delivery, and operation of all AWS global infrastructure. In other words, we're the people who keep the cloud running. We support all AWS data centers...


  • Abu Dhabi, Abu Dhabi, United Arab Emirates ARMS Reliability Full time

    Job Responsibilities: We are seeking a highly organized and efficient Reliability Resource Planner to join our team at ARMS Reliability. The ideal candidate will have excellent communication skills and be able to schedule, plan, and communicate with technicians and relevant staff members. Scheduling and Planning: Coordinate the deployment of technicians to...


  • Abu Dhabi, Abu Dhabi, United Arab Emirates Discovered MENA Full time

    Site Reliability Engineer (SRE). Location: Dubai. Duration: Permanent. We're currently partnered with a leading technology consultancy who are scaling their tech team. They offer a diverse work environment and provide services in the UAE impacting millions of lives. We're currently helping them search for a Site Reliability Engineer to join their ever growing...


  • Abu Dhabi, Abu Dhabi, United Arab Emirates Energy Infrastructure Partners (EIP) Full time

    AECOM is seeking a seasoned Infrastructure Project Manager to oversee the quality assurance of our infrastructure projects in the UAE. As a key member of our team, you will be responsible for ensuring that all project activities conform to high-quality standards and safety requirements. Key responsibilities include: Maintaining accurate records of site...


  • Abu Dhabi, Abu Dhabi, United Arab Emirates Edge Group Full time

    Job Overview: The Edge Group is seeking a seasoned Sr. Infrastructure Engineer to join our Cyber Defense CMS Team as a top-level Systems Subject Matter Expert (SME) responsible for ensuring the continuity, security, and reliability of infrastructure and services for internal and external stakeholders. This role will focus on technical details across the...


  • Abu Dhabi, Abu Dhabi, United Arab Emirates Astra Tech Full time

    About the Role: As a Site Reliability Engineer, you will be responsible for driving improvements in operational processes through automation and proactive incident resolution. Key responsibilities include automating routine operational tasks using Shell scripting, maintaining and optimizing middleware components, administering and optimizing Kubernetes...


  • Abu Dhabi, Abu Dhabi, United Arab Emirates Amazon Full time

    Company Overview: AWS Infrastructure Services is responsible for designing, planning, delivering, and operating all AWS global infrastructure. Our team supports all AWS data centers and the servers, storage, networking, power, and cooling equipment that ensure our customers have continual access to innovative services.


  • Abu Dhabi, Abu Dhabi, United Arab Emirates Astra Tech Full time

    Site Reliability Engineer (SRE). Established in 2022, Astra Tech has rapidly expanded its influence by strategically acquiring and developing key platforms such as PayBy, Rizek, Quantix, and Botim. These acquisitions have culminated in the creation of the world's first Ultra App, Botim, which seamlessly integrates fintech, e-commerce, AI-powered tech...


  • Abu Dhabi, Abu Dhabi, United Arab Emirates AI71 Full time

    We are seeking a highly skilled DevOps/Site Reliability Engineer (SRE) to join our team at AI71, an applied research team dedicated to creating helpful and responsible AI agents for knowledge workers. Working closely with industry partners, cross-functional teams of AI experts build products grounded in cutting-edge research. The ideal candidate will have a...


  • Abu Dhabi, Abu Dhabi, United Arab Emirates AI71 Full time

    Senior Site Reliability Engineer. AI71 is an applied research team dedicated to creating helpful and responsible AI agents for knowledge workers. Working closely with our industry partners, our cross-functional teams of AI experts build products grounded in the cutting-edge research of our colleagues from the Technology Innovation Institute (TII). We are...


  • Abu Dhabi, Abu Dhabi, United Arab Emirates STAR SERVICES LLC Full time

    STAR SERVICES LLC is seeking an experienced IT Infrastructure Engineer to join our team. In this role, you will be responsible for designing, implementing, and maintaining our IT infrastructure, including Microsoft systems such as Windows Server, Active Directory, Office 365, and Azure. We are looking for a highly skilled individual with at least 3 years of...

  • Data Engineer

    1 week ago


    Abu Dhabi, Abu Dhabi, United Arab Emirates Presight AI Ltd. Full time

    About the Role: We are seeking a meticulous and expert Senior Engineer - Site Reliability to build and support the Presight delivery model that empowers product & technology teams to develop & deliver high-quality products, improve platform infrastructure and strengthen the reliability of products and solutions. You play a key role in defining & establishing...


  • Abu Dhabi, Abu Dhabi, United Arab Emirates Presight Full time

    About Presight: We are a leading big data analytics company powered by Artificial Intelligence (AI). Our computer vision, AI and omni-analytics platform empowers us to leverage all-source data and support insight-driven decision making that shapes policy and creates safer, healthier, happier, and more sustainable societies. As a Senior Engineer – Site...

  • Reliability Engineer

    3 weeks ago


    Abu Dhabi, Abu Dhabi, United Arab Emirates Bilfinger Middle East Full time

    Reliability Engineer. Employment Type: Full-time. Department: Engineering and Technologies. Reports To: Project / Study Manager. Role Overview: We are seeking a highly skilled and proactive...


  • Abu Dhabi, Abu Dhabi, United Arab Emirates Amazon Full time

    Job Description: We are seeking a highly skilled Data Center Infrastructure Quality Engineer to join our team. As a member of our team, you will work on complex problems, collaborating with cross-functional teams to drive reliability risk identification, assessment, and mitigation for critical data center infrastructure equipment. You will conduct...


  • Abu Dhabi, Abu Dhabi, United Arab Emirates Client of Talentmate Full time

    The Senior System Reliability Engineer is responsible for ensuring the stability and performance of complex systems and applications. This role involves designing, implementing, and maintaining robust systems that meet service-level requirements. The engineer will work closely with cross-functional teams to identify potential issues, conduct root cause...


  • Reliability Engineer

    3 weeks ago


    Abu Dhabi, Abu Dhabi, United Arab Emirates Etihad Airways Full time

    The Maintenance Program & Reliability Engineer plays a key role in ensuring compliance with GCAA CAR M regulations by developing, analyzing, and optimizing aircraft maintenance programs. This role focuses on enhancing fleet reliability by identifying trends, implementing technical solutions, and driving continuous improvements in maintenance strategies....
