1,244 DevOps Engineer jobs in the United Arab Emirates

Site Reliability Engineer

Abu Dhabi, Abu Dhabi D4 Insight

Posted today

Job Description

Overview

We’re Hiring: Site Reliability Engineer

Join us as a Site Reliability Engineer and help build scalable, secure, and high-performance infrastructure for cutting-edge fintech platforms in wealth management, digital wallets, trading, and blockchain.

Responsibilities
  • Contribute to designing, deploying, and maintaining reliable cloud infrastructure (AWS/Azure).
  • Manage databases, integrations, and DevOps automation to streamline operations.
  • Support cybersecurity and compliance frameworks to ensure secure, compliant services.
  • Collaborate with cross-functional teams to deliver resilient services for fintech platforms.
Qualifications
  • Proven experience in cloud infrastructure (AWS/Azure).
  • Strong in DB management, integrations & DevOps automation.
  • Familiar with cybersecurity & compliance frameworks.
  • Bonus: Knowledge of fintech trends & emerging tech.
About the Team

We craft, deploy, and manage bespoke services in CRM, data and AI, cybersecurity and consulting.

Site Reliability Engineer

AED120000 - AED360000 Y TechHive AI

Posted today

Job Description

About TechHive AI

TechHive AI is a global tech solutions provider delivering expertise in cloud computing, DevOps, AI/ML, software engineering, and scalable enterprise infrastructure. With clients across the US, UK, EU, and GCC, we empower digital transformation for startups and enterprises alike.

We are now hiring a UAE-based Site Reliability Engineer (SRE) to join our DevOps & Cloud Engineering team. This role focuses on reliability, performance, automation, and incident response across high-availability production systems.

About the Role

As a Site Reliability Engineer (SRE), you'll blend software engineering, system administration, and DevOps principles to ensure system uptime, scalability, and performance. You will help build fault-tolerant infrastructure, automate deployment pipelines, and respond to incidents swiftly and effectively.

This is a remote role, but you must be located in the UAE due to regional time zone coordination and client compliance.

Key Responsibilities

  • Maintain and improve system reliability, availability, and performance across cloud-based infrastructure (AWS, GCP, or Azure)
  • Automate and manage deployment pipelines using CI/CD tools like GitLab CI, GitHub Actions, or Jenkins
  • Develop and maintain observability and monitoring systems (e.g., Prometheus, Grafana, ELK, CloudWatch)
  • Build tools/scripts to automate operational tasks and incident recovery
  • Participate in on-call rotations and ensure SLAs are met
  • Lead root cause analysis (RCA) after critical incidents and implement postmortem action items
  • Ensure security best practices across infrastructure and deployments
  • Implement infrastructure as code (IaC) using Terraform or CloudFormation
  • Collaborate closely with DevOps, Software Engineers, QA, and Product teams

Required Qualifications

  • 3+ years of experience in DevOps, SRE, or Cloud Infrastructure Engineering
  • Strong knowledge of AWS, Azure, or Google Cloud Platform
  • Experience with Docker, Kubernetes, and container orchestration
  • Proficiency in Linux system administration, shell scripting, and automation
  • Strong experience with monitoring, alerting, and logging tools (Grafana, Prometheus, ELK, Datadog, etc.)
  • Familiarity with incident response, on-call practices, and SLAs/SLOs
  • Proficient in at least one scripting language (Python, Bash, or Go)
  • Familiar with Git, CI/CD pipelines, and version control workflows
  • Excellent problem-solving and communication skills
  • Must be residing in the UAE

Preferred Qualifications

  • Experience with multi-region, highly available architectures
  • Knowledge of SRE best practices and Google's SRE principles
  • Certifications: AWS DevOps Engineer, Google SRE, Kubernetes Admin (CKA), etc.
  • Experience working with AI/ML pipelines or data-intensive systems

Why Join TechHive AI?

Work remotely while collaborating with international teams

Flexible hours with a performance-driven culture

Competitive compensation package

Opportunity to work on high-impact, cloud-native projects

Continuous learning and certification support

How to Apply

Send your CV to

Subject Line: "SRE – Remote (UAE Based)"

Or apply via LinkedIn.

Site Reliability Engineer

AED120000 - AED240000 Y ACIT Labs

Posted today

Job Description

Location: Abu Dhabi

Duration: Yearly Renewable Contract

Role Summary:

We are looking for a Site Reliability Engineer (SRE) to maintain the availability, scalability, and performance of critical services deployed across cloud and on-premise environments. This role combines software engineering and systems engineering to automate operations and improve reliability in CI/CD and production environments.

Key Responsibilities:

  • Maintain uptime and performance of applications deployed across hybrid infrastructure
  • Implement observability (logging, metrics, tracing) using Prometheus, Grafana, ELK, Azure Monitor
  • Troubleshoot production issues, participate in incident response, and root cause analysis
  • Automate infrastructure, monitoring, and runbooks using IaC tools and scripting
  • Implement and track SLOs, SLIs, and error budgets
  • Build self-healing systems and resilient deployments
  • Collaborate with developers, security teams, and cloud engineers to enforce reliability practices

Required Skills:

  • Experience with Azure/AWS/GCP monitoring tools and on-prem observability stacks
  • Strong in Linux/Unix administration, scripting (Python, Bash)
  • Hands-on with CI/CD pipelines, Kubernetes, and Helm
  • Good understanding of load balancing, failover, HA architecture
  • Familiar with incident management, postmortem writing, and runbook creation

Preferred Qualifications:

  • Experience with Terraform, Ansible, or Pulumi
  • Knowledge of service mesh (Istio, Linkerd) and API gateway configurations
  • Certifications: SRE Foundation, Azure/AWS Cloud Practitioner, or Kubernetes Administrator (CKA)
  • Awareness of compliance standards (CIS, NIST, ISO 27001)

Site Reliability Engineer

AED80000 - AED120000 Y Avanza Solutions

Posted today

Job Description

Company Description

Since 2000, Avanza Solutions has been empowering organizations around the world to embrace digital transformation through innovative digital platforms and services. Renowned for its excellence and continual evolution, Avanza specializes in developing, deploying, and integrating advanced technologies like Digital Banking, AI, Blockchain, and Smart City applications. Avanza has delivered a wide array of digital solutions across various sectors, including Banking, Finance, Telecommunications, Insurance, Pharmaceuticals, and Government. With a vision of people-centric innovation and automation, Avanza has transformed the operational and functional landscapes of numerous enterprises globally.

Role Description

This is a full-time on-site role for a Site Reliability Engineer focused on Bank Applications, located in Abu Dhabi. The Site Reliability Engineer will be responsible for ensuring the reliability and performance of banking applications by managing software development, system administration, troubleshooting, and infrastructure tasks. Day-to-day tasks include monitoring system performance, automating operations, enhancing system stability, and responding to incidents and issues promptly to maintain seamless operations.

Qualifications

  • Expertise in Site Reliability Engineering and Troubleshooting
  • Proficiency in Software Development and System Administration
  • Experience with Infrastructure management
  • Strong problem-solving and analytical skills
  • Excellent written and verbal communication skills
  • Bachelor's degree in Computer Science, IT, or related field
  • Relevant certifications are a plus

Site Reliability Engineer

AED90000 - AED120000 Y Avanza Solutions

Posted today

Job Description

Company Description

Since 2000, Avanza Solutions has been empowering organizations globally to embrace digital transformation through innovative digital platforms and services. Renowned for its excellence, Avanza specializes in digital banking, customer relationship & experience management, artificial intelligence, blockchain, smart city applications, business automation, and cognitive platforms. The company has successfully delivered digital solutions to sectors including Banking, Finance, Telecommunications, Insurance, Pharmaceuticals, and Government. Avanza Solutions is committed to driving innovation and automation in numerous enterprises worldwide.

Role Description

This is a full-time on-site role, located in Abu Dhabi, for a Site Reliability Engineer working on bank applications only. The Site Reliability Engineer will be responsible for ensuring the reliability, efficiency, and performance of banking applications. Day-to-day tasks include troubleshooting issues, developing software solutions, and managing system administration tasks. The engineer will also maintain infrastructure and ensure the stability of banking applications.

Qualifications

  • Proficiency in Site Reliability Engineering and troubleshooting
  • Experience in Software Development
  • Skills in System Administration
  • Knowledge of Infrastructure management
  • Excellent problem-solving and analytical skills
  • Strong communication and teamwork abilities
  • Experience in the banking or financial sector is a plus
  • Bachelor's degree in Computer Science, Information Technology, or related field

Site Reliability Engineer

AED250000 - AED500000 Y ACIT Labs

Posted today

Job Description

Location: Abu Dhabi or Remote/Hybrid

Experience: 6–10 Years


Role Summary:

We are looking for a Site Reliability Engineer (SRE) to maintain the availability, scalability, and performance of critical services deployed across cloud and on-premise environments. This role combines software engineering and systems engineering to automate operations and improve reliability in CI/CD and production environments.

Key Responsibilities:

  • Maintain uptime and performance of applications deployed across hybrid infrastructure
  • Implement observability (logging, metrics, tracing) using Prometheus, Grafana, ELK, Azure Monitor
  • Troubleshoot production issues, participate in incident response, and root cause analysis
  • Automate infrastructure, monitoring, and runbooks using IaC tools and scripting
  • Implement and track SLOs, SLIs, and error budgets
  • Build self-healing systems and resilient deployments
  • Collaborate with developers, security teams, and cloud engineers to enforce reliability practices

Required Skills:

  • Experience with Azure/AWS/GCP monitoring tools and on-prem observability stacks
  • Strong in Linux/Unix administration, scripting (Python, Bash)
  • Hands-on with CI/CD pipelines, Kubernetes, and Helm
  • Good understanding of load balancing, failover, HA architecture
  • Familiar with incident management, postmortem writing, and runbook creation

Preferred Qualifications:

  • Experience with Terraform, Ansible, or Pulumi
  • Knowledge of service mesh (Istio, Linkerd) and API gateway configurations
  • Certifications: SRE Foundation, Azure/AWS Cloud Practitioner, or Kubernetes Administrator (CKA)
  • Awareness of compliance standards (CIS, NIST, ISO 27001)

Site Reliability Engineer

AED250000 - AED500000 Y DeepRec

Posted today

Job Description

Senior Trading Operations Engineer | Dubai | Crypto / HFT

(SRE & DevOps)

Our client is a high-performance trading firm operating at the forefront of low-latency crypto markets. They're lean, profitable, and punch far above their weight, executing over 1 million trades per day and touching 6% of daily volume on major exchanges, all with a tight-knit global team.

They're now hiring a Senior Trading Operations Engineer in Dubai to ensure seamless real-time performance across their trading infrastructure.

The Role

You'll sit at the core of live operations, working hand-in-hand with traders and developers to optimise performance, automate systems, and rapidly troubleshoot issues. This is a hands-on engineering role with future leadership potential.

What You'll Do

  • Monitor and manage real-time high-frequency trading systems
  • Deep-dive into latency and performance metrics (Grafana, Prometheus, etc.)
  • Improve system reliability and uptime through automation & tooling
  • Own CI/CD pipelines, config management (Ansible), and on-call rotation
  • Collaborate on deployments, A/B testing, and incident response

What They're Looking For

  • 8+ years in software or trading operations
  • Strong Python scripting and Linux systems knowledge
  • Experience in low-latency, distributed, or HFT-style environments
  • Familiar with AWS, Grafana, Prometheus, ClickHouse, ELK
  • Bonus: Kubernetes, containerization, incident frameworks

Why Join?

  • High-impact role in a firm with zero external funding and real PnL
  • Access to cutting-edge infrastructure + strategies
  • Dubai-based with full relocation support (visa, flight, ID, etc.)
  • Regular global "workations" and a multicultural, remote-friendly team

Ready to own production in one of the world's fastest trading stacks?

Drop us a message or apply directly — all conversations held in confidence.

Site Reliability Engineer

AED150000 - AED250000 Y Kingston Stanley

Posted today

Job Description

Lead Site Reliability Engineer – Dubai, Full Time

Our client, a technology organization and a leader in its field, is seeking an experienced Lead Site Reliability Engineer to join its team.

As Lead Site Reliability Engineer, you will take ownership of their production environment, which processes hundreds of billions of data points every year. This is a senior opportunity for a Kubernetes expert with strong experience in distributed data systems, automation, and infrastructure security.

Key Responsibilities:

  • Design and manage scalable Kubernetes-based infrastructure.
  • Automate provisioning, configuration management, and deployments.
  • Optimise and administer distributed databases including Cassandra, ScyllaDB, PostgreSQL, MongoDB, and Elasticsearch.
  • Build and maintain advanced monitoring, logging, and alerting systems.
  • Manage Linux-based systems with a focus on security, patching, and hardening.
  • Develop and test disaster recovery and business continuity strategies.
  • Lead performance tuning, capacity planning, and cost optimisation.
  • Act as the highest escalation point for complex infrastructure incidents.
  • Collaborate with development teams to refine CI/CD pipelines.
  • Produce clear documentation of systems, processes, and architecture.
  • Provide mentorship and technical guidance to the wider technology team.

Qualifications & Experience:

  • Degree in Computer Science, Engineering, or related field.
  • 8+ years in SRE, DevOps, or Systems Engineering across large-scale production environments.
  • Expert in Kubernetes cluster design, scaling, and security.
  • Proven experience with both NoSQL and relational databases.
  • Strong Linux administration skills (Ubuntu/SUSE) with system hardening expertise.
  • Proficiency in scripting (Bash, Python, Go) and Infrastructure as Code tools (Terraform, Ansible, Pulumi).
  • Knowledge of load balancing, networking, and storage solutions.

Nice to Have:

  • Strong understanding of infrastructure and data security.
  • Hands-on disaster recovery and resilience planning.
  • Advanced database tuning and recovery skills.
  • Familiarity with Go-based application support.
  • Excellent troubleshooting and analytical skills.
  • Proactive, accountable, and effective in high-pressure environments.
  • Strong collaboration with cross-functional engineering teams.

Salary: Competitive

Benefits: Medical Insurance and Visa (family), flight ticket allowance (employee only)

Site Reliability Engineer

AED90000 - AED120000 Y Avrioc Technologies

Posted today

Job Description

Requirements:

Experience in Site Reliability Engineering, with solid hands-on experience in the tech stack covered by the responsibilities below. If you are certified in any of these technologies, we would love to see you demonstrate it through detail-oriented problem solving and knowledge of the products.

Job description:


  • Possess a strong coding background with expertise in a language such as Python, Go, or Java.
  • Troubleshoot code errors efficiently, identifying root causes so they can be communicated effectively to development teams.
  • Advocate for and implement best practices in code development, prioritizing maintainability, scalability, and reliability.
  • Keep current with industry trends and technologies, introducing innovative solutions to enhance system reliability and performance.
  • Leverage proven design patterns to connect databases, middleware, and other components, ensuring robust and fault-tolerant system interactions.
  • Actively contribute to the evolution of the organization's design patterns by researching and proposing enhancements based on industry trends.
  • Gather and analyze metrics from operating systems as well as critical application services to assist in quick identification of issues and faults.
  • Partner with development to improve the reliability of application services and release procedures.
  • Participate in system design consulting, platform management, and capacity planning.
  • Have a deep understanding of observability enablement for different application stacks.

Site Reliability Engineer

AED70000 - AED120000 Y Khazna Data Centers

Posted today

Job Description

Khazna was founded in 2012 and has grown rapidly to become the leading and trusted wholesale data center provider in the Middle East and North Africa region. Through our data centers, we provide industry-benchmark levels of power supply and cooling services to better serve the growing need for data center operations in the UAE and the wider region.

We are seeking a Site Reliability Engineer to support the reliability engineering program across multiple data centers in our fleet. Reporting to the Reliability Manager, you will be responsible for monitoring system performance, driving preventative and predictive maintenance initiatives, leading root cause analysis efforts, and collaborating with cross-functional teams to minimize downtime and enhance infrastructure resilience.

Key Accountabilities:

  • Monitor real-time and historical performance metrics for critical power, cooling, and IT systems.
  • Analyse system data to identify trends, failure modes, and reliability risks.
  • Execute Root Cause Analyses (RCA) and Failure Mode & Effects Analyses (FMEA), then drive corrective and preventive actions.
  • Develop and maintain condition-based and predictive maintenance routines, leveraging IoT, data analytics, and machine learning tools.
  • Support preventive maintenance programs: schedule, document, and validate maintenance activities.
  • Assist in asset lifecycle planning, including upgrades, decommissioning, and end-of-life strategies.
  • Contribute to capacity runway assessments to forecast infrastructure needs.
  • Implement and enforce availability management plans, risk assessments, and mitigation strategies.
  • Ensure data collection and reporting processes for reliability KPIs (e.g., MTBF, MTTR, availability) are standardized and accurate.
  • Prepare reliability reports and dashboards; present findings and recommendations to site leadership.
  • Respond to and lead failure-response efforts during site incidents, ensuring rapid recovery and root-cause follow-through.
  • Maintain compliance with industry standards and regulations (Uptime Institute, ISO, ASHRAE).
  • Collaborate with Operations, Engineering, Facilities, and Vendors to integrate reliability best practices into day-to-day workflows.
  • Propose continuous-improvement initiatives and pilot emerging reliability technologies.
  • The job holder may be required to undertake additional duties that may reasonably be expected and that form part of the function of the job.

Minimum Qualifications:

  • Bachelor's degree in Mechanical, Electrical, Reliability, or a related engineering discipline.

Minimum Experience:

  • 3+ years of experience in reliability engineering, maintenance engineering, or a data center operations environment.
  • Hands-on experience with RCA, FMEA, and predictive maintenance methodologies.
  • Proficiency with monitoring platforms, data-analytics tools, and scripting (e.g., Python, R).
  • Familiarity with IoT sensors, machine-learning frameworks, and condition-based monitoring systems.
  • Knowledge of industry reliability standards and regulations (ISO, ASHRAE, Uptime Institute).

Job-Specific Skills (Generic / Technical):

  • Strong analytical and problem-solving skills, with acute attention to detail.
  • Effective communicator, able to present technical findings to diverse audiences.
  • Project coordination skills and the ability to manage multiple reliability initiatives.
  • Collaborative mindset, comfortable working in cross-functional teams.
  • Self-starter with a continuous-improvement attitude and commitment to resilience.