
Site Reliability Engineering (SRE) Manager
- New Cairo
- Permanent
- Full-time
- Define, implement, and maintain the SRE framework in collaboration with Platform SREs
- Ensure the framework aligns with organizational goals around performance, availability, and operational excellence
- Promote standardized best practices to drive consistent execution across diverse technology platforms
- Drive value realization by ensuring the framework leads to tangible improvements in reliability, efficiency, and customer satisfaction
- Build and nurture an engaged SRE community of practice:
- Establish regular routines for Platform SREs:
- Oversee troubleshooting efforts for complex problems and root cause analysis in collaboration with L3/L4 support vendors. Lead the technical resolution of major IT disruptions with required teams (not limited to regular working hours).
- Drive proactive improvements by defining appropriate Service Level Indicators (SLIs), analyzing incident trends to identify root causes, and implementing permanent fixes to prevent recurrence
- Optimize system performance and ensure scalability to meet new demands. Align platform-specific SRE objectives with overall reliability goals.
- Develop, implement, and steer automation strategies across all platforms to boost operational efficiency, reduce manual intervention, and strengthen system reliability.
- Promote the use of observability and automation tools (Dynatrace, Azure Monitor, Terraform, Ansible, etc.) and ensure a unified approach to monitoring, performance tuning, and improvements across platforms.
- Digital Products Performance & Availability
- Mean Time to Recovery (MTTR)
- Reduction in User-Facing Incidents
- Number of Automations
- Adoption of SRE Practices Across Platforms
- Bachelor's or Master's degree in Computer Science, Engineering or a related field.
- 5+ years of experience in Site Reliability Engineering, DevOps, or a similar role.
- 3+ years of experience leading teams in IT Operations or Development.
- Proven track record in administering full-stack technology environments in an enterprise landscape, including but not limited to:
- Exposure to business-critical applications and platforms such as SAP (S/4HANA, MDG), MS Dynamics, or similar enterprise systems.
- Hands-on expertise in system monitoring and observability (e.g. Dynatrace, Datadog, Splunk), automation (Terraform, Ansible, etc.), and performance tuning using industry-standard tools.
- Leadership & Influence: Strong ability to lead distributed teams, manage priorities, and influence stakeholders to achieve adoption and value delivery. Comfortable leading teams through change (new processes, tooling, and cultural differences).
- Collaboration & Communication: Excellent communication and collaboration skills, capable of working effectively across different hierarchical levels. Able to articulate complex concepts clearly to both technical and non-technical audiences.
- Problem Solving & Critical Thinking: Superior troubleshooting skills and a proactive approach to managing complex issues. Able to resolve conflicts diplomatically.
- Technical Acumen: Strong technical foundation to engage deeply with engineering teams.
- Operational Mindset: Focused on delivering reliability, stability, and operational excellence.