Information Technology | Internship | Entry-level (0-1 yr)
Job Description
Role Overview
The Site Reliability Engineer (SRE) intern will help apply software engineering principles to IT operations so that the company's platforms are reliable, scalable, observable, and efficient. The role focuses on automation, monitoring, incident management, infrastructure as code, and measurable reliability targets (SLIs/SLOs) to maintain high availability and performance across all products.
Duties and Responsibilities
System Reliability: Help design, implement, and continuously improve system reliability, availability, and performance by defining and monitoring SLIs, SLOs, and error budgets.
Monitoring and Observability: Help build and manage a robust monitoring framework using Prometheus, Grafana, and Loki to track latency, traffic, errors, and system health.
Infrastructure Automation: Assist in automating infrastructure provisioning and scaling using Infrastructure as Code (IaC) principles with Terraform and Kubernetes.
Incident Management: Participate in incident response processes, including detection, escalation, resolution, and conducting blameless postmortems.
Efficiency and Optimization: Help reduce manual operational workload through automation, scripting, and process optimization to improve release velocity.
Collaboration: Work with Engineering, Product, and DevOps teams to improve deployment safety, capacity planning, and cost optimization.
Standards: Assist in establishing alerting strategies and reliability standards that minimize alert fatigue while ensuring rapid resolution of production issues.
Required Knowledge, Qualifications, and Experience
Bachelor's Degree in Computer Science, Information Technology, or a related field.
Some exposure to Kubernetes and cloud networking.
Experience with monitoring and observability tools.
Exposure to managing production systems in cloud environments (AWS, Azure, or Google Cloud).
Understanding of CI/CD pipelines (Jenkins, GitLab CI/CD, or equivalent).
Familiarity with containerization tools like Docker.
Basic hands-on exposure to monitoring and metrics systems such as Prometheus and dashboarding tools like Grafana.
Foundational understanding of log aggregation systems such as Loki.
Familiarity with Linux environments and basic system commands.
Exposure to scripting concepts using Python, Bash, or similar languages.
Foundational knowledge of Artificial Intelligence (AI) and AI agents; relevant certifications in AI are an added advantage.
How to Apply
Interested candidates should send their resume and portfolio to recruiting@interintel.co.ke with the subject line "SITE RELIABILITY ENGINEER INTERN". The submission deadline is 9th March 2026.