We are looking for an SRE (Site Reliability Engineer)

Posted 6 months ago

We are looking for an SRE (Site Reliability Engineer). Location: Remote



  • Strong experience with monitoring tools (Prometheus/Grafana) – Expertise
  • Dashboarding – Splunk – Expertise
  • Configuration and deployment in a DevOps environment – Expertise
  • Alerts
  • Microservice-based projects
  • Containers (Docker/Kubernetes)
  • IaC (Terraform/Ansible)

Candidates must have hands-on experience with monitoring tools (Prometheus, Grafana, Splunk, flow logs) and container platforms (K8s).

SRE Job Description

You will build and maintain solutions that provide insight into the infrastructure and services supporting applications, with a focus on the logs, metrics, and application traces that improve observability. As an SRE/Observability engineer, you will think about the problem end to end: automating data collection from common data sources, storing data efficiently in an application performance management and monitoring tool, rendering this information for users based on the defined Service Level Objectives (SLOs) and Service Level Indicators (SLIs), and focusing on the actions these insights call for.


  • Define consistent monitoring, metrics, and alerting across different microservices (Docker/Kubernetes, serverless)
  • Develop solutions to implement the SLO/SLI requirements, including visualization in monitoring dashboards
  • Collaborate with other development, security, and compliance teams to execute on product deliverables
  • Evaluate the necessary tradeoffs among competing needs in privacy/cybersecurity, data retention, and performance degradation
  • Focus on observability with an eye to quicker resolution of production issues

Skills and Experience Requirements:

  • 5+ years of hands-on experience with cloud-based technologies and tools in configuration management, deployment, monitoring, and operations
  • 5+ years of experience with DevOps and continuous delivery
  • Expertise in working in partnership with colleagues and in leading collaborative teams to achieve common goals
  • Candidates must have hands-on experience with logging and monitoring solutions such as Prometheus, Grafana, Splunk/Elastic, Fluent Bit, Logstash, Kibana, application and infrastructure monitoring tools, and public cloud monitoring tools such as CloudWatch and VPC flow logs
  • Candidates must have hands-on experience with the Kubernetes platform
  • Strong architectural understanding of large-scale distributed microservice- and serverless-based systems is a plus
  • Strong verbal and written communication skills, with the ability to work effectively across internal and external organizations
  • AWS experience preferred
  • Strong security background and some understanding of FedRAMP technical and operational requirements
  • Strong experience working with the Linux operating system and Docker containers
  • Experience with configuration management and infrastructure-as-code tools like Terraform, Ansible, CloudFormation, and Salt
  • Strong experience working with scripting languages like Python and Bash


Technical Recruiter



Phone: 732-348-1384

Skype ID: Partha Sarathy

LinkedIn: https://www.linkedin.com/in/partha-sarathy-55617919a/