Remote Job ::: Principal Site Reliability Engineer/DevOps Architect ::: 10+ Exp

Posted 2 months ago

Job Title
 Principal Site Reliability Engineer
 Job Description
 Location: REMOTE

Must have 10+ years of experience; architect-level candidates are preferred.

The client is looking for a talented individual with a proven track record to take on the role of Principal Site Reliability Engineer within its HealthCare Research and Development Unit. Our users count on our applications and services to be fast, reliable, and secure. The candidate will work with a talented group of cross-functional individuals to plan, design, and build tools, systems, and infrastructure that enable continuous integration, testing, monitoring, elasticity, and delivery of products and solutions on our hosted cloud platform.


The candidate must be a self-starter who can manage and prioritize many tasks at any given time. He/she must have good communication skills and the ability to follow up with groups across the company. The candidate should be comfortable working in a fast-paced technical environment. Demonstrated attention to detail and follow-through are very important for this role. Monitoring the platform, reacting to problems, and proactively addressing issues before they affect performance or availability will be a prime focus for this individual. In this role, the candidate will work on products that health care providers use to deliver healthcare and life-critical services to patients around the world.


As a Principal Site Reliability Engineer you will:

* Use process and best practices to ensure our platform and its applications are stable and performant.

* Keep the customer-facing applications and services always available.

* Plan, coordinate, and manage releases and their deployment.

* Proactively identify hurdles to stability and implement self-healing and resiliency initiatives.

* Build and maintain tools that help with day-to-day activities and orchestration of our cloud environments.

* Work to automate detection and resolution of recurring issues in the production environment.

* Participate in the Incident and Problem Management processes and assist the teams in ensuring proper RCAs are documented and follow-ups are delivered.

* Communicate daily with software engineers, QA engineers, product management, and operations staff to share ideas, report status on ongoing work, and prioritize future work.

* Implement Infrastructure as a Service and Infrastructure as Code practices wherever applicable.

* Stay up to date on relevant technologies, both internal and external; plug into user groups; and understand trends and opportunities to ensure we are using the best possible techniques and tools.



Required Education and Experience

Applicants must meet one of the following education and experience requirements:

* 5 years of relevant experience and a Bachelor's degree in computer science or similar engineering field

* 2 years of relevant experience and a Master's degree in computer science or similar engineering field


Basic Qualifications

* Experience working as a Senior Site Reliability Engineer or in a similar capacity, operating a highly scalable, distributed, cloud-based platform.

* Experience with operating products in the Cloud (Azure, AWS, GCP).

* Experience designing cloud-hosted solutions that provide maximum reliability, security, and performance.

* Experience with Infrastructure as Code tools such as Terraform.

* Experience with Container Technologies like Docker, Kubernetes etc.

* Experience with monitoring technologies like ELK/Kibana, New Relic, Prometheus etc.

* Experience with configuration management systems like Salt, Chef, Puppet or Ansible etc.

* Experience with continuous integration and delivery systems like Jenkins, Azure DevOps etc.

* Experience with scripting languages like Python, Perl, PowerShell etc.

* Experience using source control systems such as Git and Perforce.

* Experience with TCP/IP networking and debugging.

* Experience working in a Linux and Windows environment.

* Experience with SQL or equivalent language.

* Experience working in an Agile environment.


Desired Qualifications

* Excellent verbal and written communication and interpersonal skills.

* Experience with the Atlassian Tools such as JIRA/Confluence.

* Ability to work effectively with cross-functional teams (Engineering, QA, release management, network operations, product management, professional services, etc.).

* Strong organizational and leadership skills, and the ability to drive the day-to-day activities of internal resources.

* Demonstrated ability to quickly grasp new technologies.

* Must be action-oriented and able to multitask effectively according to priorities.

* Strong team player who enjoys working in a fast-paced, dynamic environment.

* Knowledge of Internet technologies (DNS, HTTP, streaming, web servers, etc.) is a strong plus.

* Ability to build, use and configure metrics collection, reporting and alerting systems.

Should you be interested, please send me a copy of your resume in Word format along with the following details ASAP.

Full Name:
Current Location:
Hourly rate on C2C/W2:
Work Authorization:
Earliest Available date to start:
Date and times available to interview:
Two Professional References (preferably supervisory):

I look forward to connecting with you at your earliest convenience.

Adarsh Jaiswal, Technical Recruiter
SoftSages Technology
WMBE, E-Verified

Direct No.: 470-749-2022

Desk No.: 484-321-8314 ext 212

MALVERN, PA, 19355-1942