Site Reliability Engineer

Pittsburgh, PA 15203

Employment Type: Contract
Industry: Information Technology / Systems
Job Number: 940
Pay Rate: 50.00

Requirements:
  • BS/MS/PhD in Computer Science, Engineering, or related discipline
  • Proven experience operating highly automated, mission-critical 24/7 production systems
  • Experience with, and understanding of, full-stack service/microservice-based eCommerce applications
  • Good knowledge of client-side rendering, content delivery networks, UI debugging tools, and web performance measurement
  • In-depth understanding of an eCommerce framework/setup (e.g. Oracle ATG), including browse and search, cart, profile, pricing, checkout, and order processing modules
  • Experience troubleshooting VM issues (Node, Java) with the help of memory/heap dumps
  • Expertise in application, data and infrastructure architecture disciplines
  • Familiarity with infrastructure components such as routers, load balancers, cloud products, container systems, compute, storage and networks
  • Strong organizational, analytical and critical thinking skills

Responsibilities:
  • Serve as a primary point of contact responsible for the overall health, performance, and capacity of the services and infrastructure serving the website and mobile apps
  • Own and run post-incident reviews for SEV1 incidents and follow up on the resulting fixes/changes
  • Solve availability/performance problems and build software-based solutions to prevent recurrence
  • Design and develop automation frameworks and test suites to enforce SRE techniques and tools
  • Analyze and identify performance bottlenecks and make recommendations
  • Implement metrics, monitoring, incident response and capacity planning processes
  • Collaborate closely with developers and solution architects to ensure each designed solution meets non-functional requirements such as availability, performance, and maintainability
  • Collaborate closely with Performance Engineering to review and understand performance tests and results
  • Collaborate closely with monitoring and alerting teams to identify possible metrics within the application and ensure coverage across it
  • Build self-service tools for the SRE team and other operations groups that automate away manual operational toil
  • Build and maintain runbooks to troubleshoot applications, infrastructure and services that are in use
  • Develop tools to effectively monitor custom applications in a large-scale UNIX environment

 

Vincent Vennero
