Site Reliability Engineering, Sr Staff

Bengaluru, Karnataka, India
Category: Engineering
Hire Type: Employee
Job ID: 17594
Date posted: 06/02/2026

We Are

Synopsys is the leader in engineering solutions from silicon to systems, enabling customers to rapidly innovate AI-powered products. We deliver industry-leading silicon design, IP, simulation and analysis solutions, and design services. We partner closely with our customers across a wide range of industries to maximize their R&D capability and productivity, powering innovation today that ignites the ingenuity of tomorrow.

You Are

You have spent years keeping complex systems running when everyone else is asleep, and you have learned that reliability is not about firefighting; it is about building systems that do not catch fire in the first place. You know the difference between a metric that tells you something broke and one that tells you something is about to break, and you have strong opinions about which matters more.

You are comfortable working across the stack, from Kubernetes clusters to cloud infrastructure to the Python scripts that tie it all together. At Synopsys, you will work on platforms that support semiconductor design tools used globally, and the reliability work you do will matter to thousands of engineers every day.

What You'll Be Doing

  • Own the reliability and availability of on-premises and SaaS systems that support Synopsys engineering platforms, ensuring they perform as expected under real-world load
  • Design, build, and operate observable and self-healing services using OTEL, Elastic stack, Grafana, and Beats to reduce MTTD and MTTR across deployed environments
  • Define and maintain SLIs, SLOs, and error budgets for platform teams, translating reliability goals into measurable outcomes that drive prioritization
  • Deploy and manage infrastructure on Azure using Terraform, Kubernetes, Helm, Docker, and Azure native services including AKS, Azure Monitor, Key Vault, and networking components
  • Evaluate and integrate new observability, automation, and cloud technologies to improve system resilience and operational efficiency
  • Partner with engineering teams to recommend architecture improvements and process changes that reduce toil and increase platform stability
  • Serve as the subject matter expert in observability tooling and incident resolution, mentoring teams on best practices for monitoring, alerting, and root cause analysis

The Impact You Will Have

  • Reduce mean time to detection and resolution across critical platforms, giving engineering teams more time to build and less time firefighting
  • Build self-healing capabilities into services that eliminate entire classes of recurring incidents and reduce on-call burden
  • Establish an SLO-driven reliability culture that helps teams make informed tradeoffs between feature velocity and system stability
  • Enable faster, safer deployments by improving observability and automation across cloud and on-premises infrastructure
  • Drive architectural decisions that prevent outages before they happen, rather than only responding to them after the fact
  • Create runbooks, dashboards, and tooling that make the next engineer more effective and the next incident less painful
  • Influence platform roadmaps by surfacing reliability gaps and recommending technology investments that improve long-term operational health

What You'll Need

  • 7+ years of hands-on experience as a Site Reliability Engineer or in a similar platform reliability role
  • Strong proficiency in Python and TypeScript, including data structures, algorithms, object-oriented programming, and design patterns
  • Deep expertise deploying and managing infrastructure on Azure, including AKS, Azure Monitor, Virtual Networks, Azure SQL, Cosmos DB, Key Vault, and Azure AD
  • Hands-on experience with GitHub, Helm, Docker, Kubernetes, and Terraform in production environments
  • Proven ability to build and maintain observability pipelines using tools like OTEL, Elastic stack, Grafana, and Beats
  • Working knowledge of ITIL and Agile processes, and experience with ITSM tools like ServiceNow or Rootly
  • PowerShell programming experience is a plus, as is familiarity with GCP, AWS, Azure Machine Learning, or Azure OpenAI services
  • A proactive approach to incident response and a willingness to work on-call to support business-critical services

Who You Are

  • You can walk into a production incident, cut through the noise, and identify the actual problem while others are still gathering logs
  • You push back when a team asks for a new feature without considering the operational cost or the reliability impact
  • You can explain the tradeoff between reliability and velocity to a product manager in two sentences without losing the nuance
  • You stay current on cloud and observability tooling not because it is trendy, but because you care about solving problems better

The Team You'll Be Part Of

You will join a rapidly growing Cloud development and SRE team focused on delivering state-of-the-art cloud solutions. The team is scaling to meet increasing demand and building the next generation of reliability and automation capabilities across Synopsys SaaS products.

Rewards and Benefits

We offer a comprehensive range of health, wellness, and financial benefits to cater to your needs. Our total rewards include both monetary and non-monetary offerings. Your recruiter will provide more details about the salary range and benefits during the hiring process.

At Synopsys, we want talented people of every background to feel valued and supported to do their best work. Synopsys considers all applicants for employment without regard to race, color, religion, national origin, gender, sexual orientation, age, military veteran status, or disability.
