Team Lead – Site Reliability / Live Engineering
Bethesda Softworks @ Rockville, MD, US
Large distributed systems, high load, cloud technologies. Does that sound exciting or interesting to you? Do you like video games? Would you like to help make them run better? Do you have a hacker mindset, and can you troubleshoot or predict problem areas in a complex system? Do you know how to design software? Are you an experienced software engineer who has led a development team, mentored junior staff, and participated in onboarding? Do you feel that you could do more interesting things with the skills you possess, building something that you or your friends might use? Is suit-and-tie not your thing? Do you like solving tricky technical problems? Join us as an SRE Team Lead for the Live Engineering team at Bethesda.net in Bethesda Softworks.
- At least 2 years of professional experience running a development or ops team as a tech lead or in a similar capacity.
- At least 5 years of professional experience as a software engineer (development or automated testing).
- Professional experience with cloud platforms such as AWS, Azure, or Google Cloud (SaaS, IaaS).
- Understanding of SQL and NoSQL databases and of caches (such as ElastiCache, Redis, or Memcached).
- Experience with containerization (Docker or similar), and with container orchestration platforms (Kubernetes or similar).
- Experience with build/deploy platforms.
- Understanding of agile SDLCs and of the roles and functions of different team members in a Scrum or Kanban environment.
- Familiarity with Git, and preferably other VCSs, and an understanding of various branching strategies along with their benefits and risks.
- Ability to communicate technical problems without heavy use of technical jargon.
- Professional experience with programming languages such as Go, Python, C#, Java, or C++.
- Lead the SRE/LE team, drive the Kanban-based story board, and evaluate work size and execution.
- Work with stakeholders from internal teams on suggested improvements.
- Drive root cause analysis and evaluate system behavior using tools such as New Relic and Splunk.
- Troubleshoot production systems.
- Participate in the on-call rotation with the rest of the engineering team to provide escalated support.
- Provide cost, performance, and scale analysis on an AWS-based platform.
- Provide recommendations and assist with work prioritization and execution.
- Participate in playtests for new game releases.
- Increase visibility into platform behavior and health.
- Improve the stability and performance of system components.