Site Reliability Engineer II

Posted 13 hours 2 minutes ago by Bank of America

Permanent

Full Time

Other

England, United Kingdom

Job Description

Location: Pennington

Time type: Full time

Posted: Yesterday

At Bank of America, we are guided by a common purpose to help make financial lives better through the power of every connection. We do this by driving Responsible Growth and delivering for our clients, teammates, communities and shareholders every day.

Being a Great Place to Work is core to how we drive Responsible Growth. This includes our commitment to attracting and developing exceptional talent, supporting our teammates' physical, emotional, and financial wellness, recognizing and rewarding performance, and how we make an impact in the communities we serve.

At Bank of America, you can build a successful career with opportunities to learn, grow, and make an impact. Join us!

Job Description:
This job is responsible for partnering with engineering and technology teams to implement measures as prescribed by lead/senior SREs. Key responsibilities include ensuring appropriate instrumentation, tooling, ticketing, alerting, and on-call routines are in place for key services, identifying root causes of issues through production triage efforts, and suggesting code enhancements to technology teams to automate services and improve reliability and efficiency. Job expectations include using software development skills to improve efficiency and to address gaps in reliability.

Position Summary:

The Global Information Security Application Production Services (GIS APS) SWAT team is looking for a candidate to fill a Site Reliability Engineer role. The candidate should have experience supporting business-critical applications in an environment focused on information security.

Responsibilities of the role include monitoring for incidents and driving their resolution using methodologies such as ITIL, analyzing data with tools like Splunk or Dynatrace, and interacting with both engineering teams and clients to handle requests or issues.

To meet these responsibilities, the candidate should have at least a working knowledge of operating systems (Windows and Linux/Unix), databases (Oracle, MS SQL), and networking and authentication standards such as TCP/IP and SAML, as well as an understanding of how Java and middleware applications function.

Additionally, the candidate should be a self-starter who drives various types of project work to completion. Examples include creating and maintaining dashboards, writing and updating technical documentation, and owning or assisting with the development of enhancements aimed at improving the environment.

Responsibilities:

Develops and maintains reliability scripts, tools, and libraries, and leverages them for common instrumentation, automation, and operational needs and when mentoring Site Reliability Engineers (SREs) on reliability practices and established tools/capabilities.
Collaborates with Development and Infrastructure teams to understand technical solutions and implement monitoring capabilities outlined in the application and system monitoring designs put forward by the SRE Lead.
Partners to implement code changes to make use of common reliability libraries and tools and helps Application Production Services and Application Development teammates understand how to use them.
Identifies vulnerabilities and opportunities for reliability improvement, such as investigating low level error rates and 'noise' in monitoring, and defines solutions to reduce manual support effort and/or improve system reliability.
Engages as a subject matter expert in major incident triage efforts and failure scenario modelling, and works with the Problem Manager to diagnose root causes in major incident/problem management investigations.
Participates regularly in an on-call rotation with Production Support teammates to learn more about reliability issues affecting their portfolio.

Required Qualifications:

Foundational knowledge of core ITIL processes such as the management of incidents, changes, and problems.
Exhibits a disciplined, process-driven, and results-oriented approach when providing support.
Comfortable in the Splunk environment - able to analyze logs, create/modify dashboards, and utilize reporting and alerting functionality.
Basic understanding of Federated IAM protocols such as SAML, OAuth, OpenID Connect, and FIDO2.
Able to understand and analyze HTTP traces/Wireshark captures.
Database/SQL knowledge - basic understanding of how a database functions and able to craft queries to pull data.
Working knowledge of both Unix and Windows Operating Systems.
Ability to understand and utilize various programming or scripting languages such as shell scripting, Perl, and PowerShell.
Practical knowledge of SSL/TLS cryptography and PKI.
Knowledge of LDAP and Active Directory services.

Desired Qualifications:

Strong knowledge of and troubleshooting experience in Windows, Linux, Oracle, and MS SQL platforms/environments.
Analytical skills and expertise in finding root causes and isolating complicated issues with various tools such as Splunk.
Knowledge of Multi-Factor Authentication, Single Sign-On, Password Management, and Passwordless Authentication (FIDO2) solutions.
Exposure to supporting Web Access Management solutions, such as PingAccess or CA SiteMinder.
Experience with Apache and IIS solutions.
Understanding of the OSI model.
Knowledge of the Software Development Life Cycle.
Familiarity with and understanding of high-availability environments.

Skills: