As a SOC Operator, you will be responsible for monitoring and maintaining the health and performance of our fleet of installed clusters. You will work in a 24/7 operations environment, ensuring the availability, reliability, and security of services. This role involves real-time monitoring, incident detection, management, and resolution, and clear written and verbal communication with other teams and stakeholders.
Responsibilities:
Monitor clusters using internal monitoring tools to detect and troubleshoot issues promptly.
Respond to alerts and incidents in a timely manner, following standard operating procedures (SOPs) and escalation processes.
Perform initial investigation and diagnosis of problems, escalating complex issues to Support or R&D.
Document incidents, including their details, troubleshooting steps, and resolutions, in the incident tracking system.
Collaborate with Support, R&D, Account teams, and customers to ensure effective incident resolution and communication.
Conduct routine checks and audits to identify potential problems or vulnerabilities.
Assist with the implementation of changes and updates to the infrastructure as directed by team leads.
Participate in shift-based work schedules, including nights, weekends, and holidays, to provide 24/7 coverage in the SOC.
Maintain up-to-date knowledge of storage technologies, industry trends, and best practices.
Adhere to security protocols and ensure the confidentiality, integrity, and availability of network and system data.
Contribute to the development and improvement of SOC processes and procedures.
Provide excellent customer service to internal and external stakeholders throughout incident resolution.
Requirements:
High school diploma or equivalent; a degree or certification in information technology or a related field is a plus.
Proven experience as a SOC Operator or in a similar network monitoring role is preferred.
Strong understanding of networking concepts, protocols, and technologies (TCP/IP, SNMP, DHCP, DNS, etc.).
Ability to work independently and collaboratively in a team-based environment.
Excellent problem-solving and analytical skills, with the ability to multitask effectively.
Good communication skills, both written and verbal, to interact with technical and non-technical stakeholders.
Willingness to work in a 24/7 shift-based environment, including nights, weekends, and holidays.
Detail-oriented and committed to maintaining accurate documentation.
Demonstrated commitment to continuous learning and self-improvement.