Site Reliability Engineer

Omniscius Consulting
United States
Our client is seeking a Site Reliability Engineer (SRE) who will be responsible for ensuring the reliability, performance, and scalability of its software, websites, and applications. This role requires a combination of software engineering and systems administration skills to monitor, control, and automate systems. The ideal candidate will have a deep understanding of cloud infrastructure, automation tools, and best practices for maintaining high availability and performance. This position plays a critical role in maintaining the overall health and efficiency of the client's platform.

Key Responsibilities:

System Monitoring and Maintenance:
- Monitor the performance and reliability of Kubernetes clusters, software, websites, and applications.
- Automate routine maintenance tasks to ensure system stability and performance.

Incident Response and Troubleshooting:
- Respond to and resolve incidents in a timely manner, minimizing downtime and impact on users.
- Conduct root cause analysis to identify and address underlying issues.
- Develop and implement strategies to prevent future incidents and improve system resilience.

Automation and Infrastructure Management:
- Design, build, and maintain automated systems and processes to improve efficiency and reduce manual intervention.
- Manage cloud infrastructure, including provisioning, scaling, and optimizing resources.
- Collaborate with development teams to ensure seamless deployment and integration of new features and updates.

Performance Optimization:
- Analyze system performance and identify areas for improvement.
- Implement performance tuning and optimization techniques to enhance system efficiency.
- Collaborate with cross-functional teams to ensure optimal performance of all components.

Security and Compliance:
- Ensure compliance with security best practices and industry standards.
- Implement and maintain security measures to protect systems and data.
- Conduct regular security audits and vulnerability assessments.

Documentation and Reporting:
- Maintain accurate and up-to-date documentation of systems, processes, and procedures.
- Generate and analyze reports on system performance, incidents, and other key metrics.
- Provide regular updates to management and stakeholders on system health and performance.

Continuous Improvement:
- Identify opportunities for improving system reliability, performance, and scalability.
- Stay up-to-date with industry trends and best practices in site reliability engineering.
- Participate in training and development opportunities to enhance skills and knowledge.

Qualifications:
- Deep expertise in Kubernetes and containers.
- Strong understanding of cloud infrastructure, automation tools, and best practices for maintaining high availability and performance.
- Experience with monitoring and logging tools such as Loki and Grafana.
- Minimum of 3 years of experience in site reliability engineering, Kubernetes administration, or a related role.
- Excellent problem-solving skills and attention to detail.
- Strong communication and interpersonal skills, with the ability to work effectively with cross-functional teams.

Apply on omnisciusconsulting.applytojob.com
