Manager, DevOps
US
Overview
Penguin Solutions Managed Services provides dedicated, remote, Linux systems DevOps for complex, integrated environments involving high-performance computing, cloud, and enterprise systems. This position requires both technical skills (the ability to understand, document, configure, administer, troubleshoot, and resolve issues in Linux environments) and the ability to manage people and processes. This is a customer-facing position.
Responsibilities
- Manage a group of skilled DevOps Engineers.
- Perform reviews and staff analysis, and present business plans to meet current and future needs.
- Work in data center environments with software, hardware, and network components.
- Install, monitor, and maintain data center equipment.
- Build & maintain CI/CD pipelines.
- Integrate systems and platforms through infrastructure as code.
- Build automation workflows to aid in lights-out operations.
- Work as part of a team, provide IT support, and resolve errors.
- Stay up to date on advancements in data center infrastructure and technologies.
- Document network processes in support of Sr. Onsite Hardware Technicians.
- Respond to network and server errors after hours.
- Participate in weekly on-call rotation.
- Collaborate with customers to enable initiatives.
- Serve as Subject Matter Expert on HPC and associated technologies.
Qualifications
The qualified candidate will have the following or equivalent experience:
- Bachelor’s Degree in Computer Science, Computer/Electrical Engineering, or a related field (or equivalent experience).
- 6+ years as a manager of DevOps.
- 12+ years of hands-on experience with UNIX/Linux server environments, CI/CD pipelines and infrastructure as code.
- Must be a US Citizen.
Skills
- Strong leadership skills to mentor and grow talent.
- Strong customer-facing skills.
- Ability to prioritize tasks and demands while delivering on time.
- Strong Linux systems administration skills and experience with open-source technologies.
- Understanding of Linux networking implementation and protocols.
- Strong Ansible scripting (5+ years).
- Python proficiency (5+ years).
- Familiarity with Infrastructure as Code, CI/CD, and other DevOps concepts.
- Ability to investigate performance issues up and down the infrastructure stack (software, network, server and storage).
- Experience as an HPC/AI Performance Specialist, with practical knowledge of administering High-Performance Computing (HPC) technologies, including cluster resource management, job scheduling, Ethernet networking, and InfiniBand.
- Proven expertise in solving Linux OS and user environment performance issues.
- Ability to run scaling benchmark codes on large HPC clusters.
- Ability to compile, optimize and run benchmark codes (C, Fortran).
- Familiarity with several CPU and GPU compilers, including GCC, Intel, AMD (AOCC, ROCm), and NVIDIA (PGI OpenACC, CUDA).
- HPC Scheduler knowledge (SLURM, PBS, LSF).
- Ability to communicate clearly and effectively with team members and clients.
Preferred Skills
- HPC Systems Management knowledge (Scyld Clusterware preferred).
- Broad technology knowledge in:
  - HPC: Application, Systems Management, MPI, OS, Optimization, Hardware and data center needs.
  - AI & Cloud: Virtualization, Applications, Container Orchestration, Systems Management, and Hardware design.
  - Data: High-Performance Storage and Parallel file systems used in HPC/AI and Cloud.
- HPC cluster system admin experience.
- In-depth knowledge of Linux cluster technologies and optimization techniques.
- Linux Certifications (e.g., RHCSA, RHCE).
- Cloud Certification (e.g., AWS, GCP).
- Ability to install, configure, and tune software applications and provide overall support.
- Initiative to consult the application OEM/vendor about application operations, features, functions, and questions.
- Outstanding verbal, written, and interpersonal communication skills.
Location
This is a remote position located in the United States.
Travel
10-25% Required
Compensation & Benefits
The base pay range that the Company reasonably expects to pay for this position in the United States is $148,000 - $175,000; the pay ultimately offered may vary based on business considerations, including job-related knowledge, skills, experience, and education. The position is bonus-eligible, and medical, dental, and vision benefits are available. There is a 401(k) savings plan and other benefits, such as Paid Time Off, Life Insurance, and an Employee Assistance Plan.
Inclusion & Belonging Statement
We are committed to creating an inclusive environment that embraces differences and fosters belonging for all.
Equal Opportunity Statement
We are an Affirmative Action/Equal Opportunity Employer and strongly committed to all policies which will afford equal opportunity employment to all qualified persons without regard to age, national origin, race, ethnicity, creed, gender, disability, veteran status, or any other characteristic protected by law.