Solving complexity. Accelerating results.

At Penguin Solutions, we understand the boundless potential of technology and support our customers in turning cutting-edge ideas into outcomes—faster, and at any scale.

With over two decades of experience as trusted advisors, Penguin Solutions is an end-to-end technology company solving the industry’s most complex challenges in computing, memory, and LED solutions. Penguin designs, builds, deploys, and manages high-performance, high-availability enterprise solutions, allowing customers to achieve their breakthrough innovations.

Manager, DevOps

Date Posted:  Dec 22, 2024
Requisition ID:  1300
Location:  US

Brand:  Penguin Solutions

The Penguin Solutions portfolio, which includes Penguin Computing and Penguin Edge, accelerates customers’ digital transformation with the power of emerging technologies in HPC, AI, and IoT with solutions and services that span the continuum of edge, core, and cloud. By designing highly advanced infrastructure, machines, and networked systems we enable the world’s most innovative enterprises and government institutions to build the autonomous future, drive discovery and amplify human potential.

 

Overview

Penguin Solutions Managed Services provides dedicated, remote, Linux systems DevOps for complex, integrated environments involving high-performance computing, cloud, and enterprise systems. This position requires both technical skills, including the ability to understand, document, configure, administer, troubleshoot, and resolve issues in Linux environments, and the ability to manage people and processes. This is a customer-facing position.

 

Responsibilities

  • Manage a group of skilled DevOps Engineers.
  • Perform reviews and staff analysis, and present business plans to meet current and future needs.
  • Build & maintain CI/CD pipelines.
  • Integrate systems and platforms through infrastructure as code.
  • Build automation workflows to aid in lights-out operations.
  • Work as part of a team, provide IT support, and resolve errors.
  • Stay up to date on advancements in data center infrastructure and technologies.
  • Document network processes in support of Sr. Onsite Hardware Technicians.
  • Respond to network and server errors after hours.
  • Participate in weekly on-call rotation.
  • Collaborate with customers to enable initiatives.
  • Serve as Subject Matter Expert on HPC and associated technologies.

 

Qualifications

The qualified candidate will have the following or equivalent experience:

  • Bachelor’s Degree in Computer Science, Computer/Electrical Engineering, or a related field (or equivalent experience).
  • 6+ years as a manager of DevOps.
  • 12+ years of hands-on experience with UNIX/Linux server environments, CI/CD pipelines and infrastructure as code.

 

Skills

  • Strong leadership skills to mentor and grow talent.
  • Strong customer-facing skills.
  • Ability to prioritize tasks and demands while delivering on time.
  • Strong Linux systems administration skills and experience with open-source technologies.
  • Understanding of Linux networking implementation and protocols.
  • Strong Ansible scripting (5+ years).
  • Python proficiency (5+ years).
  • Familiarity with Infrastructure as Code, CI/CD, and other DevOps concepts.
  • Ability to investigate performance issues up and down the infrastructure stack (software, network, server and storage).
  • HPC/AI performance expertise and practical knowledge of administering High-Performance Computing (HPC) technologies, including cluster resource management, job scheduling, Ethernet networking, and InfiniBand.
  • Proven expertise in solving Linux OS and user environment performance issues.
  • Ability to run scaling benchmark codes on large HPC clusters.
  • Ability to compile, optimize and run benchmark codes (C, Fortran).
  • Familiarity with several CPU and GPU compilers, including GCC, Intel, AMD (AOCC, ROCm), and NVIDIA (PGI OpenACC, CUDA).
  • HPC Scheduler knowledge (SLURM, PBS, LSF).
  • Ability to communicate clearly and effectively with team members and clients.

 

Preferred Skills

  • HPC Systems Management knowledge (Scyld Clusterware preferred).
  • Broad technology knowledge in:
    • HPC: Application, Systems Management, MPI, OS, Optimization, Hardware and data center needs.
    • AI & Cloud: Virtualization, Applications, Container Orchestration, Systems Management, and Hardware design.
    • Data: High-Performance Storage and Parallel file systems used in HPC/AI and Cloud.
  • HPC cluster system admin experience.
  • In-depth knowledge of Linux cluster technologies and optimization techniques.
  • Linux Certifications (e.g., RHCSA, RHCE).
  • Able to install, configure, and tune software applications and provide overall support.
  • Takes initiative to consult the application OEM/vendor on application operations, features, functions, and questions.
  • Outstanding verbal, written, and interpersonal communication skills.

 

Location 

This is a remote position in the United States.

 

Travel 

10-25% Required

 

Compensation & Benefits

The base pay range that the Company reasonably expects to pay for this position in Virginia is $148,000 - $175,000; the pay ultimately offered may vary based on business considerations, including job-related knowledge, skills, experience, and education. The position is bonus-eligible, and there are medical, dental, and vision benefits available. There is a 401(k) savings plan and other benefits, such as Paid Time Off, Life Insurance, and an Employee Assistance Plan.

 

Diversity and Inclusion Statement

SGH, together with its affiliates, is committed to creating a diverse environment that embraces differences and fosters inclusion.

 

Equal Opportunity Statement

We are an Affirmative Action/Equal Opportunity Employer and strongly committed to all policies which will afford equal opportunity employment to all qualified persons without regard to age, national origin, race, ethnicity, creed, gender, disability, veteran status, or any other characteristic protected by law.