Systems Engineer
NY, US
Overview
Penguin Solutions Managed Services provides dedicated, remote, Linux systems DevOps for complex, integrated environments involving high-performance computing, cloud, and enterprise systems. This position requires technical skills and the ability to understand, document, configure, administer, troubleshoot, and resolve issues in Linux environments. The role also requires hands-on break/fix capability and practical experience performing physical diagnostics, component replacement, and repair of HPC systems and related datacenter hardware. This is a customer-facing position on site at the customer’s data center in Orangeburg, NY.
Responsibilities
- Install, deploy, and administer HPC Clusters.
- Maintain, administer, and patch Linux operating systems and associated software.
- Work as part of a team to provide IT support and resolve errors.
- Analyze system log files and perform basic troubleshooting.
- Write Shell, Python, and Ansible scripts.
- Document processes in support of Systems Engineers; follow and improve procedures to meet SLAs.
- Support users with Move/Add/Change requests.
- Troubleshoot errors and determine root cause.
- Respond to system alerts and monitoring, sometimes after hours.
- Stay up to date on advancements in Linux operating systems and associated software.
- Perform break/fix activities for HPC and datacenter infrastructure, including diagnostics, component swap, firmware validation, and verification testing of repaired systems.
Qualifications
- Bachelor's degree in Computer Science, Information Technology, or a related field; or equivalent experience.
- UNIX/Linux certification or equivalent experience.
- 5+ years of hands-on experience with UNIX/Linux server environments.
- HPC Systems Management knowledge.
- Linux systems administration skills and experience with open-source technologies.
- Understanding of Linux networking implementation and protocols.
- Ability to work in ITIL operating models.
- Able to install, configure, and tune software applications and provide overall support.
- Requires the ability to stand and walk for extended periods of time, lift and move computer equipment up to 25 pounds, and perform hands-on hardware installation and maintenance tasks requiring fine motor coordination.
- Demonstrated hands-on break/fix experience and the ability to troubleshoot, repair, and validate HPC cluster nodes, GPU trays, power supplies, and other datacenter components.
- Experience with Kubernetes and containers is highly desired.
Preferred Skills
- HPC: Application, Systems Management, OS, Optimization, Hardware and data center needs.
- HPC/AI performance expertise and practical knowledge of administering High-Performance Computing (HPC) technologies, including cluster resource management, job scheduling, Ethernet networking, and InfiniBand.
- AI & Cloud: Virtualization, Applications, Container Orchestration, Systems Management, and Hardware design.
- Data: High-Performance Storage and Parallel file systems used in HPC/AI and Cloud.
- HPC cluster system admin experience.
- In-depth knowledge of Linux cluster technologies and optimization techniques.
- HPC Scheduler knowledge (SLURM, PBS, LSF).
- Initiative to consult the application OEM/vendor on application operations, features, functions, and questions.
- Familiarity with hands-on HPC hardware service, field replacement procedures, and break/fix workflows in high-availability datacenter environments.
Location
This is an onsite position in Orangeburg, NY.
Travel
Minimal travel may be required.
Compensation & Benefits
The base pay range that the Company reasonably expects to pay for this position in New York is $91,000 - $113,000; the pay ultimately offered may vary based on business considerations, including job-related knowledge, skills, experience, and education. The position is bonus-eligible, and medical, dental, and vision benefits are available. There is a 401(k) savings plan and other benefits, such as Paid Time Off, Life Insurance, and an Employee Assistance Plan.
Inclusion & Belonging Statement
We are committed to creating an inclusive environment that embraces differences and fosters belonging for all.
Equal Opportunity Statement
We are an Affirmative Action/Equal Opportunity Employer and strongly committed to all policies which will afford equal opportunity employment to all qualified persons without regard to age, national origin, race, ethnicity, creed, gender, disability, veteran status, or any other characteristic protected by law.