Solving complexity. Accelerating results.

At Penguin Solutions, we understand the boundless potential of technology and support our customers in turning cutting-edge ideas into outcomes—faster, and at any scale.

With over two decades of experience as trusted advisors, Penguin Solutions is an end-to-end technology company solving the industry’s most complex challenges in computing, memory, and LED solutions. Penguin designs, builds, deploys, and manages high-performance, high-availability enterprise solutions, allowing customers to achieve their breakthrough innovations.

Network Architect - Embedded Software

Date Posted:  Oct 3, 2025
Requisition ID:  1674
Location:  US

Brand:  Penguin Solutions

Overview

Penguin's ICE Software products are used in the deployment, provisioning, management, and monitoring of some of the largest computational systems in the world. In this role as a Network Architect, you will join our remote-first Software Engineering team as a specialist focused on networking within our cluster management technologies. This role combines deep networking expertise with software development collaboration to support our suite of tools for AI and High-Performance Computing (HPC) Linux-based clusters. The successful candidate will serve as the technical bridge between complex network infrastructure requirements and our software development initiatives, providing critical networking knowledge to enhance our cluster management solutions.

As a member of our engineering team, you will work closely with software developers and automation engineers to integrate advanced networking protocols, topologies, and architectures into our cluster management platform. You will also work closely with Solution Architects, Product Managers and Product Architects to ensure alignment on networking initiatives. This position requires both hands-on technical skills and the ability to translate complex networking concepts into actionable software requirements and productizable implementations.

Responsibilities

Network Architecture & Design

  • Design and implement high-performance network architectures for AI and HPC clusters, including InfiniBand, high-speed Ethernet (100/200/400GbE), and RDMA-based solutions
  • Architect scalable network topologies optimized for low-latency, high-bandwidth cluster computing environments
  • Develop network segmentation strategies and implement VLANs, VRFs, and ACLs for multitenant cluster environments

 

Cluster Integration & Optimization

  • Collaborate with software engineering teams to integrate networking protocols and services into cluster management tools
  • Optimize network performance for distributed computing workloads, including MPI, NCCL, and collective communication operations
  • Support developers in implementing network-aware features (e.g., load balancing, QoS, fault tolerance)

 

Automation & Tooling

  • Build scripts/tools for network configuration management, telemetry, and compliance using Python/Ansible
  • Explore automation of network OS (Cumulus, SONiC, Arista) and SDN solutions to reduce manual intervention

 

Monitoring & Troubleshooting

  • Develop network monitoring solutions and performance metrics for cluster health assessment
  • Troubleshoot complex network issues affecting cluster performance and availability
  • Create documentation for network configurations, procedures, and troubleshooting workflows

 

Cross-functional Collaboration

  • Work embedded within the software engineering team to provide networking domain expertise
  • Translate network requirements into software specifications and architectural decisions
  • Support software developers in implementing network-aware cluster management features

Qualifications

  • Bachelor's degree in Computer Science, Electrical Engineering, Network Engineering, or a related technical field, or equivalent relevant experience
  • Minimum of 5-7 years of experience in network engineering with a focus on HPC or data center environments
  • Proven experience with high-performance networking protocols including InfiniBand, high-speed Ethernet, and RDMA technologies
  • Strong background in Linux networking stack, including kernel networking, routing, and network interface management
  • Experience with cluster networking and distributed computing environments
  • Deep understanding of TCP/IP, BGP, OSPF, EVPN/VXLAN, and other advanced networking protocols
  • Knowledge of HPC interconnect technologies (InfiniBand, Omni-Path, high-speed Ethernet) and their performance characteristics
  • Experience with network automation tools and Infrastructure as Code (Ansible, Terraform, Netconf)
  • Understanding of software-defined networking (SDN) concepts and implementation
  • Strong technical communication skills with the ability to explain complex networking concepts to software engineers
  • Collaborative mindset with experience working in cross-functional engineering teams
  • Problem-solving abilities and systematic approach to troubleshooting complex technical issues
  • Self-motivated with the ability to work independently while maintaining team alignment

 

Preferred Qualifications

  • Experience with NVIDIA networking technologies, including Spectrum-X, UFM, and SHARP
  • GPU networking and GPUDirect RDMA implementation experience
  • Experience with network telemetry, packet analysis, and performance optimization tools
  • Knowledge of precision timing protocols (PTP/IEEE 1588) for cluster synchronization
  • Experience with AI/ML workload networking requirements and optimization
  • Knowledge of job schedulers (SLURM, PBS) and their network integration requirements
  • Understanding of storage networking and parallel file systems 
  • Familiarity with container networking and Kubernetes cluster networking
  • Proficiency in Python, Go, Bash, or similar languages for network automation and tooling

Preferred Industry Certifications

  • Professional networking certifications (CCIE, JNCIE, CCNP, or equivalent) highly valued
  • NVIDIA networking certifications (NCP-AI Networking) highly valued
  • Linux certifications (RHCE, LPIC) highly valued

Location

This is a remote role in the United States.

 

Compensation & Benefits

The base pay range that the Company reasonably expects to pay for this position in the United States is $166,000 - $190,000; the pay ultimately offered may vary based on business considerations, including job-related knowledge, skills, experience, and education. The position is bonus-eligible, and there are medical, dental, and vision benefits available. There is a 401(k) savings plan and other benefits, such as Paid Time Off, Life Insurance, and an Employee Assistance Plan.

 

Inclusion & Belonging Statement

We are committed to creating an inclusive environment that embraces differences and fosters belonging for all.

 

Equal Opportunity Statement

We are an Affirmative Action/Equal Opportunity Employer and strongly committed to all policies which will afford equal opportunity employment to all qualified persons without regard to age, national origin, race, ethnicity, creed, gender, disability, veteran status, or any other characteristic protected by law.