Network Architect - Embedded Software
US
Overview
Penguin's ICE Software products are used in the deployment, provisioning, management, and monitoring of some of the largest computational systems in the world. In this role as a Network Architect, you will join our remote-first Software Engineering team as a specialist focused on networking for cluster management technologies. This role combines deep networking expertise with software development collaboration to support our suite of tools for Linux-based AI and High-Performance Computing (HPC) clusters. The successful candidate will serve as the technical bridge between complex network infrastructure requirements and our software development initiatives, providing critical networking knowledge to enhance our cluster management solutions.
As a member of our engineering team, you will work closely with software developers and automation engineers to integrate advanced networking protocols, topologies, and architectures into our cluster management platform. You will also work with Solution Architects, Product Managers, and Product Architects to ensure alignment on networking initiatives. This position requires both hands-on technical skills and the ability to translate complex networking concepts into actionable software requirements and productizable implementations.
Responsibilities
Network Architecture & Design
- Design and implement high-performance network architectures for AI and HPC clusters, including InfiniBand, high-speed Ethernet (100/200/400GbE), and RDMA-based solutions
- Architect scalable network topologies optimized for low-latency, high-bandwidth cluster computing environments
- Develop network segmentation strategies and implement VLANs, VRFs, and ACLs for multitenant cluster environments
Cluster Integration & Optimization
- Collaborate with software engineering teams to integrate networking protocols and services into cluster management tools
- Optimize network performance for distributed computing workloads, including MPI, NCCL, and collective communication operations
- Support developers in implementing network-aware features (e.g., load balancing, QoS, fault tolerance)
Automation & Tooling
- Build scripts and tools for network configuration management, telemetry, and compliance using Python and Ansible
- Explore automation of network operating systems (Cumulus, SONiC, Arista) and SDN solutions to reduce manual intervention
Monitoring & Troubleshooting
- Develop network monitoring solutions and performance metrics for cluster health assessment
- Troubleshoot complex network issues affecting cluster performance and availability
- Create documentation for network configurations, procedures, and troubleshooting workflows
Cross-functional Collaboration
- Work embedded within the software engineering team to provide networking domain expertise
- Translate network requirements into software specifications and architectural decisions
- Support software developers in implementing network-aware cluster management features
Qualifications
- Bachelor's degree in Computer Science, Electrical Engineering, Network Engineering, or a related technical field, or equivalent relevant experience
- 5-7 years of experience in network engineering with a focus on HPC or data center environments
- Proven experience with high-performance networking technologies, including InfiniBand, high-speed Ethernet, and RDMA
- Strong background in Linux networking stack, including kernel networking, routing, and network interface management
- Experience with cluster networking and distributed computing environments
- Deep understanding of TCP/IP, BGP, OSPF, EVPN/VXLAN, and other advanced networking protocols
- Knowledge of HPC interconnect technologies (InfiniBand, Omni-Path, high-speed Ethernet) and their performance characteristics
- Experience with network automation tools and Infrastructure as Code (Ansible, Terraform, NETCONF)
- Understanding of software-defined networking (SDN) concepts and implementation
- Strong technical communication skills, with the ability to explain complex networking concepts to software engineers
- Collaborative mindset with experience working in cross-functional engineering teams
- Strong problem-solving abilities and a systematic approach to troubleshooting complex technical issues
- Self-motivated, with the ability to work independently while maintaining team alignment
Preferred Qualifications
- Experience with NVIDIA networking technologies, including Spectrum-X, UFM, and SHARP
- GPU networking and GPUDirect RDMA implementation experience
- Experience with network telemetry, packet analysis, and performance optimization tools
- Knowledge of precision timing protocols (PTP/IEEE 1588) for cluster synchronization
- Experience with AI/ML workload networking requirements and optimization
- Knowledge of job schedulers (SLURM, PBS) and their network integration requirements
- Understanding of storage networking and parallel file systems
- Familiarity with container networking and Kubernetes cluster networking
- Proficiency in Python, Go, Bash, or similar languages for network automation and tooling
Preferred Industry Certifications
- Professional networking certifications (CCIE, JNCIE, CCNP, or equivalent) highly valued
- NVIDIA networking certifications (NCP-AI Networking) highly valued
- Linux certifications (RHCE, LPIC) highly valued
Location
This is a remote role in the United States.
Compensation & Benefits
The base pay range that the Company reasonably expects to pay for this position in the United States is $166,000 - $190,000; the pay ultimately offered may vary based on business considerations, including job-related knowledge, skills, experience, and education. The position is bonus-eligible, and medical, dental, and vision benefits are available. There is a 401(k) savings plan and other benefits, such as Paid Time Off, Life Insurance, and an Employee Assistance Plan.
Inclusion & Belonging Statement
We are committed to creating an inclusive environment that embraces differences and fosters belonging for all.
Equal Opportunity Statement
We are an Affirmative Action/Equal Opportunity Employer and strongly committed to all policies which will afford equal opportunity employment to all qualified persons without regard to age, national origin, race, ethnicity, creed, gender, disability, veteran status, or any other characteristic protected by law.