The Linux Systems Specialist is responsible for providing robust high-performance computing (HPC) and Linux system administration support to a scientific research facility (HROKL). You will be critical in ensuring the smooth and effective operation of the bioinformatics and data science infrastructure, which supports a range of genomics, epidemiology, and machine learning projects. Working closely with IT, the Biosciences Sequencing laboratory lead, and scientific teams, you will maintain a resilient, secure, and high-performing computational environment.
Key Responsibilities
HPC and Systems Administration
Ensure the smooth and effective operation of the research facility's HPC and data science cluster.
Manage an HPC cluster with over 250 processing cores and a GPU cluster comprising NVIDIA GPUs (e.g., A100, V100).
Maintain 150 terabytes of clustered file storage (GlusterFS) and manage physical and virtual servers.
Install, configure, tune, and troubleshoot Linux storage and compute servers, including dedicated GPU servers.
Manage HPC file systems such as Ceph, Lustre, and JuiceFS.
Administer HPC network services (SSH, Samba, Apache, rsync, NFS) and database applications (PostgreSQL, MySQL).
Perform routine Linux system audits, upgrades, and preventive maintenance.
Software, Security, and Automation
Manage user accounts, system security, software updates, and scientific applications.
Use scripting (Bash and other shells, Perl, Python, R) to support data analysis, automate tasks, and strengthen team capacity.
Set up and deploy container-based solutions (Docker, Kubernetes, Singularity) for end users.
Improve the consistency of the server architecture by implementing configuration management tools such as GNU Guix.
Enhance automation of Next-Generation Sequencing (NGS) pipelines.
Data Management and Security
Perform backup and restore operations for research data and system configurations.
Collaborate with the Infrastructure team to maintain resilient data storage.
Oversee business continuity and disaster recovery for the Linux ecosystem.
User Support and Collaboration
Support Linux workstations and connections from Windows desktops to Linux servers.
Resolve issues for the HPC user community and provide relevant training to scientific teams.
Build architectural diagrams for the bioinformatics environment and provide comprehensive documentation.