Re: Running a Docker container on OpenHPC Cluster with multiple nodes


Adrian Reber
 

I would say it is possible, but probably not as easy as you hope.

From what you describe, it sounds like everything runs inside your
container: both mpirun and the processes mpirun starts. The difference
on an OpenHPC-based system is that mpirun is usually started on one of
the compute nodes, and mpirun then figures out how to start the
processes on all the other compute nodes and how all these processes
communicate with each other via IB.
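To make the multi-node part a bit more concrete: when mpirun is not started under a resource manager such as Slurm (which the OpenHPC recipes typically set up, and which then tells mpirun which nodes belong to the job), Open MPI can be given the list of nodes in a plain-text hostfile with one `hostname slots=N` line per node. The Python sketch below only prints such a file's contents; the node names and core counts are made up for illustration.

```python
from __future__ import annotations


def hostfile_text(nodes: dict[str, int]) -> str:
    """Return Open MPI hostfile contents: one 'hostname slots=N' line per node."""
    return "".join(f"{host} slots={slots}\n" for host, slots in nodes.items())


if __name__ == "__main__":
    # Hypothetical two-node cluster with 16 cores per node.
    print(hostfile_text({"node1": 16, "node2": 16}), end="")
```

Saved as `hostfile`, it would be passed to mpirun with `--hostfile hostfile` on one node; mpirun then launches the ranks on the other nodes itself (by default over ssh when no resource manager is involved).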

If mpirun is running in the container, I am not sure how it can reach
the other compute nodes in your cluster. So when containers are
involved, the usual approach is that mpirun starts a container for each
process on each node. I do not think that is possible with Docker.

I have written an article which describes how this can be done with
Podman: https://podman.io/blogs/2019/09/26/podman-in-hpc.html
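Roughly, this per-process model looks like the following: mpirun runs on the host, outside any container, and each rank it starts is a `podman run` of the solver image. The Python sketch below only assembles and prints such a command line. The image name, solver path and rank count are hypothetical, and the Podman flags are assumptions about a typical setup (host network, PID and IPC namespaces so the ranks can see each other), not necessarily the exact set used in the article. The Open MPI version on the host generally needs to be compatible with the one inside the image, and using IB from inside the container would additionally require passing the IB devices through, which is not shown here.

```python
from __future__ import annotations

import shlex


def hybrid_mpirun_cmd(np: int, hostfile: str, image: str, app: str) -> list[str]:
    """Build an argv in which mpirun runs on the host and launches one
    Podman container per MPI rank (the Podman flags are assumptions)."""
    return [
        # mpirun itself runs on the host, not inside a container.
        "mpirun", "--hostfile", hostfile, "-np", str(np),
        # Everything from here on is the program mpirun starts for each rank.
        "podman", "run", "--rm",
        "--env-host",                # pass the host environment (incl. Open MPI variables) into the container
        "--net=host",                # use the host network so ranks can reach each other
        "--pid=host", "--ipc=host",  # share PID/IPC namespaces, e.g. for shared-memory transport
        "--userns=keep-id",          # keep the calling user's UID inside the container
        image,
        app,
    ]


if __name__ == "__main__":
    # Hypothetical values: 2 nodes x 16 cores, made-up image name and solver path.
    cmd = hybrid_mpirun_cmd(
        32, "hostfile", "registry.example.com/fea-solver:latest", "/opt/solver/bin/solver"
    )
    print(shlex.join(cmd))
```

The point of this layout is that only the solver lives in the container; starting and wiring up the ranks across nodes stays with the host's mpirun (or the resource manager).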

There are also multiple different container runtimes which are designed
for the HPC use case.

So, yes, it is possible, but probably with different tools and a
different container layout than what you currently have.

Adrian

On Tue, May 26, 2020 at 02:57:30AM -0700, linuxgeek@... wrote:
Hello,

First of all, I must say I am a total newbie to OpenHPC and not a software engineer, so please excuse any dumb questions. However, I was not able to find any useful information on my questions below.

Imagine the following:
Until now, I have been using a Docker container very successfully on a 2-CPU machine. It was created by a very talented software engineer. This container is built around an FEA solver, including OpenMPI and OpenMP. So far I have not been able to compile this software bundle myself, which is why I am using the container on this 2-CPU machine. The container runs on both CPUs and distributes the model and simulation across both CPUs and their cores. So far, so good.

Here are my questions. Again, I am sorry if they are self-explanatory/dumb to some of you; to me, this is not trivial:
1.) If I set up a cluster with OpenHPC according to one of the recipes, will I be able to run this Docker container on multiple nodes?
2.) From the Docker documentation I learned that using Docker Swarm might not be possible because a 'pod' can only run on one node? [might not be a suitable question for this group...]
3.) Are modifications to the container necessary, or can the container run 'as is'? I have no idea... and I might not be able to solve this myself... [maybe also not the right place for this question]
4.) Is the OpenHPC suite able to 'distribute' this container across multiple nodes?

Any help, or just pointing me in the right direction, is appreciated. If it is possible with Docker & OpenHPC, I will buy a master and a second node (connected via IB). If not, I will have to dig into compiling the software myself and use OpenHPC, as I am sure this solution is possible (this is the 'usual' way of running the solver on a cluster, but for me, this is a whole new undertaking).

Thank you in advance,

Mario.

