Running a Docker container on OpenHPC Cluster with multiple nodes
linuxgeek@...
Hello,
first of all, I must say I am a total newbie to OpenHPC and not a software engineer, so please excuse any dumb questions. However, I was not able to find any useful information on my questions below.

Imagine the following: until now I have been using a Docker container very successfully on a 2-CPU machine. It was created by a very talented software engineer. The container is built around an FEA solver and includes OpenMPI and OpenMP. So far I have not been able to compile this software bundle myself, so I am using the container on this 2-CPU machine. The container runs on both CPUs and distributes the model and simulation across both CPUs and their cores. So far, so good.

Here are my questions. Again, I am sorry if they are self-explanatory or dumb to some of you. To me, this is not trivial:

1.) If I set up a cluster with OpenHPC according to one of the recipes, will I be able to run this Docker container on multiple nodes?
2.) From the Docker documentation I learned that using Docker Swarm might not be possible because a 'pod' can only run on one node? [might not be a suitable question for this group...]
3.) Are modifications to the container necessary, or is the container able to run 'as is'? I have no idea... and I, myself, might not be able to solve this... [maybe also not the right place for this question]
4.) Is the OpenHPC suite able to 'distribute' this container on multiple nodes?

Any help, or just pointing me in the right direction, is appreciated. If it is possible with Docker and OpenHPC, I will buy a master and a second node (connected via IB). If not, I will have to dig into compiling the software myself and use OpenHPC, as I am sure that solution is possible (this is the 'usual' way of running the solver on a cluster, but for me it is a whole new undertaking).

Thank you in advance,
Mario.
|
Adrian Reber
I would say it is possible, but probably not as easy as you hope.
From what you describe, it sounds like everything is running in your container: mpirun and the processes mpirun starts.

The difference on an OpenHPC-based system is that mpirun is usually started on one of the compute nodes, and mpirun figures out how to start the processes on all other compute nodes and how all these processes communicate with each other via IB.

If mpirun is running in the container, I am not sure how it can reach the other compute nodes in your cluster. So when containers are involved, mpirun often starts one container for each process on each node. But I do not think that is possible with Docker.

I have written an article which describes how this can be done with Podman: https://podman.io/blogs/2019/09/26/podman-in-hpc.html

There are also multiple different container runtimes which are designed for the HPC use case.

So, yes, it is possible, but probably using different tools and a different container layout than what you currently have.

Adrian

On Tue, May 26, 2020 at 02:57:30AM -0700, linuxgeek@... wrote:
> Hello,
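The "one container per process" layout described above can be sketched as a small Python helper that only assembles the command line. The option set is modeled on the example in the linked Podman article as best recalled, so it should be checked against the article and the installed Podman and Open MPI versions. The image name, program path and hostfile name below are hypothetical placeholders, not taken from this thread.

```python
import shlex


def podman_mpirun_command(image, program, hostfile="hostfile",
                          tmpdir="/tmp/podman-mpirun"):
    """Build an mpirun command that starts one Podman container per MPI rank.

    mpirun itself runs on the host; each rank it launches is a
    `podman run ...` invocation wrapping the actual program.
    """
    podman = [
        "podman", "run",
        "--env-host",                # pass the host environment (set by mpirun) into the container
        "-v", f"{tmpdir}:{tmpdir}",  # share Open MPI's temporary directory with the container
        "--userns=keep-id",          # keep the invoking user's UID inside the container
        "--net=host",                # use the host network namespace
        "--pid=host",                # use the host PID namespace
        "--ipc=host",                # use the host IPC namespace (shared memory)
        image,
        program,
    ]
    mpirun = [
        "mpirun",
        "--hostfile", hostfile,
        "--mca", "orte_tmpdir_base", tmpdir,
    ]
    return mpirun + podman


if __name__ == "__main__":
    # Hypothetical image and program, for illustration only.
    cmd = podman_mpirun_command("registry.example/fea-solver:latest",
                                "/opt/solver/run")
    print(shlex.join(cmd))
```

The point of the layout is that the MPI launcher stays outside the container, so it can reach the other compute nodes the usual way, while the solver and its libraries stay inside the image.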
|
linuxgeek@...
Dear Mr. Reber!
Thanks for your ultra-quick answer. Yes, everything is done inside the container. That's why it is so convenient for me... :-) I will digest your answer and do some more digging.

However, this is the container: https://github.com/tianyikillua/code_aster_on_docker
It is really high performance stuff, well executed by tianyikillua.

Kind regards,
Mario.
|
Brayford, David <David.Brayford@...>
It would be possible to convert the Docker image to a Charliecloud or Singularity image and run it across multiple nodes with MPI.
I've done this successfully with Charliecloud.
A quick description, with a sample Slurm script, can be found on the LRZ webpage: https://doku.lrz.de/display/PUBLIC/Charliecloud+at+LRZ
I've been able to distribute a containerized MPI job across multiple nodes. However, you will need to configure the cluster correctly and set up Slurm to manage the job resources.
I would first try a simple containerized MPI "hello world" example to verify that the MPI job is running on different nodes. Then try your application.
David
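The suggested "hello world first" check can be sketched in Python as two small helpers: one renders a minimal Slurm batch script that launches a containerized program through Charliecloud's `ch-run`, and one inspects the job output to see whether ranks really landed on more than one node. This is a sketch under assumptions, not the LRZ script: the image directory, the program path and the "rank R of N on HOST" output format are hypothetical. Whether `srun` or `mpiexec` is the right launcher depends on how MPI and Slurm are configured on the cluster.

```python
import re


def render_sbatch(image_dir, program, nodes=2, tasks_per_node=1):
    """Render a minimal Slurm batch script that runs `program` inside a
    Charliecloud image directory, one container process per task."""
    lines = [
        "#!/bin/bash",
        "#SBATCH --job-name=mpi-hello",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        "",
        # ch-run syntax: ch-run IMAGE_DIR -- COMMAND
        f"srun ch-run {image_dir} -- {program}",
    ]
    return "\n".join(lines) + "\n"


# Assumed output format of the hello-world program, e.g.
# "Hello from rank 3 of 8 on node02"
HELLO_LINE = re.compile(r"rank (\d+) of (\d+) on (\S+)")


def hosts_seen(output):
    """Return the set of host names that appear in the job output."""
    return {match.group(3) for match in HELLO_LINE.finditer(output)}


def ran_on_multiple_nodes(output):
    """True if ranks reported from more than one distinct host."""
    return len(hosts_seen(output)) > 1


if __name__ == "__main__":
    print(render_sbatch("/scratch/images/hello", "/opt/hello_mpi",
                        nodes=2, tasks_per_node=4))
```

If `ran_on_multiple_nodes` is false for a job that requested two nodes, the ranks were most likely all started on one host, which is the situation to rule out before trying the real application.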
|
linuxgeek@...
Dear Mr. Brayford,
a big thank you for your answer as well. I read about converting a Docker image to Singularity just yesterday. I will keep that in mind too. Thank you.

Do you guys mind if I link this thread in the Code-Aster forum? I guess it would be valuable for them as well.

Kind regards,
Mario.
|
John Hearns
Hi Mario. I think you are getting very good advice here.

On your OpenHPC cluster there will be a batch or scheduling system. Is this named Slurm? I would convert the Docker container to Singularity format. Then Singularity will interface with Slurm easily.

Try running these commands on your cluster login node:

sinfo
module avail

On Tue, 26 May 2020 at 13:03, <linuxgeek@...> wrote:
> Dear Mr. Brayford,
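The conversion and launch steps mentioned above can be sketched as two command builders in Python. The Docker reference, the .sif file name and the program arguments are hypothetical placeholders; the real image reference would come from the container's own documentation. Treat this as an outline of the workflow, not a tested recipe for this particular solver.

```python
import shlex


def singularity_build_command(docker_ref, sif_path):
    """Command that converts a Docker image from a registry into a
    Singularity image file (SIF)."""
    return ["singularity", "build", sif_path, f"docker://{docker_ref}"]


def srun_exec_command(sif_path, program_args, ntasks):
    """Command that lets Slurm start `ntasks` tasks, each running the
    given program inside the Singularity image."""
    return ["srun", f"--ntasks={ntasks}",
            "singularity", "exec", sif_path, *program_args]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    build = singularity_build_command("registry.example/fea-solver:latest",
                                      "fea-solver.sif")
    run = srun_exec_command("fea-solver.sif", ["/opt/solver/run", "case.export"],
                            ntasks=8)
    print(shlex.join(build))
    print(shlex.join(run))
```

As with the other approaches in this thread, the MPI library inside the image generally has to be compatible with the launcher on the host, so a small hello-world run is a sensible first test.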
|
linuxgeek@...
Dear Mr. Hearns,
thank you, I think so too... :-) I do not have the necessary hardware yet. I will decide in the near future whether and what I want to buy... it is still a hobby... :-)

Thank you all,
Mario.
|