OpenHPC TSC (regular) - Wed, 9/19/18 8:00am-9:00am #cal-reminder

tsc@lists.openhpc.community Calendar <tsc@...>
 

Reminder:
OpenHPC TSC (regular)

When:
Wednesday, 19 September 2018
8:00am to 9:00am
(GMT-07:00) America/Los Angeles

Where:
https://zoom.us/j/556149142

Description:

Hi there, 
 
OpenHPC Project is inviting you to a scheduled Zoom meeting. 
 
Topic: OpenHPC TSC (Regular)
Time: Mar 6, 2018 8:00 AM Pacific Time (US and Canada)
    Every week on Tuesday
   
   
Join from PC, Mac, Linux, iOS or Android: https://zoom.us/j/556149142
 
Or iPhone one-tap :
    US: +16465588656,,556149142#  or +16699006833,,556149142# 
Or Telephone:
    Dial(for higher quality, dial a number based on your current location): 
        US: +1 646 558 8656  or +1 669 900 6833  or +1 855 880 1246 (Toll Free) or +1 877 369 0926 (Toll Free)
    Meeting ID: 556 149 142
    International numbers available: https://zoom.us/zoomconference?m=5gVPnokLT25AiNsLiulEreZDJaSrbAEg
 

View Event.



OpenHPC TSC (regular) - Wed, 9/12/18 8:00am-9:00am #cal-reminder

tsc@lists.openhpc.community Calendar <tsc@...>
 

Reminder:
OpenHPC TSC (regular)

When:
Wednesday, 12 September 2018
8:00am to 9:00am
(GMT-07:00) America/Los Angeles

Where:
https://zoom.us/j/556149142

Description:

Hi there, 
 
OpenHPC Project is inviting you to a scheduled Zoom meeting. 
 
Topic: OpenHPC TSC (Regular)
Time: Mar 6, 2018 8:00 AM Pacific Time (US and Canada)
    Every week on Tuesday
   
   
Join from PC, Mac, Linux, iOS or Android: https://zoom.us/j/556149142
 
Or iPhone one-tap :
    US: +16465588656,,556149142#  or +16699006833,,556149142# 
Or Telephone:
    Dial(for higher quality, dial a number based on your current location): 
        US: +1 646 558 8656  or +1 669 900 6833  or +1 855 880 1246 (Toll Free) or +1 877 369 0926 (Toll Free)
    Meeting ID: 556 149 142
    International numbers available: https://zoom.us/zoomconference?m=5gVPnokLT25AiNsLiulEreZDJaSrbAEg
 

View Event.


TSC meeting cancelled for Sep 12

Karl W. Schulz
 

Hello fine TSC folks,

The TSC meeting for this week (Sep 12) is cancelled. We’ll plan on reconvening next week.

Thanks.

-k



Re: Upstreaming Ansible Automation

Renato Golin
 

On Wed, 5 Sep 2018 at 19:07, Karl W. Schulz <karl@...> wrote:
> Absolutely, just added. (and sorry, I should have done that previously)

awesome, thanks!


Re: Upstreaming Ansible Automation

Karl W. Schulz
 



On Sep 5, 2018, at 2:02 PM, Renato Golin <renato.golin@...> wrote:
> On Wed, 5 Sep 2018 at 18:46, Karl W. Schulz <karl@...> wrote:
>> We had the XSEDE/ansible quickstart guide listed under User Resources on the wiki, so I’ve added another link there to Paul’s ansible repo.
>
> Can you also add ours?
>
> https://github.com/Linaro/ansible-playbook-for-ohpc

Absolutely, just added.  (and sorry, I should have done that previously)

-k


Re: Upstreaming Ansible Automation

Renato Golin
 

On Wed, 5 Sep 2018 at 18:46, Karl W. Schulz <karl@...> wrote:
> We had the XSEDE/ansible quickstart guide listed under User Resources on the wiki, so I’ve added another link there to Paul’s ansible repo.

Can you also add ours?

https://github.com/Linaro/ansible-playbook-for-ohpc

thanks!


Re: Upstreaming Ansible Automation

Karl W. Schulz
 

We had the XSEDE/ansible quickstart guide listed under User Resources on the wiki, so I’ve added another link there to Paul’s ansible repo.

-k



On Sep 5, 2018, at 1:11 PM, Renato Golin <renato.golin@...> wrote:

> On Wed, 5 Sep 2018 at 17:50, Jeff ErnstFriedman <jernstfriedman@...> wrote:
>> I believe this is a good home for it, not sure if we currently have a space for OpenHPC 'enhancements' like we do for presentations.
>> If Karl doesn't have an existing location, I recommend we create a place called 'Community Contributions' where we list 'unofficial' OpenHPC enhancements/recipes.
>
> Sounds good to me. We'll also need to allow more people to edit the Wiki.





Re: Upstreaming Ansible Automation

Renato Golin
 

On Wed, 5 Sep 2018 at 17:50, Jeff ErnstFriedman <jernstfriedman@...> wrote:
> I believe this is a good home for it, not sure if we currently have a space for OpenHPC 'enhancements' like we do for presentations.
> If Karl doesn't have an existing location, I recommend we create a place called 'Community Contributions' where we list 'unofficial' OpenHPC enhancements/recipes.

Sounds good to me. We'll also need to allow more people to edit the Wiki.


Re: Upstreaming Ansible Automation

Jeff ErnstFriedman <jernstfriedman@...>
 

I believe this is a good home for it, not sure if we currently have a space for OpenHPC 'enhancements' like we do for presentations.

If Karl doesn't have an existing location, I recommend we create a place called 'Community Contributions' where we list 'unofficial' OpenHPC enhancements/recipes.



Jeff ErnstFriedman
2201 Broadway #M05, Oakland, CA 94612
mobile: 510.593.1367
skype: jeffrey.ernstfriedman
twitter: @namdeirf

On Wed, Sep 5, 2018 at 10:40 AM, Renato Golin <renato.golin@...> wrote:
On Wed, 5 Sep 2018 at 17:31, Paul Peltz Jr <peltz@...> wrote:
> Here are my cleaned up vanilla Ansible install scripts.
> https://github.com/lanl/ansible-ohpc-vanilla

Thanks Paul!

Karl/Jeff, what would be the best place to put this info? The ohpc wiki?

--renato





Re: Upstreaming Ansible Automation

Renato Golin
 

On Wed, 5 Sep 2018 at 17:31, Paul Peltz Jr <peltz@...> wrote:
> Here are my cleaned up vanilla Ansible install scripts.
> https://github.com/lanl/ansible-ohpc-vanilla

Thanks Paul!

Karl/Jeff, what would be the best place to put this info? The ohpc wiki?

--renato


Re: Upstreaming Ansible Automation

Paul Peltz Jr <peltz@...>
 

Here are my cleaned up vanilla Ansible install scripts.

https://github.com/lanl/ansible-ohpc-vanilla

Paul

On 9/5/18, 10:41 AM, "tsc@... on behalf of Renato Golin" <tsc@... on behalf of renato.golin@...> wrote:

Hi Paul,

I'll try to answer a few of the questions I can, but I'm not trying to
speak for other people.

On Wed, 5 Sep 2018 at 17:28, Paul Peltz Jr <peltz@...> wrote:
> I think one thing that we should discuss is how inclusive/exclusive we want to be of add on features within the Ansible recipe. For example, in Eric's tree there are GPU and login node option types.

Right, this is bound to happen. We don't have that problem today
because the only user of the existing recipes is TACC. Everyone else
has their own thing.

If we want to change that, then we'll have to do something different.
Karl's point is very valid that if the code is not in the main repo,
people will likely forget to update "the other one".

However, being in a separate repository has two main benefits:
1. As Jeff said, it allows us to have a sub-community, and
sub-maintainers, which can walk at a different pace, as long as that
syncs with the releases.
2. We can all fork that repo internally and have branches with our
local changes (GPU, distros, tools) before going upstream (or not).

While (1) is probably inevitable as communities grow (look at Linux!),
(2) can actually have a negative effect on validation, as the repo can
invisibly diverge from the main scripts.

If Ansible were the only way, it would be much easier, but that's not
at all what I am proposing.


> Also, it doesn't appear as if both versions have support for xcat, pbspro, OmniPath, SLES as additional options. Do we want an implementation that follows the recipe guide exactly for every combination that is supported?

As an end goal, yes. But I have also tried and I know how painful that
can be, so I'm not expecting it to be any time soon.

We're also investigating Mellanox Infiniband (our Lab's interconnect),
which has its own set of closed source issues. The fact that some of
our clusters use Multi-Host doesn't help either.

But how far do we want to go in all of that?

Do we just put everything in and let each site use what it needs? That
would violate the principle of "ship what's tested", because TACC will
never have *everything*.

The closer we get to real deployments, the harder these questions will be.


> I only deploy my system one way typically (warewulf + slurm).

Same here. We have too much else to worry about than testing xCAT or
PBSpro on Arm.


> I'm going to work on cleaning up my vanilla install branch and get it up on github to share as well. My initial review of the two projects seem reasonable as starting points,

Thanks! We should probably create a wiki page to hold all the existing
Ansible repositories, so that we know how many people are using them,
and how.


> but I immediately noticed the vanilla stray from the recipes to support one's own environment which is inevitable with these projects.

Yup. We tried to move everything into extra args for Ansible, but
inevitably, some things always leak into logic or environment.

Thanks!
--renato


Re: Upstreaming Ansible Automation

Renato Golin
 

Hi Paul,

I'll try to answer a few of the questions I can, but I'm not trying to
speak for other people.

On Wed, 5 Sep 2018 at 17:28, Paul Peltz Jr <peltz@...> wrote:
> I think one thing that we should discuss is how inclusive/exclusive we want to be of add on features within the Ansible recipe. For example, in Eric's tree there are GPU and login node option types.

Right, this is bound to happen. We don't have that problem today
because the only user of the existing recipes is TACC. Everyone else
has their own thing.

If we want to change that, then we'll have to do something different.
Karl's point is very valid that if the code is not in the main repo,
people will likely forget to update "the other one".

However, being in a separate repository has two main benefits:
1. As Jeff said, it allows us to have a sub-community, and
sub-maintainers, which can walk at a different pace, as long as that
syncs with the releases.
2. We can all fork that repo internally and have branches with our
local changes (GPU, distros, tools) before going upstream (or not).

While (1) is probably inevitable as communities grow (look at Linux!),
(2) can actually have a negative effect on validation, as the repo can
invisibly diverge from the main scripts.

If Ansible were the only way, it would be much easier, but that's not
at all what I am proposing.


> Also, it doesn't appear as if both versions have support for xcat, pbspro, OmniPath, SLES as additional options. Do we want an implementation that follows the recipe guide exactly for every combination that is supported?

As an end goal, yes. But I have also tried and I know how painful that
can be, so I'm not expecting it to be any time soon.

We're also investigating Mellanox Infiniband (our Lab's interconnect),
which has its own set of closed source issues. The fact that some of
our clusters use Multi-Host doesn't help either.

But how far do we want to go in all of that?

Do we just put everything in and let each site use what it needs? That
would violate the principle of "ship what's tested", because TACC will
never have *everything*.

The closer we get to real deployments, the harder these questions will be.


> I only deploy my system one way typically (warewulf + slurm).

Same here. We have too much else to worry about than testing xCAT or
PBSpro on Arm.


> I'm going to work on cleaning up my vanilla install branch and get it up on github to share as well. My initial review of the two projects seem reasonable as starting points,

Thanks! We should probably create a wiki page to hold all the existing
Ansible repositories, so that we know how many people are using them,
and how.


> but I immediately noticed the vanilla stray from the recipes to support one's own environment which is inevitable with these projects.

Yup. We tried to move everything into extra args for Ansible, but
inevitably, some things always leak into logic or environment.
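As a sketch of the "extra args" pattern described here, site-specific choices
can be passed on the command line instead of being baked into task logic.
The variable names below are illustrative only, not taken from any of the
repos linked in this thread:

```
# Hypothetical variables; real playbooks may spell these differently.
ansible-playbook site.yml \
  -e provisioner=warewulf \
  -e scheduler=slurm \
  -e enable_gpu=false
```

Anything that cannot be expressed as a variable like this tends to leak into
the playbook logic, which is the divergence problem being discussed.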

Thanks!
--renato


Re: Upstreaming Ansible Automation

Paul Peltz Jr <peltz@...>
 

I think one thing that we should discuss is how inclusive/exclusive we want to be about add-on features within the Ansible recipe. For example, in Eric's tree there are GPU and login node option types. Also, it doesn't appear as if either version has support for xCAT, PBS Pro, Omni-Path, or SLES as additional options. Do we want an implementation that follows the recipe guide exactly for every combination that is supported? That may just be a goal to move towards. I did an implementation of everything and tried to do SLES as well, but found it very difficult because of the license registration process required for SLES, so I gave up on it. I don't test all of the combos, though, because I typically only deploy my system one way (Warewulf + Slurm). I'm going to work on cleaning up my vanilla install branch and get it up on GitHub to share as well. From my initial review, the two projects seem reasonable as starting points, but I immediately noticed that the vanilla one strays from the recipes to support one's own environment, which is inevitable with these projects.

Paul

On 9/5/18, 9:59 AM, "tsc@... on behalf of Renato Golin" <tsc@... on behalf of renato.golin@...> wrote:

On Thu, 30 Aug 2018 at 16:20, Derek Simmel <dsimmel@...> wrote:
> Thanks for this most valuable and practical effort!
>
> Plan sounds good to me - let us know how best to participate and to contribute/communicate feedback.

Hi Derek,

Thanks for the reply. Just for reference, these are the Ansible
recipes that were mentioned in the call today:

https://github.com/takekato/ansible-playbook-for-ohpc

https://github.com/Linaro/ansible-playbook-for-ohpc/tree/production

https://github.com/XSEDE/CRI_XCBC

There are more people with other repos, too. Feel free to share here
your links, so we can all know what everyone is doing.

We should try to agree on a common strategy, so we can create a pull
request to the official OpenHPC repository and enable the upstream
recipe in the existing CI loop.

cheers,
--renato


Re: Upstreaming Ansible Automation

Renato Golin
 

On Thu, 30 Aug 2018 at 16:20, Derek Simmel <dsimmel@...> wrote:
> Thanks for this most valuable and practical effort!
>
> Plan sounds good to me - let us know how best to participate and to contribute/communicate feedback.

Hi Derek,

Thanks for the reply. Just for reference, these are the Ansible
recipes that were mentioned in the call today:

https://github.com/takekato/ansible-playbook-for-ohpc

https://github.com/Linaro/ansible-playbook-for-ohpc/tree/production

https://github.com/XSEDE/CRI_XCBC

There are more people with other repos, too. Feel free to share here
your links, so we can all know what everyone is doing.

We should try to agree on a common strategy, so we can create a pull
request to the official OpenHPC repository and enable the upstream
recipe in the existing CI loop.

cheers,
--renato


OpenHPC TSC (regular) - Wed, 9/5/18 8:00am-9:00am #cal-reminder

tsc@lists.openhpc.community Calendar <tsc@...>
 

Reminder:
OpenHPC TSC (regular)

When:
Wednesday, 5 September 2018
8:00am to 9:00am
(GMT-07:00) America/Los Angeles

Where:
https://zoom.us/j/556149142

Description:

Hi there, 
 
OpenHPC Project is inviting you to a scheduled Zoom meeting. 
 
Topic: OpenHPC TSC (Regular)
Time: Mar 6, 2018 8:00 AM Pacific Time (US and Canada)
    Every week on Tuesday
   
   
Join from PC, Mac, Linux, iOS or Android: https://zoom.us/j/556149142
 
Or iPhone one-tap :
    US: +16465588656,,556149142#  or +16699006833,,556149142# 
Or Telephone:
    Dial(for higher quality, dial a number based on your current location): 
        US: +1 646 558 8656  or +1 669 900 6833  or +1 855 880 1246 (Toll Free) or +1 877 369 0926 (Toll Free)
    Meeting ID: 556 149 142
    International numbers available: https://zoom.us/zoomconference?m=5gVPnokLT25AiNsLiulEreZDJaSrbAEg
 

View Event.




Re: Upstreaming Ansible Automation

Derek Simmel
 

Renato,
Takeharu,

Thanks for this most valuable and practical effort!

Plan sounds good to me - let us know how best to participate and to contribute/communicate feedback.

- Derek

On Aug 30, 2018, at 11:07 AM, Renato Golin <renato.golin@...> wrote:

Hi folks,

After ISC has passed and we have tested our recipes on different
clusters (Arm and Intel, physical and virtual), we are confident that
our recipes work well enough to be shared more widely.

It's still a work in progress and still has a long way to go, but if
it stays within our reach only, it won't get the volume of testing and
collaboration that would make it a true upstream project.

In light of that, I would like to propose that the OpenHPC organization
on GitHub hold the official version.

Right now, Takeharu, as the main author, has the "upstream" project:

https://github.com/takekato/ansible-playbook-for-ohpc

Linaro has a fork, which we use for our lab (branch "production"):

https://github.com/Linaro/ansible-playbook-for-ohpc

But the upstream process is not well defined, which makes it harder
for us to collaborate. I also have heard of at least two other groups
working with Ansible, and it would be a shame to end up with multiple
separate repositories.

Here's our proposal.

1. Move "upstream" to OpenHPC space, fork at Linaro and Fujitsu (as
well as other labs).

2. Keep upstream/master clean, and collaborate on specific branches
based on labs (ex. Linaro, Fujitsu) or features (ex. Mellanox IB,
stateless).

3. Have a set of rules and a cadence for merging pull requests. For
example, branches can be worked on freely, but merging them to master
requires testing from current users (labs).

4. Before releases, we freeze the current master into a release branch
(ex. 1.3.6).

5. During release validation, we test the Ansible recipes with the
release branch, merging fixes to master and back-porting to the
release branch until validated.

6. Releases will just point to their own branch in the document, so
that people are aware and can use the right branch with the right
release.

Given that there are different users and uses, we cannot enforce a
requirement that it needs to be tested at TACC before releasing. But
given that this is not a requirement to install OpenHPC, and it's a
separate project with a lot less impact for the overall community
(only those using Ansible), I don't see it as a huge problem.

Permission-wise, we should restrict the number of people who can merge
pull requests and create branches, so as to keep the history and
structure intact for previous releases and for stability reasons.

Since Git is a distributed version control system, we really don't
need to use the upstream repository to successfully collaborate across
labs. But it should remain as the "source of truth" when it comes to
OpenHPC Ansible Automation.

Any objections? Better ideas?

--
cheers,
--renato

---
Derek Simmel
Pittsburgh Supercomputing Center
dsimmel@...
+1 (412) 268-1035


Upstreaming Ansible Automation

Renato Golin
 

Hi folks,

After ISC has passed and we have tested our recipes on different
clusters (Arm and Intel, physical and virtual), we are confident that
our recipes work well enough to be shared more widely.

It's still a work in progress and still has a long way to go, but if
it stays within our reach only, it won't get the volume of testing and
collaboration that would make it a true upstream project.

In light of that, I would like to propose that the OpenHPC organization
on GitHub hold the official version.

Right now, Takeharu, as the main author, has the "upstream" project:

https://github.com/takekato/ansible-playbook-for-ohpc

Linaro has a fork, which we use for our lab (branch "production"):

https://github.com/Linaro/ansible-playbook-for-ohpc

But the upstream process is not well defined, which makes it harder
for us to collaborate. I also have heard of at least two other groups
working with Ansible, and it would be a shame to end up with multiple
separate repositories.

Here's our proposal.

1. Move "upstream" to OpenHPC space, fork at Linaro and Fujitsu (as
well as other labs).

2. Keep upstream/master clean, and collaborate on specific branches
based on labs (ex. Linaro, Fujitsu) or features (ex. Mellanox IB,
stateless).

3. Have a set of rules and a cadence for merging pull requests. For
example, branches can be worked on freely, but merging them to master
requires testing from current users (labs).

4. Before releases, we freeze the current master into a release branch
(ex. 1.3.6).

5. During release validation, we test the Ansible recipes with the
release branch, merging fixes to master and back-porting to the
release branch until validated.

6. Releases will just point to their own branch in the document, so
that people are aware and can use the right branch with the right
release.
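The release flow in steps 4-5 can be sketched with plain git commands.
This is a toy demonstration in a throwaway repository, not the actual
OpenHPC repo; the branch, file, and commit names are invented for
illustration:

```shell
# Stand-in for the upstream Ansible recipes repo.
demo=$(mktemp -d) && cd "$demo"
git init -q .
git config user.email tsc@example.org && git config user.name "OpenHPC TSC"
echo "recipes v1" > site.yml
git add site.yml && git commit -q -m "initial recipes"

# Step 4: freeze the current tip into a release branch (e.g. 1.3.6).
git branch 1.3.6

# Step 5: a fix lands on the main branch first...
echo "recipes v1 + slurm fix" > site.yml
git commit -q -a -m "fix slurm config"
fix=$(git rev-parse HEAD)

# ...and is back-ported to the release branch until it validates.
git checkout -q 1.3.6
git cherry-pick "$fix"
```

The release branch then only ever receives back-ports that master
already carries, so the two histories cannot silently diverge.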

Given that there are different users and uses, we cannot enforce a
requirement that it needs to be tested at TACC before releasing. But
given that this is not a requirement to install OpenHPC, and it's a
separate project with a lot less impact for the overall community
(only those using Ansible), I don't see it as a huge problem.

Permission-wise, we should restrict the number of people who can merge
pull requests and create branches, so as to keep the history and
structure intact for previous releases and for stability reasons.

Since Git is a distributed version control system, we really don't
need to use the upstream repository to successfully collaborate across
labs. But it should remain as the "source of truth" when it comes to
OpenHPC Ansible Automation.

Any objections? Better ideas?

--
cheers,
--renato