Mice - Non-CTSS HPC Software Publishing
From TeraGrid Wiki
Some TeraGrid resource providers use the following methods to maintain and publish software information.
Others manually maintain the information on web pages.
HPC Software Repository
About half of the TeraGrid's RPs maintain and publish a comprehensive list of the HPC software they support using the HPC Software Repository. Several other RPs have entered some of their HPC software into this application but aren't actively maintaining it.
The HPC Software Repository is developed and maintained by NCSA. RPs maintain the data in a central database through a web interface, and users search or browse the information through web interfaces.
CTSS 4 Information Services
The TeraGrid CTSS 4 capability/kit registry information service, in production as of August 2007, includes information about CTSS software components.
RPs maintain the CTSS software information in human-editable flat files, and publish their information in a local MDS4 information service. The GIG operates TeraGrid-wide MDS4 information services that aggregate the information from all RP information services. Users, gateway developers, the TGUP, and TG User Documentation can access this information through Globus client tools and standard HTTP interfaces.
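The publish-locally, aggregate-centrally pattern described above can be illustrated with a minimal sketch. It does not use MDS4 or any Globus API. The per-RP records are plain dictionaries, and the RP names, package names, and the `aggregate` function are all hypothetical, chosen only to show the shape of the data flow.

```python
from typing import Dict, List


def aggregate(per_rp: Dict[str, List[dict]]) -> List[dict]:
    """Merge per-RP software records into one list, tagging each with its RP.

    Stands in for the central service that collects what each RP publishes
    locally. Input records are copied, so the RP-side data is left unchanged.
    """
    merged = []
    for rp_name in sorted(per_rp):
        for record in per_rp[rp_name]:
            tagged = dict(record)      # copy so the local record is not mutated
            tagged["rp"] = rp_name     # remember which RP published it
            merged.append(tagged)
    return merged


# Hypothetical content of two local information services.
local_services = {
    "rp-b": [{"name": "examplepkg", "version": "2.0"}],
    "rp-a": [
        {"name": "examplepkg", "version": "1.0"},
        {"name": "otherpkg", "version": "3.1"},
    ],
}

central_view = aggregate(local_services)
for rec in central_view:
    print(rec["rp"], rec["name"], rec["version"])
```

Each RP keeps ownership of its own records, and consumers only ever read the merged view.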
TACC Software Table
This section describes the issues or concerns with the current situation. Please add your name in parentheses to each concern you author. If you share a concern written by someone else, please add your name to the author's.
Data Maintenance Concerns
- CTSS includes some HPC software. RPs using the HPC Software Repository need to maintain information about that software in two places.
User Presentation Concerns
- TeraGrid users don't have a consistent way to view what non-CTSS HPC software is available at each RP (JP Navarro).
- The HPC Software Repository doesn't include SoftEnv information needed by users to access software (JP Navarro).
- We need a public machine readable interface of HPC software information usable by user documentation, the TG user portal, gateways, and peer grids (JP Navarro).
The following individuals are participating in the discussions and solution preparation. Add yourself if you'd like to participate.
- JP Navarro (facilitator/Information Services)
- Matt Heinzel (Operations/UFP AD)
- Maytal Dahan (TGUP team, user/consumer of this information)
- Praveen Nuthulapati (TGUP team, user/consumer of this information)
- Patrick Hurley (TGUP team, user/consumer of this information)
- Diana Diehl (TG documentation, user/consumer of this information)
- Michael Dwyer (TG documentation, user/consumer of this information)
- David Carver (TACC RP, content maintainer representative)
- Aaron Shelmire (PSC RP)
- Deb Nigra (PSC documentation)
- Mike Lowe (IU RP)
- Preston Smith (Purdue RP)
- Sandie Kappes (HPC Software Repository application owner)
To provide TeraGrid users with a single coordinated way to discover what non-CTSS software is available on each TeraGrid resource (JP).
To give TeraGrid RPs the flexibility to source and maintain HPC software information however they wish.
To minimize the effort required by consumers to publish the information through their preferred channels (Michael).
Data Maintenance Requirements
It is not necessary that all RPs maintain their data using the same tools, as long as the public interface is consistent (JP).
RPs need to have the flexibility to maintain their HPC software catalog using existing local tools and methods (JP).
User Presentation Requirements
Need a way to maintain the information that doesn't duplicate CTSS vs non-CTSS information (JP).
To minimize the effort required by RPs to achieve coordination (JP).
Here are some ideas that a third party software MDS could address: (David Carver)
- The current method of tracking third party software packages and versions is difficult, and the information about packages on RP systems is incomplete or obsolete.
- Reduce the duplication of recording software packages and versions in different locations and with different procedures.
- Dynamic updating of user third party software package and version information on RP systems.
- Scripts are available to parse SoftEnv and Modules configuration files to help sysadmins create simple MDS text files about each package.
- The MDS schema could be extended to include URL pointers to additional information.
- Has already been used to publish VTSS software packages at TACC and adopted by ANL/UC.
- Could be used for account allocations where PIs may require third party software packages and versions that are not available on all RP systems.
- Could be used by the metascheduler to schedule jobs that require a third party software package and version.
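The idea in the list above of scripts that turn Modules configuration into simple text records can be sketched as follows. This is not the TACC script. It assumes the common Environment Modules layout in which each modulefile sits at `name/version` under a module directory, and it takes a list of such relative paths as input instead of reading a filesystem. The `key=value` output layout, the resource name, and the package names are all hypothetical.

```python
from typing import NamedTuple, Optional


class Package(NamedTuple):
    name: str
    version: str


def parse_module_path(rel_path: str) -> Optional[Package]:
    """Split a 'name/version' modulefile path into a Package.

    Returns None for anything that does not look like name/version,
    including hidden control files such as '.version'.
    """
    parts = rel_path.strip().strip("/").split("/")
    if len(parts) != 2 or not all(parts):
        return None
    name, version = parts
    if version.startswith("."):
        return None
    return Package(name, version)


def to_text_record(pkg: Package, resource: str) -> str:
    # Hypothetical flat-text layout; the real MDS text file format may differ.
    return f"resource={resource} name={pkg.name} version={pkg.version}"


# Invented sample listing of paths relative to a module directory.
listing = ["mdtool/1.2.0", "fastfft/2.1", "fastfft/.version", "README"]

packages = [p for p in map(parse_module_path, listing) if p is not None]
records = [to_text_record(p, "example-resource") for p in sorted(packages)]
print("\n".join(records))
```

Because the records are derived from the same files that control what users can load, regenerating them on a schedule would keep the published list in step with what is actually installed.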
Input from Jay:
I think Maytal passed on the requirements to someone, but my message below listed the most crucial things users look for when exploring software options. I'll summarize a bit better here though:
- easy to search for an application by name (obvious) and list by name: most users will already know the app they want to use
- tags/metadata for each application for field of science/domain: so for example biologists can choose to list only bio apps, or maybe to look for alternatives to what they are using
- platform each application is on: since some users' resource preference is even stronger than their application preference
- version number for each application installed on each platform: since researchers often depend on the very latest enhancements
- short statement of purpose (summary) of each application and/or keywords for each application: so for example a biologist could not just list bio applications but look specifically for applications that do MD, or protein folding, or...
- whether application is an 'application' or a library (as many programming tools are)
If each application has sufficient metadata, we can make the TGUP interface flexible enough to provide domain views, resource views, etc.
BTW, the need for this became apparent when I served on an NSF workshop for CI for Biological and Chemical Engineering. We have very few users in those areas, and they asked for domain views that would list domain software, projects/allocations by other users in their domains, etc. The TGUP default view is a resources & services view, but we'll be adding domain views (Patrick, Praveen) as well as enhancing the personalizability/personal views (Patrick) over the coming months.
JP, I should add: please let me know when you've got it 'ready-ish' so we can start developing the TGUP interface. In fact, anything you can tell us in advance about how to poll this service for the data would be helpful. We'll probably want to start with a few simple views:
- list all software: table with name, version, platform, and fields of science, default sort by package name but also sortable by field and platform
- filter list by field of science, or by platform
- search for package by keywords (e.g. protein)
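The metadata and starter views requested in this input can be sketched as a small record type plus three functions. This is only an illustration of the requirements, not a proposed schema or the TGUP implementation. The `SoftwareEntry` type, its field names, the function names, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SoftwareEntry:
    name: str
    version: str
    platform: str
    fields_of_science: Tuple[str, ...]
    summary: str
    kind: str  # "application" or "library"


def list_all(entries: List[SoftwareEntry], sort_by: str = "name") -> List[SoftwareEntry]:
    """Return all entries, sorted by name (default), platform, or field."""
    keys = {
        "name": lambda e: e.name.lower(),
        "platform": lambda e: (e.platform.lower(), e.name.lower()),
        "field": lambda e: (sorted(e.fields_of_science), e.name.lower()),
    }
    return sorted(entries, key=keys[sort_by])


def filter_entries(entries: List[SoftwareEntry],
                   field: Optional[str] = None,
                   platform: Optional[str] = None) -> List[SoftwareEntry]:
    """Keep entries matching the given field of science and/or platform."""
    return [
        e for e in entries
        if (field is None or field in e.fields_of_science)
        and (platform is None or e.platform == platform)
    ]


def search(entries: List[SoftwareEntry], keyword: str) -> List[SoftwareEntry]:
    """Case-insensitive keyword match against name and summary."""
    needle = keyword.lower()
    return [e for e in entries
            if needle in e.name.lower() or needle in e.summary.lower()]


# Invented sample catalog.
catalog = [
    SoftwareEntry("mdtool", "1.2.0", "cluster-a", ("biology", "chemistry"),
                  "Molecular dynamics for protein folding studies", "application"),
    SoftwareEntry("fastfft", "2.1", "cluster-b", ("mathematics",),
                  "Fast Fourier transform routines", "library"),
    SoftwareEntry("AlignSeq", "0.9", "cluster-a", ("biology",),
                  "Protein and DNA sequence alignment", "application"),
]

for entry in list_all(catalog):
    print(entry.name, entry.version, entry.platform, ", ".join(entry.fields_of_science))
```

With these sample entries, `filter_entries(catalog, field="biology")` gives a simple domain view and `search(catalog, "protein")` covers the keyword case. Storing one entry per package, version, and platform keeps the per-platform version requirement explicit.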