CTSS 3 Change Plan - Secure MDS Deployment for User Portal Information
From TeraGrid Wiki
Revision as of 14:52, 6 March 2007
The purpose of this form is to collect basic information from people planning changes to the TeraGrid system (CTSS, network, central services, etc.) that will allow others in the project to understand what is being planned and how it affects them.
People and Timing
|Project Lead(s):||JP Navarro, Kelly Gaither|
|Area Director(s):||Kelly Gaither, JP Navarro|
|Team members:||Eric Roberts, Maytal Dahan, Neill Miller|
Describe important project timing elements.
|Change requested:||June 2006|
|Change availability:||December 2006 (testing); January 30, 2007 (production)|
|Change deployed:||February 9, 2007|
|Change production:||February 14, 2007|
Describe the problem that this change addresses in terms that users would understand. This is the first place people will look to find out why the change is being made.
The TeraGrid User Portal currently collects compute resource job queue and scheduling load information using an application (GPIR) that User Portal development/support staff install and support on all compute resources. GPIR is not installed or supported by the resource providers themselves and essentially runs as a user-level application. This project will replace the GPIR data-collecting application with an information provider that is integrated into CTSS 3 MDS4 Information Services. The User Portal will continue to present the same information to users, but will collect it from CTSS 3 MDS4 Information Services instead.
What criteria will you use to determine that the change has been successfully completed?
- User Portal users will see no change in their ability to view compute resource job queue and load information
- Compute resource job queue and load information is not public and access must be limited to the User Portal and other authorized users
- User Portal support staff will no longer need to deploy and support GPIR on all TeraGrid compute resources
- Compute resource job queue and load information publishing will leverage CTSS 3 MDS4 Information Services
- Publishing this information for the User Portal will be developed and supported through the CTSS process, and deployed and supported in production by the resource provider administrators and the GIG Software Integration team
- The central information provider used to pull data from MDS and push it to GPIR will run in production as a cron job on a TeraGrid User Portal server
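The pull-and-push cycle named in the last criterion can be sketched as follows. This is only an illustration under stated assumptions, not the User Portal team's provider: `fetch_from_mds` and `push_to_gpir` are hypothetical callables standing in for the real MDS query and GPIR update, and the function name `sync_once` is invented for this example.

```python
from typing import Callable, Dict, Iterable, List


def sync_once(
    resources: Iterable[str],
    fetch_from_mds: Callable[[str], Dict],      # hypothetical: queue/load data for one resource
    push_to_gpir: Callable[[str, Dict], None],  # hypothetical: stores that data in GPIR
) -> List[str]:
    """Run one pull/push cycle and return the names of resources that failed."""
    failed = []
    for name in resources:
        try:
            data = fetch_from_mds(name)
            push_to_gpir(name, data)
        except Exception:
            # One unreachable resource should not block the others.
            failed.append(name)
    return failed
```

A cron entry would then invoke a script that calls `sync_once` at a fixed interval. The interval and the script location are deployment choices that this plan does not specify.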
Affected user capabilities
The User Portal view of compute resource queue contents and scheduling load. The data itself will not change, just the method of collecting it.
Affected Project Areas
Hardware and Software Providers
No new hardware or software providers are part of this deployment.
- User Portal team notes that a new information provider has been developed to pull data from MDS and push to GPIR but this provider will not be under CTSS 3 process control. It will be controlled by the User Portal team (ESR).
The new Secure MDS service will be implemented using existing CTSS 3 Globus software components.
The Globus MDS team will take the base GPIR code provided by the User Portal development/support team and convert it into an MDS information provider.
Ongoing fixes and improvements to the MDS information provider will be kept in the TeraGrid CVS repository.
Resource Providers (RPs)
Resource providers will need to:
- Deploy a new CTSS 3 GT 4 WS container pre-configured to run a secure MDS service and the User Portal information provider
- Follow a standard CTSS 3 Pacman install and perform install tests that verify the services are running properly
- Verify that their resource information is showing up on the [testing User Portal]
- Provide ongoing production support for the new service.
- Check off in the table below that their service is in production from their perspective.
- When a service failure is determined to be a software defect, it will be reported to email@example.com where it will be assigned to "TG GIG CTSS".
Some resource providers volunteered to deploy this service to assist with development and testing. The testing services are:
The table below describes the production status of Secure MDS deployments on RP resources.
Each service will be considered in production after two things happen:
- An RP enters a date and their name indicating the service is deployed, and deployment tests passed.
- The UP team verifies the data looks valid and the UP has switched to collecting data from MDS.
|Resource Provider||Resource Name||RP check-off date/person||UP check-off date|
|IU||BigRed||February 16, 2007/Mike Lowe|
|NCSA||Cobalt||March 3, 2007/Doru Marcusiu|
|NCSA||Copper||March 3, 2007/Doru Marcusiu|
|NCSA||Mercury||March 4, 2007/Doru Marcusiu|
|NCSA||Tungsten||March 3, 2007/Doru Marcusiu|
|PSC||BigBen||March 2, 2007/Derek Simmel|
|PSC||Rachel||March 2, 2007/Derek Simmel|
|Purdue||Lear (tg-login/PBS)||March 3, 2007/Preston Smith|
|Purdue||Condor (tg-gatekeeper/Condor)||March 3, 2007/Preston Smith|
|SDSC||BG/L||February 19, 2007/John White|
|SDSC||DTF||February 6, 2007/Tony Vu|
|TACC||LoneStar||February 5, 2007/David Carver|
|UC/ANL||DTF/Viz||January 30, 2007/Jason Hedden|
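The two-part production rule stated above the table can be written as a small check. This is a sketch under assumptions: the `Deployment` class and its field names are invented for illustration, and the sample row copies the IU entry, whose UP check-off column appears empty in this revision.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Deployment:
    provider: str
    resource: str
    rp_checkoff: Optional[str] = None  # "date/person" entered by the RP
    up_checkoff: Optional[str] = None  # date entered by the User Portal team


def in_production(d: Deployment) -> bool:
    """A service counts as in production only when both check-offs are present."""
    return bool(d.rp_checkoff) and bool(d.up_checkoff)


bigred = Deployment("IU", "BigRed", rp_checkoff="February 16, 2007/Mike Lowe")
print(in_production(bigred))  # False: the UP check-off is still missing
```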
User Portal development/support staff will:
- provide the base GPIR code that will form the basis for the new MDS information providers
- assist with MDS information provider design
- develop User Portal ability to discover Secure MDS services by querying the TeraGrid-wide Secure MDS service at mds.teragrid.org port 8448
- develop User Portal ability to retrieve compute resource job queue and scheduling load information from the Secure MDS services at each resource
- implement and test capability on the testing user portal
- perform basic verification that the information being collected and displayed is accurate, and check off each RP Secure MDS service as it is considered functional
- transition the production User Portal to the new Secure MDS method of data collection once the services are in production
- shut off all existing user-run information provider services running at each RP site
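The discover-then-retrieve flow in the list above can be outlined as follows. This is a sketch only: `discover` and `retrieve` are hypothetical callables, since this plan does not describe the actual MDS4 query interface, and only the central address comes from the text.

```python
from typing import Callable, Dict, List

CENTRAL_INDEX = "mds.teragrid.org:8448"  # TeraGrid-wide Secure MDS service named in this plan


def collect_all(
    discover: Callable[[str], List[str]],  # hypothetical: central index -> per-resource service addresses
    retrieve: Callable[[str], Dict],       # hypothetical: one service -> queue and load data
) -> Dict[str, Dict]:
    """Ask the central index which services exist, then query each one."""
    results = {}
    for endpoint in discover(CENTRAL_INDEX):
        results[endpoint] = retrieve(endpoint)
    return results
```

Keeping discovery separate from retrieval means a newly deployed RP service would be picked up from the central index without a change on the User Portal side, assuming it registers there.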
User Services (ASTA/Help Desk)
During the testing and production rollout, User Services team members can visit the [testing User Portal] and compare the data being collected from the Secure MDS service with the data displayed on the production User Portal to verify that it is correct.
User documentation doesn't need to be updated.
Network, Operations, and Security (NOS)
A new series of Inca tests to monitor the Secure MDS services at RPs will be deployed. A new Inca test to test the aggregating Secure MDS service at mds.teragrid.org:8448 will be deployed.
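As a rough illustration of the most basic check such a test might start from, the sketch below tests whether a TCP port accepts connections. It is not an Inca reporter, and it does not authenticate or inspect any MDS content; the function name `port_is_open` is invented for this example.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For the aggregating service this would be called as `port_is_open("mds.teragrid.org", 8448)`. A real monitoring test would also need to verify that the service returns valid data.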
Issues with the Secure MDS services at RPs should be reported to firstname.lastname@example.org and routed to "TG Ops <site>" for RP administrator service. Issues with the mds.teragrid.org:8448 Secure MDS service should be reported to email@example.com and routed to "TG GIG Services" and assigned to JP Navarro.
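The routing rules in this plan can be summarized as a small lookup. This is a sketch: the function and parameter names are invented, and giving software defects precedence over the other rules is an assumption, since the plan does not say how the rules combine.

```python
CENTRAL_SERVICE = "mds.teragrid.org:8448"


def route_issue(service: str, site: str = "", software_defect: bool = False) -> str:
    """Return the ticket queue for an issue, following the rules in this plan."""
    if software_defect:
        # Assumption: confirmed software defects take precedence over other rules.
        return "TG GIG CTSS"
    if service == CENTRAL_SERVICE:
        return "TG GIG Services"
    # Any other Secure MDS service is treated as an RP service at the given site.
    return f"TG Ops {site}"
```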
Software Integration (SI)
Software Integration will:
- assist with MDS information provider design
- prepare packaging and installation documentation
- coordinate testing and production deployments
- deploy and operate the aggregating Secure MDS service running on mds.teragrid.org port 8448
- assist RPs with Secure MDS issues at RPs
- receive and manage capability defects reported through firstname.lastname@example.org and assigned to "TG GIG CTSS"
- provide an Inca test that can test the status of both RP and TeraGrid-wide Secure MDS services
Production status of GIG operated services:
|Service||GIG SI check-off date/person||UP check-off date/person|
|mds.teragrid.org:8448||Feb 16, 2007/JP Navarro|
Project Areas That Are Not Affected
Move areas above to this section if they are not affected by this change.
Science Gateways (GW)