CTSS 3 Change Plan - Secure MDS Deployment for User Portal Information

From TeraGrid Wiki

Revision as of 14:52, 6 March 2007

The purpose of this form is to collect basic information from people planning changes to the TeraGrid system (CTSS, network, central services, etc.) that will allow others in the project to understand what is being planned and how it affects them.

People and Timing

Project Team

Project Lead(s): JP Navarro, Kelly Gaither
Area Director(s): Kelly Gaither, JP Navarro
Requestor(s):
Team members: Eric Roberts, Maytal Dahan, Neill Miller

Project Timeline

Describe important project timing elements.

Priority: Moderate
Change requested: June 2006
Change availability: December 2006 (testing); January 30, 2007 (production)
Change deployed: February 9, 2007
Change production: February 14, 2007

Description

Brief Description

Publish TeraGrid compute resource job queue and scheduling load information using CTSS 3 MDS4 Information Services for User Portal users.

Problem Statement

Describe the problem that this change addresses in terms that users would understand. This is the first place people will look to find out why the change is being made.

The TeraGrid User Portal currently collects compute resource job queue and scheduling load information using an application (GPIR) that User Portal development/support staff install and support on all compute resources. GPIR is not installed or supported by the resource providers themselves and essentially runs as a user-level application. This project will replace the GPIR data collection application with an information provider that is integrated into CTSS 3 MDS4 Information Services. The User Portal will continue to present the same information to users, but will collect it from CTSS 3 MDS4 Information Services instead.

Requirements

What criteria will you use to determine that the change has been successfully completed?

  • User Portal users will see no change in their ability to view compute resource job queue and load information
  • Compute resource job queue and load information is not public and access must be limited to the User Portal and other authorized users
  • User Portal support staff will no longer need to deploy and support GPIR on all TeraGrid compute resources
  • Compute resource job queue and load information publishing will leverage CTSS 3 MDS4 Information Services
  • Publishing this information for the User Portal will be developed and supported through the CTSS process, and deployed and supported in production by the resource provider administrators and the GIG Software Integration team
  • The central information provider used to pull data from MDS and push to GPIR will run in production as a cron job on a TeraGrid User Portal server

Affected user capabilities

The User Portal view of compute resource queue contents and scheduling load. The data itself will not change, just the method of collecting it.

Affected Project Areas

Hardware and Software Providers

No new hardware or software providers are part of this deployment.

  • User Portal team notes that a new information provider has been developed to pull data from MDS and push to GPIR but this provider will not be under CTSS 3 process control. It will be controlled by the User Portal team (ESR).

The new Secure MDS service will be implemented using existing CTSS 3 Globus software components.

The Globus MDS team will take the base GPIR code provided by the User Portal development/support team and convert it into an MDS information provider.

Ongoing fixes and improvements to the MDS information provider will be kept in the TeraGrid CVS repository.

Resource Providers (RPs)

Resource providers will need to:

  • Deploy a new CTSS 3 GT 4 WS container pre-configured to run a secure MDS service and the User Portal information provider
  • Follow a standard CTSS 3 Pacman install and perform install tests that verify the services are running properly
  • Verify that their resource information is showing up on the [testing User Portal]
  • Provide ongoing production support for the new service
  • Check off in the table below that their service is in production from their perspective
  • Report any service failure that is determined to be a software defect to help@teragrid.org, where it will be assigned to "TG GIG CTSS"

Some resource providers volunteered to deploy this service to assist with development and testing. The testing services are:


The table below describes the production status of Secure MDS deployments on RP resources.

Each service will be considered in production after two things happen:

  • An RP enters a date and their name indicating the service is deployed, and deployment tests passed.
  • The UP team verifies the data looks valid and the UP has switched to collecting data from MDS.

Resource Provider  Resource Name                  RP check-off date/person     UP check-off date
IU                 BigRed                         February 16, 2007/Mike Lowe
NCSA               Cobalt                         March 3, 2007/Doru Marcusiu
NCSA               Copper                         March 3, 2007/Doru Marcusiu
NCSA               Mercury                        March 4, 2007/Doru Marcusiu
NCSA               Tungsten                       March 3, 2007/Doru Marcusiu
ORNL               NSTG
PSC                BigBen                         March 2, 2007/Derek Simmel
PSC                Rachel                         March 2, 2007/Derek Simmel
Purdue             Lear (tg-login/PBS)            March 3, 2007/Preston Smith
Purdue             Condor (tg-gatekeeper/Condor)  March 3, 2007/Preston Smith
SDSC               BG/L                           Feb 19, 2007/John White
SDSC               DataStar
SDSC               DTF                            Feb 6, 2007/Tony Vu
TACC               LoneStar                       Feb 5, 2007/David Carver
UC/ANL             DTF/Viz                        Jan 30, 2007/Jason Hedden
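
The two conditions listed before the table can be read as a simple rule: a service counts as in production only when both check-off columns are filled in. The following is a minimal sketch of that rule, not part of the deployed software. The `Deployment` record and its field names are hypothetical, and the two sample rows are copied from the table as of this revision, where no UP check-off has been entered yet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record mirroring one row of the status table.
    provider: str
    resource: str
    rp_checkoff: Optional[str] = None  # "date/person" entered by the RP
    up_checkoff: Optional[str] = None  # date entered by the User Portal team


def in_production(d: Deployment) -> bool:
    """A service is in production only when both check-offs are present."""
    return bool(d.rp_checkoff) and bool(d.up_checkoff)


rows = [
    Deployment("IU", "BigRed", rp_checkoff="February 16, 2007/Mike Lowe"),
    Deployment("ORNL", "NSTG"),
]

for row in rows:
    state = "production" if in_production(row) else "pending"
    print(f"{row.provider} {row.resource}: {state}")
```

Under this reading, a row with only the RP check-off is deployed but still pending User Portal verification.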

User Portal

User Portal development/support staff will:

  • provide the base GPIR code that will form the basis for the new MDS information providers
  • assist with MDS information provider design
  • develop User Portal ability to discover Secure MDS services by querying the TeraGrid-wide Secure MDS service at mds.teragrid.org port 8448
  • develop User Portal ability to retrieve compute resource job queue and scheduling load information from the Secure MDS services at each resource
  • implement and test capability on the testing user portal
  • perform basic verification that information being collected and displayed is accurate and check-off as each RP Secure MDS service is considered functional
  • transition the production User Portal to the new Secure MDS method of data collection when the change goes into production
  • shut off all existing user-run information provider services running at each RP site
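
As an illustration of the discovery step in the list above, the sketch below builds a query against the TeraGrid-wide Secure MDS service. It makes several assumptions that this plan does not confirm: that the service is published at the GT4 default index service path (`/wsrf/services/DefaultIndexService`), that the GT4 `wsrf-query` command-line client is used, and that the XPath `/*` (return everything) is an acceptable query. The helper function names are hypothetical. The code only constructs the command; it does not contact any service.

```python
import shlex
from typing import List


def index_service_url(host: str, port: int) -> str:
    # Assumption: the GT4 default index service path is used; the actual
    # TeraGrid service path is not stated in this plan.
    return f"https://{host}:{port}/wsrf/services/DefaultIndexService"


def query_command(url: str, xpath: str = "/*") -> List[str]:
    # Assumption: the GT4 wsrf-query client is on PATH and the caller
    # holds credentials the secure service will accept.
    return ["wsrf-query", "-s", url, xpath]


url = index_service_url("mds.teragrid.org", 8448)
# Print a shell-quoted command line for inspection instead of running it.
print(shlex.join(query_command(url)))
```

The same helper could be pointed at a per-resource Secure MDS host once its address has been discovered from the aggregating service.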

User Services (ASTA/Help Desk)

During the testing and production rollout, User Services team members can visit the [testing User Portal] and compare the data collected through the Secure MDS service with the data displayed on the production User Portal to verify that it is correct.

User documentation doesn't need to be updated.

Network, Operations, and Security (NOS)

A new series of Inca tests to monitor the Secure MDS services at RPs will be deployed. A new Inca test to test the aggregating Secure MDS service at mds.teragrid.org:8448 will be deployed.

Issues with the Secure MDS services at RPs should be reported to help@teragrid.org and routed to "TG Ops <site>" for handling by the RP administrators. Issues with the mds.teragrid.org:8448 Secure MDS service should be reported to help@teragrid.org, routed to "TG GIG Services", and assigned to JP Navarro.
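
The routing rules in this plan can be summarized as a small lookup. This is a sketch for illustration only: the function name and parameters are hypothetical, and the order in which the rules are checked (aggregating service first, then software defects, then site operations) is an assumption, since the plan does not say which rule wins when more than one applies.

```python
AGGREGATOR = "mds.teragrid.org:8448"


def ticket_queue(service: str, site: str = "", software_defect: bool = False) -> str:
    """Return the help@teragrid.org queue name for a Secure MDS issue."""
    if service == AGGREGATOR:
        # Issues with the TeraGrid-wide aggregating service.
        return "TG GIG Services"
    if software_defect:
        # RP service failures determined to be software defects.
        return "TG GIG CTSS"
    # All other RP-level issues go to that site's operations queue.
    return f"TG Ops {site}"


print(ticket_queue("mds.teragrid.org:8448"))
print(ticket_queue("rp-secure-mds", site="NCSA"))
print(ticket_queue("rp-secure-mds", site="NCSA", software_defect=True))
```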

Software Integration (SI)

Software Integration will:

  • assist with MDS information provider design
  • prepare packaging and installation documentation
  • coordinate testing and production deployments
  • deploy and operate the aggregating Secure MDS service running on mds.teragrid.org port 8448
  • assist RPs with Secure MDS issues at RPs
  • receive and manage capability defects reported through help@teragrid.org and assigned to "TG GIG CTSS"
  • provide an Inca test that can check the status of both RP and TeraGrid-wide Secure MDS services

Production status of GIG operated services:


Service                GIG SI check-off date/person  UP check-off date/person
mds.teragrid.org:8448  Feb 16, 2007/JP Navarro

Project Areas That Are Not Affected

Move areas above to this section if they are not affected by this change.


Science Gateways (GW)

N/A.

Data, Visualization, Scheduling (DVS)

N/A.