CTSS 4 Data Movement Implementation

From TeraGrid Wiki

Abstract

Based on the data-movement requirements of the TeraGrid user community, the Data Working Group and the Software Integration Group list the required and optional CTSS 4 Data Movement kits, followed by the recommended implementation instructions. The implementation may include capabilities that were not part of CTSS 3, so resource providers and other TeraGrid service providers will need to deploy these new capabilities to be compliant with CTSS 4.

The mandatory components are GridFTP, HPN-ssh, and the Data Movement Registration kit; RFT is an optional component.

  • GridFTP
  • HPN-ssh
  • Data Movement Registration
  • RFT (optional)

The companion document to this implementation, CTSS 4 Data Movement Capability Kit, describes the purpose and capabilities of the component kits. Other related information includes the HPN-Openssh - Data-wg projects, CTSS 4 Data Management Capabilities, CTSS 4 Data Management Capability Implementation, and CTSS 4 Wide Area File System Capabilities pages.

CTSS 4 Repo http://software.teragrid.org/pacman/ctss4/

Implementation

The information services name for this kit is data-movement.teragrid.org.

A. Changed/Upgraded Component

Package Name: GridFTP
Package version: 4.0.8
Software components: globus-gridftp-server
Software pre-requisites: Globus 4.0.8
Implementation documentation http://software.teragrid.org/pacman/ctss4/gridftp/gridftp-4.0.8-r1/README.install.html
Monitoring GridFTP transfers Speedpage

GridFTP 4.0.8 is required in order to support the new TeraGrid CAs introduced in early 2009.

B. New Components

Package Name I: HPN-ssh/scp
Package version: 12
Software components: gsi-openssh
Software pre-requisites: Openssh 4.5-hpn-12 or greater
Implementation documentation http://www.teragridforum.org/mediawiki/index.php?title=Data-wg_projects#CTSS_V4_Implementation_Plan
Testing http://www.teragridforum.org/mediawiki/index.php?title=Data-wg_projects
Supplement http://www.psc.edu/networking/projects/hpn-ssh/
Monitoring HPN-scp Transfers


The HPN-ssh kit (with HPN-scp) is a new and mandatory requirement for CTSS 4 compliance. Modifications to the underlying SSH2 code allow the internal flow-control buffers, which are otherwise statically defined, to be sized at run time, eliminating a common bottleneck; this makes HPN-scp a viable alternative for data transfer over the SSH protocol. It is fully interoperable with all SSH servers and clients (HPN and non-HPN) and shows faster downloads from non-HPN systems. Resource providers are advised to watch for congestion on the login nodes caused by HPN-scp processes.
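To illustrate why a fixed flow-control buffer can become the bottleneck, the sketch below applies the standard window/round-trip-time bound: a sender cannot have more than one window of data in flight per round trip. This is only a back-of-the-envelope model, not HPN-ssh code; the function names and the example figures (a 64 KiB window, a 50 ms round trip, a 1000 Mbit/s path) are illustrative assumptions rather than values taken from this kit.

```python
def max_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on throughput (Mbit/s) when at most `window_bytes`
    may be in flight per round trip of `rtt_ms` milliseconds."""
    return window_bytes * 8 / (rtt_ms / 1000.0) / 1e6


def required_window_bytes(link_mbps: float, rtt_ms: float) -> int:
    """Bandwidth-delay product: the window needed to keep a path of
    `link_mbps` Mbit/s full at a round-trip time of `rtt_ms` ms."""
    return round(link_mbps * 1e6 / 8 * rtt_ms / 1000.0)


# Illustrative (assumed) figures, not measurements from any TeraGrid site.
fixed_window_limit = max_throughput_mbps(64 * 1024, 50.0)
needed_window = required_window_bytes(1000.0, 50.0)
print(f"64 KiB window at 50 ms RTT: at most {fixed_window_limit:.2f} Mbit/s")
print(f"Window needed for 1000 Mbit/s at 50 ms RTT: {needed_window} bytes")
```

Under these assumed figures the fixed window caps a wide-area transfer far below the capacity of the path, which is the kind of limit that sizing the buffers at run time is meant to remove.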

Package Name II: CTSS-DataMovement-Registration
Package version: 4.0.0
Software components: ctss-data-movement-registration
Software pre-requisites: Perl, TGresid 2.0.0
Implementation documentation http://software.teragrid.org/pacman/ctss4/ctss-core-registration/


The Data-Movement Registration kit is a second new mandatory requirement for compliance with CTSS 4. The registration component collects registration details from all deployed kits and combines them into a form that can be published into the TeraGrid Public (MDS) Information Service; it also includes the Core kit's own registration information.

C. Optional Component

Package name: RFT
Package version:
Software components: RFT web service container
Software pre-requisites: Globus 4.0.8, GridFTP, Java, JDBC Database: PostgreSQL or MySQL
Implementation documentation http://software.teragrid.org/pacman/ctss4/globus-wsrf/globus-wsrf-4.0.8-r1/README.install.html
http://software.teragrid.org/pacman/ctss4/globus-wsrf/globus-wsrf-4.0.8-r1/README.rft.config.html

 

RFT is a Web Services Resource Framework (WSRF) compliant web service that provides “job scheduler”-like functionality for data movement. After the user provides a list of source and destination URLs (including directories or file globs), the service writes the job description into the database and then transfers the queued files.
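The persist-then-transfer pattern described above can be sketched in a few lines. This is a conceptual illustration only and is not the RFT API: `sqlite3` stands in for the PostgreSQL or MySQL database, the `transfers` table and the `submit` and `run_pending` functions are hypothetical names, the example URLs are placeholders, and the `copy` callback stands in for the actual GridFTP transfer.

```python
import sqlite3


def submit(conn, pairs):
    """Persist a request as one row per (source, destination) URL pair.
    The table layout is hypothetical, chosen only for this sketch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transfers ("
        "id INTEGER PRIMARY KEY, src TEXT, dst TEXT, "
        "state TEXT DEFAULT 'pending')"
    )
    conn.executemany("INSERT INTO transfers (src, dst) VALUES (?, ?)", pairs)
    conn.commit()


def run_pending(conn, copy):
    """Work through pending rows in submission order and record the
    outcome of each transfer so the state survives a restart."""
    rows = conn.execute(
        "SELECT id, src, dst FROM transfers "
        "WHERE state = 'pending' ORDER BY id"
    ).fetchall()
    for row_id, src, dst in rows:
        try:
            copy(src, dst)  # stand-in for the real file transfer
            state = "done"
        except Exception:
            state = "failed"
        conn.execute(
            "UPDATE transfers SET state = ? WHERE id = ?", (state, row_id)
        )
        conn.commit()


# Demo with an in-memory database and a copy function that only records calls.
conn = sqlite3.connect(":memory:")
copied = []
submit(conn, [
    ("gsiftp://src.example.org/data/a", "gsiftp://dst.example.org/data/a"),
    ("gsiftp://src.example.org/data/b", "gsiftp://dst.example.org/data/b"),
])
run_pending(conn, lambda src, dst: copied.append((src, dst)))
states = [row[0] for row in conn.execute("SELECT state FROM transfers ORDER BY id")]
```

Because each request is written to the database before any data moves, a restarted service could re-read the rows still marked pending and continue, which is the main reason for queuing transfers through a database in this style of design.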

Deployment

Software Deployment

This section describes the steps to be performed to install or update the CTSS 4 components in this kit.

Step  Component                        Description                              Available At
Required CTSS 4 components
1.    GridFTP                          GridFTP                                  CTSS v4 repository
2.    CTSS-Data-Movement-Registration  TeraGrid Data Movement Kit registration  CTSS v4 repository
3.    HPN-ssh                          High Performance SSH/SCP                 CTSS v4 HPN-ssh
Optional CTSS 4 components
4.    RFT                              Reliable File Transfer Service           CTSS v4 repository

SoftEnv Configuration

This kit does not have any keys or macros that need to be defined.

Security Considerations


Resources Required *

GIG Resources

The following GIG resources will be required to prepare this kit for deployment and to maintain it during its operational lifespan.

  • Software Integration
  • Documentation
  • User Services
  • Science Gateways
  • Operations
  • Security

RP Resources

Resource providers who choose to implement this kit on their systems will incur the following resource requirements.

  • Software Deployment
  • Software Configuration
  • User Support
  • Maintenance
  • System Load

Scaling *

The implementation of this capability kit described here is independent of the number of TeraGrid resources, users, science gateways, and peer grids in the TeraGrid community. The capability kit must be deployed on each resource that supports the kit, but there are no cross-resource dependencies and there are no RP tasks relating to new users, science gateways, or peer grids.

Verification & Validation

The TeraGrid operations team will need to configure Inca to run the appropriate set of Inca tests for any resources that choose to implement this capability kit. The data in the CTSS software and service registry may be used to automatically configure Inca to run these tests, or the configuration could be performed manually.

Documentation

The TeraGrid documentation team will need to ensure that the TeraGrid documentation (User Info pages, User Portal as appropriate) reflects the availability of this capability kit on any resources that choose to implement it. The data in the CTSS software and service registry may be used to automatically list all appropriate resources, or the documentation changes could be made manually.

Acknowledgements *

This work was supported by the National Science Foundation Office of Cyberinfrastructure, grant number 0503697, “ETF Grid Infrastructure Group: Providing System Management and Integration for the TeraGrid.”


Author Information *

Kelly Gaither
Texas Advanced Computing Center
10100 Burnet Road, Austin TX 78758
513.471.8957

kelly@tacc.utexas.edu
