CTSS 4 Data Movement Implementation
From TeraGrid Wiki
Based on the data movement requirements of the TeraGrid user community, the Data Working Group and the Software Integration Group list the required and optional CTSS 4 Data Movement kits below, followed by the recommended implementation instructions. The implementation may include capabilities that were not part of CTSS 3, so resource providers and other TeraGrid service providers will need to deploy these new capabilities to be compliant with CTSS 4.
The mandatory components are GridFTP, HPN-ssh, and the Data Movement Registration kit; RFT is an optional component.
- Data Movement Registration
- RFT (optional)
The companion document to this implementation, CTSS 4 Data Movement Capability Kit, describes the purpose and capabilities of the component kits. Other related documents include HPN-Openssh - Data-wg projects, CTSS 4 Data Management Capabilities, CTSS 4 Data Management Capability Implementation, and CTSS 4 Wide Area File System Capabilities.
|CTSS 4 Repo||http://software.teragrid.org/pacman/ctss4/|
The information services name for this kit is data-movement.teragrid.org.
A. Changed/Upgraded Component
|Software pre-requisites:||Globus 4.0.8|
|Monitoring GridFTP transfers||Speedpage|
GridFTP 4.0.8 is required to support the new TeraGrid CAs introduced in early 2009.
B. New Components
|Package Name I:||HPN-ssh/scp|
|Software pre-requisites:||OpenSSH 4.5-hpn-12 or greater|
|Monitoring HPN-scp Transfers|
The HPN-ssh kit (with HPN-scp) is a new and mandatory requirement for CTSS 4 compliance. Modifications to the underlying SSH2 code allow the statically defined internal flow-control buffers to be sized at run time, which removes a common bottleneck and makes HPN-scp a viable alternative for data transfer over the SSH protocol. It is fully interoperable with all SSH servers and clients (HPN and non-HPN) and shows faster downloads from non-HPN systems. Resource providers are advised to watch for congestion on the login nodes caused by HPN-scp processes.
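To illustrate why a fixed flow-control buffer becomes a bottleneck on long, fast links, the following minimal sketch applies the standard window/round-trip-time bound. The 64 KiB window, 50 ms round-trip time, and 1 Gbit/s link are example values chosen for illustration, not figures from this kit, and the function names are hypothetical.

```python
def max_throughput_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput (Mbit/s) when at most window_bytes
    can be in flight per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6


def window_for_link(bandwidth_mbps: float, rtt_seconds: float) -> int:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return round(bandwidth_mbps * 1e6 / 8 * rtt_seconds)


if __name__ == "__main__":
    static_window = 64 * 1024   # assumed example of a fixed buffer size
    rtt = 0.050                 # assumed 50 ms wide-area round trip
    link_mbps = 1000            # assumed 1 Gbit/s path

    print("ceiling with fixed window (Mbit/s):",
          max_throughput_mbps(static_window, rtt))
    print("window needed to fill the link (bytes):",
          window_for_link(link_mbps, rtt))
```

Under these assumed numbers the fixed window caps a single transfer far below the link rate, which is the kind of limit that run-time buffer sizing is meant to lift.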
|Package Name II:||CTSS-DataMovement-Registration|
|Software pre-requisites:||Perl, TGresid 2.0.0|
The Data-Movement Registration kit is a second new mandatory requirement for compliance with CTSS 4. The registration component collects registration details from all deployed kits and combines them into a form that can be published into the TeraGrid Public (MDS) Information Service; it also includes the Core kit's own registration information.
C. Optional Component
|Software components:||RFT web service container|
|Software pre-requisites:||Globus 4.0.8, GridFTP, Java, JDBC Database: PostgreSQL or MySQL|
RFT is a Web Services Resource Framework (WSRF) compliant web service that provides “job scheduler”-like functionality for data movement. Given a list of source and destination URLs (including directories or file globs), the service writes the job description into the database and then transfers the queued files.
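The persist-then-transfer pattern described above can be sketched as follows. This is not RFT's API or schema: it is a minimal illustration that uses SQLite from the Python standard library in place of PostgreSQL or MySQL, and the table layout, function names, and `copy` callback are all hypothetical.

```python
import sqlite3


def open_queue(path=":memory:"):
    """Open (or create) the hypothetical transfer-queue database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transfers ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " source TEXT NOT NULL,"
        " destination TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn


def submit(conn, pairs):
    """Record every (source, destination) pair before any transfer starts."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO transfers (source, destination) VALUES (?, ?)", pairs
        )


def run_queue(conn, copy):
    """Process pending transfers in submission order.

    `copy` is a caller-supplied function (source, destination) that performs
    the actual transfer and raises an exception on failure.
    """
    rows = conn.execute(
        "SELECT id, source, destination FROM transfers "
        "WHERE status = 'pending' ORDER BY id"
    ).fetchall()
    for transfer_id, source, destination in rows:
        try:
            copy(source, destination)
            status = "done"
        except Exception:
            status = "failed"
        with conn:
            conn.execute(
                "UPDATE transfers SET status = ? WHERE id = ?",
                (status, transfer_id),
            )


def statuses(conn):
    """Return the status of every recorded transfer, in submission order."""
    return [row[0] for row in
            conn.execute("SELECT status FROM transfers ORDER BY id")]
```

Because each request is written to the database before it runs, a restarted service could pick up any rows still marked `pending`, which is the property that makes the queue behave like a job scheduler.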
This section describes the steps to be performed to install or update the CTSS 4 components in this kit.
|Required CTSS 4 components.|
|1.||GridFTP||GridFTP||CTSS v4 repository|
|2.||CTSS-Data-Movement-Registration||TeraGrid Data Movement Kit registration||CTSS v4 repository|
|3.||HPN-ssh||High Performance SSH/SCP||CTSS v4 HPN-ssh|
|Optional CTSS 4 components.|
|4.||RFT||Reliable File Transfer Service||CTSS v4 repository|
This kit does not have any keys or macros that need to be defined.
Resources Required *
The following GIG resources will be required to prepare this kit for deployment and to maintain it during its operational lifespan.
- Software Integration
- User Services
- Science Gateways
Resource providers who choose to implement this kit on their systems will incur the following resource requirements.
- Software Deployment
- Software Configuration
- User Support
- System Load
The implementation of this capability kit described here is independent of the number of TeraGrid resources, users, science gateways, and peer grids in the TeraGrid community. The capability kit must be deployed on each resource that supports the kit, but there are no cross-resource dependencies and there are no RP tasks relating to new users, science gateways, or peer grids.
Verification & Validation
The TeraGrid operations team will need to configure Inca to run the appropriate set of Inca tests for any resources that choose to implement this capability kit. The data in the CTSS software and service registry may be used to automatically configure Inca to run these tests, or the configuration could be performed manually.
The TeraGrid documentation team will need to ensure that the TeraGrid documentation (User Info pages, User Portal as appropriate) reflects the availability of this capability kit on any resources that choose to implement it. The data in the CTSS software and service registry may be used to automatically list all appropriate resources, or the documentation changes could be made manually.
This work was supported by the National Science Foundation Office of Cyberinfrastructure, grant number 0503697, “ETF Grid Infrastructure Group: Providing System Management and Integration for the TeraGrid.”
Author Information *
Texas Advanced Computing Center
10100 Burnet Road, Austin TX 78758