CTSS 4 TeraGrid Core Integration Capabilities

From TeraGrid Wiki

This capability kit definition was developed in the GIG Software Integration Area and defines a mandatory capability set for TeraGrid resources beginning with CTSS 4. This definition applies uniformly to all systems that provide TeraGrid services.

The recommended implementation of the capabilities defined in this kit is described in the companion document, CTSS 4 TeraGrid Core Integration Implementation.

Abstract

This document defines the purpose and design of the CTSS 4 TeraGrid Core Integration Kit. Systems that provide TeraGrid services (including computational resources) must support the capabilities defined here, in coordination with other resource providers, in order to be meaningfully integrated with other TeraGrid systems.

The TeraGrid Core Integration kit defines capabilities that allow coordinated operation of the TeraGrid system by TeraGrid staff. It also provides TeraGrid users with a set of fundamental “hooks” to conduct basic operations on TeraGrid systems in consistent ways. In both areas, the capabilities are focused on providing a layer of integration to otherwise diverse and specialized systems.

Purpose

The TeraGrid Core Integration kit defines a minimal set of capabilities that allow coordinated operation of the TeraGrid system by TeraGrid staff. It also provides TeraGrid users with a set of fundamental “hooks” to conduct basic operations on TeraGrid systems in consistent ways. In both areas, the capabilities are focused on providing a layer of integration to otherwise diverse and specialized systems.

Unlike other CTSS 4 kits, the Core Integration kit is mandatory for all systems that provide TeraGrid services. Because these systems are diverse, the capabilities defined by this kit must be as minimal as possible, and must reflect what we believe are universal requirements.

The TeraGrid Core Integration kit’s capabilities fall into four categories.

  • Security – Identity, Authentication, Authorization, Auditing
  • Information – Capability and Service Registry, System & Service Description, Usage Monitoring and Profiling
  • Verification & Validation – System Status and Testing
  • Software Deployment – Deployment Tools

The security capabilities in the Core Integration kit are intended to allow the TeraGrid community to support a variety of access models to TeraGrid services. These access models include individual user allocations, science gateway provisioning, cross-grid or multi-grid activities, and virtual organization provisioning. Identity and authentication must support both TeraGrid identities and authentication mechanisms (e.g., NCSA/SDSC/PSC CAs) and third-party identity/authentication mechanisms that TeraGrid has agreed to support (e.g., other CAs, including new types of CAs). Authorization and auditing mechanisms must support the access models mentioned above, some of which are non-trivial (e.g., science gateways, virtual organizations).

The information capabilities in the Core Integration kit are intended to provide a consistent mechanism for TeraGrid services and systems to describe themselves. Users can then discover and understand TeraGrid’s capabilities, and other system elements can use the information to configure themselves or adapt to current conditions. Usage monitoring and profiling capabilities are also included to support operations, capacity planning, and project reporting activities. The kit includes a system registry that offers a network interface for accessing local system information, along with the most basic information providers for that registry. Specifically, these providers publish information about the capability kits that have been installed and verified on the local system. Other kits may use the registry to publish information about their capabilities, status, and configuration.

The verification and validation capabilities in the Core Integration kit are intended to provide a means for independent verification of the capabilities on the local system. This currently includes the essential elements required for the Inca system to run tests on the local system and report results of those tests.

The software deployment capabilities in the Core Integration kit are intended to make it easier to deploy additional capability kits on the system. They also help software providers and integrators port and test software for use on the system. The Pacman software provides an easy-to-use, consistent means of obtaining additional CTSS software.

Current Applications

Security – Individual allocations that use GSI credentials to access multiple TG systems: LEAD, SCEC; science gateways that use GSI credentials: NanoHUB, LEAD, NVO; virtual organizations: SCEC; applications that combine TeraGrid and non-TeraGrid resources: CMS, ATLAS. The TG user portal is also integrated with the GSI infrastructure.

System information – The TeraGrid User Portal and user documentation need to show users which capability kits are on which TG resources. The User Portal uses the MDS4 Index Service to obtain queue status information from compute resources for display. The operations team needs to understand system usage patterns and profiles in order to plan and prioritize activities.

Verification & Validation – Software WG and operations teams need to understand current variances from expected behavior on specific systems. RPs need an independent certification of their own compliance with expectations. Users and user services need data on system readiness in order to plan activities.

Software Deployment – GIG packaging team and software WG already use Pacman tools for software deployment and configuration.

Architectural Considerations

The TeraGrid Core Integration kit provides the capabilities that are necessary to integrate a resource with the rest of the TeraGrid. Most of the kit is focused on providing pieces of TeraGrid's overall system architecture. The key elements of this architecture are as follows.

  • Support for TeraGrid's central accounting system, including the propagation of new accounts and GSI identities and reporting usage against a TeraGrid user allocation.
  • Support for the Grid Security Infrastructure (GSI), a PKI-based mechanism that allows for community-wide identities and authentication while supporting local authorization policies.
  • Support for a standard SOAP/XML information service that publishes descriptive information about the resource and its capabilities, including the CTSS capabilities available on the resource.
  • Support for a TeraGrid-wide, independent verification and validation mechanism.

Security

The TeraGrid community uses a TeraGrid-wide allocations process for deciding which science projects merit use of TeraGrid resources and how much each project should be allocated during a given use period. Once a project's use of TeraGrid resources has been approved, project leaders can reasonably expect a TeraGrid-wide mechanism for obtaining accounts on TeraGrid systems and administering sub-accounts. This kit defines this TeraGrid-wide mechanism, detailing how TeraGrid resource providers interact with TeraGrid's central accounting system to create and maintain accounts for TeraGrid users.

One of TeraGrid's key value-added features is the "single sign-on" capability, which allows users to authenticate to one TeraGrid system and then use others without re-authenticating. Implemented using a GSI-enabled SSH service and the gsissh client, TeraGrid's single sign-on makes use of the GSI "delegation" mechanism, whereby authenticating to the first system results in a proxy certificate being created on that system that can be used to log in to other systems for a short time without additional authentication. For this to work, all TeraGrid systems must (a) maintain grid-mapfiles that map users' GSI "distinguished names" (DNs, or GSI identity strings) to local userids, and (b) configure their services to trust a common set of "certificate authorities" (agencies that issue and certify the validity of GSI credentials).
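
For readers who maintain grid-mapfiles by hand, the sketch below shows how the DN-to-userid lookup in item (a) can be done. It assumes the conventional Globus grid-mapfile layout: one mapping per line, with the DN in double quotes followed by one or more comma-separated local userids. The DNs and userids in the sample are made up, and `parse_gridmap` and `local_userids` are illustrative names, not part of any TeraGrid tool.

```python
import shlex

# Made-up sample in the conventional grid-mapfile layout.
SAMPLE_GRIDMAP = """
# "distinguished name"                          local userid(s)
"/DC=org/DC=example/OU=People/CN=Jane Doe"      jdoe
"/DC=org/DC=example/OU=People/CN=John Roe"      jroe,jroe_sci
"""


def parse_gridmap(text):
    """Return a dict mapping each GSI DN to the list of local userids it may use."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # shlex keeps the quoted DN (which may contain spaces) as one token.
        tokens = shlex.split(line)
        if len(tokens) < 2:
            raise ValueError(f"malformed grid-mapfile line: {raw!r}")
        dn = tokens[0]
        # Rejoin the remainder so "a, b" and "a,b" are treated the same.
        users = [u for u in "".join(tokens[1:]).split(",") if u]
        mapping.setdefault(dn, []).extend(users)
    return mapping


def local_userids(mapping, dn):
    """Return the local accounts an authenticated DN maps to (empty if unmapped)."""
    return mapping.get(dn, [])


gridmap = parse_gridmap(SAMPLE_GRIDMAP)
print(local_userids(gridmap, "/DC=org/DC=example/OU=People/CN=John Roe"))
# prints ['jroe', 'jroe_sci']
```

A DN that maps to several userids lets one GSI identity log in to more than one local account. Which account is used for a given session is then up to the service and local policy.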

The same mechanism that is used for creating and maintaining accounts (based on TeraGrid's central accounting system) also provides a mechanism for maintaining the grid-mapfile that maps GSI DNs to local userids. The TeraGrid Core Integration kit also defines mechanisms for maintaining the configuration data for trusted certificate authorities.
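
Maintaining trusted-CA configuration data largely means keeping a directory of CA files consistent across resources. The sketch below assumes the common Globus layout (for example `/etc/grid-security/certificates`). In that layout each trusted CA has a `<hash>.0` certificate file and a matching `<hash>.signing_policy` file. The sketch only reports CAs whose signing policy is missing. It is a hypothetical consistency check, not the kit's actual update mechanism, and the hash names in the example are made up.

```python
import tempfile
from pathlib import Path


def cas_missing_signing_policy(ca_dir):
    """Return the hashes of CA certificates (<hash>.0) lacking <hash>.signing_policy."""
    missing = []
    for cert in sorted(Path(ca_dir).glob("*.0")):
        # "1a2b3c4d.0" -> "1a2b3c4d.signing_policy"
        if not cert.with_suffix(".signing_policy").exists():
            missing.append(cert.stem)
    return missing


# Example with made-up hash names: one complete CA entry, one incomplete.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("1a2b3c4d.0", "1a2b3c4d.signing_policy", "9f8e7d6c.0"):
        (Path(tmp) / name).write_text("placeholder\n")
    print(cas_missing_signing_policy(tmp))  # prints ['9f8e7d6c']
```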

Information Services

TeraGrid has adopted the prevailing information services model for publishing descriptive information about its resources and their services. TeraGrid uses Web services (SOAP, WSDL) as the basic access model, and XML as the base language for this information. Both industry and academia make heavy use of Web services technologies, so there are many development and management tools available that can make use of TeraGrid's information services. In order to support both the polling (information pull) model and the notification (information push) model, the WS-Notification mechanism is supported as well as ordinary SOAP access. WS-Notification is also supported by a number of popular development tools. By basing TeraGrid's information services on popular Web services mechanisms, we hope that TeraGrid users and applications will be able to make use of TeraGrid's information services with minimal effort.

The TeraGrid Core Integration kit provides each TeraGrid resource with a Web services-based information service that is used to publish descriptive information about the resource. A common TeraGrid schema is used to publish data about CTSS kits and CTSS software and services on the resource. This schema includes elements from the GLUE schema family, which is used by a number of other systems and applications in the Web services and Grid communities.

For scenarios where access to information must be authenticated (e.g., on some TeraGrid resources the queue status is only provided to authenticated users) the GSI mechanism is used. (See description in the Security section above.)

Verification & Validation

The TeraGrid community employs a common mechanism for verification and validation testing. This mechanism is used to verify that each TeraGrid resource is providing the capabilities that are documented and to validate the appropriate behavior of those capabilities independently of the resource provider's internal testing.

The TeraGrid Core Integration kit provides the minimal environment needed to support this independent verification and validation testing mechanism so that the mechanism may be used uniformly on all TeraGrid resources.

Software Deployment

The Pacman mechanism is used by several other major Grid infrastructures, most notably the U.S. Open Science Grid (OSG) and Enabling Grids for e-Science across Europe (EGEE). TeraGrid also uses Pacman in the hope of finding ways to leverage existing Pacman repositories used by OSG, EGEE, and other major Grid systems.

Use Scenarios

Unlike most other CTSS kits, the TeraGrid Core Integration kit includes a number of capabilities that are oriented more toward meeting the needs of system administrators and TeraGrid personnel than toward meeting end user needs. This section is divided into capabilities that are likely to be useful to both end users and administrators, and capabilities that are primarily of interest to administrators.

Scenarios for End Users and Administrators

Identify a TeraGrid system with a specific CTSS capability

TeraGrid users and potential users who are investigating use of TeraGrid frequently need to identify TeraGrid systems that provide specific CTSS capabilities. For example, a scientist who is planning to run a distributed parallel data analysis application on TeraGrid will need to know which TeraGrid systems have the distributed MPI (MPICH-G2) capability.

The TeraGrid Core Integration kit includes a software and service registry. Each resource has an associated registry service, and each registry service is in turn registered with a central registry (info.teragrid.org). The registry contains XML data about the CTSS kits supported by each resource, and this data can be browsed or searched. The resource provider registers each CTSS kit, service, and software component in the local registry when the software is installed, and this data is propagated to the central registry. TeraGrid's user documentation (and/or the TeraGrid User Portal) [need user services to complete this] provides a viewing interface for the software and service registry data.

To find a TeraGrid resource that provides a specific CTSS kit, the user would use the TeraGrid user documentation's viewing interface to view the data in the software and service registry. This interface will clearly identify each resource that supports each CTSS capability kit. If necessary, the user can view the description of each capability kit by following hyperlinks to the capability kit documentation.
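
The lookup that such a viewing interface performs can be sketched in a few lines. The XML element and attribute names (`Registry`, `Resource`, `Kit`, `status`), the kit name `distributed-mpi`, and the resource names are all hypothetical stand-ins, not the actual TeraGrid/GLUE-based registry schema. The status values are the ones this document uses for kit registration ("not ready", "testing", "production").

```python
import xml.etree.ElementTree as ET

# Hypothetical registry snapshot; real schema and resource names differ.
SAMPLE_REGISTRY = """
<Registry>
  <Resource name="cluster-a.example.org" site="Example RP A">
    <Kit name="core-integration" status="production"/>
    <Kit name="distributed-mpi" status="production"/>
  </Resource>
  <Resource name="cluster-b.example.org" site="Example RP B">
    <Kit name="core-integration" status="production"/>
    <Kit name="distributed-mpi" status="testing"/>
  </Resource>
</Registry>
"""


def resources_with_kit(registry_xml, kit_name, statuses=("production",)):
    """Return the names of resources that register `kit_name` with an accepted status."""
    root = ET.fromstring(registry_xml.strip())
    found = []
    for resource in root.iter("Resource"):
        if any(kit.get("name") == kit_name and kit.get("status") in statuses
               for kit in resource.findall("Kit")):
            found.append(resource.get("name"))
    return sorted(found)


print(resources_with_kit(SAMPLE_REGISTRY, "distributed-mpi"))
# prints ['cluster-a.example.org']
```

Passing `statuses=("production", "testing")` would also include resources where the kit is still in friendly-user testing.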

Discover the CTSS capabilities on the TeraGrid resource

In some cases, TeraGrid users and potential users who are investigating use of TeraGrid may need to identify the CTSS capabilities that are provided by a specific TeraGrid resource. For example, a scientist who has already found that a specific resource is suitable for her application's runtime operation (available CPU and shared memory, node interconnect speed, etc.) may want to know what data movement capabilities are also provided on that system so that the data produced by the application can be moved to a different system for broader accessibility by her colleagues.

This information also comes from the Core Integration kit's software and service registry. Each resource's registry service publishes XML data about the CTSS kits, software, and services installed on that resource. This data is propagated to the central registry (info.teragrid.org), where it can be browsed or searched through the viewing interface in TeraGrid's user documentation (and/or the TeraGrid User Portal).

To find the CTSS capabilities supported by a specific TeraGrid resource, the user would use the TeraGrid user documentation's viewing interface to view the data in the software and service registry, look up the specific TeraGrid resource in the viewer interface, and observe the list of supported CTSS kits provided there. This interface will clearly identify each CTSS capability kit supported on the resource. If necessary, the user can view the description of each capability kit by following hyperlinks to the capability kit documentation. The user can also view the details of the software components (e.g., softenv access keys) and network service interfaces (e.g., service endpoints) that implement the kit on that resource.
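
A sketch of the per-resource view follows. For each kit it pulls out the status, the software components with their softenv keys, and the service endpoints that implement the kit. The element and attribute names, the resource name, versions, softenv key, and endpoint URL are hypothetical placeholders, not the real registry schema or data.

```python
import xml.etree.ElementTree as ET

# Hypothetical single-resource record; names and values are placeholders.
SAMPLE_RESOURCE = """
<Resource name="cluster-a.example.org" site="Example RP">
  <Kit name="data-movement" version="4.0" status="production">
    <Software name="globus-url-copy" version="4.0.1" softenvKey="+globus-4.0.1"/>
    <Service name="gridftp" version="4.0.1"
             endpoint="gsiftp://gridftp.example.org:2811/"/>
  </Kit>
  <Kit name="distributed-mpi" version="4.0" status="testing"/>
</Resource>
"""


def describe_resource(resource_xml):
    """Summarize each registered kit: status, software (with softenv keys), services."""
    resource = ET.fromstring(resource_xml.strip())
    summary = {}
    for kit in resource.findall("Kit"):
        summary[kit.get("name")] = {
            "status": kit.get("status"),
            "software": [(s.get("name"), s.get("version"), s.get("softenvKey"))
                         for s in kit.findall("Software")],
            "services": [(s.get("name"), s.get("endpoint"))
                         for s in kit.findall("Service")],
        }
    return summary


for kit_name, info in describe_resource(SAMPLE_RESOURCE).items():
    print(kit_name, info["status"], info["services"])
```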

Discover the resource name and resource provider for the TeraGrid resource

TeraGrid users who make frequent use of multiple TeraGrid resources in their daily work may sometimes find that they "lose track" of which TeraGrid system they are currently logged into. The single sign-on capability makes it especially easy to move from one resource to another, and command shell prompts typically do not identify the system by name (though they can be customized to do so).

The TeraGrid Core Integration kit provides a command-line tool, "tgwhereami," that outputs the system's common name and the resource provider site that hosts the system. The user enters "tgwhereami" at the command prompt, and the tool responds with the common name of the resource. (For example, dtf.uc.teragrid.org is the DTF cluster at the University of Chicago.)
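
This document does not describe how the real tool is implemented or what its exact output looks like. As a hedged illustration of the idea, the sketch below maps the host's fully qualified name to a resource common name and provider site using a hypothetical local table. The output format and the node name `login1` are made up; the only table entry is the example given above.

```python
import socket

# Hypothetical lookup table. The only entry is the example given in the text;
# where the real tool gets this information is not specified here.
RESOURCES = {
    "dtf.uc.teragrid.org": ("DTF cluster", "University of Chicago"),
}


def whereami(fqdn=None, table=RESOURCES):
    """Describe the current host using the longest matching domain in `table`."""
    host = (fqdn or socket.getfqdn()).lower().rstrip(".")
    labels = host.split(".")
    # Try the full name first, then each parent domain, so a node such as
    # login1.dtf.uc.teragrid.org still resolves to dtf.uc.teragrid.org.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in table:
            name, site = table[candidate]
            return f"{candidate}: {name}, {site}"
    return f"{host}: not a registered TeraGrid resource"


print(whereami("login1.dtf.uc.teragrid.org"))
# prints dtf.uc.teragrid.org: DTF cluster, University of Chicago
```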

Add a new user identity to the TeraGrid resource's grid-mapfile

It is currently unclear whether this should be done via AMIE (from the central database, presumably initiated by the POPS system or by the TeraGrid user portal) or via gx-map. We need the account management group (or someone in user services) to lay out a coherent architecture for account management that covers DNs and DN propagation.

Scenarios Primarily of Interest to Administrators

Bring a new resource into the TeraGrid

When a new resource is added to TeraGrid, several things need to happen to integrate the new resource into the overall system.

  • The system should appear in the TeraGrid software and service registry, which lists the resource name, resource provider, and any CTSS capabilities that are provided by the resource, including local implementation details. The TeraGrid Core Integration kit's software and service registry component provides the mechanism for "advertising" the resource in the TeraGrid-wide registry.
  • The system needs to be able to receive account creation requests from and report usage to the TeraGrid Central Database (TGCDB). This kit's AMIE component provides that mechanism.
  • The system needs to be able to authenticate TeraGrid users who are using GSI identities and credentials and map those authenticated users to a local account. The GSI components in this kit (particularly the CA configuration data) support that capability in Grid services provided by this kit and other kits.
  • The TeraGrid verification and validation mechanism (Inca) needs to be able to run tests on the system that verify that the advertised capabilities work as described in the relevant kit definitions. This kit provides the prerequisites for Inca testing.

Add a new user account on the TeraGrid resource

The TeraGrid community uses a TeraGrid-wide allocations process for proposing, approving, and tracking use of TeraGrid resources. When an allocation proposal is accepted, a central accounting process is initiated to create accounts for the users corresponding to the proposal. This process must interface with the account creation mechanisms on each TeraGrid resource where the users need an account.

The AMIE component of the TeraGrid Core Integration kit provides the interface between the TeraGrid central database and the local accounting system. AMIE code on the local resource receives account creation requests from the central database. The resource provider supplies code to translate the request from the central database into a local account creation action.

AMIE account creation requests also often include GSI distinguished names (DNs) which should be mapped to the new local account when it is created. The resource provider implementation for AMIE also translates these requests into additions to the local grid-mapfile(s) so that the DN is associated with the account.
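
The resource provider's AMIE translation code is site-specific. The sketch below covers only the grid-mapfile half of that translation. It assumes the request has already been decoded into a username and a list of DNs. `AccountRequest` and `gridmap_updates` are hypothetical names, the real AMIE packet format is not modeled, and account creation itself is left to local tools.

```python
from dataclasses import dataclass, field


@dataclass
class AccountRequest:
    """Hypothetical, simplified view of an account-creation request.

    The real AMIE packet format is not described in this document."""
    username: str                                 # local account chosen by the RP
    dns: list[str] = field(default_factory=list)  # GSI DNs to map to the account


def gridmap_updates(request, existing):
    """Return (lines_to_append, conflicting_dns) for one request.

    `existing` maps DN -> list of local userids already in the grid-mapfile.
    DNs already mapped to this account are skipped, so replaying the same
    request is harmless. DNs mapped to a different account are reported
    rather than changed, leaving the decision to local policy."""
    additions, conflicts = [], []
    for dn in request.dns:
        users = existing.get(dn, [])
        if request.username in users:
            continue
        if users:
            conflicts.append(dn)
            continue
        additions.append(f'"{dn}" {request.username}')
    return additions, conflicts


req = AccountRequest("jdoe", ["/DC=org/DC=example/OU=People/CN=Jane Doe"])
print(gridmap_updates(req, existing={}))
# prints (['"/DC=org/DC=example/OU=People/CN=Jane Doe" jdoe'], [])
```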

Install a TeraGrid capability kit on the TeraGrid resource

Obtaining and preparing software for use on a TeraGrid resource is often a cooperation between the resource provider and the TeraGrid GIG (Grid Infrastructure Group). In order to simplify software delivery among these partners, TeraGrid makes use of the Pacman system. A central Pacman cache is provided by the GIG. All TeraGrid partners may load software in the cache for sharing with other partners. In particular, the GIG uses this mechanism to provide software they have prepared for use by specific resource providers.

Once the TeraGrid Core Integration kit is installed on the resource, the resource provider can use the Pacman client to obtain software from the central Pacman cache. This mechanism is often used to obtain the software packages needed to implement other CTSS kits.

Verify a TeraGrid capability on the TeraGrid resource

The TeraGrid community has a broad interest in ensuring that capabilities that are documented work as expected. Each resource provider is responsible for software and service capabilities on its own resources, and the GIG is responsible for central services. Each organization performs local testing and monitoring to ensure the highest level of service for each local capability. Nevertheless, there is value in an independent testing mechanism that focuses on the capabilities that are documented across all resources.

The TeraGrid Core Integration kit includes the prerequisites that need to be present on resources in order to support independent testing by the TeraGrid Inca system. Inca runs tests against local services and locally installed software. As soon as the TeraGrid Core Integration kit is installed, Inca can be configured to run tests against the resource and those tests can be run and results reported.

Configuring specific tests to be run on specific resources is currently a manual process, operated by the TeraGrid operations team. In addition to installing the TeraGrid Core Integration kit, resource providers should also coordinate with the operations team to reconfigure Inca to run tests on their resource(s).

Publish the availability and current support level for a CTSS capability on the resource

Given the great diversity of TeraGrid resources, TeraGrid users and potential users need a way to discover and understand which CTSS capabilities are available on each TeraGrid resource. The inverse of this is that resource providers need a way to publish the availability of the CTSS capabilities that their resources provide. Further, resource providers need to be able to indicate not only the current situation (capabilities that are available now), but also the intended situation (including capabilities that are not yet completely ready for production use but that are on their way toward that state). The TeraGrid Core Integration kit provides the mechanism that resource providers use to register both current and intended CTSS capabilities, software, and services on their resources.

The TeraGrid Core Integration kit includes the software and service registry, which is a Web service (SOAP/XML) that publishes kit, software, and service descriptions. The registry on each TeraGrid resource registers itself with the central TeraGrid registry (mds.teragrid.org) so that data from all resources can be browsed or searched in one place. Once installed, the local registry can be used to

Once a resource provider decides to support a CTSS kit on its resource, it can install the registration package for that kit on the resource. The registration package contains the descriptive data about the kit that the local registry will publish. The resource provider should edit this data to indicate two things. The first is the current status; in this case, since the kit has not yet been deployed, the status should be "not ready". The second is the key information about each software component (e.g., version, softenv key) and service (e.g., version, endpoint) that makes up the kit implementation. Later, when the kit has been fully deployed, the status can be changed to "testing" (to indicate that friendly user testing is underway) or "production" (to indicate that the kit is ready for production use).
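
The three status values above suggest a small state machine. The sketch below encodes them. The allowed transitions are an assumption read from this paragraph: "not ready" may move to "testing" or "production", and "testing" may move to "production". Rollbacks are deliberately left out. `KitStatus` and `change_status` are hypothetical names, not part of the registry software.

```python
from enum import Enum


class KitStatus(Enum):
    NOT_READY = "not ready"    # registered but not yet deployed
    TESTING = "testing"        # friendly-user testing underway
    PRODUCTION = "production"  # ready for production use


# Assumed transitions; rollbacks (e.g. production -> testing) are omitted.
ALLOWED = {
    KitStatus.NOT_READY: {KitStatus.TESTING, KitStatus.PRODUCTION},
    KitStatus.TESTING: {KitStatus.PRODUCTION},
    KitStatus.PRODUCTION: set(),
}


def change_status(current, new):
    """Validate a status change; return the new status string or raise ValueError."""
    cur, nxt = KitStatus(current), KitStatus(new)  # unknown strings raise ValueError
    if nxt not in ALLOWED[cur]:
        raise ValueError(f"cannot change kit status from {cur.value!r} to {nxt.value!r}")
    return nxt.value


print(change_status("not ready", "testing"))  # prints testing
```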

Future Directions

  • We need the account management group to clarify the mechanism(s) by which GSI DN propagation is intended to occur. Specifically, what interfaces must RPs support in order to receive DN mapping requests from other TeraGrid systems (TeraGrid central database, POPS, etc.)?
  • We intend to explore using the data in the software and service registry to automatically configure Inca testing so that Inca automatically runs tests related to the CTSS kits that are registered on each resource and does not run other tests on those resources. Inca should also use the software and service descriptions to configure specific tests based on the registered software and service versions and service endpoints.
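
One possible shape for that automation is sketched below under stated assumptions. The kit names, test names, and the `TESTS_BY_KIT` catalogue are hypothetical. Skipping kits whose status is "not ready" is a design choice of this sketch rather than a stated requirement.

```python
# Hypothetical catalogue of Inca test names for each CTSS kit.
TESTS_BY_KIT = {
    "data-movement": ["gridftp-endpoint-check", "globus-url-copy-version"],
    "remote-compute": ["gram-submit-check"],
}


def plan_tests(registered):
    """Build a test plan from one resource's registry entries.

    Each entry is a dict with 'kit' and 'status', plus optional 'version'
    and 'endpoint' used to parameterize the tests. Only tests for kits that
    are registered (and not marked "not ready") are scheduled."""
    plan = []
    for entry in registered:
        if entry["status"] == "not ready":
            continue  # kit announced but not deployed yet; skip its tests
        for test in TESTS_BY_KIT.get(entry["kit"], []):
            plan.append({
                "test": test,
                "version": entry.get("version"),
                "endpoint": entry.get("endpoint"),
            })
    return plan


entries = [
    {"kit": "data-movement", "status": "production",
     "version": "4.0.1", "endpoint": "gsiftp://gridftp.example.org:2811/"},
    {"kit": "remote-compute", "status": "not ready"},
]
for item in plan_tests(entries):
    print(item["test"], item["endpoint"])
```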

Acknowledgements

This work was supported by the National Science Foundation Office of Cyberinfrastructure, grant number 0503697 “ETF Grid Infrastructure Group: Providing System Management and Integration for the TeraGrid.”

Globus software has been supported by the National Science Foundation, the U.S. Department of Energy’s Office of Science, DARPA, NASA, the U.K. e-Science Grid Core Programme, the Swedish Research Council, IBM, Microsoft, and Cisco Systems.

Author Information

Lee Liming
University of Chicago / Argonne National Laboratory
9700 S. Cass Avenue
Argonne, IL 60439
liming@mcs.anl.gov

Requirements Analysis Teams

No RATs have been convened to date to review or discuss the requirements in this area.

Working Groups

The Software working group has been responsible for the design of these capabilities in the past, and will continue to review requirements and designs for this kit. The Software working group will also be responsible for coordinating the deployment of this capability kit on the designated TeraGrid resources.

The account management working group, user services working group, and operations team have overall responsibility for account management mechanisms including both account creation/management and usage accounting.

The operations team is responsible for the Inca verification and validation mechanism.

The security working group is responsible for TeraGrid-wide security issues, including policies and procedures for establishing GSI trust relationships with certificate authorities and ensuring that TeraGrid mechanisms include capabilities that allow and support responses to security incidents.

Workshops

Some of the core authentication and authorization capabilities described in this kit were identified at the TeraGrid Authorization Workshop held at Argonne National Laboratory in August 2006.

Glossary

CTSS – Coordinated TeraGrid Software Stack, a collection of software capabilities that TeraGrid resource providers coordinate with each other regarding availability and implementation

GIG – Grid Infrastructure Group, the part of TeraGrid’s staff that is funded independently of resource provider contracts

GSI – Grid Security Infrastructure, a PKI-based framework for encoding and authenticating identities and related attributes, delegating authority to intermediate agents, establishing trust relationships between multiple identity and attribute authorities, and making authorization decisions based on authenticated identities and attributes.

RAT – Requirements Analysis Team, a TeraGrid mechanism for collecting requirements in a specific problem area

RP – Resource Provider, an organization that operates hardware and/or specific services for the TeraGrid community
