CTSS 4 Related Schema Design

From TeraGrid Wiki

Background

Two high-level encoding and linking approaches:

  • Inside a kit's XML document are kit-specific <Extensions>
  • Inside a kit's XML document are links (pointers) to separate XML documents

We need to choose an approach for each of these capability-kit-specific schemas:

  • Data Movement -> Extended GridFTP attributes, file-system environment variables
  • Core Integration -> Core 2.0/RDR Common schema
  • Local Compute -> Core 2.0/RDR Compute schema
  • Local Software -> Local Software schema

We agreed to keep the schema definitions separate. Separate schema definitions do not constrain us to either of the above approaches.

Advantages (+) and disadvantages (-) of:

  • embedded <Extensions>:
    • + consumers can retrieve a single document and get both sets of data
    • - large extensions (e.g. local software) result in a much larger parent document
    • - queries may need to filter out extension sub-elements to retrieve less data (we believe XPath offers a way to exclude sub-elements)
    • + documents are automatically linked (because one is embedded in the other)
  • separate XML document:
    • - consumers need to join or do multiple queries; or
    • - we need to develop special views/transforms that join documents
    • + large extensions don't affect parent document size
    • - documents need to somehow be linked
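
The filtering concern for embedded <Extensions> can be illustrated with a minimal Python sketch. The <Name> element, the GridFTP content, and the helper without_extensions are hypothetical and only for illustration. Python's standard xml.etree.ElementTree supports only a subset of XPath, so the exclusion is done in code here; a full XPath 1.0 engine could likely express the same thing with a predicate such as *[not(self::Extensions)].

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical kit document: only <Kit> and <Extensions> come from the design
# notes, every other element name and value is a placeholder.
KIT_XML = """
<Kit>
  <Name>data-movement</Name>
  <Extensions>
    <GridFTP><Endpoint>gsiftp://example.org:2811</Endpoint></GridFTP>
  </Extensions>
</Kit>
"""

def without_extensions(kit):
    """Return a copy of the kit element with its <Extensions> children removed."""
    trimmed = copy.deepcopy(kit)
    for ext in trimmed.findall("Extensions"):
        trimmed.remove(ext)
    return trimmed

kit = ET.fromstring(KIT_XML)
slim = without_extensions(kit)

# Tags that remain for a consumer who only wants the core kit data.
child_tags = [child.tag for child in slim]
print(child_tags)
```

The original element is left untouched, so a consumer that wants both sets of data can still read the full document.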

Decision

We will choose which approach to use on a case-by-case basis, depending on the size of the extension documents and how important document linking is.

  • Data Movement (Extended GridFTP attributes) will probably use <Extensions>
  • Local Software will probably use a separate document

By using the approach that appears to make the most sense in each case, we will explore both approaches, experience their real advantages and disadvantages, and learn how significant they are.

Document Linking

For the separate-documents case we discussed two document linking approaches:

  1) link to existing or new UniqueID attributes
  2) link by whatever document values naturally link the content

We decided to go with option 2) for now.
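
Linking by naturally shared document values can be sketched as a simple join. The sketch below assumes, purely for illustration, that both documents carry a <ResourceID> element and that the separate document is a <LocalSoftware> list of <Package> entries. That layout, the sample values, and the helper join_on_resource_id are hypothetical, not an agreed schema.

```python
import xml.etree.ElementTree as ET

# Primary (kits) document, reduced to the fields needed for the join.
KITS_XML = """
<V4KitsRP>
  <KitRegistration>
    <ResourceID>res1.example.org</ResourceID>
    <ResourceName>Resource One</ResourceName>
  </KitRegistration>
  <KitRegistration>
    <ResourceID>res2.example.org</ResourceID>
    <ResourceName>Resource Two</ResourceName>
  </KitRegistration>
</V4KitsRP>
"""

# Hypothetical separate document; its layout is an assumption.
SOFTWARE_XML = """
<LocalSoftware>
  <Package><ResourceID>res1.example.org</ResourceID><Name>gcc</Name></Package>
  <Package><ResourceID>res1.example.org</ResourceID><Name>fftw</Name></Package>
</LocalSoftware>
"""

def join_on_resource_id(kits_root, software_root):
    """Map each registered ResourceID to the package names that share it."""
    packages = {}
    for pkg in software_root.iter("Package"):
        packages.setdefault(pkg.findtext("ResourceID"), []).append(pkg.findtext("Name"))
    return {
        reg.findtext("ResourceID"): packages.get(reg.findtext("ResourceID"), [])
        for reg in kits_root.iter("KitRegistration")
    }

joined = join_on_resource_id(ET.fromstring(KITS_XML), ET.fromstring(SOFTWARE_XML))
print(joined)
```

No UniqueID attribute is needed for this kind of join, but it only works while both documents keep the shared value consistent.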

We also discussed that it would be good for the primary document (kits schema) to somehow indicate that a separate kit-specific extended document exists and where it is.

Document linking attributes:

  • Schema Type
  • Schema Name
  • Schema Version
  • "Schema Definition Reference/URL" or "Schema Fields" for CSV case
  • Service Type (embedded version information if needed for compatibility)
  • Service Endpoint

Example

<V4KitsRP>
<KitRegistration Timestamp=".." UniqueID="..">
  <ResourceID>..
  <ResourceName>..
  <SiteID>..
  <Kit>
   ...
    <Extensions>
      <GridFTP...>
      <whatever>
      <ExtendedInfo>
        <SchemaType>
        <SchemaName>
        <SchemaVersion>
        <SchemaDefinitionURL> or <SchemaFieldsCSV>
        <ServiceType>
        <ServiceEndpoint>


The <ExtendedInfo> element is how we link to the external document.
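
As a rough sketch of how a consumer might follow that link, the Python below reads the linking fields out of a filled-in, well-formed version of the example. All values (timestamp, IDs, URLs, service type) are placeholders, and the helper extended_links is hypothetical. Only the element names come from the example above.

```python
import xml.etree.ElementTree as ET

# Well-formed variant of the example with placeholder values.
EXAMPLE_XML = """
<V4KitsRP>
  <KitRegistration Timestamp="placeholder-timestamp" UniqueID="placeholder-id">
    <ResourceID>res1.example.org</ResourceID>
    <Kit>
      <Extensions>
        <ExtendedInfo>
          <SchemaType>XML</SchemaType>
          <SchemaName>LocalSoftware</SchemaName>
          <SchemaVersion>1.0</SchemaVersion>
          <SchemaDefinitionURL>http://example.org/schemas/local-software.xsd</SchemaDefinitionURL>
          <ServiceType>example-service</ServiceType>
          <ServiceEndpoint>https://example.org/info/local-software</ServiceEndpoint>
        </ExtendedInfo>
      </Extensions>
    </Kit>
  </KitRegistration>
</V4KitsRP>
"""

# Linking fields listed under "Document linking attributes".
LINK_FIELDS = (
    "SchemaType",
    "SchemaName",
    "SchemaVersion",
    "SchemaDefinitionURL",
    "SchemaFieldsCSV",
    "ServiceType",
    "ServiceEndpoint",
)

def extended_links(root):
    """Collect the linking fields of every <ExtendedInfo>; absent fields are None."""
    return [
        {field: info.findtext(field) for field in LINK_FIELDS}
        for info in root.iter("ExtendedInfo")
    ]

links = extended_links(ET.fromstring(EXAMPLE_XML))
print(links[0]["ServiceEndpoint"])
```

A consumer would then query the service at ServiceEndpoint and interpret the result according to the referenced schema. Here SchemaFieldsCSV is None because the placeholder document uses SchemaDefinitionURL instead.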