Core Services 2.0

From TeraGrid Wiki

This page documents the process being followed to analyze, evaluate, design and implement the next generation of TeraGrid Core Services.

Vision Statement

  • The CS2_Vision (MS Word) or (PDF) document is the vision statement approved by the RPF at the December 2007 TeraGrid Quarterly Meeting.

Program Years 3 and 4 Roadmap

We have drafted a Core Services 2.0 Roadmap for TeraGrid Program Years 3 and 4 to assist GIG subawardees in crafting PY4 Statements of Work.

Please note that Core Services 2.0 will pursue development and implementation of the vision plan in PY4, building on design and prototyping activities conducted in PY3. The Roadmap describes the general objectives, but specific milestones and deliverables will depend on success in accomplishing PY3 objectives as well as on the outcomes of PY3 design and prototyping. Any language in support of the overall Core Services 2.0 objective should acknowledge the dependencies on the PY3 design and prototyping phase.

Here is the initial Core Services 2.0 SoW, which was drafted for initial planning purposes. While we've had to make some adjustments to accommodate broader TeraGrid planning deadlines, this brief outline shows we are still on target.

Program Years 3 and 4 Implementation Plans

Last Update: March 27, 2008 8:31 AM CDT - Mary McIlvain

  • We are drafting our Implementation Plans for PY3 and PY4 as follows:
    • 1) Using the Objectives as defined in the CS2.0 Roadmap (previous section) we built ...
    • 2) a Team Roster and Project Objectives Build Out Master (xls) in January 2008 which represents the project baseline at this point in time. (Note: for historical purposes here are the Team Lead Objective Work Sheets Homework Submissions)
    • 3) currently (as of 3/25/08) the CS Team is working on requirements and designs for PY3 and PY4 implementation activities
    • 4) once we have an integrated set of requirements and design documentation, we will build the WBS for Core Services PY3 Activities (CS1.2) and then for PY4 Activities (CS2.0) using MS Project.

Implementation Lead Members

  • David Hart, SDSC <dhart@sdsc.edu> CS Project Management - Technical & Functional Oversight
  • Mary McIlvain <mcilvain@mcs.anl.gov> GIG & CS Project Management - Administrative
  • Laura McGinnis, PSC <lfm@psc.edu>. Accounting lead.
  • Kent Milfeld, TACC <milfeld@tacc.utexas.edu>. Allocations lead.
  • Steve Quinn, NCSA <squinn@ncsa.uiuc.edu>. POPS lead.
  • Jim Basney <jbasney@ncsa.uiuc.edu> Implementation Lead
  • Maytal Dahan <maytal@tacc.utexas.edu>. TGUP lead.
  • Leo. Carson <>. Implementation Lead.
  • Rob Light, PSC <>. Implementation Lead
  • Ed Hanna, PSC Implementation Lead.

Monthly Team Meeting: Agenda (last update: 6/5/08 McIlvain)

  • Next CS2 Conference Call: 8/26/08, 10am Central
  • CS2 team will meet on the last Tuesday of every month starting with 2/26/08.
  • The basic Agenda will be to perform a round robin through each PY3 objective, where the objective team lead will present in ~5 minutes or less
    • Accomplishments since the last meeting
      • team lead - please have a concise list ready
    • Changes to their objective buildout spreadsheet
      • team lead - please review your last version and be concise about what has changed since the last monthly review, including things that did not get done per plan
    • Issues - Cross-Functional Only
      • use this time only to name and briefly describe the issue, and then to determine who needs to be involved and the next steps
    • Vacation/Unavailable Notices (that may impact another team)
  • Team leads' attendance is essential to the purpose of the meeting. If a team lead is unable to attend, please arrange for someone from your team to represent you, equipped with the above information.

PY3 Objectives Working Documents and Deliverables (strawman - lead to modify to suit the objective's needs)

Usage: Objective Leads decide which deliverables are relevant to their objective (with Dave's guidance) and then provide an update and status at least once a month, just prior to our monthly team meeting. For each document posted, please be sure to update the date, percent and status fields; this gives the reader important context as they read.

CS Objective ID: cs1.2_1_logauth, Lead: Steve Quinn, Maytal Dahan

CS Objective ID: cs1.2_2_popsite, Lead: Steve Quinn

CS Objective ID: cs1.2_3_cs2-d-p, Lead: Laura McGinnis

This is the "glue" layer, which describes how all of the components should fit together. Details on the individual components are the responsibility of the team handling each objective. This piece of the project addresses the interfaces between the components, so that everything is explicitly tested and verified.

Components of Core Services 2 Implementation

NEW: Data Flow Diagram Version 1 (pdf) - Updated with feedback from July 29 session.

OLD: Core 2 Design Document: Data Flow Diagram (pdf) with annotations

Core Services 2.0 Project Plan (MS Project) (pdf)

Core Services 2.0 Workflow - version 2 (ppt) (pdf)

Databases:

Interfaces:


Note: All documents require signoff from the CS2.0 Team

  • Requirements Definition (Due Date: April, 2008) --- Percent Complete: 95% --- Status: Needs Signoff (last update: <26-may-08,lfm>)
This documents the major components of the Core Services 2.0 system.
Team leads should review this document to ensure that the high-level entities and base functionality have been captured.
  • Constraints (Due Date: May 1, 2008) --- Percent Complete: 95% --- Status: In Progress (last update: <26-may-08,lfm>)
This document describes the constraints to be considered as the Core Services 2.0 components are developed. Constraints include authentication and authorization requirements, system issues, and any other issues that limit or impact development or deployment of CS2.0.
  • Unit Test Plan (Due Date: June 30, 2008) --- Percent Complete: 20% --- Status: In Progress (last update: <26-may-08,lfm>)
This document contains the test plans for the major components of CS2.0. Once a component has passed its Unit Test, it is available for integration.
  • Integration Test Plan (Due Date: June 30, 2008) --- Percent Complete: 5% --- Status: In Progress (last update: <26-may-08,lfm>)
This document contains test plans for the integrated components of CS2.0.

CS Objective ID: cs1.2_4_policy, Lead: Kent Milfeld

  • This objective has been superseded by the Allocations RAT, which includes representations by all RPs as well as CS2 participants.

CS Objective ID: cs1.2_5_sgw, Lead: Jim Basney

  • This objective has been superseded by a similar SGW objective. Here is the latest update. (per Jim Basney, April 2008)

CS Objective ID: cs1.2_6_amie, Lead: Leo. Carson

  • Project Charter (Due Date: June 10, 2008) --- Percent Complete: 100% --- Status (last update: <06/05/08, leo>):
Throughput improvement: processing 1000 packets is ~7x faster than with the current AMIE
  • Deployment instructions for RP sites
1. create a test Unix user account for your tarball-capable AMIE instance (e.g., tstloni)
2. enable ssh for the new user by sending us your public key. Helpful ssh tutorial link http://www.csua.berkeley.edu/~ranga/notes/ssh_nopass.html
3. test the connection.
  $ ssh -i $HOME/.ssh/id_dsa azrael.sdsc.edu
4. create a /amie-dev dir parallel to /amie, recursively copy everything from /amie into it, and replace amie.pl with the tarball-capable version you just downloaded.
5. in /amie-dev, the wrapper shell script in bin-rbash/ needs to be edited (we changed its name from amie to amie-dev for ease of recognition) so that AMIE_BASE points to /amie-dev, the testing dir tree.
6. in <your_sites_test_dir> you need to have logs/, xfer/ (all sub dirs need to be created too), conf/, and lib/ linked to production lib/ as it is in the production account.
7. change conf/amie.config by adding this line to put AMIE into the tarball mode. 1 for tarball, 0 for single-packet:
   ...
       tar.<your_site_name>.TGCDB          1
8. change the line of ssh command in conf/amie.config to point to your testing shell script:
send.<your_site_name>.TGCDB ssh tsttgcdb@azrael.sdsc.edu /amie-dev/bin-rbash/amie-dev tsttgcdb -r
9. change conf/db.config to point to your test database
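As an illustration of steps 7 and 8, here is a minimal Python sketch that builds the two conf/amie.config lines for a test setup. The function name, its parameters, and the example site name "PSC" are hypothetical and only for illustration; the line layouts are copied from the steps above, and the sketch assumes the account name appears both in the ssh target and as the argument to the wrapper script, as in the example line in step 8.

```python
def tarball_test_config(site, remote_account, remote_host,
                        wrapper="/amie-dev/bin-rbash/amie-dev", tarball=True):
    """Return the two conf/amie.config lines described in steps 7 and 8.

    Hypothetical helper: it only formats strings and does not touch any file.
    """
    # Step 7: 1 puts AMIE into tarball mode, 0 keeps single-packet mode.
    mode = 1 if tarball else 0
    tar_line = f"tar.{site}.TGCDB          {mode}"

    # Step 8: the ssh command points at the testing wrapper script.
    # Assumption: the account name is repeated after the wrapper path.
    send_line = (f"send.{site}.TGCDB ssh {remote_account}@{remote_host} "
                 f"{wrapper} {remote_account} -r")
    return [tar_line, send_line]


# "PSC" is an example site name; substitute your own.
for line in tarball_test_config("PSC", "tsttgcdb", "azrael.sdsc.edu"):
    print(line)
```

Paste the printed lines into conf/amie.config by hand and check them against your production entries before testing.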
Rob Light at PSC and Steve Brandt at LONI
Setting up test workbench and documenting the process
Remote site testing to begin no later than July 24, 2008
Successfully transferred data to/from PSC July 31
Successfully transferred data to/from LONI/LSU Aug 26
Argonne next on Aug 29
No word from Argonne, will flag issue at next tg-accounting call on Sept 11.
Argonne unable to test due to conflicting priorities per email from Ti Legget
Jinghong's last day September 30.
Declared testing finished per DLH, will put into production at SDSC week of October 20 after Leo returns from vacation.
Must we execute switchover simultaneously or is staged deployment possible? Staged. Give RPs flexibility to choose when.
Can we support both the tarball & single-packet versions simultaneously? Yes. But not mixed mode.
Tarball capable version of amie.pl in repo on 04-Nov-2008. Running at SDSC. Other RPs are encouraged to download and run it.
Bug fixed in tarball version in Jan 2009. Now in service at NICS and PSC.

CS Objective ID: cs1.2_7_fo4tgcdb, Lead: Rob Light

  • Test Replication System Built at PSC (last update:<June 2, 2008,light>)
System up and running well
Currently finding ideal settings
Creating setup and failover documentation for use with test system between SDSC and PSC
  • Test Replication System between SDSC and PSC Built (last update:<July 29,2008,light> Due Date: July 21, 2008)
Data transfer test between SDSC and PSC showed 700 KB/s
TGCDB data is currently ~12.5 GB, estimated from the 25 GB in use on the TGCDB data drive
A full backup will take ~5 hours and should be done at least once a week. It should not affect performance.
Incremental backup: Each shipped log file is always 16 MB. A file is shipped either when it fills with 16 MB of data or when a specified time period expires. Shipped files are read into the backup database immediately and are kept until the next full backup is done. The stored data can add up if the time period is short: with a 10-minute period, this would be at least 16 GB per week. We need to take disk space, the time period, and the frequency of full backups into account when planning for this. For example, if we did a full backup every other day, we could use a shorter time period for incremental updates given a fixed disk size.
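The sizing figures above can be checked with a short calculation. This is a rough sketch, not a planning tool. It assumes binary units (1 GB = 1024 MB), that a full backup is limited by the 700 KB/s transfer rate measured between SDSC and PSC, and that exactly one 16 MB log file is shipped per time period (the minimum case). The function names are ours.

```python
LOG_FILE_MB = 16  # size of each shipped log file, from the notes above


def full_backup_hours(data_gb, rate_kb_per_s):
    """Hours to move data_gb at rate_kb_per_s (assumes transfer-bound backup)."""
    data_kb = data_gb * 1024 * 1024
    return data_kb / rate_kb_per_s / 3600


def min_log_volume_gb(period_minutes, days_between_full_backups):
    """Minimum GB of shipped logs kept between two full backups."""
    files = days_between_full_backups * 24 * 60 / period_minutes
    return files * LOG_FILE_MB / 1024


def shortest_period_minutes(disk_gb, days_between_full_backups):
    """Shortest shipping period whose minimum log volume fits in disk_gb."""
    max_files = disk_gb * 1024 / LOG_FILE_MB
    return days_between_full_backups * 24 * 60 / max_files


print(f"full backup: {full_backup_hours(12.5, 700):.1f} hours")     # about 5.2
print(f"weekly logs, 10 min period: {min_log_volume_gb(10, 7)} GB")  # 15.75
print(f"two-day logs, 10 min period: {min_log_volume_gb(10, 2)} GB")  # 4.5
print(f"shortest period, same disk, full backup every 2 days: "
      f"{shortest_period_minutes(15.75, 2):.1f} min")
```

With these assumptions the numbers agree with the notes: roughly 5 hours for a full backup and about 16 GB of logs per week at a 10-minute period. Real volumes will be higher whenever log files fill before the period expires.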


Working with Patrick Dorn at NCSA to get the dynamic addresses set up (setup of dynamic DNS was completed 6/30/2008)
Added a lot to this documentation, but did not get all the way through it because the server being used for the project lost its network connection for three weeks.

CS Objective ID: cs1.2_8_rdrdesign, Lead: Ed Hanna

PY4 Objectives Working Documents and Deliverables

  • coming soon.