Core Services 2.0
From TeraGrid Wiki
This page documents the process being followed to analyze, evaluate, design, and implement the next generation of TeraGrid Core Services.
Vision Statement
- The CS2_Vision document (MS Word) or (PDF) is the vision statement approved by the RPF at the December 2007 TeraGrid Quarterly Meeting.
- See Vision Statement Supporting Documentation & Process for the material that evolved into the above CS2 Vision Statement.
Program Years 3 and 4 Roadmap
We have drafted a Core Services 2.0 Roadmap for TeraGrid Program Years 3 and 4 to assist GIG subawardees in crafting PY4 Statements of Work.
Please note that Core Services 2.0 will pursue development and implementation of the vision plan in PY4, building on design and prototyping activities conducted in PY3. The Roadmap describes the general objectives, but specific milestones and deliverables will depend on success in accomplishing the PY3 objectives as well as on the outcomes of PY3 design and prototyping. Any language in support of the overall Core Services 2.0 objective should acknowledge these dependencies on the PY3 design and prototyping phase.
Here is the initial Core Services 2.0 SoW, which was drafted for initial planning purposes. While we've had to make some adjustments to accommodate broader TeraGrid planning deadlines, this brief outline shows we are still on target.
Program Years 3 and 4 Implementation Plans
Last Update: March 27, 2008 8:31 AM CDT - Mary McIlvain
- We are drafting our Implementation Plans for PY3 and PY4 as follows:
- 1) Using the Objectives as defined in the CS2.0 Roadmap (previous section), we built ...
- 2) a Team Roster and Project Objectives Build Out Master (xls) in January 2008, which represents the project baseline at this point in time. (Note: for historical purposes, here are the Team Lead Objective Work Sheets Homework Submissions)
- 3) Currently (as of 3/25/08), the CS Team is working on requirements and designs for PY3 and PY4 implementation activities.
- 4) Once we have an integrated requirements and design documentation set, we will build the WBS for Core Services PY3 Activities (CS1.2) and then for PY4 Activities (CS2.0) using MS Project.
Implementation Lead Members
- David Hart, SDSC <dhart@sdsc.edu>. CS Project Management - Technical & Functional Oversight.
- Mary McIlvain <mcilvain@mcs.anl.gov>. GIG & CS Project Management - Administrative.
- Laura McGinnis, PSC <lfm@psc.edu>. Accounting lead.
- Kent Milfeld, TACC <milfeld@tacc.utexas.edu>. Allocations lead.
- Steve Quinn, NCSA <squinn@ncsa.uiuc.edu>. POPS lead.
- Jim Basney <jbasney@ncsa.uiuc.edu>. Implementation lead.
- Maytal Dahan <maytal@tacc.utexas.edu>. TGUP lead.
- Leo. Carson. Implementation lead.
- Rob Light, PSC. Implementation lead.
- Ed Hanna, PSC. Implementation lead.
Monthly Team Meeting: Agenda (last update: 6/5/08 McIlvain)
- Next CS2 Conference Call: 8/26/08, 10am Central
- CS2 team will meet on the last Tuesday of every month starting with 2/26/08.
- The basic agenda is a round robin through each PY3 objective, in which the objective team lead presents the following in about 5 minutes or less:
- Accomplishments since the last meeting
- team lead - please have a concise list ready
- Changes to their objective buildout spreadsheet
- team lead - please review your last version and be concise about what has changed since the last monthly review (this includes things that did not get done per plan)
- Issues - Cross-Functional Only
- (use this time only to name and briefly describe the issue, then determine who needs to be involved and the next steps)
- Vacation/Unavailable Notices (that may impact another team)
- Team lead attendance is essential to the purpose of the meeting. If a team lead is unable to attend, please arrange to have someone from your team represent you, equipped with the above information.
- Go to Team Meeting Notes
PY3 Objectives Working Documents and Deliverables (strawman - lead to modify to suit the objective's needs)
Usage: Objective Leads decide which deliverables are relevant to their objective (with Dave's guidance) and then provide an update and status at least once a month, just prior to our monthly team meeting. For each document posted, please be sure to update the date, percent, and status fields; these give readers the context they need.
CS Objective ID: cs1.2_1_logauth, Lead: Steve Quinn, Maytal Dahan
- Requirements Definition and Sign-off (Due Date: April 30, 2008) --- Percent Complete: 100% --- Status Complete (last update: <4/28/08, MD>):
- Design Definition and Sign-off (Due Date: May 15, 2008) --- Percent Complete: 80% --- Status In Progress (last update: <6/23/08, MD>):
- Development & Test Definition and Sign-off (Due Date: Aug 15, 2008) --- Percent Complete: 80% --- Status (last update: <6/23/08, MD>):
- Operations Documentation & Training Definition and Sign-off (Due Date: Sept 8, 2008) --- Percent Complete: 0% --- Status (last update: <date,init>):
- Deployment and Schedule Definition and Sign-off (Due Date: Sept 8, 2008) --- Percent Complete: 0% --- Status (last update: <date,init>):
CS Objective ID: cs1.2_2_popsite, Lead: Steve Quinn
- Requirements Definition and Sign-off (Due Date: 4/30/2008) --- Percent Complete: 100% --- Status (last update: 4/28/08):
- Design Definition and Sign-off (Due Date: 4/30/2008) --- Percent Complete: 100% --- Status (last update: 4/28/08):
- Development & Test Definition and Sign-off (Due Date: 5/31/2008) --- Percent Complete: 100% --- Status (last update: 5/27/08):
- Operations Documentation & Training Definition and Sign-off (Due Date: ,2008) --- Percent Complete: 0% --- Status (last update: <date,init>):
- Deployment and Schedule Definition and Sign-off (Due Date: Sept 8, 2008) --- Percent Complete: 0% --- Status (last update: <date,init>):
CS Objective ID: cs1.2_3_cs2-d-p, Lead: Laura McGinnis
This is the "glue" layer, which describes how all of the components should fit together. Details on the individual components are the responsibility of the team handling each objective. This piece of the project addresses the interfaces between the components, so that everything is explicitly tested and verified.
Components of Core Services 2 Implementation
NEW: Data Flow Diagram Version 1 (pdf) - Updated with feedback from July 29 session.
OLD: Core 2 Design Document: Data Flow Diagram (pdf) with annotations
Core Services 2.0 Project Plan (MS Project) (pdf)
Core Services 2.0 Workflow - version 2 (ppt) (pdf)
Databases:
- RDR Development (Ed, Rob, JP)
- POPS Changes (Steve, Ester)
- TGCDB Changes (Steve, Michael, Rob)
Interfaces:
- User Portal (Maytal)
- AMIE Changes (Michael, Leo)
- RP Changes (Steve, Michael, Ed, Rob)
- Reporting & Metrics Interfaces (Dave)
Note: All documents require signoff from the CS2.0 Team
- Requirements Definition (Due Date: April, 2008)--- Percent Complete: 95% --- Status Needs Signoff (last update: <26-may-08,lfm>)
- This documents the major components of the Core Services 2.0 system.
- Team leads should review this document to ensure that the high-level entities and base functionality have been captured.
- Constraints (Due Date: May 1, 2008)--- Percent Complete: 95% --- Status In Progress (last update: <26-may-08,lfm>)
- This document describes the constraints to be considered as the Core Services 2.0 components are developed. Constraints include authentication and authorization requirements, system issues, and any other issues that limit or impact development or deployment of CS2.0.
- Unit Test Plan (Due Date: June 30, 2008)--- Percent Complete: 20% --- Status In Progress (last update: <26-may-08,lfm>)
- This document contains the test plans for the major components of CS2.0. Once a component has passed its Unit Test, it is available for integration.
- Integration Test Plan (Due Date: June 30, 2008)--- Percent Complete: 5% --- Status In Progress (last update: <26-may-08,lfm>)
- This document contains test plans for the integrated components of CS2.0.
- Operations Documentation & Training Definition (Due Date: Jun 30, 2008)---Percent Complete: 0% --- Status (last update: <29-apr-08,lfm>):
- Deployment and Schedule Definition (Due Date: July 31, 2008)---Percent Complete: 0% --- Status (last update: <29-Apr-08,lfm>):
CS Objective ID: cs1.2_4_policy, Lead: Kent Milfeld
- This objective has been superseded by the Allocations RAT, which includes representation from all RPs as well as CS2 participants.
CS Objective ID: cs1.2_5_sgw, Lead: Jim Basney
- This objective has been superseded by a similar SGW objective. Here is the latest update. (per Jim Basney, April 2008)
CS Objective ID: cs1.2_6_amie, Lead: Leo. Carson
- AMIE benchmark testing (Created: July 2007. Author: Chris Baumbauer)
- Project Charter (Due Date: June 10, 2008) --- Percent Complete: 100% --- Status (last update: <06/05/08, leo>):
- Design Approach Decision (Due Date: Feb 15, 2008) --- Percent Complete: 100% --- Status (last update: <06/05/08, leo>):
- Project Notes and Timeline (Due Date: May 27, 2008) --- Percent Complete: 100% --- Status (last update: <06/05/08, leo>):
- Tarball version of AMIE working at SDSC (Due Date: July 15, 2008) --- Percent Complete: 100% --- Status (last update: <07/22/08, leo>):
- For 1000 packets, the tarball version is ~7x faster than current AMIE
- Deployment instructions for RP sites
- 1. create a test Unix user account for your tarball-capable AMIE instance (ex:tstloni)
- 2. enable ssh for the new user by sending us your public key. A helpful ssh tutorial: http://www.csua.berkeley.edu/~ranga/notes/ssh_nopass.html
- 3. test the connection.
$ ssh -i $HOME/.ssh/id_dsa azrael.sdsc.edu
- 4. create an /amie-dev directory parallel to /amie, recursively copy everything from /amie into it, and replace amie.pl with the tarball-capable version you downloaded.
- 5. in /amie-dev, edit the wrapper shell script in bin-rbash/ (we renamed it from amie to amie-dev for ease of recognition) so that AMIE_BASE points to /amie-dev, the testing directory tree.
- 6. in <your_sites_test_dir>, you need logs/, xfer/ (all of its subdirectories must be created too), conf/, and lib/ linked to the production lib/, as in the production account.
- 7. put AMIE into tarball mode by adding this line to conf/amie.config (1 for tarball, 0 for single-packet):
...
tar.<your_site_name>.TGCDB 1
- 8. change the line of ssh command in conf/amie.config to point to your testing shell script:
- send.<your_site_name>.TGCDB ssh tsttgcdb@azrael.sdsc.edu /amie-dev/bin-rbash/amie-dev tsttgcdb -r
- 9. change conf/db.config to point to your test database
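The two conf/amie.config entries from steps 7 and 8 differ between sites only in the site name, so they can be generated rather than typed by hand. The following is a minimal sketch, not part of the AMIE distribution: the function name `tarball_config_lines` and the placeholder site name `EXAMPLE_SITE` are hypothetical, and the default user, host, and wrapper path are simply copied from the example line in step 8.

```python
def tarball_config_lines(site, remote_user="tsttgcdb",
                         host="azrael.sdsc.edu",
                         wrapper="/amie-dev/bin-rbash/amie-dev",
                         tarball=True):
    """Return the two amie.config lines described in steps 7 and 8.

    `site` stands in for <your_site_name>; the defaults mirror the
    example in step 8 and are assumptions for any other setup.
    """
    mode = 1 if tarball else 0  # 1 for tarball, 0 for single-packet
    return [
        f"tar.{site}.TGCDB {mode}",
        f"send.{site}.TGCDB ssh {remote_user}@{host} {wrapper} {remote_user} -r",
    ]


if __name__ == "__main__":
    # EXAMPLE_SITE is a placeholder, not a real TeraGrid site name.
    for line in tarball_config_lines("EXAMPLE_SITE"):
        print(line)
```

The printed lines would still need to be merged into the existing conf/amie.config by hand; the sketch does not read or modify any file.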
- RPs for remote site testing (Due Date: July 15, 2008) --- Percent Complete: 100% --- Status (last update: <09/15/08, leo>):
- Rob Light at PSC and Steve Brandt at LONI
- Setting up test workbench and documenting the process
- Remote site testing to begin no later than July 24, 2008
- Successfully transferred data to/from PSC July 31
- Successfully transferred data to/from LONI/LSU Aug 26
- Argonne next on Aug 29
- No word from Argonne, will flag issue at next tg-accounting call on Sept 11.
- Argonne unable to test due to conflicting priorities, per email from Ti Leggett
- Jinghong's last day September 30.
- Declared testing finished per DLH; will put into production at SDSC the week of October 20, after Leo returns from vacation.
- Deployment of tarball AMIE (Due Date: October 31, 2008) --- Percent Complete: 100% --- Status (last update: <03/12/09, leo>):
- Must we execute switchover simultaneously or is staged deployment possible? Staged. Give RPs flexibility to choose when.
- Can we support both the tarball & single-packet versions simultaneously? Yes. But not mixed mode.
- Tarball-capable version of amie.pl in repo on 04-Nov-2008. Running at SDSC. Other RPs are encouraged to download and run it.
- Bug fixed in tarball version in Jan 2009. Now in service at NICS and PSC.
CS Objective ID: cs1.2_7_fo4tgcdb, Lead: Rob Light
- Requirements Definition and Sign-off (Due Date: March 2008)--- Percent Complete: 100% --- Status Complete(last update: <03/24/2008,light>):
- Requirements Definition and Sign-off V2 (Due Date: March 2008)--- Percent Complete: 100% --- Status Complete(last update: <04/22/2008,light>):
- Updated Table of Applications Dependent on TGCDB ---Percent Complete: 100% --- Status (last update: <06/02/2008,light>):
- Replication System Analysis and Selection and Sign-off (Due Date: May 1,2008)---Percent Complete: 100% --- Status Complete (last update: <06/02/2008,light>):
- Test Replication System Built at PSC (last update: <June 2, 2008, light>)
- System up and running well
- Currently finding ideal settings
- Creating setup and failover documentation for use with test system between SDSC and PSC
- Test Replication System between SDSC and PSC Built (last update: <July 29, 2008, light>; Due Date: July 21, 2008)
- Data transfer test between SDSC and PSC showed 700 KB/s
- TGCDB data is currently ~12.5 GB, based on 25 GB in use on the TGCDB data drive
- Full backup will take ~5 hours. It should be done at least once a week and should not affect performance.
- Incremental backup: each shipped log file is always 16 MB. A file is shipped either when it fills with 16 MB of data or when a specified time period expires. Shipped files are read into the backup database immediately and saved until the next full backup is done. This data can add up if the time period is short: for a 10-minute period it comes to at least 16 GB per week. We need to take disk space, the time period, and the frequency of full backups into account in planning for this. For example, if we did a full backup every other day, we could use a shorter time period for incremental updates given a fixed disk size.
- Working with Patrick Dorn at NCSA to get the dynamic addresses set up (setup of dynamic DNS completed 6/30/2008)
- Documentation for Replication System Setup and Failover Procedures (Percent Complete: 80%; last update: <July 29, 2008>; Due Date: August 1, 2008)
- Added a lot to this documentation, but did not get all the way through it because the server used for the project lost its network connection for three weeks.
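The backup figures quoted above for the SDSC-PSC test system can be checked with a little arithmetic. The sketch below is illustrative only. It assumes decimal units (1 GB = 1000 MB = 1,000,000 KB), that full-backup time is dominated by the measured 700 KB/s transfer rate, and that one 16 MB log file ships per time period (the minimum, since a file also ships early when it fills). The function names are hypothetical.

```python
TRANSFER_RATE_KB_S = 700   # measured SDSC-to-PSC transfer rate
TGCDB_SIZE_GB = 12.5       # approximate TGCDB data size
LOG_FILE_MB = 16           # every shipped log file is 16 MB


def full_backup_hours(size_gb=TGCDB_SIZE_GB, rate_kb_s=TRANSFER_RATE_KB_S):
    """Hours needed to transfer a full backup at the measured rate."""
    seconds = size_gb * 1_000_000 / rate_kb_s
    return seconds / 3600


def min_log_volume_gb(period_minutes, days, log_file_mb=LOG_FILE_MB):
    """Minimum shipped-log volume (GB) accumulated between full backups.

    Assumes exactly one log file ships per period; a busy database
    fills files sooner and ships more, so this is a lower bound.
    """
    files = days * 24 * 60 // period_minutes
    return files * log_file_mb / 1000


if __name__ == "__main__":
    print(f"full backup: {full_backup_hours():.2f} h")
    print(f"10-min period, weekly full backup: {min_log_volume_gb(10, 7):.3f} GB")
    print(f"10-min period, full backup every 2 days: {min_log_volume_gb(10, 2):.3f} GB")
```

Under these assumptions the full backup comes to just under 5 hours and a 10-minute shipping period accumulates about 16 GB per week, matching the estimates above. Running full backups every other day cuts the retained log volume to under 5 GB, which is the trade-off between backup frequency, shipping period, and disk size described in the notes.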
CS Objective ID: cs1.2_8_rdrdesign, Lead: Ed Hanna
- Vision Statement and Sign-off (Due Date: April 31,2008)--- Percent Complete: 25% --- Status (last update: <3/24/08, EH>):
- Vision Statement and Sign-off (Due Date: April 31,2008)--- Percent Complete: 100% --- Status (last update: <4/25/08, EH>):
- Requirements Definition and Sign-off (Due Date: Jul 15, 2008)--- Percent Complete: 20% --- Status (last update: <4/25/08,EH>):
- Requirements Definition and Sign-off (Due Date: Jul 15, 2008)--- Percent Complete: 75% --- Status (last update: <5/27/08,EH>):
- Requirements Definition and Sign-off (Due Date: Jul 31, 2008)--- Percent Complete: 95% --- Status: Need Sign-off (last update: <7/18/08,EH>):
- Requirements Definition and Sign-off (Due Date: Jul 31, 2008)--- Percent Complete: 95% --- Status: Need Sign-off (last update: <7/25/08,EH>):
- Table of Applications dependent on RDR and Sign-off (Due Date: Jul 15, 2008)--- Percent Complete: 95% --- Status: Need Sign-Off (last update: <7/18/08,EH>):
- Design Definition and Sign-off (Due Date: Aug 31, 2008)---Percent Complete: 70% --- Status (last update: <7/18/2008,EH>):
- Development & Test Definition and Sign-off (Due Date: ,PY4)---Percent Complete: 0% --- Status (last update: <date,init>):
- Operations Documentation & Training Definition and Sign-off (Due Date: ,PY4)---Percent Complete: 0% --- Status (last update: <date,init>):
- Deployment and Schedule Definition and Sign-off (Due Date: ,PY4)---Percent Complete: 0% --- Status (last update: <date,init>):
PY4 Objectives Working Documents and Deliverables
- coming soon.
