TeraGrid Forum Meeting Notes (Nov.-Dec. 2009)
From TeraGrid Wiki
Monday, Nov. 30
1:00 pm - 2:45 pm - Managing resources and how new users and small-medium jobs will be handled - All
-- discussion continued from the TG Forum call regarding small and Startup users, given the current capacity-limited resource situation and concerns that these users are having a bad initial experience
-- What are the main issues? 1) time to create accounts, 2) queue wait times, and 3) user support limitations
-- account creation time has increased recently, partly due to staff reductions at NCSA and partly due to the increase in new allocations resulting from the great success of the Campus Champions and Startup allocations - NCSA has recently added a student to assist
-- Dave Hart has already developed a solution (automated account creation by the PI; the PI is still manually vetted), but it needs implementation time from Steve Quinn's group, which means reprioritizing the current project list
-- decision to checkpoint progress at the next quarterly meeting and also at a TG RT meeting halfway in between - Dave, SteveQ, Maytal, and Matt will review priorities before the holidays
-- queue wait times - suggestion to send the Startup users to a subset of systems (perhaps Lonestar/Steele/Abe/Queenbee) and reserve some of those resources for these users - the TG Extension proposal indicated that we would allocate these as a group and schedule across them (metaschedule)
-- Purdue Steele - Carol will come back to the group today/tomorrow with a decision on Steele's involvement in co-allocation
-- Abe/Lonestar/Queenbee - Matt polled the group and there were no objections to co-allocating these resources from this point forward
-- each of Abe/Lonestar/Queenbee/Steele should publish information on their queueing policies and provide some historical data that users can view to make decisions (note that Warren and Sched-WG have a new Queue Prediction Tool)
-- User Support limitations - new user info, training, documentation - Sergiu met with Diana this morning - use TGCDB to flag certain users so RP consultants can track and contact them - for instance, if a user has used less than 10% of their allocation three months in, we should contact them and see what improvements we can make to training and documentation
-- Discussion on publishing results of Allocations meetings - need to decide what to publish and where to publish - Elizabeth and Dave Hart
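The flagging rule from the User Support discussion (contact users who have used less than 10% of their allocation three months in) could be expressed roughly as below. This is a minimal Python sketch, not the TGCDB implementation: the `AllocationUsage` record, its field names, and `users_to_contact` are hypothetical, and months are counted as calendar months with the day of the month ignored.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AllocationUsage:
    """Hypothetical per-user record; the real TGCDB schema is not described in these notes."""
    user: str
    allocation_start: date
    allocated_sus: float
    used_sus: float


def users_to_contact(records, today, months_in=3, usage_threshold=0.10):
    """Return users at least `months_in` months into their allocation
    who have used less than `usage_threshold` of it."""
    flagged = []
    for r in records:
        # Whole calendar months elapsed since the allocation started.
        elapsed = (today.year - r.allocation_start.year) * 12 + (
            today.month - r.allocation_start.month
        )
        if elapsed < months_in or r.allocated_sus <= 0:
            continue  # too early to judge, or nothing was allocated
        if r.used_sus / r.allocated_sus < usage_threshold:
            flagged.append(r.user)
    return flagged
```

The 3-month and 10% values are the ones mentioned in the discussion; both are parameters so the thresholds can be tuned once RP consultants see how many users get flagged.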
3:15 pm - 4:15 pm - DataNets - DataONE - John C., John T., Dan K.
-- Dan K. and John Towns, as TG Forum representatives, made contact with DataONE via John Cobb. The discussion with the Data Conservancy (Johns Hopkins) has not yet happened.
-- DataONE is concerned with data at a higher level/software level, such as metadata, ontologies, and user interactions. It is a distributed effort. It seems that specific science domain proposals were passed over in the first round of DataNet awards. Hardware/physical infrastructure is a problem - there is not much DataNet funding for it in this round.
-- DataONE's vision includes coordinated/federated Metadata nodes - can TG provide this?
-- DataONE has WGs that could overlap with TG WGs (workflow, federated identity, others). Von Welch is already part of both TG and DataONE - what other connections can we make?
-- JohnC and ChrisJ will get together and suggest some TG people that can work with the DataONE WGs.
-- DataONE just finished their draft Project Management Plan and is awaiting NSF feedback. It is not clear exactly how to engage until the plan is complete.
-- A joint white paper on user requirements would be valuable to both teams.
- TG to pull together names to submit for participation in the DataONE Working Groups. John Cobb and Chris Jordan will pull together initial suggestions and forward them to the RPF for discussion.
- Encourage and offer to assist DataONE in developing a white paper on low-level CI requirements (Cobb, Katz)
- Encourage DataNets to participate in TG10 (Cobb, Katz, Moore)
- Other items (no specific assignees)
- TG to seize the initiative and propose to NSF that it be the low-level CI provider of choice for storage
- TG to work to define and implement services that could be of use to DataONE “We would
- TG needs to get “credit” by being recognized for reaching out to projects like DataONE.
- Continue to seek input from other DataNet Projects, including Data Conservancy
- Work to host data for DataNets
- Chris Jordan points out that TG also needs large persistent disk storage
4:15 pm - 5:00 pm - QA/CUE group update - Kate and Shava
-- FTE needs identified at IU, NCSA, and TACC for this effort
-- assuming a 50/50 split of funding/FTE between QA and CUE - Kate will confirm with CUE the levels of effort needed to achieve its goals
-- Richard Moore announced that the Powderhorn Silo and ~10 STK 9940B tape drives that SDSC sent to PSC are available (for the price of shipping). Contact Richard no later than this Thursday 12/3 if interested.
Tuesday, Dec. 1
8:00 am - 8:30 am - Batch Queue Prediction - Warren
-- concern expressed about how users will use and interpret this data: they might think it assures them that their job will run at a certain time, or they might use the data for other purposes - we should include a disclaimer that this is only a prediction, not a guarantee
-- another concern expressed - a previous QBETS issue sent a user to the wrong system (a large core-count job was sent to a small machine)
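Both concerns can be addressed in how a prediction is presented: never suggest a system that is too small for the job, and report the number as an upper bound on wait rather than a start time. The Python sketch below assumes a hypothetical `systems` dictionary holding each machine's total core count and a list of past queue waits in seconds. It uses a simple empirical (nearest-rank) quantile as a stand-in; it is not the statistical method used by QBETS or by the new Queue Prediction Tool.

```python
import math


def predicted_wait_bound(historical_waits, quantile=0.95):
    """Nearest-rank empirical quantile of past waits (seconds).

    A prediction from history only, not a guarantee of start time.
    Returns None when there is no history to predict from.
    """
    if not historical_waits:
        return None
    ordered = sorted(historical_waits)
    # Smallest observed wait with at least `quantile` of samples at or below it.
    rank = math.ceil(quantile * len(ordered))
    return ordered[max(rank, 1) - 1]


def candidate_systems(systems, job_cores, quantile=0.95):
    """Rank systems by predicted wait bound, skipping any that cannot fit the job."""
    results = []
    for name, info in systems.items():
        if job_cores > info["total_cores"]:
            continue  # never send a large core-count job to a small machine
        bound = predicted_wait_bound(info["waits"], quantile)
        if bound is not None:
            results.append((bound, name))
    results.sort()
    return [(name, bound) for bound, name in results]
```

Filtering on machine size before ranking means a short queue on a small system can never outrank a system that can actually run the job.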
8:30 am - 9:30 am - A report from the “RAAR” group, led by Kent Milfeld, on plans for heavily oversubscribed TRAC meetings - Kent/John/Richard
-- Total Available (orange line) should be shown as a step function of capacity - do not draw a trend line, which is misleading
-- Total Available - show the April 1, 2010 decline in resources and the increase from SDSC Gordon - note that neither FutureGrid nor Keeneland adds cycles
-- Send the updated chart to Jim Bottum
-- Want to smooth out allocation requests, which are still cyclical following the old LRAC/MRAC process - want some PIs to shift by a quarter and will offer this to the community - suggestion to offer a 5-quarter allocation to make this happen
-- Oversubscription Policy - reviewers should do merit-based review independent of resource availability - the TG Allocations Officers will do allocation adjustment post-review
-- A process has been developed for adjusting allocations in a resource-limited environment - see the slides. Will present this to TRAC this week for feedback - this is a good meeting to do it, since resource pressure is low this time because the meeting aligns with the old MRAC cycle.
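Drawing Total Available as a step function follows from the fact that capacity only changes when a resource enters or leaves service. A minimal Python sketch, assuming each resource is described by a hypothetical (name, start, end, capacity) tuple with an exclusive end date (None when no end date is announced); the names and capacity units here are placeholders, not real TeraGrid figures.

```python
from datetime import date


def total_available(resources, on_date):
    """Sum the capacity of every resource in service on `on_date`.

    Each resource is (name, start, end, capacity); `end` is exclusive
    and None means no announced end date.
    """
    return sum(
        capacity
        for _name, start, end, capacity in resources
        if start <= on_date and (end is None or on_date < end)
    )


def step_points(resources):
    """Dates where the total can change, paired with the total from that date on.

    These are the corners of the step plot; between them the total is constant,
    so no trend line is needed or appropriate.
    """
    change_dates = {r[1] for r in resources} | {r[2] for r in resources if r[2] is not None}
    return [(d, total_available(resources, d)) for d in sorted(change_dates)]
```

The corner points can be handed to a plotting routine that draws steps rather than slopes, for example matplotlib's `step(..., where="post")`, so a decommissioning such as the April 1, 2010 decline appears as a drop on that date.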
9:30 am - 10:15 am - Data Allocations Policy Changes/Adjustments - Chris Jordan
-- Archive Replication Service - set up a small-scale version, allocate it, collect usage information, and use this as a basis for deciding how to proceed with NSF on a future large-scale service - TACC, IU, and PSC have currently agreed to participate - would like more partners and would like to grow the storage to the 500 TB - 1 PB range.
-- Concern: this is a 1-year service, so how are we going to get users to want it? Chris has a few PIs that are interested right now. How do we drum up demand? What if demand is high? What is the exit strategy at the end of the year?
-- We need to push users to include a prescribed metadata set - require it.
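Requiring a prescribed metadata set amounts to rejecting deposits that leave required fields empty. The Python sketch below assumes a hypothetical `REQUIRED_FIELDS` set and a deposit described as a plain dictionary; the actual prescribed fields were not decided in this discussion.

```python
# Hypothetical required set; the real prescribed metadata set is still to be defined.
REQUIRED_FIELDS = {"title", "creator", "date_created", "description"}


def missing_metadata(record, required=REQUIRED_FIELDS):
    """Return required fields that are absent, None, or blank, in sorted order."""
    missing = []
    for field in sorted(required):
        value = record.get(field)
        # Treat absent, None, and whitespace-only values as missing.
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def accept_deposit(record):
    """Accept a replication deposit only if the prescribed metadata set is complete."""
    missing = missing_metadata(record)
    if missing:
        raise ValueError(f"missing required metadata: {', '.join(missing)}")
    return True
```

Reporting every missing field at once, rather than failing on the first, lets users fix a deposit in one pass.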
10:30 am - 11:30 am - Vis Whitepaper - Kelly
-- Recommendations are in the slide set.
-- Software/tool development is still a large part of the work needed as users evolve their data analysis and visualization methods. Remote vis is still needed, but we are also seeing more in situ needs as data sets grow in size and moving them offsite is not an option.
-- It would be good to provide a gap analysis that includes what is currently being done/planned with current funding.
11:30 am - 12:00 pm - SAB meeting and agenda discussion - Dan K. - including previews of any talks that are scheduled to be presented, such as allocations from Kent
-- Scott expects to have Broadening Participation slides available by the end of this week.
1:00 pm - 2:00 pm - Longhorn at TACC - Kelly
2:00 pm - 2:45 pm - OSG Jobs and TG Resources in the Extension - John T.
-- Tabled - moved to TG RT Meeting - ask LONI, Purdue, NCSA to present how they are interoperating with OSG
3:15 pm - 4:30 pm - Lustre WAN Deployment - Chris J.
-- by the end of January, the Data-WG needs to present a plan for deploying the hardware and Lustre-WAN promised in the TG Extension - this should really be in the IPP before it goes to NSF
4:30 pm - 5:00 pm - Annual Report - Mike N./Tim
Wednesday, Dec. 2
8:00 am - 8:15 am - NSF guidance for No Cost Extension to July 31, 2011 - John T.
-- The Year 5 IPP is the justification for the No Cost Extension to July 31, 2011.
8:15 am - 8:45 am - Review what updates we need to send to the community about the migration to XD - Scott
-- The Resource Catalog is a source of information on the transition, and system end dates should be there - many are listed as TBA now, so it needs to be updated with Diana Diehl. This information also appears at http://teragrid.org/XDTransition/ - suggestion to change the heading there to Resource Transition rather than XD Transition, since it covers only the resources.
8:45 am - 9:15 am - Science Gateway User Counting Work - Nancy/Von
-- Science Gateway CTSS Kit - provides the attribute-based authentication that allows for SGW user counting
-- GRAM5 is replacing GRAM2 - waiting for the GRAM5 production release; after that, with other work/testing/packaging, we are now aiming for February to have this capability in place
9:15 am - 9:45 am - Creating User Forum(s) - All
-- moderators - who will do this? We need people with expertise in each forum topic.
-- clearly articulate that there are other resources for help - helpdesk email/telephone and the Knowledge Base
-- users can share experiences