Teragrid QBETS Service
Notes
- This is NOT a production service at this point in time. It is hosted on a development machine and thus is subject to unexpected downtime.
- Discussions have begun on building a TeraGrid-specific service that determines the best resource for a given job. A demo is expected for SC07. See the document in the Links & References section below for more information on the ORPS service. When complete, the ORPS service will replace this service.
Description
This Axis 1 web service acts as a client to the UCSB Batch Queue Prediction web service (QBETS Service). It queries all queues on multiple hosts for how long the job specified in the etc/job.properties file is predicted to wait in the queue before it starts running. For more information on the NWS QBETS service see: http://nws.cs.ucsb.edu/ewiki/nws.php?id=QBETS+Web+Service
On the TeraGrid, users are likely to have accounts on several of the TeraGrid resources. This service allows them to easily find out where their job will start running first by returning a sorted list of queue wait times given a job description.
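The sorting step described above can be sketched in Java as follows. This is a minimal illustration and not the service's actual code: the class WaitTimeSorter, the Prediction type, the queue names, and the wait times are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class WaitTimeSorter {

    /** One predicted wait for a (host, queue) pair. Hypothetical type, not part of the distribution. */
    public static final class Prediction {
        public final String host;
        public final String queue;
        public final long waitSeconds;

        public Prediction(String host, String queue, long waitSeconds) {
            this.host = host;
            this.queue = queue;
            this.waitSeconds = waitSeconds;
        }

        @Override
        public String toString() {
            return host + "/" + queue + ": " + waitSeconds + " s";
        }
    }

    /** Returns a new list ordered by predicted wait, shortest first. The input is left unchanged. */
    public static List<Prediction> sortByWait(List<Prediction> predictions) {
        List<Prediction> sorted = new ArrayList<>(predictions);
        sorted.sort(Comparator.comparingLong((Prediction p) -> p.waitSeconds));
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up queue names and wait times, for illustration only.
        List<Prediction> predictions = Arrays.asList(
            new Prediction("datastar", "queueA", 5400),
            new Prediction("ncsateragrid", "queueB", 900),
            new Prediction("lonestar", "queueC", 2700));

        // The first entry printed is where the job is predicted to start soonest.
        for (Prediction p : sortByWait(predictions)) {
            System.out.println(p);
        }
    }
}
```

With these invented numbers, the first line printed is the ncsateragrid entry, since it has the shortest predicted wait.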
For general information on web services on the TeraGrid, see: http://www.teragridforum.org/mediawiki/index.php?title=Teragrid_Web_Services_Documentation
Note: NWS, NWSBatchQueuePrediction, NWSBQP, and QBETS are used interchangeably in this document and throughout the code.
Operations
- predictAll : runs a prediction on a set of machines for a particular size and length of job
- Input : predictAll
- Output : predictAllResponse
WSDL
http://rivendell.sdsc.edu:8080/axis/services/TGQbetsServiceSOAP11port?wsdl
Examples
Command Line Examples
Running the Client
Download and unpack the distribution, TGQBETSService_0.1.tar.gz.
From the TGQBETSService_0.1 directory:
To get a short list of machines covered by the QBETS service:
% ant list
To get a long list of machines listed with all of their queues:
% ant longList
To query the TGNWS service with the job description specified in etc/job.properties:
% ant client
or
% sh tgqbets.sh
To clean up the installation:
% ant clean
Specifying the Job
The job is specified by editing the etc/job.properties file. See http://nws.cs.ucsb.edu/ewiki/nws.php?id=QBETS+Web+Service for a description of the parameters used by the batch queue prediction service. From that page:
"Our predictions are parameterized by a quantile and a confidence. The quantile describes the percent of all jobs submitted to a particular batch queue with a specific node range. If we choose the .75 quantile, .1 confidence, and the prediction is 300 seconds we can say with a confidence of 10% that 75% of jobs that are submitted to this queue and node range will take less than 300 seconds to exit the queue."
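To make the meaning of the quantile concrete, the following Java sketch counts what fraction of a set of observed queue waits fell below a given bound. It only illustrates how to read a quantile statement. It is not how QBETS computes its predictions, and the class QuantileCheck and the wait times are hypothetical.

```java
public class QuantileCheck {

    /** Returns the fraction of observed waits that are strictly below the bound. */
    public static double fractionBelow(long[] waitsSeconds, long boundSeconds) {
        if (waitsSeconds.length == 0) {
            throw new IllegalArgumentException("no observations");
        }
        int below = 0;
        for (long wait : waitsSeconds) {
            if (wait < boundSeconds) {
                below++;
            }
        }
        return (double) below / waitsSeconds.length;
    }

    public static void main(String[] args) {
        // Invented historical waits (seconds) for one queue and node range.
        long[] waits = {40, 120, 250, 290, 310, 600, 95, 180};

        // 6 of the 8 waits are under 300 seconds, so this prints 0.75:
        // 300 seconds bounds the .75 quantile of this sample.
        System.out.println(fractionBelow(waits, 300));
    }
}
```

The confidence value is a separate statement about how sure the predictor is that such a bound will hold for future jobs. This sketch does not model it.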
Select the hosts you are interested in by listing their short names on the hosts line, separated by single spaces.
hosts = datastar ncsateragrid lonestar
nodes = int
nodes is the number of nodes your job will run on.
walltime = int
walltime is the number of seconds of wall clock time your job will run for.
timestamp = int
timestamp is the point in time the query applies to. The QBETS service does not implement this feature yet; once it does, a past timestamp will let you see past trends. The value is already passed through to the QBETS service, so it will work here as soon as QBETS supports it.
quantile = float
The quantile describes the percent of all jobs submitted to a particular batch queue with a specific node range.
confidence = float
confidence is a fraction (0.9 is 90%) indicating how much confidence you require in the estimate.
For example:
#8 node, 1 hour job, default quantile and default confidence (95%)
hosts = datastar ncsateragrid lonestar
nodes = 8
walltime = 3600
timestamp = 0
quantile = 0
confidence = 0
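If you want to read the same job description from your own code, a sketch along the following lines may help. It assumes the file is in the standard Java properties format, which the file name suggests but this page does not confirm. The class JobPropertiesReader and the Job holder are hypothetical and are not part of the distribution. The values are parsed and passed through as written; the sketch does not interpret a value of 0 as a default.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class JobPropertiesReader {

    /** Plain holder for the values in job.properties. Hypothetical type. */
    public static final class Job {
        public final List<String> hosts;
        public final int nodes;
        public final int walltimeSeconds;
        public final long timestamp;
        public final double quantile;
        public final double confidence;

        Job(List<String> hosts, int nodes, int walltimeSeconds,
            long timestamp, double quantile, double confidence) {
            this.hosts = hosts;
            this.nodes = nodes;
            this.walltimeSeconds = walltimeSeconds;
            this.timestamp = timestamp;
            this.quantile = quantile;
            this.confidence = confidence;
        }
    }

    /** Reads a required key and fails with a clear message if it is absent. */
    private static String required(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("missing property: " + key);
        }
        return value.trim();
    }

    /** Parses a job description, assuming standard Java properties syntax. */
    public static Job parse(Reader in) {
        Properties props = new Properties();
        try {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        // Host short names are separated by whitespace on the hosts line.
        String hostsLine = required(props, "hosts");
        List<String> hosts = new ArrayList<>();
        if (!hostsLine.isEmpty()) {
            hosts.addAll(Arrays.asList(hostsLine.split("\\s+")));
        }

        return new Job(
            hosts,
            Integer.parseInt(required(props, "nodes")),
            Integer.parseInt(required(props, "walltime")),
            Long.parseLong(required(props, "timestamp")),
            Double.parseDouble(required(props, "quantile")),
            Double.parseDouble(required(props, "confidence")));
    }

    public static void main(String[] args) {
        // Same content as the example above; lines starting with # are comments.
        String example =
              "#8 node, 1 hour job, default quantile and default confidence (95%)\n"
            + "hosts = datastar ncsateragrid lonestar\n"
            + "nodes = 8\n"
            + "walltime = 3600\n"
            + "timestamp = 0\n"
            + "quantile = 0\n"
            + "confidence = 0\n";

        Job job = parse(new StringReader(example));
        System.out.println(job.hosts + " nodes=" + job.nodes
            + " walltime=" + job.walltimeSeconds);
    }
}
```

To read the real file instead of the inline string, pass a java.io.FileReader opened on etc/job.properties to parse.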
Programmatic Example
The distribution contains the Java client src/org/teragrid/gateways/TGQbetsServiceClient.java. The command line examples above use this class.
Links & References
- TGQBETSService_0.1.tar.gz service and client code
- Gateway_Framework_Resource_Selection-1.doc contains some early plans to expand on this idea
--Steve Mock 23:29, 26 July 2007 (GMT)
