
Resource Usage Format - Accounting Information Sharing Models



Usage Record Format - Accounting Information Sharing Models

In a conversation I recently had with Jian Zhang, we discussed the fact that
there is more than one model for exchanging accounting and usage
information. In order to pick the "right" one to target for the emerging
Usage Record Format Working Group, we would like to solicit your feedback
and thoughts on this matter.

We do not think it makes sense to prescribe the format in which the
end-application stores its usage and accounting records. The application
designer should have the flexibility to choose whether to store this
information in a relational database, a directory service, or to extract it
from flat files or logs. What is critical is that an external entity can
request this information in a known way and be able to interpret the results
of that request.

I believe there are at least two models for exchanging this accounting
information - (I) passing full accounting records around, or (II) defining a
query syntax language. In both cases, a set of semantics would need to be
defined to describe the bounds of resource usage. Is it more important that
the generator of usage records be able to package it up into a known
complete record format when requested or that it be able to respond to query
requests with the appropriate specific responses? 

I.	Defining a usage record format to be passed around as a full record

In this scenario, we could define a format that describes the resources used
within a particular session (job), i.e.

	<UsageRecord id="Job1234.0@SCCS.edu">
		<Who user="scott" account="chem101"/>
		<When completionDateTime="999999999"
			startedDateTime="999999500" submittedDateTime="999999000"/>
		<Where resourceProvider="SCCS.edu"
			resourceConsumer="PNNL.gov"/>
		<ResourcesConsumed>
			<ProcessorSeconds>12346</ProcessorSeconds>
			<MemorySeconds units="MBs">98765</MemorySeconds>
			...
		</ResourcesConsumed>
	</UsageRecord>

When asked for this information, the whole record would be returned (as an
object?)
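As a rough illustration of the consumer side of this model, the following Python sketch parses a well-formed copy of the record above using only the standard library. The helper name `parse_usage_record` and the shape of the dictionary it returns are assumptions made for this example; nothing here is part of a proposed format.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the example record; element and attribute names
# are taken from the example above, with the elided fields left out.
RECORD_XML = """
<UsageRecord id="Job1234.0@SCCS.edu">
  <Who user="scott" account="chem101"/>
  <When completionDateTime="999999999"
        startedDateTime="999999500" submittedDateTime="999999000"/>
  <Where resourceProvider="SCCS.edu" resourceConsumer="PNNL.gov"/>
  <ResourcesConsumed>
    <ProcessorSeconds>12346</ProcessorSeconds>
    <MemorySeconds units="MBs">98765</MemorySeconds>
  </ResourcesConsumed>
</UsageRecord>
"""


def parse_usage_record(xml_text):
    """Hypothetical helper: flatten one full usage record into a dict."""
    root = ET.fromstring(xml_text)
    who = root.find("Who")
    where = root.find("Where")
    # Map each consumed resource to (amount, units); units may be None.
    consumed = {
        child.tag: (int(child.text), child.get("units"))
        for child in root.find("ResourcesConsumed")
    }
    return {
        "id": root.get("id"),
        "user": who.get("user"),
        "account": who.get("account"),
        "provider": where.get("resourceProvider"),
        "consumed": consumed,
    }


record = parse_usage_record(RECORD_XML)
print(record["id"], record["user"], record["consumed"]["ProcessorSeconds"])
```

Note that the receiver has to parse the whole record even if it only cares about one field, which is the trade-off discussed under Cons below.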

Pros:
	Readily supports a structured description of resource utilization
(for example, being able to say how much of the ProcessorSecondsConsumed was
on which NodeTypes).

Cons:
	These records would contain all of the accounting-relevant
information collected by the resource provider and would likely be very
large and inefficient to pass around as a whole.
	It might also be difficult to require the usage record generator (a
batch system, etc.) to package the information into a structured record
format.
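To make the structured-description point from the Pros concrete, here is one way a per-node-type breakdown might look and be read back. The `<NodeType>` element, its `name` attribute, and the split of the processor seconds between node types are all invented for illustration; the original example does not define them.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested form: the processor-seconds total is broken down
# per node type. The element names below <ProcessorSeconds> and the
# individual numbers are assumptions, chosen only to add up to 12346.
STRUCTURED_XML = """
<ResourcesConsumed>
  <ProcessorSeconds>
    <NodeType name="compute">12000</NodeType>
    <NodeType name="io">346</NodeType>
  </ProcessorSeconds>
</ResourcesConsumed>
"""


def processor_seconds_by_node_type(xml_text):
    """Return {node type name: processor seconds} from the nested form."""
    root = ET.fromstring(xml_text)
    return {
        node.get("name"): int(node.text)
        for node in root.iterfind("ProcessorSeconds/NodeType")
    }


breakdown = processor_seconds_by_node_type(STRUCTURED_XML)
total = sum(breakdown.values())
print(breakdown, total)
```

A flat list of name/value fields cannot express this kind of nesting directly, which is why structure is listed as an advantage of the full-record model.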

II.	Defining a usage query syntax language

In this scenario, it is the query language that is well-defined. The result
that you get depends upon your question. You can ask for all or part of a
record, or potentially even specify the format you want the response to be
in.

Unlike the above case, we do not specify what the full-record format is,
but rather how to ask for some or all of the information about a session. We
do not pass around records; we respond to queries.

<Request object="UsageRecord" action="query">
	<get>JobId</get>
	<get>UserName</get>
	<get>AccountName</get>
	<get>ProcessorSecondsConsumed</get>
	<get units="MBs">MemorySecondsConsumed</get>
	<where name="JobId">Job1234.0@SCCS.edu</where>
</Request>

<Response>
	<UsageRecord>
		<JobId>Job1234.0@SCCS.edu</JobId>
		<UserName>scott</UserName>
		<AccountName>chem101</AccountName>
		<ProcessorSecondsConsumed>12345</ProcessorSecondsConsumed>
		<MemorySecondsConsumed>98765</MemorySecondsConsumed>
	</UsageRecord>
</Response>
		
If your request had no <get> statements indicating the fields you wanted
returned, it would give you all the available fields, for all of the records
selected in your request. What is well-defined here are the semantics of the
usage fields you can request about a session. A query language like QUILT
could even be used to return the response in any format you desire:
XML, HTML (ready for GUI consumption), pretty-printed text, etc.
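The request/response behaviour described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a proposed implementation: the in-memory `RECORDS` list stands in for whatever store the provider actually uses, `answer_query` is a hypothetical name, `<where>` is treated as a simple equality filter, and the `units` attribute on `<get>` is ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory store. A real provider might keep this in a
# relational database, a directory service, or flat files and logs.
RECORDS = [
    {
        "JobId": "Job1234.0@SCCS.edu",
        "UserName": "scott",
        "AccountName": "chem101",
        "ProcessorSecondsConsumed": "12345",
        "MemorySecondsConsumed": "98765",
    },
]


def answer_query(request_xml, records=RECORDS):
    """Build a <Response> holding only the fields named in <get> elements.

    If the request has no <get> elements, every available field is returned
    for each record matching the <where> equality filters.
    """
    request = ET.fromstring(request_xml)
    wanted = [g.text for g in request.findall("get")]
    filters = {w.get("name"): w.text for w in request.findall("where")}

    response = ET.Element("Response")
    for rec in records:
        # Skip records that fail any equality filter.
        if any(rec.get(name) != value for name, value in filters.items()):
            continue
        out = ET.SubElement(response, "UsageRecord")
        for field in (wanted or list(rec)):
            if field in rec:
                ET.SubElement(out, field).text = rec[field]
    return ET.tostring(response, encoding="unicode")


# A request shaped like the one above, with matching open and close tags.
REQUEST_XML = """
<Request object="UsageRecord" action="query">
  <get>JobId</get>
  <get>ProcessorSecondsConsumed</get>
  <where name="JobId">Job1234.0@SCCS.edu</where>
</Request>
"""

print(answer_query(REQUEST_XML))
```

Because the storage layer only appears inside `answer_query`, the same request syntax would work regardless of how the provider keeps its records, which is the main attraction of this model.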

Pros:

	It is very easy to get exactly the information you want, in a form
suitable for your own consumption, without having to pull fields that are
not relevant to your request.

Cons:

	It may be harder to support structured accounting information.

To help identify the right model, I feel it would be helpful to identify
some very specific use cases in which this usage/accounting information
might need to be passed between software entities. I recall someone pointing
out, as an example, that the TeraGrid project might need to share cycles, and
hence accounting information, with people in the European DataGrid project.
Could anyone identify two specific software components that would need to
exchange accounting information, and what questions each would like to ask of
the other side? That would help identify whether we should be talking about
tossing complete records around or responding to informational requests in a
well-known way.

This whole question of whether to support structured accounting information
presents some (perhaps) hard problems, since structuring may be required in
arbitrary ways, and very differently in different cases (e.g. some resources
have parent-child relationships in some architectural designs but the
reverse child-parent relationship in others). If we choose to require a
structured resource usage record format, will we be imposing a
significant burden on supporters of the protocol?

Scott Jackson (Pacific Northwest National Laboratory)


Usage Record Format.doc