OGF24 Schedule
The 24th Open Grid Forum - OGF24
September 15-19, 2008
Singapore, Singapore

Monday, September 15
11:45 am - 12:30 pm
Increasing computation throughput with Grid data caching (45 mins)
Jags Ramnarayan

Increase CPU utilization to near 100% - move your parallel tasks to data that is distributed across grid memory

As the adoption of Grid computing grows in the enterprise, quite a number of projects are faced with the following challenge:

The primary goal of Grid computing is cost savings through increased CPU utilization. Often, however, parallel jobs require data stored in enterprise databases or provisioned in dispersed file systems, involve job flows with significant data traffic between jobs, and require publishing of intermediate and result data produced by jobs to enterprise data repositories. Essentially, the data-intensive nature of Grid applications can leave the servers I/O bound, sometimes reducing average CPU utilization to less than 50%.

The presentation will be on the use of a distributed main-memory cache that offers the scalability and elasticity required to operate in a grid. It will introduce concepts such as replicated caching, partitioned data management, hierarchical caching across compute nodes and main-memory data grid servers, and read-through and write-behind caching for synchronization with external data systems.
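
As a rough illustration of the read-through and write-behind patterns mentioned above, the Java sketch below shows a single in-memory cache region that loads misses from a backing store and drains updates back to it asynchronously. The class, the loader and the writer are hypothetical stand-ins for an enterprise database or file system, not the API of any particular data grid product.

    import java.util.Map;
    import java.util.Queue;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.function.BiConsumer;
    import java.util.function.Function;

    // Sketch of one read-through / write-behind cache region (illustrative only).
    public class ReadThroughWriteBehindCache<K, V> {

        private final Map<K, V> store = new ConcurrentHashMap<>();
        private final Function<K, V> loader;                       // read-through source
        private final Queue<Map.Entry<K, V>> writeQueue = new ConcurrentLinkedQueue<>();
        private final ScheduledExecutorService flusher =
                Executors.newSingleThreadScheduledExecutor();

        public ReadThroughWriteBehindCache(Function<K, V> loader,
                                           BiConsumer<K, V> writer,
                                           long flushIntervalMillis) {
            this.loader = loader;
            // Write-behind: drain queued updates to the external store asynchronously,
            // so tasks on the compute node never block on the slower backing system.
            flusher.scheduleAtFixedRate(() -> {
                Map.Entry<K, V> e;
                while ((e = writeQueue.poll()) != null) {
                    writer.accept(e.getKey(), e.getValue());
                }
            }, flushIntervalMillis, flushIntervalMillis, TimeUnit.MILLISECONDS);
        }

        // Read-through: a miss is loaded from the backing store and kept in memory.
        public V get(K key) {
            return store.computeIfAbsent(key, loader);
        }

        // Update the in-memory copy immediately; persist later via the flusher.
        public void put(K key, V value) {
            store.put(key, value);
            writeQueue.add(Map.entry(key, value));
        }

        public void shutdown() {
            flusher.shutdown();
        }
    }

A replicated region would instead push every update to all hosting nodes, trading write cost for local reads everywhere, while a partitioned region keeps one primary copy (plus any redundant copies) per key.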

To achieve maximum CPU utilization, how do you get the data close to the compute node? If the data is partitioned across the Grid nodes, what are the policies? We will discuss static versus dynamic partitioning of data, dynamic rebalancing of data across grid nodes driven by either data growth or access patterns, static configuration of redundant copies versus dynamically increasing the number of data copies for parallel access, and techniques for distributed query processing.
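
To make the partitioning discussion concrete, here is a minimal sketch of a static hash-partitioning policy with a configurable number of redundant copies per partition. The class and parameter names are illustrative assumptions; a production data grid would layer dynamic rebalancing on top of a mapping like this as data grows or access patterns shift.

    import java.util.ArrayList;
    import java.util.List;

    // Static hash partitioning with a fixed node count and redundancy level.
    public class HashPartitioningPolicy {

        private final int partitions;   // logical buckets over the keyspace
        private final int nodes;        // grid nodes hosting the buckets
        private final int redundancy;   // extra copies for parallel access and failover

        public HashPartitioningPolicy(int partitions, int nodes, int redundancy) {
            this.partitions = partitions;
            this.nodes = nodes;
            this.redundancy = redundancy;
        }

        // Map a key to its logical partition.
        public int partitionOf(Object key) {
            return Math.floorMod(key.hashCode(), partitions);
        }

        // Primary node plus the nodes holding redundant copies of the key's partition.
        public List<Integer> hostsOf(Object key) {
            int primary = partitionOf(key) % nodes;
            List<Integer> hosts = new ArrayList<>();
            for (int i = 0; i <= redundancy; i++) {
                hosts.add((primary + i) % nodes);   // copies placed on successive nodes
            }
            return hosts;
        }

        public static void main(String[] args) {
            HashPartitioningPolicy policy = new HashPartitioningPolicy(128, 8, 1);
            System.out.println("key 'trade-42' is hosted on nodes " + policy.hostsOf("trade-42"));
        }
    }
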
To achieve optimal routing of a job or task to where the data is provisioned, the talk covers the integration of a compute scheduling engine with a data grid.
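
As a minimal sketch of that integration, assuming the hypothetical HashPartitioningPolicy above, the scheduler below asks the data grid where a task's input key is hosted and routes the task there rather than to an arbitrary idle node.

    import java.util.List;

    // Data-aware routing: send the task to a node that already holds its input data.
    public class DataAwareScheduler {

        private final HashPartitioningPolicy placement;

        public DataAwareScheduler(HashPartitioningPolicy placement) {
            this.placement = placement;
        }

        // Pick a target node for a task keyed by its primary input datum.
        // Falling back to a redundant copy would let the scheduler spread load
        // across the hosts of a hot partition.
        public int routeTask(Object inputKey) {
            List<Integer> hosts = placement.hostsOf(inputKey);
            return hosts.get(0);   // primary host; copies are hosts.get(1..redundancy)
        }

        public static void main(String[] args) {
            DataAwareScheduler scheduler =
                    new DataAwareScheduler(new HashPartitioningPolicy(128, 8, 1));
            System.out.println("run task(trade-42) on node " + scheduler.routeTask("trade-42"));
        }
    }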


Location: Creation
 
Slides: Increase computation throughput with Grid data caching - Jags Ramnarayan
