Monday, September 15
11:45 am - 12:30 pm
Increasing computation throughput with Grid data caching
(45 mins)
Jags Ramnarayan
Increase CPU utilization to near 100% - move your parallel tasks to data that is distributed across grid memory
As the adoption of Grid computing grows in the enterprise, many projects face the following challenge: the primary goal of Grid computing is cost savings through increased CPU utilization, but parallel jobs often require data stored in enterprise databases or provisioned in dispersed file systems, involve job flows with significant data traffic between jobs, and require publishing intermediate and result data to enterprise repositories. Essentially, the data-intensive nature of Grid applications can leave the servers I/O bound, sometimes reducing average CPU utilization to less than 50%.
The presentation covers the use of a distributed main-memory cache that offers the scalability and elasticity required to operate in a grid. It introduces concepts such as replicated caching, partitioned data management, hierarchical caching across compute nodes and main-memory data grid servers, and read-through and write-behind caching for synchronization with external data systems.
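The read-through and write-behind patterns mentioned above can be sketched in a few lines. This is a minimal illustration, not the product's API: the class and hook names (`loader`, `writer`, `flush`) are hypothetical, standing in for the loader/writer callbacks that real data grids expose.

```python
import queue

class WriteBehindCache:
    """Sketch of read-through + write-behind caching.
    Hypothetical API: real data grids expose similar loader/writer hooks."""

    def __init__(self, loader, writer):
        self.store = {}               # in-memory cache
        self.loader = loader          # read-through: called on a cache miss
        self.writer = writer          # write-behind: applied asynchronously
        self.pending = queue.Queue()  # writes queued for the external system

    def get(self, key):
        if key not in self.store:                # cache miss
            self.store[key] = self.loader(key)   # read through to the backing store
        return self.store[key]

    def put(self, key, value):
        self.store[key] = value         # update memory immediately
        self.pending.put((key, value))  # defer the external write

    def flush(self):
        # Drain queued writes to the external system
        # (in a real grid this runs on a background thread).
        while not self.pending.empty():
            key, value = self.pending.get()
            self.writer(key, value)
```

The point of the split is latency: `put` returns as soon as memory is updated, while synchronization with the slower external system happens later, in batches, off the job's critical path.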
To achieve maximum CPU utilization, how do you get the data close to the compute node? If the data is partitioned across the Grid nodes, what are the policies? We will discuss static vs. dynamic partitioning of data, dynamic rebalancing of data across grid nodes driven by data growth or access patterns, static configuration of redundant copies vs. a dynamic increase in the number of data copies for parallel access, and techniques for distributed query processing.
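The static vs. dynamic partitioning trade-off can be illustrated with two common schemes. This is a generic sketch under my own assumptions, not the scheme any particular product uses: modulo hashing pins keys to a fixed node count, while consistent hashing lets nodes join or leave with only a fraction of keys moving, which is what makes dynamic rebalancing practical.

```python
import bisect
import hashlib

def static_partition(key, num_nodes):
    # Static hash partitioning: key -> node by modulo.
    # Simple, but changing num_nodes remaps most keys.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_nodes

class ConsistentHashRing:
    """Consistent-hashing sketch: adding or removing a node moves only
    the keys in that node's arc of the ring, enabling dynamic rebalancing."""

    def __init__(self, nodes, replicas=100):
        self.ring = {}  # ring position -> node
        for node in nodes:
            # Virtual nodes spread each physical node around the ring.
            for i in range(replicas):
                self.ring[self._hash(f"{node}:{i}")] = node
        self.sorted_keys = sorted(self.ring)

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key):
        # The first ring position clockwise from the key's hash owns the key.
        h = self._hash(key)
        idx = bisect.bisect_left(self.sorted_keys, h)
        if idx == len(self.sorted_keys):
            idx = 0  # wrap around the ring
        return self.ring[self.sorted_keys[idx]]
```

The same ring structure also gives a natural place to hang redundant copies: the next N distinct nodes clockwise from the primary can hold the extra replicas used for parallel access.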
To achieve optimal routing of each job or task to where its data is provisioned, the talk also covers the integration of a compute scheduling engine with a data grid.
Location: Creation
Slides: Increase computation throughput with Grid data caching - Jags Ramnarayan