| Monday, February 13 |
| 1:30 pm - 3:00 pm | |
Workshop on Reliability & Robustness in Grid Computing Systems - #1
(90 mins)
Christopher Dabrowski, National Institute of Standards and Technology
Grid computing systems based on emerging Web Service and Grid standards will need to achieve levels of reliability and robustness necessary for enterprise applications in industry and science. This workshop will seek to bring together researchers and engineers whose organizations are actively addressing these concerns to share experiences and describe their research. Of particular interest will be methods and techniques that allow systems that use web-service and grid standards to detect and overcome failures in order to provide a level of reliability and robustness needed for industrial and scientific purposes.
Agenda: The goal of the workshop is to bring together researchers and engineers from industry and academia whose organizations are actively addressing these concerns to share experiences and inform each other of their work in this area. Promoting information exchange can be expected to benefit ongoing grid reliability work as a whole. Moreover, despite the practical importance of this work, relatively few researchers work on grid reliability compared with other areas of grid work, and they are quite possibly isolated and unaware of each other. The organizers hope the workshop will attract new experts to participate in the Research Group in Reliability and Robustness in Grid Computing Systems, which is currently being formed. Similarly, the workshop can serve to attract participation in other GGF Working Groups that are concerned with reliability issues. Workshop discussions will also serve as input to the program of work for the research group and ultimately to developing recommendations for how Web Services and Grid standards can be advanced to ensure reliable grid applications.
Special topics of interest will be:
• Terminology of reliability, dependability, and robustness for grid systems.
• Technologies supporting reliability, dependability, and robustness in grid systems.
• Strategies used by grid manufacturers (software vendors, service providers, etc.) and service consumers for improving reliability (such as checkpointing, work-around techniques, and recovery methods).
• User experiences (in business and academic communities) with grid reliability.
• Grid monitoring (of failure and system performance).
• Interactions between grid software and other network software during failure detection and recovery (multilayer recovery).
Location: Vergina
| | Slides: Providing Fault-tolerance for Parallel Programs on Grid - Yeom |
| | Slides (PDF): QoS-Aware Fault Tolerance in Grid Computing - Valcarenghi |
| | Slides: Reliable Messaging for Grids and Web Services - Fox |
| | Slides (PDF): Site Assessment and Probabilistic Risk Analysis - Higgins |
| | Slides: Understanding Emergent Behavior in Global Grid Systems - Dabrowski |
| | Link: Workshop Report |