It's not often that you get to speed something up by a factor of 100: more often, we are working hard to get a 10% improvement. But my colleague John Bresnahan recently achieved that happy result with Globus GridFTP, the Grid data transfer workhorse.
The Globus implementation of the GridFTP protocol has always been fast for large files, achieving in some cases close to 30 gigabit/s over wide area networks. However, when data is partitioned into many small files, GridFTP has historically suffered from low transfer rates because each successive transfer request incurs a full round-trip latency before any data moves.
John and other members of the GridFTP team designed pipelining to solve this "lots of small files" (LOSF) problem. They modified GridFTP to allow many transfer requests to be outstanding at once. Thus, latency between requests is hidden in the time it takes to transfer previous files: by the time one file has completed, the next request is queued up in the server ready to start.
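To get a feel for why this matters so much for small files, here is a rough back-of-the-envelope model in Python. It is only a sketch, not a description of how GridFTP is implemented: the function names, the 1 gigabit/s link, and the 50 ms round trip are all assumptions chosen for illustration, and the model ignores things like TCP slow start and disk overhead. It assumes that without pipelining every file pays one round trip before its data flows, and that with pipelining only the first request pays it.

```python
def sequential_time(num_files, file_bytes, bandwidth_bps, rtt_s):
    """Estimated seconds when each request waits for the previous file to finish.

    Every file pays one round trip (hypothetical fixed latency) plus its
    own transfer time.
    """
    transfer_s = file_bytes * 8 / bandwidth_bps
    return num_files * (rtt_s + transfer_s)


def pipelined_time(num_files, file_bytes, bandwidth_bps, rtt_s):
    """Estimated seconds when requests are queued at the server ahead of time.

    Only the first request pays the round trip; later requests are assumed
    to be waiting in the server's queue by the time the previous file completes.
    """
    transfer_s = file_bytes * 8 / bandwidth_bps
    return rtt_s + num_files * transfer_s


if __name__ == "__main__":
    # Illustrative numbers only, not measurements.
    n, size, bw, rtt = 1000, 10_000, 1e9, 0.05
    seq = sequential_time(n, size, bw, rtt)
    pipe = pipelined_time(n, size, bw, rtt)
    print(f"sequential: {seq:.2f} s, pipelined: {pipe:.2f} s, "
          f"speedup: {seq / pipe:.0f}x")
```

In this toy model the speedup shrinks as files get larger, because transfer time starts to dominate the round trip, which is consistent with large files never having had the problem in the first place.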
John finally had time to write a client that takes advantage of pipelining. A set of graphs shows the performance improvement, which for "small" (10 kilobyte to 10 megabyte) files can be enormous.
John is now integrating these pipelining techniques into GridFTP clients, in particular RFT. Let us know if you're interested in trying this.
