The Globus team showcased Globus Online, our new cloud-based managed file transfer service, at the SC conference in New Orleans last week. Hundreds of people came by our booth to sign up. I don't think it was just the free T-shirts: people seemed really interested. Thus, I outline here the rationale for Globus Online's development, and provide a few notes on its design and implementation.
Growing up in New Zealand, I heard endless repeats of the Goon Show. In one episode, hero Neddy Seagoon is offered five pounds to move a piano from one room to another. It turns out that one room is in France and the other in England, so it is a more difficult task than Neddy anticipated. (He ends up sailing the piano across the Channel.)
Moving data has that flavor: it can sound trivial, but in practice is often tedious and difficult. Datasets may have complex nested structures, containing many files of varying sizes. Source and destination may have different authentication requirements and interfaces. End-to-end performance may require careful optimization. Failures must be recovered from. Perhaps only some files differ between source and destination. And so on.
Many tools exist to manage data movement: RFT, FTS, Phedex, rsync, etc. However, all must be installed and run by the user, which can be challenging for all concerned. Globus Online uses software-as-a-service (SaaS) methods to overcome those problems. It's a cloud-hosted, managed service, meaning that you ask Globus Online to move data; Globus Online does its best to make that happen, and tells you if it fails.
The Globus Online service can be accessed via different interfaces, depending on the user and their application:
- A simple Web UI is designed to serve the needs of ad hoc and less technical users
- A command line interface exposes more advanced capabilities and enables scripting for use in automated workflows
- A REST interface facilitates integration for system builders who don't want to re-engineer file transfer solutions for their end users
All three access methods allow a client to:
- establish and update a user profile, and specify the method(s) to be used to authenticate to the service;
- authenticate using various common methods, such as Google OpenID or MyProxy providers;
- characterize endpoints to/from which transfers may be performed;
- request transfers;
- monitor the progress of transfers; and
- cancel active transfers.
Having authenticated and requested a transfer, a client can disconnect, and return later to find out what happened. Globus Online tells you which transfer(s) succeeded and which, if any, failed. It notifies you if a deadline is not met, or if a transfer requires additional credentials.
Globus Online REST requests are of course simple HTTP GETs and POSTs, with the destination URL indicating the requested operation and the body of the message containing any arguments.
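As a rough illustration of that request shape, here is a minimal Python sketch that builds (but never sends) one GET and one POST. The base URL, the operation names, the argument keys, and the use of JSON for the message body are all assumptions made for this example; none of them is taken from the actual Globus Online REST interface.

```python
import json
from urllib.parse import urljoin
from urllib.request import Request

# Hypothetical base URL; the real service address is not given in this post.
BASE_URL = "https://transfer.example.org/api/"


def build_request(operation, arguments=None):
    """Build (but do not send) an HTTP request for one operation.

    The URL path names the operation; any arguments travel in the body.
    JSON encoding of the body is an assumption made for this sketch.
    """
    url = urljoin(BASE_URL, operation)
    if arguments is None:
        # No arguments to send: a plain GET, e.g. to check on progress.
        return Request(url, method="GET")
    body = json.dumps(arguments).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical operation names and argument keys, for illustration only.
status_request = build_request("transfer/status")
submit_request = build_request(
    "transfer/submit",
    {"source": "alcf#dtn:/myfile", "destination": "nersc#dtn:/myfile"},
)
```

Nothing here touches the network. A real client would also need to attach credentials before passing either request to urllib.request.urlopen.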
A command line interface (CLI) has long been valuable for client-side scripting, but requires installation of client-side libraries. What we call (somewhat tongue in cheek) CLI-2 supports client-side scripting with no client-side software installation. We achieve this behavior via a restricted shell, into which any user with a Globus Online account can ssh to execute commands. Thus, I can write
ssh FOSTER@cli.globusonline.org scp alcf#dtn:/myfile nersc#dtn:/myfile
to copy myfile from source alcf#dtn to destination nersc#dtn. Two useful features are illustrated:
- Endpoints define logical names for physical nodes. For example, alcf#dtn denotes the data transfer nodes associated with the Argonne Leadership Computing Facility. Sites can publish their endpoints, and users can define their own endpoint names.
- The Globus Online scp command echoes the syntax of the popular scp (secure copy), thus facilitating access by scp users. It supports many regular scp options, plus some additional features--and is much faster because it is built on GridFTP.
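Because CLI-2 needs nothing but ssh on the client, a script can drive it by assembling the same command line shown above. The following Python sketch does that under a few assumptions: the helper names split_location and scp_command are hypothetical, and splitting a location at its first colon is inferred from the single example in this post rather than from any documented grammar.

```python
CLI_HOST = "cli.globusonline.org"  # host taken from the example above


def split_location(location):
    """Split 'endpoint:/path' into (endpoint, path) at the first colon."""
    endpoint, separator, path = location.partition(":")
    if not separator or not endpoint or not path:
        raise ValueError(f"expected endpoint:/path, got {location!r}")
    return endpoint, path


def scp_command(user, source, destination):
    """Return the argv list that runs the hosted scp command over ssh."""
    # Reject malformed locations before building the command.
    split_location(source)
    split_location(destination)
    return ["ssh", f"{user}@{CLI_HOST}", "scp", source, destination]


argv = scp_command("FOSTER", "alcf#dtn:/myfile", "nersc#dtn:/myfile")
print(" ".join(argv))
```

The list form can be handed directly to subprocess.run(argv), which avoids any local shell quoting of the # character in endpoint names.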
There's more, including a powerful transfer command. I encourage you to browse the documentation.
The two keys to successful SaaS are reliability and scalability. The service must behave appropriately as usage grows to 1,000, then 1,000,000, and maybe more users. To this end, we run Globus Online on Amazon Web Services. User and transfer profile information is maintained in a database that is replicated, for reliability, across multiple geographical regions. Transfers are serviced by nodes in Amazon's Elastic Compute Cloud (EC2), which scale automatically as service demands increase.
Looking ahead, we will support InCommon credentials and other OpenID providers in addition to Google; support other transfer protocols, including HTTP and SRM; and continue to refine automated transfer optimization, for example by optimizing endpoint configurations based on the number and size of files.
Kudos to the Globus Online team, which includes Bryce Allen, Joshua Boverhof, John Bresnahan, Ann Chervenak, Lisa Childers, Paul Davé, Fred Dech, Ian Foster, Dan Gunter, Gopi Kandaswamy, Nick Karonis, Raj Kettimuthu, Jack Kordas, Lee Liming, Mike Link, Stu Martin, JP Navarro, Karl Pickett, Mei Hui Su, Steve Tuecke, and Vas Vasiliadis.