Globus Online CLI: Beyond the Basics

This is a companion guide to An Introduction to the CLI. New CLI users should read the intro first.

Endpoint Management

In addition to serving as a discovery mechanism for community endpoints, Globus Online enables users to create and (optionally) share their own endpoint definitions.

Logical endpoints can be created using the endpoint-add command. They can be continually modified (by adding physical addresses, renaming, etc.) and persist until explicitly deleted with the endpoint-remove command. Physical addresses (specified with the endpoint-add -p option) are the same as would appear in a globus-url-copy command. Transfers that refer to a given logical endpoint will be randomly assigned at runtime to the associated physical addresses.

In the following example, user lcc adds an endpoint with a standalone ssh command. To demonstrate Globus Online’s interactive shell mode, she then issues two more endpoint-add commands inside an interactive Globus Online session. The result is two logical endpoints: vpac, with one associated physical address, and never, with two:

bash-3.2$ ssh [email protected] endpoint-add vpac -p gsiftp://arcs-df.vpac.org:2811/
bash-3.2$ ssh [email protected]
Welcome to globusonline.org, lcc. Type 'help' for help.
$ endpoint-add -p never-1.ci.uchicago.edu never
$ endpoint-add -p never-2.ci.uchicago.edu never
$ exit
Connection to cli.globusonline.org closed.
bash-3.2$

Globus Online endpoint definitions are either public or private. Public endpoints are visible to all Globus Online users; private endpoints are visible only to those who defined them. Here we see that after user lcc makes one of her endpoints public, user lccso22 now sees lcc#never in the public list:

ssh [email protected] endpoint-modify --public never
Set 'never' to public
ssh [email protected] endpoint-list -p
alcf#dtn
ci#pads
go#ep1
go#ep2
lcc#never
nersc#dtn
olcf#dtn
tg#abe
tg#bigred
tg#cobalt
tg#condor
tg#ember
tg#frost
tg#hpss
tg#kraken
tg#lincoln
tg#lonestar
tg#longhorn
tg#nstg
tg#pople
tg#queenbee
tg#ranch
tg#ranger
tg#spur
tg#steele
ssh [email protected] endpoint-list -p -v lcc#never
Name           : lcc#never
Host(s)        : gsiftp://never-2.ci.uchicago.edu:2811, gsiftp://never-1.ci.uchicago.edu:2811
Subject(s)     : ,
MyProxy Server : n/a

endpoint-list with no options displays the user's list of previously-activated endpoints (both public and private), along with the remaining activation time for each endpoint:

ssh [email protected] endpoint-list
alcf#dtn     09:36:54
ci#pads      08:54:51
go#ep1       10:34:43
go#ep2       10:34:43
never        09:36:54
vpac         -
nersc#dtn    08:25:47
olcf#dtn     08:48:19
tg#abe       06:34:10
tg#bigred    06:34:10
tg#cobalt    06:34:10
tg#condor    06:34:10
tg#ember     06:34:10
tg#frost     06:35:58
tg#hpss      06:34:10
tg#kraken    06:34:10
tg#lincoln   06:34:10
tg#lonestar  06:34:10
tg#longhorn  06:34:10
tg#nstg      06:34:10
tg#pople     06:34:10
tg#queenbee  06:34:10
tg#ranch     06:34:10
tg#ranger    06:34:10
tg#spur      06:34:10
tg#steele    06:34:10

In addition to explicit creation, endpoints can be implicitly created by way of transfer and scp. If the transfer or scp command refers to a hostname instead of a logical name, a private endpoint will be automatically created to represent it. Further information about implicit endpoint creation can be found in the transfer and scp man pages.

Data Management

Globus Online provides two commands for moving files: transfer and scp. The transfer command is the more feature-rich of the two; scp supports a well-known interface and is easy to use. Globus Online also supports features such as file synchronization and idempotent submission.

The following example shows a detached recursive scp. By default scp will be canceled if your ssh session is disconnected or you press CTRL-C. However, Globus Online provides the -D option so you can create a detached scp task that runs in the background even if your ssh session is disconnected:

ssh [email protected] scp -D -r alcf#dtn:/intrepid-fs0/users/childers/persistent/datasrc/sdata/10Kfiles100M/ nersc#dtn:/project/projectdirs/mpccc1/childers/data/dest/sdata/alcf20100122/10Kfiles100M/
Task ID: 4a3c471e-edef-11df-aa30-1231350018b1

In contrast to scp, the transfer command reads a list of source and destination pairs from stdin, terminated by EOF or CTRL-D, and keeps trying to transfer every file in the list until it succeeds or the user-specified deadline is reached. The following example directs Globus Online to recursively copy the contents of a directory from ALCF to NERSC. It is equivalent to the previous scp command, except that any outstanding transfer requests not completed after the 6-hour deadline (-d 6h) will be ignored:

echo "alcf#dtn/intrepid-fs0/users/childers/persistent/datasrc/sdata/10Kfiles100M/ nersc#dtn/project/projectdirs/mpccc1/childers/data/dest/sdata/alcf20100122/10Kfiles100M/ -r" | ssh [email protected] transfer -d 6h
Task ID: 427b63ec-ee04-11df-aa30-1231350018b1
Created transfer task with 1 file(s)

Another way to specify a transfer dataset is via a file list. A file list can contain a mix of directory source/dest pairs and individual file source/dest pairs. (Note: some might find an adaptation of this unsupported Python script helpful in building file lists.) The following example specifies that 10,000 individual files should be transferred within the default deadline of 24 hours:

cat ../10Kalcf-nersc100MB.dat | ssh [email protected] transfer
Task ID: 28d854ae-ee18-11df-aa30-1231350018b1
Created transfer task with 10000 file(s)
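A file list like the one piped into transfer above could be generated with a few lines of Python. The following is only a minimal sketch, not the script referred to earlier: the line format (endpoint name immediately followed by an absolute path, then a space, then the destination) is inferred from the echo examples in this guide, and the endpoint names, directories and file names are hypothetical. Names containing whitespace are not handled.

```python
def build_file_list(src_endpoint, src_dir, dst_endpoint, dst_dir, names):
    """Return one 'source destination' line per file name.

    The line layout is an assumption based on the examples in this guide.
    """
    src_dir = src_dir.rstrip("/")
    dst_dir = dst_dir.rstrip("/")
    lines = []
    for name in names:
        src = f"{src_endpoint}{src_dir}/{name}"
        dst = f"{dst_endpoint}{dst_dir}/{name}"
        lines.append(f"{src} {dst}")
    return lines


if __name__ == "__main__":
    # Hypothetical endpoints, directories and file names, for illustration only.
    names = [f"file-{i:04d}" for i in range(3)]
    for line in build_file_list("src#ep", "/data/in", "dst#ep", "/data/out", names):
        print(line)
```

Redirecting the printed lines to a file gives something that can be piped to transfer in the same way as the .dat file above.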

The following two examples highlight Globus Online's one-way file synchronization feature. The first executes a file size-based check, the second executes a full md5sum check:

echo "go#ep1/share/godata/ go#ep2/~/ -r -s 1" | ssh [email protected] transfer
Task ID: 609b53fc-ebff-11df-aa30-1231350018b1
Created transfer task with 1 file(s)
echo "alcf#dtn/intrepid-fs0/users/childers/persistent/datasrc/sdata/10Kfiles100M/ nersc#dtn/project/projectdirs/mpccc1/childers/data/dest/sdata/alcf20100122/10Kfiles100M/ -r -s 3" | ssh [email protected] transfer
Task ID: 1c05440a-ee57-11df-aa30-1231350018b1
Created transfer task with 1 file(s)
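To make the difference between the two checks concrete, here is a small local illustration in Python. It is a sketch under stated assumptions, not a description of how Globus Online implements synchronization: it compares two files on the local filesystem, treats level 1 as a size comparison and level 3 as an md5 comparison (the two levels used in the examples above), and rejects any other level because this guide does not describe them. The function names are hypothetical.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_copy(src, dst, level):
    """Decide whether src should be copied over dst in a one-way sync.

    level 1: copy if the destination is missing or the sizes differ.
    level 3: as level 1, and also copy if the md5 checksums differ.
    """
    if level not in (1, 3):
        raise ValueError("only levels 1 and 3 are covered by this sketch")
    if not os.path.exists(dst):
        return True
    if os.path.getsize(src) != os.path.getsize(dst):
        return True
    if level == 1:
        return False
    return md5_of(src) != md5_of(dst)
```

Two files of equal size but different content show the trade-off: the size check considers them in sync, while the checksum check reads both files in full and detects the difference.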

The following example demonstrates both inline endpoint creation and activation (note the automatically-generated private endpoint definition):

gsissh [email protected] scp -g alcf#dtn:~/samplefile.txt gridftp.lonestar.tacc.teragrid.org:~/samplefile.txt
Activating 'gridftp.lonestar.tacc.teragrid.org:2811'
Activating 'alcf#dtn:2811'
Task ID: 3f4c2cc6-ee20-11df-aa30-1231350018b1
[XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX] 1/1 0.00 mbps
ssh [email protected] endpoint-list *lone* -v
Name               : _gsiftp_gridftp.lonestar.tacc.teragrid.org_2811
Host(s)            : gsiftp://gridftp.lonestar.tacc.teragrid.org:2811
Subject(s)         :
MyProxy Server     : n/a
Credential Status  : ACTIVE
Credential Expires : 2010-11-12 16:47:16Z
Credential Subject : /DC=org/DC=doegrids/OU=People/CN=Lisa C Childers 319818/CN=1686104609/CN=1586158410

Name               : tg#lonestar
Host(s)            : gsiftp://tg-gridftp.lonestar.tacc.teragrid.org:2811
Subject(s)         :
MyProxy Server     : myproxy.teragrid.org
Credential Status  : EXPIRED
Credential Expires : 2010-11-12 05:56:46Z
Credential Subject : /C=US/O=National Center for Supercomputing Applications/CN=Lisa Childers

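The following example demonstrates once-and-only-once (idempotent) submission. A task ID is first generated with transfer --generate-id and then passed to each submission with --taskid. Here the first submission is interrupted, the second creates the task, and a third submission with the same ID does not create a duplicate: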

bash-3.2$ ssh [email protected] transfer --generate-id
7f2fb1d6-ee76-11df-aa30-1231350018b1
bash-3.2$ cat ../10Kalcf-nersc100MB.dat | ssh [email protected] transfer --taskid=7f2fb1d6-ee76-11df-aa30-1231350018b1
Killed by signal 2.
bash-3.2$ cat ../10Kalcf-nersc100MB.dat | ssh [email protected] transfer --taskid=7f2fb1d6-ee76-11df-aa30-1231350018b1
Deadline : 2010-11-12 19:24:31Z
Task ID: 7f2fb1d6-ee76-11df-aa30-1231350018b1
Created transfer task with 10000 file(s)
bash-3.2$ cat ../10Kalcf-nersc100MB.dat | ssh [email protected] transfer --taskid=7f2fb1d6-ee76-11df-aa30-1231350018b1
Notice: Task ID already created
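The value of a pre-generated task ID is that a client can blindly resubmit after a dropped connection without risking a duplicate task. The Python sketch below illustrates that reasoning with a stand-in service; FakeService and submit_until_acknowledged are hypothetical names, and nothing here talks to Globus Online. In real use the submit callable would wrap the ssh ... transfer --taskid=... command shown above.

```python
class FakeService:
    """Stand-in for the service: remembers task ids and loses the first reply."""

    def __init__(self):
        self.tasks = {}
        self.calls = 0

    def submit(self, task_id):
        self.calls += 1
        created = task_id not in self.tasks
        self.tasks.setdefault(task_id, "created")
        if self.calls == 1:
            # The task was recorded, but the client never hears about it.
            raise ConnectionError("reply lost")
        return created


def submit_until_acknowledged(submit, task_id, max_attempts=5):
    """Call submit(task_id) until it returns; report how many attempts it took.

    Safe only because resubmitting the same task id creates nothing new.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            submit(task_id)
            return attempt
        except ConnectionError as error:
            last_error = error
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


if __name__ == "__main__":
    service = FakeService()
    attempts = submit_until_acknowledged(service.submit, "example-task-id")
    print(attempts, len(service.tasks))
```

Without the shared ID, the retry after the lost reply would have produced a second copy of the same 10,000-file task.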

Monitoring

Globus Online provides its users with real-time and historical information about their tasks. Push mechanisms include email notifications of interesting events such as task completion, credential expiration, and account creation. Pull mechanisms return metadata at the task level (the task id returned by the scp and transfer commands) and the subtask level (each individual file transfer is considered a subtask and has a unique id).

The default status command lists all pending tasks:

ssh [email protected] status
Task ID      : 28d854ae-ee18-11df-aa30-1231350018b1
Request Time : 2010-11-12 04:48:57Z
Command      : transfer (+10000 input lines)
Status       : ACTIVE

status also provides a way to list the last n tasks (-l n) regardless of state (-a):

ssh [email protected] status -l 4 -a
Task ID      : 3f4c2cc6-ee20-11df-aa30-1231350018b1
Request Time : 2010-11-12 05:46:51Z
Command      : scp -g alcf#dtn:~/samplefile.txt gridftp.lonestar.tacc.teragrid.org:~/samplefile.txt
Status       : SUCCEEDED

Task ID      : 28d854ae-ee18-11df-aa30-1231350018b1
Request Time : 2010-11-12 04:48:57Z
Command      : transfer (+10000 input lines)
Status       : ACTIVE

Task ID      : 427b63ec-ee04-11df-aa30-1231350018b1
Request Time : 2010-11-12 02:26:30Z
Command      : transfer -d 6h (+1 input line)
Status       : SUCCEEDED

Task ID      : 4a3c471e-edef-11df-aa30-1231350018b1
Request Time : 2010-11-11 23:56:24Z
Command      : scp -D -r alcf#dtn:/intrepid-fs0/users/childers/persistent/datasrc/sdata/10Kfiles100M/ nersc#dtn:/project/projectdirs/mpccc1/childers/data/dest/sdata/alcf20100122/10Kfile100M/
Status       : SUCCEEDED

The default details command provides an overview of a transfer’s state:

ssh [email protected] details 28d854ae-ee18-11df-aa30-1231350018b1
Task ID           : 28d854ae-ee18-11df-aa30-1231350018b1
Task Type         : TRANSFER
Parent Task ID    : n/a
Status            : ACTIVE
Request Time      : 2010-11-12 04:48:57Z
Deadline          : 2010-11-13 04:48:57Z
Completion Time   : n/a
Total Tasks       : 10000
Tasks Successful  : 8831
Tasks Expired     : 0
Tasks Canceled    : 0
Tasks Failed      : 0
Tasks Pending     : 1169
Tasks Retrying    : 8
Command           : transfer (+10000 input lines)
Files             : 10000
Directories       : 0
Bytes Transferred : 925997465600
MBits/sec         : 2224.619

The details -t command lists subtasks (i.e., individual files) for an scp or transfer task. In the following example the command produces a 10,001-line file (a header, plus one line for each file):

ssh [email protected] details -t -f all -O csvh 28d854ae-ee18-11df-aa30-1231350018b1 > details.csv

R is useful for inspecting the output from large runs. Here are plots of per-file transfer times and numbers of faults; all 10,000 transfers succeeded:

[R.app GUI 1.34 (5589) x86_64-apple-darwin9.8.0]
statusFile <- read.table("~/details.csv", sep=",", header=TRUE)
attach(statusFile)
summary(task_type)

FILE_COPY 10000
summary(status)
SUCCEEDED 10000
summary(faults)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.000   0.000   0.000   0.001   0.000   1.000
sum(faults)
[1] 10
sortedTime <- statusFile[order(completion_time,parent_taskid) , ]
attach(sortedTime)
rtime <- as.POSIXct(request_time, format="%Y-%m-%d %H:%M:%S", tz="UTC")
ctime <- as.POSIXct(completion_time, format="%Y-%m-%d %H:%M:%S", tz="UTC")
ttime <- ctime-rtime
summary(as.numeric(ttime))

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     37    1030    1937    1920    2857    3736
palette(rainbow(8))
plot(ttime/60,col=taskid,type="h",ylab="Completion Time (Minutes)",xlab="Individual Files",main="Globus Online Demonstration: Time\n1 user transferring 10,000 100MB files from ALCF to NERSC\n11 November 2010")
plot(faults,ylab="Number of Faults",xlab="Individual Files",col=taskid,main="Globus Online Demonstration: Faults\n1 user transferring 10,000 100MB files from ALCF to NERSC\n11 November 2010",yaxt="n")
axis(2,at=0:1)
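For readers who do not use R, the same status and fault tallies can be produced with Python's standard library. This is a minimal sketch: it assumes the CSV header uses the column names status and faults, as the R session above suggests, and that faults holds integers. The function name and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def summarize_details(csv_text):
    """Return (status counts, total faults) for 'details -t' CSV output."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    statuses = Counter(row["status"] for row in rows)
    total_faults = sum(int(row["faults"]) for row in rows)
    return statuses, total_faults


if __name__ == "__main__":
    # Made-up sample rows; real input would be the contents of details.csv.
    sample = "status,faults\nSUCCEEDED,0\nSUCCEEDED,1\nSUCCEEDED,0\n"
    statuses, total_faults = summarize_details(sample)
    print(dict(statuses), total_faults)
```

Passing the text of details.csv to summarize_details should give the counterparts of summary(status) and sum(faults) from the R session.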

The events command provides information about events that occurred while executing a task. In this first example user lcc is inspecting the progress of an earlier checksum-based sync by examining the "files_summed=" counts:

ssh [email protected] events 1c05440a-ee57-11df-aa30-1231350018b1 | tail -10
Code           : PROGRESS
Description    : Performance monitoring event
Details        : bytes_summed=349700096000 files_summed=3335
Task ID        : 1c05440b-ee57-11df-aa30-1231350018b1
Parent Task ID : 1c05440a-ee57-11df-aa30-1231350018b1
Time           : 2010-11-12 13:20:09.578755Z

Code           : PROGRESS
Description    : Performance monitoring event
Details        : bytes_summed=355886694400 files_summed=3394

In this example lcc is extracting all events that occurred while transferring a 1TB dataset (and storing them in a file for later inspection):

ssh [email protected] events -f all -O csvh 28d854ae-ee18-11df-aa30-1231350018b1 > events.csv

Here is an R-based view of the extracted transfer events. A STARTED event represents an attempt to transfer an individual file; a SUCCEEDED event indicates that a file was successfully transferred. An UNKNOWN event indicates that a fault was detected. Faults eventually trigger new start events, courtesy of Globus Online's automatic retry mechanism:

[R.app GUI 1.34 (5589) x86_64-apple-darwin9.8.0]
eventsFile <- read.table("~/events.csv", sep=",", header=TRUE)
attach(eventsFile)
summary(code)

  STARTED SUCCEEDED   UNKNOWN
    10010     10000        10
plot(as.numeric(etime) ~ code,yaxt="n",ylab="time",xlab="event type",col=rainbow(3),main="1 user transferring 10,000 100MB files from ALCF to NERSC\nEvent times")
axis(2,at=min(etime),label="t=0")
plot(code,col=rainbow(3),ylab="count",main="1 user transferring 10,000 100MB files from ALCF to NERSC\n20,020 events recorded",log="y")
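The event counts above are consistent with the retry behavior: 10,000 first attempts plus 10 retries give 10,010 STARTED events, and 10,010 + 10,000 + 10 gives the 20,020 events in the plot title. The Python sketch below checks that relationship for a list of event codes. It reflects one reading of the description above and would hold only for a finished task in which every file eventually succeeded; the function name is hypothetical.

```python
from collections import Counter


def retries_account_for_starts(codes):
    """True if every STARTED event ended in either a success or a fault."""
    counts = Counter(codes)
    return counts["STARTED"] == counts["SUCCEEDED"] + counts["UNKNOWN"]


if __name__ == "__main__":
    # Event counts taken from the summary(code) output in this guide.
    codes = ["STARTED"] * 10010 + ["SUCCEEDED"] * 10000 + ["UNKNOWN"] * 10
    print(retries_account_for_starts(codes), len(codes))
```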

Once your Globus Online task has finished, an email will be sent to the address specified in your profile. Here is an example transfer completion notification:

Subject: Task 28d854ae-ee18-11df-aa30-1231350018b1: SUCCEEDED
From: "Globus Online Notification" <[email protected]>
To: [email protected]

=== Task Details ===
Task ID           : 28d854ae-ee18-11df-aa30-1231350018b1
Task Type         : TRANSFER
Parent Task ID    : n/a
Status            : SUCCEEDED
Request Time      : 2010-11-12 04:48:57Z
Deadline          : 2010-11-13 04:48:57Z
Completion Time   : 2010-11-12 05:51:08Z
Total Tasks       : 10000
Tasks Successful  : 10000
Tasks Expired     : 0
Tasks Canceled    : 0
Tasks Failed      : 0
Tasks Pending     : 0
Tasks Retrying    : 0
Command           : transfer (+10000 input lines)
Files             : 10000
Directories       : 0
Bytes Transferred : 1048576000000
MBits/sec         : 2248.957
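The figures in the notification can be cross-checked by hand. The Python sketch below recomputes the average rate from the byte count and the two timestamps. It assumes that the rate is measured from request time to completion time and that one megabit is 10^6 bits; the notification does not state either, so treat both as guesses. Under those assumptions the result comes out within about one Mbit/s of the reported 2248.957, with the small gap plausibly due to the timestamps being shown at whole-second resolution. The function name is hypothetical.

```python
from datetime import datetime

# Timestamp layout as it appears in the notification, with a literal trailing Z.
TIME_FORMAT = "%Y-%m-%d %H:%M:%SZ"


def mbits_per_sec(bytes_transferred, request_time, completion_time):
    """Average rate in megabits per second between two UTC timestamps."""
    start = datetime.strptime(request_time, TIME_FORMAT)
    end = datetime.strptime(completion_time, TIME_FORMAT)
    seconds = (end - start).total_seconds()
    return bytes_transferred * 8 / seconds / 1e6


rate = mbits_per_sec(1048576000000, "2010-11-12 04:48:57Z", "2010-11-12 05:51:08Z")
bytes_per_file = 1048576000000 // 10000
print(f"{rate:.1f} Mbit/s, {bytes_per_file} bytes per file")
```

The per-file size works out to 104,857,600 bytes, that is, 100 × 1024 × 1024, which matches the 100MB files named in the plot titles.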

Cancel

The cancel command enables you to kill pending transfers for a given task. Files already copied by Globus Online are unaffected by cancel. Information about the state of each file can be extracted with details (SUCCEEDED files were transferred prior to the cancel):

date -u; ssh [email protected] cancel 639bb59a-bccc-11df-b9bf-1231391536db
Canceling task '639bb59a-bccc-11df-b9bf-1231391536db'.... OK

ssh [email protected] details -t -f status,src_file -O csv 639bb59a-bccc-11df-b9bf-1231391536db | grep SUCCEEDED
SUCCEEDED,/intrepid-fs0/users/childers/persistent/datasrc/sdata/10Kfiles100M/cf8-165
SUCCEEDED,/intrepid-fs0/users/childers/persistent/datasrc/sdata/10Kfiles100M/cf0-140
SUCCEEDED,/intrepid-fs0/users/childers/persistent/datasrc/sdata/10Kfiles100M/cf7-192
...
ssh [email protected] details -t -f status,src_file -O csv 639bb59a-bccc-11df-b9bf-1231391536db | grep FAILED
FAILED,/intrepid-fs0/users/childers/persistent/datasrc/sdata/10Kfiles100M/cf1-419
FAILED,/intrepid-fs0/users/childers/persistent/datasrc/sdata/10Kfiles100M/cf8-418
FAILED,/intrepid-fs0/users/childers/persistent/datasrc/sdata/10Kfiles100M/cf8-212
...

For More Information

Please send all questions to . We are happy to be of service!