


Overview

For small data transfers (<100 GB) we recommend sftp, scp or rsync, with the warning noted below. For large data transfers (>100 GB) or transfers outside the University we recommend using Globus (GridFTP).

If you use sftp, scp or rsync, connect to filexfer.hpc.arizona.edu. Its dedicated purpose is file transfers. You will not be able to use the login nodes for transfers.

The bastion host has limited storage capacity and is not intended for file transfers.

FTP is an insecure protocol and is not supported on the HPC systems.


GridFTP / Globus


GridFTP is an extension of the standard File Transfer Protocol (FTP) for high-speed, reliable, and secure data transfer. Because GridFTP provides a more reliable and high performance file transfer (compared to protocols such as SCP or rsync), it enables the transmission of very large files. GridFTP also addresses the problem of incompatibility between storage and access systems. (You can read more about the advantages of GridFTP here.)

UA supports GridFTP through Globus. To use Globus, you'll first need to do a one-time setup to enable your local machine as a Globus endpoint; then you'll be able to transfer files.


Set up Globus Connect Personal endpoint:

1) Go to https://www.globus.org/ and click “Log In” in the top right corner

2) In the “Use your existing organizational login” box, type in or find “The University of Arizona” and hit Continue

3) This will take you to Webauth; log in as normal

4) You will end up at the Globus “Transfer Files” web interface, but wait, there’s more.

5) In the bottom right corner of the “Transfer Files” web interface, click the “Get Globus Connect Personal” link

6) Type a descriptive name for your local computer into the “Display Name” box and hit “Generate Setup Key”.

7) Copy the key to your clipboard, and also just leave this page open for the moment.

8) Under the Setup Key it returned, you’ll see links to download the software. Click the appropriate software download button for your operating system.

9) Install the software as normal and launch it.

10) It will ask for your setup key; copy/paste it from the web site. Globus Connect Personal should now show up as a small “g” icon in your menu bar/system tray.


Transfer files via the Globus interface: 

1) Go to https://www.globus.org/app/transfer and log in again if you need to

2) You should see a pretty classic “commander”-style file transfer view. You’ll pick an endpoint for each side and then tell it to move files from one to the other. Click the “Endpoint” box on the left-hand side and it should pop up a search interface.

3) Click the “My Endpoints” tab and you should see an entry that matches the “Display Name” you typed in earlier. Click that. The interface will load a view of the files on your local machine.

4) On the right-hand side, click the “Endpoint” box; you should see the same search interface as before.

5) In the search box, type this: arizona#sdmz-dtn

That will take you back to the Transfer Files screen and you should see a list of your files on the HPC system in the pane on the right-hand side.

6) Browse in the left-hand pane to the file(s) you want to transfer to HPC and once you’ve selected them, you should see the arrow facing to the right at the top of the interface light up blue. Click the arrow. You’ll get a green alert box at the top of the screen that says something like “Transfer request submitted successfully. Task id: <a uuid of many letters and numbers>”. This confirms that we have asked Globus very politely to tell your computer to send some files to HPC. It will just start happening.

7) Depending on how large/how many files, it may take a bit to transfer. You can see in-progress transfers by clicking the Activity tab near the top of the screen.


sftp

Use filexfer.hpc.arizona.edu for most file transfers.

sftp encrypts data before it is sent across the network. Additional capabilities include resuming interrupted transfers, directory listings, and remote file removal.

  • Open an SSH v2 compliant terminal client and navigate to a desired working directory on your local machine.
  • sftp NetId@filexfer.hpc.arizona.edu
  • **NetId@ can be omitted if it's the same on both local and remote machines**
  • Use put or get command at the sftp> prompt for the file transfer
  • Type help at the sftp> prompt for commands and their usages

Or in a single command:
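A minimal sketch of the single-command form, assuming a hypothetical NetID (your_netid) and a hypothetical remote file path (results/output.dat). When the sftp destination names a file rather than a directory, sftp downloads that file after login instead of opening an interactive session. The script only prints the command so it can be reviewed before it is run.

```shell
# Hypothetical values: replace with your own NetID and remote file path.
NETID="your_netid"
REMOTE_FILE="results/output.dat"
HOST="filexfer.hpc.arizona.edu"

# A destination of the form user@host:file makes sftp download that file
# into the given local directory (here ".") after authentication.
SFTP_CMD="sftp ${NETID}@${HOST}:${REMOTE_FILE} ."

# Print the command for review; paste it into your terminal to run it.
echo "$SFTP_CMD"
```

As with the interactive form, the NetId@ prefix can be dropped when the user name is the same on both machines.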


scp

scp uses Secure Shell (SSH) for data transfer and utilizes the same mechanisms for authentication, thereby ensuring the authenticity and confidentiality of the data in transit.

To:

  • Open an SSH v2 compliant terminal client and navigate to a desired working directory on your local machine.
  • To transfer files to HPC:  scp -rp filenameordirectory NetId@filexfer.hpc.arizona.edu:subdirectory
    **NetId@ can be omitted if it's the same on both local and remote machines**
  • The transferred file will be at the specified directory.

From:

  • Open an SSH v2 compliant terminal client and navigate to the desired working directory on your local machine.
  • scp -rp NetId@filexfer.hpc.arizona.edu:filenameordirectory .
     **The space followed by a period at the end means the destination is the current directory**


Host$ scp file.ext filexfer.hpc.arizona.edu:
Warning: Permanently added 'filexfer.hpc.arizona.edu' (ECDSA) to the list of known hosts.
<NetId>@filexfer.hpc.arizona.edu's password: 
file.ext                       100%  289KB 289.5KB/s   00:00   

Wildcards

Wildcards can be used for multiple file transfers (e.g. all files with .dat extension):
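A minimal sketch of wildcard transfers, assuming a hypothetical NetID (your_netid), a hypothetical remote directory (subdirectory) and made-up sample file names. It uses echo to show what scp would receive rather than contacting the server, so it is safe to run anywhere.

```shell
# Work in a scratch directory with a few sample files (names are made up).
workdir=$(mktemp -d)
cd "$workdir"
touch run1.dat run2.dat notes.txt

# Upload: the local shell expands *.dat before scp starts.
# "echo" shows the expanded command instead of running it.
upload_cmd=$(echo scp -p *.dat "your_netid@filexfer.hpc.arizona.edu:subdirectory")
echo "$upload_cmd"

# Download: quote the remote pattern so it reaches the remote side intact
# instead of being expanded by the local shell. (echo drops the quotes
# when printing; keep them when you type the real command.)
download_cmd=$(echo scp -p "your_netid@filexfer.hpc.arizona.edu:subdirectory/*.dat" .)
echo "$download_cmd"
```

Remove the echo to perform the transfer. Only the two .dat files appear in the upload command; notes.txt is not matched.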

For more information:

  • Type man scp at the shell prompt
  • The -r option transfers directories recursively, including the files inside them
  • The -p option preserves times and modes from the original files

rsync

rsync is a fast and extraordinarily versatile file copying tool.  It synchronizes files and directories between two different locations (or servers). Rsync copies only the differences of files that have actually changed.  

An important feature of rsync not found in most similar programs/protocols is that the mirroring takes place with only one transmission in each direction. Rsync can copy or display directory contents and copy files, optionally using compression and recursion. 

You use rsync in the same way you use scp. You must specify a source and a destination, one of which may be remote; rsync cannot copy between two remote hosts in a single command.

Example 1:

rsync -avz computer-name:src/directory-name /data/tmp --log-file=hpc-user-rsync.log

This would recursively transfer all files from the directory src/directory-name on the machine computer-name into the /data/tmp/directory-name directory on the local machine. The files are transferred in archive mode, which ensures that symbolic links, devices, attributes, permissions, ownerships, etc. are preserved in the transfer. Additionally, compression will be used to reduce the size of data portions of the transfer.

Example 2:

rsync -avz computer-name:src/directory-name/ /data/tmp --log-file=hpc-user-rsync.log

A trailing slash on the source changes this behavior to avoid creating an additional directory level at the destination. You can think of a trailing / on a source as meaning “copy the contents of this directory” as opposed to “copy the directory by name”, but in both cases the attributes of the containing directory are transferred to the containing directory on the destination.

-a   archive mode; will preserve timestamps

-v   increase verbosity

-z   compress file data during the transfer.

--log-file=FILE   log what we're doing to the specified FILE.
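The trailing-slash behavior described in Example 1 and Example 2 can be checked locally before moving real data. This is a minimal sketch that copies between two scratch directories on the same machine; the directory and file names are made up, and the demo is skipped if rsync is not installed.

```shell
# Build a scratch source tree (all names here are illustrative).
demo=$(mktemp -d)
mkdir -p "$demo/src/directory-name" "$demo/with_name" "$demo/contents_only"
echo "sample" > "$demo/src/directory-name/data.txt"

if command -v rsync >/dev/null 2>&1; then
    # No trailing slash: the directory itself is recreated at the destination.
    rsync -a "$demo/src/directory-name" "$demo/with_name"

    # Trailing slash: only the contents of the directory are copied.
    rsync -a "$demo/src/directory-name/" "$demo/contents_only"

    # List the resulting files to compare the two layouts.
    find "$demo/with_name" "$demo/contents_only" -type f | sort
fi
```

The first copy ends up at with_name/directory-name/data.txt, the second at contents_only/data.txt. The same rule applies when the source is a remote host.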


iRODS


Research Computing is implementing an iRODS configuration. This resource will provide a method to move and share datasets. iRODS is implemented with user-defined policies for the retention of data. iRODS is integrated with other HPC resources like CyVerse.

There are two ways to use it: from the command line, or from your workstation through a GUI such as Cyberduck.

Command line

Note that iCommands cannot be used to upload files into Data Store via URL from other sites (ftp, http, etc.).

To transfer data from an external site, you first must download the file to a local machine using wget or a similar mechanism, and then use iput to upload it to the Data Store. 
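The two-step workflow above can be sketched as follows, assuming a hypothetical source URL. The commands are printed rather than executed so they can be reviewed first; the iput options -P and -T are the ones used in the examples later on this page.

```shell
# Hypothetical source URL: replace with the real location of your dataset.
URL="https://example.org/data/dataset.tar.gz"

# wget saves the download under the last component of the URL.
FILE=$(basename "$URL")

# Step 1: download to the local machine.
echo "wget $URL"
# Step 2: upload the local copy to the Data Store.
echo "iput -P -T $FILE"
```

Run the two printed commands in order; iput requires that you have already authenticated with iinit.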


Ocelote

On Ocelote, iRODS 4 is installed as a standard operating system package on every node, so you do not need to run "module load irods". You will still need to run "iinit" the first time (see below).

For any system using iRods 4.x

iRods 4 iinit, unlike its iRods3 counterpart, does not help you set up the environment the first time you run iinit.  You need to run create_irods_env with suitable options for the iRods host, zone, username... manually for iRods 4.  As an example, we'll set up for the UA test iRods instance, and presume you have an account there.

For this key:    Enter this:
-h               irods.hpc.arizona.edu
-p               1247 (default)
-z               AZHPC
-u               your NetId (default)
-a               PAM

as in

$ create_irods_env -a PAM -h irods.hpc.arizona.edu -z AZHPC

will suffice to create an appropriate ~/.irods/irods_environment.json file to allow you to run iinit; we took the defaults -p 1247 and -u <your NetId> in the above example by omitting -p and -u. You only need to do this step ONE time; after that you will just run iinit and it will ask for your password. Note that create_irods_env will NOT overwrite or alter an existing ~/.irods/irods_environment.json file.

Once the ~/.irods/irods_environment.json file is created properly, you should be able to sign in to the iRODS server you selected using iinit, viz:

$ iinit	
Enter your current PAM password:	# enter your netid password here

At this point you can use other iRods commands such as icp to move files.


Legacy Clusters

Current iRODS servers use version 4 of iRODS and require client version 4. Correspondingly, iRODS servers running version 3 require a module load of version 3. The available modules are:

/uaopt/modulefiles
irods/3.0
irods/3.1
irods/3.3
irods/3.3.1p1 (default)
irods/4.1.8

After logging in, load the iRODS module.

$ module load irods/4.1.8


Commands

Command         Description
icd             Changes working directory
ichmod          For help, enter ichmod -h.
ichmod read     Grant read-only permission level for specified user to selected file or folder.
ichmod write    Grant read and write permission level for specified user to selected file or folder.
ichmod own      Grant full ownership permission level for specified user to selected file or folder.
ichmod null     Remove permission level for the user to the file or folder.
iexit           Log off/disconnect from the Data Store.
iget            Download file/directory from iRODS to local device
iinit           Initialize and start the connection to iRODS
ils             Lists contents of current working directory. For help, enter ils -h
ils -A          Lists directory permissions
imkdir          Creates new directory
iput            Uploads file/directory from local device to iRODS
ipwd            Shows name and path of current remote folder
irm             Moves a file to the trash.
irm -f          Deletes a file.
irm -r          Moves a folder to the trash.
irm -fr         Deletes a folder.

Examples

In the following examples:

  • my-files-to-transfer/ is the example name of the directory or folder for bulk transfers.
  • my-file-to-transfer.txt is the example name for single file transfers.
  • Any filename may be used for the checkpoint-file.

Bulk files transfer

iput -P -b -r -T --retries 3 -X checkpoint-file my-files-to-transfer/

Single large file transfer

iput -P -T --retries 3 --lfrestart checkpoint-lf-file my-file-to-transfer.txt


Graphical / Cyberduck

Cyberduck is a free, open source, cross-platform file transfer program that supports high-throughput, parallel transfers over multiple protocols (FTP, SFTP, WebDAV, Cloud files, Amazon S3, etc.). It serves as an alternative to the iDrop Java applet, and has been extensively tested with large data transfers (60-70 GB). Actual transfer performance depends on the user's available bandwidth and network settings.

Cyberduck versions are available for Mac OS (10.6 and higher on Intel 64-bit) and Windows (Windows XP, Windows Vista, Windows 7, or Windows 8). Linux users should use iDrop Desktop or iCommands. Cyberduck version 4.7.1 (released July 7, 2015) and later supports the iRODS protocol.

Install or Update Cyberduck 

  • If Cyberduck is already installed, check if you need to update:
    1. Click the Cyberduck menu.
    2. Click Check for Updates.
    3. If an update is available click Install Update.
  • To install Cyberduck for your operating system for the first time:
    1. Go to the Cyberduck installation page at https://cyberduck.io/.
    2. Follow the steps for your OS (not available for Linux users):
      • For Mac OS:
        1. Click Download Cyberduck-5.3.9.zip (or current).
        2. Move the downloaded file (either a zip file or the unzipped application file, depending on your browser) to your Applications folder. If the zip file is listed, unzip the file in your Applications folder.
          IMPORTANT: The file must be located in your Applications folder.
      • For Windows:
        1. Click Download Cyberduck-Installer-5.3.9.exe (or current).
        2. Locate the downloaded file and double click to begin installation.
        3. Go through the install process.

Configure Cyberduck for use with iRODS

  1. Click the Cyberduck app icon to open Cyberduck.
    See the Cyberduck Preferences Help page on the Cyberduck website for more information on installation.
  2. Click this link to download the Connection Profile, which contains preconfigured settings for using Cyberduck with the UA iRODS data store.
  3. Click on Open Connection.
  4. In the first drop down field, choose "UA HPC iRODS"
  5. Create the connection:
    • The Server field contains irods1.helios.arizona.edu
    • The Port field contains 1247.
    • To create the connection with your HPC user account for login:
      1. Enter your user name in the NetId field with the prefix PAM: (i.e., PAM:<NetId>).
      • Verify your NetId is added to the URL field, as shown above.
      • The remaining fields are populated.

7. Click in the Transfer Files drop-down list and select Open multiple connections.

8. Close the window.






