For small data transfers (<100GB) we recommend sftp, scp or rsync with the warning noted below. For large data transfers (>100GB) or transfers outside the University we recommend using Globus (GridFTP).
If you use sftp, scp, or rsync, connect to filexfer.hpc.arizona.edu. Its dedicated purpose is file transfers. You will not be able to use the login nodes.
The bastion host has limited storage capacity and is not intended for file transfers.
FTP is an insecure protocol and is not supported on the HPC systems.
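The guidance above can be written as a small decision helper. This is only a sketch: `recommend_transfer_tool` is a hypothetical name, sizes are whole gigabytes, and a transfer of exactly 100 GB is treated as "small" because the text does not say which side of the line it falls on.

```shell
# Suggest a transfer tool from the size and destination rules above.
#   $1  approximate transfer size in whole GB
#   $2  "yes" if the other end is outside the University (default: no)
recommend_transfer_tool() {
    size_gb=$1
    external=${2:-no}
    if [ "$external" = "yes" ] || [ "$size_gb" -gt 100 ]; then
        echo "globus"
    else
        echo "sftp/scp/rsync via filexfer.hpc.arizona.edu"
    fi
}

recommend_transfer_tool 20        # small, on campus
recommend_transfer_tool 500       # large
recommend_transfer_tool 20 yes    # small, but off campus
```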
GridFTP / Globus
GridFTP is an extension of the standard File Transfer Protocol (FTP) for high-speed, reliable, and secure data transfer. Because GridFTP provides more reliable, higher-performance file transfer than protocols such as SCP or rsync, it enables the transmission of very large files. GridFTP also addresses the problem of incompatibility between storage and access systems.
Globus is one GridFTP-based service that the UA supports. To use Globus, you'll first need to do a one-time setup to enable your local machine as a Globus endpoint; then you'll be able to transfer files.
Set up Globus Connect Personal endpoint:
1) Go to https://www.globus.org/ and click “Log In” in the top right corner
2) In the “Use your existing organizational login” box, type in or find “The University of Arizona” and hit Continue
3) This will take you to Webauth; log in as normal
4) You will end up at the Globus “Transfer Files” web interface, but wait, there’s more.
5) In the bottom right corner of the “Transfer Files” web interface, click the “Get Globus Connect Personal” link
6) Type a descriptive name for your local computer into the “Display Name” box and hit “Generate Setup Key”.
7) Copy the key to your clipboard, and also just leave this page open for the moment.
8) Under the Setup Key it returned, you’ll see links to download the software. Click the appropriate software download button for your operating system.
9) Install the software as normal and launch it.
10) It will ask for your setup key; copy and paste it from the web site. The software should now show up as a small “g” icon in your menu bar/system tray.
Transfer files via the Globus interface:
1) Go to https://www.globus.org/app/transfer and log in again if you need to
2) You should see a pretty classic “commander”-style file transfer view. You’ll pick an endpoint for each side and then tell it to move files from one to the other. Click the “Endpoint” box on the left-hand side and it should pop up a search interface.
3) Click the “My Endpoints” tab and you should see an entry that matches the “Display Name” you typed in earlier. Click that. The interface will load a view of the files on your local machine.
4) On the right-hand side, click the “Endpoint” box; you should see the same search interface as before.
5) In the search box, type this: arizona#sdmz-dtn
That will take you back to the Transfer Files screen and you should see a list of your files on the HPC system in the pane on the right-hand side.
6) Browse in the left-hand pane to the file(s) you want to transfer to HPC and once you’ve selected them, you should see the arrow facing to the right at the top of the interface light up blue. Click the arrow. You’ll get a green alert box at the top of the screen that says something like “Transfer request submitted successfully. Task id: <a uuid of many letters and numbers>”. This confirms that we have asked Globus very politely to tell your computer to send some files to HPC. It will just start happening.
7) Depending on how large/how many files, it may take a bit to transfer. You can see in-progress transfers by clicking the Activity tab near the top of the screen.
filexfer.hpc.arizona.edu is intended to be used for most file transfers.
sftp encrypts data before it is sent across the network. Additional capabilities include resuming interrupted transfers, directory listings, and remote file removal.
- Open an SSH v2 compliant terminal client and navigate to a desired working directory on your local machine.
- sftp NetId@filexfer.hpc.arizona.edu
- **NetId@ can be omitted if it's the same on both local and remote machines**
- Use put or get command at the sftp> prompt for the file transfer
- Type help at the sftp> prompt for commands and their usages
Or in a single command:
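One single-command form, as a sketch: sftp accepts a remote path after the host name and fetches that file into the current directory once you have authenticated. The helper below only prints the command so it can be reviewed first; `sftp_fetch_cmd` and the path `subdirectory/results.dat` are hypothetical names used for illustration.

```shell
# Build (but do not run) a one-line sftp download command.
#   $1  your NetId
#   $2  path of the remote file, relative to your home directory
sftp_fetch_cmd() {
    netid=$1
    remote_path=$2
    printf 'sftp %s@filexfer.hpc.arizona.edu:%s\n' "$netid" "$remote_path"
}

# Print the command; paste it into a terminal to run it.
sftp_fetch_cmd NetId subdirectory/results.dat
```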
scp uses Secure Shell (SSH) for data transfer and utilizes the same mechanisms for authentication, thereby ensuring the authenticity and confidentiality of the data in transit.
- Open an SSH v2 compliant terminal client and navigate to a desired working directory on your local machine.
- To transfer files to HPC: scp -rp filenameordirectory NetId@filexfer.hpc.arizona.edu:subdirectory
**NetId can be omitted if it's the same on both local and remote machines**
- The transferred file will be at the specified directory.
- To transfer files from HPC, run the following from your SSH v2 compliant terminal client, in the working directory on your local machine:
- scp -rp NetId@filexfer.hpc.arizona.edu:filenameordirectory .
**The space followed by a period at the end means the destination is the current directory**
Wildcards can be used for multiple file transfers (e.g. all files with .dat extension):
- scp NetId@filexfer.hpc.arizona.edu:subdirectory/\*.dat . (Note: the backslash "\" preceding * keeps your local shell from expanding the wildcard, so it is matched against the remote files instead)
For More Information Type:
- man scp at the shell prompt
- -r option is good for transferring directories and files in the directories
- -p option is good for preserving time and mode from the original files
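The upload, download, and wildcard forms above can be collected into two small helpers. This is a sketch that only prints the commands: the function names, `my-results`, and `subdirectory` are hypothetical, and paths containing spaces are not handled.

```shell
HPC_HOST=filexfer.hpc.arizona.edu

# Print an scp command that uploads a local file or directory to HPC.
# usage: scp_upload_cmd NETID LOCAL_PATH REMOTE_DIR
scp_upload_cmd() {
    printf 'scp -rp %s %s@%s:%s\n' "$2" "$1" "$HPC_HOST" "$3"
}

# Print an scp command that downloads from HPC into the current directory.
# The remote argument is single-quoted in the printed command so the local
# shell leaves any wildcard alone (same effect as the backslash form).
# usage: scp_download_cmd NETID REMOTE_PATTERN
scp_download_cmd() {
    printf "scp -rp '%s@%s:%s' .\n" "$1" "$HPC_HOST" "$2"
}

scp_upload_cmd NetId my-results subdirectory
scp_download_cmd NetId 'subdirectory/*.dat'
```

Printing rather than running the command makes it easy to check the direction of the copy before any data moves.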
rsync is a fast and extraordinarily versatile file copying tool. It synchronizes files and directories between two different locations (or servers). Rsync copies only the differences of files that have actually changed.
An important feature of rsync not found in most similar programs/protocols is that the mirroring takes place with only one transmission in each direction. Rsync can copy or display directory contents and copy files, optionally using compression and recursion.
You use rsync in the same way you use scp. You must specify a source and a destination, one of which may be remote.
rsync -avz src/directory-name NetId@filexfer.hpc.arizona.edu:/data/tmp --log-file=hpc-user-rsync.log
This would recursively transfer all files from the directory src/directory-name on the local machine into the /data/tmp/directory-name directory on filexfer.hpc.arizona.edu. (rsync does not accept a remote source and a remote destination in the same command, so run it on the machine that holds the source files.) The files are transferred in archive mode, which ensures that symbolic links, devices, attributes, permissions, ownerships, etc. are preserved in the transfer. Additionally, compression will be used to reduce the size of data portions of the transfer.
rsync -avz src/directory-name/ NetId@filexfer.hpc.arizona.edu:/data/tmp --log-file=hpc-user-rsync.log
A trailing slash on the source changes this behavior to avoid creating an additional directory level at the destination. You can think of a trailing / on a source as meaning “copy the contents of this directory” as opposed to “copy the directory by name”, but in both cases the attributes of the containing directory are transferred to the containing directory on the destination.
-a archive mode; will preserve timestamps
-v increase verbosity
-z compress file data during the transfer.
--log-file=FILE log what we're doing to the specified FILE.
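The trailing-slash rule is easy to check locally before pointing rsync at a remote host. The sketch below copies one scratch directory twice, once by name and once with a trailing slash, and lists what arrives. The directory and file names are made up for the demonstration, and `mktemp -d` is assumed to be available.

```shell
# Local demonstration of the trailing-slash rule; no remote host is needed.
demo_trailing_slash() {
    command -v rsync >/dev/null 2>&1 || { echo "rsync not found" >&2; return 1; }
    work=$(mktemp -d) || return 1
    mkdir -p "$work/src/directory-name" "$work/with-name" "$work/contents-only"
    echo "sample" > "$work/src/directory-name/data.txt"

    # No trailing slash: the directory itself is copied by name.
    rsync -a "$work/src/directory-name" "$work/with-name"
    # Trailing slash: only the directory's contents are copied.
    rsync -a "$work/src/directory-name/" "$work/contents-only"

    # List the results relative to the scratch directory, then clean up.
    (cd "$work" && find with-name contents-only -type f | sort)
    rm -rf "$work"
}

demo_trailing_slash || echo "demo skipped"
```

With rsync installed, this lists `contents-only/data.txt` and `with-name/directory-name/data.txt`, showing the extra directory level that appears only when the source has no trailing slash.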
Research Computing is implementing an iRODS configuration. This resource will provide a method to move and share datasets. iRODS is implemented with user defined policies for the retention of data. iRODS is integrated with other HPC resources like CyVerse.
There are two ways to use it: from the command line, or from your workstation via a GUI such as Cyberduck.
Note that iCommands cannot be used to upload files into Data Store via URL from other sites (ftp, http, etc.).
To transfer data from an external site, you first must download the file to a local machine using wget or a similar mechanism, and then use iput to upload it to the Data Store.
On Ocelote, iRods 4 is installed as a standard operating system package on every node, so you do not need to "module load irods". You will still need to run "iinit" the first time (see below).
For any system using iRods 4.x
iRods 4 iinit, unlike its iRods3 counterpart, does not help you set up the environment the first time you run iinit. You need to run create_irods_env with suitable options for the iRods host, zone, username... manually for iRods 4. As an example, we'll set up for the UA test iRods instance, and presume you have an account there.
|For this key:||Enter this:|
|-u||your NetId (default)|
will suffice to create an appropriate ~/.irods/irods_environment.json file to allow you to run iinit; we took the default -p 1247, -u <your NetId> in the above example by omitting -p and -u. You only need to do this step ONE time; subsequent times you will just run iinit and it will ask for your password. Note create_irods_env will NOT overwrite or alter an existing ~/.irods/irods_environment.json file.
Once the ~/.irods/irods_environment.json file is created properly, you should be able to sign in to the iRods server you selected using iinit, viz:
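For orientation, a minimal iRODS 4 client environment file typically carries a host, port, user name, and zone. The sketch below writes such a file to a scratch location rather than to ~/.irods, so nothing real is touched. The exact output of create_irods_env is not shown in this document, so treat this as an assumption about its general shape: `write_irods_env`, `irods.example.edu`, and `exampleZone` are placeholders, and the real file may contain additional keys.

```shell
# Write a minimal iRODS 4 client environment file.
# usage: write_irods_env FILE HOST ZONE USER [PORT]
# Host, zone, and user values passed below are placeholders,
# not real UA settings.
write_irods_env() {
    cat > "$1" <<EOF
{
    "irods_host": "$2",
    "irods_port": ${5:-1247},
    "irods_user_name": "$4",
    "irods_zone_name": "$3"
}
EOF
}

env_file=$(mktemp)
write_irods_env "$env_file" irods.example.edu exampleZone NetId
cat "$env_file"
rm -f "$env_file"
```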
At this point you can use other iRods commands such as icp to move files.
Current iRods servers run version 4 of iRODS and require client version 4. Correspondingly, iRODS servers running version 3 require a module load of version 3.
After logging in, load the iRODS module.
|icd||Changes working directory|
|ichmod||For help, enter ichmod -h.|
|ichmod read <user> <file-or-folder>||Grant read-only permission level for specified user to selected file or folder.|
|ichmod write <user> <file-or-folder>||Grant read and write permission level for specified user to selected file or folder.|
|ichmod own <user> <file-or-folder>||Grant full ownership permission level for specified user to selected file or folder.|
|ichmod null <user> <file-or-folder>||Remove permission level for the user to the file or folder.|
|iexit||Log off/disconnect from the Data Store.|
|iget||Download file/directory from iRODS to local device|
|iinit||Initialize and start the connection to iRODS|
|ils||Lists contents of current working directory. For help, enter ils -h|
|ils -A||Lists directory permissions|
|imkdir||Creates new directory|
|iput||Uploads file/directory from local device to iRODS|
|ipwd||Shows name and path of current remote folder|
|irm||Moves a file to the trash|
|irm -f||Deletes a file.|
|irm -r||Moves a folder to the trash.|
|irm -fr||Deletes a folder.|
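The four permission descriptions above correspond to the ichmod levels read, write, own, and null. As a sketch, the helper below validates the level and prints the matching command. `ichmod_cmd`, the user `labmate`, and the folder `results/` are hypothetical; for a folder you would typically also add -r so the change applies to its contents.

```shell
# Print the ichmod command for a given permission level.
# usage: ichmod_cmd LEVEL USER PATH
ichmod_cmd() {
    case $1 in
        read|write|own|null)
            printf 'ichmod %s %s %s\n' "$1" "$2" "$3" ;;
        *)
            echo "unknown level: $1 (expected read, write, own, or null)" >&2
            return 1 ;;
    esac
}

ichmod_cmd read labmate results/    # read-only
ichmod_cmd write labmate results/   # read and write
ichmod_cmd own labmate results/     # full ownership
ichmod_cmd null labmate results/    # remove access
```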
In the following examples:
- my-files-to-transfer/ is the example name of the directory or folder for bulk transfers.
- my-file-to-transfer.txt is the example name for single file transfers.
- Any filename may be used for the checkpoint-file.
Bulk files transfer
Example: BULK FILES TRANSFER
iput -P -b -r -T --retries 3 -X checkpoint-file my-files-to-transfer/
Single large file transfer
Example: SINGLE LARGE FILE TRANSFER
iput -P -T --retries 3 --lfrestart checkpoint-lf-file my-file-to-transfer.txt
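The two iput invocations above differ only in their bulk and checkpoint options. The sketch below prints either form with the flags annotated. `iput_cmd` is a hypothetical helper, and the flag summaries in the comments are paraphrased, so check them against `iput -h` on your system.

```shell
# Print the iput command for a bulk (directory) or single large file upload.
# Flag summary (paraphrased; verify with "iput -h"):
#   -P            show progress
#   -b            bulk upload, reduces per-file overhead
#   -r            recurse into directories
#   -T            renew the socket connection during long transfers
#   --retries N   retry up to N times on error
#   -X FILE       checkpoint file for restarting a bulk upload
#   --lfrestart FILE  checkpoint file for restarting a large-file upload
# usage: iput_cmd bulk|single CHECKPOINT_FILE PATH
iput_cmd() {
    case $1 in
        bulk)   printf 'iput -P -b -r -T --retries 3 -X %s %s\n' "$2" "$3" ;;
        single) printf 'iput -P -T --retries 3 --lfrestart %s %s\n' "$2" "$3" ;;
        *)      echo "usage: iput_cmd bulk|single CHECKPOINT_FILE PATH" >&2
                return 1 ;;
    esac
}

iput_cmd bulk checkpoint-file my-files-to-transfer/
iput_cmd single checkpoint-lf-file my-file-to-transfer.txt
```

If an upload is interrupted, rerunning the same command with the same checkpoint file lets iput pick up where it stopped instead of starting over.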
Graphical / Cyberduck
Cyberduck is a free, open-source, cross-platform file transfer program that supports high-throughput, parallel transfers over multiple protocols (FTP, SFTP, WebDAV, Cloud files, Amazon S3, etc.). It serves as an alternative to the iDrop Java applet, and has been extensively tested with large data transfers (60-70 GB). How large a file you can transfer in practice depends on your available bandwidth and network settings.
Cyberduck versions are available for Mac OS (10.6 and higher on Intel 64-bit) and Windows (Windows XP, Windows Vista, Windows 7, or Windows 8). Linux users should use iDrop Desktop or iCommands. Cyberduck version 4.7.1 (released July 7, 2015) and later supports the iRODS protocol.
Install or Update Cyberduck
- If Cyberduck is already installed, check if you need to update:
- Click the Cyberduck menu.
- Click Check for Updates.
- If an update is available click Install Update.
- To install Cyberduck for your operating system for the first time:
- Go to the Cyberduck installation page at https://cyberduck.io/.
- Follow the steps for your OS (not available for Linux users):
- For Mac OS:
- Click Download Cyberduck-5.3.9.zip (or current).
- Move the downloaded file (either a zip file or the unzipped application file, depending on your browser) to your Applications folder. If the zip file is listed, unzip the file in your Applications folder.
IMPORTANT: The file must be located in your Applications folder.
- For Windows:
- Click Download Cyberduck-Installer-5.3.9.exe (or current).
- Locate the downloaded file and double click to begin installation.
- Go through the install process.
Configure Cyberduck for use with iRODS
- Click to open Cyberduck.
See the Cyberduck Preferences Help page on the Cyberduck website for more information on installation.
- Click this link to download the Connection Profile, which contains preconfigured settings for using Cyberduck with the UA iRODS data store.
- Click on Open Connection.
- In the first drop down field, choose "UA HPC iRODS"
- Create the connection:
- The Server field contains irods1.helios.arizona.edu
- The Port field contains 1247.
- To create the connection with your HPC user account for login:
- Enter your user name in the NetId field with the prefix PAM: (i.e., PAM:<NetId>).
- Verify your NetId is added to the URL field, as shown above.
- The remaining fields are populated.
- Click in the Transfer Files drop-down list and select Open multiple connections.
- Close the window.