Thoughts On Tape: LTO Drives

After some hard drive failures I decided to up my backup game, and ended up buying two LTO-5 tape drives.


LTO-5 is the fifth generation of Linear Tape-Open (LTO). These drives store 1.5 TB per tape (around 1300 GB as counted by Windows). Drives of this type are very common in big business applications, and are generally extremely expensive.
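Most of the gap between the label and what Windows shows comes from units: tape capacity is quoted in decimal units (1 TB = 10^12 bytes), while Windows reports binary units (1 GiB = 2^30 bytes) but labels them "GB". A minimal Python sketch of the conversion follows; the constant and helper names are just for illustration. The raw conversion lands near 1397 GiB. The rest of the difference to ~1300 GB is not modelled here and presumably comes from formatting overhead (an assumption).

```python
# Tape capacity is quoted in decimal units (1 TB = 10**12 bytes), while Windows
# reports sizes in binary units (1 GiB = 2**30 bytes) but labels them "GB".
NATIVE_CAPACITY_BYTES = 1_500_000_000_000  # LTO-5 native (uncompressed) capacity


def decimal_to_gib(n_bytes: int) -> float:
    """Convert a byte count to binary gibibytes (what Windows shows as "GB")."""
    return n_bytes / 2**30


print(f"{decimal_to_gib(NATIVE_CAPACITY_BYTES):.0f} GiB")  # prints "1397 GiB"
```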

At the time of writing, LTO-9 is starting to enter the market, four generations newer than LTO-5.

Prices for older generations drop somewhat, but even there you need to watch out for good deals. 

LTO is a linear tape format, using something called linear serpentine recording (not the old-school helical scan used by e.g. VHS tapes and similar formats). Basically the tape is written lengthwise by a "comb" of read/write heads. Each magnetized track is quite narrow, so on later passes (with the tape running in the opposite direction each time) additional tracks can be written in between the previous ones. This allows for very high data density.
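The serpentine idea can be illustrated with a toy model: a comb of write elements spaced a few track widths apart writes one track per element on each end-to-end pass (a "wrap"), then steps sideways by one track width and reverses direction. This is a simplified Python sketch with made-up numbers; it ignores data bands, servo tracks, and the real LTO-5 head geometry.

```python
def serpentine_layout(n_heads: int, n_wraps: int, head_spacing: int):
    """Toy model of linear serpentine recording.

    n_heads write elements sit head_spacing track widths apart (the "comb").
    Each wrap is one end-to-end pass of the tape; afterwards the comb steps
    sideways by one track width and the tape direction reverses.
    """
    if n_wraps > head_spacing:
        raise ValueError("toy model: all wraps must fit in the gap between heads")
    layout = []
    for wrap in range(n_wraps):
        direction = "forward" if wrap % 2 == 0 else "reverse"
        # Every head writes one track per pass, offset by the current wrap.
        tracks = [h * head_spacing + wrap for h in range(n_heads)]
        layout.append((wrap, direction, tracks))
    return layout


for wrap, direction, tracks in serpentine_layout(n_heads=4, n_wraps=3, head_spacing=3):
    print(wrap, direction, tracks)
# 0 forward [0, 3, 6, 9]
# 1 reverse [1, 4, 7, 10]
# 2 forward [2, 5, 8, 11]
```

Each later wrap fills in tracks between the ones written by earlier wraps, which is where the density comes from.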

Speeds are around 140-150 MB⁄s for uncompressed data.
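At these rates, filling a whole tape takes roughly three hours. The small Python sketch below does the arithmetic. It assumes the drive keeps streaming the entire time; any stalls or repositioning add to the total.

```python
def hours_to_fill(capacity_bytes: float, rate_bytes_per_s: float) -> float:
    """Time to write a full tape at a constant streaming rate, in hours."""
    return capacity_bytes / rate_bytes_per_s / 3600


for rate_mb_s in (140, 150):
    print(rate_mb_s, "MB/s:", round(hours_to_fill(1.5e12, rate_mb_s * 1e6), 2), "h")
# 140 MB/s: 2.98 h
# 150 MB/s: 2.78 h
```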

Note that the drives require occasional cleaning, which is done using a special cleaning tape. I don't think the cleaning tapes have changed much; I have an LTO-6 capable cleaning tape, and apparently you can use any generation of cleaning tape with any LTO drive version?

One note I picked up (via repairmytapedrive, linked below) is that using HP tapes in IBM drives may not be wise: apparently IBM only uses Fuji tapes, and these are designed for (and capable of) a higher tape tension than HP uses.

This could explain some issues I had using HP tapes in my IBM drive, and possibly why the same HP tapes didn't seem to have any issues in the HP drive.

Interfaces

LTO drives are basically SCSI devices; for the most part they talk SCSI commands. This is kind of old school, but still common in enterprise hardware.

In practice there are two dominant interfaces: SAS, and FibreChannel.

SAS

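SAS (Serial Attached SCSI) is basically SATA's enterprise sibling: the drives and controllers use the SCSI command set over a physical interface very similar to SATA's. The drive-side connectors are nearly identical (a SATA drive fits a SAS connector, but not the other way around), though various SFF-nnnn connector types are also used with SAS devices and controllers.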

A 6 Gbit⁄s SAS controller can be bought quite easily (LSI 4/8 port controllers are readily available and cheap). Some high end motherboards use built in SAS instead of SATA as well. 

SAS controllers can talk to normal SATA drives, but a SAS drive requires a SAS controller.

FibreChannel

FibreChannel (FC) is an interface designed for implementing Storage Area Networks; it's basically a dedicated network interface and protocol for storage devices. It supports switched networks, but can also operate as an old-school ring network (arbitrated loop) or point-to-point. As the name suggests it runs over optical fiber, but electrical cables are also supported.

LTO-5 drives seem to always use 8 Gbit⁄s interfaces, though it should be noted that link downshifting is supported, so a slower controller should work. FibreChannel is generally backward compatible two generations, so 2 and 4 Gbit⁄s should be fine, while 1 Gbit⁄s likely won't link with 8 Gbit⁄s optics. As above, the FibreChannel link carries the SCSI command set, so there's no real difference between a SAS or FC drive from the application's perspective.
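To check whether a downshifted link would slow the tape down, compare the nominal per-direction FC throughput with the drive's native streaming rate. The Python sketch below uses the commonly quoted nominal FC figures and the 140 MB/s native rate mentioned earlier. With compressible data the drive can accept data faster than its native rate, so the margin on slower links shrinks.

```python
# Nominal per-direction throughput of common Fibre Channel speeds, in MB/s.
FC_NOMINAL_MB_S = {"1GFC": 100, "2GFC": 200, "4GFC": 400, "8GFC": 800}
TAPE_NATIVE_MB_S = 140  # LTO-5 native (uncompressed) streaming rate


def effective_rate(link: str, tape_mb_s: float = TAPE_NATIVE_MB_S) -> float:
    """The slower of the FC link and the tape drive sets the throughput."""
    return min(FC_NOMINAL_MB_S[link], tape_mb_s)


for link in FC_NOMINAL_MB_S:
    limited_by = "link" if FC_NOMINAL_MB_S[link] < TAPE_NATIVE_MB_S else "tape"
    print(link, effective_rate(link), "MB/s, limited by", limited_by)
# 1GFC 100 MB/s, limited by link
# 2GFC 140 MB/s, limited by tape
# 4GFC 140 MB/s, limited by tape
# 8GFC 140 MB/s, limited by tape
```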

FibreChannel seems to often be wired for redundancy, so most devices have two ports, but only one needs to be connected.

I don't think FC has much of a future right now, losing to modern Ethernet implementations, but it was quite popular in the 2000s and early 2010s. According to some sources it was also quite well liked by operators, apparently being significantly simpler to use and operate.

If you're trying to build a cheap 1 Gbit⁄s ethernet network, 4 Gbit⁄s SFP modules for FC are extremely cheap used and work just fine with most ethernet media-converters. The fiber type for FC is basically always a duplex OM3 or OM4 LC patch cable, though single mode could also be used.

Controllers aren't quite as numerous as SAS, but the Emulex LP12002 is a popular 8 Gbit⁄s card with two ports. This card is fairly old but still supported on Windows 10/Server 2019 at least. Make sure to also get SFP modules for 8 Gbit⁄s and LC Duplex patch cables.

I had some trouble with my first LP12002 controller, caused by a combination of factors:

  • Firmware version on the card was not compatible with my Z420 computers, but "FW_Emulex_LPe1250LPe12002_2.02a1" (available from Lenovo) worked
    • I had to install the card in an older computer and upgrade it there, Broadcom/Emulex supports these cards in the current "Emulex HBA Manager"
  • Lack of knowledge about FC
    • To test a FC controller, make a loopback cable and see if it works, don't assume the device you're plugging in is working
  • A ton of bad solder joints on that controller made it quite unreliable
    • This was (temporarily in any case) fixed by some aggressive hot air reflow

Power

LTO drives have an unusual power consumption of around 4 A on the 5 V rail and 1-2 A on the 12 V rail. A standard ATX power supply obviously works, but for a stand-alone setup it was tempting to use standard industrial power supplies.

The IBM drive is quite sensitive to power supply performance; it will flag error code 2 if it detects any issues. I found that a 12 V/5 A mains supply was sufficient, but a typical Mean Well 5 V/5 A supply was not good enough. I did not repeat this test on the HP drive.

I ended up using two picoPSU 120 W models with 60 W 12 V Cincon industrial power supplies.

The picoPSU 120/150 W 5 V output can do 6 A continuous and 8 A peak for reference, and it seems this product is fairly well engineered so it likely has a fairly high bandwidth voltage regulator on that output. You can likely use a smaller picoPSU as long as it has similar 5 V performance.

Note that the 12 V output on these modules is switched but not voltage regulated.
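A rough power budget helps with sizing the supply. The Python sketch below uses the worst-case drive figures from this post (5 V/4 A and 12 V/2 A) and assumes the picoPSU derives its 5 V output from the 12 V input, while the 12 V output passes straight through. The 90% converter efficiency is a hypothetical placeholder, and start-up or motor peaks are not modelled.

```python
def power_budget(i5_a=4.0, i12_a=2.0, pico_eff=0.9, brick_w=60.0, pico_5v_cont_a=6.0):
    """Steady-state load on a 12 V brick feeding a picoPSU that feeds the drive.

    pico_eff is an assumed DC-DC efficiency for the picoPSU's 5 V converter.
    """
    p5 = 5.0 * i5_a     # 5 V rail, generated by the picoPSU's converter
    p12 = 12.0 * i12_a  # 12 V rail, passed through from the brick
    brick_load = p12 + p5 / pico_eff
    return {
        "drive_w": p5 + p12,
        "brick_load_w": round(brick_load, 1),
        "brick_headroom_w": round(brick_w - brick_load, 1),
        "5v_current_ok": i5_a <= pico_5v_cont_a,
    }


print(power_budget())
# {'drive_w': 44.0, 'brick_load_w': 46.2, 'brick_headroom_w': 13.8, '5v_current_ok': True}
```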

IBM LTO-5 Full Height FC

This drive was the first to arrive. It is a full height 5.25" drive, and even included a front panel! Full height is an old-school term meaning it's twice as tall as a typical CD-ROM drive (which is half height).

My drive came from a tape library, and a DIP switch block on the underside of the drive had to be set to all zeros to disable the library serial interface. Without this the drive will power on, but won't come online (i.e. activate the FC interface), instead it will wait for a command over a separate serial port to come online.

DIP switch meanings, note switch 5 must be off for standalone use

According to some information I found, certain SAS models may also require a connection to an RS-422 serial port to reconfigure the firmware for standalone mode, but my variant didn't require this.

IBM's documentation is hard to find and requires a login, but the documentation for the Dell PowerVault LTO5-140 contains a lot of the required information for the drive itself.

The test and configuration software is the IBM Tape Drive Diagnostic Tool (ITDT), which has a GUI and CLI version. IBM wants a login to get at this, but Lenovo and Dell have downloads available for slightly older versions.

My drive had firmware version G360, which does appear to be the latest (only) version for an FC drive.

The built-in Windows 10 driver worked fine; the IBM driver didn't work for me.

This drive uses two SFP+ sockets with modules made by Finisar.

Heat

This drive needs a lot of power: 5 V/4 A and 12 V/2 A. It gets hot on the bottom, where all the processors are. Sitting on a flat surface in operation, the drive will overheat without forced airflow (error code 1).

I ended up packing the drive into a 2U chassis where I initially used a high performance (loud) 60 mm Delta fan to force air through the drive (by blocking all other exits). I later switched to a lower pressure fan and this worked acceptably.

I don't think putting this drive into a normal desktop computer will work all that well, since it likely won't have sufficient cooling without excessive fan noise. Putting it in a separate enclosure works better since it can be powered off when not in use.

HP Half Height FC Module

This module was obviously from a tape library, and came assembled into a hot plug caddy with dedicated fan. There is a card edge connector on the side which connects to the library system and provides power, but the drive itself operates just fine as a standalone device once removed from this caddy.

This drive only has a single FC optical module, which is not an SFP but a smaller soldered-in module; it looks like a relatively common RJ module made by Finisar and others.

The drivers for this drive were not easily found from HPE, but Windows Update quickly located appropriate drivers. Firmware updates for the drive seem to require an HPE login to access.

The diagnostics package is called "HPE Library and Tape Tools" and was available for immediate download. This is a set of utilities for logging, diagnostics, and functional tests.

The front panel LEDs on this unit don't have light pipes on them, making them hard to see from most angles.

HP HH FC Drive Internals

Interface Card

The interface card that goes in the library was dismantled just in case it could be useful. It seems to mainly be a monitoring and power switching card. Sadly it does not contain a local 5 V regulator like I hoped; it only has load switches for 5 and 12 V, with the 5 V supplied externally.

The only useful thing on it for my use was a decent quality molex-cable and a decent quality Sanyo 40 mm fan.

Repair

My drive did not work particularly well upon receipt. I'm not sure of the root cause right now, but I found that loading a tape partially worked while unloading did not.

I found two issues:

  • An optical "fork" sensor that detects the position of part of the loader mechanism had a broken flex (flex-rigid assembly)
  • The head cleaning brush was stuck in front of the head, and was not possible to move

Upon first opening the drive, the flex to the sensor seemed to be attached, but it quickly fell off. The flex attachment is poorly designed, with no strain relief, and it moves quite a lot every time the drive is loaded. This could well explain the inability to unload, since the sensor detects the movement of a metal plate assembly that moves when the tape guide pin is released/grabbed.

Repair of the flex was possible by using a glass-fiber brush "pencil" to scrape off the solder mask and expose the copper on both the sensor PCB and the flex (a scalpel also works). I then aligned these with Kapton tape and soldered tiny wires (individual strands from a thin coaxial cable) to bridge the contacts, then applied more off-brand Kapton tape to support the break.

The second issue is more perplexing: the head cleaning brush in this design lives near the back of the drive, but in this case it was stuck right on top of the head assembly. Eventually (by undoing some plastic clips and bending some plastic parts) I was able to force it closer to the front of the drive, where I could yank it out. After removal the drive loaded and read tapes successfully. Unfortunately taking pictures of this stuff is nearly impossible; even seeing black plastic on black plastic with my own eyes required very careful flashlight angling.

I eventually realised that the brush assembly is pushed around by a piece of plastic that also pulls the tape through the drive. Through some combination of motor movements, this piece can also be pushed over the heads and back. The brush rides in a slot with two guide pins (between the tape and heads). Critically, these pins must go through two movable arms, one of which is black and linked to the rear tape roller (it passes under the head assembly). With careful lighting and a long metal pick I was able to move this arm into the right place and insert the brush into the two holes near the front of the drive.

When I loaded the tape manually (by turning the gear sets and take-up reel by hand), the brush was pushed past the head and all the way to the back of the drive, where it's supposed to be. There is also a little tape pushing rod next to the head, actuated by a spring-loaded metal arm (visible on the top of the drive). This had a tendency to come loose during the above operations, and if this mechanism doesn't work, head alignment fails during tape loading: the heads just jump up and down for a bit and then spit out the tape. This mechanism should move freely when you push on the metal piece on top.

As for why the brush got stuck, I'm not sure. It was dead stuck when I got the drive, but after removal and reinstallation it moved just fine. My guess for now is that the flex I broke already had fatigue damage and may have failed without fully falling off, confusing the tape loading system into operating incorrectly and leaving the brush stuck.

For pictures:

The mechanism for the LTO 5 variant is fairly similar to other HP generations, this Ultrium 460 disassembly shows a very similar mechanism: https://stephane.lesimple.fr/blog/repairing-a-faulty-hp-storageworks-ultrium-460-tape-drive/

This Ultrium 448 disassembly also shows a basically identical mechanism from an older generation, the brush assembly is visible in picture 5 and the broken flex-rigid assembly in picture 6 (flex going off to the right): http://crecimiento-sostenible.blogspot.com/2015/03/repairing-tape-backup-lto-2-unit-hp.html

The pictures I found of the LTO 4 and 6 mechanisms also look very similar; it seems the LTO mechanical design is fairly mature at this point, and it's mainly the electronics, tape, and tape heads that improve with each generation.

There are also some repair videos here: https://www.youtube.com/c/repairmytapedrive

Software

On Windows there is no built-in tape drive management; it's all down to third parties. When the drive is recognised it should be listed in Device Manager.

You can basically use either dedicated backup software or LTFS.

For doing scheduled backups I will likely continue to use dedicated software noted below, but LTFS is kind of neat.

LTFS

LTFS is the Linear Tape File System. Using e.g. HPE StoreOpen's LTFS implementation, you can format and mount a tape like any other file system.

This makes it possible to access data directly off the tape instead of performing a batch backup/restore. In practice this can be made to work for e.g. movies where a single large file is accessed linearly, but obviously the random access time is measured in tens of seconds. Files are stored on tape in the order they were written (duh) so multi-file sequential access can also work.

I note that watching Blu-Ray quality movies directly off tape worked quite well, bandwidth is not an issue as long as the buffer sizes are sufficient to allow tape spin-up.
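How much buffer "sufficient" means is simple arithmetic: the player must already hold enough data to cover the stream for the duration of a tape stall. The Python sketch below uses hypothetical numbers: a 48 Mbit/s stream (roughly the upper end for Blu-ray video; treat it as an assumption) and a 10 second stall.

```python
def min_buffer_bytes(stream_bits_per_s: int, stall_s: float) -> int:
    """Bytes that must already be buffered to ride out a stall without stutter."""
    return int(stream_bits_per_s / 8 * stall_s)


# Hypothetical numbers: a 48 Mbit/s video stream and a 10 s tape stall.
print(min_buffer_bytes(48_000_000, 10) / 1e6, "MB")  # 60.0 MB
```

Since the tape delivers data several times faster than the video consumes it, the buffer refills quickly once the tape is streaming again.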

LTFS Configuration GUI allows tape formatting and drive mapping

StoreOpen is immediately available upon creating an HPE account, and clicking away about 20 different "please take our fucking survey" and "check out the new features of our enterprise grade download site" popups. Slightly older versions such as LTFS version 3.4.2 can be found elsewhere as well.

It may be practical to use essentially arbitrary backup software and have it write to an LTFS formatted tape, though I haven't tried this myself.

Note that for the IBM drive you'll need IBM's LTFS implementation, which is free upon registration. Unfortunately it is highly determined not to let you install it. The current hurdle is that the current IBM drivers refuse to install on consumer versions of Windows, and the LTFS software requires them. They call it "IBM Spectrum Archive Library Edition" and I call it useless. Their Python check is also broken.

I used InstEdit to remove the prerequisites and then launched the installer from InstEdit, which got it installed. It still didn't work though: now the services failed to start.

It'll probably work on Linux.

The open-source LTFS works great on Linux, it even works on RISC-V!

Backup Solutions

I found two promising software packages: EaseUS Todo Backup (Workstation version) and Z-Backup. I didn't try Z-Backup much, but it did seem to work.

EaseUS Todo Backup comes in two versions, consumer and enterprise. Their sites are a bit unclear on this, but it looks like the consumer version does not support tape. The enterprise version is not too expensive at ~60 USD for two years if you get the Workstation SKU. This version can't be installed on a server OS, but this is acceptable for me.

Todo Backup does support backing up from a network share without any major fuss, so this was a good solution for my setup.

Performance was perhaps not optimal, but reliability seemed good. Since the backup works with 2 GB chunks that are first buffered to disk and then written to tape, backups take a lot longer than simply streaming the data to tape would. A temp directory is used for chunk storage, and this can't be changed as far as I can tell, so if it ends up on the drive you're backing up from, the I/O can take a while.

The same is true for recovery, it first creates the files in the recovery target (as sparse files), then starts copying chunks to the same temp directory while also moving them to the recovery target.

For both backup and recovery it is best to ensure that the temp folder is on a separate (and fast) volume from the backup source or recovery target. The temp drive should really be quite fast to avoid stalling the tape: a modern 16/18 TB drive can do around 270 MB/s, which is fine, but older 4 TB drives may struggle to reliably reach the ~150 MB/s stream speed of the tape.

It is also useful to tell your antivirus software not to scan the recovery target and tape temp directories, this sped up things quite a bit.

All in all I think Todo Backup is a serviceable solution for home tape backup, though I wish it had more adjustment for e.g. the tape temp directory (using a medium size SSD for this would be great). I found the progress bar for tape backup was quite inaccurate, only running up to ~30% before finishing.

For that matter, a modern computer could easily hold a few 2 GB chunks in RAM, so there shouldn't be a need to write these to disk at all?
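Buffering chunks in RAM instead could look something like the Python sketch below. This is not how Todo Backup works. It is a hypothetical illustration of the idea, with made-up function and parameter names. A reader thread fills a bounded in-memory queue while the writer drains it, so roughly (max_chunks + 2) × chunk_size bytes are held in memory at most. With 2 GB chunks that is a few GB of RAM.

```python
import io
import queue
import threading


def stream_in_ram(source, sink, chunk_size: int, max_chunks: int = 2) -> int:
    """Copy source to sink through a bounded in-memory queue of chunks.

    A reader thread fills the queue while the calling thread drains it into
    the sink (the tape, in the scenario above), so no disk round-trip is needed.
    """
    q = queue.Queue(maxsize=max_chunks)
    errors = []

    def reader():
        try:
            while True:
                chunk = source.read(chunk_size)
                if not chunk:
                    break
                q.put(chunk)  # blocks while the queue is full
        except BaseException as exc:  # hand the error over to the writer side
            errors.append(exc)
        finally:
            q.put(None)  # sentinel: no more chunks

    t = threading.Thread(target=reader, daemon=True)
    t.start()
    total = 0
    while (chunk := q.get()) is not None:
        sink.write(chunk)
        total += len(chunk)
    t.join()
    if errors:
        raise errors[0]
    return total


if __name__ == "__main__":
    data = bytes(range(256)) * 100
    out = io.BytesIO()
    print(stream_in_ram(io.BytesIO(data), out, chunk_size=1000))  # 25600
```

The bounded queue is what keeps memory use predictable: if the tape falls behind, the reader simply blocks instead of piling up chunks.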

When recovering from a bad tape it offers no real way to e.g. skip a bad chunk and recover the rest, which is unfortunate. I guess this functionality might be part of their data recovery products? (which cost more $$).
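The missing "skip and continue" behaviour is conceptually simple. Below is a hypothetical Python sketch, not part of any EaseUS product: read_chunk and write_chunk are made-up callbacks standing in for whatever the backup format provides, and real tape recovery would also need to reposition past the damaged area, which this does not model.

```python
def recover_chunks(read_chunk, n_chunks: int, write_chunk) -> list:
    """Restore every readable chunk and return the indices that failed.

    read_chunk(i) -> bytes and write_chunk(i, data) are hypothetical callbacks.
    """
    failed = []
    for i in range(n_chunks):
        try:
            data = read_chunk(i)
        except OSError:
            failed.append(i)  # note the bad chunk and keep going
            continue
        write_chunk(i, data)
    return failed


# Demo with a fake tape where chunk 2 is unreadable.
def fake_read(i: int) -> bytes:
    if i == 2:
        raise OSError("simulated medium error")
    return b"chunk %d" % i


restored = {}
failed = recover_chunks(fake_read, 5, restored.__setitem__)
print(failed, sorted(restored))  # [2] [0, 1, 3, 4]
```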

Recovery of a ~1 TB data set from a good tape with this software took around 4 hours for reference. This was with a high speed drive for temp storage and a conventional 4 TB drive as the recovery target.
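For comparison with the tape's native rate, the effective restore throughput works out to roughly half of it. The Python sketch below treats "~1 TB" as exactly 10^12 bytes, which is an assumption.

```python
def effective_mb_s(n_bytes: float, hours: float) -> float:
    """Average throughput in MB/s (decimal) for n_bytes moved in the given hours."""
    return n_bytes / (hours * 3600) / 1e6


rate = effective_mb_s(1e12, 4)
print(f"{rate:.1f} MB/s, {rate / 140:.0%} of the 140 MB/s native rate")
# 69.4 MB/s, 50% of the 140 MB/s native rate
```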

I had expected multi-tape backups to work in principle, if perhaps annoyingly, but multi-tape backup is not supported: if the software reaches the end of the tape before finishing the backup, it simply fails. At least it leaves the files intact, so most of a partial backup should in principle be recoverable.

Note that due to compression, it is not always possible to predict if a volume will fit onto a given tape. Fortunately, it only takes 8-10 hours to reach the failure point :)
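One rough way to guess in advance is to compress a representative sample of the data and extrapolate. The Python sketch below uses zlib purely as a stand-in. The drive's own hardware compression is a different algorithm, so its ratio will not match exactly, and the 0.9 safety margin is a hypothetical choice. Already-compressed data (video, archives) behaves much like random bytes and gains essentially nothing.

```python
import os
import zlib

LTO5_NATIVE_BYTES = 1_500_000_000_000


def estimate_fit(sample: bytes, total_bytes: int,
                 capacity: int = LTO5_NATIVE_BYTES, margin: float = 0.9):
    """Guess whether total_bytes will fit, using zlib on a sample as a proxy.

    Returns (compression ratio, projected bytes on tape, fits within margin).
    """
    compressed = len(zlib.compress(sample, 1))
    ratio = len(sample) / compressed
    projected = total_bytes / ratio
    return ratio, projected, projected <= capacity * margin


# Random bytes stand in for already-compressed data; zeros for very compressible data.
print(estimate_fit(os.urandom(1_000_000), 2_000_000_000_000)[2])  # False
print(estimate_fit(b"\0" * 1_000_000, 2_000_000_000_000)[2])      # True
```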

A Warning

EaseUS Todo Backup does offer a decent solution, but be aware that the Tape Manager utility is very self-centered. Upon opening, it immediately commands all tape drives to perform certain operations, regardless of what they happen to be doing.

This is probably fine if EaseUS is the only tape application on the system, however:

If an LTFS tape is being written to when EaseUS Tape Manager is opened, it will force an unclean unmount, which will likely lead to data loss. Unmount all writable LTFS volumes before opening it.

This is a pretty serious issue, and it's unfortunate that no OS level interlocks prevent this from happening. It's also a very poor decision on EaseUS's part.

Tape Reliability

The tapes are supposed to be quite reliable. I had some issues with HP branded tapes, but as I mentioned initially this was in an IBM drive. The same tapes seemed completely fine in the HP branded drive.

All the NOS (new old stock) Fuji tapes I've tried have worked in the IBM drive; I haven't tested them in the HP drive yet.

In any case, it is crucially important to actually check that your backups work, so I always try to recover the entire tape contents after writing them.
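A full restore is most useful when it is followed by an actual comparison against the original data. The Python sketch below is one way to do that: it hashes every file under the source tree with SHA-256 and reports files that are missing or different in the restored tree. The function names are illustrative only. Extra files in the restored tree, timestamps, and permissions are not checked.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, bufsize: int = 1 << 20) -> str:
    """SHA-256 of a file, read in 1 MiB pieces so large files don't fill RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(source, restored) -> list:
    """List (relative path, problem) for files missing or different in restored."""
    source, restored = Path(source), Path(restored)
    problems = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = restored / rel
        if not dst.is_file():
            problems.append((rel.as_posix(), "missing"))
        elif file_digest(src) != file_digest(dst):
            problems.append((rel.as_posix(), "mismatch"))
    return problems
```

Usage is simply `compare_trees(source_dir, restore_dir)`. An empty list means every source file came back byte-identical.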