Exploring Storage Systems

A storage system consists of a collection of parts used to provide certain behaviors or attributes desired for data storage. This could include external enclosures for bare, uncased hard drives or RAID (Redundant Array of Inexpensive/Independent Disks) chassis to provide protected or high-performance data storage (see the “RAID Devices” section later in this chapter). Let’s look at some common storage systems appropriate for your creative workflow.

External Drive Enclosures

Once you get beyond the humble hard drive and memory card, the external drive enclosure is one of the most popular storage devices available. It can also be one of the greatest causes of potential data loss. An external drive enclosure is simply a box that holds one or more standard hard drives and provides some type of connection to a computer.

A manufacturer might preassemble the enclosure, or you can assemble one yourself by purchasing the drive and enclosure separately. The second approach is beneficial because you can continually swap out hard drives as capacities increase or even just to access data that is stored on an uncased drive.

External hard drives come in two general sizes, desktop and portable. Desktop units are designed to stay in place and often use an external power source, such as a wall brick (power supply) or common AC cord. They use a 3.5-inch hard drive internally.

The portable, also called mobile, drives use 2.5-inch drives commonly found in laptops. They can be powered by a USB or FireWire connection (see “Enclosure connection types” next) and don’t need an external power supply.

Enclosure connection types

Enclosures can connect to a computer in a number of ways, including USB, FireWire, and eSATA.

Some enclosures offer multiple connection types, such as a combination of USB and FireWire. Some even have USB and FireWire and eSATA, but note that you can only use one type of connection at a time. Still, having multiple choices is useful because you might share that enclosure across multiple machines. Macintoshes usually use FireWire connections, but Windows PCs typically don’t, so having the USB interface is handy.

Each of the connections discussed has different properties and speeds. USB is the most common, but until the USB 3 spec was introduced, USB was quite a bit slower to move data than were the FireWire choices. You should use the fastest interface available to save time when moving your data.

Reported speeds for each connection type are theoretical, and the transfer speed will probably be slower, sometimes much slower, in real life. All of the speeds listed in the following table are in megabits per second (Mbps). File sizes, however, are typically described in megabytes (MB), not megabits (Mb); for example, the file is 100 MB. So, dividing FireWire 800’s potential speed of 800 Mbps by 8 (8 bits in a byte) gives a theoretical transfer speed of 100 MB per second, or 6000 MB per minute. In reality you won’t see a 6 GB (6 GB = 6000 MB) per minute transfer but more likely will see a 2 GB per minute transfer. If you get more, consider it a bonus!
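The arithmetic above can be sketched in a few lines of Python. The function names are illustrative, and the one-third efficiency factor is only an assumption chosen to mirror the 2 GB per minute figure quoted above; it is not a measured value.

```python
def mbps_to_mb_per_sec(megabits_per_sec):
    """Convert a link speed in megabits per second to megabytes per second."""
    return megabits_per_sec / 8  # 8 bits in a byte


def minutes_to_copy(size_mb, megabits_per_sec, efficiency=1.0):
    """Estimate the minutes needed to move size_mb megabytes over a link.

    efficiency is an assumed fraction of the theoretical speed that is
    actually achieved (1.0 means the theoretical best case).
    """
    mb_per_minute = mbps_to_mb_per_sec(megabits_per_sec) * 60 * efficiency
    return size_mb / mb_per_minute


print(mbps_to_mb_per_sec(800))                      # 100.0 MB per second
print(minutes_to_copy(6000, 800))                   # 1.0 minute in theory
print(round(minutes_to_copy(6000, 800, 1 / 3), 2))  # 3.0 minutes at about 2 GB per minute
```

Swapping in the speed of another connection type gives a quick best-case estimate for that interface.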

Connection Comparison Table

Connection      Speed in Megabits per Second (theoretical)    Can Provide Power
USB 1.1         12                                            Yes
FireWire 400    400                                           Yes (6-pin plug only)
USB 2.0         480                                           Yes
FireWire 800    800                                           Yes
USB 3.0         5000                                          Yes
Other considerations besides the theoretical transfer speeds are also important. The USB 2.0 potential speed spec is higher than that of FireWire 400; yet in most situations FireWire 400 has a faster real-life transfer speed because USB relies on the computer’s CPU to manage the data flow, whereas FireWire has its own processor. This is less of an issue with today’s high-performance computers, but it is an issue nonetheless.


Some types of drive connections can provide power to a connected device. Both USB and FireWire can provide power to an external hard drive enclosure so you don’t need to plug an external power supply into the drive. This is convenient, but the connection might be limited as to the amount of power it can supply. Also, the 4-pin version of the FireWire plug doesn’t carry power; it is typically found on older camcorders, where it was only used to move data to a computer.

Port Count

Computers have a limited number of ports for connecting to USB or FireWire devices. If you use USB drives and run out of ports on your computer, you can connect multiple USB devices to a USB hub, a device that allows a single USB port to be shared by many devices. If you use a hub, be aware that the devices connected to it share the bandwidth of that single port, which will impact speed.

To increase the number of connected FireWire devices, you can daisy-chain them one to the next. But if one of the devices is FireWire 800 and it is plugged into a FireWire 400 device, the speed will be reduced to the 400 spec. If you need the best performance possible, connect one drive at a time to a single computer FireWire port.
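As a rough illustration of the daisy-chain rule above, the sketch below models the chain as running at the spec of its slowest device. This is a deliberate simplification, and the function name is hypothetical.

```python
def chain_speed_mbps(device_speeds):
    """Return the effective speed of a daisy chain in Mbps.

    Simplified model: the whole chain drops to the slowest device's spec.
    """
    if not device_speeds:
        raise ValueError("the chain needs at least one device")
    return min(device_speeds)


print(chain_speed_mbps([800, 800]))  # 800
print(chain_speed_mbps([800, 400]))  # 400
```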

External SATA

eSATA is different from the other connections. Think of it as simply extending the connection from the hard drive to the computer because that is exactly what it is. Unlike USB and FireWire, the data path isn’t converted into a different form. Because it is a direct connection, you need one eSATA connection per hard drive unless you use a port multiplier adapter. This adapter allows the connection of multiple hard drives to a single port.

Most of the time you will find eSATA supplied on the computer via a SATA controller card. The controller card plugs into an expansion slot in the computer. The SATA controller needs to be designed to work with a port multiplier, so if you are considering such a solution, make sure that all your components are compatible. Many of the external drive chassis that hold SATA drives use port multiplier adapters internally, so check with the manufacturer of the chassis for recommended SATA controllers.

Multiple hard drives

You will find enclosures on the market that have spaces for multiple hard drives or that may already have drives installed. Usually, they are set up to take advantage of a multiple-drive arrangement implementing some level of RAID (see “RAID Devices” later in the chapter). But the arrangement might also be there simply to offer greater storage than a single disk could offer, or to provide some security from a failed drive. You need to know if your enclosure or premade external drive uses multiple hard drives.

RAID Devices

The concept of RAID goes back to a time when the capacity and speed of individual hard drives were at a premium. By combining individual drives into a larger grouping, you can increase the overall attributes (capacity, speed, and redundancy) of your storage system.

You might find RAID in a computer, an external FireWire or USB enclosure, or a big chassis that holds anywhere from 4 to 42 hard drives. RAID devices can connect to a computer via USB, FireWire, SCSI, iSCSI (SCSI over Ethernet), SAS, or Fibre Channel (another connection type; see the sidebar “Fibre Channel Primer”). Many RAID solutions are on the market today.

By definition you need at least two drives to make a RAID array. An array is simply a collection of drives made to work together. The arrangement of drives defines the RAID “level” and what its behavior will be. Let’s look at the common levels.


RAID 0

RAID 0 is the most dangerous level and, unfortunately, the most common one I’ve found in the creative world. Not that it doesn’t have its purposes, but you need to understand its behavior. With RAID 0, two or more physical drives are combined into an entity that the computer recognizes as a single device. This can be advantageous because the total capacity of the drives is available to you, plus the performance increases, leading to shorter data transfer times. You might think this is ideal, but there is a downside.

When data is sent to the RAID 0 array, it is split across all the hard drives in the array. This is also called a “striped” array. Striping provides good performance because each drive has to deal with only its share of the data, not all of it, so the data as a whole takes less time to write. However, here is the gotcha: If any of the physical hard drives in the RAID 0 array fails, all your data is lost, and recovery is neither simple nor cheap, if in fact it is possible at all.
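To make the striping idea concrete, here is a minimal Python sketch. It is a toy model, not how any particular controller works: the chunk size, the round-robin layout, and the function names are all assumptions made for illustration.

```python
def stripe(data, drive_count, chunk_size=4):
    """Split data into chunks and deal them round-robin across the drives."""
    drives = [[] for _ in range(drive_count)]
    for start in range(0, len(data), chunk_size):
        drive_index = (start // chunk_size) % drive_count
        drives[drive_index].append(data[start:start + chunk_size])
    return drives


def reassemble(drives):
    """Interleave the chunks from each drive back into one byte string."""
    chunks = []
    longest = max(len(drive) for drive in drives)
    for row in range(longest):
        for drive in drives:
            if row < len(drive):
                chunks.append(drive[row])
    return b"".join(chunks)


drives = stripe(b"ABCDEFGHIJKLMNOP", drive_count=2)
print(drives)              # [[b'ABCD', b'IJKL'], [b'EFGH', b'MNOP']]
print(reassemble(drives))  # b'ABCDEFGHIJKLMNOP'

drives[1] = []             # simulate losing the second drive
print(reassemble(drives))  # b'ABCDIJKL' (every other chunk is gone)
```

With one drive missing, what remains is a file with holes in it, which is why a single failure takes the whole volume with it.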

One of the most common RAID devices you will come across, though you may not know it, is the external drive that advertises speed or capacity. It typically contains two or more drives in a RAID 0 configuration. You may have a number of these devices and not be aware of the potential for data loss.

So when would RAID 0 be OK to use? When you need the performance and don’t need to worry about data loss. For example, RAID 0 could work as a scratch disk for Photoshop or a capture volume for video. If the scratch disk fails, it’s no big deal; you just rebuild it. With a capture volume, you capture the video and then move it onto another storage device. If the volume were to fail during the capture, just fix it and recapture. Another situation in which RAID 0 is OK to use is in combination with other RAID levels, such as RAID 10 and 50, as discussed later in this section.

Looking at the cost/capacity value proposition, RAID 0 looks enticing, but you need to keep the potential data loss issue in mind.

To determine the capacity of a RAID 0 array, simply add the capacities of all the drives together, as in the following example of an array composed of two drives of 1 TB capacity:

1 TB drive + 1 TB drive = 2 TB volume


RAID 1

RAID 1 is the opposite of RAID 0. It is composed of two hard drives that are “mirrored,” meaning that the data is written to both drives at the same time. This results in exact copies of data on both drives. You don’t gain performance, but in the case of a bad hard drive, the data remains safe on the other drive. Don’t consider this a backup; it’s just protection against a dead drive. Also, if you write bad data to the array, it will dutifully be written to both drives.

If you look at the cost/capacity value proposition, RAID 1 doesn’t fare very well. You get only half of the total available storage, but you get redundancy in your storage. The following example shows the total volume of two drives of 1 TB capacity:

1 TB drive + 1 TB drive = 1 TB volume


RAID 5

RAID 5 provides some of the advantages of both RAID 0 and RAID 1. You get protection from a failed drive and additional capacity.

You need a minimum of three physical drives to achieve this level. Data is written across all the drives in the array, similar to RAID 0, but parity data is spread across all the drives too. Parity data is information that allows the RAID controller to rebuild the data that existed on a failed drive once the drive has been replaced. The array is still usable and running during the failure as well as during the rebuilding process once the failed drive is replaced.
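Parity of this kind is commonly computed with XOR. The sketch below is a minimal illustration under that assumption, using two data blocks and one parity block to match the three-drive minimum; it ignores how a real controller distributes parity across the drives, and the block contents and names are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Two data blocks on two drives, plus a parity block on a third drive.
block_a = b"RAW1"
block_b = b"RAW2"
parity = xor_blocks([block_a, block_b])

# Simulate losing the drive that held block_b, then rebuild it
# from the surviving data block and the parity block.
rebuilt = xor_blocks([block_a, parity])
print(rebuilt)  # b'RAW2'
```

The same trick works whichever single block is lost, which is why the array can survive one failed drive but not two.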

The capacity of the RAID 5 array is determined with this formula:

capacity of single drive * (number of drives – 1) = total capacity

For example, let’s say you have five 1 TB drives in a RAID 5-capable enclosure. The total capacity the resulting volume will have is

1 TB * (5-1) = 4 TB

That isn’t a bad trade-off between protection and capacity. A RAID 5 array has a maximum number of drives, but that is determined by the manufacturer of the RAID controller. The limit can range from 7 to 15 drives, so check with the manufacturer for best practice.

So this is the perfect RAID level, right? Like anything else, there are trade-offs. When a drive fails in the array, you must replace it and allow the array to rebuild. The data remains available during this time, but if a second drive were to fail, you would have the same situation as a RAID 0 and lose your data. If the drive fails Friday night at the studio and no one knows about it until Monday morning when someone is able to swap drives, days have gone by with your data at risk.

A potential solution is to configure the array with a hot spare, which is another drive available to be automatically called into duty to replace the failed drive. This reduces the amount of time your precious data is vulnerable. It does change the capacity formula a bit to this:

capacity of single drive * (number of drives – 2) = total capacity

For example, let’s say you have five 1 TB drives in a RAID 5-capable enclosure including one hot spare. The total capacity the resulting volume will have is

1 TB * (5-2) = 3 TB

Still, the value might be worth it. Another concern is the length of time it takes for the actual rebuilding to take place once the spare is put in play. On arrays built from small drives, say less than 500 GB, it might only take hours to rebuild. But if you have 2 TB drives, your rebuild times could be measured in days. This could be risky because it increases the time of exposure to another drive failure and resulting data loss.
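For a back-of-the-envelope feel for rebuild times, the sketch below divides drive capacity by a sustained rebuild rate. The 10 MB per second rate is purely a hypothetical figure for an array that is still serving users during the rebuild; real rates depend on the controller, the drives, and the workload.

```python
def rebuild_hours(drive_capacity_gb, rebuild_rate_mb_per_sec):
    """Estimate the hours needed to rebuild one drive at a sustained rate."""
    seconds = drive_capacity_gb * 1000 / rebuild_rate_mb_per_sec
    return seconds / 3600


# Hypothetical rate of 10 MB per second for a busy array.
print(round(rebuild_hours(500, 10), 1))   # 13.9 hours for a 500 GB drive
print(round(rebuild_hours(2000, 10), 1))  # 55.6 hours for a 2 TB drive
```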

The array also faces additional dangers during the actual rebuilding phase. All the existing drives get exercised more as the data and parity are read from them to build the replacement drive. Most likely, the rest of the drives will be the same age as the bad one and might be at the end of their lives as well. The additional stress of the rebuild can cause additional drives to fail. This is a good reason not to rely on old hard drives and to have good backups!


RAID 6

To help alleviate some of the problems with RAID 5 during rebuilding, RAID 6 was introduced. This level is very similar to RAID 5, but it can endure the simultaneous loss of two drives in the array. Even if a second drive fails during a rebuild, the data will survive. Granted, if a third drive fails in a large array, you will experience data loss. But RAID 6 is a better choice than RAID 5 for arrays constructed with drives larger than 1 TB.

The capacity of the RAID 6 is determined using this formula:

capacity of single drive * (number of drives – 2) = total capacity

For example, let’s say you have five 1 TB drives in a RAID 6-capable enclosure. The total capacity the resulting volume will have is

1 TB * (5-2) = 3 TB

If you use one of the drives as a hot spare, the total capacity the resulting volume will have is

1 TB * (5-3) = 2 TB

As you can see, it is best to use RAID 6 for arrays with a large number of individual drives.

RAID 10, 50, and 60

So what happens when you need more capacity for your workflow than what you can get with the preceding RAID levels? Well, you can start mixing and matching to get the results you want.

RAID 10, 50, and 60 are multiples of RAID 1 (mirror), RAID 5 (striped with single parity), or RAID 6 (striped with double parity) set up as a RAID 0 (striped with no parity).

So in RAID 10 there are two mirror pairs striped together to get:

{(1 TB + 1 TB) = 1 TB} + {(1 TB + 1 TB) = 1 TB} = 2 TB

This level gives you more contiguous capacity than RAID 1 can offer but with the latter’s level of protection.

RAID 50 would look like this:

{1 TB * (5-1) = 4 TB} + {1 TB * (5-1) = 4 TB} = 8 TB

RAID 60 would look like this:

{1 TB * (5-2) = 3 TB} + {1 TB * (5-2) = 3 TB} = 6 TB
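The capacity formulas in this section can be gathered into one small Python helper. This is a sketch that assumes equal-size drives and does no validation of minimum drive counts; the function name and parameters are illustrative. The groups parameter models the striped combinations, so RAID 50 is written as RAID 5 with two groups.

```python
def raid_capacity_tb(level, drive_tb, drives, hot_spares=0, groups=1):
    """Usable capacity in TB for an array of equal-size drives.

    drives and hot_spares are counted per group; groups > 1 models the
    striped combinations (RAID 10, 50, and 60).
    """
    active = drives - hot_spares
    if level == 0:
        per_group = drive_tb * active        # striped: every drive counts
    elif level == 1:
        per_group = drive_tb                 # mirrored pair: one drive's worth
    elif level == 5:
        per_group = drive_tb * (active - 1)  # one drive's worth of parity
    elif level == 6:
        per_group = drive_tb * (active - 2)  # two drives' worth of parity
    else:
        raise ValueError("unsupported RAID level")
    return per_group * groups


print(raid_capacity_tb(0, 1, 2))                # 2 (RAID 0)
print(raid_capacity_tb(1, 1, 2))                # 1 (RAID 1)
print(raid_capacity_tb(5, 1, 5))                # 4 (RAID 5)
print(raid_capacity_tb(5, 1, 5, hot_spares=1))  # 3 (RAID 5 with a hot spare)
print(raid_capacity_tb(6, 1, 5))                # 3 (RAID 6)
print(raid_capacity_tb(1, 1, 2, groups=2))      # 2 (RAID 10)
print(raid_capacity_tb(5, 1, 5, groups=2))      # 8 (RAID 50)
print(raid_capacity_tb(6, 1, 5, groups=2))      # 6 (RAID 60)
```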

Other RAID-like offerings

I would be remiss not to discuss the Drobo. The Drobo product line from Data Robotics offers convenient products that allow you to slide just about any drive into the Drobo enclosures, building a storage solution based on the number and capacity of the installed drives. Data Robotics takes the thought process out of storage for the end user. Put simply, you can take a number of drives and install them, and when you run out of space, pull out the old smaller drives and put in new larger drives without having to configure much of anything or put much thought into the process. Your data stays secure, and as long as you change only one drive at a time, you don’t have to pull off your data before expanding.

How does this work? Drobo uses a proprietary system called Beyond RAID, and Data Robotics doesn’t release much in terms of specifics. Beyond RAID appears to virtualize the file system over the physical storage so it can change either independently. This is clever but not very transparent. What are the downsides? Beyond RAID is a proprietary technology, so you are tied to the Data Robotics hardware. It may also cause some issues if something does go wrong and you need to recover data from the disks contained in the Drobo device, because the standard data recovery tools may not work. That said, most of the commercial data recovery services claim they can work with Drobo systems. At the time of this writing, the maximum volume size on a Drobo is 16 TB, but I expect that will change over time.

High-capacity RAID storage

Beyond the small desktop storage devices is a class of device that offers high-capacity and professional-level hardware. Most of the desktop devices are designed for ease of use, have small footprints both physically and electrically, and offer low noise and heat output. But in the realm of the high-capacity devices, you trade off those aspects for large amounts of reliable storage.

In this genre of device, the chassis is designed to mount in a computer rack, holds anywhere from 14 to 42 drives, and provides its own internal RAID controllers. You will find drive chassis on the market that look similar to the devices I am describing, but if they don’t have their own RAID controllers, they are not in the same class.

A wide range of manufacturers, including Active Storage, Promise, and Nexsan, produce these devices. Most of the computer manufacturers offer their own devices as well.

So why consider these high-level devices? If you need a large bucket of storage, you can be assured that the manufacturers have done their homework to ensure that all the pieces work together properly. They have tested which hard drives work with their controllers and will back up their products with warranties and service contracts.

Most of these products come with dual power supplies, redundant cooling modules, and available redundant RAID controllers. They offer a choice of connection methods such as Fibre Channel, SCSI, or iSCSI. Management is done via a special application that runs on your computer or via a Web browser. They also offer email problem notifications and monitoring.

DIY RAID storage

It is tempting to build your own storage from the large selection of components on the market. The pricing is appealing, and the challenge to get everything working together has been reduced by improved tools and knowledge. Usually, a quick Web search will result in everything you need to build a storage device.

Homemade storage devices are usually constructed of some drive chassis and a RAID controller card. The controller card mounts in the computer via the PCI slots and commonly connects to the drive chassis via eSATA, SAS, or a style of cable that contains multiple links called Multilane. Inside the drive chassis anywhere from 4 to 16 drives reside connected to a SATA port multiplier card. Normally, you need one SATA connection per drive, but with the multiplier card you can connect up to five internal drives with one external cable.
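When planning a chassis, the cable count follows directly from the five-drives-per-multiplier figure above. A minimal sketch, with a hypothetical function name:

```python
import math


def external_cables_needed(drive_count, drives_per_multiplier=5):
    """Return how many external cables a chassis needs when each
    port multiplier fans one cable out to a fixed number of drives."""
    return math.ceil(drive_count / drives_per_multiplier)


print(external_cables_needed(4))   # 1
print(external_cables_needed(16))  # 4
```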

Although DIY systems are attractive from a price perspective, it helps to understand some of the potential gotchas that come with building your own system. Because you are typically buying pieces from different manufacturers, there is no guarantee that they will all work together properly. You may find that you have to track down newer versions of firmware to get the RAID controller card to work properly with the specific computer OS you are running. There will be no redundancy in the RAID controller, so if the card fails, you lose your storage until it is replaced. Not all RAID controllers offer monitoring and reporting via email, so you may not be aware of a developing problem.

Many of the large drive chassis offer redundant power supplies, but the computer that is driving all of this probably won’t, unless you are using a server-class machine. This should cause additional concern because data being written to the drives goes through the RAID controller first and is temporarily stored in memory called a cache before the data is written to the hard drives (if the card offers caching). The cache should be protected by a cache battery, but if there isn’t one, a power failure can result in corrupted data.

iSCSI-connected RAID devices

A growing number of devices on the market are offering iSCSI as a connection type. If you read the marketing material, iSCSI looks very appealing, but there are a few details you need to know about it.

In a nutshell, iSCSI wraps the SCSI data protocol in TCP/IP so it can travel over Ethernet. This means you can send SCSI-based data transfers over your network at network speeds. Although this sounds great, you have to realize its limitations too. You’ll be limited to the speed of your network, and iSCSI can take up a lot of bandwidth, impacting other traffic on the network.

Also, iSCSI isn’t a file sharing protocol used to connect one or more computers to a central repository or “sharepoint” that stores files for multiple users to access. It is a point-to-point method of data transfer, meaning that one and only one device can connect to the resource being hosted by the iSCSI device. As an analogy, you can think of iSCSI as a direct hard drive connection that travels over the Ethernet network; all other computers are barred from using that hard drive.

In large implementations of iSCSI-connected devices, separate Ethernet connections are dedicated to iSCSI so as not to share bandwidth with regular network traffic.

Another factor to be aware of is a lack of native iSCSI support in Mac OS X. If you want to connect to an iSCSI device from a Mac OS X computer, you need to download and install iSCSI software, such as GlobalSAN from Studio Network Solutions or Xtend SAN from ATTO Technology.


Servers

A server provides file, email, print, or more services to clients. With file services you can provide multiple users network access to centrally stored data in a controlled manner.

It is easy to get caught up in the hardware related to servers, but let’s first look at the functionality of the server.

Network access

The server provides a way for other computers to talk with it across a network. A network is a group of computers that can physically communicate with one another via wires or wirelessly. The data is transferred over the physical network using a file protocol.

You can think of one computer connecting to another as a telephone conversation described in layers. The lower layer is one user connecting to another via the phone system. The upper layer is the two talking with one another via a common language like English or French. In computer terms, the lower layer is the industry-standard networking protocol, TCP/IP, which allows the computers to connect to the network, whereas the upper layer is the language the two computers use to talk to each other, the file transfer protocol.

The file transfer protocol used by Apple OS X is Apple Filing Protocol (AFP), but only OS X machines can use it. Server Message Block/Common Internet File System (SMB/CIFS) is typically used by Windows machines, but OS X machines can use it too. It is probably the most universal protocol available today and is a great choice if you have a mix of computer types in your environment. Other protocols are also available, such as File Transfer Protocol (FTP), Web Distributed Authoring and Versioning (WebDAV), and Network File System (NFS). FTP and WebDAV are typically used to move data across the Internet. NFS is similar in use to AFP and SMB/CIFS but is typically found in Unix implementations.

If you have a platform-standardized environment, choose the protocol native to your machines.

Central storage

A server must have access to or provide its own storage for the files it will serve to clients. This could be anything from an internal hard drive to a huge RAID array connected via Fibre Channel.

Regardless of the type of storage, a server’s storage must be big enough to hold all the assets you need it to hold, plus have enough space for a certain amount of growth without having to change anything. It should also allow for expansion if you finally run out of space. If you have 10 TB of data you need to store and you are defining the specifications for a new server and storage system, you might want to consider buying more than 10 TB worth of storage, knowing that you will be generating more data as time goes on. Of course, you don’t want to go crazy buying storage you might not need for a while, so you might spec a storage device that will meet your short-term needs but that can be expanded later.
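One way to reason about headroom is to estimate how long a given capacity will last. The sketch below assumes a steady, linear growth rate, which is a simplification; the 16 TB capacity and 2 TB per year growth figures are hypothetical values chosen only to illustrate the 10 TB example above.

```python
def years_until_full(capacity_tb, current_tb, growth_tb_per_year):
    """Estimate the years before storage fills, assuming linear growth."""
    if growth_tb_per_year <= 0:
        raise ValueError("growth must be a positive number of TB per year")
    free_tb = max(capacity_tb - current_tb, 0)
    return free_tb / growth_tb_per_year


# Hypothetical: 10 TB of data today on a 16 TB system, growing 2 TB per year.
print(years_until_full(16, 10, 2))  # 3.0
```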

Another concern that isn’t as obvious is the performance of the storage. If you have many clients connecting to the same server and swapping lots of data, the storage devices might have a hard time keeping up with demand. Even though you might have a big external drive connected to the server to act as storage, if you use USB as the interconnect, which is a fairly slow connection type, a performance bottleneck could occur. Therefore, you may need to use a faster connection like FireWire or Fibre Channel.

Video production has a high-bandwidth requirement due to the size of the files and the need to have those files delivered without delay to avoid dropping frames. This needs to be accounted for when designing a server system. What might work for the smaller files associated with photos and graphics may not work for large video files.

File management

When using a server, you are able to control rights and permissions for files and directories. This ability can be important when you have certain users who should be able to access the information but others who shouldn’t.

If you want to control access via permissions, users must connect to the server with unique identities. I have seen many organizations allow totally open access to the data on their servers with no control over who can or can’t read or write data. Fortunately, this is slowly changing and becoming less popular as these organizations realize that controlling access protects their data from improper usage by the users on the network. In most cases, not everyone needs to have equal access to the files on a server.

Network Attached Storage (NAS)

A NAS device is similar to a file server in that it provides access to data centrally stored across a network, but the major difference is the scope of available services. Typically, a NAS, being a limited function device, only provides file services, whereas a full-blown server can provide many additional services. The distinction is blurred a bit with some of the new NAS devices that also provide Web services and other functions, but the major distinction remains. Some NAS devices can be configured to replicate themselves to another NAS device, providing redundancy and disaster recovery capability.

A NAS can come in many forms. It can be as simple as a normal external drive enclosure with a network port on it or as complex as a massive, multirack unit from a company like EMC. Other forms can include desktop units holding 4 or 5 drives and rack mount units with 4 to 16 drives.

To the end user, a NAS behaves very much like a server. The advantages of a NAS include lower cost than a full server and easier setup and maintenance. On the downside, NAS devices tend not to have easy expansion choices, provide limited performance, and aren’t easy to back up. If you need very basic, affordable, centralized file sharing storage, consider a NAS device. If your needs are more comprehensive and include other services like remote access, wiki, and calendar, or the need for a wide range of storage options, consider a full server.


Xsan

Apple’s Xsan is a storage solution consisting of servers, client workstations, Fibre Channel storage devices, and management software. Initially, Apple offered Xsan as a solution in the video editing field, but with time and newer versions, Xsan has been playing a wider role in other environments such as mail, file, and backup servers. Xsan offers fast access to a shared storage system across multiple clients. It’s as if all the connected machines have direct access to the storage disks without having to use network file connections. In fact, that is exactly what is going on.

All Xsan-connected clients use a Fibre Channel connection shared across all the Xsan storage and controllers, which is what it would be like if you could directly connect a hard drive to multiple machines at once. The Xsan software then controls access to the data on the disks so there aren’t any collisions or contention.

The storage can be easily expanded, which is a very desirable trait. The storage can also be arranged so cheaper, slower storage can be mixed with more expensive, faster storage. This is a beneficial setup that allows you to match your needs with budget restrictions all in the same storage system.

Xsan sounds great, but there has to be a catch, right? Yes, there is: cost and complexity. Compared to a standard server solution, Xsan is quite a bit more costly. There need to be at least two computers dedicated to the role of metadata controller (the traffic cop for data); a Fibre Channel switch or two to connect all the servers, storage, and computers; all the cabling needed; and a dedicated Ethernet switch and network for the metadata traffic in addition to the regular Ethernet network. And you can’t forget the actual Xsan software. On top of all that, an Xsan solution should be installed professionally and maintained on a regular schedule by an experienced Xsan consultant.

Power and Cooling Considerations

Unfortunately, two important items are often neglected when setting up a storage and server infrastructure: protected power and sufficient cooling. Keep in mind that if you are comfortable in a room, your electronics will be comfortable too. Be aware that if you set up your equipment in a closet without any ventilation or with poor airflow, the heat from the equipment can build up, possibly causing the hardware to fail. I’ve heard of cases where data closets have reached temperatures in excess of 120 degrees Fahrenheit. Although the closet door is usually left open, someone being helpful might just shut the door. Also, if you rely on air conditioning to cool your system, be aware that an air conditioner failure left unnoticed over a weekend could cause damage to your components.

Professional-level gear has temperature sensors and will eventually turn off to protect the hardware, but less-capable gear will just bake until failure occurs. Even if the components don’t fail right away, the exposure to temperatures in excess of what they were designed for might lessen their life span.

Clean, reliable power is vital to electronics. Invest in a properly sized uninterruptible power supply (UPS) for your gear. The UPS does more than protect against power failure. It cleans the power supply of spikes, dips, and noise. More common than a complete power failure is the dip in voltage caused by high loads on a circuit. Anything from electric heaters to microwave ovens to laser printers can cause the line voltage to drop below specification. Because power is the product of voltage and current, when voltage drops, current must rise to keep providing the same total power. This additional current may not be tolerated by the electronics. Also, low voltage might cause unpredictable behavior of the electronics. Neither spikes in current nor low voltage is good. When possible, plug the UPS into its own circuit to provide the best isolation from the effects of other power-consuming equipment.
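The rise in current during a voltage dip follows from power = voltage × current. The sketch below assumes a device that draws constant power, as described above; the 600-watt load and the voltages are hypothetical numbers chosen for easy arithmetic.

```python
def current_amps(power_watts, voltage_volts):
    """Current drawn by a constant-power load at a given line voltage."""
    return power_watts / voltage_volts


# Hypothetical 600-watt load at normal and sagging line voltage.
print(current_amps(600, 120))  # 5.0 amps at 120 volts
print(current_amps(600, 100))  # 6.0 amps at 100 volts
```

In this example, a drop of about 17 percent in voltage pushes the current up by 20 percent.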

If you have equipment that sports two power supplies, as some servers or storage gear might, do not plug both power supplies into one UPS. Provide two UPS units. If that isn’t possible, plug one power supply into the UPS and the other into a regular wall socket on a different circuit from the one the UPS is plugged into. This prevents a single UPS failure from taking down your equipment.

Another factor to consider with dual power supply-equipped gear is UPS capacity planning: simply plugging in the equipment and watching the capacity gauge is not best practice. When both power supplies are working on a server or storage chassis, the load is split between the two. When one fails, the full load goes to the other power supply. If the UPS feeding that second power supply was already near its full load capacity, it might shut down due to the overload. I try to avoid loading a UPS beyond 80 percent capacity.
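The sizing rule above can be sketched as a worst-case check. This is a simplified model that assumes each of the two power supplies is fed by its own UPS, treats load and capacity in watts, and ignores any other equipment on the UPS; the function names and wattages are hypothetical.

```python
def ups_load_percent(ups_capacity_watts, device_watts, other_supply_failed):
    """Percent load on one UPS feeding one of a device's two power supplies."""
    share = device_watts if other_supply_failed else device_watts / 2
    return 100 * share / ups_capacity_watts


def is_safely_sized(ups_capacity_watts, device_watts, max_percent=80):
    """Size for the worst case: the other power supply has failed."""
    worst_case = ups_load_percent(ups_capacity_watts, device_watts, True)
    return worst_case <= max_percent


# Hypothetical 900-watt chassis on a 1000-watt UPS.
print(ups_load_percent(1000, 900, other_supply_failed=False))  # 45.0
print(ups_load_percent(1000, 900, other_supply_failed=True))   # 90.0
print(is_safely_sized(1000, 900))                              # False
print(is_safely_sized(1200, 900))                              # True
```

The capacity gauge would show a comfortable 45 percent in normal operation, which is exactly why it is a misleading guide.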
