Which Configuration Is Best?
According to some, RAID 10 (1+0) on locally attached storage is the only acceptable configuration. I cannot argue with that: this configuration will almost always be the highest performing and most fault tolerant configuration for OpenEdge databases.
That being said... in the real world we are sometimes forced to make other choices based on corporate decisions, disaster recovery or budget concerns. To say nothing of systems we have inherited through acquisitions or simply changing jobs.
Throughout this section I am assuming you are using OpenEdge databases in multiuser mode. If you are using the OpenEdge Dataserver to connect to Oracle or SQL Server there are numerous other options available that OpenEdge does not support.
You can still use this guide to improve disk performance but you may be better served with specific advice for your exact version of Oracle or SQL Server. I plan on updating this site with Oracle and SQL Server specific information as my schedule allows.
Disk Subsystem Concepts
RAID was originally an acronym for Redundant Array of Inexpensive Disks. Modern usage has been to rename this as Redundant Array of Independent Disks.
From a real world point of view RAID simply refers to a collection of disks that are logically grouped together in different ways to either achieve performance benefits or fault tolerance.
RAID stacking refers to the practice of combining different RAID levels to negate the shortcomings of the root RAID level.
A spindle refers to the physical part of a disk that supports reading and writing. In the case of internal drives one disk equals one spindle. For SAN storage, what the OS views as one physical disk is usually a collection of physical disk drives that are either striped or concatenated together. The term spindle is used to clarify how many physical disks are involved in the IO for a file system or logical disk.
A disk subsystem should always be sized based on the number of spindles required to meet your performance goals and never on the amount of space required for your database.
More spindles means more IO operations per second (IOPS) and more disk level cache available for your database. Don't be shocked if your 300GB database performs slowly on a single 1TB drive.
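As a rough illustration of why spindle count matters more than capacity, aggregate random IOPS scales with the number of disks. The per-disk figures below are ballpark assumptions for illustration only, not vendor specifications:

```python
# Ballpark per-disk random IOPS by rotational speed -- assumptions for
# illustration, not vendor specifications.
PER_DISK_IOPS = {"7.2K": 75, "10K": 125, "15K": 175}

def estimated_iops(spindles: int, rpm_class: str) -> int:
    """Back-of-the-envelope random IOPS for a stripe set."""
    return spindles * PER_DISK_IOPS[rpm_class]

# One large 7.2K RPM drive vs. ten smaller 15K RPM drives:
print(estimated_iops(1, "7.2K"))   # single large drive
print(estimated_iops(10, "15K"))   # ten-spindle stripe set
```

By this rough math the single large drive tops out around 75 random IOPS while ten smaller 15K spindles deliver around 1,750, which is exactly why the 300GB database on a single 1TB drive performs so poorly.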
Disk Partition Alignment
For Intel based systems (Windows, most Linux deployments) you need to make sure your disk partition is properly aligned. Disks with proper alignment can perform up to 30% faster than misaligned drives.
The root cause is that the Intel architecture reserves the first 63 sectors of the disk for hidden partition information, so the first partition traditionally starts at sector 63 and does not fall on the stripe or block boundaries of modern drives. For more details click on the appropriate link for your environment.
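A quick way to reason about alignment: the partition's starting byte offset must fall on a multiple of the stripe (or block) boundary. A minimal sketch, using the legacy sector-63 partition start as the misaligned example and the modern 1MiB start as the aligned one:

```python
SECTOR_BYTES = 512

def is_aligned(start_sector: int, boundary_bytes: int) -> bool:
    """True if the partition's starting byte offset falls on the boundary."""
    return (start_sector * SECTOR_BYTES) % boundary_bytes == 0

# Legacy DOS-style partitions start at sector 63 and miss a 64KB stripe
# boundary; modern tools start at sector 2048 (the 1MiB mark) and hit it.
print(is_aligned(63, 64 * 1024))    # misaligned
print(is_aligned(2048, 64 * 1024))  # aligned
```

A misaligned partition forces many single-block requests to touch two stripes (two physical IOs instead of one), which is where the performance loss comes from.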
LUNs (Logical Unit Numbers) are unique identifiers for logical storage objects. In most cases the term is used to describe a block of storage presented from a SAN to a host as a logical disk, but it also applies to locally attached disks.
SAN storage is usually a large collection of disks, segmented into smaller RAID units. LUNs are carved out in varying sizes from the logical disks provided by those RAID units, usually by assembling a collection of smaller units.
LUN sizes and characteristics should be the same for all drives that are assigned to a specific file system. If you mix and match any of these characteristics performance will suffer.
- Disk RPM
- Physical disk or chunk size
- RAID level
- Stripe size
- Number of spindles (SAN storage)
This is especially important when dealing with SAN based storage. If you ask a SAN administrator for 1TB of space they will most likely give you whatever is convenient and that usually means concatenating various metavolumes together.
If you want performance you need to be very specific in your requests for SAN storage. Make sure all of the characteristics match and that your disks are segregated from other high volume workloads.
Striping (RAID 0)
Striping allows you to spread data out evenly between all of the available disks. Spreading the data and IO requests helps to prevent hot spots (bottlenecks on certain disks or files). If you aren't using striping it is highly likely that you have multiple hot spots that need to be addressed.
You apply striping at either the OS (software) or disk subsystem (hardware) level. Which choice is appropriate depends on your desired final RAID level (10,50,60, etc.) and the capabilities of your OS and disk subsystem.
The first step to enabling striping is creating a striping policy which will define a single logical volume or LUN. The policy is composed of the following elements:
- Which physical or logical disks will participate (Stripe Set)
- The size in KB or MB that defines how data will be spread across the disks (Stripe Size)
Take for example a stripe set of 10 drives and a stripe size of 1MB. The table below shows how a file of at least 50MB would be stored on those disks (stripes are numbered in the order they are written).

| Disk | Stripes stored |
| --- | --- |
| Disk 1 | 1, 11, 21, 31, 41 |
| Disk 2 | 2, 12, 22, 32, 42 |
| Disk 3 | 3, 13, 23, 33, 43 |
| Disk 4 | 4, 14, 24, 34, 44 |
| Disk 5 | 5, 15, 25, 35, 45 |
| Disk 6 | 6, 16, 26, 36, 46 |
| Disk 7 | 7, 17, 27, 37, 47 |
| Disk 8 | 8, 18, 28, 38, 48 |
| Disk 9 | 9, 19, 29, 39, 49 |
| Disk 10 | 10, 20, 30, 40, 50 |
Without striping the entire contents of the file would reside on one disk; limiting the number of concurrent IO operations to the abilities of that one disk. With striping enabled you achieve massive performance gains by allowing more disks to service requests for the same resources.
In the example above you could have as many as 10 drives reading or writing to the same database extent. It is not uncommon for larger databases to have several hundred drives in a stripe set, which allows massive scaling of IO.
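The placement rule behind the example above is simple modular arithmetic: the stripe number determines which disk holds the data. A minimal sketch, assuming a 0-based disk index:

```python
MB = 1024 * 1024

def stripe_disk(offset_bytes: int, stripe_size: int, n_disks: int) -> int:
    """0-based index of the disk that holds the given byte offset."""
    return (offset_bytes // stripe_size) % n_disks

# A 50MB file on a 10-disk stripe set with a 1MB stripe size touches
# every disk in the set, five stripes per disk.
disks_used = {stripe_disk(off, MB, 10) for off in range(0, 50 * MB, MB)}
print(sorted(disks_used))
```

Because consecutive stripes rotate across the set, sequential and random IO alike are spread over all ten spindles.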
From a system wide perspective, striping makes sure that all of your available drives are supporting the application equally. Without striping you are very likely to have certain disks that are very busy and other disks sitting mostly idle.
Setting the stripe size properly depends on your file system block size and the underlying disk subsystem. 1MB is generally a safe setting but it will depend entirely on your environment and other RAID levels you may be stacking striping on top of.
Disk concatenation should be avoided whenever possible. In the context of disk subsystems concatenation is defined as one of the following:
- Chaining multiple LUNs together to form a file system
- Chaining multiple chunks of SAN disk together to form a LUN
The table below represents 10 disks that have been concatenated together. Disks 6 through 10 will be completely idle while the rest of the disks are forced to perform all of the IO.
| Disk | Capacity in GB | Used GB |
| --- | --- | --- |
| Disk 1 | 100 | 100 |
| Disk 2 | 100 | 100 |
| Disk 3 | 100 | 100 |
| Disk 4 | 100 | 100 |
| Disk 5 | 100 | 100 |
| Disk 6 | 100 | 0 |
| Disk 7 | 100 | 0 |
| Disk 8 | 100 | 0 |
| Disk 9 | 100 | 0 |
| Disk 10 | 100 | 0 |
Quite a few times I have been asked to investigate why high end storage is performing so poorly for a specific application. Roughly half of the time concatenation is the main culprit in the performance problem. Having over 200 spindles available to your application doesn't really help if a substantial number of those disks are sitting idle.
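Concatenation's failure mode is easy to demonstrate: data fills one disk completely before the next disk is touched. A small sketch, assuming (hypothetically) 300GB of data placed on ten concatenated 100GB disks:

```python
GB = 1024 ** 3

def concat_disk(offset_bytes: int, disk_size: int, n_disks: int) -> int:
    """Concatenation fills disks in order, so low offsets all hit disk 0."""
    return min(offset_bytes // disk_size, n_disks - 1)

# 300GB of data on ten concatenated 100GB disks: only the first three
# disks ever see IO; the remaining seven sit idle.
touched = {concat_disk(off, 100 * GB, 10) for off in range(0, 300 * GB, GB)}
print(sorted(touched))
```

Contrast this with the striping example earlier, where the same amount of data would be serviced by all ten spindles.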
Mirroring (RAID 1)
Mirroring is a method to implement fault tolerance to prevent data loss or system outages in the case of disk failure. In its simplest form you have two disks: one designated as the primary disk and the other as the secondary. All write activity happens in tandem on both drives, so you have two consistent copies of your data. Most modern implementations support independent reads from either the primary or secondary drive.
Parity Based RAID Levels
Single Parity (RAID 5)
Parity based RAID is a mixture of standard striping and checksum (parity) stripes. The most common configuration is 5 drives that provide the usable storage of 3 drives (one reserved as a hot spare and one lost to parity storage). You can add up to 8 drives in a RAID 5 stripe set and should always have a disk reserved for a hot spare. Running RAID 5 without a hot spare is tempting fate and risking the loss of your data.
| Disk 1 | Disk 2 | Disk 3 | Disk 4 | Disk 5 (Hot Spare) |
| --- | --- | --- | --- | --- |
| Stripe A1 | Stripe A2 | Stripe A3 | Parity for A | - |
| Stripe B1 | Stripe B2 | Parity for B | Stripe B3 | - |
| Stripe C1 | Parity for C | Stripe C2 | Stripe C3 | - |
| Parity for D | Stripe D1 | Stripe D2 | Stripe D3 | - |
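The parity stripes above are, at their core, a bytewise XOR of the data stripes in the same row, which is what makes single-drive recovery possible. A minimal sketch using made-up two-byte stripes:

```python
def parity(stripes):
    """RAID 5 parity is the bytewise XOR of the data stripes."""
    out = bytearray(len(stripes[0]))
    for stripe in stripes:
        for i, byte in enumerate(stripe):
            out[i] ^= byte
    return bytes(out)

# Hypothetical two-byte stripes A1..A3 and their computed parity stripe.
a1, a2, a3 = b"\x0f\x0f", b"\xf0\xf0", b"\xff\x00"
p = parity([a1, a2, a3])

# If the disk holding A2 fails, XOR-ing the survivors with the parity
# stripe reconstructs it exactly.
rebuilt = parity([a1, a3, p])
print(rebuilt == a2)  # True
```

The same property explains why rebuilds are so expensive: recovering one lost stripe requires reading the corresponding stripe from every surviving disk.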
Double Parity (RAID 6)
RAID 6 is essentially RAID 5 with the hot spare disk now being used as part of the stripe and parity. Instead of one parity stripe, RAID 6 uses two parity stripes. There are some RAID 6 implementations that also use hot spares. Consult your vendor for more information.
RAID levels 2, 3 and 4 are almost never seen in the real world today and should never be used for OpenEdge databases.
Disk subsystem vendors invent new non standard RAID levels and make dubious claims about the benefits of their version. In most cases they are parity based RAID with or without mirroring added.
Keep in mind when vendors laud their product they have a vested interest in selling you their hardware. Claims of their version being faster and safer than RAID 5/6 should be met with a healthy dose of skepticism. Claims of their version being faster and safer than RAID 10 should lead you to question their sanity.
One File System To Rule Them All
Before RAID was commonly available it made sense to have different file systems to segregate certain database extents from each other. In the modern world this no longer makes sense from a performance tuning or management perspective.
You are almost always better off striping as many disks as possible into one large logical volume/file system. Without knowing the IO characteristics of every request that could possibly occur there is no way for you to intelligently decide what a "proper" multiple file system layout would be.
Oracle has coined the term S.A.M.E. (Stripe And Mirror Everything) to describe this philosophy. Some of the largest databases in the world are running under this configuration. This Oracle PDF describes the concept in much more detail.
This concept still works when you are forced into RAID 5 storage. One of the most common issues with RAID 5 and larger systems is the inherent limit on how many disks can exist in a single RAID 5 set. Striping on top of RAID 5 (aka RAID 50) spreads the IO across multiple RAID 5 sets which reduces bottlenecks, sometimes drastically so.
You can certainly make a case for keeping the AI extents physically segregated from the database extents. This may make sense for fault tolerance but rarely for performance.
If your AI file system uses the same disks and controllers as your database extents you aren't as fault tolerant as you think. This is a quite common occurrence for SAN based storage. For real fault tolerance you need to archive the AI extents on geographically different storage.
It makes very little sense to keep the BI segregated for the vast majority of OpenEdge databases. There is just too little physical IO related to BI extents to justify taking disks away from the main stripe set.
It is a good idea to exhaust all of the options for delaying and reducing BI related IO before deciding you need dedicated spindles.
There is no valid reason I can think of that would discourage you from placing all of your database extents on a single stripe set. This will give you the highest level of performance for the entire database.
Even if you have a large transactional table at the center of your application (like tr_hist for MFG Pro) there is no sense in keeping it separate from the rest of your data. You are actually slowing down access to that table by limiting the number of spindles that can service requests.
As you improve performance for that table by fixing code, adding indexes, buffer pool tuning or storage area configuration you have no way to give those free IO cycles back to the rest of the database in a split file system layout.
Disk Adapters / Host Based Adapters
Depending on your configuration you may be using actual disk adapters or host based adapters (HBAs). While somewhat different from a technical perspective, you can treat these as the same thing: slightly different ways of communicating with your disks.
You want to make sure you have a sufficient number of HBAs to support the IO requests needed for your application. To take advantage of multiple HBAs you will need to make sure some form of round robin logic is implemented by your OS.
The Great RAID Debate
There are many different RAID levels available, especially when you consider stacked RAID and vendor specific versions. This section will discuss the pros and cons of each of the most common RAID levels from an OpenEdge perspective.
RAID 10 (1+0)
RAID 10 is a stacked RAID level, combining RAID 1(mirroring) for fault tolerance and RAID 0 (striping) for performance. Set up properly this will be your best performing option and also have the fewest issues when a drive fails. It should be your first choice until you are forced to pick another option.
Do not confuse RAID 0+1 with RAID 10. RAID 0+1 is striping at the base level with mirroring applied on top. This does not provide the same level of fault tolerance as RAID 10. RAID 10 is explicitly mirroring at the base level with striping applied on top.
The most common argument against RAID 10 is the cost associated with the extra drives required for mirroring. This is often based on the misconception that the extra drives "aren't being used" where in reality the disks providing the mirroring are also independently supporting read activity. It is true you end up with half of the available disk space but you get twice the read IO.
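The trade-off is easy to quantify: half the raw capacity, but every spindle can serve reads. A small sketch, assuming ten 300GB disks in standard (two-way) mirrors:

```python
def raid10_usable(n_disks: int, disk_gb: int, mirrors: int = 2):
    """Usable capacity and read spindles for a RAID 10 stripe set."""
    usable_gb = (n_disks // mirrors) * disk_gb
    read_spindles = n_disks  # every mirror copy can serve reads independently
    return usable_gb, read_spindles

# Ten 300GB disks: half the raw space, but all ten spindles serve reads.
print(raid10_usable(10, 300))  # (1500, 10)
```

So the "wasted" mirror disks are only wasted from a capacity standpoint; from an IO standpoint they double the read bandwidth of the set.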
There are different options to implement RAID 10, each with different strengths and weaknesses.
RAID 10 on the disk subsystem
At first glance handling the striping and mirroring on the disk subsystem seems like the proper way to implement RAID 10. It might be correct but that depends on how the disk subsystem presents the logical disk(s) to your OS.
If your OS is presented with multiple LUNs that need to be concatenated to make a file system you are likely to have problems. As discussed in the section on concatenation above you will still be limiting how many drives can handle IO for the same database extent.
If your OS is presented with one large LUN that will be mapped directly to a file system you are also likely to have problems. Unless your OS has an option to completely disable the queue depth limit for that drive, or set it to a large enough number, you will be blocked by the OS pausing IO. Many operating systems have upper limits for queue depth that fall well below the number of operations a RAID 10 stripe set can handle.
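The queue depth ceiling can be approximated with Little's law: a LUN with queue depth Q and average service latency L cannot sustain more than Q / L IOPS. A sketch assuming a hypothetical (but common) default depth of 32 and 5ms average latency:

```python
def max_iops(queue_depth: int, latency_ms: float) -> float:
    """Little's law ceiling: in-flight requests / average service time."""
    return queue_depth / (latency_ms / 1000.0)

# One big LUN at an assumed default queue depth of 32 vs. eight smaller
# LUNs striped at the OS level, each with its own independent queue:
print(max_iops(32, 5.0))      # single-LUN ceiling
print(max_iops(8 * 32, 5.0))  # eight independent queues
```

This is why presenting multiple LUNs and striping them at the OS level can outperform a single large LUN even when the same physical disks sit behind both.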
If your disk subsystem only supports RAID 10 and not RAID 1 (unlikely) or the effort involved in changing the RAID level seems too high, please read the section on RAID 100 below.
RAID 1 on the disk subsystem + RAID 0 on the OS
This is probably the most common implementation of RAID 10. Disks are mirrored on the disk subsystem and an OS based logical volume manager is used to stripe the blocks across all of the drives.
This allows the OS to see multiple logical disks with separate queue depth settings, avoiding the issue of hitting an OS limit on concurrent requests. This of course means your OS must do a small amount of processing to handle the striping instead of offloading it to the disk subsystem. It is a small price to pay for the performance gains.
Fault Tolerance Rating: A (A+ with triple mirroring)
Risk of data loss is very low in standard RAID 10 (single mirroring). Assume you have 10 drives in your array and one of those drives fails:
- 80% of your data is still mirrored (4 out of 5 mirror pairs)
- Another drive failure has a 1 in 9 chance of causing data loss (loss of the exact mirror to the original failed drive)
- Minimal performance decrease for reads (90% of capacity)
- No write performance decrease
- Rebuilding the mirrored pair has a low impact because it does not involve the other drives in the array
As you add more disks into the array (either by striping across more disks or triple mirroring) the chance of data loss drastically decreases as does the performance impact of a drive failure.
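The 1-in-9 figure above generalizes: after one drive fails, only that drive's mirror partner is fatal, so the odds that a second failure causes data loss are 1 in (n - 1). A quick sketch:

```python
from fractions import Fraction

def second_failure_loss_odds(n_disks: int) -> Fraction:
    """After one failure in single-mirrored RAID 10, only the failed
    drive's mirror partner is fatal: 1 of the n-1 surviving drives."""
    return Fraction(1, n_disks - 1)

# The odds of a fatal second failure shrink as the stripe set grows.
for n in (10, 20, 100):
    print(n, second_failure_loss_odds(n))
```

With triple mirroring the second failure can never cause data loss at all, which is why it earns the A+ rating above.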
Performance Rating: A
Striping balances the IO across all of the disks in the stripe set. In a properly configured RAID 10 stripe set it is highly unlikely that a single disk or subset of disks will become a bottleneck (hot spot) for IO.
It is possible for an individual stripe to be more active than the others but this usually indicates the stripe size is too large or a database configuration issue of some kind.
RAID 100 (1+0+0)
RAID 100 is sometimes referred to as a plaid RAID level and is just another layer of striping on top of RAID 10. This is implemented by using RAID 10 at the disk subsystem level and OS level striping on top.
Why would you do this? In some cases it is merely to spread the IO when the disk subsystem is already RAID 10. For larger systems it is used to span a stripe set across multiple adapters within the same SAN or even between multiple SAN devices.
For smaller numbers of disks it is close in performance to RAID 10. As the number of disks increases RAID 100 will start to outperform RAID 10 for random IO. Fault tolerance is the same as RAID 10 until you start spanning SAN devices.
RAID 5 / RAID 6
RAID 5 and 6 are commonly used when cost is the main concern rather than performance or reliability. Both usually provide decent enough performance for reads but suffer heavily on write activity.
Disk subsystem vendors will always promise that there will be enough cache to mitigate the inherent write penalty but this is rarely the case. Some of the very high end EMC or Hitachi SANs can actually accomplish this, with the assumption that the cache is set up properly and not overcommitted to other hosts.
Fault Tolerance Rating: C
RAID 5: There will be no data loss with the first drive failure in a stripe set. A second drive failure in that same stripe set will cause complete loss of data.
RAID 6: Handles up to two drive failures in a stripe set before complete loss of data.
There is an immediate performance impact when a drive is lost. Rebuilding the lost disk(s) is an extremely expensive operation for all parity based RAID levels: all of the remaining data and parity stripes must be read from the surviving disks to recover the lost data. Because of the imminent danger of total data loss this is not something that can be done slowly over time.
In addition parity based checksums are not 100% reliable and there is a chance that a single drive failure could cause complete loss of data.
Performance Rating: C+ for reads, D for writes
Read performance is generally acceptable if your system has a high ratio of reads to writes. Once you start writing to disk the performance penalty of maintaining the data stripe and the related parity stripe can severely degrade the performance of the entire stripe set.
For RAID 6 this is further complicated by having an extra parity stripe to maintain on each drive. You get slightly better fault tolerance but writes can take 20-30% longer on RAID 6 compared to RAID 5.
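The write penalties described in this section follow the classic small-random-write multipliers: 2 IOs per write for RAID 10 (data plus mirror), 4 for RAID 5 (read data, read parity, write data, write parity), and 6 for RAID 6 (a second parity stripe to read and write). A sketch of the effective write IOPS, assuming ten 15K spindles at roughly 175 IOPS each (ballpark figures, not vendor specs):

```python
# Classic small-random-write penalty multipliers per RAID level.
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}

def effective_write_iops(raw_iops: int, level: str) -> int:
    """Raw spindle IOPS divided by the write-amplification factor."""
    return raw_iops // WRITE_PENALTY[level]

raw = 10 * 175  # ten 15K spindles at an assumed ~175 IOPS each
for level in ("RAID10", "RAID5", "RAID6"):
    print(level, effective_write_iops(raw, level))
```

The same ten spindles that comfortably handle a write-heavy workload under RAID 10 can be brought to their knees under RAID 5 or 6, which is why cache promises from vendors deserve scrutiny.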
RAID 50 / RAID 60
If you are stuck with RAID 5/6 you should strongly consider implementing RAID 50/60 instead. A common issue with parity based RAID is the practical limit on how many drives you can assign to a single stripe set. This is further complicated by concatenating the LUNs on the database host.
RAID 50 and 60 stacks OS level RAID 0 (striping) on top of parity based RAID. This can help greatly with some of the performance issues related to RAID 5/6.
The wider the stripe the less of an impact the parity will have on performance, mostly because you are now using more disk spindles than with concatenated RAID. Read performance will be somewhere between RAID 5 and RAID 10 speeds.