Linux RAID is different from Windows for sound technical and historical reasons (plus.google.com)
134 points by dredmorbius on Sept 23, 2016 | 42 comments



You can do really cool things with Linux RAID, like RAID stripe across 8 virtual (i.e., EBS @ EC2) volumes and then layer dmcrypt on top of that, or other things for use cases it was never even designed for. Linux RAID (and volume management) were really designed The UNIX Way, as modular, small tools that can be applied like Lego bricks, and nowhere is that more user-visible than in the RAID/LVM/etc. subsystems.
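
For the curious, a minimal sketch of that kind of stacking. The device names and parameters below are assumptions for illustration, not from the comment:

    # Stripe (RAID 0) across 8 attached EBS volumes, then layer dm-crypt on top
    mdadm --create /dev/md0 --level=0 --raid-devices=8 /dev/xvd[f-m]
    cryptsetup luksFormat /dev/md0          # initialise LUKS on the stripe
    cryptsetup open /dev/md0 secure_stripe  # map it as /dev/mapper/secure_stripe
    mkfs.ext4 /dev/mapper/secure_stripe     # put a filesystem on the encrypted device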


Or just the classic Floppy RAID http://mac-guild.org/raid.html (this is OS X but the same thing applies)


Now do it with multiple 1.2MB 5.25" drives if you can find a system with a BIOS old enough to support the hardware, yet new enough to run a modern kernel.


Interesting, if rose-tinted, view; Linux didn't propagate I/O barriers until 2.6.33 on most common mdraid setups, which is absolutely insane. https://monolight.cc/2011/06/barriers-caches-filesystems/

I know this is Linux's Alan Cox, but the post reads like a typical Windows-hating Linux desktopper, and it is quite amusing, even quaint, if you have even passing familiarity with ZFS internals to contrast it with.

As far as the layered approach goes, I would also suggest studying FreeBSD's geom, which dips slightly below and above Linux's mdraid (i.e. you still use geom with ZFS and zvols) but IMHO is a bit cleaner, probably because it's a later arrival.


mdraid will still, after some 10 years of the bug being known[1], happily corrupt all your data if you mix hardware with different queue sizes.

How do you know you have hardware with different queue sizes? Why, you start to experience data corruption for no apparent reason, of course. There's no other warning. And the only "workaround" is setting the global queue size to the lowest common denominator of the hardware installed – and God help you if you later install hardware with a lower queue size.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=9401


That bug report isn't about data being corrupted due to mixing hardware with different queue sizes. The problem described in that bug is that if you add a device at runtime with a smaller queue size, the queue size of the mdraid device correctly decreases. However, if you have a dm-crypt volume mounted on top of that when you add the device, dm-crypt has no way of detecting the queue size change and stuff breaks. Just mixing different queue sizes shouldn't cause this, and neither should adding devices to an mdraid volume without anything stacked between it and the filesystem.
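
For anyone who wants to check whether they're exposed, a quick sysfs sketch; the paths are standard, while "md0" is an assumed array name:

    # Compare the per-device I/O size limits before stacking dm-crypt over md
    grep . /sys/block/sd*/queue/max_sectors_kb
    grep . /sys/block/sd*/queue/max_hw_sectors_kb
    # The md device's effective limit after assembly
    cat /sys/block/md0/queue/max_sectors_kb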


Is that bug actually fixed or not? The bug status is "CLOSED CODE_FIX", but commenters in the bug say they still have the problem with kernel versions in which the fix is supposed to be present.

(Aside: I think bug trackers should be configured in such a way that you can't mark a bug as having been fixed unless you specify the ID of the commit in which it was fixed.)


It's not fixed, I'm still seeing it on kernel 4.8.


Submitter's comment.

Alan Cox, former #2 Linux kernel developer, at G+, on the differences between Linux and Microsoft RAID support, related to recent Lenovo news,[1] and the technical and historical basis for this.

I did some massaging of the fourth paragraph, which seems closest to a lede/head, to fit within HN's 80 character headline limit. The first four 'graphs of the post:

"Unsupported models will rely on Linux operating system vendors releasing new kernel and drivers to support features such as RAID on SSD"

Good to see that the tech press fact check comments from companies as well as the political press fact check politicians. I'm reading this on a box with RAID1 SSD. It's had RAID1 SSD for some years.

Linux has supported RAID on SSD for years, in fact it supported it from the moment you could plug an SSD into a Linux PC.

Linux RAID is different from much of the Windows experience, for a mix of sound technical reasons and historical ones.

________________________________

Notes:

1. http://www.gossamer-threads.com/lists/linux/kernel/2352338 (from Cox's post).


After looking at the gossamer thread: Could this be directly related to the current BIOS problems? Lenovo's closed source driver runs into problems -> Lenovo dev wants to commit workaround to Linux, fails -> force different PCI-ID as alternate workaround by crippling the BIOS? Firmware gets copy&pasted to non-servers?


Is this in reference to the whole "OMG Microsoft/Lenovo is locking down your laptop!" scandal?



Hasn't Windows supported software RAID for quite a while now?


Ish, and it depends on the version -- like, I created a RAID1 array for my gaming box when I was using Windows 7 Pro, then upgraded to Windows 10 Home thinking "I'm not using any of the Pro features anyway"... turns out that RAID1 is a Pro feature.

What's worse: if you try to mount the drives, it doesn't say "this looks like a RAID1 volume, I don't have drivers for that, please upgrade to access this data", it says something more like "data corrupt, would you like to format this drive? [Yn]" :/


So RAID1 will someday be DLC in the Windows Store?


I think on older client versions of Windows like Windows XP it was restricted to RAID 0.


Yes, Windows has had RAID for a long time. There have been various limitations. I don't think you can boot from a RAID 0 volume, for instance. It's certainly not as flexible as Linux's model, but Windows usually isn't.

As far as hardware RAID controllers go, at least you get a battery-backed write cache that will "just work" without the OS, right? As in, you don't need to boot the OS to finish flushing. Or maybe I've just bought into the hardware RAID marketing. It does seem simpler, as in less to deal with, to let the card do it and just present whatever volumes you want to the OS.


My time playing with hardware RAID cards is over. My disks kept dropping out of the array on different builds (even using only TLER disks). For a home NAS, a Synology-style Linux software RAID with regular data scrubbing is the peace-of-mind solution. For the cache problem, a cheap UPS that lasts the few minutes required to shut down gracefully is good enough. As for performance, my bottleneck is the 1 gigabit port on my laptop anyway, and I don't see any improvement on the horizon for that.
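
If it helps anyone, the scrubbing part is a one-liner on md; "md0" is an assumed array name:

    # Start a background consistency check (scrub) of the array
    echo check > /sys/block/md0/md/sync_action
    # Watch progress
    cat /proc/mdstat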


I believe you mean a cheap UPS. A cheap PSU is likely to last a very small fraction of a second in an outage.


Corrected. Thanks!


As I recall, they have supported RAID 0 and/or 1 since some editions of Windows XP.

Same thing on Windows 7 (though that may be limited to the Pro/Enterprise editions).

Since Windows Server 2012, the OS has had "Storage Spaces", which is a full management solution for RAID, volumes, and physical disks. It's somewhat similar to the ZFS/zpool/mdraid/dmcrypt tooling, EXCEPT it's all in a single piece of software, unified, way easier to use, and with a nice GUI.

IMO, software RAID and disk management are utter shit on Linux. They are easily lagging more than 10 years behind, but that's just my point of view.


Actually, the underlying interfaces are for the most part sane; and what's more, they are much easier to configure from the command line (and to keep in git or whatever source control, or to drive with devops tools). The only area where the Linux side is not easier is for first-time or very occasional users who want an easy GUI with native bling to configure things with.


I disagree on the sane part.

Do a simple Google search for "how to resize a partition on Linux" and you get ten tutorials, all of which give different steps, none of which work.

There are at least seven different tools for managing disks on Linux, most of which vary by distribution and version: fdisk, mkfs, partx, parted, growpart, mdraid, dmcrypt...

Little story: the automatic extending of partitions on boot (i.e. a critical thing in cloud environments) has been broken in Debian stable for more than six months.

It works by calling "growpart", which in turn calls other tools, one of which is "partx". Some flags change between versions and that breaks the whole toolchain, unless all the software versions are carefully matched.

We could go on about why some of the tools only accept cylinder and block counts as sizes, or why gparted is a terrible GUI for managing disks.
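
For what it's worth, the sequence that usually works on a current cloud-utils/e2fsprogs combination, assuming an ext4 filesystem on the first partition of /dev/xvda (your device will differ):

    # Grow partition 1 to fill the (already enlarged) disk
    growpart /dev/xvda 1
    # Grow the ext4 filesystem to fill the partition; works online
    resize2fs /dev/xvda1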


There isn't much, if anything, you can't configure from PowerShell these days. Microsoft even has versions of Windows Server without a GUI at all.


> The Linux RAID history is different because unlike Microsoft the decision was made to integrate software RAID properly with the OS.

Well there wasn't really a choice was there?


Does Linux RAID support TRIM? I believe most hardware RAID cards still do not.


Some RAID levels support TRIM at this point (with a modern kernel). I can't find the commit or changelog in a few seconds, but this bug confirms it: https://bugzilla.kernel.org/show_bug.cgi?id=117051 (Very slow discard (trim) with mdadm raid0 array).
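
A quick way to check whether discard actually propagates through the md layer; "md0" and the mount point are assumptions:

    # Non-zero DISC-GRAN/DISC-MAX columns mean the device accepts discards
    lsblk --discard /dev/md0
    # Trim the filesystem mounted on the array
    fstrim -v /mnt/array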


dmraid is pretty cool. My desktop has 4 disks configured in a "fakeraid" RAID 10 setup and dual boots Debian and Windows. I can use/share that RAID 10 volume under both Debian and Windows just as easily as I could a single disk. I use half of the volume for my Windows Steam library and the other half for my Linux Steam library. I didn't have to do anything special under Linux; dmraid just made it magically work.
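
Roughly what that looks like on the Linux side, as a sketch (the set name under /dev/mapper depends on the controller's metadata format):

    # List RAID sets found in the BIOS/firmware metadata on the disks
    dmraid -r
    # Activate all discovered sets as device-mapper devices
    dmraid -ay
    # The activated set(s) appear here and can be partitioned/mounted as usual
    ls /dev/mapper/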


It's 2016. If you are still deploying RAID, you're doing it wrong. Seriously guys, ZFS. Look it up.


RAID isn't dead yet.

* outboard RAID provides write amplification across physical devices

* soft RAID1 (simple mirror) is nice for things like boot volumes and easy to fix when it breaks

But yeah, ZFS and similar strategies work much better than soft RAID-5/6 for file store resiliency across visible LUNs.


You may want to do a quick search on what write amplification means. It's an undesirable property of data structures.

I think you perhaps meant fan out, which is partially valid but addressed in the submitted article - i.e. bus speeds have improved greatly. In fact to the point where you can easily have no bus at all: point to point PCIe lanes to each NVMe device.

A zmirror is simple too, and gives you boot environments so you can roll back failed upgrades and other magic.


You've been voted down, but I'm curious - could you elaborate?


ZFS has built-in support for redundant disks (in fact, using a lower-level hardware or software RAID under ZFS is not recommended). This allows it to do fancy things like allocate files intelligently across stripes, rebuild a disk with only the data used by the file system (huge time saver), and be able to repair files on-the-fly that are damaged, or at worst, report what files cannot be repaired.
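
The on-the-fly repair and reporting mentioned above is exposed through a couple of commands; "tank" is an assumed pool name:

    # Walk all data in the pool, repairing anything with a good redundant copy
    zpool scrub tank
    # Shows scrub progress and, at worst, lists files that could not be repaired
    zpool status -v tank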


But it won't let you add disks to your array one by one, unlike Linux software RAID.


What gave you that impression? With `zpool attach` you can attach a device to an existing VDEV (single device, mirror, RAIDZ) to increase redundancy. With `zpool add` you can add a new VDEV to your pool to extend its capacity.

See https://illumos.org/man/1m/zpool
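
A short sketch of the distinction; the pool and device names are assumptions:

    # Attach a disk to an existing device, turning it into a (deeper) mirror
    zpool attach tank da1 da2
    # Add a whole new vdev to the pool, extending its capacity
    zpool add tank mirror da3 da4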


What he means is that you can't add another disk to a vdev. If you have a raidz with 3 disks, you can't add a 4th one and rebalance. What you can do is either add another vdev to the pool, making it two or more striped raidzs, or replace your 3 disks with bigger ones.

The reason is that adding a disk to a vdev would require rebalancing everything, and considering the complexity of ZFS, you're better off just making a new raidz with 4 disks and moving the data over; otherwise it might take ages and complicate things a lot. This is obviously not ideal for a home user, but ZFS was not created with home users in mind.

Btrfs supposedly lets you do this, but its raid5/6 is still unstable - probably for a reason. In fact, this is not such a big deal. You can add disks to a ZFS mirror (and it is a real n-way mirror, not Btrfs's raid1 thing with multiple disks), and you can add mirror vdevs to a raid10-style setup. If you are making a raidz1/2/3, just make sure you understand that you can't expand it by adding more disks.


Hope this comment isn't too ignorant, but how does it compare to gfs2?


It doesn't. GFS2 is a clustered filesystem, ZFS is not. GFS2 is for sharing a single filesystem between multiple nodes, with a distributed lock manager to avoid corruption. It's complex to fix when it breaks, generally fairly slow and very few people need it - for most users, NFS (especially v4 and up) is a better way of accomplishing the same end goal.


RAID is something that can be supported at the hardware level (hardware RAID), whereas ZFS requires software support. Also, some businesses use Windows Server. Are they wrong for not using ZFS?


If you read the Linux developer's article that this whole thread is about, you will see the reasons you should not have been using hardware RAID for some years now.

And yes, most people should default to ZFS for everything on FreeBSD/IllumOS/Solaris/Linux. There are specific cases where you should use XFS or UFS on these operating systems, such as fail-in-place and overlay filesystems (HDFS, Ceph, Swift, etc.), but times have changed, the tables have turned, and these non-ZFS filesystems are now expert territory with a here-be-dragons warning label.


Pity about the licensing. And, I'm told, the RAM requirements.


I would have preferred the 2-clause BSDL myself, but they were free to choose the CDDL.




