Big data: which filesystem is the right choice, ext4 or XFS?

We run Red Hat Enterprise Linux 7.2 with XFS filesystems.

From /etc/fstab:

/dev/mapper/vgCLU_HDP-root /                       xfs     defaults        0 0
UUID=7de1dc5c-b605-4a6f-bdf1-f1e869f6ffb9 /boot                   xfs     defaults        0 0
/dev/mapper/vgCLU_HDP-var /var                    xfs     defaults        0 0

The machines are used for Hadoop clusters.

I am wondering what the best filesystem is for this purpose.

Which is better, ext4 or XFS, given that the machines are used for a Hadoop cluster?


Solution 1

This is addressed in this knowledge base article; the main consideration for you will be the supported size limits: ext4 is supported up to 50 TB, XFS up to 500 TB. For really big data, you’d probably end up looking at shared storage, which by default means GFS2 on RHEL 7, except that for Hadoop you’d use HDFS or GlusterFS.

For local storage on RHEL the default is XFS and you should generally use that unless you have specific reasons not to.
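As a rough illustration of the size consideration, here is a minimal Python sketch that checks a planned volume size against the two limits quoted in this answer. The table and the function name `supported_filesystems` are made up for the example, and the numbers are only the ones stated above, not a complete support matrix.

```python
# Support limits as quoted in this answer (RHEL 7), in terabytes.
# Hypothetical table for illustration; check the vendor documentation
# for the limits that apply to your release.
SUPPORTED_MAX_TB = {"ext4": 50, "xfs": 500}


def supported_filesystems(volume_tb):
    """Return the filesystems whose quoted limit covers volume_tb."""
    return sorted(
        fs for fs, limit in SUPPORTED_MAX_TB.items() if volume_tb <= limit
    )


print(supported_filesystems(40))   # ['ext4', 'xfs']
print(supported_filesystems(120))  # ['xfs']
print(supported_filesystems(800))  # []
```

An empty result is the case where you would be looking at shared or distributed storage rather than a single local filesystem.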

Solution 2

XFS is an amazing filesystem, especially for large files. If your load involves lots of small files, cleaning up any fragmentation periodically may improve performance. I don’t worry about it and use XFS for all loads. It is well supported, so there is no reason not to use it.
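If you do want to keep an eye on fragmentation, `xfs_db -r -c frag <device>` prints a one-line summary, and `xfs_fsr` is the tool that reorganizes files. The Python sketch below only parses such a summary line; the helper name `parse_frag_report` is hypothetical, and the sample line is made up for illustration rather than captured from a real system.

```python
import re

# Shape of the summary line printed by "xfs_db -r -c frag <device>".
FRAG_LINE = re.compile(
    r"actual (\d+), ideal (\d+), fragmentation factor ([\d.]+)%"
)


def parse_frag_report(text):
    """Extract extent counts and the fragmentation factor from the report."""
    match = FRAG_LINE.search(text)
    if match is None:
        raise ValueError("no fragmentation summary found")
    return {
        "actual": int(match.group(1)),
        "ideal": int(match.group(2)),
        "factor_percent": float(match.group(3)),
    }


# Illustrative sample line, not real output from any machine.
sample = "actual 1000, ideal 900, fragmentation factor 10.00%"
report = parse_frag_report(sample)
print(report["factor_percent"])  # 10.0
```

The factor on its own says little, so treat it as a trend to watch over time rather than a threshold for action.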

If you want to find out what is best for your typical workload, set aside a machine and a disk for your own testing of various filesystems. Running the test load in steps across the entire disk can tell you something about how the filesystem under test behaves.

Testing your load on your machine is the only way to be sure.
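A very small starting point for such a test could look like the Python sketch below. It times many small synced writes against one large sequential write in a scratch directory. The function names, file counts and sizes are arbitrary choices for the example, and `target` is an assumption: point it at a mount on the filesystem you are testing. It is not a substitute for replaying your real Hadoop workload.

```python
import os
import tempfile
import time


def time_small_files(directory, count=200, size=4096):
    """Write `count` files of `size` bytes, fsync each, return seconds."""
    payload = b"x" * size
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"small_{i}.bin")
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
    return time.perf_counter() - start


def time_large_file(directory, total=200 * 4096, chunk=64 * 1024):
    """Write one file of `total` bytes in chunks, fsync once, return seconds."""
    path = os.path.join(directory, "large.bin")
    start = time.perf_counter()
    with open(path, "wb") as f:
        remaining = total
        while remaining > 0:
            step = min(chunk, remaining)
            f.write(b"x" * step)
            remaining -= step
        f.flush()
        os.fsync(f.fileno())
    return time.perf_counter() - start


# Assumption: replace with a directory on the filesystem under test.
target = tempfile.gettempdir()

# The scratch directory and its files are removed when the block exits.
with tempfile.TemporaryDirectory(dir=target) as scratch:
    small_seconds = time_small_files(scratch)
    large_seconds = time_large_file(scratch)

print(f"small files: {small_seconds:.3f}s, large file: {large_seconds:.3f}s")
```

Run it several times on each candidate filesystem with the same disk and compare the numbers; a single run is too noisy to draw conclusions from.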


All solutions are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
