Monday, December 12, 2016

Linux/Oracle Linux -- Inodes & partition/filesystem size.. --using debugfs, --No space left on device

In this blog post, I will briefly explain inodes and the maximum inode count for a filesystem in Linux.


I decided to give this subject a place in my blog because I think it is important. Especially when you want to store millions of files in a partition, the configuration of inodes becomes crucial.
Storing millions of files may be required when you want to keep lots of pictures in a filesystem, or when your applications use the filesystem to create audit, debug or log files.

It is important because a misconfiguration can make the system hang. In addition, you may be surprised to find yourself in a situation where you can't create files on the affected partition, although the "df" command reports lots of free space.

Let me introduce inodes briefly.

Inodes are used for storing metadata about files. This metadata includes owner info, size, timestamps and so on. To create a single file in Linux, we need at least 1 free inode in our filesystem.
For ext2 and ext3, the traditional inode size is 128 bytes. However, using the -I argument of mke2fs, it is possible to create inodes larger than 128 bytes in order to store extended attributes.
For ext4, the inode records are 256 bytes by default, but the inode structure is 156 bytes. So ext4 has 100 bytes (256-156) of free space in each inode for storing extra/extended attributes.
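
To see which inode size and inode count an existing ext2/3/4 filesystem actually has, the superblock summary printed by "tune2fs -l" can be filtered. Here is a minimal sketch; the helper name inode_geometry is made up for this post, and /dev/sdb in the usage comment is simply the example device used later in this article.

```shell
# Hypothetical helper: reads "tune2fs -l" output on stdin and prints
# "<inode count> <inode size>" taken from the superblock summary.
inode_geometry() {
    awk -F: '
        /^Inode count:/ { gsub(/[ \t]/, "", $2); count = $2 }
        /^Inode size:/  { gsub(/[ \t]/, "", $2); size  = $2 }
        END { print count, size }
    '
}

# Usage (needs root and a real ext2/3/4 device):
#   tune2fs -l /dev/sdb | inode_geometry
```

The parsing is kept separate from the tune2fs call so that it can be tried on saved output without touching a device.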

Let's make a quick demo: take a look at the inode structure, see what is stored in it and, as a bonus, update the contents of an inode using debugfs :)
  • Just check the file from the shell and gather info using stat command
[root@erpdb boot]# stat erm1
File: `erm1'
Size: 6 Blocks: 2 IO Block: 1024 regular file
Device: 6801h/26625d Inode: 6027 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2016-12-06 21:34:50.000000000 +0300
Modify: 2016-12-06 21:34:29.000000000 +0300
Change: 2016-12-06 21:34:29.000000000 +0300
  • Check the inode number 6027 using debugfs to see the file owner's uid.
debugfs:  stat <6027>
Inode: 6027   Type: regular    Mode:  0644   Flags: 0x0   Generation: 4071970905
User:     0   Group:     0   Size: 6
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 2
Fragment:  Address: 0    Number: 0    Size: 0
ctime: 0x584704b5 -- Tue Dec  6 21:34:29 2016
atime: 0x584704ca -- Tue Dec  6 21:34:50 2016
mtime: 0x584704b5 -- Tue Dec  6 21:34:29 2016
BLOCKS:
(0):27657
TOTAL: 1
  • Check oraprod's uid (we will make oraprod the owner of erm1 in the next steps).
[root@erpdb ~]# id oraprod
uid=501(oraprod) gid=500(dba) groups=500(dba)
  • Use the "mi" command to modify the inode of the file named erm1 (debugfs has to be opened in read-write mode, i.e. with -w, for this to work). Note that the inputs "mi" asks for are actually the values stored in the inode (user id, group id, size, creation time etc.). We will only update the User ID (owner) in this example.
debugfs:  mi  <6027>
                          Mode    [0100644] 
                       User ID    [0] 501  (entered oraprod's uid)
                      Group ID    [0] 
                          Size    [6] 
                 Creation time    [1481049269] 
             Modification time    [1481049269] 
                   Access time    [1481049290] 
                 Deletion time    [0] 
                    Link count    [1] 
                   Block count    [2] 
                    File flags    [0x0]
                    Generation    [0xf2b55859]
                      File acl    [0]
           High 32bits of size    [0]
              Fragment address    [0]
               Fragment number    [0]
                 Fragment size    [0]
           ..
           ....
           ......
Above you can see the major attributes stored in an inode.
  • Lastly, unmount and mount the filesystem in which the file is located, and use the "ls" command to check the owner. (Note that we need to remount the filesystem after writing with debugfs. Unless the filesystem is remounted, our change is not seen because of inode caching.)
[root@erpdb boot]# ls -al erm1
-rw-r--r-- 1 oraprod root 6 Dec  6 21:34 erm1
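
A side note on the debugfs output above: "stat" prints ctime/atime/mtime as hexadecimal epoch seconds, while "mi" prompts with the same values in decimal. A small sketch of the conversion, assuming a bash-like shell and GNU date (the function name hex_to_epoch is only illustrative):

```shell
# Convert a hexadecimal epoch value, as printed by debugfs "stat",
# into decimal seconds since 1970-01-01 UTC.
hex_to_epoch() {
    printf '%d\n' "$1"
}

ctime=$(hex_to_epoch 0x584704b5)
echo "$ctime"                                   # 1481049269, the value "mi" shows
date -u -d "@$ctime" '+%Y-%m-%d %H:%M:%S UTC'   # 2016-12-06 18:34:29 UTC
```

The UTC time is three hours behind the 21:34:29 shown by stat, which matches the +0300 offset in that output.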

Well, after taking a look at inodes, let's come back to our topic, or should I say, let's start with it, since we haven't really got into our actual topic yet.

A filesystem has a fixed number of inodes, defined when it is created.
Usually we don't think about them while creating a filesystem, but the inodes are there, created according to a default ratio.

Let's create an ext4 filesystem and look at the situation.
[root@erpdb /]# mkfs.ext4 /dev/sdb
[root@erpdb ~]# mount /dev/sdb /u03
[root@erpdb ~]# df -i /u03
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/sdb             64004096      11 64004085    1% /u03
[root@erpdb ~]# df -h /u03
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb              962G  200M  913G   1% /u03

As seen in the "df -i" output above, our newly created filesystem has 64004096 inodes and the size of the filesystem itself is 962G. So, mkfs.ext4 created 64004096 inodes for our filesystem by default.
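
Where does that number come from? mke2fs usually creates one inode per 16384 bytes of device, so the count is roughly the device size divided by that ratio. A rough sketch of the arithmetic follows. The exact size of /dev/sdb is not shown in this post, so the size used below is an assumption back-calculated from the inode count; the function name is also just for illustration.

```shell
# Rough estimate only: mke2fs allocates inodes per block group,
# so the real count can differ slightly from this plain division.
estimate_inodes() {
    local size_bytes=$1 bytes_per_inode=$2
    echo $(( size_bytes / bytes_per_inode ))
}

# 1048643108864 is an assumed device size (64004096 * 16384), not a
# measured value; 16384 is the usual default bytes-per-inode ratio.
estimate_inodes 1048643108864 16384    # prints 64004096
```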

Let's create 100000 files on this filesystem:

# Create File000001.txt ... File100000.txt (exactly 100000 empty files)
for i in {1..100000}
do
    touch "File$(printf "%06d" "$i").txt"
done

Then, check the used and free inode counts:

[root@erpdb u03]# df -i /u03
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/sdb             64004096  100015 63904081    1% /u03

As seen above, 100015 inodes are used (don't mind the few extra inodes; 11 of them were already in use right after the filesystem was created).

So, we created 100000 "empty" files and spent 100000 inodes on them.
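
If you want to find out how many inodes a particular directory tree is consuming, counting the distinct inode numbers below it is one way. A minimal sketch, assuming GNU find (for -printf); the function name inodes_under is made up for this example:

```shell
# Count the distinct inodes used below a directory (GNU find assumed).
# Hard links share one inode, so inode numbers are de-duplicated.
inodes_under() {
    local dir=$1
    find "$dir" -mindepth 1 -xdev -printf '%i\n' | sort -u | wc -l
}

# Usage: inodes_under /u03
```

Running it on the top-level directories of a filesystem one by one is a simple way to spot where millions of small files are piling up.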

Now, let's check the size of our filesystem with "df -h" to see its used and free space.

Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb              962G  203M  913G   1% /u03

As seen above, our filesystem is still practically empty. About 200 megabytes were already in use before we created our 100000 empty files, and that figure has only grown to 203M.

So we are using inodes, but we are using almost no bytes for storing data.
Let's suppose we go further and try to create 63904082 files, each 10K in size.
Can you imagine the result?
We will need 63904082x10K (about 624063 MB, roughly 609GB) of free space in our filesystem. In this case we have that space, right? That is, we have 913G of free space available, as seen in the df -h output above.
On the other hand, we will not be able to create the 63904082nd file on our filesystem, because we only have 63904081 inodes available, and that's why we will end up with the "No space left on device" error. In this kind of scenario, we will need to reformat our filesystem with a higher inode count to store more files on it.
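
The reasoning above can be written down as a small check that tells which resource runs out first. This is only a sketch: the function name first_limit is invented here, sizes are in KiB, and block rounding and directory overhead are ignored.

```shell
# Report which resource runs out first for a planned number of files.
# Arguments: file count, size per file (KiB), free inodes, free space (KiB).
first_limit() {
    local files=$1 file_kib=$2 free_inodes=$3 free_kib=$4
    if [ "$files" -gt "$free_inodes" ]; then
        echo "inodes"
    elif [ $(( files * file_kib )) -gt "$free_kib" ]; then
        echo "space"
    else
        echo "fits"
    fi
}

# Numbers from this post: 63904082 files of 10 KiB, 63904081 free inodes,
# 913 GiB available (913 * 1024 * 1024 KiB).
first_limit 63904082 10 63904081 $(( 913 * 1024 * 1024 ))    # prints inodes
```

With one file fewer (63904081), the same call prints "fits", which is exactly the boundary described above.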

Let's draw a conclusion for this part.
What have we learned?

1) Inodes are created automatically when we create the filesystem.
2) Inodes are occupied when we create files on our filesystem.
3) In order to be able to create x files in our filesystem, we need to have at least x inodes available in it.
4) Small (or let's say almost zero sized) files occupy near-zero space, but they still occupy inodes.
5) If we create a high number of small files, we may occupy all the inodes in our filesystem and encounter "No space left on device" errors, even though we have plenty of free space in our filesystem.
6) Once the filesystem is created, its inode count for that size cannot be raised; to get more inodes on the same device, we have to reformat it.
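
Since inode exhaustion does not show up in the plain "df" output, it is worth watching the inode usage separately. Here is a minimal sketch of such a check, assuming GNU stat (where, with -f, %c is the total inode count and %d the free inode count); the function name and the 90% threshold are only examples.

```shell
# Print the percentage of inodes in use on the filesystem holding a path.
inode_use_pct() {
    # GNU stat -f: %c = total inodes, %d = free inodes
    set -- $(stat -f -c '%c %d' "$1")
    local total=${1:-0} free=${2:-0}
    if [ "$total" -eq 0 ]; then
        echo "n/a"      # some filesystems do not report inode totals
    else
        echo $(( (total - free) * 100 / total ))
    fi
}

# Example: warn when /u03 (the mount point used in this post) passes 90%.
#   pct=$(inode_use_pct /u03)
#   [ "$pct" != "n/a" ] && [ "$pct" -ge 90 ] && echo "inode usage at ${pct}%"
```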

So, in brief we know that there may be cases where we should adjust the inode counts according to our needs.

At this point, let's pay a visit to the man page of mkfs.ext4 (ext4 being an up-to-date filesystem available in Linux).

man mkfs.ext4 ->

-i bytes-per-inode
              Specify  the bytes/inode ratio.  mke2fs creates an inode for every bytes-per-inode bytes of space on the disk.  The larger the bytes-per-inode ratio,
              the fewer inodes will be created.  This value generally shouldn't be smaller than the blocksize of the filesystem, since in  that  case  more  inodes
              would  be  made  than  can ever be used.  Be warned that it is not possible to expand the number of inodes on a filesystem after it is created, so be
              careful deciding the correct value for this parameter.

As seen above, we have a "-i" parameter, which lets us adjust the inode count by providing a bytes-per-inode value. The larger the bytes-per-inode ratio, the fewer inodes will be created.
The default bytes-per-inode for ext4 is typically 16384 (it comes from the inode_ratio setting in /etc/mke2fs.conf).
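
Going the other way, if we know roughly how many files we need to store, we can derive a bytes-per-inode value to pass to -i. The sketch below is plain division under assumed numbers (a device of 1048643108864 bytes and a target of about 200 million files; neither figure is measured in this post), and the function name is hypothetical. Keep the man page's warning in mind: values below the filesystem block size are generally not useful.

```shell
# Largest bytes-per-inode value that still yields at least the wanted
# number of inodes on a device of the given size (simple division;
# mke2fs does its own rounding, so treat the result as a starting point).
ratio_for() {
    local size_bytes=$1 wanted_inodes=$2
    echo $(( size_bytes / wanted_inodes ))
}

# Assumed example: about 200 million files on a 1048643108864-byte device.
ratio_for 1048643108864 200000000    # prints 5243
```

mke2fs also has a -N option that takes the desired number of inodes directly, which may be more convenient than computing a ratio.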

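So, let's format our filesystem once again, providing a smaller value (1024 bytes) for the bytes-per-inode ratio. (With the usual 4096-byte block size this is below what the man page recommends, so treat it as a demonstration value.)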

[root@erpdb /]# umount /u03
[root@erpdb /]# mkfs.ext4 -i 1024 /dev/sdb
[root@erpdb /]# mount /dev/sdb /u03

[root@erpdb ~]# df -i /u03
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/sdb             1023967232      11 1023967221    1% /u03
[root@erpdb ~]# df -h /u03
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb              733G  217M  684G   1% /u03

You see, we now have lots of inodes. That is, we now have 1023967232 inodes, which is quite a large number compared with the earlier value (64004096).
However, as we see in the "df -h" output, we now have 684G of available space. It was 913G earlier.

So, we have more inodes, but less space (more inodes occupy more space).
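
The lost space can be estimated: every inode takes one fixed-size record in the inode tables, 256 bytes by default on ext4. A quick sketch of the arithmetic (the function name is invented for this example, and the result is rounded down to whole GiB):

```shell
# Space taken by the inode tables: one fixed-size record per inode.
# 256 bytes is the default ext4 inode record size mentioned earlier.
inode_table_gib() {
    local inodes=$1 inode_size=${2:-256}
    echo $(( inodes * inode_size / 1024 / 1024 / 1024 ))
}

inode_table_gib 64004096      # prints 15  (default ratio)
inode_table_gib 1023967232    # prints 244 (-i 1024)
```

The difference, about 229 GiB, lines up with the drop in the size reported by "df -h" (from 962G to 733G).
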
Now let's revisit our example above.
Suppose we go further and try to create 63904082 files, each 10K in size.
Can you imagine the result?
We will need 63904082x10K (about 624063 MB, roughly 609GB) of free space in our filesystem. In this case we still have that space, right? That is, we have 684G of free space available, as seen in the df -h output above.
So, this time we will be able to create all these files on our filesystem, because we have 1023967232 inodes available, and that's why this time we will not end up with the "No space left on device" error.

Well... At this point, I guess you understand what I want to express with this article, right?
We need to be aware of the filesystem structures while creating a cooked filesystem.
In this article, inodes came to the fore, but there are other tunables as well.
In short, we may need to adjust some parameters while creating filesystems.
We need to analyze our goal and make those adjustments, just like we do when creating our Oracle Databases or installing our EBS systems.

In the case of ASM and ACFS, we are dependent on Oracle.
ASM has a limit of 1 million files per disk group. ACFS, on the other hand, supports 2^40 (about 1 trillion) files in a file system.
