Monday, September 13, 2010

lsof: alloc: /: file system full

Have you seen this error?

 Aug 9 00:17:41 server1 ufs: [ID 845546 kern.notice] NOTICE: alloc: /: file system full

When I looked at the top disk space consumers I found nothing useful.


 # df -h | sort -rnk 5

 /dev/md/dsk/d0 3.0G 2.9G 0K 100% /
 /dev/md/dsk/d3 2.0G 1.5G 404M 80% /var
 /dev/md/dsk/d30 469M 330M 93M 79% /opt
 /dev/md/dsk/d6 992M 717M 215M 77% /home
 /dev/md/dsk/d33 752M 494M 198M 72% /usr/local/install
 [...]

Running du on the whole filesystem showed only 2.5G in use, while df reported 2.9G consumed.

 # du -shd /
 2.5G

I recalled that a few days earlier I had hit the same issue on a ZFS filesystem hosting an Oracle DB, and the following understanding helped me there.

Normally, if a filesystem is full, look in the directories that are hidden by filesystems mounted in higher init states, and check whether any files are eating up the disk space. If that turns up nothing useful, the next thing to check is open files, and to consider what has recently been cleaned up. Sometimes, when an open file is emptied or unlinked from the directory tree, the disk space is not deallocated until the owning process has been terminated or restarted. The result is an unexplainable loss of disk space. If this is the cause, a reboot would clear it up. If you can't reboot, treat any process that logs to that partition as a suspect, and check all of your logs for entries that suggest a process is rapidly generating errors.
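The unlinked-but-open case is easy to reproduce safely with a scratch file. The following is only a minimal sketch, assuming a POSIX shell and that mktemp is available; the variable names and the file descriptor numbers 3 and 4 are illustrative choices, not anything taken from the server above.

```shell
# Create a scratch file and hold it open on two descriptors:
# fd 3 for writing, fd 4 for reading (each has its own offset).
tmpfile=$(mktemp)
exec 3>"$tmpfile" 4<"$tmpfile"
printf 'still allocated\n' >&3

# Remove the only name. The link count drops to 0, but the blocks
# stay allocated because the file is still open.
rm "$tmpfile"

# The name is gone from the directory tree, so du can no longer see it...
if [ -e "$tmpfile" ]; then name_state=present; else name_state=gone; fi

# ...yet the data is still readable through the open descriptor.
IFS= read -r recovered <&4
echo "name: $name_state, data: $recovered"

# Closing the descriptors is what finally lets the space be freed.
exec 3>&- 4<&-
```

This is the same situation as a process that keeps a deleted file open: df still counts the blocks, du cannot find them, and the space comes back only when the descriptors are closed or the process exits.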

In my case, a reboot was not possible, so I checked the full file system for unlinked open files with lsof:



 # lsof +aL1 /

 lsof: WARNING: access /.lsof_server1: No such file or directory
 lsof: WARNING: created device cache file: /.lsof_server1
 lsof: WARNING: can't write to /.lsof_server1: No space left on device

 COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME
 scp 16472 root 4r VREG 85,0 238616064 0 119696 / (/dev/md/dsk/d0)
 scp 22154 root 4r VREG 85,0 238213120 0 119677 / (/dev/md/dsk/d0)

``+L1'' will select open files that have been unlinked. A specification of the form ``+aL1 <file_system>'' will select unlinked open files on the specified file system.

I got the process IDs from lsof. After verifying the processes, I killed them, which immediately released about 450MB of space.
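As a rough cross-check, the SIZE/OFF values of the two unlinked scp files from the lsof output add up to about that figure. The sketch below assumes the column layout shown above (SIZE/OFF in field 7, NLINK in field 8) and that SIZE/OFF is a size rather than an offset for these rows; the variable names are illustrative.

```shell
# Rows copied from the lsof output above.
lsof_rows='scp 16472 root 4r VREG 85,0 238616064 0 119696 / (/dev/md/dsk/d0)
scp 22154 root 4r VREG 85,0 238213120 0 119677 / (/dev/md/dsk/d0)'

# Sum SIZE/OFF (field 7) for rows whose link count (field 8) is 0.
total_bytes=$(printf '%s\n' "$lsof_rows" |
  awk '$8 == 0 { sum += $7 } END { printf "%d", sum }')

# Same sum expressed in MiB (1 MiB = 1048576 bytes).
total_mib=$(printf '%s\n' "$lsof_rows" |
  awk '$8 == 0 { sum += $7 } END { printf "%.1f", sum / 1048576 }')

echo "$total_bytes bytes (~$total_mib MiB) held by unlinked open files"
```

For these two rows this prints 476829184 bytes (~454.7 MiB), which is consistent with the roughly 450MB that came back after the processes were killed.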


 # df -kh | sort -rnk 5

 /dev/md/dsk/d0 3.0G 2.5G 418M 86% /
 /dev/md/dsk/d3 2.0G 1.5G 406M 80% /var
 /dev/md/dsk/d30 469M 331M 91M 79% /opt
 /dev/md/dsk/d6 992M 717M 215M 77% /home
 /dev/md/dsk/d33 752M 494M 198M 72% /usr/local/install

