Computer freezing on almost full RAM, possibly disk cache problem

I think the problem is somewhat similar to this thread.

It doesn’t matter whether swap is enabled or disabled: whenever the amount of RAM actually in use approaches the maximum and there is almost no space left for disk cache, the system becomes totally unresponsive.

The disk spins wildly, and sometimes after a long wait (10-30 minutes) the system unfreezes, and sometimes it does not (or I run out of patience). Sometimes, if I act quickly, I can manage to slowly open a console and kill some of the RAM-eating applications, like the browser, and the system unfreezes almost instantly.

Because of this problem I almost never see anything in swap; only occasionally there are a few MB there, and soon afterwards the problem appears.
My not-so-educated guess is that it is connected somehow to the disk cache being too greedy, or memory management being too lenient, so that when memory is needed it is not freed quickly enough and the system starves.

The problem can be reproduced very quickly by working with large files (500 MB+), which are loaded into the disk cache; apparently the system is then unable to unload them fast enough.

Any help or ideas will be greatly appreciated.

For now I have to live in constant fear that the computer can freeze while I am doing something, and I usually have to restart it. If it is really running out of RAM, I would much rather it just killed some userspace applications, like the browser (preferably I could somehow mark which to kill first).

The mystery, though, is why swap doesn’t save me in this situation.

UPDATE:
It didn’t hang for some time, but now I have had several occurrences again. I now keep a RAM monitor on my screen at all times, and when the hang happened it still showed ~30% free (probably used by disk cache).
Additional symptoms: if I am watching a video (VLC player) at the time, the sound stops first, and after a few seconds the image stops. While the sound has stopped I still have some control over the PC, but when the image stops I cannot even move the mouse anymore, so I restarted it after some waiting. By the way, this didn’t happen when I started to watch the video but some time in (20 min), and I wasn’t actively doing anything else at the time, even though the browser and oowrite were open on the second screen the whole time. Basically something just decides to happen at one point and hangs the system.

As requested in the comments, I ran dmesg right after the hang. I didn’t notice anything weird, but I didn’t know what to look for, so here it is:
https://docs.google.com/document/d/1iQih0Ee2DwsGd3VuQZu0bPbg0JGjSOCRZhu0B05CMYs/edit?hl=en_US&authkey=CPzF7bcC

Solutions:

Solution 1

To fix this problem I have found that you need to set the following setting to something around 5%-6% of your total physical RAM, divided by the number of cores in the computer:

sysctl -w vm.min_free_kbytes=65536

Keep in mind that I treated this as a per-core setting: with 2 GB of RAM and two cores, I calculated 6% of only 1 GB (about 61 MB) and added a little extra just to be safe, which gives the 65536 kB above.
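The arithmetic behind that number can be sketched in a few lines of Python. This is only an illustration of the rule of thumb described in this answer, not a kernel-documented formula; the function name is made up, and rounding up to the next power of two is my assumption for what "a little extra" means.

```python
def suggest_min_free_kbytes(total_ram_kb, cores, percent=6):
    """Apply the answer's rule of thumb: percent of RAM divided by cores.

    Returns (raw, rounded). Rounding up to the next power of two is an
    assumption standing in for the "little extra" mentioned in the answer.
    """
    raw = total_ram_kb * percent // 100 // cores
    if raw < 1:
        raise ValueError("not enough RAM for this heuristic")
    rounded = 1 << (raw - 1).bit_length()
    return raw, rounded


if __name__ == "__main__":
    total_kb = 2 * 1024 * 1024  # 2 GB of RAM expressed in kB
    raw, rounded = suggest_min_free_kbytes(total_kb, cores=2)
    print("heuristic value: {} kB".format(raw))
    print("sysctl -w vm.min_free_kbytes={}".format(rounded))
```

For the 2 GB, two-core machine in the example, the raw value is 62914 kB, which rounds up to the 65536 used above.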

This forces the computer to try to keep this amount of RAM free, and in doing so limits the ability to cache disk files. Of course it still tries to cache them and immediately swap them out, so you should probably limit your swapping as well:

sysctl -w vm.swappiness=5

(100 = swap as often as possible, 0 = swap only when absolutely necessary)

The result is that Linux no longer randomly decides to load a whole movie file of approximately 1 GB into RAM while I am watching it, killing the machine in the process.

Now there is enough reserved space to avoid memory starvation, which apparently was the problem (seeing as there are no more freezes like before).

After testing for a day, the lockups are gone. Sometimes there are minor slowdowns, because stuff gets cached more often, but I can live with that if I don’t have to restart the computer every few hours.

The lesson here is that the default memory management is tuned for just one use case and is not always the best, even though some people try to suggest otherwise: a home entertainment Ubuntu machine should be configured differently than a server.


You probably want to make these settings permanent by adding them to your /etc/sysctl.conf like this:

vm.swappiness=5
vm.min_free_kbytes=65536
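To double-check that the file really contains those lines, a small parser can help. The following is a minimal sketch, assuming the usual sysctl.conf syntax (key = value, with lines starting with # or ; treated as comments); the function names are hypothetical and it only reads text, it never changes the system.

```python
def parse_sysctl_conf(text):
    """Parse sysctl.conf-style text into a dict of key -> value strings."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def missing_settings(text, wanted):
    """Return the wanted settings that are absent or have another value."""
    current = parse_sysctl_conf(text)
    return {k: v for k, v in wanted.items() if current.get(k) != str(v)}


if __name__ == "__main__":
    wanted = {"vm.swappiness": 5, "vm.min_free_kbytes": 65536}
    with open("/etc/sysctl.conf") as handle:
        print(missing_settings(handle.read(), wanted))
```

An empty dict means both settings are already present with the values used in this answer.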

Solution 2

This happened for me in a new install of Ubuntu 14.04.

In my case, it had nothing to do with sysctl issues mentioned.

Instead, the problem was that the swap partition’s UUID was different during installation than it was after installation. So my swap was never enabled, and my machine would lock up after a few hours use.

The solution was to check the current UUID of the swap partition with

sudo blkid

and then run sudo nano /etc/fstab to replace the incorrect swap UUID with the one reported by blkid.

A simple reboot to effect the changes, and voilà.
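The comparison between /etc/fstab and blkid can also be automated. Below is a rough sketch, assuming fstab swap entries use the UUID=... form and that blkid prints lines containing UUID="..." and TYPE="swap"; the function names and the sample UUIDs are made up for illustration.

```python
import re


def fstab_swap_uuids(fstab_text):
    """Collect the UUIDs of swap entries in fstab-style text."""
    uuids = []
    for line in fstab_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments
        if len(fields) >= 3 and fields[2] == "swap" \
                and fields[0].startswith("UUID="):
            uuids.append(fields[0][len("UUID="):].strip('"'))
    return uuids


def blkid_swap_uuids(blkid_text):
    """Collect the UUIDs that blkid-style output reports as swap."""
    uuids = []
    for line in blkid_text.splitlines():
        if 'TYPE="swap"' in line:
            match = re.search(r'\bUUID="([^"]+)"', line)
            if match:
                uuids.append(match.group(1))
    return uuids


def stale_swap_uuids(fstab_text, blkid_text):
    """Return fstab swap UUIDs that no real swap partition carries."""
    real = set(blkid_swap_uuids(blkid_text))
    return [u for u in fstab_swap_uuids(fstab_text) if u not in real]


if __name__ == "__main__":
    # Made-up sample data; feed in the real file and `sudo blkid` output.
    fstab = "UUID=1111-aaaa none swap sw 0 0\nUUID=2222-bbbb / ext4 defaults 0 1\n"
    blkid = '/dev/sda1: UUID="2222-bbbb" TYPE="ext4"\n/dev/sda5: UUID="3333-cccc" TYPE="swap"\n'
    print(stale_swap_uuids(fstab, blkid))
```

Any UUID it prints is a swap entry in fstab that points at a partition that no longer exists under that UUID, which is the situation described in this answer.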

Solution 3

Nothing worked for me!!

So I wrote a script to monitor memory usage. It will first try to clear the RAM cache if memory consumption exceeds a threshold. If memory consumption still doesn’t come below the threshold, it will start killing processes one by one, in decreasing order of memory consumption, until memory consumption is below the threshold. I have set the threshold to 96% by default. You can configure it by changing the value of the variable RAM_USAGE_THRESHOLD in the script.

I agree that killing processes which consume a lot of memory is not the perfect solution, but it’s better to kill ONE application than to lose ALL the work!! The script will send you a desktop notification if RAM usage exceeds the threshold. It will also notify you if it kills any process.

#!/usr/bin/env python3
import time
import tkinter
from subprocess import Popen, PIPE
from tkinter import messagebox

import psutil

# Hidden Tk root window, needed so messagebox can be shown
root = tkinter.Tk()
root.withdraw()

RAM_USAGE_THRESHOLD = 96
MAX_NUM_PROCESS_KILL = 100

def main():
    if psutil.virtual_memory().percent >= RAM_USAGE_THRESHOLD:
        # Clear RAM cache
        mem_warn = "Memory usage critical: {}%\nClearing RAM Cache".\
            format(psutil.virtual_memory().percent)
        print(mem_warn)
        Popen("notify-send \"{}\"".format(mem_warn), shell=True)
        print("Clearing RAM Cache")
        print(Popen('echo 1 > /proc/sys/vm/drop_caches',
                    stdout=PIPE, stderr=PIPE,
                    shell=True).communicate())
        post_cache_mssg = "Memory usage after clearing RAM cache: {}%".format(
                            psutil.virtual_memory().percent)
        Popen("notify-send \"{}\"".format(post_cache_mssg), shell=True)
        print(post_cache_mssg)

        if psutil.virtual_memory().percent < RAM_USAGE_THRESHOLD:
            print("Clearing RAM cache saved the day")
            return
        # Kill up to MAX_NUM_PROCESS_KILL processes, biggest memory users first.
        ps_killed_notify = ""
        for i, ps in enumerate(sorted(psutil.process_iter(),
                                      key=lambda x: x.memory_percent(),
                                      reverse=True)):
            # Never kill init (PID 1) or this script itself
            if ps.pid == 1 or ps.pid == psutil.Process().pid:
                continue
            elif (i >= MAX_NUM_PROCESS_KILL) or \
                    (psutil.virtual_memory().percent < RAM_USAGE_THRESHOLD):
                messagebox.showwarning('Killed proccess - save_hang',
                                       ps_killed_notify)
                Popen("notify-send \"{}\"".format(ps_killed_notify), shell=True)
                return
            else:
                try:
                    ps_killed_mssg = "Killed {} {} ({}) which was consuming " \
                                     "{:.1f}% memory (memory usage={}%). ". \
                        format(i, ps.name(), ps.pid, ps.memory_percent(),
                               psutil.virtual_memory().percent)
                    ps.kill()
                    time.sleep(1)
                    ps_killed_mssg += "Current memory usage={}%".\
                        format(psutil.virtual_memory().percent)
                    print(ps_killed_mssg)
                    ps_killed_notify += ps_killed_mssg + "\n"
                except Exception as err:
                    print("Error while killing {}: {}".format(ps.pid, err))
    else:
        print("Memory usage = " + str(psutil.virtual_memory().percent))
    root.update()


if __name__ == "__main__":
    while True:
        try:
            main()
        except Exception as err:
            print(err)
        time.sleep(1)

Save the code in a file, say save_hang.py, and run the script as:

sudo python3 save_hang.py

Please note that this script is compatible with Python 3 only and requires the psutil and tkinter packages. You can install them with:

sudo apt-get install python3-tk python3-psutil

Hope this helps…

Solution 4

I know this question is old, but I had this problem in Ubuntu (Chrubuntu) 14.04 on an Acer C720 Chromebook. I tried Krišjānis Nesenbergs’ solution, and it worked somewhat, but the system still crashed sometimes.

I finally found a solution that worked by installing zram instead of using physical swap on the SSD. To install it I just followed the instructions here, like this:

sudo apt-get install zram-config

Afterwards I was able to configure the size of the zram swap by modifying /etc/init/zram-config.conf on line 21.

20: # Calculate the memory to user for zram (1/2 of ram)
21: mem=$(((totalmem / 2 / ${NRDEVICES}) * 1024))

I replaced the 2 with a 1 to make the zram size the same as the amount of RAM I have. Since doing so, I have had no more freezes or system unresponsiveness.
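To see what that one-character change does, the shell arithmetic on line 21 can be mirrored in Python. This is only a sketch: it assumes totalmem is the total RAM in kB and NRDEVICES is the number of zram devices, and the 4000000 kB figure is a made-up example rather than the actual memory size of the C720.

```python
def zram_bytes_per_device(totalmem_kb, nr_devices, divisor=2):
    """Mirror mem=$(((totalmem / divisor / NRDEVICES) * 1024)).

    Shell arithmetic uses integer division, hence the // operators.
    """
    return (totalmem_kb // divisor // nr_devices) * 1024


if __name__ == "__main__":
    totalmem_kb = 4000000  # hypothetical machine with about 4 GB of RAM
    nr_devices = 2         # hypothetical number of zram devices
    for divisor in (2, 1):
        per_device = zram_bytes_per_device(totalmem_kb, nr_devices, divisor)
        print("divisor {}: {} bytes per device, {} bytes in total".format(
            divisor, per_device, per_device * nr_devices))
```

With the divisor at 2 the devices add up to half of RAM; with 1 they add up to all of it, which is the change described above.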

Solution 5

My guess is that you’ve set your vm.swappiness to a very low value, which causes the kernel to swap too late, leaving too little RAM for the system to work with.

You can show your current swappiness setting by executing:

sysctl vm.swappiness

By default, this is set to 60. The Ubuntu Wiki recommends setting it to 10, but feel free to set it to a higher value. You can change it by running:

sudo sysctl vm.swappiness=10

This changes it for the current session only. To make it persistent, add vm.swappiness = 10 to the /etc/sysctl.conf file.
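The live value can also be read straight from /proc/sys/vm, where the kernel exposes the same settings that sysctl prints. Here is a minimal sketch; the helper name and the base parameter are my own additions, the latter only so the function can be pointed at another directory.

```python
import os


def read_vm_setting(name, base="/proc/sys/vm"):
    """Return the integer value of a vm.* setting, e.g. 'swappiness'."""
    with open(os.path.join(base, name)) as handle:
        return int(handle.read().strip())


if __name__ == "__main__":
    for name in ("swappiness", "min_free_kbytes", "vfs_cache_pressure"):
        try:
            print("vm.{} = {}".format(name, read_vm_setting(name)))
        except OSError as err:
            print("could not read vm.{}: {}".format(name, err))
```

Running it after a reboot is a quick way to confirm that the value from /etc/sysctl.conf was actually applied.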

If your disk is slow, consider buying a new one.

Solution 6

I’ve been struggling with this issue for a long time, but now it seems to be solved on my laptop.

If none of the other answers works for you (I tried most of them), play with min_free_kbytes to leave more free space in RAM when your computer starts swapping (which happens just before free RAM hits this minimum value).

I have 16 GB of RAM, but sooner rather than later the memory became full and the system stopped responding for 10 to 30 minutes, until some things got swapped out.

At least for me, setting min_free_kbytes above the recommended value makes the swapping process considerably faster.

For 16GB RAM, try this:

vm.min_free_kbytes=500000

To set this value see other answers, or just google it 🙂

Solution 7

I run one of my laptops from a live Ubuntu SD card constantly, with a small ext4 storage partition and a swap file on the hard drive. When almost all of the RAM is used and the swappiness value is too low (sometimes I prefer to keep the hard drive completely off if possible, because it’s noisy), Linux performance tends to fall off a cliff for me, such that just getting to TTY1 to kill Firefox takes 15 minutes.

Raising /proc/sys/vm/vfs_cache_pressure from the default of 100 to a value of 6000 seems to help prevent this. However, the kernel documentation warns against doing so, saying

Increasing vfs_cache_pressure significantly beyond 100 may have negative
performance impact. Reclaim code needs to take various locks to find freeable
directory and inode objects. With vfs_cache_pressure=1000, it will look for
ten times more freeable objects than there are.

I am not entirely sure of the side effects, so I’d be careful with this.

Solution 8

Instead of tuning kernel parameters yourself, try linux-zen or linux-ck, which are specially designed (patched and tuned) for desktop and laptop use. The default Linux kernel is tuned more for throughput and for servers without a GUI.

zen will give you less throughput but better responsiveness. In my experience, tuning the parameters of the default kernel gives less throughput than linux-zen (or liquorix) and linux-ck.

For Ubuntu, I think you have to manually patch and compile the kernel with Con Kolivas’ ck patchset to get linux-ck.

You can use the liquorix kernel, which uses the linux-zen source. Take a look at
https://liquorix.net/

Here’s how to install it:

sudo add-apt-repository ppa:damentz/liquorix
sudo apt-get update
sudo apt-get install linux-image-liquorix-amd64 linux-headers-liquorix-amd64

If you insist on using the default kernel, then disabling swap might be a better option too.

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.