diff --git a/README.md b/README.md index 2d19b93..e2fe725 100644 --- a/README.md +++ b/README.md @@ -1,19 +1,32 @@ -The No Hang Daemon -================== +# Nohang -`Nohang` is a highly configurable daemon for Linux which is able to correctly prevent out of memory conditions. +`Nohang` is a highly configurable daemon for Linux which is able to correctly prevent out of memory conditions and save disk cache. -### What is the problem? +## What is the problem? -OOM killer doesn't prevent OOM conditions. +OOM killer doesn't prevent OOM conditions. And OOM conditions may cause loss of disk cache, [freezes](https://en.wikipedia.org/wiki/Hang_(computing)), [livelocks](https://en.wikipedia.org/wiki/Deadlock#Livelock) and killing multiple processes. -### Solutions +"How do I prevent Linux from freezing when out of memory? -- Use of [earlyoom](https://github.com/rfjakob/earlyoom). This is a simple OOM preventer written in C. -- Use of nohang. This is an advanced OOM preventer written in Python. +Today I (accidentally) ran some program on my Linux box that quickly used a lot of memory. My system froze, became unresponsive and thus I was unable to kill the offender. -### Some features +How can I prevent this in the future? Can't it at least keep a responsive core or something running?" +[serverfault](https://serverfault.com/questions/390623/how-do-i-prevent-linux-from-freezing-when-out-of-memory) + +"With or without swap it still freezes before the OOM killer gets run automatically. This is really a kernel bug that should be fixed (i.e. run OOM killer earlier, before dropping all disk cache). Unfortunately kernel developers and a lot of other folk fail to see the problem. Common suggestions such as disable/enable swap, buy more RAM, run less processes, set limits etc. do not address the underlying problem that the kernel's low memory handling sucks camel's balls."
+[serverfault](https://serverfault.com/questions/390623/how-do-i-prevent-linux-from-freezing-when-out-of-memory) + +Also look at "Why are low memory conditions handled so badly?" [r/linux](https://www.reddit.com/r/linux/comments/56r4xj/why_are_low_memory_conditions_handled_so_badly/) - discussion with 480+ posts. + + +## Solutions + +- Use of [earlyoom](https://github.com/rfjakob/earlyoom). This is a simple and lightweight OOM preventer written in C. +- Use of [oomd](https://github.com/facebookincubator/oomd). This is a userspace OOM killer for Linux systems written in C++ and developed by Facebook. +- Use of nohang. + +## Some features - convenient configuration with a well commented config file (there are 35 parameters in the config) - `SIGKILL` and `SIGTERM` as signals that can be sent to the victim @@ -24,7 +37,7 @@ OOM killer doesn't prevent OOM conditions. - possibility of restarting processes via command like `systemctl restart something` if the process is selected as a victim - look at the [config](https://github.com/hakavlad/nohang/blob/master/nohang.conf) to find more -### Demo +## Demo [Video](https://youtu.be/DefJBaKD7C8): nohang prevents OOM after the command `while true; do tail /dev/zero; done` has been executed.
@@ -58,40 +71,40 @@ MemAvail: 1535 M, 26.1 % ``` And demo: https://youtu.be/5d6UovJzK8k -### Requirements +## Requirements - `Linux 3.14+` (because the MemAvailable parameter appeared in /proc/meminfo since kernel version 3.14) and `Python 3.4+` (compatibility with earlier versions was not tested) for basic usage - `libnotify` (Fedora, Arch) or `libnotify-bin` (Debian, Ubuntu) for desktop notifications and `sudo` for desktop notifications as root -### Memory and CPU usage +## Memory and CPU usage - VmRSS is 10 to 13.5 MiB depending on the settings - CPU usage depends on the level of available memory (the frequency of memory status checks increases as the amount of available memory decreases) and monitoring intensity (can be changed by user via config) -### Status +## Status The program is unstable and some fixes are required before the first stable version will be released (need documentation, translation, review and some optimisation). -### Download +## Download ```bash git clone https://github.com/hakavlad/nohang.git cd nohang ``` -### Installation and start for systemd users +## Installation and start for systemd users ```bash sudo ./install.sh ``` -### Purge +## Purge ```bash sudo ./purge.sh ``` -### Command line options +## Command line options ``` ./nohang -h @@ -104,7 +117,7 @@ optional arguments: ./nohang.conf, /etc/nohang/nohang.conf ``` -### How to configure nohang +## How to configure nohang The program can be configured by editing the [config file](https://github.com/hakavlad/nohang/blob/master/nohang.conf). The configuration includes the following sections: @@ -119,7 +132,7 @@ The program can be configured by editing the [config file](https://github.com/ha Just read the description of the parameters and edit the values. Please restart nohang to apply changes. Default path to the config after installing via `./install.sh` is `/etc/nohang/nohang.conf`. -### Feedback +## Feedback Please create [issues](https://github.com/hakavlad/nohang/issues).
Use cases, feature requests and any questions are welcome. diff --git a/nohang b/nohang index ecb1dc2..4c3c8bc 100755 --- a/nohang +++ b/nohang @@ -6,22 +6,20 @@ import os from operator import itemgetter from time import sleep, time from argparse import ArgumentParser -from subprocess import Popen +# from subprocess import Popen sig_dict = {9: 'SIGKILL', 15: 'SIGTERM'} -# директория, в которой запущен скрипт +# directory where the script is running cd = os.getcwd() -# где искать конфиг, если не указан через опцию -c/--config -default_configs = ( - cd + '/nohang.conf', - '/etc/nohang/nohang.conf' -) +# where to look for a config if not specified via the -c/--config option +default_configs = (cd + '/nohang.conf', '/etc/nohang/nohang.conf') # universal message if config is invalid -conf_err_mess = '\nSet up the path to the valid config file with -c/--confi' \ - 'g option!\nExit' +conf_err_mess = '\nSet up the path to the valid conf' \ + 'ig file with -c/--config option!\nExit' + # означает, что при задани zram disksize = 10000M доступная память # уменьшится на 42M @@ -31,6 +29,9 @@ conf_err_mess = '\nSet up the path to the valid config file with -c/--confi' \ # ("zram uses about 0.1% of the size of the disk" # - https://www.kernel.org/doc/Documentation/blockdev/zram.txt), # но это утверждение противоречит опытным данным + +# zram_disksize_factor = deltaMemAvailavle / disksize +# found experimentally zram_disksize_factor = 0.0042 name_strip_string = '\'"`\\!-$' @@ -54,16 +55,17 @@ def string_to_int_convert_test(string): return None -# извлечение праметра из словаря конфига, возврат str +# extracting the parameter from the config dictionary, str return def conf_parse_string(param): if param in config_dict: return config_dict[param].strip() else: - print('{} not in config\nExit'.format(param)) + print('All the necessary parameters must be in the config') + print('There is no "{}" parameter in the config'.format(param)) exit() -# извлечение праметра из словаря конфига, 
возврат bool +# extract the parameter from the config dictionary, return bool def conf_parse_bool(param): if param in config_dict: param_str = config_dict[param] @@ -72,22 +74,20 @@ def conf_parse_bool(param): elif param_str == 'False': return False else: - print('Invalid {} value {} (shou' \ - 'ld be True or False)\nExit'.format(param, param_str)) + print('Invalid value of the "{}" parameter.'.format(param)) + print('Valid values are True and False.') + print('Exit') exit() else: - print('{} not in config\nExit'.format(param)) + print('All the necessary parameters must be in the config') + print('There is no "{}" parameter in the config'.format(param)) exit() def func_decrease_oom_score_adj(oom_score_adj_max): - # цикл для наполнения oom_list for i in os.listdir('/proc'): - - # пропускаем элементы, не состоящие только из цифр if i.isdigit() is not True: continue - try: oom_score_adj = int(rline1('/proc/' + i + '/oom_score_adj')) if oom_score_adj > oom_score_adj_max: @@ -99,14 +99,14 @@ def func_decrease_oom_score_adj(oom_score_adj_max): pass -# чтение первой строки файла +# read the first line of a file def rline1(path): with open(path) as f: for line in f: return line[:-1] -# запись в файл +# write a string to a file def write(path, string): with open(path, 'w') as f: f.write(string) @@ -128,12 +128,12 @@ def just_percent_swap(num): return str(round(num * 100, 1)).rjust(5, ' ') -# K -> M, выравнивание по правому краю +# KiB to MiB, right alignment def human(num, lenth): return str(round(num / 1024)).rjust(lenth, ' ') -# возвращает disksize и mem_used_total по zram id +# return disksize and mem_used_total (bytes, as str) for a zram id def zram_stat(zram_id): try: disksize = rline1('/sys/block/' + zram_id + '/disksize') @@ -153,7 +153,7 @@ def zram_stat(zram_id): return disksize, mem_used_total # BYTES, str -# имя через пид +# return the process name for a pid def pid_to_name(pid): try: with open('/proc/' + pid + '/status') as f: @@ -166,7 +166,7 @@ def pid_to_name(pid): def send_notify_warn(): - # текст отправляемого
уведомления + if mem_used_zram > 0: info = '"MemAvailable: {} MiB\nSwapFree: {} MiB\nMemUsedZram: {} MiB" &'.format( kib_to_mib(mem_available), @@ -229,8 +229,6 @@ def sleep_after_send_signal(signal): def find_victim_and_send_signal(signal): - time0 = time() - print(mem_info) # выставляем потолок для oom_score_adj всех процессов @@ -240,7 +238,7 @@ def find_victim_and_send_signal(signal): # получаем список процессов ((pid, badness)) oom_list = [] - if use_regex_lists: + if regex_matching: for pid in os.listdir('/proc'): if pid.isdigit() is not True: @@ -250,16 +248,16 @@ def find_victim_and_send_signal(signal): oom_score = int(rline1('/proc/' + pid + '/oom_score')) name = pid_to_name(pid) - res = fullmatch(avoidlist_regex, name) + res = fullmatch(avoid_regex, name) if res is not None: # тут уже получаем badness - oom_score = int(oom_score / avoidlist_factor) - print(' {} (Pid: {}, Badness {}) matches with avoidlist_regex'.format(name, pid, oom_score)), + oom_score = int(oom_score / avoid_factor) + print(' {} (Pid: {}, Badness {}) matches with avoid_regex'.format(name, pid, oom_score)), - res = fullmatch(preferlist_regex, name) + res = fullmatch(prefer_regex, name) if res is not None: - oom_score = int((oom_score + 1) * preferlist_factor) - print(' {} (Pid: {}, Badness {}) matches with preferlist_regex'.format(name, pid, oom_score)), + oom_score = int((oom_score + 1) * prefer_factor) + print(' {} (Pid: {}, Badness {}) matches with prefer_regex'.format(name, pid, oom_score)), except FileNotFoundError: oom_score = 0 @@ -287,7 +285,7 @@ def find_victim_and_send_signal(signal): # получаем максимальный oom_score oom_score = pid_tuple_list[1] - if oom_score >= oom_score_min: + if oom_score >= min_badness: # пытаемся отправить сигнал найденной жертве @@ -326,19 +324,24 @@ def find_victim_and_send_signal(signal): else: - try: + try: # SUCCESS -> RESPONSE TIME os.kill(int(pid), signal) + success_time = time() delta_success = success_time - time0 - send_result = ' Success; 
reaction time: {} ms'.format(round(delta_success * 1000)) + send_result = ' Success; response time: {} ms\n'.format(round(delta_success * 1000)) + r'}' - if desktop_notifications: + if gui_notifications: send_notify(signal, name, pid, oom_score, vm_rss, vm_swap) except FileNotFoundError: - send_result = ' No such process' + success_time = time() + delta_success = success_time - time0 + send_result = ' No such process; response time: {} ms'.format(round(delta_success * 1000)) except ProcessLookupError: - send_result = ' No such process' + success_time = time() + delta_success = success_time - time0 + send_result = ' No such process; response time: {} ms'.format(round(delta_success * 1000)) try_to_send = ' Preventing OOM: trying to send the {} signal to {},\n Pid: {}, Badness: {}, VmRSS: {} MiB, VmSwap: {} MiB'.format(sig_dict[signal], name, pid, oom_score, vm_rss, vm_swap) @@ -347,8 +350,12 @@ def find_victim_and_send_signal(signal): else: - badness_is_too_small = ' oom_score {} < oom_score_min {}'.format( - oom_score, oom_score_min) + success_time = time() + delta_success = success_time - time0 + + + badness_is_too_small = ' oom_score {} < min_badness {}; response time: {} ms'.format( + oom_score, min_badness, round(delta_success * 1000)) print(badness_is_too_small) @@ -391,7 +398,7 @@ for s in mem_list: mem_list_names.append(s.split(':')[0]) if mem_list_names[2] != 'MemAvailable': - print('Your Linux kernel is too old, 3.14+ requie\nExit') + print('Your Linux kernel is too old, Linux 3.14+ requie\nExit') exit() swap_total_index = mem_list_names.index('SwapTotal') @@ -455,7 +462,7 @@ print(config) ########################################################################## -# парсинг конфига с получением словаря параметров +# parsing the config with obtaining the parameters dictionary # conf_parameters_dict # conf_restart_dict @@ -463,10 +470,10 @@ print(config) try: with open(config) as f: - # словарь с параметрами конфига + # dictionary with config options 
config_dict = dict() - # словарь с именами и командами для параметра execute_the_command + # dictionary with names and commands for the parameter execute_the_command etc_dict = dict() for line in f: @@ -487,7 +494,7 @@ try: etc_name = a[0].strip() etc_command = a[1].strip() if len(etc_name) > 15: - print('инвалид конфиг, длина имени процесса не должна превышать 15 символов\nExit') + print('Invalid config, the length of the process name must not exceed 15 characters\nExit') exit() etc_dict[etc_name] = etc_command @@ -506,9 +513,9 @@ except IndexError: ########################################################################## -# извлечение параметров из словаря -# проверка наличия всех необходимых параметров -# валидация всех параметров +# extracting parameters from the dictionary +# check for all necessary parameters +# validation of all parameters print_config = conf_parse_bool('print_config') @@ -523,47 +530,59 @@ print_sleep_periods = conf_parse_bool('print_sleep_periods') realtime_ionice = conf_parse_bool('realtime_ionice') + + + if 'realtime_ionice_classdata' in config_dict: realtime_ionice_classdata = string_to_int_convert_test( config_dict['realtime_ionice_classdata']) if realtime_ionice_classdata is None: - print('Invalid realtime_ionice_classdata value, not integer\nExit') + print('Invalid value of the "realtime_ionice_classdata" parameter.') + print('Valid values are integers from the range [0; 7].') + print('Exit') exit() if realtime_ionice_classdata < 0 or realtime_ionice_classdata > 7: - print('Invalid realtime_ionice_classdata value\nExit') + print('Invalid value of the "realtime_ionice_classdata" parameter.') + print('Valid values are integers from the range [0; 7].') + print('Exit') exit() else: - print('realtime_ionice_classdata not in config\nExit') + print('All the necessary parameters must be in the config') + print('There is no "realtime_ionice_classdata" parameter in the config') exit() + + + + mlockall = conf_parse_bool('mlockall') -if 
'self_nice' in config_dict: - self_nice = string_to_int_convert_test(config_dict['self_nice']) - if self_nice is None: - print('Invalid self_nice value, not integer\nExit') +if 'niceness' in config_dict: + niceness = string_to_int_convert_test(config_dict['niceness']) + if niceness is None: + print('Invalid niceness value, not integer\nExit') exit() - if self_nice < -20 or self_nice > 19: - print('Недопустимое значение self_nice\nExit') + if niceness < -20 or niceness > 19: + print('Недопустимое значение niceness\nExit') exit() else: - print('self_nice not in config\nExit') + print('niceness not in config\nExit') exit() -if 'self_oom_score_adj' in config_dict: - self_oom_score_adj = string_to_int_convert_test( - config_dict['self_oom_score_adj']) - if self_oom_score_adj is None: - print('Invalid self_oom_score_adj value, not integer\nExit') +if 'oom_score_adj' in config_dict: + oom_score_adj = string_to_int_convert_test( + config_dict['oom_score_adj']) + if oom_score_adj is None: + print('Invalid oom_score_adj value, not integer\nExit') exit() - if self_oom_score_adj < -1000 or self_oom_score_adj > 1000: - print('Недопустимое значение self_oom_score_adj\nExit') + if oom_score_adj < -1000 or oom_score_adj > 1000: + print('Недопустимое значение oom_score_adj\nExit') exit() else: - print('self_oom_score_adj not in config\nExit') + print('oom_score_adj not in config\nExit') exit() @@ -813,17 +832,17 @@ else: exit() -if 'oom_score_min' in config_dict: - oom_score_min = string_to_int_convert_test( - config_dict['oom_score_min']) - if oom_score_min is None: - print('Invalid oom_score_min value, not integer\nExit') +if 'min_badness' in config_dict: + min_badness = string_to_int_convert_test( + config_dict['min_badness']) + if min_badness is None: + print('Invalid min_badness value, not integer\nExit') exit() - if oom_score_min < 0 or oom_score_min > 1000: - print('Недопустимое значение oom_score_min\nExit') + if min_badness < 0 or min_badness > 1000: + print('Недопустимое 
значение min_badness\nExit') exit() else: - print('oom_score_min not in config\nExit') + print('min_badness not in config\nExit') exit() @@ -844,10 +863,10 @@ else: exit() -if 'desktop_notifications' in config_dict: - desktop_notifications = config_dict['desktop_notifications'] - if desktop_notifications == 'True': - desktop_notifications = True +if 'gui_notifications' in config_dict: + gui_notifications = config_dict['gui_notifications'] + if gui_notifications == 'True': + gui_notifications = True users_dict = dict() with open('/etc/passwd') as f: for line in f: @@ -855,15 +874,15 @@ if 'desktop_notifications' in config_dict: username = line_list[0] uid = line_list[2] users_dict[uid] = username - elif desktop_notifications == 'False': - desktop_notifications = False + elif gui_notifications == 'False': + gui_notifications = False else: - print('Invalid desktop_notifications value {} (shoul' \ + print('Invalid gui_notifications value {} (shoul' \ 'd be True or False)\nExit'.format( - desktop_notifications)) + gui_notifications)) exit() else: - print('desktop_notifications not in config\nExit') + print('gui_notifications not in config\nExit') exit() @@ -873,47 +892,47 @@ notify_options = conf_parse_string('notify_options') root_display = conf_parse_string('root_display') -use_regex_lists = conf_parse_bool('use_regex_lists') -if use_regex_lists: +regex_matching = conf_parse_bool('regex_matching') +if regex_matching: from re import fullmatch -preferlist_regex = conf_parse_string('preferlist_regex') +prefer_regex = conf_parse_string('prefer_regex') -if 'preferlist_factor' in config_dict: - preferlist_factor = string_to_float_convert_test(config_dict['preferlist_factor']) - if preferlist_factor is None: - print('Invalid preferlist_factor value, not float\nExit') +if 'prefer_factor' in config_dict: + prefer_factor = string_to_float_convert_test(config_dict['prefer_factor']) + if prefer_factor is None: + print('Invalid prefer_factor value, not float\nExit') exit() - if 
preferlist_factor < 1 and preferlist_factor > 1000: - print('preferlist_factor должен быть в диапазоне [1; 1000]\nExit') + if prefer_factor < 1 or prefer_factor > 1000: + print('prefer_factor must be in the range [1; 1000]\nExit') exit() else: - print('preferlist_factor not in config\nExit') + print('prefer_factor not in config\nExit') exit() -avoidlist_regex = conf_parse_string('avoidlist_regex') +avoid_regex = conf_parse_string('avoid_regex') -if 'avoidlist_factor' in config_dict: - avoidlist_factor = string_to_float_convert_test(config_dict['avoidlist_factor']) - if avoidlist_factor is None: - print('Invalid avoidlist_factor value, not float\nExit') +if 'avoid_factor' in config_dict: + avoid_factor = string_to_float_convert_test(config_dict['avoid_factor']) + if avoid_factor is None: + print('Invalid avoid_factor value, not float\nExit') exit() - if avoidlist_factor < 1 and avoidlist_factor > 1000: - print('avoidlist_factor должен быть в диапазоне [1; 1000]\nExit') + if avoid_factor < 1 or avoid_factor > 1000: + print('avoid_factor must be in the range [1; 1000]\nExit') exit() else: - print('avoidlist_factor not in config\nExit') + print('avoid_factor not in config\nExit') exit() -low_memory_warnings = conf_parse_bool('low_memory_warnings') +gui_low_memory_warnings = conf_parse_bool('gui_low_memory_warnings') if 'min_time_between_warnings' in config_dict: @@ -1077,22 +1096,22 @@ else: # raise the priority try: - os.nice(self_nice) - self_nice_result = 'OK' + os.nice(niceness) + niceness_result = 'OK' except PermissionError: - self_nice_result = 'Fail' + niceness_result = 'Fail' pass # option to protect the daemon itself from being killed try: with open('/proc/self/oom_score_adj', 'w') as file: - file.write('{}\n'.format(self_oom_score_adj)) - self_oom_score_adj_result = 'OK' + file.write('{}\n'.format(oom_score_adj)) + oom_score_adj_result = 'OK' except PermissionError: pass - self_oom_score_adj_result = 'Fail' + oom_score_adj_result = 'Fail' except OSError: -
self_oom_score_adj_result = 'Fail' + oom_score_adj_result = 'Fail' pass # запрет своппинга процесса @@ -1111,6 +1130,10 @@ self_uid = os.geteuid() self_pid = os.getpid() + + + + if self_uid == 0: root = True decrease_res = 'OK' @@ -1140,11 +1163,11 @@ if print_config: print('\nII. SELF-DEFENSE [displaying these options need fix]') print('mlockall: {} ({})'.format(mlockall, mla_res)) - print('self_nice: {} ({})'.format( - self_nice, self_nice_result + print('niceness: {} ({})'.format( + niceness, niceness_result )) - print('self_oom_score_adj: {} ({})'.format( - self_oom_score_adj, self_oom_score_adj_result + print('oom_score_adj: {} ({})'.format( + oom_score_adj, oom_score_adj_result )) print('\nIII. INTENSITY OF MONITORING') @@ -1170,7 +1193,7 @@ if print_config: print('\nV. PREVENTION OF KILLING INNOCENT VICTIMS') print('min_delay_after_sigterm: {}'.format(min_delay_after_sigterm)) print('min_delay_after_sigkill: {}'.format(min_delay_after_sigkill)) - print('oom_score_min: {}'.format(oom_score_min)) + print('min_badness: {}'.format(min_badness)) # False (OK) - OK не нужен когда фолс print('decrease_oom_score_adj: {} ({})'.format( @@ -1180,22 +1203,22 @@ if print_config: print('oom_score_adj_max: {}'.format(oom_score_adj_max)) print('\nVI. DESKTOP NOTIFICATIONS') - print('desktop_notifications: {}'.format(desktop_notifications)) - if desktop_notifications: + print('gui_notifications: {}'.format(gui_notifications)) + if gui_notifications: print('notify_options: {}'.format(notify_options)) print('root_display: {}'.format(root_display)) print('\nVII. 
AVOID AND PREFER VICTIM NAMES VIA REGEX') - print('use_regex_lists: {}'.format(use_regex_lists)) - if use_regex_lists: - print('preferlist_regex: {}'.format(preferlist_regex)) - print('preferlist_factor: {}'.format(preferlist_factor)) - print('avoidlist_regex: {}'.format(avoidlist_regex)) - print('avoidlist_factor: {}'.format(avoidlist_factor)) + print('regex_matching: {}'.format(regex_matching)) + if regex_matching: + print('prefer_regex: {}'.format(prefer_regex)) + print('prefer_factor: {}'.format(prefer_factor)) + print('avoid_regex: {}'.format(avoid_regex)) + print('avoid_factor: {}'.format(avoid_factor)) print('\nIX. LOW MEMORY WARNINGS') - print('low_memory_warnings: {}'.format(low_memory_warnings)) - if low_memory_warnings: + print('gui_low_memory_warnings: {}'.format(gui_low_memory_warnings)) + if gui_low_memory_warnings: print('min_time_between_warnings: {}'.format(min_time_between_warnings)) print('mem_min_warnings: {} MiB, {} %'.format( @@ -1218,7 +1241,7 @@ if print_config: ########################################################################## -# для рассчета ширины столбцов при печати mem и zram +# for calculating the column width when printing mem and zram mem_len = len(str(round(mem_total / 1024.0))) rate_mem = rate_mem * 1048576 @@ -1233,11 +1256,11 @@ print('\nStart monitoring...') ########################################################################## -# цикл проверки уровней доступной памяти + while True: - # находим mem_available, swap_total, swap_free + # find mem_available, swap_total, swap_free with open('/proc/meminfo') as f: for n, line in enumerate(f): if n is 2: @@ -1252,7 +1275,7 @@ while True: - # если swap_min_sigkill задан в процентах + # if swap_min_sigkill is set in percent if swap_kill_is_percent: swap_min_sigkill_kb = swap_total * swap_min_sigkill_percent / 100.0 @@ -1263,7 +1286,7 @@ while True: swap_min_warnings_kb = swap_total * swap_min_warnings_percent / 100.0 - # находим MemUsedZram + # find MemUsedZram disksize_sum = 
0 mem_used_total_sum = 0 for dev in os.listdir('/sys/block'): @@ -1325,6 +1348,7 @@ while True: # MEM SWAP KILL if mem_available <= mem_min_sigkill_kb and swap_free <= swap_min_sigkill_kb: + time0 = time() mem_info = '* MemAvailable ({} MiB, {} %) < mem_min_sigkill ({} MiB, {} %)\n Swa' \ 'pFree ({} MiB, {} %) < swap_min_sigkill ({} MiB, {} %)'.format( @@ -1344,6 +1368,7 @@ while True: # ZRAM KILL elif mem_used_zram >= zram_max_sigkill_kb: + time0 = time() mem_info = '* MemUsedZram ({} MiB, {} %) > zram_max_sigkill ({} MiB, {} %)'.format( kib_to_mib(mem_used_zram), @@ -1355,8 +1380,9 @@ while True: # MEM SWAP TERM elif mem_available <= mem_min_sigterm_kb and swap_free <= swap_min_sigterm_kb: + time0 = time() - mem_info = '* MemAvailable ({} MiB, {} %) < mem_min_sigterm ({} MiB, {} %)\n Sw' \ + mem_info = r'{' + '\n MemAvailable ({} MiB, {} %) < mem_min_sigterm ({} MiB, {} %)\n Sw' \ 'apFree ({} MiB, {} %) < swap_min_sigterm ({} MiB, {} %)'.format( kib_to_mib(mem_available), percent(mem_available / mem_total), @@ -1379,6 +1405,7 @@ while True: # ZRAM TERM elif mem_used_zram >= zram_max_sigterm_kb: + time0 = time() mem_info = '* MemUsedZram ({} MiB, {} %) > zram_max_sigter' \ 'm ({} M, {} %)'.format( @@ -1390,7 +1417,7 @@ while True: find_victim_and_send_signal(15) # LOW MEMORY WARNINGS - elif low_memory_warnings and desktop_notifications: + elif gui_low_memory_warnings and gui_notifications: if mem_available < mem_min_warnings_kb and swap_free < swap_min_warnings_kb + 0.1 or mem_used_zram > zram_max_warnings_kb: warn_time_delta = time() - warn_time_now diff --git a/nohang.conf b/nohang.conf index 50abe46..a713447 100644 --- a/nohang.conf +++ b/nohang.conf @@ -5,31 +5,40 @@ The configuration includes the following sections: - * THRESHOLDS FOR SENDING SIGNALS TO VICTIMS - * INTENSITY OF MONITORING (AND CPU USAGE) - * PREVENTION OF KILLING INNOCENT VICTIMS - * AVOID AND PREFER VICTIM NAMES VIA REGEX MATCHING - * EXECUTE THE COMMAND INSTEAD OF SENDING THE SIGTERM SIGNAL 
- * GUI NOTIFICATIONS: RESULTS OF PREVENTING OOM AND LOW MEMORY WARNINGS - * SELF-DEFENSE AND PREVENTING SLOWING DOWN THE PROGRAM - * OUTPUT VERBOSITY + 1. Memory levels to respond to as an OOM threat + 2. The frequency of checking the level of available memory + (and CPU usage) + 3. The prevention of killing innocent victims + 4. Impact on the badness of processes via matching their names + with regular expressions + 5. The execution of a specific command instead of sending the + SIGTERM signal + 6. GUI notifications: + - results of preventing OOM + - low memory warnings + 7. Preventing the slowing down of the program + 8. Output verbosity Just read the description of the parameters and edit the values. Please restart the program after editing the config. ##################################################################### - * THRESHOLDS FOR SENDING SIGNALS TO VICTIMS + 1. Thresholds below which a signal should be sent to the victim Sets the available memory levels below which SIGTERM or SIGKILL signals are sent. The signal will be sent if MemAvailable and - SwapFree at the same time will drop below the corresponding - values. Can be specified in % (percent) and M (MiB). Valid values - are floating-point numbers from the range [0; 100] %. + SwapFree (in /proc/meminfo) at the same time will drop below the + corresponding values. Can be specified in % (percent) and M (MiB). + Valid values are floating-point numbers from the range [0; 100] %. + + MemAvailable levels. mem_min_sigterm = 9 % mem_min_sigkill = 6 % + SwapFree levels. + swap_min_sigterm = 9 % swap_min_sigkill = 6 % @@ -41,30 +50,26 @@ swap_min_sigkill = 6 % Can be specified in % and M. Valid values are floating-point numbers from the range [0; 100] %. -zram_max_sigterm = 55 % -zram_max_sigkill = 60 % +zram_max_sigterm = 50 % +zram_max_sigkill = 55 % ##################################################################### - * INTENSITY OF MONITORING (AND CPU USAGE) + 2. 
The frequency of checking the amount of available memory + (and CPU usage) Coefficients that affect the intensity of monitoring. Reducing the coefficients can reduce CPU usage and increase the periods between memory checks. - Почему три коэффициента, а не один? - Потому что скорость - наполнения свопа обычно ниже скорости наполнения RAM. - Можно для свопа задать более низкую интенсивность - мониторинга без ущерба для предотвращения нехватки памяти - и тем самым снизить нагрузку на процессор. + Why three coefficients instead of one? Because the swap fill rate + is usually lower than the RAM fill rate. - В дефолтных настройках на данной интенсивности демон работает - достаточно хорошо, успешно справляясь с резкими скачками потребления - памяти. + It is possible to set a lower intensity of monitoring for swap + without compromising to prevent OOM and thus reduce the CPU load. - Default values are well for desktop. - On servers without rapid fluctuations in memory level, the - values can be reduced. + Default values are well for desktop. On servers without rapid + fluctuations in memory levels the values can be reduced. Valid values are positive floating-point numbers. @@ -74,20 +79,19 @@ rate_zram = 1 ##################################################################### - * PREVENTION OF KILLING INNOCENT VICTIMS + 3. The prevention of killing innocent victims Минимальное значение oom_score, которым должен обладать процесс для того, чтобы ему был отправлен сигнал. Позволяет предотвратить убийство невиновных если что-то - пойдет не так. Может min_badness с учетом списков? + пойдет не так. Valid values are integers from the range [0; 1000]. -oom_score_min = 10 +min_badness = 10 Минимальная задержка после отправки соответствующих сигналов для предотвращения риска убийства сразу множества процессов. - Должно быть неотрицательным числом. Valid values are non-negative floating-point numbers. 
@@ -104,6 +108,7 @@ min_delay_after_sigkill = 3 Enabling the option requires root privileges. Valid values are True and False. + Values are case sensitive. decrease_oom_score_adj = False @@ -113,75 +118,79 @@ oom_score_adj_max = 20 ##################################################################### - * AVOID AND PREFER VICTIM NAMES VIA REGEX MATCHING + 4. Impact on the badness of processes via matching their names + with regular expressions. - Можно задать регулярные выражения (Perl-compatible regular - expressions), которые будут использоваться для сопоставления с - именами процессов для влияния на их badness. + See https://en.wikipedia.org/wiki/Regular_expression and + https://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions - Включение этой опции замедляет поиск жертвы, так как - имена всех процессов сравниваются с заданными regex-паттернами. + Enabling this option slows down the search for the victim + because the names of all processes are compared with the + specified regex patterns. Valid values are True and False. -use_regex_lists = False +regex_matching = False - Badness процессов, имена которых соответствуют preferlist_regex, - будут рассчитываться по формуле - badness = (oom_score + 1) * preferlist_factor + Badness of processes whose names correspond to prefer_regex will + be calculated by the following formula: + badness = (oom_score + 1) * prefer_factor -preferlist_regex = tail|python3 +prefer_regex = tail|python3 Valid values are floating-point numbers from the range [1; 1000]. -preferlist_factor = 3 +prefer_factor = 3 - Список нежелательных для убийства процессов. 
+ Badness of processes whose names correspond to avoid_regex will + be calculated by the following formula: + badness = oom_score / avoid_factor - The badness of processes whose names match avoidlist_regex - will be calculated by the formula - badness = oom_score / avoidlist_factor - -avoidlist_regex = Xorg|sshd +avoid_regex = Xorg|sshd Valid values are floating-point numbers from the range [1; 1000]. -avoidlist_factor = 4 +avoid_factor = 3 ##################################################################### - * EXECUTE THE COMMAND INSTEAD OF SENDING THE SIGTERM SIGNAL + 5. The execution of a specific command instead of sending the + SIGTERM signal. - For processes with a particular name you can specify a command - that will be executed instead of sending the SIGTERM signal - to the process with that name. + For processes with a specific name you can specify a command to + run instead of sending the SIGTERM signal. - For example, if the process is running as a daemon, then instead of - sending SIGTERM you can run a restart command. + For example, if the process is running as a daemon, you can run + the restart command instead of sending SIGTERM. Valid values are True and False. execute_the_command = False - The length of a process name must not exceed 15 characters. - The syntax is as follows: lines starting with ** are treated as lines - containing process names and the corresponding commands for - restarting those processes. The process name is followed by a double - colon (::) and then the command. - An ampersand (&) at the end of the command lets nohang continue working - without waiting for the command to finish. + The length of the process name can't exceed 15 characters. + The syntax is as follows: lines starting with ** are treated + as lines containing process names and the corresponding + commands. The process name is followed by a double colon (::) + and then by the command that will be executed if the + specified process is selected as a victim. 
+ The ampersand (&) at the end of the command will allow nohang to + continue running without waiting for the end of the command + execution. For example: ** mysqld :: systemctl restart mariadb.service & - ** php-fpm7.0 :: systemctl restart php7.0-fpm.service & + ** php-fpm7.0 :: systemctl restart php7.0-fpm.service ** processname :: some command + Extra sleep time after executing the command (in addition to + min_sleep_after_sigterm). + ##################################################################### - * GUI NOTIFICATIONS: - * RESULTS OF PREVENTING OOM - * LOW MEMORY WARNINGS + 6. GUI notifications: + - results of preventing OOM + - low memory warnings Enabling this option requires notify-send to be present on the system. On Debian/Ubuntu this is provided by installing the package @@ -192,7 +201,7 @@ execute_the_command = False See also wiki.archlinux.org/index.php/Desktop_notifications Valid values are True and False. -desktop_notifications = False +gui_notifications = False Additional options for notify-send. See `notify-send --help` and read `man notify-send` @@ -213,7 +222,7 @@ root_display = :0 Desktop notifications must be enabled for this option to work. Valid values are True and False. -low_memory_warnings = False +gui_low_memory_warnings = True Minimum time between notifications, in seconds. Valid values are floating-point numbers from the range [1; 300]. @@ -238,32 +247,32 @@ zram_max_warnings = 40 % ##################################################################### - * SELF-DEFENSE AND PREVENTING SLOWING DOWN THE PROGRAM + 7. Preventing the slowing down of the program - True - lock the process in memory to prevent it from being swapped. - False - do not lock. + mlockall() lock ... all of the calling process's virtual address + space into RAM, preventing that memory from being paged to the + swap area. - `man mlockall` - On Fedora 28 the value True increases the process's memory - consumption by 200 MiB; on Debian 8 and 9 there is no such problem. 
+ It is disabled by default because the value mlockall = True in + Fedora 28 causes the process to increase memory consumption by + 200 MiB. On Debian 8 and 9 there is no such problem. mlockall = False - Setting negative values of self_nice and self_oom_score_adj + Setting negative values of niceness and oom_score_adj requires root privileges. - Setting a negative self_nice raises the priority of the process. + Setting a negative niceness raises the priority of the process. Valid values are integers from the range [-20; 19]. -self_nice = -15 +niceness = -15 - # -> niceness - - Set oom_score_adj for the process. + Set oom_score_adj for the nohang process. Valid values are integers from the range [-1000; 1000]. Setting the values to -1000 will prohibit suicide. -self_oom_score_adj = -100 +oom_score_adj = -100 Read `man ionice` to understand the following parameters. Setting the value to True requires root privileges. @@ -279,11 +288,10 @@ realtime_ionice_classdata = 5 ##################################################################### - * STANDARD OUTPUT VERBOSITY + 8. Output verbosity Display the configuration when the program starts. Valid values are True and False. - Values are case sensitive! print_config = False