This is nohang config file. Redesign of this config in progress. Lines starting with #, tabs and spaces are comments. Lines starting with $ contain obligatory parameters. Lines starting with @ contain optional parameters. The configuration includes the following sections: 1. Memory levels to respond to as an OOM threat 2. Response on PSI memory metrics 3. The frequency of checking the level of available memory (and CPU usage) 4. The prevention of killing innocent victims 5. Impact on the badness of processes via matching their - names, - cmdlines and - UIDs with regular expressions 6. The execution of a specific command instead of sending the SIGTERM signal 7. GUI notifications: - OOM prevention results and - low memory warnings 8. Output verbosity 9. Misc Just read the description of the parameters and edit the values. Please restart the program after editing the config. ##################################################################### 1. Thresholds below which a signal should be sent to the victim Sets the available memory levels at or below which SIGTERM or SIGKILL signals are sent. The signal will be sent if MemAvailable and SwapFree (in /proc/meminfo) at the same time will drop below the corresponding values. Can be specified in % (percent) and M (MiB). Valid values are floating-point numbers from the range [0; 100] %. MemAvailable levels. mem_min_sigterm = 10 % mem_min_sigkill = 5 % SwapFree levels. swap_min_sigterm = 10 % swap_min_sigkill = 5 % Specifying the total share of zram in memory, if exceeded the corresponding signals are sent. As the share of zram in memory increases, it may fall responsiveness of the system. 90 % is a usual hang level, not recommended to set very high. Can be specified in % and M. Valid values are floating-point numbers from the range [0; 90] %. zram_max_sigterm = 50 % zram_max_sigkill = 55 % ##################################################################### 2. Response on PSI memory metrics (it needs Linux 4.20 and up) About PSI: https://facebookmicrosites.github.io/psi/ Disabled by default (ignore_psi = True). ignore_psi = True Choose a path to PSI file. By default it monitors system-wide file: /proc/pressure/memory You also can set file to monitor one cgroup slice. For example: psi_path = /sys/fs/cgroup/unified/user.slice/memory.pressure psi_path = /sys/fs/cgroup/unified/system.slice/memory.pressure psi_path = /sys/fs/cgroup/unified/system.slice/foo.service/memory.pressure Execute the command find /sys/fs/cgroup | grep -P "memory\.pressure$" to find available memory.pressue files (except /proc/pressure/memory). psi_path = /proc/pressure/memory Valid psi_metrics are: some_avg10 some_avg60 some_avg300 full_avg10 full_avg60 full_avg300 some_avg10 is most sensitive. psi_metrics = some_avg10 sigterm_psi_threshold = 80 sigkill_psi_threshold = 90 psi_post_action_delay = 60 ##################################################################### 3. The frequency of checking the amount of available memory (and CPU usage) Coefficients that affect the intensity of monitoring. Reducing the coefficients can reduce CPU usage and increase the periods between memory checks. Why three coefficients instead of one? Because the swap fill rate is usually lower than the RAM fill rate. It is possible to set a lower intensity of monitoring for swap without compromising to prevent OOM and thus reduce the CPU load. Default values are well for desktop. On servers without rapid fluctuations in memory levels the values can be reduced. Valid values are positive floating-point numbers. rate_mem = 4000 rate_swap = 1500 rate_zram = 500 See also https://github.com/rfjakob/earlyoom/issues/61 Максимальное время сна между проверками памяти. Положительное число. max_sleep_time = 3 Минимальное время сна между проверками памяти. Положительное число, не превышающее max_sleep_time. min_sleep_time = 0.1 ##################################################################### 4. The prevention of killing innocent victims Минимальное значение bandess (по умолчанию равно oom_score), которым должен обладать процесс для того, чтобы ему был отправлен сигнал. Позволяет предотвратить убийство невиновных если что-то пойдет не так. Valid values are integers from the range [0; 1000]. min_badness = 20 Минимальная задержка после отправки соответствующих сигналов для предотвращения риска убийства сразу множества процессов. Valid values are non-negative floating-point numbers. min_delay_after_sigterm = 0.2 min_delay_after_sigkill = 1 Процессы браузера chromium обычно имеют oom_score_adj 200 или 300. Это приводит к тому, что процессы хрома умирают первыми вместо действительно тяжелых процессов. Если параметр decrease_oom_score_adj установлен в значение True, то у процессов, имеющих oom_score_adj выше oom_score_adj_max значение oom_score_adj будет опущено до oom_score_adj_max перед поиском жертвы. Enabling the option requires root privileges. Valid values are True and False. Values are case sensitive. decrease_oom_score_adj = False Valid values are integers from the range [0; 1000]. oom_score_adj_max = 20 ##################################################################### 5. Impact on the badness of processes via matching their names, cmdlines or UIDs with regular expressions using re.search(). See https://en.wikipedia.org/wiki/Regular_expression and https://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions Enabling this options slows down the search for the victim because the names, cmdlines or UIDs of all processes (except init and kthreads) are compared with the specified regex patterns (in fact slowing down is caused by reading all /proc/*/cmdline and /proc/*/status files). Use script `oom-sort` from nohang package to view names, cmdlines and UIDs of processes. 5.1 Matching process names with RE patterns Valid values are True and False. regex_matching = False Syntax: @PROCESSNAME_RE badness_adj /// RE_pattern New badness value will be += badness_adj It is possible to compare multiple patterns with different badness_adj values. Example: @PROCESSNAME_RE -100 /// ^Xorg$ @PROCESSNAME_RE -500 /// ^sshd$ 5.2 Matching cmdlines with RE patterns A good option that allows fine adjustment. re_match_cmdline = False @CMDLINE_RE 300 /// -childID|--type=renderer @CMDLINE_RE -200 /// ^/usr/lib/virtualbox 5.3 Matching UIDs with RE patterns The most slow option re_match_uid = False @UID_RE -100 /// ^0$ 5.4 Matching CGroup-line with RE patterns re_match_cgroup = True @CGROUP_RE -50 /// system.slice @CGROUP_RE 50 /// foo.service @CGROUP_RE 2000 /// user.slice 5.5 Matching realpath with RE patterns re_match_realpath = False @REALPATH_RE 20 /// ^/usr/bin/foo Note that you can control badness also via systemd units via OOMScoreAdjust, see https://www.freedesktop.org/software/systemd/man/systemd.exec.html#OOMScoreAdjust= ##################################################################### 6. The execution of a specific command instead of sending the SIGTERM signal. For processes with a specific name you can specify a command to run instead of sending the SIGTERM signal. For example, if the process is running as a daemon, you can run the restart command instead of sending SIGTERM. Valid values are True and False. execute_the_command = False The length of the process name can't exceed 15 characters. The syntax is as follows: lines starting with keyword $ETC are considered as the lines containing names of processes and corresponding commands. After a name of process the triple slash (///) follows. And then follows the command that will be executed if the specified process is selected as a victim. The ampersand (&) at the end of the command will allow nohang to continue runing without waiting for the end of the command execution. For example: $ETC mysqld /// systemctl restart mariadb.service & $ETC php-fpm7.0 /// systemctl restart php7.0-fpm.service If command will contain $PID pattern, this template ($PID) will be replaced by PID of process which name match with RE pattern. Exmple: $ETC bash /// kill -KILL $PID It is way to send any signal instead of SIGTERM. (run `kill -L` to see list of all signals) Also $NAME will be replaced by process name. $ETC bash /// kill -9 $PID $ETC firefox-esr /// kill -SEGV $PID $ETC tail /// kill -9 $PID $ETC apache2 /// systemctl restart apache2 ##################################################################### 7. GUI notifications: - OOM prevention results and - low memory warnings Включение этой опции требует наличия notify-send в системе. В Debian/Ubuntu это обеспечивается установкой пакета libnotify-bin. В Fedora и Arch Linux - пакет libnotify. Также требуется наличие сервера уведомлений. При запуске nohang от рута уведомления рассылаются всем залогиненным пользователям. See also wiki.archlinux.org/index.php/Desktop_notifications Valid values are True and False. gui_notifications = False Enable GUI notifications about the low level of available memory. Valid values are True and False. gui_low_memory_warnings = False Execute the command instead of sending GUI notifications if the value is not empty line. For example: warning_exe = cat /proc/meminfo & warning_exe = Если значения MemAvailable и SwapFree одновременно будут ниже соотвестствующих значений, то будут отправлены уведомления. Can be specified in % (percent) and M (MiB). Valid values are floating-point numbers from the range [0; 100] %. mem_min_warnings = 25 % swap_min_warnings = 25 % Если доля zram в памяти превысит значение zram_max_warnings, то будут отправляться уведомления с минимальным периодом равным min_time_between_warnings. zram_max_warnings = 40 % Минимальное время между отправками уведомлений в секундах. Valid values are floating-point numbers from the range [1; 300]. min_time_between_warnings = 15 Ampersands (&) will be replaced with asterisks (*) in process names and in commands. ##################################################################### 8. Verbosity Display the configuration when the program starts. Valid values are True and False. print_config = False Print memory check results. Valid values are True and False. print_mem_check_results = False Минимальная периодичность печати состояния памяти. 0 - печатать все проверки памяти. Неотрицательное число. min_mem_report_interval = 60 Print sleep periods between memory checks. Valid values are True and False. print_sleep_periods = False Печатать общую статистику по корректирующим действиям с момента запуска nohang после каждого корректирующего действия. print_total_stat = True Печатать таблицу процессов перед каждым корректирующим действием. print_proc_table = False print_victim_info = True Максимальная глубина показа родословной жертвы. По умолчанию (1) показывается только родитель - PPID. Целое положительное число. max_ancestry_depth = 1 separate_log = False psi_debug = False ##################################################################### 9. Misc Жертва может не реагировать на SIGTERM. max_post_sigterm_victim_lifetime - это время, при превышении которого жертва получит SIGKILL. Неотрицательные числа. max_post_sigterm_victim_lifetime = 10 Execute the command after sending SIGKILL to the victim if the value is not empty line. For example: post_kill_exe = cat /proc/meminfo & post_kill_exe = forbid_negative_badness = True