The 'stop_pipeline' field may be reused during the cache lifetime (e.g. when
the cache is detached and attached again, the pipeline is freed and then
re-allocated). Calling the completion after detach but before freeing the
pipeline may lead to a race condition.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@huawei.com>
Cache mngt lock cannot be unlocked from io completion context (which is
potentially atomic context) as it may involve sleeping operations.
Modify cleaner utility to support rescheduling to queue context before
calling the completion. Update cleaning policies to use that option.
Signed-off-by: Robert Baldyga <robert.baldyga@huawei.com>
HB lock takes inclusive metadata lock, which is taken also by metadata
flush, thus trying to call metadata flush under HB lock attempts to take
this lock recursively. In that case, if in the meantime some other thread
would try to take exclusive metadata lock, the inner inclusive lock would
block (because the lock keeps the order), with outer inclusive lock still
held, leading to a deadlock.
Signed-off-by: Robert Baldyga <robert.baldyga@huawei.com>
There are situations when we can end up in engine_pt with cache lines
locked for write. One example is engine_rd falling back to engine_pt after
failure during cache line preparation, where write lock has been already
taken. To handle this situation properly, unlock request using more general
unlock function.
Signed-off-by: Robert Baldyga <robert.baldyga@huawei.com>
There is an issue when someone calls parallelize/pipeline
with a struct that is aligned (say to 64B):
these APIs add their own data right before
the user's private data,
so the user's data is no longer aligned,
which might cause a segfault in some cases.
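
A minimal sketch of the problem and of one possible fix (padding the utility's
own header up to the required alignment). The names `util_header`,
`PRIV_ALIGN` and `util_alloc_priv` are hypothetical and are not the OCF
identifiers; the 64B value is taken from the example above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PRIV_ALIGN 64 /* assumed alignment required by the user's struct */

/* Hypothetical bookkeeping data the utility keeps in front of priv data. */
struct util_header {
	void *owner;
	uint32_t step;
};

/* Round x up to the next multiple of a (a must be a power of two). */
static size_t align_up(size_t x, size_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Priv offset when priv directly follows the header (the problematic case). */
static size_t priv_offset_packed(void)
{
	return sizeof(struct util_header);
}

/* Priv offset when the header is padded up to PRIV_ALIGN. */
static size_t priv_offset_padded(void)
{
	return align_up(sizeof(struct util_header), PRIV_ALIGN);
}

/*
 * Allocate header and priv area in one aligned block. Returns the priv
 * pointer and stores the block pointer (to be passed to free()) in block_out.
 */
static void *util_alloc_priv(size_t priv_size, void **block_out)
{
	size_t total = align_up(priv_offset_padded() + priv_size, PRIV_ALIGN);
	void *block = aligned_alloc(PRIV_ALIGN, total);

	if (!block)
		return NULL;

	*block_out = block;
	return (char *)block + priv_offset_padded();
}
```

With the packed layout the priv offset equals sizeof(struct util_header),
which is not a multiple of 64, so a 64B-aligned user struct placed there ends
up misaligned even if the block itself is aligned.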
Signed-off-by: Amir Haroush <amir.haroush@huawei.com>
Signed-off-by: Shai Fultheim <shai.fultheim@huawei.com>
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
Because the context has one field which is aligned to 64B
(struct ocf_volume cache_volume), the compiler uses vmovdqa (aligned)
instead of vmovdqu (unaligned). In reality the address is not 64B aligned
(it ends with 0x8), so we get this segfault.
Signed-off-by: Amir Haroush <amir.haroush@huawei.com>
Signed-off-by: Shai Fultheim <shai.fultheim@huawei.com>
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
With a high dirty ratio and occupancy, OCF might be unable to map cache lines
for new requests and thus pass the I/O through to core devices. IOPS will
drop afterwards. We need to control the dirty ratio.
The existing `alru' policy lets the user control the stale buffer time,
activity threshold etc. These can affect the dirty ratio of the cache
device, but only in an empirical manner. Introducing `max_dirty_ratio'
makes it explicit.
At first glance, it might be better to implement a dedicated cleaner policy
directly targeting a dirty ratio goal, so that the `alru' parameters remain
orthogonal. But on the other hand, we still need to flush dirty cache
lines periodically, instead of just keeping a watermark of dirty ratio.
This indicates that the existing `alru' parameters would still be required if
we developed a new policy, so it seems reasonable to make it a parameter.
To sum up, this patch does the following:
- adds a 'max_dirty_ratio' parameter with default value 100;
- with the default value 100, the `alru' cleaner is identical to what it was;
- with a value N less than 100, the cleaner (when woken up) will actively
bring the dirty ratio down to N, regardless of staleness time.
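
The arithmetic behind the last point can be sketched as follows. This is an
illustration only, not the code from this patch: the function name
`lines_to_clean` and its signature are hypothetical, and the ratio is assumed
to be an integer percentage of all cache lines.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of dirty cache lines that would have to be cleaned so that the
 * dirty ratio drops to max_dirty_ratio percent of total_lines.
 * Returns 0 when the ratio is already at or below the limit.
 */
static uint64_t lines_to_clean(uint64_t dirty_lines, uint64_t total_lines,
		uint32_t max_dirty_ratio)
{
	uint64_t allowed;

	/* Default value 100: no ratio-driven cleaning at all. */
	if (max_dirty_ratio >= 100)
		return 0;

	/* floor(total_lines * ratio / 100) without overflowing 64 bits. */
	allowed = total_lines / 100 * max_dirty_ratio
		+ total_lines % 100 * max_dirty_ratio / 100;

	return dirty_lines > allowed ? dirty_lines - allowed : 0;
}
```

For example, with 1000 cache lines, 900 of them dirty and a limit of 50, the
sketch asks for 400 lines to be cleaned.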
Signed-off-by: David Lee <live4thee@gmail.com>
Don't populate cleaning policies during the initialization procedure; the user
has to call populate explicitly.
Until now cleaning policies could be populated in two ways:
- implicitly during cleaning policy initialization,
- explicitly by calling populate.
The difference was that the former was single threaded.
This patch removes the functionally redundant and less efficient code.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
The function not only recovers cleaning policy metadata but is also used
to initialize data structures, so a more generic name is more accurate.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
Initializing metadata in an asynchronous manner will allow using
parallelization utilities in future commits.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
Normally the cleaning policy would be deinitialized while stopping the cache,
which is one of the error handling steps, e.g. in case of a failed cache
activation. But since `cache_stop()` may be called only for an attached cache
instance, the cleaning policy needs to be deinitialized explicitly.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
Remove one callback indirection level. I/O never changes its direction,
so there is no point in storing both read and write callbacks for each
request.
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
In most (6/9) instances across engines, ocf_core_stats_cache_error_update
is called upon each cache volume I/O error, possibly multiple times
per user request in case of multi-cacheline requests. The backfill,
fast and read engines are exceptions, incrementing error stats only
once per user request.
This commit unifies ocf_core_stats_cache_error_update usage so that
in all the engines the error statistic is incremented once for every
error.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
It is wasteful to allocate a full 1B to store 1 bit of
alock status per cacheline. Fixed allocation of 128 bits
seems more reasonable.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
1. Only 1 bit per cacheline is required for the status
2. ... however the size must be 8B aligned
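
The two points above combine into a simple size computation. The sketch below
is illustrative only; `alock_status_size` is a hypothetical helper name, not
the function changed by this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of m. */
static size_t round_up(size_t n, size_t m)
{
	return (n + m - 1) / m * m;
}

/*
 * Bytes needed to hold one status bit per cacheline, with the total
 * size rounded up to a multiple of 8 bytes.
 */
static size_t alock_status_size(size_t num_cachelines)
{
	size_t bytes = (num_cachelines + 7) / 8; /* 8 status bits per byte */

	return round_up(bytes, 8);
}
```

For instance, 65 cachelines need 9 bytes of bits, which rounds up to 16 bytes.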
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
Metadata capacity reported by dmesg was actually a memory footprint.
A proper size of metadata is now reported.
Signed-off-by: Krzysztof Majzerowicz-Jaszcz <krzysztof.majzerowicz-jaszcz@intel.com>
The optional uuid parameter to ocf_volume_init() points to a UUID object
initialized by the user. We should verify it is not excessively large,
as we attempt to allocate a buffer to store a copy of the UUID.
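
A sketch of this kind of check, under stated assumptions: `struct uuid`,
`uuid_copy_checked` and the `MAX_UUID_SIZE` limit are hypothetical and chosen
only for illustration; the actual OCF types and limit may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_UUID_SIZE 4096 /* assumed limit, for illustration only */

/* Hypothetical user-provided UUID descriptor. */
struct uuid {
	void *data;
	size_t size;
};

/*
 * Copy a user-provided UUID into a freshly allocated buffer, rejecting
 * sizes that are zero or larger than MAX_UUID_SIZE before allocating.
 * The caller is expected to skip this call when no UUID was given.
 */
static int uuid_copy_checked(struct uuid *dst, const struct uuid *src)
{
	void *buf;

	if (!src || !src->data || src->size == 0 ||
			src->size > MAX_UUID_SIZE)
		return -EINVAL;

	buf = malloc(src->size);
	if (!buf)
		return -ENOMEM;

	memcpy(buf, src->data, src->size);
	dst->data = buf;
	dst->size = src->size;

	return 0;
}
```

Validating the size first keeps a corrupted or hostile size field from turning
into an arbitrarily large allocation request.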
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
The proper way to avoid calling on_deinit() callback on an already
deinitialized volume is to deinitialize type callbacks, as it is done
in the previous commit.
This reverts commit a7f70687a9.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
After deinitialization of volume there is no need to call back to
type ops. Currently we would erroneously call on_deinit() callback
multiple times if ocf_volume_deinit() is performed more than once,
which we expect to happen and treat as a correct use of API.
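
The idea can be sketched as below: once the type callbacks are dropped on the
first deinit, any repeated deinit becomes a no-op. The struct layout and the
names `volume`, `volume_ops` and `volume_deinit` are hypothetical
simplifications, not the OCF definitions.

```c
#include <assert.h>
#include <stddef.h>

struct volume;

/* Hypothetical, reduced set of volume type callbacks. */
struct volume_ops {
	void (*on_deinit)(struct volume *vol);
};

struct volume {
	const struct volume_ops *ops;
	void *priv;
};

/* Counts how many times the type's on_deinit() callback was invoked. */
static int deinit_calls;

static void count_deinit(struct volume *vol)
{
	(void)vol;
	deinit_calls++;
}

static const struct volume_ops counting_ops = {
	.on_deinit = count_deinit,
};

/* Safe to call more than once: callbacks are dropped after the first call. */
static void volume_deinit(struct volume *vol)
{
	if (vol->ops && vol->ops->on_deinit)
		vol->ops->on_deinit(vol);

	vol->ops = NULL;
	vol->priv = NULL;
}
```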
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
ocf_metadata_flush_superblock() is being called on the cache stop, after
deinitialization of the cores (and their volumes), thus accessing core
volume in superblock flushing procedure leads to use-after-free bug.
Fix this by moving volume type setting to the core insertion code.
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
After moving from a volume, its priv is assigned to the new owner.
Destroying the volume after moving from it must not attempt to use the
priv, and especially must not attempt to deinit member volumes in case of
a composite volume.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
Volumes are now exposed in the OCF API and we should gracefully handle
an attempt to open an already opened volume (instead of ENV_BUG).
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
It makes it possible to attach/load cache using volume types that have
non-standard constructors.
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
Flush I/O should be forwarded to core and cache device. In case of core
this is simple - just mirror the I/O from the top volume. Since
cache data is owned by OCF it makes sense to send a simple flush I/O
with 0 address and size.
The current implementation attempts to use the cache data I/O interface
(ocf_submit_cache_reqs function) instead of submitting an empty flush to
the underlying cache device. This function is designed to read/write
from mapped cachelines, while there is no traversal/mapping
performed on flush I/O.
If request map allocation succeeds, this results in sending I/O to
address 0 with size and flags inherited from the top adapter I/O.
This doesn't make any sense, and can even result in invalid I/O if the
size is greater than the cache device size.
Even worse, if flush request map allocation fails (which happens
always in case of large flush requests) then the erroneous call to
ocf_submit_cache_reqs results in NULL pointer dereference.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
Right now alock assumes that the number of locks taken will equal the number
of core lines. This is not the case in pio, where only parts of metadata
are under locks. If a pio request overlaps locked and not-locked metadata
sections, its core line count will differ from its awaited lock count.
To remedy this discrepancy, an additional method which gets the count of
locks that will be taken/waited on is added to the alock API.
Signed-off-by: Jan Musial <jan.musial@intel.com>
It's required, because environments other than Linux kernel may not define
their own DIV_ROUND_UP. Moving it to env would just generate boilerplate,
because its implementation is trivial and portable.
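
For reference, the usual portable form of the macro looks like this. The
`#ifndef` guard is an assumption added here so the sketch also compiles in
environments that already provide the macro; it is not necessarily how this
patch defines it.

```c
#include <assert.h>

/* Define the macro only if the environment has not provided one. */
#ifndef DIV_ROUND_UP
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#endif
```

The macro assumes non-negative operands and a non-zero divisor, and evaluates
`d` twice, so arguments with side effects should be avoided.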
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
This avoids allocating the cleaner metadata section, effectively
saving up to 20% of the metadata memory footprint.
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
Since the threshold for the first bucket is always zero and the condition to
exit from the loop is never met in the first iteration, it is safe to start
iterating from `1`.
This change is meant to avoid confusing static code analyzers.
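
A sketch of the loop shape in question, using made-up threshold values and a
hypothetical `bucket_for` helper (the real table and function in OCF differ):

```c
#include <assert.h>
#include <stdint.h>

#define BUCKETS 4

/* Illustrative lower bounds of each bucket; the first one is always 0. */
static const uint32_t thresholds[BUCKETS] = { 0, 10, 100, 1000 };

/* Return the index of the last bucket whose threshold is <= value. */
static unsigned bucket_for(uint32_t value)
{
	unsigned i;

	/*
	 * thresholds[0] == 0, so `value < thresholds[0]` can never hold for
	 * an unsigned value. Starting at 1 skips that dead iteration and
	 * makes it obvious to an analyzer that `i - 1` cannot underflow.
	 */
	for (i = 1; i < BUCKETS; i++) {
		if (value < thresholds[i])
			break;
	}

	return i - 1;
}
```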
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
Those names are used for creating allocators. In Linux kernel environment
starting from version 5.12 there is a kernel warning if allocator name
contains spaces. This patch resolves this problem by replacing spaces with
underscores.
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
Cleaning policy is initialized on standby activate, after all the metadata
from primary cache is flushed and the actual recovery is being performed.
Thus initializing it earlier on standby attach is incorrect.
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
conf_meta->core_count is not modified during load/recovery in the latest
version. Thus, in case of an error in cores initialization, in order to
iterate over the initialized cores we must depend on core->added only,
regardless of the conf_meta->core_count value. The for_each_core() macro
does exactly this.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
Fix error code for superblock checksum mismatch.
Superblock validation now returns a proper error on checksum check fail.
Signed-off-by: Krzysztof Majzerowicz-Jaszcz <krzysztof.majzerowicz-jaszcz@intel.com>
Set bit only on core addition and clear it on core removal.
This avoids conf metadata modification in the load / standby load
paths, which effectively prevents issues with metadata mismatch during
subsequent standby activate attempts after an initial activate failure.
Previously the first attempt changed the metadata, so the comparison with
the metadata on the drive failed on any following attempt, leading to an
inability to activate the cache.
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
First try to clean only the mapping. This operation does not require any
rollback, so even if flushing collision fails, core object is still
intact. In case of error we inform user that core was not removed by
returning new error code (-OCF_ERR_CORE_NOT_REMOVED).
After flushing collision succeeds we remove core from metadata and
flush superblock at the end. At that point the core is fully removed
from OCF and even if superblock flush error occurs there is nothing we
can do about it, so we just return the error code.
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
This function must be fixed to work with metadata flapping. Until then
mark it as not supported.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
Signed-off-by: Krzysztof Majzerowicz-Jaszcz <krzysztof.majzerowicz-jaszcz@intel.com>
Superblock can be used during load of other sections, so we need to check
its CRC before other sections are loaded.
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
Signed-off-by: Krzysztof Majzerowicz-Jaszcz <krzysztof.majzerowicz-jaszcz@intel.com>
So far the only resource protected by backfill queue blocking was internal
OCF request queue. Move unblock to backfill io completion to protect also
queue of underlying cache device.
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
Cleaning policy initialization initializes metadata for all cache lines
anyway, so this step is not needed.
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
Introduce a utility that allows parallelizing management operations across
all available io queues.
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
This prevents using up pool of seq numbers in normal mode and blocking
addition of any new cores.
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
The purpose of this change is to avoid writing the superblock to the cache
drive until all other sections are initialized on disk in the attach()
path. Combined with superblock clearing at the earlier stage of
attach(), this ensures there are no residual mappings in the collision
section in case of power failure during attach with pre-existing
metadata.
This is implemented by removing ocf_metadata_flush_all_set_status() step
at the beginning of ocf_metadata_flush_all().
ocf_metadata_flush_all() is called, except for the attach() case described
above, in two cases:
1. at the end of cache load - potentially after cache recovery
2. during detaching cache drive in cache stop.
To make sure there are no regressions in the first case, an explicit
_ocf_mngt_attach_shutdown_status() is added to the load pipeline before
ocf_metadata_flush_all(). The second case is always run after the cache
drive is attached, so the dirty status bit must have already been written to
the disk.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
Because of metadata flapping it is much more complicated to capture those
sections in flight in standby mode, so we read them directly from the cache
volume during the activate.
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
This feature provides double buffering of config sections to prevent
a situation where a power failure during metadata flush leads to partially
updated metadata. The flapping mechanism makes it always possible to perform
a graceful rollback to the previous config metadata content in such a
situation.
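
The general double-buffering technique can be sketched as below. This is a
generic illustration, not the OCF flapping implementation: the generation
counter, the FNV-1a checksum, the in-memory `flap_store` and all function
names are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTION_SIZE 32

/* One of the two copies of a config section. */
struct slot {
	uint64_t generation; /* 0 means "never written" */
	uint32_t checksum;
	uint8_t data[SECTION_SIZE];
};

struct flap_store {
	struct slot slots[2];
};

/* FNV-1a, used here only as a stand-in for a real CRC. */
static uint32_t section_checksum(const uint8_t *data, size_t len)
{
	uint32_t hash = 2166136261u;
	size_t i;

	for (i = 0; i < len; i++)
		hash = (hash ^ data[i]) * 16777619u;

	return hash;
}

static int slot_valid(const struct slot *s)
{
	return s->generation != 0 &&
		s->checksum == section_checksum(s->data, SECTION_SIZE);
}

/* Index of the newest valid slot, or -1 if neither slot is valid. */
static int flap_current(const struct flap_store *st)
{
	int best = -1;
	int i;

	for (i = 0; i < 2; i++) {
		if (!slot_valid(&st->slots[i]))
			continue;
		if (best < 0 ||
		    st->slots[i].generation > st->slots[best].generation)
			best = i;
	}

	return best;
}

/* Write new content to the slot that is NOT current, then it becomes newest. */
static void flap_write(struct flap_store *st, const uint8_t *data)
{
	int cur = flap_current(st);
	struct slot *target = &st->slots[cur < 0 ? 0 : 1 - cur];
	uint64_t gen = cur < 0 ? 1 : st->slots[cur].generation + 1;

	memcpy(target->data, data, SECTION_SIZE);
	target->checksum = section_checksum(data, SECTION_SIZE);
	target->generation = gen;
}
```

Because an update never touches the current slot, a write that is torn by a
power failure only invalidates the new copy, and the reader falls back to the
previous one.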
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
Request submitted in fast path may be freed before the sequential cutoff stats
are updated. Increment request reference counter to prevent it.
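
The pattern is sketched below in a single-threaded form. The `request`
struct, `req_get`/`req_put` and the `freed` flag are hypothetical stand-ins
(real code would use atomic counters and actually free the request); the
point is only the extra reference held across the stats update.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified request with a plain (non-atomic) refcount. */
struct request {
	int refcount;
	bool freed; /* stands in for the memory actually being released */
	unsigned bytes;
};

static unsigned long stats_bytes;

static void req_get(struct request *req)
{
	req->refcount++;
}

static void req_put(struct request *req)
{
	assert(req->refcount > 0);
	if (--req->refcount == 0)
		req->freed = true;
}

/* Stand-in for the fast path: completion drops the submitter's reference. */
static void submit_fast(struct request *req)
{
	req_put(req);
}

static void submit_and_account(struct request *req)
{
	req_get(req); /* keep the request alive across the stats update */

	submit_fast(req);

	/* Without the extra reference the request could be gone by now. */
	assert(!req->freed);
	stats_bytes += req->bytes;

	req_put(req);
}
```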
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
Move the error print to where it belongs, preventing this message from
popping up when the same error code is reported elsewhere for another reason.
Signed-off-by: Rafal Stefanowski <rafal.stefanowski@intel.com>
This patch fixes the issue 988 (and 997) causing a kernel stack
overflow.
Signed-off-by: Krzysztof Majzerowicz-Jaszcz <krzysztof.majzerowicz-jaszcz@intel.com>
To allow the fastest possible switch from passive-standby to active mode, the
runtime metadata must be kept 100% in sync with the metadata on the drive and
in RAM, thus recovery is required after each collision section update.
To avoid a long-lasting recovery of all the cachelines each time the collision
section is updated, the passive update procedure recovers only those
cachelines whose metadata entries are on the updated pages.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
Starting cache in a standby mode requires access to a valid cleaning policy
type. If the policy is stored only in the superblock, it may be overridden by
one of the metadata passive updates.
To prevent losing the information it should be stored in cache's runtime
metadata.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
Initializing the cleaning policy is very time consuming. To reduce the time
required for activating a cache instance, the initialization should be done
during passive start.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
Since part of the recovery is done during `standby init`, the correct shutdown
status has to be set.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
The unsafe mode is useful if the metadata of added cores is incomplete.
Such a scenario is possible when starting the cache in standby mode from
partially valid metadata.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
Make sure all the invalid cachelines have their status bits reset. This makes
it easy to recognize invalid cachelines during populate.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
Recovery during passive start is based on the assumption that the metadata
collision section stored on disk might be partially valid. Resetting this data
would make rebuilding the metadata impossible.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
Decide whether to promote a sequential cutoff stream
to global structures when the threshold is reached.
Signed-off-by: Rafal Stefanowski <rafal.stefanowski@intel.com>
GCC 11 static check finds an array size mismatch and prevents OCF from
compiling correctly. This fixes OpenCAS Linux issue #968.
Signed-off-by: Krzysztof Majzerowicz-Jaszcz <krzysztof.majzerowicz-jaszcz@intel.com>