Signed-off-by: Avi Halaf <avi.halaf@huawei.com>
Signed-off-by: Robert Baldyga <robert.baldyga@huawei.com>
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@huawei.com>
This avoids unnecessary map allocation and initialization of unused fields of
the request structure. It also allows tracking their number separately from
the regular requests.
Signed-off-by: Robert Baldyga <robert.baldyga@huawei.com>
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@huawei.com>
Eliminate the need to resolve the cache based on the queue. This allows sharing
the queue between cache instances. The queue still holds a pointer to
the cache that owns it, but no management or I/O path relies on the
queue -> cache mapping.
Signed-off-by: Robert Baldyga <robert.baldyga@huawei.com>
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@huawei.com>
The management path does not benefit much from mpools, as the number of requests
it allocates is very small. It is also less restrictive (mngt_queue does not have
single-CPU affinity), so avoiding mpool usage in the management path allows
introducing additional restrictions on the mpool, leading to I/O performance
improvement.
Signed-off-by: Robert Baldyga <robert.baldyga@huawei.com>
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@huawei.com>
The user is supposed to deinit/destroy it.
Signed-off-by: Robert Baldyga <robert.baldyga@huawei.com>
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@huawei.com>
In a situation when all the shards finish their work before the parallelize
loop does its final loop condition check, which involves access to the
parallelize object, the parallelize object may be deinitialized before this
final access.
Increasing the refcount by 1 before running parallelize and decreasing it
only after the loop is finished addresses this problem.
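A minimal C sketch of the pattern (illustrative only; the names and helpers
are not the actual OCF API):

    #include <stdatomic.h>
    #include <stdlib.h>

    struct parallelize {
        atomic_int remaining;   /* running shards + the caller's guard ref */
        /* ... per-shard state ... */
    };

    static void parallelize_put(struct parallelize *p)
    {
        /* the last reference frees the object */
        if (atomic_fetch_sub(&p->remaining, 1) == 1)
            free(p);
    }

    static void parallelize_run(struct parallelize *p, int shards_cnt)
    {
        /* +1 is the guard reference held by this function itself */
        atomic_store(&p->remaining, shards_cnt + 1);

        for (int i = 0; i < shards_cnt; i++) {
            /* submit shard i; each shard calls parallelize_put() when done */
        }

        /* any loop-condition access to 'p' above was safe because the guard
         * reference was still held; drop it only once the loop is finished */
        parallelize_put(p);
    }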
Signed-off-by: Robert Baldyga <robert.baldyga@huawei.com>
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@huawei.com>
In some scenarios running the exact number of shards, regardless of the
number of available queues, is crucial for correctness of the operation.
Signed-off-by: Robert Baldyga <robert.baldyga@huawei.com>
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@huawei.com>
Otherwise, it may increase the number of hits while the overall performance
is not improved. This way, the hit rate is better correlated with
the performance changes.
Signed-off-by: Michael Lyulko <michael.lyulko@huawei.com>
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@huawei.com>
Queues can be created and destroyed dynamically at any point in
the cache lifetime, and this can happen from different execution contexts,
so the queue_list needs to be protected with a lock.
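A rough sketch of the intended locking (the lock and field names here are
illustrative, not necessarily the actual OCF ones):

    /* queue creation may race with queue destruction from another context,
     * so every queue_list manipulation happens under the lock */
    env_spinlock_lock(&cache->io_queues_lock);
    list_add(&queue->list, &cache->io_queues);
    env_spinlock_unlock(&cache->io_queues_lock);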
Signed-off-by: Robert Baldyga <robert.baldyga@huawei.com>
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@huawei.com>
Previously every created queue was added to the io_queues list, which
caused the mngt_queue to be used in ocf_parallelize. Change the mngt_queue
creation API so that the mngt_queue is not added to the list and does not
have unnecessary functionality initialized.
Signed-off-by: Robert Baldyga <robert.baldyga@huawei.com>
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@huawei.com>
The completion function should be the same whether it is called from
the queue context or from the current context.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@huawei.com>
Commit db6b009ef introduced changes in managing the master request life cycle,
but apparently not all paths have been updated. This change removes a redundant
ocf_req_get() before sending the request into a queue.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@huawei.com>
When flushing a request, the number of cache reads is unknown until all cache
lines are locked and the IOs are actually submitted.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@huawei.com>
Now the request can be pushed to a high priority queue (instead of using
ocf_queue_push_req_front) or to a low priority queue (instead of
ocf_queue_push_req_back).
Both functions were merged into one function (ocf_queue_push_req), and instead
of the allow_sync parameter there is now a flags parameter that can be an OR
combination of OCF_QUEUE_ALLOW_SYNC and OCF_QUEUE_PRIO_HIGH.
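Example usage after this change (a sketch based on the names above):

    /* previously: ocf_queue_push_req_front(req, true); */
    ocf_queue_push_req(req, OCF_QUEUE_ALLOW_SYNC | OCF_QUEUE_PRIO_HIGH);

    /* previously: ocf_queue_push_req_back(req, true); */
    ocf_queue_push_req(req, OCF_QUEUE_ALLOW_SYNC);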
Signed-off-by: Ian Levine <ian.levine@huawei.com>
Signed-off-by: Robert Baldyga <robert.baldyga@huawei.com>
This functionality is used by cleaning policies via cmpl_queue
to reschedule the completion, so that we avoid unlocking a mutex in
the cleaner completion from the interrupt context of an I/O completion.
This reverts commit 1e5eda68a7.
Signed-off-by: Robert Baldyga <robert.baldyga@huawei.com>
The flush_data is used by ocf_cleaner_do_flush_data_async(), which means
that callers of ocf_cleaner_fire() are now expected to guarantee that
entries are returned by the getter in sorted order. Currently the only case
when ocf_cleaner_fire() is called directly is request cleaning, and
the request map is sorted by definition.
Signed-off-by: Robert Baldyga <robert.baldyga@huawei.com>
The majority of management operations should be blocked for a detached cache,
although adding and removing cores should still be possible.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@huawei.com>
Cache stop and cache detach were already sharing contexts implicitly, which
allowed reusing some functions in both pipelines. However, changing the context
structs could lead to non-obvious bugs.
To prevent such errors both methods now share the context structure explicitly.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@huawei.com>
The 'stop_pipeline' field may be reused during the cache lifetime (e.g. when the
cache is detached and attached again - the pipeline would be freed and then
re-allocated). Calling the completion after detach before freeing the pipeline
may lead to a race condition.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@huawei.com>
The cache mngt lock cannot be unlocked from the I/O completion context (which is
potentially atomic context), as unlocking it may involve sleeping operations.
Modify the cleaner utility to support rescheduling to queue context before
calling the completion. Update cleaning policies to use that option.
Signed-off-by: Robert Baldyga <robert.baldyga@huawei.com>
The HB lock takes the inclusive metadata lock, which is also taken by metadata
flush, so calling metadata flush under the HB lock attempts to take this lock
recursively. In that case, if in the meantime some other thread tries to take
the exclusive metadata lock, the inner inclusive lock blocks (because the lock
keeps ordering), with the outer inclusive lock still held, leading to
a deadlock.
Signed-off-by: Robert Baldyga <robert.baldyga@huawei.com>
There are situations when we can end up in engine_pt with cache lines
locked for write. One example is engine_rd falling back to engine_pt after
a failure during cache line preparation, where the write lock has already been
taken. To handle this situation properly, unlock the request using the more
general unlock function.
Signed-off-by: Robert Baldyga <robert.baldyga@huawei.com>
There is an issue when someone calls parallelize/pipeline
with a struct that is aligned (say to 64B),
but these APIs add their own data right before
the user's private data.
As a result, the user's data is no longer aligned,
which might cause a segfault in some cases.
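A self-contained illustration of the problem (not OCF code; sizes and names
are made up):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct util_hdr {               /* data the utility prepends */
        void *cb;
        uint64_t state;             /* 16 B in total */
    };

    struct user_ctx {               /* the user expects 64 B alignment */
        char data[64];
    } __attribute__((aligned(64)));

    int main(void)
    {
        /* header and priv are carved out of one allocation, so even though
         * the block itself is 64 B aligned, priv starts at offset 16 */
        void *block = aligned_alloc(64, 128);
        struct user_ctx *priv =
            (struct user_ctx *)((char *)block + sizeof(struct util_hdr));

        printf("priv %% 64 = %zu\n", (size_t)((uintptr_t)priv % 64));
        /* aligned SIMD stores (e.g. vmovdqa) through such a pointer fault;
         * padding the priv offset up to the user's alignment fixes it */
        free(block);
        return 0;
    }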
Signed-off-by: Amir Haroush <amir.haroush@huawei.com>
Signed-off-by: Shai Fultheim <shai.fultheim@huawei.com>
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
Because the context has one field which is aligned to 64B
(struct ocf_volume cache_volume), the compiler uses vmovdqa (aligned)
instead of vmovdqu (unaligned). In reality the address is not 64B aligned -
it ends with 0x8 - so we get this segfault.
Signed-off-by: Amir Haroush <amir.haroush@huawei.com>
Signed-off-by: Shai Fultheim <shai.fultheim@huawei.com>
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
With a high dirty ratio and occupancy, OCF might be unable to map cache lines
for new requests and thus pass the I/O through to the core devices. IOPS will
drop afterwards. We need to control the dirty ratio.
The existing `alru' policy gives the user the chance to control the stale buffer
time, activity threshold etc. These can affect the dirty ratio of the cache
device, but only in a more or less empirical manner. Introducing
`max_dirty_ratio' makes it explicit.
At first glance, it might be better to implement a dedicated cleaner policy
directly targeting a dirty ratio goal, so that the `alru' parameters remain
orthogonal. But on the other hand, we still need to flush dirty cache
lines periodically, instead of just keeping a watermark of the dirty ratio.
This indicates that the existing `alru' parameters would still be required if we
developed a new policy, so it seems reasonable to make it a parameter instead.
To sum up, this patch does the following:
- adds a `max_dirty_ratio' parameter with default value 100;
- with the default value 100, the `alru' cleaner is identical to what it was;
- with a value N less than 100, the cleaner (when woken up) will actively
bring the dirty ratio down to N, regardless of staleness time (see the sketch
below).
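A rough sketch of the check the cleaner could perform when it wakes up
(illustrative C with made-up names, not the actual alru code):

    #include <stdbool.h>
    #include <stdint.h>

    /* keep cleaning, regardless of staleness time, while the dirty ratio
     * (in percent) is above the configured maximum */
    static bool over_max_dirty_ratio(uint64_t dirty_lines, uint64_t total_lines,
                                     uint32_t max_dirty_ratio /* 0-100 */)
    {
        /* with the default of 100 this never triggers on its own,
         * so the existing alru behaviour is preserved */
        return dirty_lines * 100 > total_lines * max_dirty_ratio;
    }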
Signed-off-by: David Lee <live4thee@gmail.com>
Don't populate cleaning policies during the initialization procedure, so the
user has to call populate explicitly.
Until now cleaning policies could be populated in two ways:
- implicitly during cleaning policy initialization,
- explicitly by calling populate.
The difference was that the former was single threaded.
This patch removes the functionally redundant and less efficient code.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
The function not only recovers cleaning policy metadata but is also used
to initialize data structures, so a more generic name is actually more accurate.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
Initializing metadata in an asynchronous manner will allow using
parallelization utilities in future commits.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
Normally the cleaning policy would be deinitialized during cache stop, which is
one of the steps of error handling, e.g. in case of failed cache activation. But
since `cache_stop()` may be called only for an attached cache instance, the
cleaning policy needs to be deinitialized explicitly.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
Remove one level of callback indirection. An I/O never changes its direction,
so there is no point in storing both read and write callbacks for each
request.
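A small illustration of the simplified scheme (made-up names, not the OCF
structures):

    #include <stdio.h>

    struct request;
    typedef void (*engine_cb)(struct request *req);

    struct request {
        int rw;                    /* direction, fixed at request creation */
        engine_cb engine_handler;  /* single callback, resolved once */
    };

    static void read_cb(struct request *req)  { (void)req; puts("read");  }
    static void write_cb(struct request *req) { (void)req; puts("write"); }

    int main(void)
    {
        struct request req = { .rw = 1 };
        /* the direction never changes, so there is no need to store both
         * callbacks in the request and pick one on every call */
        req.engine_handler = req.rw ? write_cb : read_cb;
        req.engine_handler(&req);
        return 0;
    }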
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
In most (6/9) instances across engines ocf_core_stats_cache_error_update
is called upon each cache volume I/O error, possibly multiple times
per user request in case of multi-cacheline requests. The backfill,
fast and read engines are exceptions, incrementing error stats only
once per user request.
This commit unifies ocf_core_stats_cache_error_update usage so that
in all the engines the error statistic is incremented once for every
error.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
It is wasteful to allocate a full 1B to store 1 bit of
alock status per cacheline. A fixed allocation of 128 bits
seems more reasonable.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
1. Only 1 bit per cacheline is required for the status
2. ... however the size must be 8B aligned
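Illustrative size calculation (a sketch; `lines' stands for the number of
cachelines in the request):

    /* 1 bit of status per cacheline, rounded up to whole bytes,
     * then rounded up again to an 8 B multiple */
    size_t status_bytes = DIV_ROUND_UP(lines, 8);
    size_t alloc_bytes  = DIV_ROUND_UP(status_bytes, 8) * 8;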
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
The metadata capacity reported by dmesg was actually the memory footprint.
The proper metadata size is now reported.
Signed-off-by: Krzysztof Majzerowicz-Jaszcz <krzysztof.majzerowicz-jaszcz@intel.com>
The optional uuid parameter to ocf_volume_init() points to a UUID object
initialized by the user. We should verify it is not excessively large,
as we attempt to allocate a buffer to store a copy of the UUID.
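A minimal sketch of the check (the limit constant is illustrative):

    /* reject an oversized UUID before allocating a buffer for the copy */
    if (uuid->size > OCF_VOLUME_UUID_MAX_SIZE)
        return -OCF_ERR_INVAL;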
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
The proper way to avoid calling on_deinit() callback on an already
deinitialized volume is to deinitialize type callbacks, as it is done
in the previous commit.
This reverts commit a7f70687a9.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
After deinitialization of a volume there is no need to call back to the
type ops. Currently we would erroneously call the on_deinit() callback
multiple times if ocf_volume_deinit() is performed more than once,
which we expect to happen and treat as a correct use of the API.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
ocf_metadata_flush_superblock() is being called on the cache stop, after
deinitialization of the cores (and their volumes), thus accessing core
volume in superblock flushing procedure leads to use-after-free bug.
Fix this by moving volume type setting to the core insertion code.
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
After moving from a volume, its priv is assigned to the new owner.
Destroying the volume after moving from it must not attempt to use the
priv, especially not to deinit member volumes in case of a composite
volume.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
Volumes are now exposed in the OCF API, and we should gracefully handle an
attempt to open an already opened volume (instead of ENV_BUG).
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
It makes it possible to attach/load cache using volume types that have
non-standard constructors.
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
Flush I/O should be forwarded to the core and cache devices. In case of the core
this is simple - just mirror the I/O from the top volume. Since
the cache data is owned by OCF it makes sense to send a simple flush I/O
with 0 address and size.
The current implementation attempts to use the cache data I/O interface
(the ocf_submit_cache_reqs function) instead of submitting an empty flush to
the underlying cache device. This function is designed to read/write
from mapped cachelines, while no traversal/mapping is
performed on flush I/O.
If the request map allocation succeeds, this results in sending I/O to
address 0 with size and flags inherited from the top adapter I/O.
This doesn't make any sense, and can even result in invalid I/O if the
size is greater than the cache device size.
Even worse, if the flush request map allocation fails (which always happens
in case of large flush requests), then the erroneous call to
ocf_submit_cache_reqs results in a NULL pointer dereference.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
Right now alock assumes that the number of locks taken will equal the number of
core lines. This is not the case in pio, where only parts of the metadata
are under locks. If a pio request overlaps locked and non-locked metadata
sections, its core line count and its awaited lock count will differ.
To remedy this discrepancy, an additional method which gets the count of
locks that will be taken/waited on is added to the alock API.
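A sketch of the added op (hypothetical names, not necessarily the final API):

    #include <stdint.h>

    struct ocf_alock;
    struct ocf_request;

    struct ocf_alock_lock_cbs {
        /* ...existing lock/unlock ops... */

        /* number of entries that will actually be locked or waited on for
         * this request; pio counts only the covered metadata sections,
         * while the default implementation returns the core line count */
        uint32_t (*get_entries_count)(struct ocf_alock *alock,
                                      struct ocf_request *req);
    };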
Signed-off-by: Jan Musial <jan.musial@intel.com>
It's required, because environments other than the Linux kernel may not define
their own DIV_ROUND_UP. Moving it to env would just generate boilerplate,
because its implementation is trivial and portable.
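The usual definition, for reference:

    /* classic integer round-up division */
    #define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))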
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
This allows avoiding allocation of the cleaner metadata section, effectively
saving up to 20% of the metadata memory footprint.
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
Since the threshold for the first bucket is always zero and the condition to
exit from the loop is never met in the first iteration, it is safe to start
iterating from `1`.
This change is meant to avoid confusing static code analyzers.
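Roughly, the shape of such a loop (illustrative):

    /* threshold[0] is always 0, so the break can never fire when i == 0;
     * starting from 1 is equivalent and keeps static analyzers quiet */
    for (i = 1; i < buckets_cnt; i++) {
        if (value < threshold[i])
            break;
    }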
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
These names are used for creating allocators. In the Linux kernel environment,
starting from version 5.12, there is a kernel warning if an allocator name
contains spaces. This patch resolves this problem by replacing spaces with
underscores.
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
Cleaning policy is initialized on standby activate, after all the metadata
from primary cache is flushed and the actual recovery is being performed.
Thus initializing it earlier on standby attach is incorrect.
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
conf_meta->core_count is not modified during load/recovery in the latest
version. Thus, in case of an error during core initialization, in order to
iterate over the initialized cores we must depend on core->added only,
regardless of the conf_meta->core_count value. The for_each_core() macro does
exactly this.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
Fix the error code for superblock checksum mismatch.
Superblock validation now returns a proper error on checksum check failure.
Signed-off-by: Krzysztof Majzerowicz-Jaszcz <krzysztof.majzerowicz-jaszcz@intel.com>
Set the bit only on core addition and clear it on core removal.
This allows avoiding conf metadata modification in the load / standby load
paths, which effectively prevents issues with metadata mismatch during
subsequent standby activate attempts after an initial activate failure.
Previously the first attempt changed the metadata, so the comparison with the
metadata on the drive failed on any following attempt, leading to inability
to activate the cache.
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>