Increasing the occupancy before the backfill request has completed leads
to incorrect statistics
test_lazy_engine_errors() detected the bug in the following scenario:
1. Rio submitted a read request
2. The read turned out to be a full miss. After a successful read from
the core device, engine_read set the metadata bits to valid and
submitted a backfill request
3. In the meantime, Rio submitted a write request to the same cache lines
4. Since the test uses an error device as the cache (all sectors are
erroneous), the backfill request failed and was redirected to
engine_invalidate
5. engine_invalidate reset the metadata bits to invalid, but since the
cache lines had waiters registered, the occupancy wasn't decremented,
on the assumption that the new owner (in this case, the write request)
would do the bookkeeping
6. Upon receiving cache line locks, the write request detected that
the mapping changed so it was completed with an error without
decrementing the occupancy
7. The incorrect count of cached cache lines then persisted for the
whole lifetime of the cache
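A minimal sketch of the accounting rule the fix implies: bump the
occupancy only once the backfill has actually completed. All names below
are illustrative, not actual OCF code:

    /* Account occupancy at backfill completion rather than at mapping
     * time, so a failed backfill never leaves a phantom increment. */
    struct cache_stats { unsigned long occupancy; };

    static void backfill_complete(struct cache_stats *stats,
            unsigned cline_count, int error)
    {
        if (error)
            return; /* mapping rolled back; occupancy was never bumped */
        stats->occupancy += cline_count; /* data is really cached now */
    }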
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@huawei.com>
After attaching a new cache device, handle all the IOs in Pass-Through
mode until all the d2c requests are completed.
Signed-off-by: Robert Baldyga <robert.baldyga@huawei.com>
The flag isn't reset before retraversation, so it might be true even if
the cache line was reparted in the meantime.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@huawei.com>
Resetting cache_error/core_error in the ocf_req_forward_* functions may
lead to overwriting an already reported error if the forward is done in
a loop.
To avoid this potential problem, introduce a set of forward init
functions intended to be called before the entire forward operation,
which reset the error code and set a forward callback.
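For illustration, the intended call pattern; the names and signatures
below are inferred from this description rather than taken from the OCF
tree:

    /* Init resets the error code and sets the callback once, so the
     * per-iteration forwards can no longer clobber an error reported by
     * an earlier iteration. */
    ocf_req_forward_cache_init(req, forward_complete_cb);
    for (i = 0; i < chunk_count; i++)
        ocf_req_forward_cache_io(req, dir, addr[i], bytes[i], offset[i]);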
Signed-off-by: Robert Baldyga <robert.baldyga@huawei.com>
The fewer atomic variable accesses on the IO path, the better.
Signed-off-by: Roel Apfelbaum <roel.apfelbaum@huawei.com>
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@huawei.com>
Avoid unnecessary code execution in D2C mode.
Avoid multiple req->d2c checks in the normal I/O path.
Signed-off-by: Robert Baldyga <robert.baldyga@huawei.com>
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@huawei.com>
Allow the core volume IOs to be forwarded directly to backend volumes to
avoid unnecessary allocations.
Signed-off-by: Robert Baldyga <robert.baldyga@huawei.com>
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@huawei.com>
Otherwise, it may increase the number of hits while the overall
performance has not improved. This way, the hit rate is more closely
correlated with performance changes.
Signed-off-by: Michael Lyulko <michael.lyulko@huawei.com>
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@huawei.com>
Now the request can be pushed to a high priority queue (instead of
ocf_queue_push_req_front) or to a low priority queue (instead of
ocf_queue_push_req_back).
Both functions were merged into one (ocf_queue_push_req), and instead of
the allow_sync parameter there is now a flags parameter that can be an
OR combination of OCF_QUEUE_ALLOW_SYNC and OCF_QUEUE_PRIO_HIGH.
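For illustration, the call-site change this implies (exact signatures
assumed from the description above):

    /* Before: two functions, allow_sync as a bool parameter */
    ocf_queue_push_req_front(req, true);
    ocf_queue_push_req_back(req, true);

    /* After: one function, flags instead of allow_sync */
    ocf_queue_push_req(req, OCF_QUEUE_ALLOW_SYNC | OCF_QUEUE_PRIO_HIGH);
    ocf_queue_push_req(req, OCF_QUEUE_ALLOW_SYNC); /* low priority */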
Signed-off-by: Ian Levine <ian.levine@huawei.com>
Signed-off-by: Robert Baldyga <robert.baldyga@huawei.com>
The HB lock takes the inclusive metadata lock, which is also taken by
metadata flush, so calling metadata flush under the HB lock attempts to
take this lock recursively. In that case, if some other thread tried to
take the exclusive metadata lock in the meantime, the inner inclusive
lock would block (because the lock preserves ordering) while the outer
inclusive lock was still held, leading to a deadlock.
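The interleaving, spelled out (thread and lock names are illustrative):

    /* T1: lock_shared(metadata)     <- HB lock
     * T2: lock_exclusive(metadata)  <- blocks, gets queued
     * T1: lock_shared(metadata)     <- metadata flush under HB lock;
     *                                  queued behind T2 (order is kept)
     * T1 waits behind T2, T2 waits for T1's outer shared lock: deadlock */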
Signed-off-by: Robert Baldyga <robert.baldyga@huawei.com>
There are situations when we can end up in engine_pt with cache lines
locked for write. One example is engine_rd falling back to engine_pt
after a failure during cache line preparation, where the write lock has
already been taken. To handle this situation properly, unlock the
request using the more general unlock function.
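Sketched as a diff; treating ocf_req_unlock as the direction-agnostic
variant is an assumption based on this description:

    -    ocf_req_unlock_rd(c, req);  /* wrong when lines are write-locked */
    +    ocf_req_unlock(c, req);     /* handles read and write locks alike */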
Signed-off-by: Robert Baldyga <robert.baldyga@huawei.com>
Remove one callback indirection level. I/O never changes its direction,
so there is no point in storing both read and write callbacks for each
request.
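Illustrative before/after; the structure and field names below are
assumptions, not the actual OCF definitions:

    struct ocf_request;

    /* Before: both directions carried for every request */
    struct io_if_sketch {
        int (*read)(struct ocf_request *req);
        int (*write)(struct ocf_request *req);
    };

    /* After: direction is fixed at submit time, one handler suffices */
    struct req_sketch {
        int (*engine_handler)(struct ocf_request *req);
    };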
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
In most (6/9) instances across the engines,
ocf_core_stats_cache_error_update is called upon each cache volume I/O
error, possibly multiple times per user request in the case of
multi-cacheline requests. The backfill, fast and read engines are
exceptions, incrementing the error stats only once per user request.
This commit unifies ocf_core_stats_cache_error_update usage so that in
all the engines the error statistic is incremented once for every error.
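A sketch of the unified rule; ocf_core_stats_cache_error_update is named
above, but its signature and this call site are assumptions:

    /* Increment the error stat once per failed cache volume I/O, in the
     * completion path of each engine. */
    static void cache_io_complete(struct ocf_request *req, int error)
    {
        if (error)
            ocf_core_stats_cache_error_update(req->core, req->rw);
        /* ... remaining completion handling ... */
    }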
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
Flush I/O should be forwarded to both the core and the cache device. In
the case of the core this is simple - just mirror the I/O from the top
volume. Since the cache data is owned by OCF, it makes sense to send a
simple flush I/O with 0 address and size.
The current implementation attempts to use the cache data I/O interface
(the ocf_submit_cache_reqs function) instead of submitting an empty
flush to the underlying cache device. This function is designed to
read/write from mapped cachelines, while no traversation/mapping is
performed on flush I/O.
If the request map allocation succeeds, this results in sending I/O to
address 0 with size and flags inherited from the top adapter I/O. This
doesn't make any sense, and can even result in an invalid I/O if the
size is greater than the cache device size.
Even worse, if the flush request map allocation fails (which always
happens in the case of large flush requests), the erroneous call to
ocf_submit_cache_reqs results in a NULL pointer dereference.
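A sketch of the intended behavior; ocf_volume_submit_flush is OCF's
volume flush entry point, but the io construction below is an
assumption:

    /* Empty flush (address 0, size 0) sent straight to the cache volume,
     * bypassing ocf_submit_cache_reqs and the request map entirely. */
    struct ocf_io *io = ocf_new_cache_io(cache, queue, 0, 0, OCF_WRITE, 0, 0);
    if (io) {
        ocf_io_set_cmpl(io, req, NULL, flush_complete);
        ocf_volume_submit_flush(io);
    }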
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
So far the only resource protected by the backfill queue blocking was
the internal OCF request queue. Move the unblock to backfill I/O
completion to also protect the queue of the underlying cache device.
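Illustrative placement change; backfill_queue_unblock is a hypothetical
name for the unblocking step:

    static void backfill_io_complete(struct ocf_request *req, int error)
    {
        /* previously done right after queueing the OCF request; doing it
         * here extends the blocking to the cache device queue as well */
        backfill_queue_unblock(req->cache);
        /* ... */
    }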
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>