Min and max values, keept as an explicit number of cachelines, are tightly
coupled with particular cache. This might lead to errors and mismatches after
reattaching cache of different size.
To prevent those errors, min and max should be calculated dynamically.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
Since the request carries an explicit information about number of the
cacheliens to be reparted, no need of keeping the boolean information if some
of the request's cachelines are assigned to a wrong partition
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
Instead of redunant calculating number of cachlines to be reparted, keep this
information in request's info
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
If partition's occupancy limit is reached, cachelines should be evicted from
request's target partition.
Information whether particular partition eviction should be triggered is
carried as a flag by request which triggered eviction.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
Moving metadata implementation out of obsolete metadata_hash.c
to .c files corresponding to function declaration header files.
This requires adding shared header for metadata implementation
metadata_internal.h. Some metadata header files did not have
a corresponding .c file - in this case it is added in this
commit.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
Metadata wrapper functions (calling iface->func) in header
files are changed to be declarations only. Hash interface
implementation functions in metadata_hash.c are given an
external linkage and are renamed to drop "hash" prefix.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
Locks acquired in ocf_metadata_flush(/load)_all are
acquired only for the duration of queueing asynch
service for flush/load, no actual metadata accesses
are performed there.
Also flush/load all are always performed with metadata
marked as deinitialized (metadata reference counter freezed),
so no I/O is reading nor writing the metadata. The only source
of potential concurrent metadata accesses are other management
operations, which should be synchronized using management lock.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
Initializing each stream with unique LBA ensures there are no initial
rbtree collisions, and thus helps to avoid clustering of all the streams
into one big linked list instead of forming performance friendly proper
tree structure.
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
After writing metadata configuration to disk we must
send a flush request to make sure configuration sections
are commited to non-volatile storage.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
Load properties before checking memory needs and obtain cache line size
from context rather than from cache state.
Signed-off-by: Rafal Stefanowski <rafal.stefanowski@intel.com>
Rather then passing whole structs, supply
_ocf_mngt_calculate_ram_needed() with just the values it actually uses.
Signed-off-by: Rafal Stefanowski <rafal.stefanowski@intel.com>
During recovery procedure there is no guarantee that checksums
of runtime sections were flushed correctly before dirty shutdown.
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
WA write must follow follow the same two-pass pattern
as WI does. This change modifies WA engine to default to
WI in case of any miss (either partial or full), not only
partial miss.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
Add second pass of write invalidate. It is necessary only
if concurrent I/O had inserted target LBAs to cache after
WI request did traversation. These LBAs might have been
written by WI request behind the concurrent I/O's back,
resulting in making these sectors effectively invalid.
In this case we must update these sectors' metadata to
reflect this. However we won't know about this after we
traverse the request again - hence calling ocf_write_wi
again with req->wi_second_pass set to indicate that this
is the second pass (core write should be skipped).
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
core_id should be set in this function. The fact that
it is missing might lead to incorrect behaviour e.g. in
case of promotion policy.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
In case of partial hit WO engine first reads data for the entire
request address range from core device. Then it plumbs it by fetching
dirty sectors from cache device.
For unindentified reason this leads to a data corruption in YCSB
workload A. After flushing dirty data and re-loading cache the
data is correct.
This change modifies WO read handler to read clean data from the
cache. This is not optimal, as the clean sectors are now read twice
in case of partial hit. For now it seems to be good enough work-around
for the data corruption problem.
The symptoms, combined with the fact that this change seems to make
the problem go away, indicates that at some point WB write handler
(and/or special I/O request handlers like discard) puts CAS in a
state where in-memory medata wrongly indicates that a sector is
clean while in fact it is dirty, as marked in the on-disk metadata.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
Fail `ocf_mngt_cache_load` function with `OCF_ERR_INVAL`
error code when force flag is in use.
Log error message.
Closes#361
Signed-off-by: Slawomir Jankowski <slawomir.jankowski@intel.com>
ocf_engine_push_req_(front|back) must not dereference req
pointer after putting the request on queue list and unlocking
the queue. At this point handler interface may asynchronously
pick up the request, handle it and deallocate.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
There is no need to constantly hold metadata global lock
during collecting cachelines to clean. Since we are past
freezing dirty request counter, we know for sure that the
number of dirty cache lines will not increase. So the worst
case is when the loop realxes and releases the lock,
a concurent IO to CAS is serviced in WT mode, possibly
inserting and/or evicting cachelines. This does not interfere
with scanning for dirty cachelines. And the lower layer will
handle synchronization with concurrent I/O by acquiring
asynchronous read lock on each cleaned cacheline.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
In current implementation in case of fast media flushning
container may starve all concurrent containers flushing
due to continous rescheduling of offender requests to the
front of I/O queue. Pushing request to the back of IO
queue ensures FIFO handling and removes possibility of
starvation.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
Lower layer is prepared to handle used cachelines by
acquiring asynchronus read lock. It is very likely that
by the time the cacheline is actually cleaned its lock
state changes. So checking the lock at the moment of
constructing dirty cachelines list makes little sense.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
Moving _ocf_mngt_flush_containers outside global metadata
critical section. All this function does is sort core lines
and add queue request.
This fixes stalls reported by Linux scheduler due to
IO threads waiting on global metadata RW semaprhore for
several minutes.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>