Add second pass of write invalidate. It is necessary only
if concurrent I/O had inserted target LBAs to cache after
WI request did traversation. These LBAs might have been
written by WI request behind the concurrent I/O's back,
resulting in making these sectors effectively invalid.
In this case we must update these sectors' metadata to
reflect this. However we won't know about this after we
traverse the request again - hence calling ocf_write_wi
again with req->wi_second_pass set to indicate that this
is the second pass (core write should be skipped).
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
core_id should be set in this function. The fact that
it is missing might lead to incorrect behaviour e.g. in
case of promotion policy.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
In case of partial hit WO engine first reads data for the entire
request address range from core device. Then it plumbs it by fetching
dirty sectors from cache device.
For unindentified reason this leads to a data corruption in YCSB
workload A. After flushing dirty data and re-loading cache the
data is correct.
This change modifies WO read handler to read clean data from the
cache. This is not optimal, as the clean sectors are now read twice
in case of partial hit. For now it seems to be good enough work-around
for the data corruption problem.
The symptoms, combined with the fact that this change seems to make
the problem go away, indicates that at some point WB write handler
(and/or special I/O request handlers like discard) puts CAS in a
state where in-memory medata wrongly indicates that a sector is
clean while in fact it is dirty, as marked in the on-disk metadata.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
Fail `ocf_mngt_cache_load` function with `OCF_ERR_INVAL`
error code when force flag is in use.
Log error message.
Closes#361
Signed-off-by: Slawomir Jankowski <slawomir.jankowski@intel.com>
ocf_engine_push_req_(front|back) must not dereference req
pointer after putting the request on queue list and unlocking
the queue. At this point handler interface may asynchronously
pick up the request, handle it and deallocate.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
There is no need to constantly hold metadata global lock
during collecting cachelines to clean. Since we are past
freezing dirty request counter, we know for sure that the
number of dirty cache lines will not increase. So the worst
case is when the loop realxes and releases the lock,
a concurent IO to CAS is serviced in WT mode, possibly
inserting and/or evicting cachelines. This does not interfere
with scanning for dirty cachelines. And the lower layer will
handle synchronization with concurrent I/O by acquiring
asynchronous read lock on each cleaned cacheline.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
In current implementation in case of fast media flushning
container may starve all concurrent containers flushing
due to continous rescheduling of offender requests to the
front of I/O queue. Pushing request to the back of IO
queue ensures FIFO handling and removes possibility of
starvation.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
Lower layer is prepared to handle used cachelines by
acquiring asynchronus read lock. It is very likely that
by the time the cacheline is actually cleaned its lock
state changes. So checking the lock at the moment of
constructing dirty cachelines list makes little sense.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
Moving _ocf_mngt_flush_containers outside global metadata
critical section. All this function does is sort core lines
and add queue request.
This fixes stalls reported by Linux scheduler due to
IO threads waiting on global metadata RW semaprhore for
several minutes.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
a_req is allocated using env_vmalloc() so we need to free it
using env_vfree(), not env_free().
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
Discard handling splits large request into several steps.
However the actual size of request map for discard was
determined based on original request size, not step request
size, resulting in waste of memory and allocations > 4K.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
512K is the maximum request size for which request map
fits into one page (4K) regardless of cacheline size.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
In case of cleaning metadata used to be flushed only when status of whole cache
line changed to clean.
This patch ensures that metadata flush is triggered after changin status of each
single sector is cache line.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
After second dirty write to cache line which was already dirty, metadata flush
was not triggered. In case of dirty shutdown, this led to data corruption.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
This way debug prints during metadata init phase won't cause crash
(because of the fact that temporary cache object does not have proper
ctx set hence does not have logger obj).
Signed-off-by: Michal Rakowski <michal.rakowski@intel.com>
Change 'OCF_ERR_START_CACHE_FAIL' to 'OCF_ERR_NO_MEM' while CAS fails in case of memory lack on device.
Add new error code for case, when device doesn't satisfy CAS requirements - 'OCF_ERR_INVAL_CACHE_DEV'.
Use 'OCF_ERR_INVAL_CACHE_DEV' in code.
Update error code match in test.
closes#317 issue
Signed-off-by: Ostrokrzew <slawomir.jankowski@intel.com>
To eliminate possibility of allocation error in cache stop, pipeline is
allocated on attach.
Due this change, the only possible non-zero status of ocf_mngt_cache_stop() is
just a warning and cache is always stopped after executing it.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
Cleaner doesn't set core object in req as it works in domain of cache
lines, which may belong to various cores. It this case should retrieve
core object not from the req, but from the map instead.
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
This ensures that cleaner queue will not be changed
by starting another cleaning iteration before we put it.
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
During error handling in WB write insert we didn't invalidate affected
cache lines. Because of that the cache stopped properly (as it's
supposed to), but cache lines were marked as inserted which caused
occupancy stats to increase even though nothing was succesfully
inserted.
Signed-off-by: Jan Musial <jan.musial@intel.com>
When single request to cache was issued, stats updating function was called with
0 bytes as value to update. In case of many request issued to cache, stats were
updated only in case of error.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
This change introduced a race condition. In some code paths after
cacheline trylock failed, hash bucket lock needed bo be upgraded
in order to obtain asynchronous lock. During hash bucket lock
upgrade, hash read locks were released followed by obtaining
hash write locks. After read locks were released, concurrent
thread could obtain hash bucket locks and modify cacheline
state. The thread upgrading hash bucket lock would need to
repeat traversation in order to safely continue.
This reverts commit 30f22d4f47.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
Since we do not expect that incrementing cache's reference counter
during cache init will fail at any condition it is can be changed
to an assert instead of error handling.
Signed-off-by: Michal Rakowski <michal.rakowski@intel.com>
pointer type to array which is 32 characters long;
**core.py**: Add missing import and modify class' field type
to keep consistency;
**ocf_mngt_core**: Remove local variable 'name';
remove env_vmalloc for 'name' - isn't no longer needed;
remove initialization 'name' - as above;
remove env_vfree for context->cfg.name - variable isn't no allocated
in memory;
check if cfg->name exists;
change label in goto from deleted err_name to the closest err_pipeline.
Signed-off-by: Slawomir_Jankowski <slawomir.jankowski@intel.com>
It allows to modify and retrieve particular PP params event if it isn't active
and store values between cache stop and load.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
Cache lock waiters hold cache refcount. Because of that,
if there were some waiters, deinitialization of cache
lock on the last put did never happen and putting the
cache was effectively impossible.
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
Cleaning policy callbacks are typically called with hash buckets or
cache lines locked. However cleaning policies maintain structures
which are shared across multiple cache lines (e.g. ALRU list).
Additional synchronization is required for theese structures to
maintain integrity.
ACP already implements hash bucket locks. This patch is adding
ALRU list mutex.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
Adding locks to assure partition list consistency. Partition
lists is typically modified under cacheline or hash bucket lock.
This is not sufficient synchronization as adding/removing cacheline
from partition list affects neighbouring cachelines state as well.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
Since ocf requires loading cache with the same name as it was stopped, it should
also allow to read name from metadata.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
This change allows to check if specified cache name name is unique. To prevent
adding cache instance with the same name, context lock is acquired until name
isn't set.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
Adding synchronization around metadata collision segment pages.
This part of metadata is modified when cacheline is mapped/unmapped
and when dirty status changes.
Synchronization on page level is required on top of cacheline
and hash bucket locks to assure metadata flush always reads
consistent state when copying entire collision table memory
page.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
Modifying _ocf_mngt_io_class_configure and _ocf_mngt_io_class_remove
to never return -OCF_ERR_IO_CLASS_NOT_EXIST error code. This
return code was ignored by the caller anyway. In _ocf_mngt_io_class_remove
-OCF_ERR_IO_CLASS_NOT_EXIST indicated the IO class is already
removed, which is not an error. In _ocf_mngt_io_class_configure
-OCF_ERR_IO_CLASS_NOT_EXIST indicated empty IO class name, which
is actualy invalid input. This change made it possible to remove
erroneous error handling for -OCF_ERR_IO_CLASS_NOT_EXIST case in
ocf_mngt_cache_io_classes_configure.
This change fixes IO class configuration.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
The ocf_async_lock may be used in atomic context, thus we need
to replace synchronization primitives to non-sleeping variants.
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
Hash bucket read/write lock is sufficient to safely attempt
cacheline trylock/lock. This change removes cacheline lock
global RW semaprhore and moves cacheline trylock/lock under
hash bucket read/write lock respectively.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
This temporarily increases amount of boiler-plate code, but
this is going to be mitigated in the following commits.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
There is one RW lock per hash bucket. Write lock is required
to map cacheline, read lock is sufficient for traversing.
Hash bucket locks are always acquired under global metadata
read lock. This assures mutual exclusion with eviction and
management paths, where global metadata write lock is held.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
Core is not assigned to request in cleaner, so to increase it's stats it has to
be retrieved from mapping.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
Inactive core stats should be caluculated and returned to adapter in unified
from, just like all stats are.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
Since stats builder is implemented for retrieving cache, core and ioclass stats,
adapters should use it instead.
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
Global free cacheline list is divided into a set of freelists, one
per execution context. When attempting to map addres to cache, first
the freelist for current execution context is considered (fast path).
If current execution context freelist is empty (fast path failure),
mapping function attempts to get freelist from other execution context
list (slow path).
The purpose of this change is improve concurrency in freelist access.
It is part of fine granularity metadata lock implementation.
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>