814 Commits

Author SHA1 Message Date
Robert Baldyga
25e2551964 Check core status during recovery based on core metadata
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2022-01-28 19:29:21 +01:00
Robert Baldyga
568c565497 Init properties before loading superblock
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2022-01-28 19:29:21 +01:00
Piotr Debski
9b980d3f22 fix for issue #1023
Better error for core size mismatch during activation/load

adding pyocf test for new error code

Signed-off-by: Piotr Debski <piotr.debski@intel.com>
2022-01-25 05:18:16 +01:00
Robert Baldyga
f4daf05237
Merge pull request #639 from arutk/eha
Fix error handling in cache attach
2022-01-19 15:26:34 +01:00
Robert Baldyga
fb8bea67b6 Set core_seq_no only in atomic mode
This prevents using up pool of seq numbers in normal mode and blocking
addition of any new cores.

Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2022-01-19 11:38:12 +01:00
Adam Rutkowski
a32a787e3d Fix error handling in cache attach
Only close cores in error handling if attach parameter "open_cores" is
set to true.

Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
2022-01-13 17:26:47 +01:00
Michal Mielewczyk
5d74aec921 Add missing return in raw_ram_zero() in error path
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
2022-01-12 07:46:49 +01:00
Adam Rutkowski
294e02bc1b Fail cache recovery in case of erroneous mapping
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
2022-01-10 11:10:02 +01:00
Piotr Debski
609a22cfda added ERROR code for superblock mismatch
Signed-off-by: Piotr Debski <piotr.debski@intel.com>
2022-01-08 23:06:10 +01:00
Adam Rutkowski
9693b82cf9 Only flush superblock at the end of cache attach
The purpose of this change is to avoid writing the superblock to the cache
drive until all other sections are initialized on disk in the attach()
path. Combined with superblock clearing at the earlier stage of
attach(), this ensures there are no residual mappings in the collision
section in case of power failure during attach with pre-existing
metadata.

This is implemented by removing ocf_metadata_flush_all_set_status() step
at the beginning of ocf_metadata_flush_all().
ocf_metadata_flush_all() is called, except for the attach() case described
above, in two cases:
1. at the end of cache load - potentially after cache recovery
2. during detaching cache drive in cache stop.

To make sure there are no regressions in the first case, an explicit
_ocf_mngt_attach_shutdown_status() is added to load pipeline before
ocf_metadata_flush_all(). The second case is always run after the cache
drive is attached, so the dirty status bit must have already been written to
the disk.

Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
2022-01-05 13:06:59 +01:00
Adam Rutkowski
196437f9bc Zero superblock before writing metadata
This is the first step towards atomic initialization of metadata
on cache disk.

Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
2022-01-05 13:06:59 +01:00
Robert Baldyga
c6644116ae
Merge pull request #614 from robertbaldyga/redesign-standby
Redesign failover standby API
2022-01-04 14:07:05 +01:00
Robert Baldyga
4aa3d8f9df Remove "unsafe" path from standby load
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2022-01-03 20:10:40 +01:00
Jan Musial
ae18ce274e Fix cache size requirements and some logging
Signed-off-by: Jan Musial <jan.musial@intel.com>
2022-01-03 14:30:07 +01:00
Robert Baldyga
b40fa0c2bf Fix closing volume on standby stop
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-12-29 20:54:45 +01:00
Robert Baldyga
86a2896bcf Rename "bind" to "standby"
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-12-29 20:32:03 +01:00
Robert Baldyga
b25cd91b86 Remove unused ocf_metadata_load_unsafe()
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-12-29 20:31:43 +01:00
Robert Baldyga
716b5751d6 Redesign failover standby API
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-12-29 20:31:40 +01:00
Robert Baldyga
4cabc60d40 Avoid loading runtime metadata sections during recovery
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-12-29 14:04:19 +01:00
Robert Baldyga
4625763df5 Return error on CRC mismatch during recovery
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-12-29 14:04:19 +01:00
Robert Baldyga
e73cbad2c7
Merge pull request #631 from mmichal10/dont-stop-cleaner
Don't stop cleaner in activate rollback
2021-12-27 16:51:32 +01:00
Robert Baldyga
0ac66ce4aa Fix cache stop after standby detach
Don't attempt to close cache volume if cache is in standby detached state.

Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-12-23 22:39:37 +01:00
Michal Mielewczyk
a8bdba0cb2 Don't stop cleaner in activate rollback
Activate is not responsible for starting cleaner so rollback shouldn't stop it
either.

Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
2021-12-23 14:46:28 +01:00
Bob Chen
b6de614ada fix volume_close completion order
Signed-off-by: Bob Chen <beef9999@qq.com>
2021-12-22 15:18:34 +08:00
Robert Baldyga
a2916313ee
Revert "fix volume_close completion order"
2021-12-21 20:33:34 +01:00
chenbo
aa6e674034 fix volume_close completion order
2021-12-20 20:10:07 +08:00
Robert Baldyga
0751b2c0c0 Fix metadata flapping
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-12-15 22:21:57 +01:00
Robert Baldyga
cac5869406
Merge pull request #603 from robertbaldyga/metadata-flapping
Introduce flapping of metadata config sections
2021-12-15 17:11:15 +01:00
Robert Baldyga
df9a9f2722 Read superblock sections from cache volume during activate
Because of metadata flapping it is much more complicated to capture those
sections in flight in standby mode, so we read them directly from the cache
volume during the activate.

Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-12-15 15:30:34 +01:00
Robert Baldyga
99c8c05f3f Introduce flapping of metadata config sections
This feature provides double buffering of config sections to prevent
situation when power failure during metadata flush leads to partially
updated metadata. Flapping mechanism makes it always possible to perform
graceful rollback to previous config metadata content in such situation.

Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-12-15 15:30:34 +01:00
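
The double buffering described in commit 99c8c05f3f can be illustrated with a small model. This is a sketch under stated assumptions, not OCF code: the class name `FlappingStore`, the header layout (sequence number plus CRC32), and the `fail_after` parameter that simulates a power failure are all hypothetical. The idea shown is only the general one from the commit message: each flush goes to the copy that does not hold the newest valid content, so an interrupted flush leaves the previous content readable.

```python
import struct
import zlib

# Hypothetical header: 64-bit sequence number followed by CRC32 of the payload.
HEADER = struct.Struct("<QI")


class FlappingStore:
    """Toy model of a config section kept in two alternating slots."""

    def __init__(self):
        self.slots = [b"", b""]  # stands in for two on-disk copies

    @staticmethod
    def _decode(raw):
        """Return (seq, payload) if the slot is intact, otherwise None."""
        if len(raw) < HEADER.size:
            return None
        seq, crc = HEADER.unpack_from(raw)
        payload = raw[HEADER.size:]
        if zlib.crc32(payload) != crc:
            return None
        return seq, payload

    def load(self):
        """Pick the intact slot with the highest sequence number."""
        valid = [d for d in map(self._decode, self.slots) if d is not None]
        return max(valid, key=lambda d: d[0]) if valid else None

    def flush(self, payload, fail_after=None):
        """Write to the older (or invalid) slot; fail_after truncates the write."""
        decoded = [self._decode(s) for s in self.slots]
        seqs = [d[0] if d is not None else -1 for d in decoded]
        target = 0 if seqs[0] <= seqs[1] else 1
        raw = HEADER.pack(max(seqs) + 1, zlib.crc32(payload)) + payload
        if fail_after is not None:
            raw = raw[:fail_after]  # simulated power failure mid-write
        self.slots[target] = raw


store = FlappingStore()
store.flush(b"config v1")
store.flush(b"config v2")
store.flush(b"config v3", fail_after=5)  # interrupted flush
recovered = store.load()                 # falls back to the last complete copy
```

In this model the interrupted third flush damages only the slot that held the oldest copy, so `load()` still returns the second version.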
Neil Sun
7f82ef3048 Fix incorrect page count calculation with large PAGE_SIZE
e.g., PAGE_SIZE 65536, cache line 8k.

fix https://github.com/Open-CAS/open-cas-linux/issues/1015

Signed-off-by: Sun Feng <loyou85@gmail.com>
2021-12-14 20:07:59 +08:00
Robert Baldyga
60218759d2
Merge pull request #597 from rafalste/fix_core_zero_size_error
Fix core-zero-size error
2021-12-08 22:04:27 +01:00
Robert Baldyga
21c4673251
Merge pull request #600 from mmichal10/cleaning-cmpl
Call completion if failed to perform cleaning
2021-12-08 22:00:58 +01:00
Michal Mielewczyk
d6f2998890 Call completion if failed to perform cleaning
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
2021-12-08 14:50:50 +01:00
Michal Mielewczyk
911a5cddf0 Deinit all registered volume types
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
2021-12-08 14:16:49 +01:00
Michal Mielewczyk
655f732748 Don't access freed memory
Instead of accessing memory of a freed IO, redo size calculations

Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
2021-12-08 14:16:49 +01:00
Michal Mielewczyk
244712b020 Prevent race condition in fast path
Request submitted in fast path may be freed before the sequential cutoff stats
are updated. Increment request reference counter to prevent it.

Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
2021-12-08 09:00:04 +01:00
Rafal Stefanowski
b57bad4652 Fix core-zero-size error
Move error print to where it belongs, preventing this message from
popping up when the same error code is reported elsewhere for another reason.

Signed-off-by: Rafal Stefanowski <rafal.stefanowski@intel.com>
2021-12-06 12:30:29 +01:00
Adam Rutkowski
b1494f4642 Remove option to failover without detach
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
2021-11-30 15:18:08 +01:00
Adam Rutkowski
b455a393dd extra assertion in metadata passive update
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
2021-11-30 12:04:57 +01:00
Adam Rutkowski
d0b00817f3 fix cacheline reset in passive metadata update
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
2021-11-30 19:34:52 +01:00
Krzysztof Majzerowicz-Jaszcz
133ea307c8 Fix for issues #988 and #997
This patch fixes the issue 988 (and 997) causing a kernel stack
overflow.

Signed-off-by: Krzysztof Majzerowicz-Jaszcz <krzysztof.majzerowicz-jaszcz@intel.com>
2021-11-24 08:15:07 +01:00
Michal Mielewczyk
4ab22ee2dc Maintain runtime struct during failover standby
To allow the fastest switching from the passive-standby to active mode, the
runtime metadata must be kept 100% synced with the metadata on the drive and in
the RAM, thus recovery is required after each collision section update.

To avoid long-lasting recovery of all the cachelines each time the collision
section is updated, the passive update procedure recovers only those
whose MD entries are on the updated pages.

Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
2021-11-19 11:58:09 +01:00
Michal Mielewczyk
a6989d1881 Pio concurrency
2021-11-19 11:58:09 +01:00
Michal Mielewczyk
52824adaaf Additional cleaning policy info outside of the SB
Starting cache in a standby mode requires access to a valid cleaning policy
type. If the policy is stored only in the superblock, it may be overridden by
one of the metadata passive updates.

To prevent losing the information it should be stored in cache's runtime
metadata.

Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
2021-11-19 11:53:48 +01:00
Michal Mielewczyk
0e529479d6 Init cleaner during passive start
Initializing cleaning policy is very time consuming. To reduce the time required
for activating cache instance the initialization should be done during passive
start

Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
2021-11-19 11:53:48 +01:00
Michal Mielewczyk
390e80794d Refactor cleaning policy initialization
Extract cleaning policy initialization to a separate function

Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
2021-11-19 11:53:48 +01:00
Michal Mielewczyk
6d4e6af5b6 Recovery on passive start
Adjust recovery procedure to allow rebuilding metadata from partially valid
metadata

Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
2021-11-19 11:53:48 +01:00
Michal Mielewczyk
11dacd6a84 Set dirty shutdown status on standby init
Since part of the recovery is done during `standby init`, the correct shutdown
status has to be set

Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
2021-11-19 11:53:48 +01:00
Michal Mielewczyk
8f58add152 Lru populate unsafe
The unsafe mode is useful if the metadata of added cores is incomplete.

Such scenario is possible when starting cache to standby mode from partially
valid metadata.

Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
2021-11-19 11:53:48 +01:00
Michal Mielewczyk
4deaa1e133 Reset all the status bits during recovery
Make sure all the invalid cachelines have reset status bits. This makes it
easy to recognize invalid cachelines during populate.

Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
2021-11-19 11:53:48 +01:00
Michal Mielewczyk
fc7c901c8b Skip collision init on cache start passive
Recovery during passive start is based on the assumption that metadata collision
section stored on disk might be partially valid. Resetting this data would make
rebuilding metadata impossible.

Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
2021-11-19 11:53:48 +01:00
Michal Mielewczyk
048bbedd71 Fix metadata_clear_valid_if_clean()
The function should return the cacheline's valid status

Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
2021-11-19 11:53:48 +01:00
Michal Mielewczyk
bb0ff67fe9 Metadata clear_dirty_if_invalid() utility
Fix cacheline's metadata if it is dirty and invalid

Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
2021-11-19 11:53:48 +01:00
Michal Mielewczyk
800190153b Extend lru list API
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
2021-11-19 11:53:48 +01:00
Michal Mielewczyk
5ca1404f06 Fix spelling in the error message
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
2021-11-19 11:53:48 +01:00
Michal Mielewczyk
ccd0abfea5 Add cache line recovery utils to OCF internal API
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
2021-11-19 11:53:48 +01:00
Michal Mielewczyk
a7bdaa751d Add error messages on superblock mismatch
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
2021-11-19 11:53:48 +01:00
Rafal Stefanowski
0b39711b8b Add promote-on-threshold sequential cutoff switch
Decide whether to promote sequential cutoff stream
to global structures when threshold is reached

Signed-off-by: Rafal Stefanowski <rafal.stefanowski@intel.com>
2021-11-09 12:54:15 +01:00
Robert Baldyga
55eda53cde
Merge pull request #582 from jfckm/fix-detach-volume
Deinit cache volume on detach
2021-11-09 13:16:08 +01:00
Jan Musial
55eae1447d Deinit cache volume on detach
Signed-off-by: Jan Musial <jan.musial@intel.com>
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-11-08 18:25:30 +01:00
Robert Baldyga
5cbbfdd5ca
Merge pull request #587 from Open-CAS/gcc_11_fix
Fix for OCL issue #968 - GCC 11 compilation error
2021-11-08 13:34:52 +01:00
Krzysztof Majzerowicz-Jaszcz
99c54be592 Fix for OCL issue #968 - GCC 11 compilation error
GCC 11 static check finds an array size mismatch and prevents OCF from
compiling correctly. This fixes OpenCAS Linux issue #968

Signed-off-by: Krzysztof Majzerowicz-Jaszcz <krzysztof.majzerowicz-jaszcz@intel.com>
2021-11-08 11:39:17 +01:00
Adam Rutkowski
8f080295bb
Merge pull request #585 from rafalste/license_change
Applying BSD 3-Clause license to OCF source
2021-10-29 13:16:00 +02:00
Rafal Stefanowski
f22da1cde7 Fix license
Change license to BSD-3-Clause

Signed-off-by: Rafal Stefanowski <rafal.stefanowski@intel.com>
2021-10-28 13:08:50 +02:00
Rafal Stefanowski
3cc0d07197 License change to be approved by contributors
Signed-off-by: Rafal Stefanowski <rafal.stefanowski@intel.com>
2021-10-27 12:48:20 +02:00
Adam Rutkowski
12c8b4e333
Merge pull request #574 from Open-CAS/passive_api
Disable selected management operations in failover standby mode
2021-10-25 10:11:26 +02:00
Krzysztof Majzerowicz-Jaszcz
71262d5097 Cache standby mode API changes
Error for an invalid cache operation while in passive mode added

Signed-off-by: Krzysztof Majzerowicz-Jaszcz <krzysztof.majzerowicz-jaszcz@intel.com>

Error name correction

Signed-off-by: Krzysztof Majzerowicz-Jaszcz <krzysztof.majzerowicz-jaszcz@intel.com>

API changes for passive cache mode

Moved the passive cache error return source to the api for flush and
set_param

Signed-off-by: Krzysztof Majzerowicz-Jaszcz <krzysztof.majzerowicz-jaszcz@intel.com>

Further API changes for passive cache mode

Signed-off-by: Krzysztof Majzerowicz-Jaszcz <krzysztof.majzerowicz-jaszcz@intel.com>

 Passive api - review changes

Signed-off-by: Krzysztof Majzerowicz-Jaszcz <krzysztof.majzerowicz-jaszcz@intel.com>
2021-10-22 15:10:53 +02:00
Adam Rutkowski
f9fb80b887 Fix conditional valid bit reset
Status bits outside provided mask shall be unchanged.

Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
2021-10-12 22:56:45 +02:00
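
The rule stated in commit f9fb80b887 ("status bits outside provided mask shall be unchanged") can be written as a small bit-mask model. This is a sketch, not the OCF implementation: the function name and its three integer arguments are assumptions made for illustration, with each bit standing for one sector of a cache line.

```python
def clear_valid_if_clean(valid, dirty, mask):
    """Clear valid bits that are inside `mask` and not dirty.

    Bits outside `mask` are returned unchanged, whatever their dirty state.
    All three arguments are hypothetical per-sector bitmaps.
    """
    to_clear = mask & ~dirty   # only clean sectors within the mask
    return valid & ~to_clear   # everything else keeps its previous value


# Sectors 0-3 valid, sectors 0 and 2 dirty, mask covers sectors 0-1.
result = clear_valid_if_clean(0b1111, 0b0101, 0b0011)
```

Here only sector 1 is both inside the mask and clean, so only its valid bit is cleared and `result` is `0b1101`.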
Adam Rutkowski
e2c6a25ee9 [REVERTME] Disable option to perform activate without detach
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
2021-10-08 14:52:32 +02:00
Adam Rutkowski
5ad4d937f6 Failover detach
Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
2021-10-08 14:52:24 +02:00
Adam Rutkowski
48bc6c2f6b Always use async queue kick in management pipeline
Management pipelines tend to consist of multiple asynchronous steps.
Allowing synchronous queue kick results in massive call stacks (e.g.
almost 500 functions deep in case of cache stop). Since async kick
is required anyway, it seems reasonable to switch to async kick
in pipeline implementation.

Signed-off-by: Adam Rutkowski <adam.j.rutkowski@intel.com>
2021-09-17 12:39:09 +02:00
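
The call-stack growth mentioned in commit 48bc6c2f6b can be reproduced with a toy pipeline. This is only a model under stated assumptions: `run_sync` and `run_async` are hypothetical names, and a plain `deque` stands in for a queue that is kicked asynchronously. It contrasts a pipeline whose step completion calls the next step directly with one that defers the next step to a queue.

```python
import inspect
from collections import deque


def run_sync(steps):
    """Each step's completion calls the next step directly (stack grows)."""
    depths = []

    def next_step(i):
        if i == len(steps):
            return
        depths.append(len(inspect.stack()))
        steps[i]()
        next_step(i + 1)  # synchronous continuation nests one frame deeper

    next_step(0)
    return max(depths) - min(depths)


def run_async(steps):
    """Each step's completion only enqueues the next step (stack stays flat)."""
    depths = []
    queue = deque()

    def next_step(i):
        if i == len(steps):
            return
        depths.append(len(inspect.stack()))
        steps[i]()
        queue.append(i + 1)  # deferred continuation, run from the loop below

    queue.append(0)
    while queue:
        next_step(queue.popleft())
    return max(depths) - min(depths)


steps = [lambda: None] * 50          # 50 no-op pipeline steps
sync_growth = run_sync(steps)        # stack depth grows with step count
async_growth = run_async(steps)      # stack depth does not grow
```

With 50 steps, the synchronous variant ends 49 frames deeper than it started, while the queued variant runs every step at the same depth.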
Jan Musial
010f30eeaf Validate activate parameters
Signed-off-by: Jan Musial <jan.musial@intel.com>
2021-09-14 08:56:41 +02:00
Jan Musial
b9c84e331c Fix attach with no cache_line_size specified
Signed-off-by: Jan Musial <jan.musial@intel.com>
2021-09-13 12:52:33 +02:00
Robert Baldyga
076b5995ed Fix metadata_clear_valid_if_clean()
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-09-10 08:12:11 +02:00
Robert Baldyga
7587bec07c
Merge pull request #567 from robertbaldyga/optimize-out-recovery-sector-loop
Optimize out looping over cache line sectors in recovery
2021-09-08 13:57:32 +02:00
Michal Mielewczyk
612f68b3c1 Fix metadata io detection in passive mode
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
2021-09-08 13:33:04 +02:00
Robert Baldyga
1a3843ba12 Little coding style fix
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-09-07 22:54:10 +02:00
Robert Baldyga
1892f58aba Optimize out looping over cache line sectors in recovery
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-09-07 22:54:10 +02:00
Robert Baldyga
65d3e7a41a Introduce ocf_metadata_clear_valid_if_clean()
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-09-07 22:54:10 +02:00
Robert Baldyga
d7c1404f82 Simplify metadata bit function declarations
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-09-07 22:54:10 +02:00
Robert Baldyga
12a82d7fb1 Get rid of struct ocf_cache_line_settings
Remove struct that contains redundant data.

Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-09-07 14:53:46 +02:00
Robert Baldyga
7b38ad205c Add cache activation from passive state
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-09-06 20:44:40 +02:00
Robert Baldyga
cc22c57cb7 Set proper cache pointer in front volumes
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-09-06 13:49:21 +02:00
Robert Baldyga
f96451a698 Introduce ocf_cache_is_passive()
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-09-06 13:49:21 +02:00
Robert Baldyga
c662649f31 Increment metadata refcount on cache front volume io
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-09-06 13:49:21 +02:00
Robert Baldyga
d5bd3fbd78 Free zeroed metadata pages on update in raw_dynamic
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-09-06 13:49:21 +02:00
Robert Baldyga
e23342cb0e Update metadata in passive mode
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-09-06 13:49:21 +02:00
Robert Baldyga
9b3a0c968e Introduce ocf_metadata_passive_update()
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-09-06 13:49:21 +02:00
Robert Baldyga
1fd9a448d4 Introduce passive cache state
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-09-06 13:49:21 +02:00
Robert Baldyga
ee42d9aaaf Duplicate cache name in struct ocf_cache
Cache name is needed for logging in passive mode, when config metadata
is still not accessible.

Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-09-03 17:22:22 +02:00
Robert Baldyga
85e8b414c4 Add ocf_metadata_load_unsafe()
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-09-03 17:22:22 +02:00
Robert Baldyga
ad52a7e2e1 Introduce cache front volume
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-09-03 17:22:22 +02:00
Robert Baldyga
fa8e7564f0 Move ocf_io_get_internal() to private header
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-09-03 17:22:22 +02:00
Robert Baldyga
e31e7283d9 Rework volume type management
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-09-03 17:22:22 +02:00
Robert Baldyga
a2bef43975 Add missing lock in ocf_ctx_get_volume_type_id()
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-09-03 17:22:22 +02:00
Robert Baldyga
a00ec916e2 Make post metadata load init a separate step in pipeline
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-09-03 17:22:22 +02:00
Robert Baldyga
c6c6618ad8 Move recovery code from metadata to cache mngt
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-09-03 17:22:22 +02:00
Robert Baldyga
a2db4d14e8 Move core initialization code from metadata to mngt
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-09-03 17:22:22 +02:00
Robert Baldyga
24728330fc Make _ocf_mngt_load_add_cores a separate step in pipeline
Signed-off-by: Robert Baldyga <robert.baldyga@intel.com>
2021-09-03 17:22:22 +02:00