Installing a Ceph cluster with ceph-ansible is straightforward, but you may still run into errors along the way. Here are some tips that I hope will help.

Reinstall Ceph Cluster

If the installation fails, you can purge it and reinstall the Ceph cluster.

root@openstack-staging:/home/kevin/ceph-ansible# ansible-playbook infrastructure-playbooks/purge-cluster.yml
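The full cycle looks roughly like the sketch below. The inventory file name (hosts) and the site-container.yml playbook are assumptions based on a containerized ceph-ansible setup; adjust them to whatever you copied from the .sample files in your checkout.

# Purge the existing cluster (ceph-ansible also ships infrastructure-playbooks/purge-container-cluster.yml for containerized clusters)
root@openstack-staging:/home/kevin/ceph-ansible# ansible-playbook -i hosts infrastructure-playbooks/purge-cluster.yml
# Redeploy once the purge has finished
root@openstack-staging:/home/kevin/ceph-ansible# ansible-playbook -i hosts site-container.yml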

HEALTH_WARN “mons are allowing insecure global_id reclaim”

** Make sure all clients have been upgraded before running the following command, otherwise those clients will be blocked once this setting is applied. **

This warning is related to CVE-2021-20288 (insecure global_id reclaim): there is a security issue that requires clients to be upgraded to the fixed releases. Once all the clients are updated (e.g. the rook daemons and csi driver), a new setting needs to be applied to the cluster to disable the insecure mode.

If you see both of these health warnings, then either one of the rook or csi daemons has not been upgraded yet, or some other client on an older version has been detected:

health: HEALTH_WARN
client is using insecure global_id reclaim
mon is allowing insecure global_id reclaim

If you only see this one warning, then the insecure mode should be disabled:

health: HEALTH_WARN
mon is allowing insecure global_id reclaim

Please make sure all clients connected to Ceph have been upgraded before running this command; otherwise, you can leave the setting as it is.

ceph config set mon auth_allow_insecure_global_id_reclaim false
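If some clients cannot be upgraded right away, you can first identify them and temporarily mute the warning instead of disabling the insecure mode immediately. A small sketch; the one-week mute duration is just an example:

# List the clients that are still using insecure global_id reclaim
root@openstack-ceph01:/home/kevin# ceph health detail
# Temporarily silence the mon-side warning while the remaining clients are upgraded
root@openstack-ceph01:/home/kevin# ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1w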

Add docker repository failed

TASK [ceph-container-engine : add docker repository] *****************************************************************************************************
Monday 21 June 2021 06:32:58 +0000 (0:00:01.319) 0:10:50.642 ***********
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: apt_pkg.Error: E:Conflicting values set for option Signed-By regarding source https://download.docker.com/linux/ubuntu/ focal: /usr/share/keyrings/docker-archive-keyring.gpg != , E:The list of sources could not be read.
fatal: [openstack-ceph01]: FAILED! => changed=false
module_stderr: |-
Traceback (most recent call last):
File "<stdin>", line 102, in <module>
File "<stdin>", line 94, in _ansiballz_main
File "<stdin>", line 40, in invoke_module
File "/usr/lib/python3.8/runpy.py", line 207, in run_module
return _run_module_code(code, init_globals, run_name, mod_spec)
File "/usr/lib/python3.8/runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/tmp/ansible_apt_repository_payload__2_ke2d5/ansible_apt_repository_payload.zip/ansible/modules/apt_repository.py", line 604, in <module>
File "/tmp/ansible_apt_repository_payload__2_ke2d5/ansible_apt_repository_payload.zip/ansible/modules/apt_repository.py", line 581, in main
File "/usr/lib/python3/dist-packages/apt/cache.py", line 170, in __init__
self.open(progress)
File "/usr/lib/python3/dist-packages/apt/cache.py", line 232, in open
self._cache = apt_pkg.Cache(progress)
apt_pkg.Error: E:Conflicting values set for option Signed-By regarding source https://download.docker.com/linux/ubuntu/ focal: /usr/share/keyrings/docker-archive-keyring.gpg != , E:The list of sources could not be read.
module_stdout: ''
msg: |-
MODULE FAILURE
See stdout/stderr for the exact error
rc: 1

The workaround is to remove /etc/apt/sources.list.d/docker.list on the Ceph nodes and then rerun the playbook.

root@openstack-ceph01:/home/kevin# rm /etc/apt/sources.list.d/docker.list
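Before (or after) removing the file, you can confirm which apt source entries conflict and then refresh the package index. A quick sketch; file names may differ on your nodes:

# Find every apt source entry that points at the Docker repository
root@openstack-ceph01:/home/kevin# grep -r download.docker.com /etc/apt/sources.list /etc/apt/sources.list.d/
# After removing the duplicate entry, refresh the package index before rerunning the playbook
root@openstack-ceph01:/home/kevin# apt-get update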

Failed to download ceph grafana dashboards file

This happened because my company firewall blocked the traffic. The workaround is to manually download the dashboard files, put them into the folder /etc/grafana/dashboards/ceph-dashboard, comment out the task, and then rerun the ansible playbook (a download-and-copy sketch follows the commented-out task below). Make sure to copy all of these files to the other two Ceph nodes as well.

TASK [ceph-grafana : download ceph grafana dashboards] *****************************************************************************************************
Monday 21 June 2021 06:37:51 +0000 (0:00:01.068) 0:03:57.615 ***********
failed: [openstack-ceph02] (item=ceph-cluster.json) => changed=false
ansible_loop_var: item
dest: /etc/grafana/dashboards/ceph-dashboard/ceph-cluster.json
elapsed: 40
item: ceph-cluster.json
msg: 'Request failed: <urlopen error timed out>'
url: https://raw.githubusercontent.com/ceph/ceph/master/monitoring/grafana/dashboards/ceph-cluster.json
failed: [openstack-ceph01] (item=ceph-cluster.json) => changed=false
ansible_loop_var: item
dest: /etc/grafana/dashboards/ceph-dashboard/ceph-cluster.json
elapsed: 40
item: ceph-cluster.json
msg: 'Request failed: <urlopen error timed out>'
url: https://raw.githubusercontent.com/ceph/ceph/master/monitoring/grafana/dashboards/ceph-cluster.json
failed: [openstack-ceph03] (item=ceph-cluster.json) => changed=false
ansible_loop_var: item
dest: /etc/grafana/dashboards/ceph-dashboard/ceph-cluster.json
elapsed: 40
item: ceph-cluster.json
msg: 'Request failed: <urlopen error timed out>'
url: https://raw.githubusercontent.com/ceph/ceph/master/monitoring/grafana/dashboards/ceph-cluster.json
failed: [openstack-ceph02] (item=cephfs-overview.json) => changed=false
ansible_loop_var: item
dest: /etc/grafana/dashboards/ceph-dashboard/cephfs-overview.json
elapsed: 40
item: cephfs-overview.json
msg: 'Request failed: <urlopen error timed out>'

Comment out the task

root@openstack-staging:/home/kevin/ceph-ansible# vim ./roles/ceph-grafana/tasks/configure_grafana.yml
#- name: download ceph grafana dashboards
# get_url:
# url: "https://raw.githubusercontent.com/ceph/ceph/{{ grafana_dashboard_version }}/monitoring/grafana/dashboards/{{ item }}"
# dest: "/etc/grafana/dashboards/ceph-dashboard/{{ item }}"
# with_items: "{{ grafana_dashboard_files }}"
# when:
# - not containerized_deployment | bool
# - not ansible_facts['os_family'] in ['RedHat', 'Suse']
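Here is a sketch of the manual download-and-copy step. It assumes a workstation that can reach GitHub (directly or through a proxy), SSH access as root to the Ceph nodes, and the master branch shown in the log above (pick the branch that matches your Ceph release); extend the file list with the rest of the role's grafana_dashboard_files.

# On a machine with internet access, fetch the dashboard JSON files
for f in ceph-cluster.json cephfs-overview.json; do
  curl -fL -o "$f" "https://raw.githubusercontent.com/ceph/ceph/master/monitoring/grafana/dashboards/$f"
done
# Copy them to every Ceph node
for node in openstack-ceph01 openstack-ceph02 openstack-ceph03; do
  ssh root@"$node" mkdir -p /etc/grafana/dashboards/ceph-dashboard
  scp ./*.json root@"$node":/etc/grafana/dashboards/ceph-dashboard/
done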

HEALTH_WARN mons openstack-ceph01,openstack-ceph02,openstack-ceph03 are low on available space

The warning appears because the available disk capacity on the monitor nodes has dropped below the configured threshold.

Ceph stores monitor data (cluster status, dump information from each node, etc.) in files and in a small database on the monitor node, so a certain amount of free disk space is always required.
For this reason, Ceph monitors check their own disk capacity, and by default a monitor raises this warning when less than 30% of the filesystem holding its data remains available.
The option that controls this is:
mon_data_avail_warn = 30 (default value)

Check the option value

On the monitor node:
#ceph --admin-daemon /var/run/ceph/ceph-mon.<mon-hostname>.asok config show | grep 'mon_data_avail'

Example:
#ceph --admin-daemon /var/run/ceph/ceph-mon.cnode1.asok config show | grep 'mon_data_avail'
"mon_data_avail_crit": "5",
"mon_data_avail_warn": "30",
root@openstack-ceph01:/home/kevin# df -h
Filesystem Size Used Avail Use% Mounted on
udev 7.8G 0 7.8G 0% /dev
tmpfs 1.6G 2.0M 1.6G 1% /run
/dev/mapper/ubuntu--vg-ubuntu--lv 15G 10G 4.1G 72% /
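Before lowering the threshold (the solution below), it can also be worth checking whether the monitor store itself is what is filling the root filesystem. A sketch, assuming the default /var/lib/ceph layout on the host:

# See how much space the monitor store is using
root@openstack-ceph01:/home/kevin# du -sh /var/lib/ceph/mon/*
# If the store has grown large, ask the monitor to compact it
root@openstack-ceph01:/home/kevin# ceph tell mon.openstack-ceph01 compact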

Solution

ceph tell mon.* injectargs '--mon_data_avail_warn [value]'
Example:
#ceph tell mon.* injectargs '--mon_data_avail_warn 10'

Verify

root@openstack-ceph01:/home/kevin# ceph tell mon.* injectargs '--mon_data_avail_warn 10'
mon.openstack-ceph01: {}
mon.openstack-ceph01: mon_data_avail_warn = '10' (not observed, change may require restart)
mon.openstack-ceph02: {}
mon.openstack-ceph02: mon_data_avail_warn = '10' (not observed, change may require restart)
mon.openstack-ceph03: {}
mon.openstack-ceph03: mon_data_avail_warn = '10' (not observed, change may require restart)
root@openstack-ceph01:/home/kevin# ceph -s
cluster:
id: e2d9d8f9-a56e-43f7-899c-e43b31d1e205
health: HEALTH_OK
services:
mon: 3 daemons, quorum openstack-ceph01,openstack-ceph02,openstack-ceph03 (age 34h)
mgr: openstack-ceph01(active, since 34h), standbys: openstack-ceph02, openstack-ceph03
osd: 6 osds: 6 up (since 34h), 6 in (since 34h)
rgw: 3 daemons active (3 hosts, 1 zones)
data:
pools: 9 pools, 233 pgs
objects: 227 objects, 5.4 KiB
usage: 629 MiB used, 767 GiB / 768 GiB avail
pgs: 233 active+clean
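Since injectargs only affects the running daemons (the output above even notes the change may require a restart), you may also want to persist the threshold in the centralized configuration database. A sketch:

# Persist the new warning threshold across monitor restarts
root@openstack-ceph01:/home/kevin# ceph config set mon mon_data_avail_warn 10
# Confirm the stored value
root@openstack-ceph01:/home/kevin# ceph config get mon mon_data_avail_warn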
