One of two identical M.2 NVMe drives being disabled due to same NQN

Hi,

I'm trying to install Fedora 29 on an Intel NUC8i7hvk with two Intel 760p 1TB NVMe M.2 drives.
Both drives are running the latest 004C firmware, and the NUC has version 50 of the BIOS. If I disable one of the two M.2 drives in the BIOS, the other works perfectly. It doesn't matter which one is disabled in the BIOS; either drive functions well when it's the only one enabled.

If I enable both, the nvme_core kernel module disables one of them; see the dmesg output that follows.

[root@localhost ~]# dmesg | grep -i nvme

[ 8.067045] nvme nvme0: pci function 0000:72:00.0
[ 8.067088] nvme nvme1: pci function 0000:73:00.0
[ 8.281930] nvme nvme0: ignoring ctrl due to duplicate subnqn (nqn.2017-12.org.nvmexpress:uuid:11111111-2222-3333-4444-555555555555).
[ 8.281932] nvme nvme0: Removing after probe failure status: -22
[ 8.284565] nvme0n1: p1
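(For what it's worth, status -22 is -EINVAL, and the dmesg above shows both PCI functions, 0000:72:00.0 and 0000:73:00.0, being probed before one controller is removed, so the drive isn't being hidden at the PCIe level; the removal happens in the nvme driver's duplicate-subnqn check. A quick way to double-check that both devices stay visible on the bus is something like the following; the exact class string in the lspci output may differ on other systems.)

[root@localhost ~]# lspci -nn | grep -i 'non-volatile'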

Running "nvme list" with M.2 slot 1 enabled and 2 disabled in the BIOS yields the following.

Node          SN                Model                Namespace  Usage              Format       FW Rev
/dev/nvme0n1  BTHH81850C8W1P0E  INTEL SSDPEKKW010T8  1          1.02 TB / 1.02 TB  512 B + 0 B  004C

With M.2 slot 2 enabled and slot 1 disabled in the BIOS, I get:

Node          SN                Model                Namespace  Usage              Format       FW Rev
/dev/nvme0n1  BTHH81850BX31P0E  INTEL SSDPEKKW010T8  1          1.02 TB / 1.02 TB  512 B + 0 B  004C

Note that the two NVMe drives share a common model number but have different serial numbers.
I've also experimented with GPT and DOS partition tables, making certain with the blkid command that the UUIDs were always different between the drives.
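For reference, the sort of check I mean is along these lines (the device paths are just examples, and I had to check each drive with the other slot disabled, since only one shows up when both are enabled; the lsblk PTUUID column also needs a reasonably recent util-linux):

[root@localhost ~]# blkid /dev/nvme0n1 /dev/nvme0n1p1
[root@localhost ~]# lsblk -o NAME,PTUUID,UUID /dev/nvme0n1

blkid reports the partition-table PTUUID for the whole disk and the UUID/PARTUUID values for each partition, and none of these matched between the two drives.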

With slot 1 enabled in BIOS, the "nvme list-subsys" command yields:

nvme-subsys0 - NQN=nqn.2017-12.org.nvmexpress:uuid:11111111-2222-3333-4444-555555555555
+- nvme0 pcie 0000:72:00.0

With slot 2 enabled in BIOS, the "nvme list-subsys" command yields:

nvme-subsys0 - NQN=nqn.2017-12.org.nvmexpress:uuid:11111111-2222-3333-4444-555555555555
+- nvme0 pcie 0000:73:00.0

The problem seems to be that something (I'm not sure whether it's the BIOS or Linux) is assigning the same NVMe Qualified Name (NQN) to both NVMe drives when both are enabled in the BIOS. Thinking it might be the BIOS, I tried all kinds of variations in the BIOS settings. Since I don't know whether the NQN comes from the BIOS/drive firmware or from the Linux kernel, I'm not sure whether to blame Intel or the kernel maintainers.
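One way I can think of to narrow this down (a rough sketch, assuming these drives fill in the optional subnqn field of their Identify Controller data and that the installed nvme-cli prints it) is to ask each controller directly what subsystem NQN it claims, with only one slot enabled at a time:

[root@localhost ~]# nvme id-ctrl /dev/nvme0 | grep -i subnqn

If both drives themselves return the same nqn.2017-12.org.nvmexpress:uuid:... string, the duplicate NQN is coming from the drive firmware (or from however the BIOS provisions it) rather than the kernel; if the field comes back empty, the kernel is synthesizing the NQN from other identify data (vendor ID, serial, model) and would be the place to look.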

I did have one occasion, among dozens of attempts, where the nvme_core kernel module did not disable one of the two NVMe drives. Both appeared in the "nvme list-subsys" output as nvme0 and nvme1, with working /dev/nvme0n1 and /dev/nvme0n2 entries under /dev. I was excited when I achieved that result, but it was fragile and I was unable to repeat the success. I think it might have had to do with each drive having a different grub2 MBR install and a corresponding EFI boot record in the EFI cache at the time, but I'm not sure.

Maybe the BIOS maintainers are to blame; if so, I'll take this to Intel Support instead. I've read a bit of the NVMe publications on the topic of NQN naming.
My interpretation is that NQNs are supposed to be unique per device. The only time the same NQN should show up twice is when the same device is reached down two or more I/O paths, i.e. multipathing. Assigning the same UUID-based NQN to two distinct NVMe devices sounds like a violation of the NVMe standard.
And assigning a UUID of 11111111-2222-3333-4444-555555555555 to both NVMe drives is just plain lazy; there's plenty of entropy to be found when querying the device.
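On the multipath point: as I understand it, controllers reporting the same subsystem NQN are meant to be grouped into one NVMe subsystem as multiple paths, and two independent drives claiming the same subsystem is exactly what the "duplicate subnqn" check in the dmesg output rejects. Whether native NVMe multipath is even enabled on the running kernel can be checked with something like this (assuming the kernel was built with CONFIG_NVME_MULTIPATH; the parameter file won't exist otherwise):

[root@localhost ~]# cat /sys/module/nvme_core/parameters/multipath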

Anyhow, I've been fighting this for days. The silver lining is that it's forcing me to explore NVMe, EFI and other topics beyond my comfort zone, though I'm not sure how much of it is sinking in. The first step, I think, is to determine whether this is a BIOS problem or a Linux kernel (nvme module) problem. Any guidance is much appreciated. Thanks.