Has anyone been able to replicate the NVIDIA PCIe bandwidth issue yet?
I just tried a CachyOS live environment, and the PCIe issue does not seem to occur with its kernel (6.14.8).
I then booted Fedora with that kernel via the CachyOS kernel COPR, and the PCIe issue does indeed seem to be gone, though ReBAR no longer works and the GPU is still underperforming:
01:00.0 VGA compatible controller: NVIDIA Corporation GA104M [GeForce RTX 3070 Mobile / Max-Q] (rev a1) (prog-if 00 [VGA controller])
Subsystem: Lenovo Device 3833
Physical Slot: 1
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 211
IOMMU group: 16
Region 0: Memory at 60000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at 6000000000 (64-bit, prefetchable) [size=256M]
Region 3: Memory at 6010000000 (64-bit, prefetchable) [size=32M]
Region 5: I/O ports at 5000 [size=128]
Expansion ROM at 61000000 [virtual] [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee00d78 Data: 0000
Capabilities: [78] Express (v1) Legacy Endpoint, IntMsgNum 0
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ TEE-IO-
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk-
ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x8 (downgraded)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
Capabilities: [b4] Vendor Specific Information: Len=14 <?>
Capabilities: [100 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
Status: NegoPending- InProgress-
Capabilities: [250 v1] Latency Tolerance Reporting
Max snoop latency: 34326183936ns
Max no snoop latency: 34326183936ns
Capabilities: [258 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=255us PortTPowerOnTime=10us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
T_CommonMode=0us LTR1.2_Threshold=0ns
L1SubCtl2: T_PwrOn=10us
Capabilities: [128 v1] Power Budgeting <?>
Capabilities: [420 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+
ECRC- UnsupReq- ACSViol- UncorrIntErr+ BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CorrIntErr- HeaderOF-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ CorrIntErr- HeaderOF+
AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900 v1] Null
Capabilities: [bb0 v1] Physical Resizable BAR
BAR 0: current size: 16MB, supported: 16MB
BAR 1: current size: 256MB, supported: 64MB 128MB 256MB 512MB 1GB 2GB 4GB 8GB
BAR 3: current size: 32MB, supported: 32MB
Capabilities: [c1c v1] Physical Layer 16.0 GT/s <?>
Capabilities: [d00 v1] Lane Margining at the Receiver
PortCap: Uses Driver+
PortSta: MargReady- MargSoftReady-
Capabilities: [e00 v1] Data Link Feature <?>
Kernel driver in use: nvidia
Kernel modules: nouveau, nvidia_drm, nvidia
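For a quicker check than the full lspci dump, the same link information can also be read from sysfs (I'm assuming the 0000:01:00.0 address from the output above; adjust it to your own dGPU):

cat /sys/bus/pci/devices/0000:01:00.0/current_link_speed   # e.g. "2.5 GT/s PCIe"
cat /sys/bus/pci/devices/0000:01:00.0/max_link_speed
cat /sys/bus/pci/devices/0000:01:00.0/current_link_width   # e.g. "8"
cat /sys/bus/pci/devices/0000:01:00.0/max_link_width

# or just the relevant lspci lines:
sudo lspci -vv -s 01:00.0 | grep -E 'LnkCap|LnkSta'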
Edit: I also recently noticed a convenient ‘build open nvidia module’ package. Kudos to the packaging team.
Looks like the problems in the previous post were yet again the result of version mismatches.
Can anyone with a Lenovo (or other brand) NVIDIA laptop please test for this bandwidth/PCIe Gen issue? Even if you are unable to replicate it, could you please share your lspci, inxi and journalctl outputs (something like the commands below)?
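For reference, something along these lines is what I have been collecting myself (adjust the 01:00.0 address to your dGPU; the grep pattern is just a suggestion):

sudo lspci -vvv -s 01:00.0                   # full capability dump for the dGPU
inxi -Gxx                                    # GPU / driver summary
journalctl -b -k | grep -iE 'nvidia|pci'     # kernel messages from the current boot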
Update:
I just booted the new 6.15.4 kernel with the laptop in dGPU mode (a desktop-like setup with the display connected directly to the dGPU), and that works just fine, with full link speed and correct downgrading behaviour. However, Dynamic (Optimus) mode still has the link speed issue.
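Note that in Dynamic mode the link also downtrains to 2.5GT/s at idle by design, so I check it while the dGPU is actually under load; nvidia-smi can report it directly (field names as I understand them, worth confirming against nvidia-smi --help-query-gpu):

nvidia-smi --query-gpu=pcie.link.gen.current,pcie.link.gen.max,pcie.link.width.current,pcie.link.width.max --format=csv

Under load the reported generation should climb to the card's maximum; on my machine it stays stuck at the downgraded speed shown in the lspci output above.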
Found a temporary fix
According to this, it’s a Lenovo BIOS issue: a change to secure BIOS settings and the disabling of WMI caused configuration issues with NVIDIA dGPUs. Rolling back the BIOS fixed the link speed issue for me, though I do not yet know whether a later BIOS release has fixed it.
Lenovo’s Chinese support site has old BIOS packages available. Per the linked post, googling ‘[BIOS Ver.] lenovo cn’ should get you the site.
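To check which BIOS version you are currently running before hunting for packages (standard dmidecode/sysfs, nothing Lenovo-specific):

sudo dmidecode -s bios-version
# or, without root:
cat /sys/class/dmi/id/bios_version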
I will consider this the solution for this thread for now.
EDIT: The latest BIOS (J2CN61WW) does NOT fix this issue. Does anyone know how to escalate this to Lenovo or get it in front of their BIOS/Legion team?