You are running Wayland and I use Xorg.
ok, thanks
The thing to note for me is the ‘LnkCap2’ line in the lspci -vv output:
LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
It shows the supported link speeds for the GPU, and it is entirely absent from the lspci output on kernel 6.13. Similarly, the bridge devices are also stuck at Gen 1 (2.5 GT/s).
Try using sudo
$ uname -r
6.14.0-63.fc42.x86_64
$ sudo lspci -vv -s $(lspci | grep -i VGA | awk '{print $1}') |grep -e LnkCap -e LnkCtl
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
LnkCtl: ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk+
LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
sudo was already used in both cases.
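For completeness, the bridges can be checked the same way by filtering on the PCI bridge class (a sketch; -d ::0604 selects PCI-to-PCI bridges and needs a reasonably recent pciutils):

# Link capability and current status for every PCI-to-PCI bridge
sudo lspci -vv -d ::0604 | grep -e 'LnkCap:' -e 'LnkSta:'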
So I’ve installed the rawhide kernel using the Rawhide-Nodebug repo. Nvidia proprietary drivers don’t seem to be working any more, with lspci -vv now showing nouveau in use.
However, the link speeds and caps are now within expected values. Is the PCIe bandwidth issue related specifically to the proprietary driver?
01:00.0 VGA compatible controller: NVIDIA Corporation GA104M [GeForce RTX 3070 Mobile / Max-Q] (rev a1) (prog-if 00 [VGA controller])
Subsystem: Lenovo Device 3833
Physical Slot: 1
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 0
IOMMU group: 16
Region 0: Memory at 60000000 (32-bit, non-prefetchable) [disabled] [size=16M]
Region 1: Memory at 6000000000 (64-bit, prefetchable) [disabled] [size=256M]
Region 3: Memory at 6010000000 (64-bit, prefetchable) [disabled] [size=32M]
Region 5: I/O ports at 5000 [disabled] [size=128]
Expansion ROM at 61000000 [virtual] [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [78] Express (v2) Legacy Endpoint, IntMsgNum 0
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ TEE-IO-
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s (downgraded), Width x8 (downgraded)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range AB, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp+ 10BitTagReq+ OBFF Via message, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
AtomicOpsCtl: ReqEn-
IDOReq- IDOCompl- LTR+ EmergencyPowerReductionReq-
10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [b4] Vendor Specific Information: Len=14 <?>
Capabilities: [100 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
Status: NegoPending- InProgress-
Capabilities: [250 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Capabilities: [258 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=255us PortTPowerOnTime=10us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
T_CommonMode=0us LTR1.2_Threshold=0ns
L1SubCtl2: T_PwrOn=10us
Capabilities: [128 v1] Power Budgeting <?>
Capabilities: [420 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+
ECRC- UnsupReq- ACSViol- UncorrIntErr+ BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ CorrIntErr- HeaderOF-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ CorrIntErr- HeaderOF+
AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Capabilities: [bb0 v1] Physical Resizable BAR
BAR 0: current size: 16MB, supported: 16MB
BAR 1: current size: 8GB, supported: 64MB 128MB 256MB 512MB 1GB 2GB 4GB 8GB
BAR 3: current size: 32MB, supported: 32MB
Capabilities: [c1c v1] Physical Layer 16.0 GT/s <?>
Capabilities: [d00 v1] Lane Margining at the Receiver
PortCap: Uses Driver+
PortSta: MargReady- MargSoftReady-
Capabilities: [e00 v1] Data Link Feature <?>
Kernel modules: nouveau
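(For reference, a quick way to confirm which driver is actually bound is something like the sketch below; 01:00.0 is the GPU address from the output above.)

# Show the driver currently bound to the GPU and the candidate modules
sudo lspci -k -s 01:00.0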
The nvidia driver doesn’t support the 6.15rc kernel.
If you want to use nvidia with the 6.15rc kernel, you will need to use the beta driver from F43, as I have zero intention of backporting it to the stable releases.
At this point, I’m just tired and frustrated. I’ve also been posting on Nvidia forums, but so far no one there knows what’s going on either.
UPDATE: Nvidia driver 575 has fixed the PCIe bandwidth issue! I’m now getting the full 16GT/s with kernel 6.14.4.
UPDATE 2: Unfortunately, the issue has now returned. I had full PCIe performance right up until I ran my usual optimisation script, which also contains autoaspm.
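Since autoaspm re-enables ASPM on devices, one way to test whether ASPM is the trigger is to inspect and temporarily override the kernel's ASPM policy (a sketch using the standard sysfs interface; note that per-device changes made by autoaspm directly in config space are not necessarily reverted by this):

# Show the current ASPM policy; the active one is bracketed
cat /sys/module/pcie_aspm/parameters/policy
# Temporarily force the performance policy to rule ASPM out
echo performance | sudo tee /sys/module/pcie_aspm/parameters/policy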
UPDATE 3: I appear to have replicated the situation in which I get full PCIe performance (rough commands are sketched after the list):
- Disable the rpmfusion rawhide repo
- Downgrade to 570 and reboot to build kernel modules
- Reboot into affected kernel
- Re-enable the rpmfusion rawhide repo and update the nvidia drivers
- Reboot again, and check with lspci -vv and vkcube
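The same steps as commands, roughly (a sketch; the repo IDs and package-name globs are my assumptions for a Fedora + RPM Fusion setup, so check dnf repolist first):

# Disable the rawhide repo for this transaction and go back to 570
sudo dnf --disablerepo='rpmfusion-*-rawhide*' downgrade 'akmod-nvidia*' 'xorg-x11-drv-nvidia*'
sudo reboot        # akmods rebuilds the 570 kernel modules on boot
# After rebooting into the affected kernel, re-enable rawhide and update
sudo dnf --enablerepo='rpmfusion-*-rawhide*' upgrade 'akmod-nvidia*' 'xorg-x11-drv-nvidia*'
sudo reboot
# Verify the link speed and rendering
sudo lspci -vv -s 01:00.0 | grep LnkSta
vkcube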
I also get this AFTER the upgrade, if it helps:
$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX Open Kernel Module for x86_64 570.144 Release Build (dvs-builder@U22-I3-AF02-07-1) Thu Apr 10 20:13:56 UTC 2025
GCC version: gcc version 15.1.1 20250425 (Red Hat 15.1.1-1) (GCC)
Also, the above output says it’s the open kernel module, despite my installing akmod-nvidia rather than akmod-nvidia-open. How do I make it use the proprietary module?
Post the output of:
rpm -qa akmod-nvidia\*
$ rpm -qa akmod-nvidia\*
akmod-nvidia-575.51.02-4.fc43.x86_64
In addition: Nvidia kmods are now rebuilt. No Kernel/Userspace driver mismatches at the moment.
akmod-nvidia-575.51.02-4 has detection for the open driver; if your card supports the open driver, it will default to it automatically.
Maybe try this to use the proprietary module:
sudo sh -c 'echo "%_without_kmod_nvidia_detect 1" > /etc/rpm/macros.nvidia-kmod'
The file already exists and contains that parameter.
Yes, it was indeed 0:
%_with_kmod_nvidia_open 0
%if 0%{?_with_kmod_nvidia_open:1}
checks whether the variable is defined. If it is defined, it always evaluates to true, regardless of whether its value is 0 or 1.
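You can see this with rpm --eval (a sketch; the %{?name:value} form expands to value whenever name is defined, so the leading 0 becomes 01):

# Macro defined as 0: the conditional still sees a non-zero value
rpm --define '_with_kmod_nvidia_open 0' --eval '0%{?_with_kmod_nvidia_open:1}'   # prints 01
# Macro undefined: the conditional sees 0 (false)
rpm --eval '0%{?_with_kmod_nvidia_open:1}'                                       # prints 0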
Thanks for correcting my error.
Yup, this one seems to have actually built and loaded the proprietary module:
$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 575.51.02 Thu Apr 10 16:08:06 UTC 2025
GCC version: gcc version 15.1.1 20250425 (Red Hat 15.1.1-1) (GCC)
Unfortunately, the proprietary module has not fixed the original PCIe bandwidth issue.
@leigh123linux I had edited this out earlier, but this might be important:
I had not modified the nvidia rpm macro in any way before you brought it up. It is possible that the macro handling in the 575 (or earlier) packages is buggy.