Commit graph

134 commits

Author SHA1 Message Date
Vitaly Kuznetsov
2bb16bea5f KVM: nSVM: Swap the parameter order for svm_copy_vmrun_state()/svm_copy_vmloadsave_state()
Make svm_copy_vmrun_state()/svm_copy_vmloadsave_state() interface match
'memcpy(dest, src)' to avoid any confusion.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210719090322.625277-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-26 08:09:46 -04:00
Vitaly Kuznetsov
9a9e74819b KVM: nSVM: Rename nested_svm_vmloadsave() to svm_copy_vmloadsave_state()
To match svm_copy_vmrun_state(), rename nested_svm_vmloadsave() to
svm_copy_vmloadsave_state().

Opportunistically add missing braces to 'else' branch in
vmload_vmsave_interception().

No functional change intended.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210716144104.465269-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-26 08:09:36 -04:00
Vitaly Kuznetsov
bb00bd9c08 KVM: nSVM: Restore nested control upon leaving SMM
If the VM was migrated while in SMM, no nested state was saved/restored,
and therefore svm_leave_smm has to load both save and control area
of the vmcb12. Save area is already loaded from HSAVE area,
so now load the control area as well from the vmcb12.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210628104425.391276-6-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-15 10:19:44 -04:00
Vitaly Kuznetsov
0a75829076 KVM: nSVM: Introduce svm_copy_vmrun_state()
Separate the code setting non-VMLOAD-VMSAVE state from
svm_set_nested_state() into its own function. This is going to be
re-used from svm_enter_smm()/svm_leave_smm().

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210628104425.391276-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-15 10:19:43 -04:00
Vitaly Kuznetsov
fb79f566e4 KVM: nSVM: Check that VM_HSAVE_PA MSR was set before VMRUN
APM states that "The address written to the VM_HSAVE_PA MSR, which holds
the address of the page used to save the host state on a VMRUN, must point
to a hypervisor-owned page. If this check fails, the WRMSR will fail with
a #GP(0) exception. Note that a value of 0 is not considered valid for the
VM_HSAVE_PA MSR and a VMRUN that is attempted while the HSAVE_PA is 0 will
fail with a #GP(0) exception."

svm_set_msr() already checks that the supplied address is valid, so only
check for '0' is missing. Add it to nested_svm_vmrun().

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210628104425.391276-3-vkuznets@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-15 10:19:43 -04:00
Maxim Levitsky
4b639a9f82 KVM: SVM: add module param to control the #SMI interception
In theory there are no side effects of not intercepting #SMI,
because then #SMI becomes transparent to the OS and the KVM.

Plus an observation on recent Zen2 CPUs reveals that these
CPUs ignore #SMI interception and never deliver #SMI VMexits.

This is also useful to test nested KVM to see that L1
handles #SMIs correctly in case when L1 doesn't intercept #SMI.

Finally the default remains the same, the SMI are intercepted
by default thus this patch doesn't have any effect unless
non default module param value is used.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210707125100.677203-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-15 10:19:42 -04:00
Sean Christopherson
616007c866 KVM: x86: Enhance comments for MMU roles and nested transition trickiness
Expand the comments for the MMU roles.  The interactions with gfn_track
PGD reuse in particular are hairy.

Regarding PGD reuse, add comments in the nested virtualization flows to
call out why kvm_init_mmu() is unconditionally called even when nested
TDP is used.

Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-50-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:47 -04:00
Sean Christopherson
16be1d1292 KVM: x86/mmu: Move nested NPT reserved bit calculation into MMU proper
Move nested NPT's invocation of reset_shadow_zero_bits_mask() into the
MMU proper and unexport said function.  Aside from dropping an export,
this is a baby step toward eliminating the call entirely by fixing the
shadow_root_level confusion.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-19-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:39 -04:00
Sean Christopherson
31e96bc636 KVM: nSVM: Add a comment to document why nNPT uses vmcb01, not vCPU state
Add a comment in the nested NPT initialization flow to call out that it
intentionally uses vmcb01 instead current vCPU state to get the effective
hCR4 and hEFER for L1's NPT context.

Note, despite nSVM's efforts to handle the case where vCPU state doesn't
reflect L1 state, the MMU may still do the wrong thing due to pulling
state from the vCPU instead of the passed in CR0/CR4/EFER values.  This
will be addressed in future commits.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-16-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:38 -04:00
Sean Christopherson
dbc4739b6b KVM: x86: Fix sizes used to pass around CR0, CR4, and EFER
When configuring KVM's MMU, pass CR0 and CR4 as unsigned longs, and EFER
as a u64 in various flows (mostly MMU).  Passing the params as u32s is
functionally ok since all of the affected registers reserve bits 63:32 to
zero (enforced by KVM), but it's technically wrong.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-15-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:38 -04:00
Sean Christopherson
c906066288 KVM: x86: Drop pointless @reset_roots from kvm_init_mmu()
Remove the @reset_roots param from kvm_init_mmu(), the one user,
kvm_mmu_reset_context() has already unloaded the MMU and thus freed and
invalidated all roots.  This also happens to be why the reset_roots=true
paths doesn't leak roots; they're already invalid.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210609234235.1244004-14-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17 13:09:54 -04:00
Sean Christopherson
b512910039 KVM: x86: Drop skip MMU sync and TLB flush params from "new PGD" helpers
Drop skip_mmu_sync and skip_tlb_flush from __kvm_mmu_new_pgd() now that
all call sites unconditionally skip both the sync and flush.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210609234235.1244004-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17 13:09:52 -04:00
Sean Christopherson
d2e5601907 KVM: nSVM: Move TLB flushing logic (or lack thereof) to dedicated helper
Introduce nested_svm_transition_tlb_flush() and use it force an MMU sync
and TLB flush on nSVM VM-Enter and VM-Exit instead of sneaking the logic
into the __kvm_mmu_new_pgd() call sites.  Add a partial todo list to
document issues that need to be addressed before the unconditional sync
and flush can be modified to look more like nVMX's logic.

In addition to making nSVM's forced flushing more overt (guess who keeps
losing track of it), the new helper brings further convergence between
nSVM and nVMX, and also sets the stage for dropping the "skip" params
from __kvm_mmu_new_pgd().

Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210609234235.1244004-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17 13:09:52 -04:00
Maxim Levitsky
158a48ecf7 KVM: x86: avoid loading PDPTRs after migration when possible
if new KVM_*_SREGS2 ioctls are used, the PDPTRs are
a part of the migration state and are correctly
restored by those ioctls.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210607090203.133058-9-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17 13:09:48 -04:00
Maxim Levitsky
b222b0b881 KVM: nSVM: refactor the CR3 reload on migration
Document the actual reason why we need to do it
on migration and move the call to svm_set_nested_state
to be closer to VMX code.

To avoid loading the PDPTRs from possibly not up to date memory map,
in nested_svm_load_cr3 after the move, move this code to
.get_nested_state_pages.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210607090203.133058-5-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17 13:09:47 -04:00
Sean Christopherson
a36dbec67e KVM: nSVM: Drop pointless pdptrs_changed() check on nested transition
Remove the "PDPTRs unchanged" check to skip PDPTR loading during nested
SVM transitions as it's not at all an optimization.  Reading guest memory
to get the PDPTRs isn't magically cheaper by doing it in pdptrs_changed(),
and if the PDPTRs did change, KVM will end up doing the read twice.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210607090203.133058-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17 13:09:46 -04:00
Krish Sadhukhan
b93af02c67 KVM: nVMX: nSVM: 'nested_run' should count guest-entry attempts that make it to guest code
Currently, the 'nested_run' statistic counts all guest-entry attempts,
including those that fail during vmentry checks on Intel and during
consistency checks on AMD. Convert this statistic to count only those
guest-entries that make it past these state checks and make it to guest
code. This will tell us the number of guest-entries that actually executed
or tried to execute guest code.

Signed-off-by: Krish Sadhukhan <Krish.Sadhukhan@oracle.com>
Message-Id: <20210609180340.104248-2-krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17 13:09:35 -04:00
Maxim Levitsky
809c79137a KVM: nSVM: remove a warning about vmcb01 VM exit reason
While in most cases, when returning to use the VMCB01,
the exit reason stored in it will be SVM_EXIT_VMRUN,
on first VM exit after a nested migration this field
can contain anything since the VM entry did happen
before the migration.

Remove this warning to avoid the false positive.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210504143936.1644378-3-mlevitsk@redhat.com>
Fixes: 9a7de6ecc3 ("KVM: nSVM: If VMRUN is single-stepped, queue the #DB intercept in nested_svm_vmexit()")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-07 06:06:14 -04:00
Maxim Levitsky
063ab16c14 KVM: nSVM: always restore the L1's GIF on migration
While usually the L1's GIF is set while L2 runs, and usually
migration nested state is loaded after a vCPU reset which
also sets L1's GIF to true, this is not guaranteed.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210504143936.1644378-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-07 06:06:14 -04:00
Maxim Levitsky
9d290e1643 KVM: nSVM: leave the guest mode prior to loading a nested state
This allows the KVM to load the nested state more than
once without warnings.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210503125446.1353307-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-03 11:25:38 -04:00
Maxim Levitsky
c74ad08f33 KVM: nSVM: fix few bugs in the vmcb02 caching logic
* Define and use an invalid GPA (all ones) for init value of last
  and current nested vmcb physical addresses.

* Reset the current vmcb12 gpa to the invalid value when leaving
  the nested mode, similar to what is done on nested vmexit.

* Reset	the last seen vmcb12 address when disabling the nested SVM,
  as it relies on vmcb02 fields which are freed at that point.

Fixes: 4995a3685f ("KVM: SVM: Use a separate vmcb for the nested L2 guest")

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210503125446.1353307-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-03 11:25:37 -04:00
Maxim Levitsky
deee59bacb KVM: nSVM: fix a typo in svm_leave_nested
When forcibly leaving the nested mode, we should switch to vmcb01

Fixes: 4995a3685f ("KVM: SVM: Use a separate vmcb for the nested L2 guest")

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210503125446.1353307-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-03 11:25:36 -04:00
Paolo Bonzini
fd49e8ee70 Merge branch 'kvm-sev-cgroup' into HEAD 2021-04-22 13:19:01 -04:00
Krish Sadhukhan
ee695f22b5 nSVM: Check addresses of MSR and IO permission maps
According to section "Canonicalization and Consistency Checks" in APM vol 2,
the following guest state is illegal:

    "The MSR or IOIO intercept tables extend to a physical address that
     is greater than or equal to the maximum supported physical address."

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Message-Id: <20210412215611.110095-5-krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-21 12:20:01 -04:00
Maxim Levitsky
4020da3b9f KVM: x86: pending exceptions must not be blocked by an injected event
Injected interrupts/nmi should not block a pending exception,
but rather be either lost if nested hypervisor doesn't
intercept the pending exception (as in stock x86), or be delivered
in exitintinfo/IDT_VECTORING_INFO field, as a part of a VMexit
that corresponds to the pending exception.

The only reason for an exception to be blocked is when nested run
is pending (and that can't really happen currently
but still worth checking for).

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210401143817.1030695-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 08:31:02 -04:00
Maxim Levitsky
232f75d3b4 KVM: nSVM: call nested_svm_load_cr3 on nested state load
While KVM's MMU should be fully reset by loading of nested CR0/CR3/CR4
by KVM_SET_SREGS, we are not in nested mode yet when we do it and therefore
only root_mmu is reset.

On regular nested entries we call nested_svm_load_cr3 which both updates
the guest's CR3 in the MMU when it is needed, and it also initializes
the mmu again which makes it initialize the walk_mmu as well when nested
paging is enabled in both host and guest.

Since we don't call nested_svm_load_cr3 on nested state load,
the walk_mmu can be left uninitialized, which can lead to a NULL pointer
dereference while accessing it if we happen to get a nested page fault
right after entering the nested guest first time after the migration and
we decide to emulate it, which leads to the emulator trying to access
walk_mmu->gva_to_gpa which is NULL.

Therefore we should call this function on nested state load as well.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210401141814.1029036-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 08:31:01 -04:00
Sean Christopherson
eba04b20e4 KVM: x86: Account a variety of miscellaneous allocations
Switch to GFP_KERNEL_ACCOUNT for a handful of allocations that are
clearly associated with a single task/VM.

Note, there are a several SEV allocations that aren't accounted, but
those can (hopefully) be fixed by using the local stack for memory.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210331023025.2485960-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 08:30:58 -04:00
Krish Sadhukhan
9a7de6ecc3 KVM: nSVM: If VMRUN is single-stepped, queue the #DB intercept in nested_svm_vmexit()
According to APM, the #DB intercept for a single-stepped VMRUN must happen
after the completion of that instruction, when the guest does #VMEXIT to
the host. However, in the current implementation of KVM, the #DB intercept
for a single-stepped VMRUN happens after the completion of the instruction
that follows the VMRUN instruction. When the #DB intercept handler is
invoked, it shows the RIP of the instruction that follows VMRUN, instead of
of VMRUN itself. This is an incorrect RIP as far as single-stepping VMRUN
is concerned.

This patch fixes the problem by checking, in nested_svm_vmexit(), for the
condition that the VMRUN instruction is being single-stepped and if so,
queues the pending #DB intercept so that the #DB is accounted for before
we execute L1's next instruction.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oraacle.com>
Message-Id: <20210323175006.73249-2-krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 08:30:52 -04:00
Paolo Bonzini
cb9b6a1b19 Merge branch 'kvm-fix-svm-races' into HEAD 2021-04-01 05:19:48 -04:00
Paolo Bonzini
6ebae23c07 Merge branch 'kvm-fix-svm-races' into kvm-master 2021-04-01 05:14:05 -04:00
Paolo Bonzini
3c346c0c60 KVM: SVM: ensure that EFER.SVME is set when running nested guest or on nested vmexit
Fixing nested_vmcb_check_save to avoid all TOC/TOU races
is a bit harder in released kernels, so do the bare minimum
by avoiding that EFER.SVME is cleared.  This is problematic
because svm_set_efer frees the data structures for nested
virtualization if EFER.SVME is cleared.

Also check that EFER.SVME remains set after a nested vmexit;
clearing it could happen if the bit is zero in the save area
that is passed to KVM_SET_NESTED_STATE (the save area of the
nested state corresponds to the nested hypervisor's state
and is restored on the next nested vmexit).

Cc: stable@vger.kernel.org
Fixes: 2fcf4876ad ("KVM: nSVM: implement on demand allocation of the nested state")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-01 05:11:35 -04:00
Paolo Bonzini
a58d9166a7 KVM: SVM: load control fields from VMCB12 before checking them
Avoid races between check and use of the nested VMCB controls.  This
for example ensures that the VMRUN intercept is always reflected to the
nested hypervisor, instead of being processed by the host.  Without this
patch, it is possible to end up with svm->nested.hsave pointing to
the MSR permission bitmap for nested guests.

This bug is CVE-2021-29657.

Reported-by: Felix Wilhelm <fwilhelm@google.com>
Cc: stable@vger.kernel.org
Fixes: 2fcf4876ad ("KVM: nSVM: implement on demand allocation of the nested state")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-01 05:09:31 -04:00
Cathy Avery
8173396e94 KVM: nSVM: Optimize vmcb12 to vmcb02 save area copies
Use the vmcb12 control clean field to determine which vmcb12.save
registers were marked dirty in order to minimize register copies
when switching from L1 to L2. Those vmcb12 registers marked as dirty need
to be copied to L0's vmcb02 as they will be used to update the vmcb
state cache for the L2 VMRUN.  In the case where we have a different
vmcb12 from the last L2 VMRUN all vmcb12.save registers must be
copied over to L2.save.

Tested:
kvm-unit-tests
kvm selftests
Fedora L1 L2

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Cathy Avery <cavery@redhat.com>
Message-Id: <20210301200844.2000-1-cavery@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:43:26 -04:00
Babu Moger
d00b99c514 KVM: SVM: Add support for Virtual SPEC_CTRL
Newer AMD processors have a feature to virtualize the use of the
SPEC_CTRL MSR. Presence of this feature is indicated via CPUID
function 0x8000000A_EDX[20]: GuestSpecCtrl. Hypervisors are not
required to enable this feature since it is automatically enabled on
processors that support it.

A hypervisor may wish to impose speculation controls on guest
execution or a guest may want to impose its own speculation controls.
Therefore, the processor implements both host and guest
versions of SPEC_CTRL.

When in host mode, the host SPEC_CTRL value is in effect and writes
update only the host version of SPEC_CTRL. On a VMRUN, the processor
loads the guest version of SPEC_CTRL from the VMCB. When the guest
writes SPEC_CTRL, only the guest version is updated. On a VMEXIT,
the guest version is saved into the VMCB and the processor returns
to only using the host SPEC_CTRL for speculation control. The guest
SPEC_CTRL is located at offset 0x2E0 in the VMCB.

The effective SPEC_CTRL setting is the guest SPEC_CTRL setting or'ed
with the hypervisor SPEC_CTRL setting. This allows the hypervisor to
ensure a minimum SPEC_CTRL if desired.

This support also fixes an issue where a guest may sometimes see an
inconsistent value for the SPEC_CTRL MSR on processors that support
this feature. With the current SPEC_CTRL support, the first write to
SPEC_CTRL is intercepted and the virtualized version of the SPEC_CTRL
MSR is not updated. When the guest reads back the SPEC_CTRL MSR, it
will be 0x0, instead of the actual expected value. There isn’t a
security concern here, because the host SPEC_CTRL value is or’ed with
the Guest SPEC_CTRL value to generate the effective SPEC_CTRL value.
KVM writes with the guest's virtualized SPEC_CTRL value to SPEC_CTRL
MSR just before the VMRUN, so it will always have the actual value
even though it doesn’t appear that way in the guest. The guest will
only see the proper value for the SPEC_CTRL register if the guest was
to write to the SPEC_CTRL register again. With Virtual SPEC_CTRL
support, the save area spec_ctrl is properly saved and restored.
So, the guest will always see the proper value when it is read back.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Message-Id: <161188100955.28787.11816849358413330720.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:43:25 -04:00
Maxim Levitsky
cc3ed80ae6 KVM: nSVM: always use vmcb01 to for vmsave/vmload of guest state
This allows to avoid copying of these fields between vmcb01
and vmcb02 on nested guest entry/exit.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:43:24 -04:00
Sean Christopherson
3a87c7e0d1 KVM: nSVM: Add helper to synthesize nested VM-Exit without collateral
Add a helper to consolidate boilerplate for nested VM-Exits that don't
provide any data in exit_info_*.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210302174515.2812275-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:43:16 -04:00
Sean Christopherson
cb6a32c2b8 KVM: x86: Handle triple fault in L2 without killing L1
Synthesize a nested VM-Exit if L2 triggers an emulated triple fault
instead of exiting to userspace, which likely will kill L1.  Any flow
that does KVM_REQ_TRIPLE_FAULT is suspect, but the most common scenario
for L2 killing L1 is if L0 (KVM) intercepts a contributory exception that
is _not_intercepted by L1.  E.g. if KVM is intercepting #GPs for the
VMware backdoor, a #GP that occurs in L2 while vectoring an injected #DF
will cause KVM to emulate triple fault.

Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210302174515.2812275-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:43:15 -04:00
Paolo Bonzini
6312975417 KVM: SVM: Pass struct kvm_vcpu to exit handlers (and many, many other places)
Refactor the svm_exit_handlers API to pass @vcpu instead of @svm to
allow directly invoking common x86 exit handlers (in a future patch).
Opportunistically convert an absurd number of instances of 'svm->vcpu'
to direct uses of 'vcpu' to avoid pointless casting.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210205005750.3841462-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:43:09 -04:00
Sean Christopherson
11f0cbf0c6 KVM: nSVM: Trace VM-Enter consistency check failures
Use trace_kvm_nested_vmenter_failed() and its macro magic to trace
consistency check failures on nested VMRUN.  Tracing such failures by
running the buggy VMM as a KVM guest is often the only way to get a
precise explanation of why VMRUN failed.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210204000117.3303214-13-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:42:37 -04:00
Krish Sadhukhan
6906e06db9 KVM: nSVM: Add missing checks for reserved bits to svm_set_nested_state()
The path for SVM_SET_NESTED_STATE needs to have the same checks for the CPU
registers, as we have in the VMRUN path for a nested guest. This patch adds
those missing checks to svm_set_nested_state().

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Message-Id: <20201006190654.32305-3-krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:42:35 -04:00
Paolo Bonzini
c08f390a75 KVM: nSVM: only copy L1 non-VMLOAD/VMSAVE data in svm_set_nested_state()
The VMLOAD/VMSAVE data is not taken from userspace, since it will
not be restored on VMEXIT (it will be copied from VMCB02 to VMCB01).
For clarity, replace the wholesale copy of the VMCB save area
with a copy of that state only.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:42:34 -04:00
Paolo Bonzini
4bb170a543 KVM: nSVM: do not mark all VMCB02 fields dirty on nested vmexit
Since L1 and L2 now use different VMCBs, most of the fields remain the
same in VMCB02 from one L2 run to the next.  Since KVM itself is not
looking at VMCB12's clean field, for now not much can be optimized.
However, in the future we could avoid more copies if the VMCB12's SEG
and DT sections are clean.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:42:33 -04:00
Paolo Bonzini
7ca62d1322 KVM: nSVM: do not mark all VMCB01 fields dirty on nested vmexit
Since L1 and L2 now use different VMCBs, most of the fields remain
the same from one L1 run to the next.  svm_set_cr0 and other functions
called by nested_svm_vmexit already take care of clearing the
corresponding clean bits; only the TSC offset is special.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:42:32 -04:00
Paolo Bonzini
7c3ecfcd31 KVM: nSVM: do not copy vmcb01->control blindly to vmcb02->control
Most fields were going to be overwritten by vmcb12 control fields, or
do not matter at all because they are filled by the processor on vmexit.
Therefore, we need not copy them from vmcb01 to vmcb02 on vmentry.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:42:32 -04:00
Paolo Bonzini
9e8f0fbfff KVM: nSVM: rename functions and variables according to vmcbXY nomenclature
Now that SVM is using a separate vmcb01 and vmcb02 (and also uses the vmcb12
naming) we can give clearer names to functions that write to and read
from those VMCBs.  Likewise, variables and parameters can be renamed
from nested_vmcb to vmcb12.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:42:31 -04:00
Cathy Avery
4995a3685f KVM: SVM: Use a separate vmcb for the nested L2 guest
svm->vmcb will now point to a separate vmcb for L1 (not nested) or L2
(nested).

The main advantages are removing get_host_vmcb and hsave, in favor of
concepts that are shared with VMX.

We don't need anymore to stash the L1 registers in hsave while L2
runs, but we need to copy the VMLOAD/VMSAVE registers from VMCB01 to
VMCB02 and back.  This more or less has the same cost, but code-wise
nested_svm_vmloadsave can be reused.

This patch omits several optimizations that are possible:

- for simplicity there is some wholesale copying of vmcb.control areas
which can go away.

- we should be able to better use the VMCB01 and VMCB02 clean bits.

- another possibility is to always use VMCB01 for VMLOAD and VMSAVE,
thus avoiding the copy of VMLOAD/VMSAVE registers from VMCB01 to
VMCB02 and back.

Tested:
kvm-unit-tests
kvm self tests
Loaded fedora nested guest on fedora

Signed-off-by: Cathy Avery <cavery@redhat.com>
Message-Id: <20201011184818.3609-3-cavery@redhat.com>
[Fix conflicts; keep VMCB02 G_PAT up to date whenever guest writes the
 PAT MSR; do not copy CR4 over from VMCB01 as it is not needed anymore; add
 a few more comments. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:42:28 -04:00
Dongli Zhang
43c11d91fb KVM: x86: to track if L1 is running L2 VM
The new per-cpu stat 'nested_run' is introduced in order to track if L1 VM
is running or used to run L2 VM.

An example of the usage of 'nested_run' is to help the host administrator
to easily track if any L1 VM is used to run L2 VM. Suppose there is issue
that may happen with nested virtualization, the administrator will be able
to easily narrow down and confirm if the issue is due to nested
virtualization via 'nested_run'. For example, whether the fix like
commit 88dddc11a8 ("KVM: nVMX: do not use dangling shadow VMCS after
guest reset") is required.

Cc: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Message-Id: <20210305225747.7682-1-dongli.zhang@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:28:02 -04:00
Paolo Bonzini
d2df592fd8 KVM: nSVM: prepare guest save area while is_guest_mode is true
Right now, enter_svm_guest_mode is calling nested_prepare_vmcb_save and
nested_prepare_vmcb_control.  This results in is_guest_mode being false
until the end of nested_prepare_vmcb_control.

This is a problem because nested_prepare_vmcb_save can in turn cause
changes to the intercepts and these have to be applied to the "host VMCB"
(stored in svm->nested.hsave) and then merged with the VMCB12 intercepts
into svm->vmcb.

In particular, without this change we forget to set the CR0 read and CR0
write intercepts when running a real mode L2 guest with NPT disabled.
The guest is therefore able to see the CR0.PG bit that KVM sets to
enable "paged real mode".  This patch fixes the svm.flat mode_switch
test case with npt=0.  There are no other problematic calls in
nested_prepare_vmcb_save.

Moving is_guest_mode to the end is done since commit 06fc777269
("KVM: SVM: Activate nested state only when guest state is complete",
2010-04-25).  However, back then KVM didn't grab a different VMCB
when updating the intercepts, it had already copied/merged L1's stuff
to L0's VMCB, and then updated L0's VMCB regardless of is_nested().
Later recalc_intercepts was introduced in commit 384c636843
("KVM: SVM: Add function to recalculate intercept masks", 2011-01-12).
This introduced the bug, because recalc_intercepts now throws away
the intercept manipulations that svm_set_cr0 had done in the meanwhile
to svm->vmcb.

[1] https://lore.kernel.org/kvm/1266493115-28386-1-git-send-email-joerg.roedel@amd.com/

Reviewed-by: Sean Christopherson <seanjc@google.com>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-22 13:11:56 -05:00
Paolo Bonzini
a04aead144 KVM: nSVM: fix running nested guests when npt=0
In case of npt=0 on host, nSVM needs the same .inject_page_fault tweak
as VMX has, to make sure that shadow mmu faults are injected as vmexits.

It is not clear why this is needed at all, but for now keep the same
code as VMX and we'll fix it for both.

Based on a patch by Maxim Levitsky <mlevitsk@redhat.com>.

Fixes: 7c86663b68 ("KVM: nSVM: inject exceptions via svm_check_nested_events")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-18 07:33:30 -05:00
Maxim Levitsky
954f419ba8 KVM: nSVM: move nested vmrun tracepoint to enter_svm_guest_mode
This way trace will capture all the nested mode entries
(including entries after migration, and from smm)

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210217145718.1217358-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-18 07:33:30 -05:00