| From 2b5ef70f254ad113daed1c26ece373ef3e1288b4 Mon Sep 17 00:00:00 2001 |
| From: Sasha Levin <sashal@kernel.org> |
| Date: Mon, 8 Feb 2021 14:29:56 +1100 |
| Subject: powerpc/64s: Fix pte update for kernel memory on radix |
| |
| From: Jordan Niethe <jniethe5@gmail.com> |
| |
| [ Upstream commit b8b2f37cf632434456182e9002d63cbc4cccc50c ] |
| |
| When adding a PTE, a ptesync is needed to order the update of the PTE |
| with subsequent accesses; otherwise a spurious fault may be raised. |
| |
| radix__set_pte_at() does not do this for performance reasons. For |
| non-kernel memory this is not an issue as any faults of this kind are |
| corrected by the page fault handler. For kernel memory these faults |
| are not handled. The current solution is that there is a ptesync in |
| flush_cache_vmap() which should be called when mapping from the |
| vmalloc region. |
| |
| However, map_kernel_page() does not call flush_cache_vmap(). This is |
| troublesome in particular for code patching with Strict RWX on radix. |
| In do_patch_instruction() the page frame that contains the instruction |
| to be patched is mapped and then immediately patched. With no ordering |
| or synchronization between setting up the PTE and writing to the page |
| it is possible for faults to occur. |
| |
| As the code patching is done using __put_user_asm_goto(), the resulting |
| fault is obscured, but with a normal store instead it can be seen: |
| |
| BUG: Unable to handle kernel data access on write at 0xc008000008f24a3c |
| Faulting instruction address: 0xc00000000008bd74 |
| Oops: Kernel access of bad area, sig: 11 [#1] |
| LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV |
| Modules linked in: nop_module(PO+) [last unloaded: nop_module] |
| CPU: 4 PID: 757 Comm: sh Tainted: P O 5.10.0-rc5-01361-ge3c1b78c8440-dirty #43 |
| NIP: c00000000008bd74 LR: c00000000008bd50 CTR: c000000000025810 |
| REGS: c000000016f634a0 TRAP: 0300 Tainted: P O (5.10.0-rc5-01361-ge3c1b78c8440-dirty) |
| MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 44002884 XER: 00000000 |
| CFAR: c00000000007c68c DAR: c008000008f24a3c DSISR: 42000000 IRQMASK: 1 |
| |
| This results in the kind of issue reported here: |
| https://lore.kernel.org/linuxppc-dev/15AC5B0E-A221-4B8C-9039-FA96B8EF7C88@lca.pw/ |
| |
| Chris Riedl suggested a reliable way to reproduce the issue: |
| $ mount -t debugfs none /sys/kernel/debug |
| $ (while true; do echo function > /sys/kernel/debug/tracing/current_tracer ; echo nop > /sys/kernel/debug/tracing/current_tracer ; done) & |
| |
| Turning ftrace on and off does a large amount of code patching, which |
| usually crashes in less than 5 minutes, giving a trace like: |
| |
| ftrace-powerpc: (____ptrval____): replaced (4b473b11) != old (60000000) |
| ------------[ ftrace bug ]------------ |
| ftrace failed to modify |
| [<c000000000bf8e5c>] napi_busy_loop+0xc/0x390 |
| actual: 11:3b:47:4b |
| Setting ftrace call site to call ftrace function |
| ftrace record flags: 80000001 |
| (1) |
| expected tramp: c00000000006c96c |
| ------------[ cut here ]------------ |
| WARNING: CPU: 4 PID: 809 at kernel/trace/ftrace.c:2065 ftrace_bug+0x28c/0x2e8 |
| Modules linked in: nop_module(PO-) [last unloaded: nop_module] |
| CPU: 4 PID: 809 Comm: sh Tainted: P O 5.10.0-rc5-01360-gf878ccaf250a #1 |
| NIP: c00000000024f334 LR: c00000000024f330 CTR: c0000000001a5af0 |
| REGS: c000000004c8b760 TRAP: 0700 Tainted: P O (5.10.0-rc5-01360-gf878ccaf250a) |
| MSR: 900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 28008848 XER: 20040000 |
| CFAR: c0000000001a9c98 IRQMASK: 0 |
| GPR00: c00000000024f330 c000000004c8b9f0 c000000002770600 0000000000000022 |
| GPR04: 00000000ffff7fff c000000004c8b6d0 0000000000000027 c0000007fe9bcdd8 |
| GPR08: 0000000000000023 ffffffffffffffd8 0000000000000027 c000000002613118 |
| GPR12: 0000000000008000 c0000007fffdca00 0000000000000000 0000000000000000 |
| GPR16: 0000000023ec37c5 0000000000000000 0000000000000000 0000000000000008 |
| GPR20: c000000004c8bc90 c0000000027a2d20 c000000004c8bcd0 c000000002612fe8 |
| GPR24: 0000000000000038 0000000000000030 0000000000000028 0000000000000020 |
| GPR28: c000000000ff1b68 c000000000bf8e5c c00000000312f700 c000000000fbb9b0 |
| NIP ftrace_bug+0x28c/0x2e8 |
| LR ftrace_bug+0x288/0x2e8 |
| Call Trace: |
| ftrace_bug+0x288/0x2e8 (unreliable) |
| ftrace_modify_all_code+0x168/0x210 |
| arch_ftrace_update_code+0x18/0x30 |
| ftrace_run_update_code+0x44/0xc0 |
| ftrace_startup+0xf8/0x1c0 |
| register_ftrace_function+0x4c/0xc0 |
| function_trace_init+0x80/0xb0 |
| tracing_set_tracer+0x2a4/0x4f0 |
| tracing_set_trace_write+0xd4/0x130 |
| vfs_write+0xf0/0x330 |
| ksys_write+0x84/0x140 |
| system_call_exception+0x14c/0x230 |
| system_call_common+0xf0/0x27c |
| |
| To fix this, use ptesync when updating kernel memory PTEs. |
| |
| Fixes: f1cb8f9beba8 ("powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flags") |
| Signed-off-by: Jordan Niethe <jniethe5@gmail.com> |
| Reviewed-by: Nicholas Piggin <npiggin@gmail.com> |
| [mpe: Tidy up change log slightly] |
| Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> |
| Link: https://lore.kernel.org/r/20210208032957.1232102-1-jniethe5@gmail.com |
| Signed-off-by: Sasha Levin <sashal@kernel.org> |
| --- |
| arch/powerpc/include/asm/book3s/64/radix.h | 6 ++++-- |
| arch/powerpc/mm/book3s64/radix_pgtable.c | 4 ++-- |
| 2 files changed, 6 insertions(+), 4 deletions(-) |
| |
| diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h |
| index c7813dc628fc..59cab558e2f0 100644 |
| --- a/arch/powerpc/include/asm/book3s/64/radix.h |
| +++ b/arch/powerpc/include/asm/book3s/64/radix.h |
| @@ -222,8 +222,10 @@ static inline void radix__set_pte_at(struct mm_struct *mm, unsigned long addr, |
| * from ptesync, it should probably go into update_mmu_cache, rather |
| * than set_pte_at (which is used to set ptes unrelated to faults). |
| * |
| - * Spurious faults to vmalloc region are not tolerated, so there is |
| - * a ptesync in flush_cache_vmap. |
| + * Spurious faults from the kernel memory are not tolerated, so there |
| + * is a ptesync in flush_cache_vmap, and __map_kernel_page() follows |
| + * the pte update sequence from ISA Book III 6.10 Translation Table |
| + * Update Synchronization Requirements. |
| */ |
| } |
| |
| diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c |
| index 98f0b243c1ab..39d488a212a0 100644 |
| --- a/arch/powerpc/mm/book3s64/radix_pgtable.c |
| +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c |
| @@ -108,7 +108,7 @@ static int early_map_kernel_page(unsigned long ea, unsigned long pa, |
| |
| set_the_pte: |
| set_pte_at(&init_mm, ea, ptep, pfn_pte(pfn, flags)); |
| - smp_wmb(); |
| + asm volatile("ptesync": : :"memory"); |
| return 0; |
| } |
| |
| @@ -168,7 +168,7 @@ static int __map_kernel_page(unsigned long ea, unsigned long pa, |
| |
| set_the_pte: |
| set_pte_at(&init_mm, ea, ptep, pfn_pte(pfn, flags)); |
| - smp_wmb(); |
| + asm volatile("ptesync": : :"memory"); |
| return 0; |
| } |
| |
| -- |
| 2.30.2 |
| |