arm64: ensure completion of TLB invalidatation

Currently there is no dsb between the tlbi in __cpu_setup and the write
to SCTLR_EL1 which enables the MMU in __turn_mmu_on. This means that the
TLB invalidation is not guaranteed to have completed at the point
address translation is enabled, leading to a number of possible issues
including incorrect translations and TLB conflict faults.

This patch moves the tlbi in __cpu_setup above an existing dsb used to
synchronise I-cache invalidation, ensuring that the TLBs have been
invalidated at the point the MMU is enabled.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 421b99f..0f7fec5 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -111,12 +111,12 @@
 	bl	__flush_dcache_all
 	mov	lr, x28
 	ic	iallu				// I+BTB cache invalidate
+	tlbi	vmalle1is			// invalidate I + D TLBs
 	dsb	sy
 
 	mov	x0, #3 << 20
 	msr	cpacr_el1, x0			// Enable FP/ASIMD
 	msr	mdscr_el1, xzr			// Reset mdscr_el1
-	tlbi	vmalle1is			// invalidate I + D TLBs
 	/*
 	 * Memory region attributes for LPAE:
 	 *