Msg/FetchSelfModifyingCode

Revision as of 07:07, 11 March 2019

Normally, the CPU executes code that came from an executable file generated by an assembler or compiler, and that code is not changed at runtime by directly modifying the instructions in memory. A program that (intentionally or unintentionally) modifies its own instructions and then executes them is called self-modifying code. This warning tells you that the instruction being executed came from an executable, and that the instruction is now different from what was in the executable.

There are some cases where intentional self-modifying code is useful, but unintentional self-modifying code is almost certainly incorrect.

Examples

ARMv7

.global _start
_start:
	ldr r0, =0xe2822001		// An opcode (add r2, r2, #1)
	str r0, [pc, #0]		// Modify the second nop below (pc reads as this instruction's address + 8, so address 0xc when _start is at 0). Click "Refresh" to see the new instruction.
	nop
	nop				// This instruction gets changed to an add (opcode 0xe2822001)
	nop

Nios II

.global _start
_start:
	movia r2, 0x18c00044		# An opcode (addi r3, r3, 1)
	stw r2, 0x10(r0)		# Modify instruction at address 0x10. Click "Refresh" to see the new instruction.
	nop
	nop				# This instruction gets changed to an addi (opcode 0x18c00044)
	nop

On the simulator, the highlighted instruction executes as an add (r2 is incremented in the ARMv7 example, r3 in the Nios II example) rather than the original nop, because the opcode was changed in memory before execution. On real hardware, the behaviour is typically undefined (you may execute the old or the new instruction) until you have flushed both the instruction cache and the instruction pipeline.
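To see why the two stored words behave as adds, it can help to pull the bit fields apart by hand. The following is a minimal Python sketch, not part of the simulator: the helper names are made up for illustration, and each decoder assumes the word is already known to be an ARM data-processing immediate instruction or a Nios II I-type instruction.

```python
def decode_arm_dp_imm(word):
    """Split an ARM data-processing (immediate) instruction into its fields."""
    return {
        "cond": (word >> 28) & 0xF,    # 0xE means "always"
        "opcode": (word >> 21) & 0xF,  # 0b0100 is ADD
        "rn": (word >> 16) & 0xF,      # first operand register
        "rd": (word >> 12) & 0xF,      # destination register
        "rotate": (word >> 8) & 0xF,
        "imm8": word & 0xFF,
    }


def decode_nios2_itype(word):
    """Split a Nios II I-type instruction into its fields."""
    return {
        "a": (word >> 27) & 0x1F,      # source register
        "b": (word >> 22) & 0x1F,      # destination register
        "imm16": (word >> 6) & 0xFFFF,
        "op": word & 0x3F,             # 0x04 is addi
    }


arm = decode_arm_dp_imm(0xE2822001)    # add r2, r2, #1
nios = decode_nios2_itype(0x18C00044)  # addi r3, r3, 1
print(arm)
print(nios)
```

The decoded fields show that the ARMv7 word adds 1 to r2, while the Nios II word adds 1 to r3.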

Debugging

Unintended self-modifying code is notoriously hard to debug. The modification is not detected until the modified code is next executed, which may be long after the store that changed it, so it is often not easy to find the part of the program that actually did the modification. Also, the code currently shown in the disassembly window may not match the code that was originally loaded into memory. The objective is to find which part of the program stored values into the memory space used by your program's instructions.

A common cause of this problem is a memory store somewhere in your program that uses a bad pointer value. This could be simply computing the pointer incorrectly, or overrunning an array bound and overwriting a large swath of memory. Look at the value of the new "opcode" and see if it matches any data pattern that you might have been writing.
Use a data watchpoint (Watchpoints window, located in the same panel as the registers window by default). A watchpoint can interrupt the program whenever a memory load or store occurs to a range of addresses. Use a watchpoint to look for memory writes to the range of addresses used by your code (set the start and end addresses), and set Pause on Write. Then reload and restart your program. The watchpoint will stop your program whenever a store occurs to your selected region (your instructions), which will hopefully get you much closer to finding the root cause of the problem.
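When comparing the new "opcode" against data your program writes, viewing the same 32 bits in a few different ways can make a pattern obvious. This is a small Python sketch for use outside the simulator; `describe_word` is a hypothetical helper, the example value is invented, and a little-endian byte order is assumed.

```python
import struct


def describe_word(word):
    """Show a 32-bit word as bytes, printable ASCII, signed int, and float."""
    raw = struct.pack("<I", word)  # little-endian byte order assumed
    text = "".join(chr(b) if 32 <= b < 127 else "." for b in raw)
    return {
        "bytes": raw.hex(),
        "ascii": text,
        "signed": struct.unpack("<i", raw)[0],
        "float": struct.unpack("<f", raw)[0],
    }


# Hypothetical overwritten opcode: these bytes spell the start of a string.
info = describe_word(0x6C6C6548)
print(info)
```

If the ASCII view reads like part of a string, or the integer view looks like a loop counter or array element from your program, the store that wrote it is probably the string or array code that went out of bounds.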

Implementation

The simulator keeps a copy of the most recently loaded ELF executable in memory. On every instruction fetch, it compares the fetched opcode to the corresponding bytes in the ELF executable and generates this message if they differ.
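The check described above can be pictured roughly as follows. This is only an illustrative Python sketch, not the simulator's actual code: memory and the ELF image are modelled as dictionaries from address to 32-bit word, and the function name and message text are assumptions.

```python
def check_fetch(memory, elf_image, pc):
    """Return a warning string if the word at pc differs from the loaded image."""
    original = elf_image.get(pc)
    if original is None:
        return None  # address was not part of the executable: nothing to compare
    fetched = memory[pc]
    if fetched != original:
        return (f"Modified opcode at 0x{pc:08x}: "
                f"0x{fetched:08x} (was 0x{original:08x})")
    return None


# Example original words (placeholder values standing in for two nops).
elf_image = {0x08: 0xE320F000, 0x0C: 0xE320F000}
memory = dict(elf_image)
memory[0x0C] = 0xE2822001  # a store has overwritten the second instruction

unchanged = check_fetch(memory, elf_image, 0x08)
warning = check_fetch(memory, elf_image, 0x0C)
print(warning)
```

Because the comparison happens at fetch time, the sketch reports nothing when the store occurs, only when the overwritten address is later executed, which matches the debugging difficulty described earlier.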

If you are intentionally using self-modifying code, you can disable this warning. But make sure you have followed all of the rules regarding instruction cache flushing and pipeline flushing defined by the architecture. Nearly all architectures (except x86) produce undefined results (it is unknown whether the old or the new instructions are executed) unless instruction cache and pipeline flushing is done correctly. The simulator does not model any of this complex behaviour: the simulator always executes the modified (new) instructions.
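As a rough illustration of why flushing matters, here is a toy Python model of a CPU with a separate instruction cache. It is a deliberately simplified assumption, not a description of any real processor or of the simulator: real hardware may return either the old or the new word depending on cache state, whereas this model always returns the stale one until the cache is flushed.

```python
class ToyCpu:
    """Toy model: instruction fetches go through a cache that stores bypass."""

    def __init__(self, memory):
        self.memory = dict(memory)
        self.icache = {}

    def fetch(self, addr):
        # Fill the instruction cache on first use, then keep serving the copy.
        if addr not in self.icache:
            self.icache[addr] = self.memory[addr]
        return self.icache[addr]

    def store(self, addr, word):
        # Data writes update memory but do not touch the instruction cache.
        self.memory[addr] = word

    def flush_icache(self):
        self.icache.clear()


OLD, NEW = 0xE320F000, 0xE2822001   # example words only
cpu = ToyCpu({0x0C: OLD})
cpu.fetch(0x0C)                     # instruction is now cached
cpu.store(0x0C, NEW)                # self-modifying store
before_flush = cpu.fetch(0x0C)      # still the old word in this model
cpu.flush_icache()
after_flush = cpu.fetch(0x0C)       # new word is visible after the flush
print(hex(before_flush), hex(after_flush))
```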

Disabling this message

This debugging check can be disabled in the Debugging Checks section of the Settings box: Instruction fetch: Modified opcode.
