What I have done so far:
Created a framework for attaching hooks into system calls, allowing me to trace where things get called.
Found out where things hook into the PSP's graphics engine.
Reverse engineered some parts of War of the Lion's graphics engine.
Unstretched the screen (in most places).
Figured out the general structure of a single redraw.
Found what is causing the slowdown.
Nullified the slowdown.
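To give a rough idea of what the hook framework in the first item does: each hook wraps a system call so that the call is logged and then forwarded to the original handler. Below is a minimal sketch of that idea in Python, for illustration only. The real hooks patch syscall stubs on the PSP itself, and every name here (`HookTable`, `attach`, `fake_ge_enqueue`) is hypothetical rather than taken from the actual framework.

```python
from typing import Any, Callable, Dict, List, Tuple


class HookTable:
    """Maps syscall names to handlers and records every hooked call."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[..., Any]] = {}
        self.trace: List[Tuple[str, tuple, Any]] = []

    def register(self, name: str, handler: Callable[..., Any]) -> None:
        """Install the original (unhooked) handler for a syscall."""
        self._handlers[name] = handler

    def attach(self, name: str) -> None:
        """Wrap the current handler so each call is forwarded, then logged."""
        original = self._handlers[name]

        def hooked(*args: Any) -> Any:
            result = original(*args)
            self.trace.append((name, args, result))
            return result

        self._handlers[name] = hooked

    def call(self, name: str, *args: Any) -> Any:
        """Dispatch a syscall by name through whatever handler is installed."""
        return self._handlers[name](*args)


# Hypothetical stand-in for a graphics syscall; it just returns a fake list id.
def fake_ge_enqueue(list_addr: int, stall_addr: int) -> int:
    return 1


table = HookTable()
table.register("ge_enqueue", fake_ge_enqueue)
table.attach("ge_enqueue")
table.call("ge_enqueue", 0x090F3C80, 0)
```

After the call, `table.trace` holds one entry with the syscall name, its arguments, and its return value, which is the kind of record that lets you see where things get called from.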
What I am working on right now:
Seeing if nullifying the slowdown has ill effects, and if so, what the best way to manage this is.
Discerning why the slowdown exists.
Locating where the graphics instructions are written from each frame.
Testing a screen unstretching patch.
Finding more places where the screen gets stretched.
Figuring out why there are PS1 GPU calls in a PSP game.
Figuring out where the halving and thirding are occurring.
What I have NOT done so far:
Figured out why the results of the timing function are having the effect they are.
Figured out the best way to remove the slowdown.
Found out where the GE lists are initialized.
Found out where the PS1 GPU calls are going.
Found out if the graphics engine is actually PS1 GPU emulation.
Found out what's causing any of the slowdown. (Done since: it's the frame-timing function.)
Fixed any of the slowdown. (Done since: forcing a fixed time value removes it.)
Some miscellaneous observations:
The entirety of the main executable from the JP PS1 version of FFT is present in the file fftpack.bin. I can't find any trace of it in memory while the game is running, though.
The main thread for the game is called "psx_main", which suggests that there is emulation going on. Since the PS-X EXE doesn't appear to be in memory, though, either there isn't actually any emulation, or they're doing JIT/dynamic recompilation (dynarec) for some reason. (I'd imagine all you'd need to fix up is the memory accesses and offsets and the hardware I/O, but I might be wrong. That can be done easily with dynarec, though.)
Very few GPU/GE calls are made per frame: seemingly always between 0 (when the screen is blank) and 6, with 2-4 being typical. This implies that most of the rendering is either done all at once by sending a ton of commands to the GPU (which seems to be the case, given the many kilobytes of GPU instructions sitting in memory at around 0x90f3c80) or done on the CPU (which would explain some of the slowdown!).
Correction: I previously noted that GPU calls seemed to carry only about 16 instructions at once (unconfirmed at the time), which would have made doing all the rendering on the GPU in spurts even more unlikely. This was a mistake. There are actually tons of instructions passed at a time. The 16 referred to the size of something else.
Frames are usually drawn only once every 1 or 2 vblanks, depending on factors that I am not sure of. For example, the load-from-Memory-Stick screen usually redraws once every two vblanks, but a lot of other stuff redraws every vblank.
During casting, the game only redraws every third vblank. I'm still looking into why.
I've found code that talks to the GPU, but modifying it doesn't seem to make the GPU do anything differently. Hurm.... Must investigate further.
There are PS1 GPU calls in the game that actually have an effect. It looks like there might be PS1 GPU emulation here.
The game rewrites all of its graphics instructions (up to 512 kB) every frame. This might be related to the slowdown: it might spend one (or two) frame(s) writing the instructions and then one frame running them.
The slowdown is caused by something going funky in the function that measures the time between frames. Forcing it to report a fixed time value removes the slowdown, but I'm not sure why yet.
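To make the timing observations concrete, here is a toy model of how a frame-time reading could turn into a redraw every 1, 2, or 3 vblanks, and why pinning that reading to a constant would lock the game to one redraw per vblank. This is only a guess at the shape of the logic, not the game's actual code: the 60 Hz refresh rate is an assumption, and the names `vblanks_to_wait` and `redraw_rate` are made up for the sketch.

```python
import math
from typing import Optional

VBLANK_HZ = 60.0                 # assumed display refresh rate
VBLANK_MS = 1000.0 / VBLANK_HZ   # duration of one vblank interval in ms


def vblanks_to_wait(measured_ms: float, fixed_ms: Optional[float] = None) -> int:
    """Number of vblanks between redraws for a given frame-time reading.

    If fixed_ms is given, it replaces the measured value, mimicking a patch
    that makes the timing function report a constant.
    """
    elapsed = measured_ms if fixed_ms is None else fixed_ms
    return max(1, math.ceil(elapsed / VBLANK_MS))


def redraw_rate(vblanks: int) -> float:
    """Redraws per second when one frame is shown every `vblanks` vblanks."""
    return VBLANK_HZ / vblanks


# Hypothetical readings (ms): a light frame, a heavier frame, a casting frame.
for reading in (10.0, 20.0, 40.0):
    normal = vblanks_to_wait(reading)
    patched = vblanks_to_wait(reading, fixed_ms=10.0)
    print(reading, normal, redraw_rate(normal), patched, redraw_rate(patched))
```

In this model, readings that spill past one or two vblank intervals give the halving and thirding of the redraw rate, while the fixed value always yields a redraw every vblank. Whether the real function works this way is exactly what still needs to be figured out.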