In my opinion, this has nothing to do with the fact that you convert from PAL to NTSC.
Even if you had encoded directly to PAL, you would have had the same problem.
(Sync problems caused by PAL->NTSC conversion usually show up as a sliding delay, one that grows over the course of the film, not as a constant offset.)
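To make the sliding vs. offset difference concrete, here is a small sketch under one assumption of mine: the PAL 25 fps video is re-timed to NTSC film rate (24000/1001 fps) frame for frame, while the audio is left untouched. The helper names are just for illustration, and the 2 s offset is an arbitrary example value.

```python
from fractions import Fraction

PAL_FPS = Fraction(25)
NTSC_FILM_FPS = Fraction(24000, 1001)  # assumption: target is 23.976 fps film rate


def sliding_drift(t_seconds, src_fps=PAL_FPS, dst_fps=NTSC_FILM_FPS):
    """A/V misalignment after t seconds of source material when every video
    frame is kept but shown at dst_fps instead of src_fps, and the audio is
    not re-timed. Video time stretches by src_fps/dst_fps, audio does not."""
    stretch = src_fps / dst_fps
    return float(t_seconds * (stretch - 1))


def offset_drift(t_seconds, offset=2.0):
    """A constant offset: the same error at every point in the film."""
    return offset


for t in (0, 60, 600, 3600):
    print(f"t={t:5d}s  sliding={sliding_drift(t):8.2f}s  offset={offset_drift(t):.2f}s")
```

Under these assumptions the sliding error is already about 2.56 s after one minute and keeps growing, whereas an offset delay is the same at the first minute and the last one.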
You are probably facing a strange behaviour we have already noticed on some DVDs: the audio track is shorter than the video track, and if you encode and mux directly, the audio is sometimes SEVERAL SECONDS off!
(see here:
http://www.kvcd.net/forum/viewtopic.php?p=87437)
I don't remember anyone having an explanation for this, or for how to detect it before encoding.
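I don't have a real explanation either, but one crude check before encoding, assuming you can get the video frame count and the decoded audio sample count (for example from your demuxer or decoder log), is simply to compare the two durations. The function name and the 0.5 s tolerance are my own arbitrary choices.

```python
def length_mismatch(video_frames, fps, audio_samples, sample_rate, tolerance_s=0.5):
    """Compare video and audio durations.

    Returns (gap_seconds, suspicious). A positive gap means the audio track
    is shorter than the video track. tolerance_s is an arbitrary threshold."""
    video_s = video_frames / fps
    audio_s = audio_samples / sample_rate
    gap = video_s - audio_s
    return gap, abs(gap) > tolerance_s


# Hypothetical example: 100 s of 25 fps video vs. 95 s of 48 kHz audio.
gap, suspicious = length_mismatch(2500, 25, 4_560_000, 48_000)
print(f"audio is {gap:.2f} s shorter than video, suspicious={suspicious}")
```

This only tells you that the lengths differ, not where the audio is missing: a single gap at the start and small gaps spread through the track give the same total, and they would need different corrections.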