File Size Prediction Formula - Page 2

ARnet_tenRA · #21 01-08-2003, 12:07 PM

I was wondering how accurate the prediction is for people who use the frame rate as the sample length rather than the GOP length of 24. If the difference is negligible then we could switch to the simpler formula. Of course this matters most for PAL (25 fps) or NTSC (29.97 fps).

Let me know.

-tenra

SansGrip · #22 01-08-2003, 01:22 PM

Quote:

Originally Posted by ARnet_tenRA

I was wondering how accurate the prediction is for people who use the frame rate as the sample length rather than the GOP length of 24.

It's most accurate when the sample length matches the GOP length. We ran both formulas (along with about a dozen others) already.

girv · #23 01-09-2003, 07:06 AM

Quote:

If you sample 1 second for every minute of your video (ie. 25
samples for 25fps or 30 samples for 29.97fps) then the formula for all
framerates would be:
Predicted Size = 60 * sample size

That's what I said: Sm = (Tm/Ts) * Ss, where Tm is the movie length and Ts is the
total length of the samples. If you adjust this equation
for taking one GOP-length sample for every 60 seconds of movie you get:

Sm = 60 * (FR/Lg) * Ss
Sm = movie file size
FR = frame rate in frames per second
Lg = GOP Length in frames
Ss = sample file size

If you plug in Lg = 24 and each frame rate (23.976, 25 and 29.97 fps) you get the
constants ARnet_tenRA posted (59.94, 62.5, 74.925).

More generally this becomes:

Sm = (Tm/Cs) * (FR/Ls) * Ss
Sm = movie file size
Tm = movie length in seconds
Cs = number of samples taken
FR = frame rate in frames per second
Ls = length of each sample in frames
Ss = sample file size
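
For anyone who wants to play with the numbers, here is a rough Python sketch of the two formulas above. The function names are made up for illustration, and it assumes Ss is the size of the whole sample file (all strips together) and that the GOP length is 24 frames.

```python
def predict_movie_size(sample_bytes, movie_seconds, sample_count,
                       frame_rate, sample_frames):
    """General form: Sm = (Tm/Cs) * (FR/Ls) * Ss."""
    return (movie_seconds / sample_count) * (frame_rate / sample_frames) * sample_bytes


def per_minute_constant(frame_rate, gop_frames=24):
    """Special case with one GOP-length strip per 60 seconds: 60 * (FR/Lg)."""
    return 60 * frame_rate / gop_frames


if __name__ == "__main__":
    # Print the multiplier for each common frame rate, rounded to 3 places.
    for fps in (23.976, 25.0, 29.97):
        print(fps, round(per_minute_constant(fps), 3))
```

With one strip per minute, Tm/Cs is 60, so the general form collapses to the per-minute constant times the sample size.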

girv · #24 01-09-2003, 07:14 AM

Quote:

Originally Posted by kwag

Just (800 - audio size) / 60 will give you predicted sample size

Why 800? ISTR something about VCDs being written in "MODE2" (or
something) so you could fit more data on the disc, but does this mean
that I can fit an 800 MB .mpg file onto a 700 MB disc if it's used as a
VCD / SVCD?!

I've been using an upper limit of 700 MB until now, but an extra 100 MB
would be very nice.

/girv

girv · #25 01-09-2003, 07:31 AM

Quote:

Originally Posted by SansGrip

It's most accurate when the sample length matches the GOP length. We ran both formulas (along with about a dozen others) already.

Just a thought, but I wonder if the prediction would be even more
accurate if sample strips were started on frame numbers that were
multiples of GOP length instead of multiples of one second?
E.g. sample 24 frames every 24*60 frames instead of every
framerate*60 frames.

I'm thinking that this way you would be creating the sample with
GOPs that would actually be in the final encode. Mad?

SansGrip · #26 01-09-2003, 08:39 AM

Quote:

Originally Posted by girv

I've been using an upper limit of 700 MB until now, but an extra 100 MB would be very nice.

Then today's your lucky day, because you can indeed get a maximum of 800 MB on a VCD. To be exact, 813,019,155 bytes.

SansGrip · #27 01-09-2003, 08:41 AM

Quote:

Originally Posted by girv

Just a thought, but I wonder if the prediction would be even more accurate if sample strips were started on frame numbers that were
multiples of GOP length instead of multiples of one second?

They should always be multiples of GOP length. If your frame rate is not 23.976, you should specify the sample length, i.e. Sampler(length=24). If your frame rate is 23.976 you can just use Sampler() since the rounded frame rate happens to match the GOP length.

girv · #28 01-09-2003, 10:45 AM

Quote:

Originally Posted by SansGrip

They should always be multiples of GOP length. If your frame rate is not 23.976, you should specify the sample length, i.e. Sampler(length=24).

I was referring to the start frame of the sample strip not the number
of frames in it, which as you say should be equal to the GOP length.

E.g. if your frame rate is 25 fps then Sampler(length=24) will take
sample strips starting at frame 0, 1500 (25*60*1), 3000 (25*60*2),
4500 (25*60*3) ... correct? What I am suggesting is to instead align
the start of each sample strip to the multiple of GOP length closest to
these numbers, i.e. 0, 1488, 3000, 4488 ...

fps: 23.976
current: 0, 1437, 2877, 4316, ...
proposed: 0, 1440, 2880, 4320, ...

fps: 25
current: 0, 1500, 3000, 4500, ...
proposed: 0, 1488, 3000, 4488, ...

fps: 29.97
current: 0, 1798, 3596, 5395, ...
proposed: 0, 1800, 3600, 5400, ...

The differences aren't much (+/- 12 frames at most) but I just wondered
if it could give a little extra accuracy.
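
The "proposed" rows can be reproduced with a short Python sketch. To be clear, this is not how Sampler works internally; `gop_aligned_starts` is a hypothetical helper, and breaking exact ties toward the lower multiple (1488 rather than 1512 for a target of 1500) is an assumption made only to match the numbers listed above.

```python
import math


def gop_aligned_starts(frame_rate, count, gop_frames=24, interval_seconds=60):
    """Snap each once-per-interval start frame to the nearest multiple of the GOP length."""
    starts = []
    for n in range(count):
        target = frame_rate * interval_seconds * n  # unaligned start frame
        # ceil(x - 0.5) rounds to nearest and sends exact halves downward
        # (assumption: ties go to the lower GOP boundary)
        gop_index = math.ceil(target / gop_frames - 0.5)
        starts.append(gop_index * gop_frames)
    return starts


if __name__ == "__main__":
    for fps in (23.976, 25.0, 29.97):
        print(fps, gop_aligned_starts(fps, 4))
```

Since the snapped start is always the nearest multiple of 24, it never moves more than 12 frames from the unaligned one.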

girv · #29 01-09-2003, 10:52 AM

Quote:

Originally Posted by SansGrip

Then today's your lucky day, because you can indeed get a maximum of 800 MB on a VCD. To be exact, 813,019,155 bytes.

Sorry to be dumb, but if I have a .mpg file on my hard drive that
is 810,000,000 bytes then it can be burned onto a standard 700 MB
CD-R as a VCD? Happy day!

What about SVCD? Is that the same?

What is the overhead for VCD/SVCD, i.e. how big can a .mpg file
be on my hard drive and still (just) fit onto a 700 MB CD-R?

SansGrip · #30 01-09-2003, 11:32 AM

Quote:

Originally Posted by girv

I was referring to the start frame of the sample strip not the number of frames in it, which as you say should be equal to the GOP length.

Ah, I see. It might make it more accurate, yes, but my gut says not a lot. I'll have to modify Sampler slightly and try it.

SansGrip · #31 01-09-2003, 11:36 AM

Quote:

Originally Posted by girv

What about SVCD? Is that the same?

Roughly, though I believe the SVCD filesystem is slightly different, so will have slightly different overhead.

Quote:

What is the overhead for VCD/SVCD, i.e. how big can a .mpg file
be on my hard drive and still (just) fit onto a 700 MB CD-R?

The figure I gave (813,019,155 bytes) is compensated for filesystem overhead and the system stream. In other words, if you subtract from that the size of your audio (which I always encode first), you'll get the maximum number of bytes for your video stream.

If you have a .mpg file then it's already got the system stream in it, so the maximum byte count will be 825,105,664.

(By the way, this is how I always do my prediction:

813,019,155 - audio_bytes = max_video_bytes
max_video_bytes / frames_in_movie = bytes_per_frame
bytes_per_frame * frame_count_with_sampler = sample_bytes

It's almost always accurate within 0.5% or so. It's more involved than the regular formula, but I'm testing it for the next release of KVCDP.)
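
The three steps above translate fairly directly into Python. This is only a sketch, not the KVCDP code: the function name is invented, the 813,019,155-byte capacity is the figure quoted in this thread, and the audio size and frame counts in the example are invented numbers.

```python
VCD_CAPACITY_BYTES = 813_019_155  # capacity figure quoted in the thread


def target_sample_bytes(audio_bytes, frames_in_movie, frames_in_sample,
                        capacity_bytes=VCD_CAPACITY_BYTES):
    """Size the sample encode should come out at for the full movie to just fit."""
    max_video_bytes = capacity_bytes - audio_bytes
    bytes_per_frame = max_video_bytes / frames_in_movie
    return bytes_per_frame * frames_in_sample


if __name__ == "__main__":
    # Hypothetical movie: 100,000,000 bytes of audio, 144,000 frames,
    # and a Sampler output of 2,400 frames.
    print(round(target_sample_bytes(100_000_000, 144_000, 2_400)))
```

The idea would be to adjust the encoder settings until the sample encode lands close to this target size.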

ARnet_tenRA · #32 01-09-2003, 12:39 PM

Quote:

Originally Posted by girv

E.g. if your frame rate is 25 fps then Sampler(length=24) will take
sample strips starting at frame 0, 1500 (25*60*1), 3000 (25*60*2),
4500 (25*60*3) ... correct? What I am suggesting is to instead align
the start of each sample strip to the multiple of GOP length closest to
these numbers, i.e. 0, 1488, 3000, 4488 ...

How about this: always use (24*60*n) as the starting frame for each sample, i.e. 0, 1440, 2880, 4320, ... Don't worry about the frame rate at all. When you do it this way all sample MPEGs produced will be exactly 1/60th of the final movie.

fps: 23.976
current: 0, 1437, 2877, 4316, ...
proposed: 0, 1440, 2880, 4320, ...

fps: 25
current: 0, 1500, 3000, 4500, ...
proposed: 0, 1440, 2880, 4320, ...

fps: 29.97
current: 0, 1798, 3596, 5395, ...
proposed: 0, 1440, 2880, 4320, ...

This will have the benefit of aligning with the GOP like girv suggested and having the simplest formula no matter the length of movie or framerate:
Predicted Size = 60 * sample size
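
A quick way to sanity-check the 1/60 ratio is to count frames in Python. This is a sketch under assumptions: `sampled_frames` is a made-up helper that keeps the first 24 frames of every 1440-frame stride, and the frame counts are invented. The ratio it computes is a frame ratio; treating it as a file-size ratio assumes the sampled frames compress like the rest of the movie.

```python
def sampled_frames(total_frames, stride=1440, length=24):
    """Count frames kept when taking `length` frames at the start of every `stride`."""
    full_strides, remainder = divmod(total_frames, stride)
    # A trailing partial stride still contributes up to `length` frames.
    return full_strides * length + min(remainder, length)


if __name__ == "__main__":
    for total in (144_000, 144_012, 150_000):
        kept = sampled_frames(total)
        print(total, kept, total / kept)
```

When the frame count is a multiple of 1440 the ratio is exactly 60 at any frame rate; a short leftover stride at the end pushes it slightly off, which should matter very little on a full-length movie.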

-ARnet_tenRA

SansGrip · #33 01-09-2003, 02:02 PM

Quote:

Originally Posted by ARnet_tenRA

When you do it this way all sample MPEGs produced will be exactly 1/60th of the final movie.

The main requirement of the formula is to be accurate, not necessarily simple. A great deal of testing indicates that the most accurate formula for all kinds of sources and all resolutions is minutes-in-movie samples, each max-gop-size frames in length. That said, we should obviously test this method out against the current one.

Sampler uses a pretty simple algorithm to decide which frames to select. The curious can take a look at the source code here.

Quote:

This will have the benefit of aligning with the GOP like girv suggested

I'm not sure that aligning with the GOP will make things any more accurate, but I can certainly build a test version of Sampler with that modification and try it out.

SansGrip · #34 01-09-2003, 02:04 PM

Quote:

Originally Posted by ARnet_tenRA

Don't worry about the frame rate at all.

The frame rate is only used by Sampler as a default sample length if none is specified -- it's not a part of the formula. I figured that movie-length-in-minutes samples of one second each would be a good generic default, and it happened to correspond to the current file prediction formula for KVCD.

ARnet_tenRA · #35 01-10-2003, 09:03 AM

Hi all,

I just ran a test last night using my suggested formula, and I got some pretty exciting results. The prediction was off by only about 0.012%!!!

AVISynth script:

Code:

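# Load the MPEG-2 decoder plugin
LoadPlugin("MPEG2DEC.dll")
# Open the DVD2AVI project file
mpeg2source("sample.d2v")
# Keep 24 frames (one GOP length) out of every 1440 frames
SelectRangeEvery(1440,24)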

Sample file size = 12,091,021 bytes
Predicted movie size = 60 * 12,091,021 = 725,461,260 bytes
Actual final Movie size = 725,551,421 bytes
Error = 90,161 bytes
% error = 0.012 (0.00012 as a fraction)
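
As a quick check of the arithmetic, the figures above can be run through a few lines of Python (only the numbers from this post are used; the variable names are just for illustration). The error works out to roughly 0.00012 as a fraction of the final size, i.e. about 0.012 percent.

```python
sample_bytes = 12_091_021   # size of the sample encode
actual_bytes = 725_551_421  # size of the full encode

predicted_bytes = 60 * sample_bytes
error_bytes = actual_bytes - predicted_bytes
error_fraction = error_bytes / actual_bytes
error_percent = 100 * error_fraction

print(predicted_bytes, error_bytes, f"{error_percent:.4f}%")
```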

Based on these results I would like to try some more movies and see what I get.

SansGrip · #36 01-10-2003, 10:04 AM

Quote:

Originally Posted by ARnet_tenRA

SelectRangeEvery(1440,24)

Note that SelectRangeEvery doesn't seem to be quite accurate. The old prediction method, which used SelectRangeEvery, produced sample strips that had a frame count fairly significantly different from what should have been produced.

That's one of the reasons I wrote Sampler.

ARnet_tenRA · #37 01-10-2003, 11:04 AM

Quote:

Originally Posted by SansGrip

Note that SelectRangeEvery doesn't seem to be quite accurate. The old prediction method, which used SelectRangeEvery, produced sample strips that had a frame count fairly significantly different from what should have been produced.

That's one of the reasons I wrote Sampler.

I knew that you could get up to one sample too many, but I was unaware of any more offset than that. Let me know if this is not the case.

I used SelectRangeEvery because I knew exactly where the frames would be captured from in the video (every 1440 frames). Maybe this is not the case.

Anyway, let me know if sampling 24 frames starting at every (24*60*n)th frame gives you results as accurate as mine, whether you use Sampler or SelectRangeEvery.

-ARnet_tenRA