File Size Prediction Formula - Page 2 - digitalFAQ.com Forums [Archives]

#21
01-08-2003, 12:07 PM
 ARnet_tenRA Free Member Join Date: Jan 2003 Location: Illinois, USA Posts: 73
I was wondering how accurate the prediction is for people who use the frame rate as the sample length rather than the GOP length of 24. If the difference is negligible, then we could switch to the simpler formula. Of course, this matters most for PAL (25 fps) or NTSC (29.97 fps).

Let me know.

-tenra
#22
01-08-2003, 01:22 PM
 SansGrip Free Member Join Date: Nov 2002 Location: Ontario, Canada Posts: 1,135
Quote:
 Originally Posted by ARnet_tenRA I was wondering how accurate the prediction is for people that sample the framerate rather than the GOP rate of 24.
It's most accurate when the sample length matches the GOP length. We already ran both formulas (along with about a dozen others).
#23
01-09-2003, 07:06 AM
 girv Free Member Join Date: Sep 2002 Posts: 108
Quote:
 If you sample 1 second for every minute of your video (i.e. 25 frames per sample at 25 fps or 30 frames at 29.97 fps), then the formula for all frame rates would be: Predicted Size = 60 * sample size
That's what I said: Sm = (Tm/Ts) * Ss, where Tm is the movie length and Ts the total length of the sample. If you adjust this equation for taking one GOP-length sample for every 60 seconds of movie, you get:
Sm = 60 * (FR/Lg) * Ss
where:
Sm = movie file size
FR = frame rate in frames per second
Lg = GOP length in frames
Ss = sample file size
If you plug in the numbers for each frame rate (23.976, 25 and 29.97 fps, with Lg = 24), you get the constants ARnet_tenRA posted (59.94, 62.5, 74.925).
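As a quick check, here is a minimal Python sketch of the one-GOP-per-minute constant, assuming a GOP length of 24 frames and the nominal frame rates quoted above (the function name is hypothetical):

```python
def prediction_constant(frame_rate: float, gop_length: int = 24) -> float:
    """Multiplier in Sm = 60 * (FR/Lg) * Ss.

    One minute of movie holds 60 * frame_rate frames, and one sample strip of
    gop_length frames is taken per minute, so the movie is this many times
    larger than the sample.
    """
    return 60 * frame_rate / gop_length


if __name__ == "__main__":
    for fps in (23.976, 25, 29.97):
        print(f"{fps} fps -> {prediction_constant(fps):.3f}")
```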

More generally this becomes:
Sm = (Tm/Cs) * (FR/Ls) * Ss
where:
Sm = movie file size
Tm = movie length in seconds
Cs = number of samples taken
FR = frame rate in frames per second
Ls = length of each sample in frames
Ss = sample file size (all samples together)
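The general form can be sketched the same way. This is an illustrative Python function (its name and argument names are hypothetical); setting Cs = Tm/60, i.e. one sample per minute, collapses it back to 60 * (FR/Ls) * Ss.

```python
def predict_movie_size(movie_seconds: float, num_samples: int,
                       frame_rate: float, sample_frames: int,
                       sample_bytes: float) -> float:
    """Sm = (Tm/Cs) * (FR/Ls) * Ss.

    movie_seconds * frame_rate is the number of frames in the movie and
    num_samples * sample_frames the number of frames in the sample file,
    so their ratio scales the sample file size up to the whole movie.
    """
    movie_frames = movie_seconds * frame_rate
    sampled_frames = num_samples * sample_frames
    return movie_frames / sampled_frames * sample_bytes


if __name__ == "__main__":
    # Hypothetical 100-minute PAL movie, one 24-frame sample per minute,
    # sample file of 10,000,000 bytes.
    print(predict_movie_size(6000, 100, 25, 24, 10_000_000))  # 625000000.0
```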
#24
01-09-2003, 07:14 AM
 girv Free Member Join Date: Sep 2002 Posts: 108
Quote:
 Originally Posted by kwag Just (800 - audio size) / 60 will give you predicted sample size
Why 800? I seem to recall something about VCDs being written in "Mode 2" (or something) so you can fit more data on the disc, but does this mean that I can fit an 800 MB .mpg file onto a 700 MB disc if it's used as a VCD / SVCD?!

I've been using an upper limit of 700 MB until now, but an extra 100 MB would be very nice.

/girv
#25
01-09-2003, 07:31 AM
 girv Free Member Join Date: Sep 2002 Posts: 108
Quote:
 Originally Posted by SansGrip It's most accurate when the sample length matches the GOP length. We ran both formulas (along with about a dozen others) already.
Just a thought, but I wonder if the prediction would be even more accurate if sample strips were started on frame numbers that were multiples of the GOP length instead of multiples of one second? E.g. sample 24 frames every 24*60 frames instead of every framerate*60 frames.

I'm thinking that this way you would be creating the sample with GOPs that would actually be in the final encode. Mad?
#26
01-09-2003, 08:39 AM
 SansGrip Free Member Join Date: Nov 2002 Location: Ontario, Canada Posts: 1,135
Quote:
 Originally Posted by girv I've been using an upper limit of 700Mb until now, but an extra 100Mb would be very nice
Then today's your lucky day, because you can indeed get a maximum of about 800 MB on a VCD. To be exact, 813,019,155 bytes.
#27
01-09-2003, 08:41 AM
 SansGrip Free Member Join Date: Nov 2002 Location: Ontario, Canada Posts: 1,135
Quote:
 Originally Posted by girv Just a thought, but I wonder if the prediction would be even more accurate if sample strips were started on frame numbers that were multiples of GOP length instead of multiples of one second?
They should always be multiples of GOP length. If your frame rate is not 23.976, you should specify the sample length, i.e. Sampler(length=24). If your frame rate is 23.976 you can just use Sampler() since the rounded frame rate happens to match the GOP length.
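A small Python check of when plain Sampler() is enough. It assumes, as described above, that Sampler's default strip length is the frame rate rounded to a whole number of frames and that the GOP length is 24; the helper name is hypothetical.

```python
GOP_LENGTH = 24  # assumed maximum GOP length used by the encoder


def needs_explicit_length(frame_rate: float, gop_length: int = GOP_LENGTH) -> bool:
    """True if the default strip length (rounded frame rate) differs from the GOP length."""
    return round(frame_rate) != gop_length


if __name__ == "__main__":
    for fps in (23.976, 25, 29.97):
        if needs_explicit_length(fps):
            print(f"{fps} fps: use Sampler(length={GOP_LENGTH})")
        else:
            print(f"{fps} fps: Sampler() is fine")
```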
#28
01-09-2003, 10:45 AM
 girv Free Member Join Date: Sep 2002 Posts: 108
Quote:
 Originally Posted by SansGrip They should always be multiples of GOP length. If your frame rate is not 23.976, you should specify the sample length, i.e. Sampler(length=24).
I was referring to the start frame of the sample strip, not the number of frames in it, which as you say should be equal to the GOP length.

E.g. if your frame rate is 25 fps, then Sampler(length=24) will take sample strips starting at frames 0, 1500 (25*60*1), 3000 (25*60*2), 4500 (25*60*3), ... correct? What I am suggesting is to instead align the start of each sample strip to the multiple of the GOP length closest to these numbers (taking the lower one on a tie), i.e. 0, 1488, 3000, 4488, ...

fps: 23.976
current sample start: 0,1437,2877,4316,...
proposed: 0,1440,2880,4320,...

fps: 25
current: 0,1500,3000,4500,...
proposed: 0,1488,3000,4488,...

fps: 29.97
current: 0,1798,3596,5395,...
proposed: 0,1800,3600,5400,...

The differences aren't much (±12 frames at most), but I just wondered if it could give a little extra accuracy.
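girv's alignment rule, as a minimal Python sketch. It assumes the unaligned strips start at fps * 60 * n (the figures above show Sampler's own rounding can differ from this by a frame or so, e.g. 1437 rather than 1438.56) and that a tie between two equally close GOP boundaries goes to the lower one, which matches the 1488 and 4488 values; the function names are hypothetical.

```python
import math


def current_start(n: int, fps: float) -> float:
    """Ideal start frame of the n-th strip when sampling once per minute (fps * 60 * n)."""
    return fps * 60 * n


def gop_aligned_start(n: int, fps: float, gop_length: int = 24) -> int:
    """Multiple of gop_length closest to fps * 60 * n; a tie goes to the lower one."""
    gops = current_start(n, fps) / gop_length
    # ceil(x - 0.5) rounds to the nearest integer with halves rounded down,
    # which gives 1488 (not 1512) for 1500 frames at 25 fps.
    return math.ceil(gops - 0.5) * gop_length


if __name__ == "__main__":
    for fps in (23.976, 25, 29.97):
        starts = [gop_aligned_start(n, fps) for n in range(4)]
        print(f"fps: {fps}  proposed: {starts}")
```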
#29
01-09-2003, 10:52 AM
 girv Free Member Join Date: Sep 2002 Posts: 108
Quote:
 Originally Posted by SansGrip Then today's your lucky day, because you can indeed get a maximum of 800mb on a VCD. To be exact, 813,019,155 bytes.
Sorry to be dumb, but if I have a .mpg file on my hard drive that is 810,000,000 bytes, then it can be burned onto a standard 700 MB CD-R as a VCD? Happy day!

What about SVCD? Is that the same?

What is the overhead for VCD/SVCD, i.e. how big can a .mpg file be on my hard drive and still (just) fit onto a 700 MB CD-R?
#30
01-09-2003, 11:32 AM
 SansGrip Free Member Join Date: Nov 2002 Location: Ontario, Canada Posts: 1,135
Quote:
 Originally Posted by girv I was referring to the start frame of the sample strip not the number of frames in it, which as you say should be equal to the GOP length.
Ah, I see. It might make it more accurate, yes, but my gut says not a lot. I'll have to modify Sampler slightly and try it.
#31
01-09-2003, 11:36 AM
 SansGrip Free Member Join Date: Nov 2002 Location: Ontario, Canada Posts: 1,135
Quote:
 Originally Posted by girv What about SVCD? Is that the same?
Roughly, though I believe the SVCD filesystem is slightly different, so will have slightly different overhead.

Quote:
 What is the overhead for VCD/SVCD, i.e. how big can a .mpg file be on my hard drive and still (just) fit onto a 700 MB CD-R?
The figure I gave (813,019,155 bytes) is compensated for filesystem overhead and the system stream. In other words, if you subtract from that the size of your audio (which I always encode first), you'll get the maximum number of bytes for your video stream.

If you have a .mpg file then it's already got the system stream in it, so the maximum byte count will be 825,105,664.

(By the way, this is how I always do my prediction:

813,019,155 - audio_bytes = max_video_bytes
max_video_bytes / frames_in_movie = bytes_per_frame
bytes_per_frame * frame_count_with_sampler = sample_bytes

It's almost always accurate to within 0.5% or so. It's more involved than the regular formula, but I'm testing it for the next release of KVCDP.)
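SansGrip's three steps translate directly into code. This is a minimal Python sketch; the function name and the numbers in the example are hypothetical, and 813,019,155 is the VCD figure quoted above, which already allows for filesystem and system-stream overhead.

```python
VCD_MAX_BYTES = 813_019_155  # figure quoted above for a VCD


def target_sample_bytes(audio_bytes: int, frames_in_movie: int,
                        frames_in_sample: int) -> float:
    """Sample size to aim for so that the full encode just fills a VCD."""
    max_video_bytes = VCD_MAX_BYTES - audio_bytes        # room left for video
    bytes_per_frame = max_video_bytes / frames_in_movie  # average budget per frame
    return bytes_per_frame * frames_in_sample            # budget for the sampled frames


if __name__ == "__main__":
    # Hypothetical numbers: 100,000,000 bytes of audio, a 150,000-frame movie,
    # and a 2,400-frame sample.
    print(round(target_sample_bytes(100_000_000, 150_000, 2_400)))
```

You would then adjust the encoder settings until the encoded sample comes out close to this size.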
#32
01-09-2003, 12:39 PM
 ARnet_tenRA Free Member Join Date: Jan 2003 Location: Illinois, USA Posts: 73
Quote:
 Originally Posted by girv e.g.: if your frame rate is 25fps then Sampler(length=24) will take sample strips starting at frame 0, 1500 (25*60*1), 3000 (25*60*2), 4500 (25*60*3) ... correct? What I am suggesting is to instead align the start of the sample strip to the multiple of GOP length closest to these numbers i.e.: 0, 1488, 3000, 4488 ...
How about this: always use 24*60*n as the starting frame for each sample, and don't worry about the frame rate at all, i.e. 0, 1440, 2880, 4320, ... When you do it this way, all sample MPEGs produced will be exactly 1/60th of the final movie.

fps: 23.976
current sample start: 0,1437,2877,4316,...
proposed: 0,1440,2880,4320,...

fps: 25
current: 0,1500,3000,4500,...
proposed: 0,1440,2880,4320,...

fps: 29.97
current: 0,1798,3596,5395,...
proposed: 0,1440,2880,4320,...

This will have the benefit of aligning with the GOP, like girv suggested, and of having the simplest formula no matter the movie length or frame rate:
Predicted Size = 60 * sample size

-ARnet_tenRA
#33
01-09-2003, 02:02 PM
 SansGrip Free Member Join Date: Nov 2002 Location: Ontario, Canada Posts: 1,135
Quote:
 Originally Posted by ARnet_tenRA When you do it this way all sample MPEGs produced will be exactly 1/60th of the final movie.
The main requirement of the formula is to be accurate, not necessarily simple. A great deal of testing indicates that the most accurate formula for all kinds of sources and all resolutions uses one sample per minute of movie, each max-GOP-size frames in length. That said, we should obviously test this method out against the current one.

Sampler uses a pretty simple algorithm to decide which frames to select. The curious can take a look at the source code here.

Quote:
 This will have the benefit of aligning with the GOP like girv suggested
I'm not sure that aligning with the GOP will make things any more accurate, but I can certainly build a test version of Sampler with that modification and try it out.
#34
01-09-2003, 02:04 PM
 SansGrip Free Member Join Date: Nov 2002 Location: Ontario, Canada Posts: 1,135
Quote:
 Originally Posted by ARnet_tenRA Don't worry about the frame rate at all.
The frame rate is only used by Sampler as the default sample length if none is specified; it's not part of the formula. I figured that movie-length-in-minutes samples of one second each would be a good generic default, and that happened to correspond to the current file prediction formula for KVCD.
#35
01-10-2003, 09:03 AM
 ARnet_tenRA Free Member Join Date: Jan 2003 Location: Illinois, USA Posts: 73
Hi all,

I just ran a test last night using my suggested formula, and I got some pretty exciting results: an error of only about 0.012%!!!

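AviSynth script:
```
# Load the MPEG-2 decoder plugin that provides mpeg2source()
LoadPlugin("MPEG2DEC.dll")

# Open the DVD2AVI project file for the source
mpeg2source("sample.d2v")

# Keep 24 frames out of every 1440 (one GOP-length strip per 1440 frames)
SelectRangeEvery(1440,24)
```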
Sample file size = 12,091,021 bytes
Predicted movie size = 60 * 12,091,021 = 725,461,260 bytes
Actual final movie size = 725,551,421 bytes
Error = 90,161 bytes
Relative error = 90,161 / 725,551,421 ≈ 0.000124 (about 0.012%)
Based on these results I would like to try some more movies and see what I get.
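The numbers from this test, recomputed in Python. The only inputs are the two file sizes reported in this post; 90,161 bytes out of about 725.6 million works out to a relative error of roughly 0.000124, or about 0.012%.

```python
sample_bytes = 12_091_021    # size of the sample MPEG from the test above
actual_bytes = 725_551_421   # size of the final encode

predicted_bytes = 60 * sample_bytes           # one 24-frame strip per 1440 frames
error_bytes = actual_bytes - predicted_bytes
relative_error = error_bytes / actual_bytes

print(f"predicted: {predicted_bytes:,} bytes")
print(f"error:     {error_bytes:,} bytes ({relative_error:.4%})")
```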
#36
01-10-2003, 10:04 AM
 SansGrip Free Member Join Date: Nov 2002 Location: Ontario, Canada Posts: 1,135
Quote:
 Originally Posted by ARnet_tenRA SelectRangeEvery(1440,24)
Note that SelectRangeEvery doesn't seem to be quite accurate. The old prediction method, which used SelectRangeEvery, produced sample strips that had a frame count fairly significantly different from what should have been produced.

That's one of the reasons I wrote Sampler.
#37
01-10-2003, 11:04 AM
 ARnet_tenRA Free Member Join Date: Jan 2003 Location: Illinois, USA Posts: 73
Quote:
 Originally Posted by SansGrip Note that SelectRangeEvery doesn't seem to be quite accurate. The old prediction method, which used SelectRangeEvery, produced sample strips that had a frame count fairly significantly different from what should have been produced. That's one of the reasons I wrote Sampler.
I knew that you could get up to one sample too many, but I was unaware of any more offset than that. Let me know if this is not the case.

I used SelectRangeEvery because I knew exactly where the frames would be captured from in the video (every 1440 frames). Maybe this is not the case.
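To see how a fixed every/length selection can overshoot the intended 1/60 ratio, here is a rough Python model. It assumes the selection keeps `length` frames starting at frames 0, every, 2*every, ... and keeps whatever part of the last strip fits before the clip ends; this illustrates that assumption only and is not AviSynth's actual SelectRangeEvery code. The function name is hypothetical.

```python
def sampled_frame_count(total_frames: int, every: int = 1440, length: int = 24) -> int:
    """Frames kept by taking `length` frames at the start of every `every`-frame block."""
    full_blocks, remainder = divmod(total_frames, every)
    # Each complete block contributes a full strip; a trailing partial block
    # contributes at most one (possibly shortened) extra strip.
    return full_blocks * length + min(remainder, length)


if __name__ == "__main__":
    for total in (144_000, 150_000):
        kept = sampled_frame_count(total)
        print(f"{total} frames -> {kept} sampled, ratio {total / kept:.2f}")
```

Under this model a 144,000-frame clip gives exactly 1/60 (2,400 frames), while a 150,000-frame clip picks up one extra strip at the end (2,520 frames, a ratio of about 59.52), so 60 * sample size would come out roughly 0.8% high if every frame cost the same number of bytes.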

Anyway, let me know if sampling 24 frames starting at every 24*60*n frame gives you results as accurate as mine, whether you use Sampler or SelectRangeEvery.

-ARnet_tenRA
#38
01-10-2003, 05:18 PM
 SansGrip Free Member Join Date: Nov 2002 Location: Ontario, Canada Posts: 1,135
Quote:
 Originally Posted by ARnet_tenRA I knew that you could get up to one sample too many, but I was unaware of any more offset than that. Let me know if this is not the case.
Using the old prediction method I would get mismatches of a dozen or two frames. Not normally a big deal, but it is for our purposes.

Quote:
 Anyways, let me know if sampling 24 frames every (24*60*n) frames gives you as accurate results as I got.
I will try that out once GripFit is released. I don't want to get distracted by anything right now, so I can get it out as quickly as possible.
#39
01-10-2003, 05:21 PM
 kwag Free Member Join Date: Apr 2002 Location: Puerto Rico, USA Posts: 13,537
AHA! Will that be the official name, GripFit?

-kwag
#40
01-10-2003, 05:23 PM
 SansGrip Free Member Join Date: Nov 2002 Location: Ontario, Canada Posts: 1,135
Quote:
 Originally Posted by kwag AHA! Will that be the official name, GripFit?
I haven't got round to deciding yet.
