I was just looking over this discussion thread. I'm also realizing that there is a fairly significant effort that has already gone into this, and a lot of thought as well, especially since the SFZ format contains major parts of these ideas.
To me, the best 'master sample format' would not contain any samples at all. It would contain only a Fourier series description of the samples in the .wav, .au, or .gig file, along with an exponential attack/sustain/decay envelope for each frequency. Since the original sample file has a finite length, the number of samples could simply serve as the base period (wavelength) of the Fourier series. A new format that concentrated on a mathematical description of the samples, rather than keeping all the samples, would be the ultimate in compression as well.
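To make that concrete, here is a rough sketch of what I mean, using only the Python standard library. It treats the whole buffer as one Fourier period, keeps the strongest few partials as (bin, amplitude, phase) triples, and rebuilds the samples from them. The function names and the `keep` parameter are my own invention for illustration, the transform is a slow naive DFT rather than a real FFT, and the per-frequency attack/sustain/decay envelope is left out entirely.

```python
import cmath
import math

def dft(samples):
    """Naive O(N^2) discrete Fourier transform; the buffer length N is the period."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]

def describe(samples, keep):
    """Return the `keep` strongest partials as (bin, amplitude, phase) triples."""
    n = len(samples)
    spectrum = dft(samples)
    partials = []
    for k in range(n // 2 + 1):
        # Bin 0 and (for even N) bin N/2 occur once; every other bin has a
        # mirrored conjugate twin, so its cosine amplitude is doubled.
        scale = 1.0 if k == 0 or (n % 2 == 0 and k == n // 2) else 2.0
        partials.append((k, scale * abs(spectrum[k]) / n, cmath.phase(spectrum[k])))
    partials.sort(key=lambda p: p[1], reverse=True)
    return partials[:keep]

def resynthesize(partials, n):
    """Rebuild n samples as a sum of cosines from (bin, amplitude, phase) triples."""
    return [
        sum(a * math.cos(2 * math.pi * k * t / n + ph) for k, a, ph in partials)
        for t in range(n)
    ]

# Hypothetical test signal: two partials in a 64-sample buffer.
n = 64
signal = [
    0.8 * math.cos(2 * math.pi * 3 * t / n) + 0.3 * math.sin(2 * math.pi * 7 * t / n)
    for t in range(n)
]
partials = describe(signal, keep=2)
rebuilt = resynthesize(partials, n)
```

Here 64 stored samples collapse to two triples because the test signal happens to sit exactly on two bins. A real recording would need many more partials, plus the envelopes, to get close.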
The other way to obtain even more compression is to limit the samples to a standard, limited set of frequencies (maybe at 1-5 Hz boundaries?) rather than the unlimited set of frequencies that is available now. If every frequency discovered by the FFT were moved from the infinite range of available frequencies to the nearest one in the standard set (with the shift kept well within the limits of hearing), it would reduce the number of different solutions that all sound identical. The downside of this method is that some information is lost, so it wouldn't be possible to convert the sample set back to the original file.
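A quick sketch of the snapping step, again in plain Python with made-up helper names (`snap_to_grid`, `cents_error`) and a 5 Hz grid chosen only as an example. It shows two things: different input frequencies collapse onto the same grid point, which is why the original can't be recovered, and the size of the shift in musical terms depends on where in the spectrum you are.

```python
import math

def snap_to_grid(freq_hz, step_hz):
    """Move a frequency to the nearest multiple of step_hz."""
    return round(freq_hz / step_hz) * step_hz

def cents_error(original_hz, snapped_hz):
    """Pitch shift introduced by snapping, in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(snapped_hz / original_hz)

step = 5.0  # assumed grid spacing in Hz

# Two different detected frequencies land on the same grid point,
# so the mapping cannot be undone.
a = snap_to_grid(441.3, step)
b = snap_to_grid(439.2, step)

# The same grid is a tiny shift high up and a large one in the bass.
high_shift = cents_error(4402.0, snap_to_grid(4402.0, step))
low_shift = cents_error(28.0, snap_to_grid(28.0, step))
```

One thing this brings out: with a fixed spacing in Hz the shift is under a cent around 4.4 kHz but more than a semitone around 28 Hz, so a grid spaced in cents (or a finer Hz step at the low end) might be needed to stay inaudible across the whole range.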
There are many more elements to discuss, and the SFZ format has many of them (sample rate invariant, sample size invariant, etc.). I am a great believer in working with the things other people have thought of - especially with sound, since the math principles have been well considered for over 2000 years. During my first pass at the discussion, I just didn't see anywhere in the spec a method for converting samples (answers) to a formula (master sample format). It may be there; I may have just missed it.
I'm sure that there is also a need for a 'difference' set of numbers (a residual), so that a simpler formula can be used: the 'difference' set is added to the output of the description formula, and the original file is reproduced exactly.
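Here is the 'difference' idea as a small sketch, assuming integer PCM samples and assuming the decoder evaluates the formula exactly the same way the encoder did (otherwise the rounding could disagree). The names `split` and `rebuild` and the eight-sample example are hypothetical.

```python
import math

def split(original, model):
    """Return the per-sample difference between integer samples and the rounded model."""
    return [o - round(m) for o, m in zip(original, model)]

def rebuild(model, residual):
    """Add the difference set back onto the rounded model output."""
    return [round(m) + r for m, r in zip(model, residual)]

# Hypothetical integer samples that are almost, but not quite, one sine cycle.
original = [0, 707, 1000, 705, 2, -707, -1001, -707]

# The "simpler formula": a single sine partial with amplitude 1000.
model = [1000.0 * math.sin(2 * math.pi * t / 8) for t in range(8)]

residual = split(original, model)
restored = rebuild(model, residual)
```

The residual values are small compared with the samples themselves, so they should take fewer bits to store, and formula plus residual gives back the original file bit for bit.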
I hope this kicks off some other good discussion about what other metadata needs to be captured and where it should go (MIDI note or its equivalent mapping to a sound, how to deal with velocity layers, what should be hardcoded "integer" functions and what should be pointers, etc.).