“Do cameras capture information in a linear space, even when capturing over 14+ stops of dynamic range?
In a 14-bit ADC camera, the brightest stop is represented by 8192 code values (16383-8192), the
next brightest is represented by 4096 code values (8191-4096), and so on and so forth. The
darkest stop (-13 below) is only represented by 2 values (1 or 0).
That’s not a lot of information to work with. If logarithmic coding is understood to mean that
each stop gets an equal number of values, aren’t the camera processors (FPGA/ASIC) merely
interpolating data like crazy in the low end?”
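The questioner's arithmetic can be checked with a short sketch. It assumes an idealised 14-bit ADC with no black offset (code 0 means no signal), and it counts how many integer code values fall in each stop, brightest first. `linear_codes_per_stop` is a hypothetical helper written for illustration:

```python
def linear_codes_per_stop(bits):
    """Return (low, high, count) for each stop of an offset-free linear ADC, brightest first.

    Assumption: an idealised ADC where code 0 is 'no signal' and there is no black offset.
    """
    stops = []
    high = 2**bits - 1                  # top code value, e.g. 16383 for 14 bits
    while high >= 1:
        low = (high + 1) // 2           # each stop spans half of the range above it
        stops.append((low, high, high - low + 1))
        high = low - 1                  # the next stop down ends just below this one
    return stops

for i, (low, high, count) in enumerate(linear_codes_per_stop(14)):
    print(f"stop {i:2d} below top: codes {low:5d}-{high:5d} -> {count} values")
```

Counted this way, the stop just above zero holds a single code value (1) and the one above it holds two (2 and 3), so the bottom of a purely linear, offset-free 14-bit range really is sparse.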
Short answer: yes and no.
Yes: most camera sensors do initially capture (quantize) information linearly in a digital space. However, they typically do so with an offset, and the darkest stop is assigned more than one code value, because the linear space's bit depth is typically greater than that of the reconstructed image and some cameras take multiple samples at different gain levels (Dual Gain Architecture, Dual Gain Output).
No: cameras do not typically interpolate a substantial amount of information at the bottom end of the signal, and most cameras' 'log' containers aren't truly logarithmic. They are biased towards the bottom end of the signal, which they store linearly.
Here's how and why.
For ease, I'll break down the ARRI Alexa. Notably, the camera has the famed DGA, but several high-end cameras employ similar principles, just with a different scheme or circuitry. ARRI's Log-C 'logarithmic' container is akin to the ones used by Sony, RED and other high-end manufacturers. While each one differs in the total latitude it can hold, the principle is the same, and the differences are somewhat negligible in this context.
To quote SMPTE RDD 30:2014, ARRIRAW Image File Structure and Interpretation Supporting Deferred Demosaicing to a Logarithmic Encoding: “Photosite values are radiometrically linear representations of the energy they receive, represented as 16-bit unsigned integers, incorporating an offset such that a photosite receiving no energy would have a 16-bit unsigned integer value of 256.”
This document also gives the equation that encodes a 16-bit unsigned integer photosite value (a linear representation of the energy the photosite receives) to a 12-bit integer value in its Log-C space, where v_i is the encoded 12-bit integer and v_p is the unsigned 16-bit linear integer. This gives a better understanding of how a camera encodes values into a partially logarithmic 'container'. The equation is as follows:
v_i=\begin{cases} v_p, & v_p<1024\\ 512q+\left(v_p\gg q\right), & v_p\geq1024\end{cases}
where q is the integer part of \log_2\left(v_p\right)-9
The equation above is vague at best and cryptic, so what follows is my attempt at understanding it. Beforehand, I should note two things. Firstly, \gg is the right-shift operator: x\gg y shifts the binary representation of x right by y places, which for non-negative integers is the same as dividing x by 2^y and discarding the remainder. Secondly, the equation is, from my understanding, a piecewise definition, i.e. an if statement with two branches. If a value is under 1024, it stays the same. A value of 1024 or more is 'compressed' using the bottom expression. This begins to show that not every logarithmic container is truly 'logarithmic': it doesn't necessarily compress the bottom part of the curve. As you said above, the bottom values are small. Because of this, there's no need to interpolate information beyond the linear values assigned during A/D conversion; the values are so small that doing so would be redundant.
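Read literally, the piecewise definition is plain integer arithmetic, so it can be evaluated exactly. The sketch below is one reading of the formula as quoted above, not ARRI's code. `encode_rdd30` is a hypothetical name, and it uses Python's `int.bit_length()` to get the integer part of log2(v_p) without any floating point:

```python
def encode_rdd30(vp: int) -> int:
    """Hypothetical helper: 16-bit linear photosite value -> 12-bit code,
    following the piecewise formula quoted from SMPTE RDD 30:2014."""
    if not 0 <= vp <= 0xFFFF:
        raise ValueError("vp must be a 16-bit unsigned integer")
    if vp < 1024:
        return vp                      # bottom of the range is stored linearly
    q = vp.bit_length() - 10           # integer part of log2(vp) - 9
    return 512 * q + (vp >> q)         # the shift keeps 10 significant bits (512..1023)

for vp in (256, 1023, 1024, 2047, 2048, 65535):
    print(vp, "->", encode_rdd30(vp))
```

The two branches meet at 1024 (1023 encodes to 1023 and 1024 to 1024), and the largest 16-bit value, 65535, lands on 4095, the top of the 12-bit range.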
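Defining q below:
q=\log_2\left(v_p\right)-9-\varepsilon, where 0\le\varepsilon<1 (\varepsilon is the fractional part that taking the integer part discards)
Consequently, 2^q=2^{\log_2\left(v_p\right)}\,2^{-9}\,2^{-\varepsilon}
which is equivalent to (1) 2^q=\frac{v_p}{512K}, where K:=2^{\varepsilon} and 1\le K<2
With the above stated, the bottom expression can be written as (2) v_i=512q+\frac{v_p}{2^q}, treating the shift v_p\gg q as a division by 2^q (ignoring the fractional bits it drops).
Plugging (1) into (2):
v_i=512q+512K
v_i=512\left(q+K\right), where 1\le K<2
v_i=512\left(\log_2\left(v_p\right)-9-\varepsilon+K\right), where K=2^{\varepsilon} and 0\le\varepsilon<1
K and \varepsilon are not independent. Since K=2^{\varepsilon}, the term K-\varepsilon equals 1 at both ends of the range of \varepsilon and dips to about 0.914 near \varepsilon\approx0.53, which gives the bracket
512\left(\log_2\left(v_p\right)-8.09\right)\le v_i\le512\left(\log_2\left(v_p\right)-8\right)
This sits inside the looser bracket 512\left(\log_2\left(v_p\right)-9\right)<v_i<512\left(\log_2\left(v_p\right)-7.5\right), which is the one used in the function below.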
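The bracket can be checked by brute force. This sketch re-implements the quoted integer formula (hypothetical `encode_rdd30`, same literal reading of RDD 30) and measures, for every v_p from 1024 to 65535, how far the exact code sits from the curve 512(log2(v_p) - 8), expressed in stops (512 codes per stop above 1024):

```python
import math

def encode_rdd30(vp: int) -> int:
    # Hypothetical helper: literal reading of the RDD 30 piecewise formula.
    if vp < 1024:
        return vp
    q = vp.bit_length() - 10
    return 512 * q + (vp >> q)

# Offset, in stops, of each exact code from the curve 512 * (log2(vp) - 8)
offsets = [encode_rdd30(vp) / 512 - (math.log2(vp) - 8) for vp in range(1024, 65536)]
print(f"min offset: {min(offsets):.4f} stops, max offset: {max(offsets):.4f} stops")

# Every exact code also lies inside the looser bracket (-9, -7.5)
assert all(
    512 * (math.log2(vp) - 9) < encode_rdd30(vp) < 512 * (math.log2(vp) - 7.5)
    for vp in range(1024, 65536)
)
```

Under this reading, the exact codes sit between roughly 0.09 stops below the curve and exactly on it (at powers of two), so the looser bracket is safe but wide.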
However, pinning the value down exactly isn't necessary for showing how the encoding treats linear values. For ease, I wrote a simple function in Python to show its interpretation of the linear values for the following points. It calculates the highest and lowest possible encoded values, as well as the average of the two. The Python function is as follows:
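import math

def arrirawdecode(vp):
    # Approximates the 12-bit encoded value of a 16-bit linear photosite value vp,
    # using the bracket 512*(log2(vp) - 9) < v_i < 512*(log2(vp) - 7.5).
    if vp < 1024:
        # Below 1024 the encoding is the identity, so the value is stored linearly.
        print(vp)
    else:
        # Lower bound: the encoded value is greater than this
        vigreater = 512 * (math.log(vp, 2) - 9)
        # Upper bound: the encoded value is less than this
        viless = 512 * (math.log(vp, 2) - 7.5)
        # Midpoint of the two bounds, used as the approximate encoded value
        viaprox = viless - vigreater
        viaprox1 = vigreater + viaprox / 2
        print(viaprox1, "less than", viless, ", more than", vigreater)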
16-bit linear | 12-bit Log-C (approx.)
260 | 260
264 | 264
272 | 272
288 | 288
320 | 320
384 | 384
512 | 512
768 | 768
1280 | 1060
2304 | 1495
4352 | 1964
8448 | 2454
16640 | 2955
33024 | 3461
65792 | 3970
Note how the first eight values are encoded as their own linear values, because of the top case of the definition: if v_p<1024, then v_i=v_p. Any value below 1024 stays the same, preserving the bottom end of the signal. As you say above, there's no need to interpolate information from these values, and compressing them would be quite detrimental. Plotted as a simple graph:
Note the similarities between the graph created from the above values (below) and the Log-C chart taken from Nick Shaw (above). Also note how, at the bottom end of the signal, the darkest stops have substantially fewer code values than the brightest stops.
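The point about the darkest stops can be made concrete by counting how many distinct 12-bit codes each stop of the raw 16-bit value receives. A minimal sketch under the same literal reading of the RDD 30 formula (hypothetical helpers), counting stops on the raw value and ignoring the 256 offset:

```python
def encode_rdd30(vp: int) -> int:
    # Hypothetical helper: literal reading of the RDD 30 piecewise formula.
    if vp < 1024:
        return vp
    q = vp.bit_length() - 10
    return 512 * q + (vp >> q)

def encoded_codes_per_stop() -> dict[int, int]:
    """Map stop k (raw values 2**k .. 2**(k+1) - 1) to its number of distinct 12-bit codes."""
    counts = {}
    for k in range(16):
        codes = {encode_rdd30(vp) for vp in range(2**k, 2**(k + 1))}
        counts[k] = len(codes)
    return counts

for k, n in encoded_codes_per_stop().items():
    print(f"raw {2**k:5d}-{2**(k + 1) - 1:5d}: {n} codes")
```

Below 1024 each stop keeps every linear value (1, 2, 4, ... 512 codes), and from 1024 upward every stop gets the same 512 codes: linear at the bottom, logarithmic at the top.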
The following graph is part of the ‘ARRI Log C Curve Usage in VFX’ resource. Its curve looks impressive in how it distributes the signal, and it is. However, I find it a little misleading, so I've plotted the same values as above, but with the 16-bit values on the x-axis on a linear scale instead of a logarithmic one (which is how the ARRI graph is set out). I've also put a red marker where 18% grey (middle grey) lies, showing the theoretical ‘midpoint’ of the signal.
I'll emphasise again that the numbers are only approximate. However, ARRI advertises a capture of 14+ stops, and the above values show storage of just under 15 stops. Theoretically, then, the bottom 7 stops of camera latitude at 800 ASA would have code values of 4, 8, 16 and so on (above the 256 offset), not 1, 2 and so on, which shifts more code values per stop to the lower end.
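Since the right-hand column of the table is only approximate, it can be compared with what the quoted formula gives when evaluated exactly. A sketch under the same literal reading of RDD 30 (hypothetical helper names); the table inputs are regenerated as the 256 offset plus a power of two. The last input, 65792, is above 65535 and therefore does not fit in a 16-bit unsigned integer, so the sketch skips it:

```python
import math

def encode_rdd30(vp: int) -> int:
    # Hypothetical helper: literal reading of the RDD 30 piecewise formula.
    if vp < 1024:
        return vp
    q = vp.bit_length() - 10
    return 512 * q + (vp >> q)

def midpoint_estimate(vp: int) -> float:
    # Midpoint of 512*(log2(vp) - 9) and 512*(log2(vp) - 7.5), as in the function above
    if vp < 1024:
        return float(vp)
    return 512 * (math.log2(vp) - 8.25)

table_inputs = [256 + 2**k for k in range(2, 17)]   # 260, 264, ..., 65792
for vp in table_inputs:
    if vp > 0xFFFF:
        print(vp, "exceeds the 16-bit range, skipped")
        continue
    print(f"{vp:6d}  estimate {midpoint_estimate(vp):7.1f}  exact {encode_rdd30(vp):5d}")
```

For inputs of 1024 and above, the exact codes come out roughly 90 to 130 values higher than the midpoint estimate (for example, 1280 encodes to 1152 rather than about 1060), while the eight inputs below 1024 are identical in both columns.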
Note: I believe most ACES IDTs are written in Python and are open source, so a more accurate algorithm is available for further tests. This is simply to showcase the distribution of values.
The table above is a series of exponential values that take into account the 256-value offset (8 bits), which would be the camera's apparent noise floor (dark current noise, reset noise, thermal noise, S/H noise, amplifier thermal noise, etc.). From my understanding, this is set during ‘black balancing’ or ‘sensor calibration’, but that's just a guess. (Left column: 16-bit linear values; right column: approximate 12-bit Log-C encoded values.)
That is interesting.
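Subtracting the offset makes the pattern in the left column visible: each row is the 256 offset plus a power of two, i.e. one row per stop of signal above black. A minimal sketch (hypothetical `stops_above_black`), assuming the offset acts purely as a black level and ignoring noise:

```python
import math

BLACK_OFFSET = 256  # value of a photosite receiving no energy, per the RDD 30 quote

def stops_above_black(vp: int) -> float:
    """Signal above the black offset, expressed in stops (log2 of code values)."""
    signal = vp - BLACK_OFFSET
    if signal <= 0:
        return float("-inf")   # at or below black: no measurable signal
    return math.log2(signal)

for vp in (256, 257, 260, 264, 272, 1280, 33024):
    print(vp, "->", stops_above_black(vp))
```

Read this way, 260 is 2 stops of signal above black and 33024 is 15.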
So, a camera does initially capture information linearly. It then holds the majority of the bottom end of the signal linearly and compresses the top end logarithmically, so you aren't spending 32768 code values on your ‘brightest’ stop.
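The closing point can be illustrated by going the other way. Inverting the quoted formula (derived here from the RDD 30 expression; `decode_rdd30` is hypothetical and not ARRI's decoder) recovers, for each 12-bit code, the lowest 16-bit value that maps to it and how many 16-bit values share that code. The width is 1 below 1024 and doubles with every stop above it:

```python
def encode_rdd30(vp: int) -> int:
    # Hypothetical helper: literal reading of the RDD 30 piecewise formula.
    if vp < 1024:
        return vp
    q = vp.bit_length() - 10
    return 512 * q + (vp >> q)

def decode_rdd30(vi: int) -> tuple[int, int]:
    """Return (lowest 16-bit value mapping to vi, number of 16-bit values sharing vi)."""
    if vi < 1024:
        return vi, 1                   # linear region: one-to-one, nothing lost
    q = vi // 512 - 1                  # vi = 512*q + m with 512 <= m <= 1023
    m = vi - 512 * q
    return m << q, 1 << q              # the q low bits were discarded by the shift

for vp in (300, 1000, 1500, 5000, 40000):
    vi = encode_rdd30(vp)
    low, width = decode_rdd30(vi)
    print(f"vp={vp:5d} -> vi={vi:4d} -> covers {low}..{low + width - 1}")
```

The recovered ranges widen from a single value at the bottom to 64 values per code in the brightest stop, which is where the compression happens; nothing below 1024 is lost or interpolated.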
Thanks
Gabs