sbc: faster 'sbc_calculate_bits' function

By using SBC_ALWAYS_INLINE trick, the implementation of 'sbc_calculate_bits'
function is split into two branches, each having 'subband' variable value
known at compile time. It helps the compiler to generate more optimal code
by saving at least one extra register, and also provides more obvious
opportunities for loops unrolling.

Benchmarked on ARM Cortex-A8:

== Before: ==

$ time ./sbcenc -b53 -s8 -j test.au > /dev/null

real    0m3.989s
user    0m3.602s
sys     0m0.391s

samples  %        image name               symbol name
26057    32.6128  sbcenc                   sbc_pack_frame
20003    25.0357  sbcenc                   sbc_analyze_4b_8s_neon
14220    17.7977  sbcenc                   sbc_calculate_bits
8498     10.6361  no-vmlinux               /no-vmlinux
5300      6.6335  sbcenc                   sbc_calc_scalefactors_j_neon
3235      4.0489  sbcenc                   sbc_enc_process_input_8s_be_neon
2172      2.7185  sbcenc                   sbc_encode

== After: ==

$ time ./sbcenc -b53 -s8 -j test.au > /dev/null

real    0m3.652s
user    0m3.195s
sys     0m0.445s

samples  %        image name               symbol name
26207    36.0095  sbcenc                   sbc_pack_frame
19820    27.2335  sbcenc                   sbc_analyze_4b_8s_neon
8629     11.8566  no-vmlinux               /no-vmlinux
6988      9.6018  sbcenc                   sbc_calculate_bits
5094      6.9994  sbcenc                   sbc_calc_scalefactors_j_neon
3351      4.6044  sbcenc                   sbc_enc_process_input_8s_be_neon
2182      2.9982  sbcenc                   sbc_encode
1 file changed