To those of us who have read the literature on AMII, it is noticeable that the research papers which use MFCCs together with delta and delta-delta coefficients fail to explain how the delta and delta-delta coefficients are actually computed. This was a hurdle I needed to overcome myself and, to my dismay, I've discovered that my own calculations of these coefficients have been erroneous. Information on the maths behind them was not easy to come by. My rigorous searches were not in vain, however. The following paper offers some insight:
- Mason, J.S., Zhang, X. 1991 - Velocity and acceleration features in speaker recognition
From this paper:
"The word 'dynamic' is sometimes used synonymously with first order analysis (features from which are often given the prefix 'delta'), and it should be emphasised that here we adopt the more general usage, encompassing under the term dynamic anything that is 'non-static', ie velocity, acceleration and higher orders."

So what are 'dynamic features'?
"Dynamic features can be derived from either temporal differencing or from regression analysis. In the first case the dynamic feature is derived by simply subtracting static features separated by a suitable time span, with an iterative process for higher orders. In the second case a polynomial fit is applied to the static series. Both involve a window moving along the time course, and it is the importance of choosing an appropriate window size which is demonstrated here."

So that explains the use of the window in the code which I used as my reference, this being the implementation provided by Dan Ellis in Rastamat.
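
To make the two approaches from the quote concrete, here is a minimal Python sketch of both. It is my own illustration, not the code from the paper or from Rastamat. The function names, the half_window and span parameters, and the edge handling (repeating the first and last frame) are all assumptions. The regression version uses the slope of a least-squares straight line over a window of 2 * half_window + 1 frames, which is the commonly used first-order case of the polynomial fit; a given implementation may scale the result differently.

```python
def difference_delta(series, span=1):
    """Temporal differencing: c[t + span] - c[t - span].

    Frames beyond either end are replaced by the nearest edge frame
    (an assumed convention for this sketch).
    """
    last = len(series) - 1
    return [
        series[min(t + span, last)] - series[max(t - span, 0)]
        for t in range(len(series))
    ]


def regression_delta(series, half_window=2):
    """Slope of a least-squares line fitted over 2 * half_window + 1 frames.

    delta[t] = sum_{n=1..N} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..N} n^2)
    """
    last = len(series) - 1
    norm = 2 * sum(n * n for n in range(1, half_window + 1))
    deltas = []
    for t in range(len(series)):
        acc = 0.0
        for n in range(1, half_window + 1):
            ahead = series[min(t + n, last)]   # clamp at the last frame
            behind = series[max(t - n, 0)]     # clamp at the first frame
            acc += n * (ahead - behind)
        deltas.append(acc / norm)
    return deltas


# One cepstral coefficient tracked over six frames (made-up values).
c = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

# Velocity (delta), then acceleration (delta-delta) by applying the
# same operation to the delta series.
delta = regression_delta(c, half_window=2)
delta_delta = regression_delta(delta, half_window=2)
```

For a full MFCC matrix the same function would be applied to each coefficient's time series in turn. The window size matters exactly as the paper says: a larger half_window smooths the estimate but blurs fast changes.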