Tuesday, March 24, 2009

MFCCs, delta and delta-delta coefficients explained very nicely

Jinjin Ye's Master of Science thesis, "Speech Recognition Using Time Domain Features from Phase Space Reconstructions" explains the calculation of the velocity and acceleration MFCC coefficients very simply and concisely. See section 2.1.2 'Common Features'.

Cepstral analysis, regression and such in speech recognition

Continuing the magical roller coaster ride that is discovering how delta coefficients are calculated, I've come across these papers, which add more insight. They are referenced (though not clearly!) in the Mason paper I mentioned previously:
My explorer's hard hat is on. Flashlight in hand. Compass at the ready. Into the dark caverns of calculus I immerse myself...

Monday, March 23, 2009

Use the Usenet!

I should have thought of this a long time ago.

The Usenet groups are a very useful source of information in general. This MATLAB group should prove very useful indeed.

Delta and Delta-delta coefficients - explained!

So what are delta and delta-delta coefficients?

To those of us who have read the literature on AMII, it is noticeable that the research papers that use MFCCs, delta coefficients and delta-delta coefficients fail to explain how to actually compute the delta and delta-delta coefficients. This was a hurdle I needed to overcome myself, and to my dismay, I've discovered that my own calculations of these coefficients have been erroneous. Information on the maths behind their calculation was not easy to come by. My rigorous searches were not in vain, however. The following paper offers some insight:

- Mason, J.S., Zhang, X. 1991 - Velocity and acceleration features in speaker recognition

From this paper:
The word 'dynamic' is sometimes used synonymously with first order analysis (features from which are often given the prefix 'delta'), and it should be emphasised that here we adopt the more general usage, encompassing under the term dynamic anything that is 'non-static', ie velocity, acceleration and higher orders.
So what are 'dynamic features'?
Dynamic features can be derived from either temporal differencing or from regression analysis. In the first case the dynamic feature is derived by simply subtracting static features separated by a suitable time span, with an iterative process for higher orders. In the second case a polynomial fit is applied to the static series. Both involve a window moving along the time course, and it is the importance of choosing an appropriate window size which is demonstrated here.
So that explains the window in the code I used as my reference, namely the implementation provided by Dan Ellis in Rastamat.
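To make the two approaches from the quote concrete, here is a minimal Python sketch. It is not the Rastamat code and not necessarily the exact formulation in the Mason and Zhang paper. The regression version assumes the commonly used least-squares slope over a window of 2N+1 frames, d[t] = sum_{n=1..N} n (c[t+n] - c[t-n]) / (2 sum_{n=1..N} n^2), and both versions assume edge frames are handled by repeating the first and last frame. The function names and the `half_window` and `span` parameters are my own, chosen for illustration.

```python
def _frame_at(frames, t):
    """Return frame t, clamping t to the valid range (edge frames are repeated)."""
    return frames[min(max(t, 0), len(frames) - 1)]


def delta_regression(frames, half_window=2):
    """Delta coefficients as the least-squares slope over 2*half_window + 1 frames.

    frames: list of frames, each a list of static coefficients (e.g. MFCCs).
    """
    denom = 2 * sum(n * n for n in range(1, half_window + 1))
    deltas = []
    for t in range(len(frames)):
        row = []
        for k in range(len(frames[t])):
            num = sum(
                n * (_frame_at(frames, t + n)[k] - _frame_at(frames, t - n)[k])
                for n in range(1, half_window + 1)
            )
            row.append(num / denom)
        deltas.append(row)
    return deltas


def delta_difference(frames, span=1):
    """Delta coefficients by temporal differencing: c[t + span] - c[t - span]."""
    return [
        [
            _frame_at(frames, t + span)[k] - _frame_at(frames, t - span)[k]
            for k in range(len(frames[t]))
        ]
        for t in range(len(frames))
    ]


# Toy "MFCC" sequence: two coefficients rising linearly with the frame index.
mfcc = [[float(t), 2.0 * t] for t in range(10)]
velocity = delta_regression(mfcc)          # delta coefficients
acceleration = delta_regression(velocity)  # delta-delta: the same operation again
```

Delta-delta coefficients come from applying the same operation to the deltas, which is the "iterative process for higher orders" in the quote. The window size (`half_window` or `span`) is the parameter whose choice the paper highlights.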

Gaël Richard visits the ARG

Prof. Gaël Richard of Télécom ParisTech visited the Audio Research Group last Friday, March 20th. He gave a seminar on approaches to Automatic Musical Instrument Recognition in DIT Kevin Street, which was reasonably well attended.

Considering Prof. Richard is an eminent figure in the area of audio research, it was a real honour for the ARG to have him visit us.

Again, my appreciation to Prof. Richard for answering my questions relating to AMII and for offering us all some insight into his own research into musical instrument recognition.

Thursday, March 19, 2009

Good code repository

Besides the usual popular DSP resources online, e.g. Mathworks and DSPRelated, I came across this very useful programmer's repository, Programmer's United Development Net (PUDN).

In particular, the Speech-Voice recognition/combine section has some MATLAB source files which implement some of the useful features such as MFCCs and other coefficients.

P.S. The English version of PUDN (Speech) is here.

Monday, March 16, 2009

A pot of gold

I've discovered my own little pot of gold, just before St. Patrick's Day. Roger Jang has translated some Chinese books on Pattern Recognition into English and put them online. These look like excellent resources, not to mention the toolboxes and other resources on his website. At first glance, the interactive tutorials look very educational and may shed some light on areas I've had trouble understanding myself.

Thursday, March 12, 2009

Multiclass classification using SVM

I've been trying to get some learning algorithms implemented on my current data.

I started using SVM (light) and have just realised that the classifier only handles binary classification, i.e. a maximum of 2 classes! So I needed to find a solution, and I didn't have to look too far. (Well, at least I hope it provides the solution, as I haven't implemented the classifier just yet.) SVM (Multiclass), by Thorsten Joachims, the same author as the light version, is an implementation of the multi-class Support Vector Machine (SVM) described in:

On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines, Koby Crammer and Yoram Singer (Journal of Machine Learning Research), 2001.

SVM Multiclass can be found here.
SVM Light can be found here.
Koby Crammer's home site is here.
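To illustrate how the multiclass case differs from the binary one: in the linear case, the Crammer and Singer formulation keeps one weight vector per class and predicts the class whose weight vector gives the highest score. The sketch below shows only that prediction step. It does not use SVM Multiclass, and the weights are made up for illustration rather than learned from data.

```python
def predict_multiclass(class_weights, x):
    """Return the index of the class whose weight vector scores x highest."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in class_weights]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical weight vectors for three classes over two features.
weights = [
    [1.0, 0.0],    # class 0
    [0.0, 1.0],    # class 1
    [-1.0, -1.0],  # class 2
]

label_a = predict_multiclass(weights, [0.2, 0.9])
label_b = predict_multiclass(weights, [-1.0, -1.0])
```

A binary SVM, by contrast, has a single weight vector and takes the sign of its score, which is why it cannot separate more than two classes on its own.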

I crashed the AES submission website!!

I finally submitted my AES 126 paper, but not without some last minute panic. When I was uploading my paper, the AES submission website crashed! I contacted the AES the following day and it turned out that my paper, along with some Eastern European paper submissions, was causing the same problem. I did all I could to assist, supplying code, software versions, logs etc., and the problem was resolved. Apparently, some 'illegal' character was causing the problem. That's about all I know. Anyway, my first paper has finally gone through and, as the man says, it's like getting your first tattoo, you just want to get more...or something to that effect.