Publications


Reducing computation in an i-vector speaker recognition system using a tree-structured universal background model

Speech Communication

McClanahan, Richard M.; De Leon, Phillip L.

The majority of state-of-the-art speaker recognition (SR) systems utilize speaker models that are derived from an adapted universal background model (UBM) in the form of a Gaussian mixture model (GMM). This is true for GMM supervector systems, joint factor analysis systems, and most recently i-vector systems. In all of these systems, the posterior probability and sufficient statistics calculations represent a computational bottleneck in both enrollment and testing. We propose a multi-layered hash system, employing a tree-structured GMM-UBM which uses Runnalls' Gaussian mixture reduction technique, in order to reduce the number of these calculations. With this tree-structured hash, we can trade off a reduction in computation against a corresponding degradation of equal error rate (EER). As an example, we reduce this computation by a factor of 15× while incurring less than 10% relative degradation of EER (or 0.3% absolute EER) when evaluated with NIST 2010 speaker recognition evaluation (SRE) telephone data.
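The shortlist mechanism this describes can be illustrated with a brief sketch: each feature vector is first scored against a small hash GMM of merged components, and the full posterior and sufficient-statistics computation is then restricted to the UBM components mapped to the best-scoring clusters. The two-level layout, function names, and data structures below are illustrative assumptions, not the paper's implementation.

import numpy as np
from scipy.stats import multivariate_normal

def shortlist_posteriors(x, hash_gmm, ubm, shortlist, top_c=2):
    """Approximate UBM posteriors for one feature vector x.

    hash_gmm, ubm: lists of (weight, mean, covariance) tuples.
    shortlist: dict mapping each hash-GMM index to the list of UBM
               component indices it represents (assumed precomputed).
    """
    # Score the small hash GMM and keep the top-scoring clusters.
    hash_scores = [w * multivariate_normal.pdf(x, m, c) for w, m, c in hash_gmm]
    best = np.argsort(hash_scores)[-top_c:]

    # Evaluate only the UBM components mapped to those clusters;
    # all other posteriors are approximated as zero.
    posts = np.zeros(len(ubm))
    active = [j for i in best for j in shortlist[i]]
    for j in active:
        w, m, c = ubm[j]
        posts[j] = w * multivariate_normal.pdf(x, m, c)
    total = posts.sum()
    return posts / total if total > 0 else posts

Because only the shortlisted components are evaluated, the per-frame cost scales with the hash-GMM size plus the shortlist sizes rather than with the full UBM order, which is the source of the kind of computational saving reported above.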


Performance of I-vector speaker verification and the detection of synthetic speech

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

McClanahan, Richard M.; Stewart, Bryan; De Leon, Phillip L.

In this paper, we present new research results on the vulnerability of speaker verification (SV) systems to synthetic speech. Using a state-of-the-art i-vector SV system and evaluating with the Wall Street Journal (WSJ) corpus, our SV system has a 0.00% false rejection rate (FRR) and a false acceptance rate (FAR) of 1.74 × 10⁻⁵. When the i-vector system is tested with state-of-the-art speaker-adaptive, hidden Markov model (HMM)-based synthetic speech generated from speaker models derived from the WSJ corpus, 22.9% of the matched claims are accepted, highlighting the vulnerability of SV systems to synthetic speech. We propose a new synthetic speech detector (SSD) which uses previously proposed features derived from image analysis of pitch patterns, but extracted on phoneme-level segments, and which leverages the available enrollment speech from the SV system. When the SSD is applied to human and synthetic speech accepted by the SV system, the overall system has an FRR of 7.35% and a FAR of 2.34 × 10⁻⁴, which is lower than that of previously reported systems and thus significantly reduces the vulnerability. © 2014 IEEE.
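In effect, the protection described here is a cascade: a claimed identity is accepted only when the SV system accepts it and the synthetic speech detector judges the utterance to be human. The sketch below illustrates that decision logic together with the FAR/FRR bookkeeping behind the reported error rates; score conventions, thresholds, and function names are assumptions for illustration only.

import numpy as np

def cascaded_accept(sv_score, ssd_human_score, sv_thr, ssd_thr):
    """Accept the claim only when the speaker-verification score passes its
    threshold AND the synthetic-speech detector labels the utterance human."""
    return (sv_score >= sv_thr) and (ssd_human_score >= ssd_thr)

def far_frr(genuine_scores, impostor_scores, threshold):
    """Error rates at a fixed threshold: FRR over genuine (matched, human)
    trials and FAR over impostor or synthetic trials."""
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    frr = float(np.mean(genuine < threshold))
    far = float(np.mean(impostor >= threshold))
    return far, frr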


Efficient speaker verification using Gaussian mixture model component clustering

McClanahan, Richard M.

In speaker verification (SV) systems that employ a support vector machine (SVM) classifier to make decisions on a supervector derived from Gaussian mixture model (GMM) component mean vectors, a significant portion of the computational load is involved in the calculation of the a posteriori probability of the feature vectors of the speaker under test with respect to the individual component densities of the universal background model (UBM). Further, the calculation of the sufficient statistics for the weight, mean, and covariance parameters derived from these same feature vectors also contributes a substantial amount of processing load to the SV system. In this paper, we propose a method that utilizes clusters of GMM-UBM mixture component densities in order to reduce the computational load required. In the adaptation step, we score the feature vectors against the clusters and calculate the a posteriori probabilities and update the statistics exclusively for mixture components belonging to the appropriate clusters. Each cluster is a grouping of multivariate normal distributions and is modeled by a single multivariate normal distribution. As such, the set of multivariate normal distributions representing the different clusters also forms a GMM. This GMM is referred to as a hash GMM, which can be considered a lower-resolution representation of the GMM-UBM. The mapping that associates the components of the hash GMM with components of the original GMM-UBM is referred to as a shortlist. This research investigates various methods of clustering the components of the GMM-UBM and forming hash GMMs. Of the five methods presented, one, Gaussian mixture reduction as proposed by Runnalls, easily outperformed the others. This method iteratively reduces the size of a GMM by successively merging pairs of component densities; pairs are selected for merger using a Kullback-Leibler-based metric. Using Runnalls' method of reduction, we were able to achieve a factor of 2.77 reduction in a posteriori probability calculations with no loss in accuracy when the original UBM consisted of 256 component densities. When clustering was implemented with a 1024-component UBM, we achieved a computation reduction by a factor of 5 with no loss in accuracy and a reduction by a factor of 10 with less than 2.4% relative loss in accuracy.
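The core of the best-performing clustering method, Runnalls' Gaussian mixture reduction, is a greedy loop that repeatedly merges the pair of components whose moment-preserving merge incurs the smallest cost, where the cost is an upper bound on the Kullback-Leibler discrepancy introduced by the merge. A minimal sketch follows; the data layout (lists of weight/mean/covariance tuples) and the brute-force pair search are assumptions chosen for clarity rather than efficiency, and do not reproduce the original implementation.

import numpy as np
from itertools import combinations

def merge_pair(w1, m1, P1, w2, m2, P2):
    """Moment-preserving merge of two weighted Gaussian components."""
    m1, m2 = np.asarray(m1, dtype=float), np.asarray(m2, dtype=float)
    P1, P2 = np.asarray(P1, dtype=float), np.asarray(P2, dtype=float)
    w = w1 + w2
    a1, a2 = w1 / w, w2 / w
    m = a1 * m1 + a2 * m2
    d = (m1 - m2).reshape(-1, 1)
    P = a1 * P1 + a2 * P2 + a1 * a2 * (d @ d.T)
    return w, m, P

def merge_cost(w1, m1, P1, w2, m2, P2):
    """Runnalls' dissimilarity: an upper bound on the KL divergence increase
    caused by replacing the two components with their merge."""
    w, _, P = merge_pair(w1, m1, P1, w2, m2, P2)
    _, ld = np.linalg.slogdet(P)
    _, ld1 = np.linalg.slogdet(np.asarray(P1, dtype=float))
    _, ld2 = np.linalg.slogdet(np.asarray(P2, dtype=float))
    return 0.5 * (w * ld - w1 * ld1 - w2 * ld2)

def reduce_gmm(components, target_size):
    """Greedily merge the least-dissimilar pair until target_size components
    remain.  components: list of (weight, mean, covariance) tuples."""
    comps = list(components)
    while len(comps) > target_size:
        i, j = min(combinations(range(len(comps)), 2),
                   key=lambda ij: merge_cost(*comps[ij[0]], *comps[ij[1]]))
        merged = merge_pair(*comps[i], *comps[j])
        comps = [c for k, c in enumerate(comps) if k not in (i, j)] + [merged]
    return comps

Tracking which original components are absorbed into each surviving cluster during the merges would yield the shortlist mapping described above.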
