From: acsinger@uiuc.edu
Sent: Sunday, October 12, 2003 9:52 PM
To: Joseph Picone
Cc: k.uherek@ieee.org; acsinger@uiuc.edu
Subject: ((RQ)) IEEE Transactions on Signal Processing - T-SP-01444-2003

Oct 12, 2003

Dr. Joseph Picone
MS State Box 9571
MS State, MS 39762

Paper: T-SP-01444-2003
Applications of Risk Minimization to Speech Recognition

Dear Dr. Picone,

I am writing to you concerning the above referenced manuscript, which you submitted to the IEEE Transactions on Signal Processing. Based on the enclosed set of reviews, it was recommended that the manuscript be REVISED AND RESUBMITTED (RQ). I would like to call your attention to the specific concerns of the reviewers as to the novelty of the results and the specific contributions vis-a-vis the literature on SVMs. We hope you will be able to implement the comments of the reviewers (**See note below about attachments).

Please note that the reviewers will review your revised article only one more time. The only decisions available to an AE after a second round of reviews are A (Accepted), AQ (Accepted with Mandatory Revisions), and R (Reject).

Your revised manuscript must be submitted back to Manuscript Central (http://sps-ieee.manuscriptcentral.com) no later than 60 days from the date of this letter to be further considered for publication in the IEEE Transactions on Signal Processing. If we do not receive your revised manuscript within this specified time, your manuscript will be considered withdrawn.

Please be sure to upload the revised manuscript in Dr. Picone's account. That is the account that will have the number T-SP-01444-2003R1 listed. Once you see this number, simply click on the underlined title and follow the submission guidelines. If you do not see this number, please send an email to Mr. Kevin Uherek at k.uherek@ieee.org and he will assist you in finding this number.

If you have any questions regarding the reviews, please contact me. Any other inquiries should be directed to Mr. Kevin Uherek.

Regards,

Dr. Andrew Singer
acsinger@uiuc.edu

Mr. Kevin Uherek
k.uherek@ieee.org
SPS Publications Office

**Any ATTACHMENTS FROM THE REVIEWERS can be found by you (Dr. Picone) going to the paper in your account. On the right side of the screen you will see a button "View Comments/Respond". Click on it. You will then see the same email sent to you from Dr. Andrew Singer, but the attachments from reviewers are listed at the very bottom. http://sps-ieee.manuscriptcentral.com

Reviewer 1 Comments:

I. REVIEW

Please expand and give details in Section III.

A. Suitability of topic

1. Is the topic appropriate for publication in these transactions?
(X) Yes ( ) Perhaps ( ) No

2. Is the topic important to colleagues working in the field?
(X) Yes ( ) Moderately So ( ) No (explain: )

B. Content

1. Is the paper technically sound?
(X) Yes ( ) No (why not? )

2. Is the coverage of the topic sufficiently comprehensive and balanced?
( ) Yes (X) Important information is missing or superficially treated. ( ) Treatment somewhat unbalanced, but not seriously so. ( ) Certain parts significantly overstressed.

3. How would you describe the technical depth of the paper?
( ) Superficial ( ) Suitable for the non-specialist ( ) Appropriate for the Generally Knowledgeable Individual Working in the Field or a Related Field (X) Suitable only for an Expert

4. How would you rate the technical novelty of the paper?
( ) Novel (X) Somewhat Novel ( ) Not Novel

C. Presentation

1. How would you rate the overall organization of the paper?
(X) Satisfactory ( ) Could be improved ( ) Poor

2. Are the title and abstract satisfactory?
(X) Yes ( ) No (explain: )

3. Is the length of the paper appropriate? If not, recommend how the length of the paper should be amended, including a possible target length for the final manuscript.
(X) Yes ( ) No

4. Are symbols, terms, and concepts adequately defined?
(X) Yes ( ) Not always ( ) No

5. How do you rate the English usage?
(X) Satisfactory ( ) Needs improvement ( ) Poor

6. Rate the Bibliography.
(X) Satisfactory ( ) Unsatisfactory (explain: )

D. Overall rating

1. How would you rate the technical contents of the paper?
( ) Excellent (X) Good ( ) Fair ( ) Poor

2. How would you rate the novelty of the paper?
( ) Highly Novel (X) Sufficiently Novel ( ) Slightly Novel ( ) Not Novel

3. How would you rate the literary presentation of the paper?
(X) Totally Accessible ( ) Mostly Accessible ( ) Partially Accessible ( ) Inaccessible

4. How would you rate the appropriateness of this paper for publication in this IEEE Transactions?
(X) Excellent Match ( ) Good Match ( ) Weak Match ( ) Poor Match

II. RECOMMENDATION

( ) A Publish Unaltered
( ) AQ Publish in Minor, Required Changes (as noted in Section III)
(X) RQ Review Again After Major Changes (as noted in Section III)
( ) R Reject (Paper is not of sufficient quality or novelty to be published in this Transactions)
( ) R Reject (A major rewrite is required. Author should be encouraged to resubmit rewritten paper at some later time.)
( ) R Reject (Paper is seriously flawed; do not encourage resubmission.)

III. DETAILED COMMENTS

Please state why you rated the paper as you did in Sections I and II. If you have indicated that revisions are required, please give the author specific guidance regarding those revisions, differentiating between optional and mandatory changes.

1) Good tutorial on the evolution of pattern recognition techniques as applied to speech recognition. Good description of the SVM and reasons for its use. The latter may not be well known in the signal processing community.

2) Inadequate explanation and justification for eq. (13). This only gives local information about the observable pdfs near the decision boundaries.

3) Inadequate explanation of the overall recognition system. I assume that it is a standard IBM-style model, but that must be made clear and explicit.

4) Inadequate evaluation of the results. How can we conclude that there is a performance improvement? Table 1 is not enough.

5) Inadequate discussion of the pitfalls of the SVM when projecting into very high-dimensional feature spaces.

Reviewer 2 Comments:

I. REVIEW

Please expand and give details in Section III.

A. Suitability of topic

1. Is the topic appropriate for publication in these transactions?
(X) Yes ( ) Perhaps ( ) No

2. Is the topic important to colleagues working in the field?
(X) Yes ( ) Moderately So ( ) No (explain: )

B. Content

1. Is the paper technically sound?
(X) Yes ( ) No (why not? )

2. Is the coverage of the topic sufficiently comprehensive and balanced?
( ) Yes (X) Important information is missing or superficially treated. ( ) Treatment somewhat unbalanced, but not seriously so. ( ) Certain parts significantly overstressed.

3. How would you describe the technical depth of the paper?
( ) Superficial ( ) Suitable for the non-specialist ( ) Appropriate for the Generally Knowledgeable Individual Working in the Field or a Related Field (X) Suitable only for an Expert

4. How would you rate the technical novelty of the paper?
( ) Novel (X) Somewhat Novel ( ) Not Novel

C. Presentation

1. How would you rate the overall organization of the paper?
( ) Satisfactory (X) Could be improved ( ) Poor

2. Are the title and abstract satisfactory?
(X) Yes ( ) No (explain: )

3. Is the length of the paper appropriate? If not, recommend how the length of the paper should be amended, including a possible target length for the final manuscript.
(X) Yes ( ) No

4. Are symbols, terms, and concepts adequately defined?
( ) Yes (X) Not always ( ) No

5. How do you rate the English usage?
( ) Satisfactory (X) Needs improvement ( ) Poor

6. Rate the Bibliography.
(X) Satisfactory ( ) Unsatisfactory (explain: )

D. Overall rating

1. How would you rate the technical contents of the paper?
( ) Excellent ( ) Good (X) Fair ( ) Poor

2. How would you rate the novelty of the paper?
( ) Highly Novel (X) Sufficiently Novel ( ) Slightly Novel ( ) Not Novel

3. How would you rate the literary presentation of the paper?
( ) Totally Accessible (X) Mostly Accessible ( ) Partially Accessible ( ) Inaccessible

4. How would you rate the appropriateness of this paper for publication in this IEEE Transactions?
( ) Excellent Match (X) Good Match ( ) Weak Match ( ) Poor Match

II. RECOMMENDATION

( ) A Publish Unaltered
( ) AQ Publish in Minor, Required Changes (as noted in Section III)
(X) RQ Review Again After Major Changes (as noted in Section III)
( ) R Reject (Paper is not of sufficient quality or novelty to be published in this Transactions)
( ) R Reject (A major rewrite is required. Author should be encouraged to resubmit rewritten paper at some later time.)
( ) R Reject (Paper is seriously flawed; do not encourage resubmission.)

III. DETAILED COMMENTS

Please state why you rated the paper as you did in Sections I and II. If you have indicated that revisions are required, please give the author specific guidance regarding those revisions, differentiating between optional and mandatory changes.

Overall comments: The paper presents some new ideas for SVM application to speech recognition but requires substantial improvement in overall presentation. Too much space is devoted to introductory or review material; not enough is devoted to the specific proposals made by the authors. The writing style is a bit too cavalier and needs to be improved. However, the ideas described and tested appear promising. With substantial rewriting, I think this paper would be suitable for publication in the IEEE Transactions.

Specific comments:

p. 1. "Simplistic techniques...": this derogates early work in recognition and is an example of what I call a cavalier or immodest writing style. I suggest "simple techniques" instead.

"relatively fragile" --> relative to what? Drop "relative", and be more specific in describing system fragility.

"we demonstrate this" --> insert "that" between the last two words.

p. 2-4. A bit too much review of HMM/EM fundamentals. Condense.

p. 3. "based as an optimization" --> "based ON an optimization"?

"in this example any amount of effort ... will not" --> "in this example NO amount of effort".

p. 4. "The primary different between ML-based HMM... the wrong model is used": This is a good point, but the phrasing is confusing. Perhaps something like "the objective criterion for the latter reflects classification performance even if the wrong probabilistic model is used"?

p. 5-8. This overview section seems slightly long.

p. 5. "Closed-loop optimality": I assume this refers to performance on the training set. I do not believe that "closed-loop" is a common term in machine learning or speech recognition.

"the design of a classifier is essentially a process" --> "the design of a classifier is essentially THE process"

p. 6. The first paragraph introduces the term "loss function", but then this is re-introduced and italicized further down the page. Italicize/emphasize the term at its first use.

p. 8-10. This section summarizes SVMs, but this is all review material. Suggest condensing.

p. 11. On page 11 of a paper whose conclusion starts on page 16, we are finally starting to hear details of what the authors did in this study. I suggest adding one or two lines describing in more detail the ML approach to estimating A and B of the sigmoid.

"expense of increased computations" --> "expense of increased computation time"

p. 12. "the phone 's'" --> "the phone /s/". However, the point that the authors are trying to make is independent of the phone identity, so I suggest not introducing the phone identity.

p. 13. What do the authors mean exactly by "we use segments composed of the three sections"? I assume three equal-dimension vectors are concatenated. What is the dimensionality of these vectors? How are they themselves derived? The following text mentions LPC (log-area parameters), but it is not clear if this is how the segments were calculated.

p. 14. "This was a good database" --> "This is a good database"

"We have observed similar trends on a number of static classification tasks": details (or references), please.

p. 16. "When we allow the SVM to decide the best segmentation and hypothesis combination by using the N-best segmentations...": This point regarding the segmentations, either decided by the baseline HMM system or by the SVMs, is very unclear. A lot more detail is needed on the authors' proposed technique. The authors have provided several figures describing the fundamentals of SVMs, but no figures describing their own recognition system architecture. Is their method essentially an N-best rescoring method? Will they always require a baseline HMM to provide N-best lists, or could the SVMs be used to supplant the HMM?

"Table 1" -- this should be Table 2, I think.

I strongly urge the authors to give slightly less detail about review material that is already published, and many more details on what they themselves did in this study.

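On Reviewer 2's p. 11 request for more detail on the ML estimation of the sigmoid parameters A and B: a minimal sketch of a Platt-style maximum-likelihood fit is given below. The function and variable names, and the use of scipy's Nelder-Mead optimizer, are illustrative assumptions rather than details taken from the manuscript.

import numpy as np
from scipy.optimize import minimize

def fit_sigmoid(scores, labels):
    """Maximum-likelihood fit of p(in-class | f) = 1 / (1 + exp(A*f + B)).

    scores -- numpy array of raw SVM distances f_i on a held-out set
    labels -- numpy array with 1 for in-class examples, 0 otherwise
    """
    # Platt's regularized targets guard against overconfidence in the tails.
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    t = np.where(labels == 1, (n_pos + 1.0) / (n_pos + 2.0), 1.0 / (n_neg + 2.0))

    def neg_log_likelihood(params):
        A, B = params
        z = A * scores + B
        log_p = -np.logaddexp(0.0, z)               # log p       = -log(1 + exp(z))
        log_one_minus_p = z - np.logaddexp(0.0, z)  # log (1 - p) =  z - log(1 + exp(z))
        return -np.sum(t * log_p + (1.0 - t) * log_one_minus_p)

    result = minimize(neg_log_likelihood, x0=np.zeros(2), method="Nelder-Mead")
    return result.x  # fitted (A, B)

The fitted (A, B) map a raw SVM distance f to a posterior estimate p = 1 / (1 + exp(A*f + B)), whose logarithm can then be combined with other log-scores.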
Reviewer 3 Comments:

I. REVIEW

Please expand and give details in Section III.

A. Suitability of topic

1. Is the topic appropriate for publication in these transactions?
(X) Yes ( ) Perhaps ( ) No

2. Is the topic important to colleagues working in the field?
(X) Yes ( ) Moderately So ( ) No (explain: )

B. Content

1. Is the paper technically sound?
(X) Yes ( ) No (why not? )

2. Is the coverage of the topic sufficiently comprehensive and balanced?
( ) Yes (X) Important information is missing or superficially treated. ( ) Treatment somewhat unbalanced, but not seriously so. ( ) Certain parts significantly overstressed.

3. How would you describe the technical depth of the paper?
(X) Superficial ( ) Suitable for the non-specialist ( ) Appropriate for the Generally Knowledgeable Individual Working in the Field or a Related Field ( ) Suitable only for an Expert

4. How would you rate the technical novelty of the paper?
( ) Novel ( ) Somewhat Novel (X) Not Novel

C. Presentation

1. How would you rate the overall organization of the paper?
( ) Satisfactory (X) Could be improved ( ) Poor

2. Are the title and abstract satisfactory?
( ) Yes (X) No (explain: see review)

3. Is the length of the paper appropriate? If not, recommend how the length of the paper should be amended, including a possible target length for the final manuscript.
(X) Yes ( ) No

4. Are symbols, terms, and concepts adequately defined?
(X) Yes ( ) Not always ( ) No

5. How do you rate the English usage?
(X) Satisfactory ( ) Needs improvement ( ) Poor

6. Rate the Bibliography.
(X) Satisfactory ( ) Unsatisfactory (explain: )

D. Overall rating

1. How would you rate the technical contents of the paper?
( ) Excellent ( ) Good (X) Fair ( ) Poor

2. How would you rate the novelty of the paper?
( ) Highly Novel ( ) Sufficiently Novel (X) Slightly Novel ( ) Not Novel

3. How would you rate the literary presentation of the paper?
(X) Totally Accessible ( ) Mostly Accessible ( ) Partially Accessible ( ) Inaccessible

4. How would you rate the appropriateness of this paper for publication in this IEEE Transactions?
( ) Excellent Match ( ) Good Match (X) Weak Match ( ) Poor Match

II. RECOMMENDATION

( ) A Publish Unaltered
( ) AQ Publish in Minor, Required Changes (as noted in Section III)
( ) RQ Review Again After Major Changes (as noted in Section III)
(X) R Reject (Paper is not of sufficient quality or novelty to be published in this Transactions)
( ) R Reject (A major rewrite is required. Author should be encouraged to resubmit rewritten paper at some later time.)
( ) R Reject (Paper is seriously flawed; do not encourage resubmission.)

III. DETAILED COMMENTS

Please state why you rated the paper as you did in Sections I and II. If you have indicated that revisions are required, please give the author specific guidance regarding those revisions, differentiating between optional and mandatory changes.

This paper proposes two deep, theoretically interesting research topics, then drops both of them without warning. In the end, the experiments performed by the authors seem to be nothing more than standard SVM classification experiments, performed almost step by step according to the instructions in Vapnik's textbook.

Section 2 discusses the well-known problems with ML and suggests MCE as a solution. Section 3 discusses the principle of structural risk minimization. A combination of these concepts would be very interesting, as MCE is itself a type of ERM; unfortunately, both of these concepts are dropped without warning, and the paper never returns to them.

Section 4 presents the support vector machine, and Section 5 proposes an SVM/HMM hybrid. An SVM/HMM hybrid would be very interesting. ANN/HMM hybrids use the classification power of an ANN in order to improve the observation PDF computation of an HMM.
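For reference, the construction typically used in such ANN/HMM hybrids turns the network's frame-level state posterior into a scaled likelihood by dividing out the state prior; this standard relation is stated here for context and is not taken from the manuscript:

    p(x_t \mid q) \;=\; \frac{P(q \mid x_t)\, p(x_t)}{P(q)} \;\propto\; \frac{P(q \mid x_t)}{P(q)}

since p(x_t) is shared by all states at frame t and can be dropped during decoding.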
As a result, such hybrid systems often outperform mixture-Gaussian HMMs, at the cost of substantially increased computational complexity during training. An SVM/HMM hybrid, using the SVM to compute the observation PDFs of an HMM in some way, would presumably do even better than the ANN/HMM hybrid, presumably at even greater computational cost. Unfortunately, this idea is dropped immediately after being suggested, because it is too computationally complex.

In the end, the authors perform n-ary classification of fixed-length segmental feature vectors. As far as I can tell, the only interaction between the HMM system and the SVM system is that the HMM is used to find segment boundaries for the SVM to classify; the final log-probabilities of the HMM and the SVM are then apparently added together in order to create the final recognition score. In my view, adding together the scores of two classifiers, both of which are trained and tested according to standard published references, does not constitute a "hybrid" system of sufficient novelty to merit publication.
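On Reviewer 2's p. 13 and p. 16 questions and Reviewer 3's reading of the system: a minimal sketch of that reading is given below, i.e. an HMM pass supplies N-best hypotheses with segment boundaries, the SVM scores a fixed-length vector per segment, and the per-hypothesis log-scores are summed. The three-region averaging, the interpolation weight, and all function and variable names are illustrative assumptions, not details confirmed by the manuscript.

import numpy as np

def segment_vector(frames):
    """Map a variable-length segment (n_frames x n_dims) to a fixed-length
    vector by averaging three equal regions and concatenating the means."""
    regions = np.array_split(frames, 3, axis=0)
    return np.concatenate([region.mean(axis=0) for region in regions])

def rescore_nbest(nbest, svm_log_posterior, weight=1.0):
    """Choose the hypothesis with the best combined HMM + SVM log-score.

    nbest             -- list of (hmm_log_score, segment_frames, labels), where
                         segment_frames is a list of per-segment frame matrices
                         and labels the matching phone labels
    svm_log_posterior -- callable(vector, label) -> log P(label | vector),
                         e.g. the log of a Platt-style sigmoid posterior
    weight            -- interpolation weight on the SVM score (assumed)
    """
    best_hyp, best_score = None, -np.inf
    for hmm_score, segment_frames, labels in nbest:
        svm_score = sum(svm_log_posterior(segment_vector(seg), lab)
                        for seg, lab in zip(segment_frames, labels))
        total = hmm_score + weight * svm_score
        if total > best_score:
            best_hyp, best_score = labels, total
    return best_hyp, best_score

A figure or short pseudo-code of this form in the revision would directly address the reviewers' requests for an explicit description of the recognition architecture.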