Black Box Requirements?
<ol>
  <li>Recognize vowel sounds (which are continuous)
  <li>Recognize single-syllable words
  <li>Recognize longer words (by joining syllables)
  <li>Recognize words by their sound patterns (perhaps just by tracking the volume?)
</ol>
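
The volume idea in the last item can be sketched roughly as follows. This is only a guess at what "focusing on the volume" might mean: it assumes the audio has already been reduced to one loudness value per frame, and it treats each run of loud frames as one syllable candidate. The function name `count_loud_segments` and the threshold of 0.5 are hypothetical, chosen for illustration.

```python
from typing import Sequence


def count_loud_segments(volumes: Sequence[float], threshold: float = 0.5) -> int:
    """Count runs of consecutive frames whose volume is at or above threshold.

    Assumption: each loud run roughly corresponds to one syllable, since
    vowels tend to be louder than the consonants around them.
    """
    count = 0
    inside = False  # True while we are inside a loud run
    for v in volumes:
        if v >= threshold and not inside:
            count += 1      # a new loud run starts here
            inside = True
        elif v < threshold:
            inside = False  # the loud run (if any) has ended
    return count


# Two loud runs separated by a quiet frame: two syllable candidates.
print(count_loud_segments([0.1, 0.8, 0.9, 0.2, 0.7, 0.1]))  # prints 2
```

Volume alone says how many loud stretches there were, not which sounds they were, so at best it could narrow down candidates for the other items.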

White Box Requirements?
<ol>
  <li>Previously heard sounds must be remembered (in secondary memory).
  <li>Only some of the sounds are associated with characters.
  <li>Once a sound is heard, decide whether to output the matching characters, to continue listening, or, in the worst case, to discard the sound because no match was found.
  <li>The sound currently being heard is identified with the help of memory (the previously heard sounds), which makes the recognition process context-sensitive.
</ol>
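
The four requirements above can be sketched as a small decision loop. This is a minimal illustration under several assumptions, not a real recognizer: sounds are taken to arrive as already-labelled strings, the memory is a plain in-process list rather than secondary storage, and `SOUND_TABLE`, `is_prefix` and `recognize` are hypothetical names with made-up entries.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical table: only these sound sequences are associated with characters.
SOUND_TABLE: Dict[Tuple[str, ...], str] = {
    ("k", "ae", "t"): "cat",
    ("k", "aa", "r"): "car",
    ("ae",): "a",
}


def is_prefix(heard: Tuple[str, ...], table: Dict[Tuple[str, ...], str]) -> bool:
    """Return True if some table entry starts with the sounds heard so far."""
    return any(key[:len(heard)] == heard for key in table)


def recognize(sounds: Sequence[str],
              table: Dict[Tuple[str, ...], str] = SOUND_TABLE) -> List[str]:
    memory: List[str] = []  # previously heard, still unresolved sounds
    output: List[str] = []
    for sound in sounds:
        memory.append(sound)
        heard = tuple(memory)
        if heard in table:
            output.append(table[heard])  # match found: output the characters
            memory.clear()
        elif is_prefix(heard, table):
            continue                     # possible match: keep listening
        else:
            memory.clear()               # worst case: no match, discard
    return output


print(recognize(["ae"]))                 # prints ['a']
print(recognize(["k", "ae", "t"]))       # prints ['cat']
print(recognize(["z", "k", "aa", "r"]))  # prints ['car']
```

The same sound "ae" yields "a" on its own but becomes part of "cat" when "k" was heard just before it, which is the context sensitivity the last requirement asks for. The sketch outputs as soon as any entry matches, so it would need a longest-match rule if one entry were a prefix of another.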

More Confusion:
<ol>
  <li>What if two or more sounds are heard at the same time?
</ol>