Our group consists of four members: Angel Dionisio (Computer Science), Julio Dionisio (Computer Science), Betchel Joseph (Computer Science) and William Ho (Computer Engineering). This project was undertaken as our senior design project at The City College of New York.
Our instructor is Professor Esther Levin. Her areas of expertise are in the fields of spoken language human-machine interactions, speech recognition, spoken language understanding, spoken dialog systems, indexing and information retrieval, and statistical learning theory.
The project is an implementation of an automated directory assistance program for The City College of New York, written in VoiceXML. The personnel and department information used in this project was retrieved from CCNY's Faculty & Staff Directory and the Departments & Divisions Directory, respectively.
This project was developed with the idea of creating a voice recognition directory for the departments, professors, and all of the many different offices that form The City College of New York. This directory will be especially helpful for students in need of information either early in the morning or well after school hours. The application also allows those who do not have access to a web browser to have the information immediately available whether they are on or off campus.
VoiceXML (VXML) is the W3C's standard XML format for specifying interactive voice dialogues between a human and a computer. It allows voice applications to be developed and deployed in an analogous way to HTML for visual applications. Just as HTML documents are interpreted by a visual web browser, VoiceXML documents are interpreted by a voice browser.
For a more detailed explanation, please see Wikipedia's article on VoiceXML.
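To make the HTML analogy concrete, here is a minimal sketch of a VoiceXML document. It is an illustration rather than a file from our project; the form id and the prompt wording are assumptions. A voice browser that loads it would simply speak the prompt to the caller.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal VoiceXML 2.0 document. The form id and prompt text are
     illustrative assumptions, not taken from the project files. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="welcome">
    <!-- A block holds executable content; here it only plays a prompt. -->
    <block>
      <prompt>Welcome to the City College of New York directory.</prompt>
    </block>
  </form>
</vxml>
```

Much as an HTML page is a tree of elements rendered on screen, this document is a tree of dialog elements that the voice browser renders as speech.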
BeVocal Café is an online VoiceXML development environment. It is a free, Web-based development environment that features a wealth of valuable tools, documentation and other resources. Applications can be easily deployed to production and hosted on BeVocal servers.
By using BeVocal Café to host our program, we can access it both over the internet and by phone. This also reduces our overhead costs, as we do not have to purchase the hardware and software required to run a VoiceXML server.
For a better understanding of BeVocal Café and what it offers, please visit the BeVocal Café website.
Call flow design is where programmers map out the structure of their system. This process allows us to visualize how the system will work before writing any code.
We first determine how the caller will use our system and what capabilities the system will need. We then create our call flow diagrams using the fewest steps that allow the caller to accomplish their objective. For each diagram, we also consider the errors that may occur while using the system, such as the caller saying nothing or saying something the system does not recognize. Once this is done, we move on to writing VoiceXML files based on our call flow diagrams.
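As a rough illustration of how one step of a call flow diagram, together with its error branches, might translate into VoiceXML, consider the sketch below. The menu wording, the form ids `professors` and `departments`, and the three-attempt limit are all assumptions made for this example, not details of our actual call flow.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of a single call flow step with error handling.
     All ids, prompts, and the retry limit are hypothetical. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <menu id="main">
    <prompt>Say professors or departments.</prompt>

    <!-- Each choice is one outgoing arrow in the call flow diagram. -->
    <choice next="#professors">professors</choice>
    <choice next="#departments">departments</choice>

    <!-- Error branch: the caller stayed silent. -->
    <noinput>
      I did not hear anything.
      <reprompt/>
    </noinput>

    <!-- Error branch: the caller said something outside the grammar. -->
    <nomatch>
      I did not understand.
      <reprompt/>
    </nomatch>

    <!-- After the third failed attempt, end the call instead of looping. -->
    <nomatch count="3">
      Sorry, I am having trouble understanding. Goodbye.
      <exit/>
    </nomatch>
  </menu>

  <!-- Placeholder destinations for the two branches. -->
  <form id="professors">
    <block><prompt>The professor lookup would start here.</prompt></block>
  </form>

  <form id="departments">
    <block><prompt>The department lookup would start here.</prompt></block>
  </form>
</vxml>
```

Handling `noinput` and `nomatch` explicitly keeps every path in the diagram accounted for, including the ones where the caller makes a mistake.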
Speech recognition systems provide computers with the ability to recognize what a user says. These systems use grammars to identify the words and phrases that can be spoken by the user. Grammars formally define the set of allowable phrases that can be recognized by the speech engine. There are several types of grammar formats: GSL, GSC, ABNF, JSGF, and XML. Grammars can be embedded within the VXML file or placed in an external grammar file.
In our case, we use the GSL format, as this is the only format that BeVocal Café supports. Simple grammar choices are embedded within the VXML files, while more complex choices are placed in grammar files to give our system more flexibility.
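The sketch below shows what a simple GSL grammar embedded in a VXML file might look like. It is an example under stated assumptions: the field name `directory`, the phrases, and the returned values are invented for illustration, and the media type `application/x-nuance-gsl` is the one we assume the platform expects for inline GSL. In GSL, square brackets list alternatives, parentheses group a sequence of words, and a leading `?` marks a word as optional.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of an inline GSL grammar. Field name, phrases, returned
     values, and the grammar media type are assumptions. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="pick_directory">
    <field name="directory">
      <prompt>Would you like the professor directory or the department directory?</prompt>

      <!-- Each alternative fills the "directory" slot, which matches
           the field name, so the field receives the value. -->
      <grammar type="application/x-nuance-gsl">
        <![CDATA[
          [
            ( ?the [professor professors faculty] ?directory ) {<directory "professors">}
            ( ?the [department departments] ?directory ) {<directory "departments">}
          ]
        ]]>
      </grammar>

      <filled>
        <prompt>You chose <value expr="directory"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

A grammar this small is easy to keep inline; a list covering every professor or department is long enough that a separate grammar file is easier to maintain.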
For our grammar files, each phrase recognized by the system has two slots/variables. The first slot allows the caller to confirm their selection and the second slot provides the details after the caller has confirmed their choice. Both our professor and department grammar files are constructed similarly.
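One way the two-slot idea could be wired up is sketched below. To keep the example self-contained, the grammar is shown inline rather than in a separate grammar file, and everything specific in it is hypothetical: the slot names `person` and `details`, the names Jane Doe and John Roe, the placeholder detail strings, and the media type `application/x-nuance-gsl`. The first slot is read back for confirmation, and the second is spoken only after the caller says yes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of a two-slot grammar with confirmation. Slot names,
     people, and detail strings are hypothetical placeholders. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="lookup">
    <!-- Form-level grammar: every phrase fills both slots at once. -->
    <grammar type="application/x-nuance-gsl">
      <![CDATA[
        [
          ( ?professor jane doe ) {<person "Professor Jane Doe"> <details "Placeholder details for Jane Doe.">}
          ( ?professor john roe ) {<person "Professor John Roe"> <details "Placeholder details for John Roe.">}
        ]
      ]]>
    </grammar>

    <initial name="start">
      <prompt>Who are you looking for?</prompt>
      <nomatch>I did not recognize that name. <reprompt/></nomatch>
    </initial>

    <!-- These fields have no prompts of their own; they are filled
         by the form-level grammar above via matching slot names. -->
    <field name="person"/>
    <field name="details"/>

    <!-- Slot 1 is used to confirm; slot 2 is read out on "yes". -->
    <field name="confirmed" type="boolean">
      <prompt>Did you say <value expr="person"/>?</prompt>
      <filled>
        <if cond="confirmed">
          <prompt><value expr="details"/></prompt>
        <else/>
          <!-- Start over: clear everything, including the initial item. -->
          <clear namelist="start person details confirmed"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

Keeping the confirmation text and the details in the grammar itself means a new entry can be added by editing one line of the grammar, without touching the dialog logic.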
After coding our initial VXML and grammar files, we test our system on BeVocal Café. We log calls to demonstrate how the system works, and any bugs found are fixed immediately. Once we are satisfied with the program's integrity, the system is shown to the client and any further requests are discussed and documented. The call flow diagrams are then updated and the changes are coded. We check for bugs again and meet with the client once more. This cycle continues until the client is satisfied with the product.