Backup of Speech recognition using Julius(No. 3)

This tutorial explains how to use the Julius engine with SIGVerse to provide a speech recognition service.

In the example below, the user asks the robot to move in several directions using speech.

Overview
Services
How to build grammars
Source Code understanding
Controller
World file
Downloading the project

Overview †

On the client side (Windows), two services are used: the Julius service and the SpeechRec service. The Julius service recognizes speech from a microphone and converts it to text, then writes the recognized text to Windows shared memory. The SpeechRec service reads the text from shared memory and sends it to the controller. The controller, in turn, uses the data received from the SpeechRec service to control the robot.

Services †

Julius service †

The Julius project was downloaded from the official website and integrated with the speech recognition service. The main goal was to retrieve the recognized speech as text and send it to the SIGVerse service. To do so, the "output_stdout.cpp" file, which prints the recognized text, was modified to communicate with the SIGVerse service through Windows shared memory.

Source code explanation †

Below are the main lines for retrieving the recognized text and publishing it to shared memory:

output_stdout.cpp file:

#define BUF_SIZE 128
  TCHAR szName[]=TEXT("Global\\MyFileMappingObject");
  TCHAR szNameConf[]=TEXT("Global\\MyFileMappingObjectConf");
  TCHAR resultReco[]=TEXT("");
  HANDLE hMapFile;
  HANDLE hMapFileConf;
  LPCTSTR pBuf ;
  LPCTSTR pBufConf ;
  TCHAR clearing[]=TEXT(""); 
  TCHAR resultRecoConf[]=TEXT("");

This code declares the names, handles and buffer pointers for two file-mapping objects in Windows shared memory: MyFileMappingObject and MyFileMappingObjectConf.
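
The declarations above only name the mapping objects; the step that writes the recognized text is not shown in this excerpt. As a rough, portable sketch of that step, the helper below copies a recognized sentence into a fixed-size buffer without overflowing it. The name `writeResult` is hypothetical and not part of Julius or SIGVerse, and a plain `char` buffer stands in for the view that `MapViewOfFile` would return in the real service.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper (not from output_stdout.cpp): copy `text` into a
// fixed-size view, truncating if needed and always writing a terminating
// '\0'. Returns the number of characters copied (excluding the '\0').
std::size_t writeResult(char* view, std::size_t viewSize, const std::string& text)
{
    if (view == nullptr || viewSize == 0) {
        return 0;
    }
    // Leave one byte for the terminator.
    const std::size_t room = viewSize - 1;
    const std::size_t n = (text.size() < room) ? text.size() : room;
    std::memcpy(view, text.data(), n);
    view[n] = '\0';
    return n;
}
```

With a 128-byte view (the BUF_SIZE used above), a call such as `writeResult(view, 128, "go forward")` would store the sentence and its terminator, and anything longer than 127 characters would be cut off rather than written past the end of the mapping.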

Speech recognition service †

This is the SIGVerse service for speech recognition. It retrieves the recognized text that the Julius service publishes to Windows shared memory and sends it to the SIGVerse controller, which uses it to control the robot.

Source code explanation †

TCHAR szName[]=TEXT("Global\\MyFileMappingObject");

Declare the name of the shared-memory mapping object. It must be the same name as the one used in the Julius service.

system("start .\\julius.exe -input mic -C .\\SIGVerseGrammar/Sample.jconf");

Start the Julius process when the service is launched.

hMapFile = CreateFileMapping(
                INVALID_HANDLE_VALUE,    // use paging file
                NULL,                    // default security
                PAGE_READWRITE,          // read/write access
                0,                       // maximum object size (high-order DWORD)
                BUF_SIZE,                // maximum object size (low-order DWORD)
                szName);                 // name of mapping object
if (hMapFile == NULL)
{
   _tprintf(TEXT("Could not create file mapping object (%d).\n"),
          GetLastError());
   return 1;
}

Create the named file-mapping object in shared memory, or open it if it already exists.

pBuf = (LPTSTR) MapViewOfFile(hMapFile,   // handle to map object
                       FILE_MAP_ALL_ACCESS, // read/write permission
                       0,
                       0,
                       BUF_SIZE);
if (pBuf == NULL)
{
   _tprintf(TEXT("Could not map view of file (%d).\n"),
          GetLastError());
   CloseHandle(hMapFile);
   return 0.1;
}

Map a view of the mapping object into the address space of the service and point pBuf at it.

std::string send_msg = (std::string) pBuf;
this->sendMsg("man_000",(char*) send_msg.c_str());

Send the recognized text to the controller.
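
Converting `pBuf` directly to a `std::string` relies on the writer having stored a terminating '\0' inside the mapped region. As a cautious sketch, the hypothetical helper `readResult` below reads at most the size of the view, so a missing terminator cannot make the read run past the mapping. It assumes a non-Unicode build, where the view holds plain `char` data.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper (not part of SIGVerse): read a C string from a
// fixed-size view, stopping at the first '\0' or at the end of the view,
// whichever comes first.
std::string readResult(const char* view, std::size_t viewSize)
{
    if (view == nullptr) {
        return std::string();
    }
    const void* end = std::memchr(view, '\0', viewSize);
    const std::size_t len = (end != nullptr)
        ? static_cast<std::size_t>(static_cast<const char*>(end) - view)
        : viewSize;
    return std::string(view, len);
}
```

In the service, the result of `readResult(pBuf, BUF_SIZE)` could then be passed to `sendMsg` in place of the direct cast.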

if (strcmp(s.c_str(),"Stop_Reco")==0)
{
   Enable = false;
}
else if (strcmp(s.c_str(),"Start_Reco")==0)
{
   Enable = true;
}

Enable/Disable the recognition when needed.
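
The same switch can be written as a small pure function, which makes the behavior for unknown messages explicit. This is only a sketch: `applyCommand` is a hypothetical name, and the two command strings are the ones compared in the snippet above.

```cpp
#include <string>

// Hypothetical helper: return the new value of the Enable flag after the
// service receives `msg`. Any other message leaves the flag unchanged.
bool applyCommand(const std::string& msg, bool enabled)
{
    if (msg == "Stop_Reco") {
        return false;
    }
    if (msg == "Start_Reco") {
        return true;
    }
    return enabled;
}
```

Inside `onRecvMsg` this would be used as `Enable = applyCommand(msg, Enable);`.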

Compiling the project †

Because the SIGVerse service and the Julius recognizer exchange data through Windows shared memory, SIGViewer must be run as administrator (the mapping object is created in the Global\ namespace, which Windows restricts to privileged processes).

Both the service executable and the Julius executable must also be run with administrator privileges.

Voice recognition service:
- The user can enable and disable voice recognition by sending a message to the service.
- This is useful to avoid message conflicts.

The message text must match the strings tested in onRecvMsg ("Start_Reco" and "Stop_Reco" in the service listing on this page).

Enable the service:
```
 sendMsg("VoiceReco_Service", "Start_Reco");
```
Disable the service:
```
 sendMsg("VoiceReco_Service", "Stop_Reco");
```

As you can see, the SIGVerse service and the Julius speech-to-text process are separate. Julius writes the recognized text to shared memory; the recognition service reads it from there and sends it to the controller.
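
In the service listing further down this page, each message sent to the controller starts with the prefix "VOICE_DATA ". The controller code is not shown here, so the following is only a sketch of how a receiver might separate that prefix from the recognized text. The helper name `extractVoiceText` is hypothetical.

```cpp
#include <string>

// Hypothetical helper for the receiving side. If `msg` starts with the
// "VOICE_DATA " prefix used by the service, store the remainder in `text`
// and return true; otherwise return false and leave `text` untouched.
bool extractVoiceText(const std::string& msg, std::string& text)
{
    static const std::string prefix = "VOICE_DATA ";
    if (msg.compare(0, prefix.size(), prefix) != 0) {
        return false;
    }
    text = msg.substr(prefix.size());
    return true;
}
```

Messages without the prefix are rejected, so other traffic addressed to the same entity would not be mistaken for a voice command.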

How to build grammars †

Julius needs either a language model file, which contains a large list of words and their probability of occurrence in a given sequence, or a grammar file, which contains a much smaller set of predefined word combinations. This service uses a grammar, because it only has to recognize commands. Each word in the grammar has an associated list of phonemes (the distinct sounds that make up the word). An acoustic model is used together with the grammar; it contains a statistical representation of each sound that makes up each word, and each sound corresponds to a phoneme. The recognition grammar is split into two files:

• the ".grammar" file, which defines a set of rules governing the words the speech recognition engine (SRE) is expected to recognize; rather than listing each word in the .grammar file, a Julian grammar file uses "Word Categories", each of which names a list of words to be recognized (the lists themselves are defined in a separate ".voca" file);

• the ".voca" file, which defines the actual "Word Candidates" in each Word Category and their pronunciation information (Note: the phonemes that make up this pronunciation information must be the same as those used to train your Acoustic Model).
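
To make the structure of a ".voca" file more concrete, here is a small parser sketch. It assumes the usual Julius layout, in which a line beginning with `%` opens a Word Category and each following line holds one Word Candidate followed by its phonemes; treating lines that start with `#` as comments is also an assumption. The type and function names (`WordCandidate`, `Vocabulary`, `parseVoca`) are hypothetical.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

struct WordCandidate {
    std::string word;                    // the word as written in the file
    std::vector<std::string> phonemes;   // its pronunciation
};

// Word Category name -> Word Candidates in that category.
typedef std::map<std::string, std::vector<WordCandidate> > Vocabulary;

// Parse .voca-style text: "% NAME" opens a category, and each following
// non-empty line is "WORD phoneme phoneme ...".
Vocabulary parseVoca(const std::string& text)
{
    Vocabulary voca;
    std::string category;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string first;
        if (!(fields >> first) || first[0] == '#') {
            continue;                    // blank line or comment (assumed)
        }
        if (first[0] == '%') {
            category = first.substr(1);  // accepts "%NAME" ...
            if (category.empty()) {
                fields >> category;      // ... and "% NAME"
            }
            continue;
        }
        if (category.empty()) {
            continue;                    // word outside any category
        }
        WordCandidate cand;
        cand.word = first;
        std::string phoneme;
        while (fields >> phoneme) {
            cand.phonemes.push_back(phoneme);
        }
        voca[category].push_back(cand);
    }
    return voca;
}
```

For instance, parsing the made-up text `"% DIRECTION\nLEFT l eh f t\nRIGHT r ay t\n"` would give one category, DIRECTION, with two candidates; the words and phonemes in that example are purely illustrative and must in practice match those of the acoustic model.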

Use the grammar generator to create new grammars.
Please refer to this tutorial for more information about grammars <link> .

Compiling the grammar: The .grammar and .voca files need to be compiled into ".dfa" and ".dict" files so that Julius can use them. This is done using the "mkdfa.pl" grammar compiler. The .grammar and .voca files must have the same file prefix, and this prefix is then passed to the mkdfa.pl script.

Compile your files (sample.grammar and sample.voca) with the following command:

 PATH-TO-EXECUTABLE/mkdfa.pl .\grammar\sample

This generates sample.dfa and sample.term, which contain the automaton information, and sample.dict, which contains the word dictionary. Recognition can then be run with the newly defined grammar.

Source Code understanding †

Recognition.cpp:

#include <sphelper.h>
#include <string>
#include <iostream>
#include "SIGService.h"
#include "app.h"
// includes for shared memory
#include <windows.h>
#include <tchar.h>
#include <conio.h>
#include <stdio.h>
#include <stdlib.h>   // system, atoi
#pragma comment(lib, "user32.lib")

#define BUF_SIZE 256
// Name of the mapping object; must match the name used by the Julius service
TCHAR szName[]=TEXT("Global\\MyFileMappingObject");
LPCTSTR pBuf;               // view of the shared memory
HANDLE hMapFile;            // handle of the mapping object
bool Enable;                // true while recognition results are forwarded
std::string send_msg_for;   // last message that was sent

class VoiceRecognition : public sigverse::SIGService
{
public:
	VoiceRecognition(std::string name) : SIGService(name){};
	~VoiceRecognition();
   double onAction();
	void onRecvMsg(sigverse::RecvMsgEvent &evt);
	void  onInit (); 	
};

 VoiceRecognition::~VoiceRecognition()
{
	this->disconnect();
}

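void VoiceRecognition::onInit (){
    Enable = true;
    // Launch Julius in its own console, reading from the microphone
    system("start .\\julius.exe -input mic -C .\\SIGVerseGrammar/Sample.jconf");
    send_msg_for = "";   // no message has been sent yet
    Sleep(2000);         // give Julius 2 s to start (Win32 Sleep takes milliseconds)
}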

double VoiceRecognition::onAction()
{
    /////// shared memory //////
    if (Enable)
    {
        // Create the named mapping object, or open it if it already exists
        hMapFile = CreateFileMapping(
                    INVALID_HANDLE_VALUE,    // use paging file
                    NULL,                    // default security
                    PAGE_READWRITE,          // read/write access
                    0,                       // maximum object size (high-order DWORD)
                    BUF_SIZE,                // maximum object size (low-order DWORD)
                    szName);                 // name of mapping object
        if (hMapFile == NULL)
        {
            _tprintf(TEXT("Could not create file mapping object (%d).\n"),
                     GetLastError());
            return 1;
        }

        // Map the object into this process so the text can be read
        pBuf = (LPTSTR) MapViewOfFile(hMapFile,    // handle to map object
                           FILE_MAP_ALL_ACCESS,    // read/write permission
                           0,
                           0,
                           BUF_SIZE);
        if (pBuf == NULL)
        {
            _tprintf(TEXT("Could not map view of file (%d).\n"),
                     GetLastError());
            CloseHandle(hMapFile);
            return 0.1;
        }

        // Prefix the recognized text and send it to the entity "man_000".
        // The cast assumes a non-Unicode build, where TCHAR is char.
        std::string send_msg = "VOICE_DATA " + (std::string) pBuf;
        this->sendMsg("man_000",(char*) send_msg.c_str());
        printf("%s \n", send_msg.c_str());
        send_msg_for = send_msg;

        // Release the view and the handle until the next call
        UnmapViewOfFile(pBuf);
        pBuf = NULL;
        CloseHandle(hMapFile);
    }
    return 0.1;
}

void VoiceRecognition::onRecvMsg(sigverse::RecvMsgEvent &evt)
{
    std::string sender = evt.getSender();
    std::string msg = evt.getMsg();
    printf("Message  : %s  \n", msg.c_str());
    printf("Sender  :  %s  \n", sender.c_str());
    // Switch the forwarding of recognition results off or on
    if (msg == "Stop_Reco")
    {
        Enable = false;
    }
    else if (msg == "Start_Reco")
    {
        Enable = true;
    }
}

int main(int argc, char** argv)
{
    // Expected arguments: <server address> <port>
    if (argc < 3)
    {
        printf("Usage: %s <server address> <port>\n", argv[0]);
        return 1;
    }
    VoiceRecognition srv("VoiceReco_Service");
    srv.onInit();
    unsigned short port = (unsigned short)(atoi(argv[2]));
    srv.connect(argv[1], port);
    srv.startLoop();
    return 0;
}

This is the source code of the service. Note that the service only reads data from the shared memory region and does not interact directly with the Julius process, which writes to the same region.
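
The listing stores the last message in `send_msg_for` but still sends the buffer contents on every call of `onAction`. If repeated identical messages turn out to be a problem, one possible approach (an assumption, not something the original service does) is to forward the text only when it changes. The class name `ChangeDetector` is hypothetical.

```cpp
#include <string>

// Hypothetical helper: remembers the last text that was forwarded and
// reports whether a new reading should be sent.
class ChangeDetector {
public:
    // Returns true, and records `current`, only when `current` is
    // non-empty and differs from the previously forwarded text.
    bool shouldSend(const std::string& current)
    {
        if (current.empty() || current == last_) {
            return false;
        }
        last_ = current;
        return true;
    }

private:
    std::string last_;
};
```

A drawback of this design is that the same command spoken twice in a row would be sent only once, unless the writer clears the shared buffer between utterances.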
Word recognition confidence output:
- The Julius voice recognition service also outputs a recognition confidence for each word. This shows how reliably each word was recognized and can be used for several purposes, such as ignoring uncertain results.
- The confidence depends on the number of different words defined in each word group in the grammar.
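
As an illustration of one use of these confidence values, the sketch below keeps only the words whose confidence reaches a chosen threshold. How the confidence is actually delivered to the service is not described on this page, so the `ScoredWord` structure, the `confidentWords` function and the assumption that confidences lie between 0 and 1 are all hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct ScoredWord {
    std::string word;
    double confidence;   // assumed to lie in the range [0, 1]
};

// Keep only the words whose confidence is at least `threshold`,
// preserving their original order.
std::vector<std::string> confidentWords(const std::vector<ScoredWord>& words,
                                        double threshold)
{
    std::vector<std::string> kept;
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (words[i].confidence >= threshold) {
            kept.push_back(words[i].word);
        }
    }
    return kept;
}
```

A suitable threshold would have to be found by experiment, since the confidence depends on the size of each word group in the grammar.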

Controller †

World file †
