//
//  This page was written by tafifet & guezout
//

The aim of this tutorial is to show how to use the Julius engine with SIGVerse for speech recognition.
Up:[[Tutorial]]     Previous:[[Joystick service]]

In the example below, the user asks the robot to move in several directions by voice. The Julius engine performs the speech recognition and sends the recognized text to the SIGVerse speech recognition service, which in turn sends it to the controller that drives the robot.
----



#contents
*Overview [#ed1b8135]
On the client side (Windows), two services are used: the Julius service and the SpeechRec service. The Julius service recognizes speech from a microphone, converts it to text, and writes the recognized text to a shared memory area. The SpeechRec service then reads the text from the shared memory and sends it to the controller. On the other side, the controller receives the data from the SpeechRec service and uses it to control the robot.

The speech recognition service consists of two parallel processes:
- The Julius speech recognition process, julius.exe, which recognizes speech from a microphone, converts it to text, and publishes it in a shared memory area.
- The SIGService process, takeIt.sig, which reads the recognized speech from the shared memory and sends it to the controller.
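The two processes hand over the recognized text through a single fixed-size buffer. The sketch below models that hand-over in-process with a plain array instead of a Windows file mapping, so the protocol can be tried on any platform. Only BUF_SIZE (256 bytes) comes from the service source; publishText and readText are hypothetical names for the writer (julius.exe) and reader (takeIt) roles.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Size of the shared buffer, as defined by BUF_SIZE in the service source.
constexpr std::size_t BUF_SIZE = 256;

// Writer side (the role of julius.exe): clear the whole previous result,
// then copy at most size - 1 bytes so the text is always null-terminated.
void publishText(char* buf, std::size_t size, const std::string& text)
{
    std::memset(buf, 0, size);
    std::size_t n = text.size() < size - 1 ? text.size() : size - 1;
    std::memcpy(buf, text.data(), n);
}

// Reader side (the role of takeIt): copy the text out of the buffer without
// reading past its end, even if the terminator is missing.
std::string readText(const char* buf, std::size_t size)
{
    const void* end = std::memchr(buf, '\0', size);
    std::size_t len = end ? static_cast<std::size_t>(static_cast<const char*>(end) - buf) : size;
    return std::string(buf, len);
}
```

Clearing the full buffer before each write matters: a shorter result such as "back" written over "moveforward" would otherwise leave trailing characters behind if the terminator were not rewritten.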

&ref(JuliusOverview.PNG,,80%);


&ref(juliusOverview.PNG,,80%);
*Speech recognition plug-in [#ed1b9196]
** Configuration  [#ed1b8156]

*Services [#ed1b8135]
** Julius service [#ed1b8136]
The Julius project was downloaded from the [[official website:http://julius.sourceforge.jp/en_index.php]] and integrated with the speech recognition service. The main goal was to retrieve the recognized speech as text and send it to the SIGVerse service. To do so, the "output_stdout.cpp" file, which prints the recognized text, was modified to communicate with the SIGVerse service through Windows shared memory.

The services need to run as administrator. Go to the speech recognition service directory, where you will find two files: takeIt.sig and julius.exe. For each file, right-click it, open Properties -> Compatibility -> Privilege level, and check the box "Run this program as an administrator". This is presumably needed because the shared memory object is created in the Global\ namespace, which requires the SeCreateGlobalPrivilege privilege for processes outside session 0.

To start the service, add the takeIt.sig file to the SIGViewer services; it automatically launches the julius.exe process.

*** Source code explanation [#ed1b8138]
Below are the main lines for retrieving the recognized text and publishing it in the shared memory:

output_stdout.cpp file:

 hMapFile = CreateFileMapping(
                 INVALID_HANDLE_VALUE,    // use paging file
                 NULL,                    // default security
                 PAGE_READWRITE,          // read/write access
                 0,                       // maximum object size (high-order DWORD)
                 BUF_SIZE,                // maximum object size (low-order DWORD)
                 szName);                 // name of mapping object
   if (hMapFile == NULL)
   {
      _tprintf(TEXT("Could not create file mapping object (%d).\n"),
             GetLastError());
      return 1;
   }
 pBuf = (LPTSTR) MapViewOfFile(hMapFile,   // handle to map object
                        FILE_MAP_ALL_ACCESS, // read/write permission
                        0,
                        0,
                        BUF_SIZE);
   if (pBuf == NULL)
   {
      _tprintf(TEXT("Could not map view of file (%d).\n"),
             GetLastError());
       CloseHandle(hMapFile);
      return 1;
   }

Create a new file mapping object and map a view of it (pBuf) into memory.

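 int i;
 if (firstLoop == 1) {
   // Not the first result: clear the previous text from the shared buffer.
   // The mapped view is BUF_SIZE bytes long, so clear all of it
   // (sizeof(pBuf) is the size of a pointer, not of a character).
   ZeroMemory((PVOID)pBuf, BUF_SIZE);
   UnmapViewOfFile(pBuf);
   CloseHandle(hMapFile);
   printf("close the file mapping \n");
 }
 firstLoop = 1;
 // Concatenate the recognized words, skipping the first and last entries
 // of the sequence (typically the sentence-start and sentence-end silence words).
 strcpy(resultReco, " ");
 if (seq != NULL) {
   for (i = 1; i < n - 1; i++) {
     strcat(resultReco, winfo->woutput[seq[i]]);
   }
 }
 myprintf("%s", resultReco);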


This part of the code clears the previous result from the shared memory, concatenates the recognized words into resultReco, and writes the result to the shared memory.
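The word loop above can be isolated and tested on its own. The sketch below mirrors its bounds (index 1 to n-2) and its concatenation without separators, so "move" and "forward" become "moveforward", the string the controller compares against. joinRecognizedWords is a hypothetical name, and unlike the original, which initializes resultReco with a single space, the sketch starts from an empty string.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Skips the first and last entries of the recognized sequence and
// concatenates the remaining word outputs without separators.
std::string joinRecognizedWords(const std::vector<std::string>& words)
{
    std::string result;
    // Same range as "for (i=1;i<n-1;i++)"; writing the bound as i + 1 < size
    // avoids unsigned underflow when the sequence has fewer than two entries.
    for (std::size_t i = 1; i + 1 < words.size(); ++i) {
        result += words[i];
    }
    return result;
}
```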

** Speech recognition service [#ed1b8137]
This is the SIGVerse service for speech recognition. It retrieves the recognized text that the Julius service publishes in Windows shared memory and sends it to the SIGVerse controller, which controls the robot.


*** Source code explanation [#ed1b8338]
 TCHAR szName[]=TEXT("Global\\MyFileMappingObject");

Define the name of the file mapping object. Julius must open the shared memory under the same name.

 system("start .\\julius.exe -input mic -C .\\SIGVerseGrammar/Sample.jconf");


Start the Julius process (julius.exe) with microphone input and the SIGVerseGrammar/Sample.jconf configuration file.

 hMapFile = CreateFileMapping(
                 INVALID_HANDLE_VALUE,    // use paging file
                 NULL,                    // default security
                 PAGE_READWRITE,          // read/write access
                 0,                       // maximum object size (high-order DWORD)
                 BUF_SIZE,                // maximum object size (low-order DWORD)
                 szName);                 // name of mapping object
   if (hMapFile == NULL)
   {
      _tprintf(TEXT("Could not create file mapping object (%d).\n"),
             GetLastError());
      return 1;
   }

Create the file mapping object, or open it if Julius has already created it.

 pBuf = (LPTSTR) MapViewOfFile(hMapFile,   // handle to map object
                        FILE_MAP_ALL_ACCESS, // read/write permission
                        0,
                        0,
                        BUF_SIZE);
   if (pBuf == NULL)
   {
      _tprintf(TEXT("Could not map view of file (%d).\n"),
             GetLastError());
       CloseHandle(hMapFile);
      return 0.1;
   }

Map a view of the shared memory into the service's address space; pBuf then points to the recognized text.

 std::string send_msg =(std::string) pBuf;
 this->sendMsg("robot_000",send_msg.c_str());

Send the recognized text to the controller.


 if (strcmp(s.c_str(),"Stop_Reco")==0)
  {
   Enable = false;
  }
 else if(strcmp(s.c_str(),"Start_Reco")==0)
  {
   Enable = true;
  }
  

Enable or disable recognition when the service receives a Stop_Reco or Start_Reco message.
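Because the check compares whole strings, the sender must use exactly Stop_Reco and Start_Reco. A minimal sketch of the same logic as a free function, so it can be tested without a running SIGVerse server; applyRecognitionControl is a hypothetical name, not part of the service.

```cpp
#include <string>

// Updates the recognition flag from a control message, using the same
// message strings as VoiceRecognition::onRecvMsg. Returns true if the
// message was a control message, false if it was ignored.
bool applyRecognitionControl(const std::string& msg, bool& enabled)
{
    if (msg == "Stop_Reco") {
        enabled = false;
        return true;
    }
    if (msg == "Start_Reco") {
        enabled = true;
        return true;
    }
    return false;  // any other text leaves the flag unchanged
}
```

Note that a longer name such as "Stop_Recognition" is not recognized and leaves the flag unchanged.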



**Compiling the project [#ed1b8438]

You first need to download the whole project from the Git repository.

Go to "SIGVerseJulius\msvc" and open the VS 2008 project. Add the SIGVerse libraries to the project.

In the Solution Explorer window, four projects are listed. Compile them in the following order:
 1) libsent
 2) libjulius
 3) julius
 4) takeIt

After that, set the services to run as administrator. Go to "\SIGVerseJulius\msvc\Debug", where you will find two generated files: takeIt.exe and julius.exe. For each file, right-click it, open Properties -> Compatibility -> Privilege level, and check the box "Run this program as an administrator".

To start the service, run takeIt.exe; it automatically starts julius.exe.

Note: configure the microphone before starting the service.

*Speech recognition grammar  [#ed1b9136]
** Building grammar  [#ed1b8176]


A recognizer can use either a language model file, which contains a large list of words and their probability of occurrence in a given sequence, or a grammar file, which contains much smaller sets of predefined word combinations.
Our service uses a grammar, because it only needs to recognize a small set of commands.
Each word in the grammar file has an associated list of phonemes (the distinct sounds that make up the word).
The acoustic model associated with the grammar contains a statistical representation of each sound that makes up each word; each sound corresponds to a phoneme.

The recognition grammar is split into two files:

-the ".grammar" file, which defines a set of rules governing the words that Julius is expected to recognize. Rather than listing each word, the grammar file uses "word categories", each of which names a list of words to be recognized (defined in a separate ".voca" file);
-the ".voca" file, which defines the actual "word candidates" in each word category and their pronunciation information (note: the phonemes used in these pronunciations must be the same as those used to train your acoustic model).
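To see how word categories keep a grammar small, the sketch below models a grammar rule as a sequence of category names and the .voca content as a map from each category to its words, then lists every sentence the rule accepts. It is a conceptual illustration in C++, not the Julius file format or parser; the Vocabulary type, expandRule, and the categories in the usage example are hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// Category name -> words in that category (the role of the .voca file).
using Vocabulary = std::map<std::string, std::vector<std::string>>;

// Expands one rule (a sequence of categories, the role of a .grammar rule)
// into every word sequence it accepts, choosing one word per category.
// voca.at() throws std::out_of_range for a category that is not defined.
std::vector<std::string> expandRule(const std::vector<std::string>& rule,
                                    const Vocabulary& voca)
{
    std::vector<std::string> sentences{""};
    for (const std::string& category : rule) {
        std::vector<std::string> next;
        for (const std::string& prefix : sentences) {
            for (const std::string& word : voca.at(category)) {
                next.push_back(prefix.empty() ? word : prefix + " " + word);
            }
        }
        sentences = next;
    }
    return sentences;
}
```

With a VERB category {move, turn} and a DIR category {forward, left}, the single rule VERB DIR accepts four sentences ("move forward", "move left", "turn forward", "turn left") while the vocabulary lists only four words; the gap grows quickly as categories get larger.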

-To create new grammars, use a grammar generator.
-For more information about grammars, please refer to this tutorial: <link>

To write the pronunciations in the .voca file, you can use the VoxForgeDict file, which contains a pronunciation dictionary. It is located in:

 PATH-TO-SERVICE\SIGVerseGrammar\VoxForgeDict

** Compiling grammar  [#ed1b8156]

The .grammar and .voca files need to be compiled into ".dfa" and ".dict" files so that Julius can use them. This is done with the "mkdfa.pl" grammar compiler (a Perl script, so a Perl interpreter must be installed). The .grammar and .voca files need to have the same file prefix, and this prefix is passed to the mkdfa.pl script.
-Compile your files (sample.grammar and sample.voca) with the following command:

  PATH-TO-EXECUTABLE/mkdfa.pl .\grammar\sample

This generates the compiled grammar:

&ref(GrammarGen.PNG,,80%);

-It generates a sample.dfa file, which contains the finite-state automaton, a sample.term file, which lists the word categories, and a sample.dict file, which contains the word dictionary.
You can now run recognition with the newly defined grammar.

*Source Code understanding [#ed1b8136]

-Recognition.cpp:


 #include <string>
 #include <iostream>
 #include <stdio.h>
 #include <stdlib.h>     // system, atoi
 #include <windows.h>    // CreateFileMapping, MapViewOfFile, Sleep
 #include <tchar.h>
 #include <conio.h>
 #include "SIGService.h"
 #include "app.h"
 #pragma comment(lib, "user32.lib")
 
 #define BUF_SIZE 256
 // Name of the shared memory object; julius.exe opens it under the same name.
 TCHAR szName[]=TEXT("Global\\MyFileMappingObject");
 LPCTSTR pBuf;
 HANDLE hMapFile;
 bool Enable;               // true while recognized text is forwarded
 std::string send_msg_for;  // last message that was sent
 
 class VoiceRecognition : public sigverse::SIGService
 {
 public:
   VoiceRecognition(std::string name) : SIGService(name){};
   ~VoiceRecognition();
   double onAction();
   void onRecvMsg(sigverse::RecvMsgEvent &evt);
   void onInit();
 };
 
 VoiceRecognition::~VoiceRecognition()
 {
   this->disconnect();
 }
 
 void VoiceRecognition::onInit()
 {
   Enable = true;
   // Launch Julius with microphone input and the sample grammar configuration.
   system("start .\\julius.exe -input mic -C .\\SIGVerseGrammar/Sample.jconf");
   send_msg_for = "";  // reset the last sent message
   Sleep(2000);        // Win32 Sleep: wait 2000 ms for julius.exe to start
 }
 
 double VoiceRecognition::onAction()
 {
   if (Enable)
   {
     // Create the shared memory object, or open it if julius.exe already created it.
     hMapFile = CreateFileMapping(
                  INVALID_HANDLE_VALUE,    // use paging file
                  NULL,                    // default security
                  PAGE_READWRITE,          // read/write access
                  0,                       // maximum object size (high-order DWORD)
                  BUF_SIZE,                // maximum object size (low-order DWORD)
                  szName);                 // name of mapping object
     if (hMapFile == NULL)
     {
       _tprintf(TEXT("Could not create file mapping object (%d).\n"),
                GetLastError());
       return 1;    // try again in 1 s
     }
     pBuf = (LPTSTR) MapViewOfFile(hMapFile,            // handle to map object
                                   FILE_MAP_ALL_ACCESS, // read/write permission
                                   0,
                                   0,
                                   BUF_SIZE);
     if (pBuf == NULL)
     {
       _tprintf(TEXT("Could not map view of file (%d).\n"),
                GetLastError());
       CloseHandle(hMapFile);
       return 0.1;
     }
     // Assumes a multi-byte (non-Unicode) build, where TCHAR is char.
     std::string send_msg = "VOICE_DATA " + (std::string) pBuf;
     // NOTE: the world file in this tutorial names the robot "robot_000" and the
     // controller compares the bare command text, so the target name and the
     // "VOICE_DATA " prefix may need to be adapted to your controller.
     this->sendMsg("man_000", (char*) send_msg.c_str());
     printf("%s \n", send_msg.c_str());
     send_msg_for = send_msg;
     UnmapViewOfFile(pBuf);
     pBuf = _T("");
     CloseHandle(hMapFile);
   }
   return 0.1;      // call onAction again after 0.1 s
 }
 
 void VoiceRecognition::onRecvMsg(sigverse::RecvMsgEvent &evt)
 {
   std::string sender = evt.getSender();
   std::string msg = evt.getMsg();
   printf("Message  : %s  \n", msg.c_str());
   printf("Sender  :  %s  \n", sender.c_str());
   // Control messages enable or pause the forwarding of recognized text.
   if (msg == "Stop_Reco")
   {
     Enable = false;
   }
   else if (msg == "Start_Reco")
   {
     Enable = true;
   }
 }
 
 int main(int argc, char** argv)
 {
   // Expected arguments: <SIGServer host> <port>
   if (argc < 3)
   {
     printf("usage: %s <host> <port>\n", argv[0]);
     return 1;
   }
   VoiceRecognition srv("VoiceReco_Service");
   srv.onInit();
   unsigned short port = (unsigned short)(atoi(argv[2]));
   srv.connect(argv[1], port);
   srv.startLoop();
   return 0;
 }

-This is the service source code. Note that the service only reads the shared memory; it does not interact directly with the Julius process, which writes to the same shared memory.
-Word recognition confidence outputs:
--The Julius speech recognition service outputs a confidence score for each recognized word. This shows how reliably each word was recognized and can be used for several purposes, for example to reject uncertain commands.
--The confidence depends on the number of different words defined in each word category of the grammar.
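One way to use the per-word confidence is to drop a command when any of its words falls below a threshold. The sketch below assumes the per-word scores have already been extracted from the Julius output as numbers between 0 and 1; isConfidentCommand and the 0.5 threshold in the usage example are hypothetical and would need tuning for a given grammar and microphone.

```cpp
#include <vector>

// Returns true only if the command has at least one word and every word
// confidence reaches the threshold, so a single poorly recognized word
// rejects the whole command.
bool isConfidentCommand(const std::vector<double>& scores, double threshold)
{
    if (scores.empty()) {
        return false;
    }
    for (double s : scores) {
        if (s < threshold) {
            return false;
        }
    }
    return true;
}
```

For example, scores {0.9, 0.8} pass a 0.5 threshold, while {0.9, 0.3} are rejected.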

*Controller [#ed1b8158]
The controller receives messages from the SIGVerse service and convert them to commands.

voiceRecognition.cpp:

 #include "Controller.h"
 #include "Logger.h"
 #include <unistd.h>
 #include "ControllerEvent.h"
 #include <sstream>
 #include <string>
 #include <cstring>   // strcmp
 
 class voiceRecognition : public Controller
 {
 public:
   void onInit(InitEvent &evt);
   double onAction(ActionEvent &evt);
   void onRecvMsg(RecvMsgEvent &evt);
 public:
   RobotObj *my;
   double Weel_one;   // left wheel velocity
   double Weel_two;   // right wheel velocity
 };
 
 void voiceRecognition::onInit(InitEvent &evt)
 {
   Weel_one = 0.0;
   Weel_two = 0.0;
   my = this->getRobotObj(this->myname());
   my->setWheel(10.0, 10.0);   // wheel radius and distance between wheels
 }
 
 double voiceRecognition::onAction(ActionEvent &evt)
 {
   // Apply the velocities selected by the last recognized command.
   my->setWheelVelocity(Weel_one, Weel_two);
   return 0.01;
 }
 
 void voiceRecognition::onRecvMsg(RecvMsgEvent &evt)
 {
   std::string sender = evt.getSender();
   std::string msg = evt.getMsg();
   LOG_MSG(("message : %s", msg.c_str()));
   // Map each recognized command to wheel velocities; other messages are ignored.
   if (strcmp("moveforward", msg.c_str()) == 0)
   {
     Weel_one = 3.0;
     Weel_two = 3.0;
   }
   else if (strcmp("back", msg.c_str()) == 0)
   {
     Weel_one = -3.0;
     Weel_two = -3.0;
   }
   else if (strcmp("turnleft", msg.c_str()) == 0)
   {
     Weel_one = 0.78;
     Weel_two = -0.78;
   }
   else if (strcmp("turnright", msg.c_str()) == 0)
   {
     Weel_one = -0.78;
     Weel_two = 0.78;
   }
 }
 
 extern "C" Controller * createController()
 {
   return new voiceRecognition;
 }
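The full service listing sends "VOICE_DATA " followed by the buffer contents, and the Julius side starts resultReco with a space, while this controller compares the bare command with strcmp. A small normalization step avoids that mismatch. The sketch below is one possible approach, not the authors' code: parseVoiceCommand and WheelCommand are hypothetical names, the velocities are copied from the controller above, and std::optional requires C++17.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>

// Velocities for the two arguments of setWheelVelocity().
struct WheelCommand {
    double left;
    double right;
};

// Strips an optional "VOICE_DATA " prefix and surrounding whitespace, then
// looks the command up in the same table as voiceRecognition::onRecvMsg.
// Returns std::nullopt for unknown or empty text.
std::optional<WheelCommand> parseVoiceCommand(std::string msg)
{
    const std::string prefix = "VOICE_DATA ";
    if (msg.compare(0, prefix.size(), prefix) == 0) {
        msg.erase(0, prefix.size());
    }
    const char* ws = " \t\r\n";
    std::size_t first = msg.find_first_not_of(ws);
    if (first == std::string::npos) {
        return std::nullopt;
    }
    std::size_t last = msg.find_last_not_of(ws);
    msg = msg.substr(first, last - first + 1);

    static const std::map<std::string, WheelCommand> table = {
        {"moveforward", {3.0, 3.0}},
        {"back", {-3.0, -3.0}},
        {"turnleft", {0.78, -0.78}},
        {"turnright", {-0.78, 0.78}},
    };
    auto it = table.find(msg);
    if (it == table.end()) {
        return std::nullopt;
    }
    return it->second;
}
```

In onRecvMsg, one would call it on evt.getMsg() and, when it returns a value, copy left and right into Weel_one and Weel_two; unknown messages then leave the current velocities unchanged, as in the original.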


To enable or disable the speech recognition service, for example to avoid conflicts, send one of these messages from the controller to the service when needed. The message text must match the strings checked in the service's onRecvMsg:

Enable the service:
  sendMsg("VoiceReco_Service", "Start_Reco");
Disable the service:
  sendMsg("VoiceReco_Service", "Stop_Reco");


*World file [#ed1b8156]

VoiceRecognition.xml:

 <?xml version="1.0" encoding="utf-8"?>
 <world name="myworld1">
  <gravity x="0.0" y="-980.7" z="0.0"/>
  <instanciate class="WheelRobot-nii-v1.xml" type="Robot">
    <set-attr-value name="name" value="robot_000"/>
    <set-attr-value name="language" value="c++"/>
    <set-attr-value name="implementation"
                    value="./voiceRecognition.so"/>
    <set-attr-value name="dynamics" value="false"/>
    <set-attr-value name="x" value="-100.0"/>
    <set-attr-value name="y" value="30.0"/>
    <set-attr-value name="z" value="-130.0"/>
    <set-attr-value name="collision" value="true"/>
    <!--stereo camera right-->
    <camera id="1"
            link="REYE_LINK"
            direction="0.0 -1.0 1.0"
            position="0.0 0.0 3.0"/>
    <!--stereo camera left-->
    <camera id="2"
            link="LEYE_LINK"
            direction="0.0 -1.0 1.0"
            position="0.0 0.0 3.0"/>
    <!--distance sensor-->
    <camera id="3"
            link="WAIST_LINK0"
            direction="0.0 0.0 1.0"
            position="0.0 -5.0 20.0"/>
    <!--monitoring camera-->
    <camera id="4"
            link="WAIST_LINK2"
            direction="0 0 1"
            quaternion="0.0 0.0 -0.966 0.259"
            position="0.0 40.0 120.0"/>
  </instanciate>
 </world>


*Running the service [#ed1b6236]
Follow these steps to run the project:

- First, start the SIGVerse server with the world file, which loads the controller. Go to the controller directory and execute:

 $ sigserver.sh -w ./VoiceRecognition.xml

- Then start SIGViewer.

- Finally, run the takeIt.exe speech recognition service. Configure the microphone first.

The robot responds to the following voice commands:
move forward, move backward, turn left and turn right.





*Speech Recognition Service Source code [#ed1b8135]

** Julius process [#ed1b8136]
The Julius project can be downloaded from the [[official website:http://julius.sourceforge.jp/en_index.php]]; the version used here is 4.1.5. It was integrated with the SIGVerse speech recognition service. The goal is to retrieve the recognized speech as text and send it to the SIGVerse service. To do so, the "output_stdout.cpp" file, which prints the recognized text, was modified to communicate with the SIGVerse service through Windows shared memory.


The modified file is located in: path_to_julius_service\julius\output_stdout.cpp

*** Source code explanation [#ed1b8138]
Below are the main lines for retrieving the recognized text and publishing it in the shared memory:


output_stdout.cpp file:


#highlight(cpp){{
 hMapFile = CreateFileMapping(
                 INVALID_HANDLE_VALUE,    // use paging file
                 NULL,                    // default security
                 PAGE_READWRITE,          // read/write access
                 0,                       // maximum object size (high-order DWORD)
                 BUF_SIZE,                // maximum object size (low-order DWORD)
                 szName);                 // name of mapping object
   if (hMapFile == NULL)
   {
      _tprintf(TEXT("Could not create file mapping object (%d).\n"),
             GetLastError());
      return 1;
   }
 pBuf = (LPTSTR) MapViewOfFile(hMapFile,   // handle to map object
                        FILE_MAP_ALL_ACCESS, // read/write permission
                        0,
                        0,
                        BUF_SIZE);
   if (pBuf == NULL)
   {
      _tprintf(TEXT("Could not map view of file (%d).\n"),
             GetLastError());
       CloseHandle(hMapFile);
      return 1;
   }
}}
Create a new file mapping object and map a view of it (pBuf) into memory.

#highlight(cpp){{
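 int i;
 if (firstLoop == 1) {
   // Not the first result: clear the previous text from the shared buffer.
   // The mapped view is BUF_SIZE bytes long, so clear all of it
   // (sizeof(pBuf) is the size of a pointer, not of a character).
   ZeroMemory((PVOID)pBuf, BUF_SIZE);
   UnmapViewOfFile(pBuf);
   CloseHandle(hMapFile);
   printf("close the file mapping \n");
 }
 firstLoop = 1;
 // Concatenate the recognized words, skipping the first and last entries
 // of the sequence (typically the sentence-start and sentence-end silence words).
 strcpy(resultReco, " ");
 if (seq != NULL) {
   for (i = 1; i < n - 1; i++) {
     strcat(resultReco, winfo->woutput[seq[i]]);
   }
 }
 myprintf("%s", resultReco);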
}}

This part of the code clears the previous result from the shared memory, concatenates the recognized words into resultReco, and writes the result to the shared memory.

** SIGService process [#ed1b8137]
This is the SIGVerse service for speech recognition. It retrieves the recognized speech that the Julius process publishes in Windows shared memory and sends it to the SIGVerse controller, which controls the robot.


*** Source code explanation [#ed1b8338]

recognition.cpp:

#highlight(cpp){{
 TCHAR szName[]=TEXT("Global\\MyFileMappingObject");
}}

Define the name of the file mapping object. Julius must open the shared memory under the same name.

#highlight(cpp){{
 system("start .\\julius.exe -input mic -C .\\SIGVerseGrammar/Sample.jconf");
}}

Start the Julius process (julius.exe) with microphone input and the SIGVerseGrammar/Sample.jconf configuration file.

#highlight(cpp){{
 hMapFile = CreateFileMapping(
                 INVALID_HANDLE_VALUE,    // use paging file
                 NULL,                    // default security
                 PAGE_READWRITE,          // read/write access
                 0,                       // maximum object size (high-order DWORD)
                 BUF_SIZE,                // maximum object size (low-order DWORD)
                 szName);                 // name of mapping object
   if (hMapFile == NULL)
   {
      _tprintf(TEXT("Could not create file mapping object (%d).\n"),
             GetLastError());
      return 1;
   }
}}


Create the file mapping object, or open it if Julius has already created it.

#highlight(cpp){{
 pBuf = (LPTSTR) MapViewOfFile(hMapFile,   // handle to map object
                        FILE_MAP_ALL_ACCESS, // read/write permission
                        0,
                        0,
                        BUF_SIZE);
   if (pBuf == NULL)
   {
      _tprintf(TEXT("Could not map view of file (%d).\n"),
             GetLastError());
       CloseHandle(hMapFile);
      return 0.1;
   }
}}
Map a view of the shared memory into the service's address space; pBuf then points to the recognized text.

#highlight(cpp){{
 std::string send_msg =(std::string) pBuf;
 this->sendMsg("robot_000",send_msg.c_str());
}}

Send the recognized text to the controller.


#highlight(cpp){{

 if (strcmp(s.c_str(),"Stop_Reco")==0)
  {
   Enable = false;
  }
 else if(strcmp(s.c_str(),"Start_Reco")==0)
  {
   Enable = true;
  }
}}

Enable or disable recognition when the service receives a Stop_Reco or Start_Reco message.



**Compiling the project [#ed1b8438]

You first need to download the whole project from the GIT repository.

Go to "SIGVerseJulius\msvc" and open the VS 2008 project. You also need to add the SIGVerse libraries to the project.

In the Solution Explorer window, four projects are listed. Compile them in the following order:
 1) libsent
 2) libjulius
 3) julius
 4) takeIt





*Downloading the project [#ed1b8736]
To download the project from the Git repository, use the following link:

https://github.com/SIGVerse/samples/tree/master/SpeechRecognition


#highlight(end)



