Convert Speech Data in Audio Files to Text with C# and Azure Speech API

I’ve been listening to a lot of podcasts recently, and together they hold hundreds of hours of content, all stored as speech in audio files. How could we begin to index or search that content?

One way to search podcast audio is to convert the audio files to text with associated timestamps. We can then index the text, search the content, and retrieve both the matching text and the audio slices we are looking for.
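To make the indexing idea concrete, here is a minimal sketch of an in-memory inverted index over timestamped transcript segments. TranscriptSegment and TranscriptIndex are hypothetical names, and the sketch assumes you already have each recognized phrase together with its start offset in the audio; getting those offsets out of the speech service is not covered here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: one recognized phrase and where it starts in the audio.
public sealed class TranscriptSegment
{
    public TranscriptSegment(TimeSpan start, string text)
    {
        Start = start;
        Text = text;
    }

    public TimeSpan Start { get; }
    public string Text { get; }
}

// Minimal inverted index: maps each lowercase word to the segments that contain it.
public sealed class TranscriptIndex
{
    private static readonly char[] Separators =
        { ' ', '\t', '\r', '\n', '.', ',', '?', '!', ';', ':', '"' };

    private readonly Dictionary<string, List<TranscriptSegment>> _postings =
        new Dictionary<string, List<TranscriptSegment>>(StringComparer.Ordinal);

    public void Add(TranscriptSegment segment)
    {
        // Record each distinct word once per segment.
        foreach (string word in Tokenize(segment.Text).Distinct())
        {
            if (!_postings.TryGetValue(word, out List<TranscriptSegment> list))
            {
                list = new List<TranscriptSegment>();
                _postings[word] = list;
            }
            list.Add(segment);
        }
    }

    // Returns the segments that contain every query word, ordered by start time.
    public IReadOnlyList<TranscriptSegment> Search(string query)
    {
        string[] words = Tokenize(query).Distinct().ToArray();
        if (words.Length == 0) return Array.Empty<TranscriptSegment>();

        IEnumerable<TranscriptSegment> hits = null;
        foreach (string word in words)
        {
            if (!_postings.TryGetValue(word, out List<TranscriptSegment> list))
            {
                return Array.Empty<TranscriptSegment>(); // a word nobody said: no match
            }
            hits = hits == null ? list : hits.Intersect(list);
        }
        return hits.OrderBy(s => s.Start).ToList();
    }

    private static IEnumerable<string> Tokenize(string text) =>
        text.Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => w.ToLowerInvariant());
}
```

Each hit carries its Start offset, which is what lets you seek the original audio file to the matching slice.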

First, create a Visual Studio project (a console application is fine) and install the ProjectOxford.SpeechRecognition NuGet package.

Next, spin up a Bing Speech API service in Azure. You can find it by searching for the “speech” keyword.

After your Speech to Text service spins up, get its access keys: on the dashboard for your instance, click the “Show access keys …” link. Copy the access key value into your app.config or into the code that you use at run time.
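If you keep the key in app.config, a minimal way to load it is sketched below. It assumes an appSettings entry such as `<add key="SpeechApiKey" value="..." />`; the setting name SpeechApiKey, the SPEECH_API_KEY environment variable fallback, and the SpeechKeyLoader class are all hypothetical choices, not part of the speech SDK.

```csharp
using System;
using System.Configuration; // System.Configuration assembly (.NET Framework) or the System.Configuration.ConfigurationManager NuGet package

public static class SpeechKeyLoader
{
    // Hypothetical names: use whatever names you put in your own configuration.
    public const string AppSettingName = "SpeechApiKey";
    public const string EnvironmentVariableName = "SPEECH_API_KEY";

    public static string LoadKey()
    {
        // Look in <appSettings> first; the indexer returns null when the setting is missing.
        string key = ConfigurationManager.AppSettings[AppSettingName];

        // Fall back to an environment variable so the key can stay out of source control.
        if (string.IsNullOrWhiteSpace(key))
        {
            key = Environment.GetEnvironmentVariable(EnvironmentVariableName);
        }

        if (string.IsNullOrWhiteSpace(key))
        {
            throw new InvalidOperationException(
                $"No speech API key found in appSettings['{AppSettingName}'] or %{EnvironmentVariableName}%.");
        }
        return key.Trim();
    }
}
```

With this in place, `string key = SpeechKeyLoader.LoadKey();` can replace the hard-coded key string.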

Here’s a code example that creates a data client from the service factory and sends audio data from your source audio file to the service. The service expects WAV (PCM) audio, so you may need to convert MP3 podcast files first. Be sure to change the audio file name and the API key in the example below.

// Requires: using System; using System.IO; using System.Threading;
// plus the speech recognition namespace from the ProjectOxford.SpeechRecognition package.
string key = "YOUR_KEY_GOES_HERE";
Console.WriteLine("Key provided is: {0}", key);
// Source audio: a WAV (PCM) file.
string file = @"C:\MyFavoritePodCast.wav";
Console.WriteLine("File provided is: {0}", file);
var defaultLocale = "en-US";
var mode = SpeechRecognitionMode.LongDictation;
using (var dataClient = SpeechRecognitionServiceFactory.CreateDataClient(mode, defaultLocale, key))
{
    // Event handlers for speech recognition results
    dataClient.OnResponseReceived += OnDataDictationResponseReceivedHandler;
    dataClient.OnPartialResponseReceived += OnPartialResponseReceivedHandler;
    dataClient.OnConversationError += OnConversationErrorHandler;
    using (var fileStream = new FileStream(file, FileMode.Open, FileAccess.Read))
    {
        Console.Write("Processing File");

        var bytesRead = 0;
        var buffer = new byte[1024];
        try
        {
            do
            {
                // Read the next chunk of audio data into the byte buffer.
                bytesRead = fileStream.Read(buffer, 0, buffer.Length);
                // Send the audio data to the service.
                dataClient.SendAudio(buffer, bytesRead);
            }
            while (bytesRead > 0);
        }
        finally
        {
            // We are done sending audio. Final recognition results will arrive in the OnResponseReceived event.
            dataClient.EndAudio();
        }
        Console.WriteLine();
        Console.Write("Waiting for response");
        // Wait until a handler sets 'done' (end of dictation or an error).
        while (!done)
        {
            Thread.Sleep(500);
        }
    }
}
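Instead of sleeping on the main thread, you can block until one of the event handlers signals that recognition has finished. The sketch below wraps a ManualResetEventSlim in a small hypothetical helper, RecognitionCompletion; the handlers would call MarkCompleted or MarkFailed, and the main code would call Wait with a timeout you choose.

```csharp
using System;
using System.Threading;

// Hypothetical helper: lets event handlers release the thread that is waiting for results.
public sealed class RecognitionCompletion : IDisposable
{
    private readonly ManualResetEventSlim _finished = new ManualResetEventSlim(false);

    // Set by MarkFailed; null when recognition completed normally.
    public string Error { get; private set; }

    // Call from the response handler when the final dictation status arrives.
    public void MarkCompleted() => _finished.Set();

    // Call from the error handler: record the message, then release the waiter.
    public void MarkFailed(string error)
    {
        Error = error;
        _finished.Set();
    }

    // Blocks until MarkCompleted or MarkFailed is called; returns false if the timeout expires first.
    public bool Wait(TimeSpan timeout) => _finished.Wait(timeout);

    public void Dispose() => _finished.Dispose();
}
```

Unlike a fixed sleep, the wait ends as soon as the final result or an error arrives, and the timeout still bounds how long a stuck request can hold the program.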

The Speech to Text code example above wires up several event handlers, so we need to implement them. Here is an example of how they could look, printing the text that the Speech to Text service returns.

// Shared flag: the handlers set it when recognition finishes or fails.
private static volatile bool done;

private static void OnDataDictationResponseReceivedHandler(object sender, SpeechResponseEventArgs e)
{
    Console.WriteLine();
    Console.WriteLine("--- OnDataDictationResponseReceivedHandler ---");
    // Print the results first so nothing is lost once 'done' is set.
    WriteResponseResult(e);
    switch (e.PhraseResponse.RecognitionStatus)
    {
        case RecognitionStatus.EndOfDictation:
        case RecognitionStatus.DictationEndSilenceTimeout:
            // Final status for this dictation session.
            Console.WriteLine("Completed");
            done = true;
            break;
    }
}

private static void WriteResponseResult(SpeechResponseEventArgs e)
{
    Console.WriteLine();
    if (e.PhraseResponse.Results.Length == 0)
    {
        Console.WriteLine("No phrase response is available.");
    }
    else
    {
        Console.WriteLine("########## Final n-BEST Results ##############");
        for (int i = 0; i < e.PhraseResponse.Results.Length; i++)
        {
            Console.WriteLine(
                "[{0}] Confidence={1}, Text=\"{2}\"",
                i,
                e.PhraseResponse.Results[i].Confidence,
                e.PhraseResponse.Results[i].DisplayText);
        }
        Console.WriteLine();
    }
}

// Only used with an intent-enabled client; it is not wired up in the example above.
private static void OnIntentHandler(object sender, SpeechIntentEventArgs e)
{
    Console.WriteLine();
    Console.WriteLine("--- Intent received by OnIntentHandler() ---");
    Console.WriteLine("{0}", e.Payload);
    Console.WriteLine();
}

private static void OnPartialResponseReceivedHandler(object sender, PartialSpeechResponseEventArgs e)
{
    // Partial results are interim guesses; keep waiting for the final response.
    Console.WriteLine();
    Console.WriteLine("--- Partial result received by OnPartialResponseReceivedHandler() ---");
    Console.WriteLine("{0}", e.PartialResult);
    Console.WriteLine();
}

private static void OnConversationErrorHandler(object sender, SpeechErrorEventArgs e)
{
    Console.WriteLine();
    Console.WriteLine("--- Error received by OnConversationErrorHandler() ---");
    Console.WriteLine("Error code: {0}", e.SpeechErrorCode.ToString());
    Console.WriteLine("Error text: {0}", e.SpeechErrorText);
    Console.WriteLine();
    // Stop waiting on error.
    done = true;
}

You will certainly want to persist the text output from the Speech to Text service and check for errors, but this is a good starting point for setting up the handler functions and processing their arguments.
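As a starting point for persistence, here is a minimal sketch of a hypothetical TranscriptLog class that appends each recognized phrase and each error to a tab-separated file. The file layout (UTC time, record kind, detail, text) is an assumption, and the lock is there because the speech client may raise its events on background threads.

```csharp
using System;
using System.Globalization;
using System.IO;

// Hypothetical helper: appends recognition output and errors to a tab-separated file.
public sealed class TranscriptLog
{
    private readonly string _path;
    private readonly object _sync = new object(); // handlers may fire on background threads

    public TranscriptLog(string path)
    {
        _path = path;
    }

    public void WritePhrase(string text, string confidence) => Append("PHRASE", confidence, text);

    public void WriteError(string code, string message) => Append("ERROR", code, message);

    private void Append(string kind, string detail, string text)
    {
        // Tabs and line breaks inside the text would break the one-record-per-line layout.
        string clean = (text ?? string.Empty).Replace('\t', ' ').Replace('\r', ' ').Replace('\n', ' ');
        string line = string.Join("\t",
            DateTime.UtcNow.ToString("o", CultureInfo.InvariantCulture), kind, detail, clean);

        lock (_sync)
        {
            File.AppendAllText(_path, line + Environment.NewLine);
        }
    }
}
```

From WriteResponseResult you could call `log.WritePhrase(e.PhraseResponse.Results[i].DisplayText, e.PhraseResponse.Results[i].Confidence.ToString())`, and from OnConversationErrorHandler `log.WriteError(e.SpeechErrorCode.ToString(), e.SpeechErrorText)`, where `log` is a TranscriptLog field you create yourself.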

Reading an Excel file in C# using EPPlus

In a previous blog post I wrote about using EPPlus to create Excel-formatted reports:

Creating an Excel report with C# using a MySql data source

What if we want to read an Excel file source? Can we still use EPPlus? Yes!

First, add the EPPlus NuGet package to your Visual Studio project.

Next, create an ExcelPackage for your source Excel file and access a worksheet in its workbook:

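// Requires: using System.IO; using OfficeOpenXml;
// ExcelPackage is IDisposable: call Dispose() (or wrap it in a using block) when you are done.
ExcelPackage package = new ExcelPackage(new FileInfo(sourceExcelFile));
ExcelWorksheet workSheet = package.Workbook.Worksheets[1];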

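Note that most things in EPPlus are indexed from 1 to N rather than from 0 to N-1: Worksheets[1] is the first worksheet, column 1 is the first column, and row 1 is the first row. (This applies to EPPlus 4.x; starting with EPPlus 5 the Worksheets collection is 0-based by default, while rows and columns remain 1-based.)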

Now we can iterate over the worksheet and read its cell data like this:

// Dimension is null for an empty worksheet, so check it before looping.
if (workSheet.Dimension != null)
{
    for (int rowIndex = workSheet.Dimension.Start.Row; rowIndex <= workSheet.Dimension.End.Row; rowIndex++)
    {
        for (int colIndex = workSheet.Dimension.Start.Column; colIndex <= workSheet.Dimension.End.Column; colIndex++)
        {
            // Cells are indexed by [row, column], both 1-based.
            string myCellText = workSheet.Cells[rowIndex, colIndex].Text;
        }
    }
}

Warning: the workSheet.Dimension.End.Row value is sometimes not what you expect. Your source Excel file could have 100 rows of data, and workSheet.Dimension.End.Row may still report a value of around 1 million. You will probably want to add code that validates your cells and breaks out of the loop once your records have been processed.
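One way to do that validation is to scan down from the first row and treat a run of blank rows as the end of the data. The sketch below is a hypothetical helper, SheetScanner.FindLastDataRow; it takes a cell-text accessor instead of an EPPlus type so the logic stands on its own, and the "one blank row ends the records" default is an assumption you may need to loosen with maxBlankRun.

```csharp
using System;

public static class SheetScanner
{
    // Returns the last 1-based row (EPPlus style) that has text in any column between
    // firstCol and lastCol. Scanning stops after maxBlankRun consecutive blank rows.
    // Returns firstRow - 1 when no row has data.
    public static int FindLastDataRow(
        Func<int, int, string> getText,
        int firstRow, int lastRow, int firstCol, int lastCol, int maxBlankRun = 1)
    {
        int lastDataRow = firstRow - 1;
        int blankRun = 0;

        for (int row = firstRow; row <= lastRow; row++)
        {
            bool hasData = false;
            for (int col = firstCol; col <= lastCol && !hasData; col++)
            {
                hasData = !string.IsNullOrWhiteSpace(getText(row, col));
            }

            if (hasData)
            {
                lastDataRow = row;
                blankRun = 0;
            }
            else if (++blankRun >= maxBlankRun)
            {
                break; // a run of blank rows is treated as the end of the records
            }
        }
        return lastDataRow;
    }
}
```

With EPPlus you would pass `(r, c) => workSheet.Cells[r, c].Text` together with the Dimension start and end values, then loop only up to the returned row.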

Another way to read a cell is to take its Value and call ToString() on it:

string myCellText = workSheet.Cells[rowIndex, colIndex].Text;
// Value is null for an empty cell, so use ?. before calling ToString().
string myCellValue = workSheet.Cells[rowIndex, colIndex].Value?.ToString();

Most of the time .Text and .Value.ToString() give the same result, but not always. One case I encountered was with date columns: .Text returned a nicely formatted “5/16/2017”, while .Value.ToString() returned an unexpected number like “43340”. That happens because Excel stores dates as serial numbers; .Text applies the cell’s number format, while .Value returns the raw stored value.
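If you need a real DateTime rather than display text, you can convert the serial number with DateTime.FromOADate, since Excel serial numbers in the default 1900 date system match OLE Automation dates for dates after 1 March 1900 (for example, 42871 is 5/16/2017). The hypothetical ExcelDates.ToDateTime helper below assumes the raw Value of a date cell comes back as a double, as the output above suggests, and also accepts a DateTime or a numeric string.

```csharp
using System;
using System.Globalization;

public static class ExcelDates
{
    // Converts a raw cell Value to a DateTime when possible, otherwise returns null.
    // Assumes the workbook uses the default 1900 date system, whose serial numbers
    // (for dates after 1 March 1900) are the OLE Automation dates FromOADate expects.
    public static DateTime? ToDateTime(object cellValue)
    {
        switch (cellValue)
        {
            case DateTime dt:
                return dt;
            case double serial:
                return DateTime.FromOADate(serial);
            case string s when double.TryParse(s, NumberStyles.Float, CultureInfo.InvariantCulture, out double parsed):
                return DateTime.FromOADate(parsed);
            default:
                return null;
        }
    }
}
```

Usage: `DateTime? date = ExcelDates.ToDateTime(workSheet.Cells[rowIndex, colIndex].Value);`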