Windows 10… N – don’t use it for app development.

At least, if the app in question uses speech. Background is that I routinely use Azure VMs for development… so let’s extend that to writing apps. The ready-made VMs that also include Visual Studio are the “N” variants. So I used them, as usual. However, I wasted a number of hours failing to get to the bottom of this error when executing a pretty standard block of text-to-speech code (at the time I was not questioning whether “N” was OK to use):

[Image: MigratingWinPhoneApp15]

Class not registered

 at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
 at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
 at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
 at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
 at HW.MainPage.d__1.MoveNext()} System.Runtime.InteropServices.COMException

I spent a lot of time after that trying to debug, googling up the wrong tree, reinstalling various releases of Visual Studio, all with the same, bad, result. In fact I should have just gone to bed, because as ever a tiny light bulb came on when I thought about… N.

So this morning I googled issues around Visual Studio and Windows 10 N, and almost immediately found this:

[Image: NotWin10N_05]

So I tried the Media Feature Pack, but got a message along the lines of “does not apply to this installation”. Yes, I could have persevered, but I decided to trash the N instance, create a non-N instance, and manually install Visual Studio and this sample. And then it was all fine.

[Image: NotWin10N01]

And I thought there might be some issue with trying to do speech on a VM, but that was all fine, and came loud and clear through my speakers on my host PC. QED.

 


Windows 10 Speech: speaking and storing as audio

This both speaks and stores as audio (wav) the passed text:


 

The code underpinning that:


 

using Windows.UI.Xaml.Controls;
using TextToSpeech;

namespace App1
{
    public sealed partial class MainPage : Page
    {
        public MainPage()
        {
            InitializeComponent();
            var si = new SpeakIt();
            var textToSpeak = " I have a high respect for your nerves";
            SpeakIt.ReadText(textToSpeak);
            si.StoreText(textToSpeak);
        }
    }
}

 

 

using System;
using System.Linq;
using System.Threading.Tasks;
using Windows.Media.SpeechSynthesis;
using Windows.Storage;
using Windows.Storage.Streams;
using Windows.UI.Xaml.Controls;

namespace TextToSpeech
{
    public class SpeakIt
    {
        private const string PreferredVoice = "Susan";
        private const int BufferSize = 4096;
        private readonly SpeechSynthesizer _synthesizer = new SpeechSynthesizer();

        public SpeakIt() {
            SetPreferredVoice();
        }

        // Speaks the text aloud through a MediaElement (from Windows.UI.Xaml.Controls).
        public static async void ReadText(string mytext) {
            var mediaPlayer = new MediaElement();
            using (var speech = new SpeechSynthesizer()) {
                // First() throws if no installed voice has "Susan" in its Id.
                speech.Voice = SpeechSynthesizer.AllVoices.First(voice => voice.Id.Contains(PreferredVoice));
                var stream = await speech.SynthesizeTextToStreamAsync(mytext);
                mediaPlayer.SetSource(stream, stream.ContentType);
                mediaPlayer.Play();
            }
        }

        // Synthesizes the text and saves it as a new .wav file in the app's local folder.
        public async void StoreText(string myText) {
            var synthesisStream = await _synthesizer.SynthesizeTextToStreamAsync(myText);
            var sf = await CreateLocalFile($"{Guid.NewGuid()}.wav");
            await SaveSpeechStreamToStorageFile(synthesisStream, sf);
        }

        private static async Task<StorageFile> CreateLocalFile(string fileName) {
            // https://msdn.microsoft.com/en-gb/library/windows/apps/br227251
            var sfo = ApplicationData.Current.LocalFolder;
            var sf = await sfo.CreateFileAsync(fileName);
            return sf;
        }

        private static async Task SaveSpeechStreamToStorageFile(SpeechSynthesisStream synthesisStream, StorageFile sf) {
            var writeStream = await sf.OpenAsync(FileAccessMode.ReadWrite);
            var outputStream = writeStream.GetOutputStreamAt(0);
            var dataWriter = new DataWriter(outputStream);
            var buffer = new Windows.Storage.Streams.Buffer(BufferSize);
            while (synthesisStream.Position < synthesisStream.Size) {
                // ReadAsync may hand back a different buffer from the one passed in, so write what it returns.
                var read = await synthesisStream.ReadAsync(buffer, BufferSize, InputStreamOptions.None);
                dataWriter.WriteBuffer(read);
            }
            // Await instead of blocking with Wait(), which can deadlock when called from the UI thread.
            await dataWriter.StoreAsync();
            await outputStream.FlushAsync();
            outputStream.Dispose();
            writeStream.Dispose();
        }

        private void SetPreferredVoice() {
            _synthesizer.Voice = SpeechSynthesizer.AllVoices.First(voice => voice.Id.Contains(PreferredVoice));
        }
    }
}

					

UWP / Speech

Pretty much just screenshots. In practice Speech remains as-is from Phone 8.1 as far as I can see. On this occasion I moved my VSFF to a VM running W10 Enterprise, as you need that for the emulators. However although on a Windows 8.1 install in 2014 I could get the emulator working, this time I could not.

I had not previously grasped that whereas the previous generation of Speech (which references e.g. the “Hazel” voice) manages both the speech creation and the audio “rendering”, the new generation (which references e.g. the “Susan” voice) does not: typically you can create the speech stream outside of, say, a XAML context, but to render it you typically use MediaElement… which needs a XAML page. Terrible generalisations on my part, but I can see it is probably related to a wish to be SOLID.

Anyhoo, these are the key takeaways for me right now… real simple, but proves the point (what point? 🙂 ). The use of the voice “Susan” in the snippet assumes that you have installed it.

using Windows.UI.Xaml.Controls;
using TextToSpeech;

namespace App1
{
    public sealed partial class MainPage : Page
    {
        public MainPage()
        {
            InitializeComponent();
            SpeakIt.ReadText("this is a test; oh yes");
        }
    }
}

 

using System;
using System.Linq;
using Windows.Media.SpeechSynthesis;
using Windows.UI.Xaml.Controls;

namespace TextToSpeech
{
    public static class SpeakIt
    {
        private const string PreferredVoice = "Susan";

        public static async void ReadText(string mytext)
        {
            MediaElement mediaPlayer = new MediaElement();
            using (var speech = new SpeechSynthesizer())
            {
                // First() throws if no installed voice has "Susan" in its Id.
                speech.Voice = SpeechSynthesizer.AllVoices.First(voice => voice.Id.Contains(PreferredVoice));

                // Not needed for speaking: put a breakpoint in this loop to inspect each installed voice.
                var x = SpeechSynthesizer.AllVoices.ToList();
                foreach (var item in x)
                {
                    var x1 = item.Description;
                    var x2 = item.DisplayName;
                    var x3 = item.Gender;
                    var x4 = item.Id;
                    var x5 = item.Language;
                }

                SpeechSynthesisStream stream = await speech.SynthesizeTextToStreamAsync(mytext);
                mediaPlayer.SetSource(stream, stream.ContentType);
                mediaPlayer.Play();
            }
        }
    }
}

This is a useful MVA video on 8.1 (i.e. 10) speech.

Just the screenshots I took while trying to get a result – many of them are very close to those from 1.5 years back, the only difference being Windows 10 now, Windows 8.1 then:

 

PowerShell: splitting an input file and saving to wav format in chunks

On 4 out of 5 days, I have a car journey that is between 0.75 and 1.25 hours. I want to be able to take a free (e.g. Project Gutenberg) book, or at least a DRM free book, split it into sections, and create an audio file from each section.
Let’s say that the following is the entirety of my book:

[Image: Guten01]

I want to read/hear in sections: lines 1 and 2 (section 1), lines 3 and 4 (section 2), lines 5 and 6 (section 3), line 7 (section 4), giving this:

[Image: Guten02]

This PowerShell is one way to do that (although I write the split text back out to disk and then read it back in, that step could be removed).

[Image: Guten03]

function Get-FileName($extension = "txt") {
 "{0}{1}_{2}.{3}" -f ($outputRootDir, $outputFileNamePrefix, $chunk, $extension)
}
function Write-WavFile() {
 $speech = New-Object -TypeName System.Speech.Synthesis.SpeechSynthesizer
 $speech.SelectVoice("Microsoft Hazel Desktop")
 $textToSpeak = Get-Content -Path $(Get-FileName) -Encoding UTF8
 $speech.SetOutputToWaveFile($(Get-FileName "wav"))
 $speech.Speak($textToSpeak)
 $speech.Dispose()
 $speech = $null
}
function Split-File (
 $fileToSplit = 'C:\Temp\pandp.txt',
 $splitMarker = "SPLITHERE",
 $outputFileNamePrefix = "TheseLinesAudio",
 $outputRootDir = "c:\temp\"
) {
 Add-Type -AssemblyName System.Speech
 $reader = New-Object -TypeName System.IO.StreamReader($fileToSplit)
 $chunk = 1
 # Write-WavFile creates (and disposes) its own synthesizer for each chunk, so none is needed here.
 while (($line = $reader.ReadLine()) -ne $null) {
 if ($line -match $splitMarker) {
 Write-WavFile
 $chunk++
 } else {
 Add-Content -Path $(Get-FileName "txt") -Value $line -Encoding utf8
 }
 }
 Write-WavFile
 $reader.Close()
 $reader.Dispose()
 $reader = $null
}
#entry point...
Split-File
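The disk round trip mentioned above could probably be dropped by collecting the sections in memory. This is only a sketch: Split-LinesByMarker is a hypothetical helper, not part of the script above, and it treats the marker as literal text rather than as a regex.

```powershell
# Hypothetical helper: groups lines into sections, closing a section at each marker line.
function Split-LinesByMarker {
    param(
        [string[]]$Lines,
        [string]$SplitMarker = 'SPLITHERE'
    )
    $sections = [System.Collections.Generic.List[string]]::new()
    $current  = [System.Collections.Generic.List[string]]::new()
    foreach ($line in $Lines) {
        # Escape the marker so it is matched as literal text (an assumption; the script above uses it as a regex).
        if ($line -match [regex]::Escape($SplitMarker)) {
            $sections.Add(($current -join "`n"))
            $current.Clear()
        } else {
            $current.Add($line)
        }
    }
    # Whatever follows the last marker is the final section.
    $sections.Add(($current -join "`n"))
    return ,$sections.ToArray()
}

# Usage sketch (Windows only, with System.Speech loaded and $speech created as in the script above):
#   $i = 1
#   foreach ($section in Split-LinesByMarker -Lines (Get-Content C:\Temp\pandp.txt -Encoding UTF8)) {
#       $speech.SetOutputToWaveFile("c:\temp\TheseLinesAudio_$i.wav")
#       $speech.Speak($section)
#       $i++
#   }
```

Each section is then a single string that can be handed straight to Speak, so no intermediate txt files are written.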

PowerShell: speech and encoding

If you have some text in a file, and push that through the SAPI to get speech out, be aware that the output is sensitive to the encoding of the file. The problem happens in the text, not in the speech: the speech is just a victim of the text.

As an example, look at this text:

[Image: SpeechEncoding01]

Pass this through Get-Content, and see what has been stored:

[Image: SpeechEncoding02]

If you then pass that through SAPI, then quite reasonably what you see there is what you will get, including “… Euro Symbol, Trade Mark Symbol…”.

And that is because on my laptop, the default encoding is not the same as the encoding of the source file. If I look at the menu in Notepad++, I see this:

[Image: SpeechEncoding03]

So if I now add the correct encoding switch to Get-Content, we now get this…

[Image: SpeechEncoding04]
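To see the mismatch in isolation, here is a small sketch that decodes the same UTF-8 bytes twice, once with the wrong encoding and once with the right one. ISO-8859-1 is only a stand-in for "some single-byte default"; I am assuming, not asserting, that it behaves like the default on my laptop for this character.

```powershell
# Build the text from a code point so this script's own file encoding cannot interfere.
$original = "V$([char]0xE9)ronique"

# What a UTF-8 file holds on disk: the accented e becomes two bytes, 0xC3 0xA9.
$bytes = [System.Text.Encoding]::UTF8.GetBytes($original)

# Decoding with a single-byte encoding turns each of those bytes into its own character.
$latin1  = [System.Text.Encoding]::GetEncoding('iso-8859-1')
$garbled = $latin1.GetString($bytes)

# Decoding with the encoding the file was written in round-trips cleanly.
$correct = [System.Text.Encoding]::UTF8.GetString($bytes)

"{0} bytes; wrong decode has {1} chars, right decode has {2}" -f $bytes.Length, $garbled.Length, $correct.Length
```

The wrong decode comes out one character longer than the original, and that stray character is the sort of thing the speech engine then dutifully reads out.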

Doing an end-to-end run now gives no odd speech incidentals, although if you listen to the SoundCloud below, you will hear it is not ideal. For that we would probably need the Speech Synthesis Markup Language (SSML).

[Image: SpeechEncoding05]
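I have not tried the markup route here, but a first attempt might look something like the sketch below. New-SsmlDocument is a hypothetical helper, and the pause length is a guess; it just wraps each line in a minimal SSML document with a break after it.

```powershell
# Hypothetical helper: wraps plain sentences in a minimal SSML document with a pause after each.
function New-SsmlDocument {
    param(
        [string[]]$Sentences,
        [string]$Language = 'en-GB',
        [int]$PauseMs = 400
    )
    $body = foreach ($s in $Sentences) {
        # Escape &, <, > and quotes so the text cannot break the XML.
        $escaped = [System.Security.SecurityElement]::Escape($s)
        "  $escaped <break time=`"${PauseMs}ms`"/>"
    }
    @"
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="$Language">
$($body -join "`n")
</speak>
"@
}

# Usage sketch (Windows only, with $speech and $text set up as in the session below):
#   $speech.SpeakSsml((New-SsmlDocument -Sentences $text))
```

SpeakSsml takes the whole document as one string, so the pauses and any other markup travel with the text rather than being guessed from punctuation.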

 

PS>Add-Type -AssemblyName system.speech
PS>$speech = New-Object -TypeName system.speech.synthesis.speechsynthesizer
PS>$speech.SelectVoice("Microsoft Hazel Desktop")
PS>$text = Get-Content -Path C:\temp\voicesnip01.txt -Encoding UTF8
PS>$text
‘It’s Jeremy’s sledge.’.
Véronique was extremely angry.
The Place Vendôme was unusually quiet.
PS>$speech.Speak($text)

[Image: SpeechEncoding06]

$speech.SetOutputToWaveFile("c:\temp\snippy.wav")
$speech.Speak($text)
$speech.Dispose()
$speech = $null

Microsoft Speech: Hazel and Susan

These are both GB voices. To my ears, the more recent Susan voice has more quality than the Hazel voice.

Programmatically, the Hazel voice can be got at easily:

[Image: SpeechPs01]

Add-Type -AssemblyName system.speech
$x = New-Object -TypeName System.Speech.Synthesis.SpeechSynthesizer
$x.GetInstalledVoices() | % { $_.voiceinfo}

[Image: SpeechPs02]

$x.SelectVoice("Microsoft Hazel Desktop")
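SelectVoice throws if the named voice is not installed or is disabled, so on a machine without Hazel it may be worth checking first. A minimal sketch, where Select-AvailableVoiceName is a hypothetical helper and falling back to the first installed voice is just my assumption about what is sensible:

```powershell
# Hypothetical helper: returns the preferred voice name if it is in the list, otherwise the first one available.
function Select-AvailableVoiceName {
    param(
        [string[]]$InstalledNames,
        [string]$Preferred
    )
    if ($InstalledNames -contains $Preferred) { return $Preferred }
    return $InstalledNames | Select-Object -First 1
}

# Usage sketch (Windows only, with $x created as above):
#   $names = $x.GetInstalledVoices() | Where-Object { $_.Enabled } | ForEach-Object { $_.VoiceInfo.Name }
#   $x.SelectVoice((Select-AvailableVoiceName -InstalledNames $names -Preferred 'Microsoft Hazel Desktop'))
```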

And from there, we can get some speech out:

[Image: SpeechPs03]

$x.Speak("East Fife, 4. Forfar, 5")

If you then run a Get-Member over the object, you see its methods:

$x | gm -MemberType Method
[Image: WavFiless01]

So we can do this:

$x.SetOutputToWaveFile("c:\temp\test.wav") 
$x.Speak("East Fife, 4. Forfar, 5") 
$x.Dispose()
$x = $null

[Image: drm02]

Obviously you’ll have to run that yourself to hear the evidence, but you now have a valid wav file speaking in the Hazel voice.

But going back to the list of installed voices, even though I am on Windows 10, and the Susan voice appears in Time and Language/Speech, I cannot get it to surface easily. Well, at all, right now.

[Image: SpeechPs04]

I’ve been ploughing through the Registry, and from there I find where the artefacts are held both for the Hazel and the Susan voices (in fact I used George in the end as the Susan equivalent, because it does not occur so often as Susan in the Registry). For example:

[Image: George03]

My hope is that the only differences between the 2 types are location, and once I can coerce the new voices into the same place as the old voices, then SAPI will just discover them. That may well be naive. We shall see. Finally for tonight, having done shed loads of registry screenshots in the hope that some of them will give me strong clues in the next pass, I’m dumping them here: