Windows 10 Speech: speaking and storing as audio

This speaks the passed text and also stores it as audio (wav):


The code underpinning that:


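using Windows.UI.Xaml.Controls;
using TextToSpeech;

namespace App1 {
    public sealed partial class MainPage : Page {
        public MainPage() {
            this.InitializeComponent();
            var si = new SpeakIt();
            var textToSpeak = " I have a high respect for your nerves";
            // Speak the text, then store the same text as a wav file.
            SpeakIt.ReadText(textToSpeak);
            si.StoreText(textToSpeak);
        }
    }
}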



using System;
using System.Linq;
using System.Threading.Tasks;
using Windows.Media.SpeechSynthesis;
using Windows.Storage;
using Windows.Storage.Streams;
using Windows.UI.Xaml.Controls;

namespace TextToSpeech {
    public class SpeakIt {
        private const string PreferredVoice = "Susan";
        private const int BufferSize = 4096;
        private SpeechSynthesizer _synthesizer = new SpeechSynthesizer();

        public SpeakIt() {
            SetPreferredVoice();
        }

        // Speaks the text through a MediaElement (AutoPlay defaults to true).
        public static async void ReadText(string mytext) {
            // requires the using Windows.UI.Xaml.Controls namespace...
            var mediaPlayer = new MediaElement();
            using (var speech = new SpeechSynthesizer()) {
                speech.Voice = SpeechSynthesizer.AllVoices.First(voice => voice.Id.Contains(PreferredVoice));
                var stream = await speech.SynthesizeTextToStreamAsync(mytext);
                mediaPlayer.SetSource(stream, stream.ContentType);
            }
        }

        // Synthesizes the text and saves it as <guid>.wav in the app's local folder.
        public async void StoreText(string myText) {
            var synthesisStream = await _synthesizer.SynthesizeTextToStreamAsync(myText);
            var sf = await CreateLocalFile($"{Guid.NewGuid()}.wav");
            await SaveSpeechStreamToStorageFile(synthesisStream, sf);
        }

        private static async Task<StorageFile> CreateLocalFile(string fileName) {
            var sfo = ApplicationData.Current.LocalFolder;
            var sf = await sfo.CreateFileAsync(fileName);
            return sf;
        }

        // Copies the synthesis stream into the file, BufferSize bytes at a time.
        private static async Task SaveSpeechStreamToStorageFile(SpeechSynthesisStream synthesisStream, StorageFile sf) {
            using (var writeStream = await sf.OpenAsync(FileAccessMode.ReadWrite))
            using (var outputStream = writeStream.GetOutputStreamAt(0))
            using (var dataWriter = new DataWriter(outputStream)) {
                var buffer = new Windows.Storage.Streams.Buffer(BufferSize);
                while (synthesisStream.Position < synthesisStream.Size) {
                    await synthesisStream.ReadAsync(buffer, BufferSize, InputStreamOptions.None);
                    dataWriter.WriteBuffer(buffer);
                }
                await dataWriter.StoreAsync();
                await outputStream.FlushAsync();
            }
        }

        private void SetPreferredVoice() {
            _synthesizer.Voice = SpeechSynthesizer.AllVoices.First(voice => voice.Id.Contains(PreferredVoice));
        }
    }
}


PowerShell: splitting an input file and saving to wav format in chunks

On 4 out of 5 days, I have a car journey that is between 0.75 and 1.25 hours. I want to be able to take a free (e.g. Project Gutenberg) book, or at least a DRM free book, split it into sections, and create an audio file from each section.
Let’s say that the following is the entirety of my book:


I want to read/hear in sections: lines 1 and 2 (section 1), lines 3 and 4 (section 2), lines 5 and 6 (section 3), line 7 (section 4), giving this:


This PowerShell is one way to do that (although I write the split text back out to disk and then read it back in, that step could be removed).


# Builds the path for the current chunk; relies on $outputRootDir,
# $outputFileNamePrefix and $chunk being set in the calling scope (Split-File).
function Get-FileName($extension = "txt") {
    "{0}{1}_{2}.{3}" -f ($outputRootDir, $outputFileNamePrefix, $chunk, $extension)
}

# Reads the current chunk's text file back in and speaks it into a wav file.
function Write-WavFile() {
    $speech = New-Object -TypeName System.Speech.Synthesis.SpeechSynthesizer
    $speech.SelectVoice("Microsoft Hazel Desktop")
    $textToSpeak = Get-Content -Path $(Get-FileName) -Encoding UTF8 -Raw
    $speech.SetOutputToWaveFile($(Get-FileName "wav"))
    $speech.Speak($textToSpeak)
    $speech.Dispose()
    $speech = $null
}

function Split-File (
    $fileToSplit = 'C:\Temp\pandp.txt',
    $splitMarker = "SPLITHERE",
    $outputFileNamePrefix = "TheseLinesAudio",
    $outputRootDir = "c:\temp\"
) {
    Add-Type -AssemblyName System.Speech
    $reader = New-Object -TypeName System.IO.StreamReader($fileToSplit)
    $chunk = 1
    while (($line = $reader.ReadLine()) -ne $null) {
        if ($line -match $splitMarker) {
            # A marker line closes the current chunk: speak it, then start the next one.
            Write-WavFile
            $chunk++
        } else {
            Add-Content -Path $(Get-FileName "txt") -Value $line -Encoding utf8
        }
    }
    # Speak whatever followed the last marker.
    if (Test-Path -Path $(Get-FileName)) { Write-WavFile }
    $reader.Close()
    $reader = $null
}

#entry point...
Split-File
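
The round trip through text files can be skipped by splitting in memory. What follows is only a sketch of that idea, not the script I actually ran: Split-TextByMarker and Write-ChunkToWav are names I have made up for illustration, and the paths, prefix and voice are simply carried over from the script above.

```powershell
# Hypothetical helper: groups lines into chunks, closing a chunk at each marker line.
function Split-TextByMarker {
    param(
        [string[]] $Lines,
        [string]   $SplitMarker = 'SPLITHERE'
    )
    $chunks  = New-Object 'System.Collections.Generic.List[string]'
    $current = New-Object 'System.Collections.Generic.List[string]'
    foreach ($line in $Lines) {
        if ($line -match [regex]::Escape($SplitMarker)) {
            # Marker lines end the current chunk and are not spoken themselves.
            if ($current.Count -gt 0) { $chunks.Add($current -join "`n") }
            $current.Clear()
        } else {
            $current.Add($line)
        }
    }
    # Text after the last marker forms the final chunk.
    if ($current.Count -gt 0) { $chunks.Add($current -join "`n") }
    return ,$chunks.ToArray()
}

# Hypothetical helper: speaks each chunk into its own numbered wav file (Windows only).
function Write-ChunkToWav {
    param(
        [string[]] $Chunks,
        [string]   $OutputRootDir = 'c:\temp\',
        [string]   $Prefix = 'TheseLinesAudio'
    )
    Add-Type -AssemblyName System.Speech
    $speech = New-Object -TypeName System.Speech.Synthesis.SpeechSynthesizer
    try {
        $speech.SelectVoice('Microsoft Hazel Desktop')
        for ($i = 0; $i -lt $Chunks.Count; $i++) {
            $wavPath = '{0}{1}_{2}.wav' -f $OutputRootDir, $Prefix, ($i + 1)
            $speech.SetOutputToWaveFile($wavPath)
            $speech.Speak($Chunks[$i])
            $speech.SetOutputToNull()   # release the wav file before the next chunk
        }
    } finally {
        $speech.Dispose()
    }
}

# Usage (assumes the book file and output folder exist):
# $chunks = Split-TextByMarker -Lines (Get-Content 'C:\Temp\pandp.txt' -Encoding UTF8)
# Write-ChunkToWav -Chunks $chunks
```

Keeping the splitting separate from the speaking means the chunking can be checked on any text without producing audio.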

Microsoft Speech: Hazel and Susan

These are both GB voices. To my ears, the more recent Susan voice sounds better than the Hazel voice.

Programmatically, the Hazel voice can be got at easily:


Add-Type -AssemblyName system.speech
$x = New-Object -TypeName System.Speech.Synthesis.SpeechSynthesizer
$x.GetInstalledVoices() | % { $_.voiceinfo}


$x.SelectVoice("Microsoft Hazel Desktop")

And from there, we can get some speech out:


$x.Speak("East Fife, 4. Forfar, 5")

If you then run a Get-Member over the object, you see its methods:

$x | gm -MemberType Method

So we can do this:

$x.SetOutputToWaveFile("C:\Temp\EastFife.wav")   # example path; any writable location will do
$x.Speak("East Fife, 4. Forfar, 5")
$x.Dispose()
$x = $null


Obviously you’ll have to run that yourself to hear the evidence, but you now have a valid wav file speaking in the Hazel voice.

But going back to the list of installed voices, even though I am on Windows 10, and the Susan voice appears in Time and Language/Speech, I cannot get it to surface easily. Well, at all, right now.


I’ve been ploughing through the Registry, and from there I find where the artefacts are held both for the Hazel and the Susan voices (in fact I used George in the end as the Susan equivalent, because it does not occur so often as Susan in the Registry). For example:


My hope is that the only difference between the two types is location, and that once I can coerce the new voices into the same place as the old voices, SAPI will just discover them. That may well be naive. We shall see. Finally for tonight, having taken shedloads of registry screenshots in the hope that some of them will give me strong clues in the next pass, I’m dumping them here:


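The registry comparison described above can also be done from PowerShell rather than by screenshot. This is a rough sketch under my own assumptions: that the desktop (SAPI) voices are registered under the Speech key and the newer voices under the Speech_OneCore key, and Get-VoiceToken is a made-up helper name. It only lists what is registered; it does not move or copy anything.

```powershell
# Assumed registry roots: classic SAPI desktop voices vs. the newer OneCore voices.
$tokenRoots = @(
    'HKLM:\SOFTWARE\Microsoft\Speech\Voices\Tokens',
    'HKLM:\SOFTWARE\Microsoft\Speech_OneCore\Voices\Tokens'
)

# Hypothetical helper: lists the voice tokens found under each root that exists.
function Get-VoiceToken {
    param([string[]] $Roots = $tokenRoots)
    foreach ($root in $Roots) {
        if (-not (Test-Path -Path $root)) { continue }
        Get-ChildItem -Path $root | ForEach-Object {
            [pscustomobject]@{
                Root  = $root
                Token = $_.PSChildName
                Name  = $_.GetValue('')   # the key's default value holds the display name
            }
        }
    }
}

Get-VoiceToken | Sort-Object Root, Token | Format-Table -AutoSize
```

If Hazel shows up only under the first root and Susan or George only under the second, that would at least fit the idea that location is what decides which voices SAPI sees.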

Windows 10 Speech: very basic code

No error handling, very dirty, just wanted to get something that produces sound.


    <Grid Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
        <RelativePanel>
            <MediaElement x:Name="media" AutoPlay="False"/>
            <TextBox x:Name="textBox1" Text="My Dear Text" Margin="5"/>
            <Button x:Name="blueButton" Margin="5" Background="LightBlue" Content="ButtonRight" RelativePanel.RightOf="textBox1"/>
            <Button x:Name="orangeButton" Click="orangeButton_Click" Margin="5" Background="Orange" Content="ButtonBelow"
                    RelativePanel.RightOf="textBox1" RelativePanel.Below="blueButton"/>
        </RelativePanel>
    </Grid>


using System;
using Windows.ApplicationModel.Resources.Core;
using Windows.Media.SpeechSynthesis;
using Windows.UI.Xaml.Controls;
using Windows.UI.Xaml.Media;

// The Blank Page item template is documented at
namespace App1 {
    /// <summary>
    /// An empty page that can be used on its own or navigated to within a Frame.
    /// </summary>
    public sealed partial class MainPage : Page {
        private SpeechSynthesizer synthesizer;
        private ResourceContext speechContext;
        private ResourceMap speechResourceMap;
        public static MainPage Current;

        public MainPage() {
            this.InitializeComponent();
            synthesizer = new SpeechSynthesizer();
            speechContext = ResourceContext.GetForCurrentView();
            speechContext.Languages = new string[] { SpeechSynthesizer.DefaultVoice.Language };
            speechResourceMap = ResourceManager.Current.MainResourceMap.GetSubtree("LocalizationTTSResources");
        }

        private async void orangeButton_Click(object sender, Windows.UI.Xaml.RoutedEventArgs e) {
            if (media.CurrentState.Equals(MediaElementState.Playing)) {
                // Already speaking: a second click stops playback.
                media.Stop();
            }
            else {
                string text = textBox1.Text;
                if (!String.IsNullOrEmpty(text)) {
                    // The button label could be changed here, or the button disabled if you don't want any user control.
                    try {
                        // Create a stream from the text. This will be played using a media element.
                        SpeechSynthesisStream synthesisStream = await synthesizer.SynthesizeTextToStreamAsync(text);
                        // Set the source and start playing the synthesized audio stream.
                        media.AutoPlay = true;
                        media.SetSource(synthesisStream, synthesisStream.ContentType);
                    }
                    catch (System.IO.FileNotFoundException) {
                        // If media player components are unavailable (e.g. using an N SKU of Windows), we won't
                        // be able to start media playback. Handle this gracefully.
                        var messageDialog = new Windows.UI.Popups.MessageDialog("Media player components unavailable");
                        await messageDialog.ShowAsync();
                    }
                    catch (Exception) {
                        // If the text cannot be synthesized, show an error message to the user.
                        media.AutoPlay = false;
                        var messageDialog = new Windows.UI.Popups.MessageDialog("Unable to synthesize text");
                        await messageDialog.ShowAsync();
                    }
                }
            }
        }
    }
}

Ref SSML: this worked, and the difference between loud and soft is perceptible:
string Ssml =
     @"<speak version='1.0' " +
     "xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-GB'>" +
     "<prosody volume='x-loud'> This is extra loud volume. </prosody>" +
     "</speak>";

This worked:
string Ssml =
               @"<speak version='1.0' " +
               "xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-GB'>" +
               "Hello <prosody contour='(0%,+80Hz) (10%,+80%) (40%,+80Hz)'>World</prosody> " +
               "<break time='500ms' />" +
               "Goodbye <prosody rate='slow' contour='(0%,+20Hz) (10%,+30%) (40%,+10Hz)'>World</prosody>" +
               "</speak>";

ref ssml:

Googling the above, I see a lot of complaints about this. When I have time I will try this out:

I installed fresh Windows 10 and Visual Studio Community 2015, and the designer failed to load (for MainPage.xaml etc). I had to:

  1. enable developer mode in system settings (update section) as suggested in info dialog
  2. (re)install Visual C++ redistributable for VS 2015

But I don’t know which one exactly resolved the problem… Now the designer loads as expected. (I tried only C# universal app yet)


… and generally tidy up the post.



Windows 10 on Windows Phone, with Speech Apps


And in the end it wasn’t SO hard. By the end of the weekend, I have this:

  • Lumia 635 with no micro-SD upgraded to Windows 10
  • Windows 10 speech app (Universal) building and running ok in Visual Studio 2015. It has both synthesis and recognition, so using that as a template, I should now be able to build whatever I need.
  • The same app running on the above Lumia 635 (this is from Microsoft and GitHub, to be clear)

Don’t know why, but right now I cannot take screenshots on this phone, so these are photos of the Lumia 635 taken from my Lumia 735:

Windows Speech Synthesis

I’ve been away from this area for a bit. I’m watching this video from Microsoft Virtual Academy.


Supporting MSDN stuff here. Here is the Speech SDK.


So anyway, just diving in…