Windows 10 Speech: speaking and storing as audio

This both speaks and stores as audio (wav) the passed text:


 

The code underpinning that:


 

using Windows.UI.Xaml.Controls;
using TextToSpeech;
namespace App1 {
public sealed partial class MainPage : Page
 {
 public MainPage()
 {
 InitializeComponent();
 var si = new SpeakIt();
 var textToSpeak = " I have a high respect for your nerves";
 SpeakIt.ReadText(textToSpeak);
 si.StoreText(textToSpeak);
 }
 }
}

 

 

using System;
using Windows.UI.Xaml.Controls;
using Windows.Media.SpeechSynthesis;
using System.Linq;
using Windows.Storage;
using Windows.Storage.Streams;
using System.Threading.Tasks;
namespace TextToSpeech
{
 public class SpeakIt
 {
 private const string PreferredVoice = "Susan";
 private const int BufferSize = 4096;
 private SpeechSynthesizer _synthesizer = new SpeechSynthesizer();
public SpeakIt() {
 SetPreferredVoice();
 }
public static async void ReadText(string mytext) {
 // requires the using Windows.UI.Xaml.Controls namespace...
 var mediaPlayer = new MediaElement();
using (var speech = new SpeechSynthesizer()) {
 speech.Voice = SpeechSynthesizer.AllVoices.First(voice => voice.Id.Contains(PreferredVoice));
 var stream = await speech.SynthesizeTextToStreamAsync(mytext);
 mediaPlayer.SetSource(stream, stream.ContentType);
 mediaPlayer.Play();
 }
 }
 
 public async void StoreText(string myText) {
 var synthesisStream = await _synthesizer.SynthesizeTextToStreamAsync(myText);
 var sf = await CreateLocalFile($"{Guid.NewGuid()}.wav");
 await SaveSpeechStreamToStorageFile(synthesisStream, sf);
 }
private static async Task<StorageFile> CreateLocalFile(string fileName) {
 // https://msdn.microsoft.com/en-gb/library/windows/apps/br227251
 var sfo = ApplicationData.Current.LocalFolder;
 var sf = await sfo.CreateFileAsync(fileName); 
 return sf;
 }
private static async Task SaveSpeechStreamToStorageFile(SpeechSynthesisStream synthesisStream, StorageFile sf) {
 var writeStream = await sf.OpenAsync(FileAccessMode.ReadWrite);
 var outputStream = writeStream.GetOutputStreamAt(0);
 var dataWriter = new DataWriter(outputStream);
 var buffer = new Windows.Storage.Streams.Buffer(BufferSize);
while (synthesisStream.Position < synthesisStream.Size) {
 // ReadAsync returns the buffer that actually holds the bytes read.
 var chunk = await synthesisStream.ReadAsync(buffer, BufferSize, InputStreamOptions.None);
 dataWriter.WriteBuffer(chunk);
 }
 // Await instead of blocking the calling (UI) thread with Wait().
 await dataWriter.StoreAsync();
 await outputStream.FlushAsync();
 outputStream.Dispose();
 writeStream.Dispose();
 }
private void SetPreferredVoice() {
 _synthesizer.Voice = SpeechSynthesizer.AllVoices.First(voice => voice.Id.Contains(PreferredVoice));
 }
 }
}

		

UWP / Speech

Pretty much just screenshots. In practice Speech remains as-is from Phone 8.1 as far as I can see. On this occasion I moved my VSFF to a VM running W10 Enterprise, as you need that for the emulators. However, although I could get the emulator working on a Windows 8.1 install in 2014, this time I could not.

I had not previously grasped that, whereas the previous generation of Speech (e.g. the “Hazel” voice) manages both the speech creation and the audio “rendering”, the new generation (e.g. the “Susan” voice) does not: you can create the speech stream outside a XAML context, but to render it you typically use MediaElement… which needs a XAML page. Terrible generalisations on my part, but I can see it is probably related to a wish to be SOLID.

Anyhoo, these are the key takeaways for me right now… real simple, but proves the point (what point? 🙂). The use of the voice “Susan” in the snippet assumes that you have installed it.

using Windows.UI.Xaml.Controls;
using TextToSpeech;
namespace App1 {
public sealed partial class MainPage : Page
 {
 public MainPage()
 {
 InitializeComponent();
 SpeakIt.ReadText("this is a test; oh yes");
 }
 }
}

 

using System;
using Windows.UI.Xaml.Controls;
using Windows.Media.SpeechSynthesis;
using System.Linq;
namespace TextToSpeech
{
 public static class SpeakIt
 {
 private const string PreferredVoice = "Susan";
public static async void ReadText(string mytext) {
 MediaElement mediaPlayer = new MediaElement();
using (var speech = new SpeechSynthesizer()) {
 speech.Voice = SpeechSynthesizer.AllVoices.First(voice => voice.Id.Contains(PreferredVoice));
var x = SpeechSynthesizer.AllVoices.ToList();
 foreach (var item in x) {
 var x1 = item.Description;
 var x2 = item.DisplayName;
 var x3 = item.Gender;
 var x4 = item.Id;
 var x5 = item.Language;
 }
SpeechSynthesisStream stream = await speech.SynthesizeTextToStreamAsync(mytext);
 mediaPlayer.SetSource(stream, stream.ContentType);
 mediaPlayer.Play();
 }
 }
 }
}
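
Because the snippet above assumes “Susan” is installed, First(...) will throw an InvalidOperationException on a machine without that voice. A minimal sketch of one way around that, assuming that falling back to the system default voice is acceptable; VoicePicker and FindVoiceOrDefault are hypothetical names of mine, not part of any SDK:

```csharp
using System.Linq;
using Windows.Media.SpeechSynthesis;

namespace TextToSpeech
{
    // Hypothetical helper for illustration only.
    public static class VoicePicker
    {
        // Returns the first installed voice whose Id contains preferredVoice,
        // or the system default voice when no installed voice matches.
        public static VoiceInformation FindVoiceOrDefault(string preferredVoice)
        {
            return SpeechSynthesizer.AllVoices
                       .FirstOrDefault(voice => voice.Id.Contains(preferredVoice))
                   ?? SpeechSynthesizer.DefaultVoice;
        }
    }
}
```

In ReadText the assignment would then read speech.Voice = VoicePicker.FindVoiceOrDefault(PreferredVoice); so a missing voice changes how the text sounds instead of crashing the page.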

This is a useful MVA video on 8.1 (i.e. 10) speech.

Just the screenshots I took while trying to get a result – many of them are very close to those from 1.5 years back, the only difference being Windows 10 now, Windows 8.1 then:

 

PowerShell: splitting an input file and saving to wav format in chunks

On 4 out of 5 days, I have a car journey that is between 0.75 and 1.25 hours. I want to be able to take a free (e.g. Project Gutenberg) book, or at least a DRM free book, split it into sections, and create an audio file from each section.
Let’s say that the following is the entirety of my book:

Guten01

I want to read/hear in sections: lines 1 and 2 (section 1), lines 3 and 4 (section 2), lines 5 and 6 (section 3), line 7 (section 4), giving this:

Guten02

This PowerShell is one way to do that (although I write the split text back out to disk and then read it back in, that step could be removed).

Guten03

function Get-FileName($extension = "txt") {
 "{0}{1}_{2}.{3}" -f ($outputRootDir, $outputFileNamePrefix, $chunk, $extension)
}
function Write-WavFile() {
 $speech = New-Object -TypeName System.Speech.Synthesis.SpeechSynthesizer
 $speech.SelectVoice("Microsoft Hazel Desktop")
 $textToSpeak = Get-Content -Path $(Get-FileName) -Encoding UTF8
 $speech.SetOutputToWaveFile($(Get-FileName "wav"))
 $speech.Speak($textToSpeak)
 $speech.Dispose()
 $speech = $null
}
function Split-File (
 $fileToSplit = 'C:\Temp\pandp.txt',
 $splitMarker = "SPLITHERE",
 $outputFileNamePrefix = "TheseLinesAudio",
 $outputRootDir = "c:\temp\"
) {
 Add-Type -AssemblyName System.Speech
 $reader = New-Object -TypeName System.IO.StreamReader($fileToSplit)
 $chunk = 1
 while ($null -ne ($line = $reader.ReadLine())) {
 if ($line -match $splitMarker) {
 Write-WavFile
 $chunk++
 } else {
 Add-Content -Path $(Get-FileName "txt") -Value $line -Encoding utf8
 }
 }
 Write-WavFile
 $reader.Close()
 $reader.Dispose()
 $reader = $null
}
#entry point...
Split-File

Windows 10 Speech: very basic code

No error handling, very dirty, just wanted to get something that produces sound.

MainPage.xaml

<Page
    x:Class="App1.MainPage"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:local="using:App1"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    mc:Ignorable="d">
 
 
 
    <Grid Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
        <RelativePanel>
            <MediaElement x:Name="media" AutoPlay="False"/>
            <TextBox x:Name="textBox1" Text="My Dear Text" Margin="5"/>
            <Button x:Name="blueButton" Margin="5" Background="LightBlue" Content="ButtonRight" RelativePanel.RightOf="textBox1"/>
            <Button x:Name="orangeButton" Click="orangeButton_Click" Margin="5" Background="Orange" Content="ButtonBelow"
                    RelativePanel.RightOf="textBox1" RelativePanel.Below="blueButton"/>
            
        </RelativePanel>
    </Grid>
</Page>

MainPage.xaml.cs


		
		



using System;
using System.Collections.Generic;
using Windows.Media.SpeechSynthesis;
using Windows.UI.Xaml.Controls;
using Windows.UI.Xaml.Media;
using Windows.ApplicationModel.Resources.Core;
 
// The Blank Page item template is documented at http://go.microsoft.com/fwlink/?LinkId=402352&clcid=0x409
 
namespace App1 {
    /// <summary>
    /// An empty page that can be used on its own or navigated to within a Frame.
    /// </summary>
    public sealed partial class MainPage : Page {
 
        private SpeechSynthesizer synthesizer;
        private ResourceContext speechContext;
        private ResourceMap speechResourceMap;
 
        public static MainPage Current;
        public MainPage() {
            this.InitializeComponent();
            synthesizer = new SpeechSynthesizer();
            speechContext = ResourceContext.GetForCurrentView();
            speechContext.Languages = new string[] { SpeechSynthesizer.DefaultVoice.Language };
            speechResourceMap = ResourceManager.Current.MainResourceMap.GetSubtree("LocalizationTTSResources");
        }
 
 
        private async void orangeButton_Click(object sender, Windows.UI.Xaml.RoutedEventArgs e) {
 
            if (media.CurrentState.Equals(MediaElementState.Playing)) {
                media.Stop();
            }
            else {
                string text = textBox1.Text.ToString();
                if (!String.IsNullOrEmpty(text)) {
                    // Change the button label. You could also just disable the button if you don't want any user control.
 
 
                    try {
                        // Create a stream from the text. This will be played using a media element.
                        SpeechSynthesisStream synthesisStream = await synthesizer.SynthesizeTextToStreamAsync(text);
 
                        // Set the source and start playing the synthesized audio stream.
                        media.AutoPlay = true;
                        media.SetSource(synthesisStream, synthesisStream.ContentType);
                        media.Play();
                    }
                    catch (System.IO.FileNotFoundException) {
                        // If media player components are unavailable, (eg, using a N SKU of windows), we won't
                        // be able to start media playback. Handle this gracefully
 
                        var messageDialog = new Windows.UI.Popups.MessageDialog("Media player components unavailable");
                        await messageDialog.ShowAsync();
                    }
                    catch (Exception) {
                        // If the text is unable to be synthesized, throw an error message to the user.
 
                        media.AutoPlay = false;
                        var messageDialog = new Windows.UI.Popups.MessageDialog("Unable to synthesize text");
                        await messageDialog.ShowAsync();
                    }
                }
            }
 
 
 
 
        }
 
      
    }
}    


Ref SSML: this worked, and the difference between loud and soft is perceptible:
string Ssml =
     @"<speak version='1.0' " +
     "xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-GB'>" +
     "<prosody volume='x-loud'> This is extra loud volume. </prosody>" +
     "</speak>";


This worked:
string Ssml =
               @"<speak version='1.0' " +
               "xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-GB'>" +
               "Hello <prosody contour='(0%,+80Hz) (10%,+80%) (40%,+80Hz)'>World</prosody> " +
               "<break time='500ms' />" +
               "Goodbye <prosody rate='slow' contour='(0%,+20Hz) (10%,+30%) (40%,+10Hz)'>World</prosody>" +
               "</speak>";
https://msdn.microsoft.com/en-us/library/windows.media.speechsynthesis.speechsynthesizer.aspx

ref ssml:
https://msdn.microsoft.com/en-us/library/jj127898.aspx
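
The SSML strings above are not passed to SynthesizeTextToStreamAsync (that would read the markup aloud as plain text); the synthesizer has a separate SynthesizeSsmlToStreamAsync method for them. A minimal sketch of how that might be wired up, assuming it is called from the UI thread with the page's MediaElement; SsmlSpeaker and SpeakSsmlAsync are hypothetical names, and there is no error handling:

```csharp
using System;
using System.Threading.Tasks;
using Windows.Media.SpeechSynthesis;
using Windows.UI.Xaml.Controls;

namespace App1
{
    // Hypothetical helper for illustration only.
    public static class SsmlSpeaker
    {
        // Synthesizes an SSML document and plays it through the given MediaElement.
        // Assumes ssml is a complete, well-formed <speak> document.
        public static async Task SpeakSsmlAsync(MediaElement media, string ssml)
        {
            using (var synthesizer = new SpeechSynthesizer())
            {
                SpeechSynthesisStream stream =
                    await synthesizer.SynthesizeSsmlToStreamAsync(ssml);

                // Same playback pattern as the plain-text handler in MainPage.
                media.AutoPlay = true;
                media.SetSource(stream, stream.ContentType);
                media.Play();
            }
        }
    }
}
```

From the button handler this could be called as await SsmlSpeaker.SpeakSsmlAsync(media, Ssml);. An SSML document that is not well formed (for example, one missing its closing speak tag) should be expected to fail at synthesis time, so the existing catch blocks are still worth keeping around the call.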

speechy10
Googling the above, I see a lot of complaints about this. When I have time I will try this out:

I installed fresh Windows 10 and Visual Studio Community 2015, and the designer failed to load (for MainPage.xaml etc). I had to:

  1. enable developer mode in system settings (update section) as suggested in info dialog
  2. (re)install Visual C++ redistributable for VS 2015

But I don’t know which one exactly resolved the problem… Now the designer loads as expected. (I tried only C# universal app yet)

 

… and generally tidy up the post.

 


					

Windows Speech Synthesis

I’ve been away from this area for a bit. I’m watching this video from Microsoft Virtual Academy.

cortana01

Supporting MSDN stuff here. Here is the Speech SDK.

cortana02

So anyway, just diving in…