Windows 10 Speech: speaking and storing as audio

This both speaks the passed text and stores it as audio (WAV):


 

The code underpinning that:


 

using Windows.UI.Xaml.Controls;
using TextToSpeech;

namespace App1
{
    public sealed partial class MainPage : Page
    {
        public MainPage()
        {
            InitializeComponent();
            var si = new SpeakIt();
            var textToSpeak = "I have a high respect for your nerves";
            SpeakIt.ReadText(textToSpeak);
            si.StoreText(textToSpeak);
        }
    }
}

 

 

using System;
using System.Linq;
using System.Threading.Tasks;
using Windows.Media.SpeechSynthesis;
using Windows.Storage;
using Windows.Storage.Streams;
using Windows.UI.Xaml.Controls;

namespace TextToSpeech
{
    public class SpeakIt
    {
        private const string PreferredVoice = "Susan";
        private const uint BufferSize = 4096;
        private readonly SpeechSynthesizer _synthesizer = new SpeechSynthesizer();

        public SpeakIt()
        {
            _synthesizer.Voice = PickVoice();
        }

        // Speaks the text. MediaElement lives in Windows.UI.Xaml.Controls,
        // so this needs a XAML (UI thread) context.
        public static async Task ReadText(string myText)
        {
            var mediaPlayer = new MediaElement();
            using (var speech = new SpeechSynthesizer())
            {
                speech.Voice = PickVoice();
                var stream = await speech.SynthesizeTextToStreamAsync(myText);
                mediaPlayer.SetSource(stream, stream.ContentType);
                mediaPlayer.Play();
            }
        }

        // Synthesizes the text and saves it as a uniquely named .wav file
        // in the app's local folder; no XAML is involved here.
        public async Task StoreText(string myText)
        {
            using (var synthesisStream = await _synthesizer.SynthesizeTextToStreamAsync(myText))
            {
                var sf = await CreateLocalFile($"{Guid.NewGuid()}.wav");
                await SaveSpeechStreamToStorageFile(synthesisStream, sf);
            }
        }

        private static async Task<StorageFile> CreateLocalFile(string fileName)
        {
            // https://msdn.microsoft.com/en-gb/library/windows/apps/br227251
            var sfo = ApplicationData.Current.LocalFolder;
            return await sfo.CreateFileAsync(fileName);
        }

        private static async Task SaveSpeechStreamToStorageFile(SpeechSynthesisStream synthesisStream, StorageFile sf)
        {
            // Disposing the DataWriter also closes the output stream it wraps.
            using (var writeStream = await sf.OpenAsync(FileAccessMode.ReadWrite))
            using (var dataWriter = new DataWriter(writeStream.GetOutputStreamAt(0)))
            {
                var buffer = new Windows.Storage.Streams.Buffer(BufferSize);
                while (synthesisStream.Position < synthesisStream.Size)
                {
                    // ReadAsync may hand back a different buffer, so write the one it returns.
                    IBuffer chunk = await synthesisStream.ReadAsync(buffer, BufferSize, InputStreamOptions.None);
                    dataWriter.WriteBuffer(chunk);
                }
                // Await rather than .Wait(): blocking inside an async method risks deadlocking the UI thread.
                await dataWriter.StoreAsync();
                await dataWriter.FlushAsync();
            }
        }

        // Uses the preferred voice if it is installed, else the system default
        // (First() would throw InvalidOperationException if "Susan" were missing).
        private static VoiceInformation PickVoice() =>
            SpeechSynthesizer.AllVoices.FirstOrDefault(voice => voice.Id.Contains(PreferredVoice))
            ?? SpeechSynthesizer.DefaultVoice;
    }
}
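For comparison, the manual buffer loop in SaveSpeechStreamToStorageFile can probably be replaced by a single stream copy. The sketch below is an assumption on my part rather than the code behind the screenshots: it uses RandomAccessStream.CopyAndCloseAsync from Windows.Storage.Streams to copy the synthesis stream into the file and close both ends. The class and method names (SpeechFileWriter, SaveAsWavAsync) are hypothetical.

```csharp
using System;                       // needed to await WinRT async operations
using System.Threading.Tasks;
using Windows.Media.SpeechSynthesis;
using Windows.Storage;
using Windows.Storage.Streams;

namespace TextToSpeech
{
    // Hypothetical helper, for illustration only; not part of the original post.
    public static class SpeechFileWriter
    {
        // Synthesizes the text and writes the WAV stream to a new file
        // in the app's local folder, returning that file.
        public static async Task<StorageFile> SaveAsWavAsync(string text)
        {
            using (var synthesizer = new SpeechSynthesizer())
            using (SpeechSynthesisStream stream = await synthesizer.SynthesizeTextToStreamAsync(text))
            {
                StorageFile file = await ApplicationData.Current.LocalFolder
                    .CreateFileAsync($"{Guid.NewGuid()}.wav");

                using (IRandomAccessStream fileStream = await file.OpenAsync(FileAccessMode.ReadWrite))
                {
                    // Copies from the start of the synthesis stream into the file,
                    // then closes both the input and output streams.
                    await RandomAccessStream.CopyAndCloseAsync(
                        stream.GetInputStreamAt(0), fileStream.GetOutputStreamAt(0));
                }
                return file;
            }
        }
    }
}
```

The trade-off is control: the explicit loop lets you pick the buffer size and observe progress, while the single copy call leaves buffering to the platform.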

		

UWP / Speech

Pretty much just screenshots. As far as I can see, Speech remains as-is from Phone 8.1. On this occasion I moved my VSFF to a VM running W10 Enterprise, as you need that for the emulators. However, although I could get the emulator working on a Windows 8.1 install in 2014, this time I could not.

I had not previously grasped that, whereas the previous generation of Speech (e.g. the “Hazel” voice) manages both the speech creation and the audio “rendering”, the new generation (e.g. the “Susan” voice) does not: you can typically create the speech stream outside a XAML context, but to render it you typically use MediaElement, which is a XAML control (Windows.UI.Xaml.Controls). Terrible generalisations on my part, but I can see it is probably related to a wish to be SOLID.

Anyhoo, these are the key takeaways for me right now: real simple, but it proves the point (what point? 🙂). The snippet's use of the “Susan” voice assumes that you have it installed.

using Windows.UI.Xaml.Controls;
using TextToSpeech;

namespace App1
{
    public sealed partial class MainPage : Page
    {
        public MainPage()
        {
            InitializeComponent();
            SpeakIt.ReadText("this is a test; oh yes");
        }
    }
}

 

using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;
using Windows.Media.SpeechSynthesis;
using Windows.UI.Xaml.Controls;

namespace TextToSpeech
{
    public static class SpeakIt
    {
        private const string PreferredVoice = "Susan";

        public static async Task ReadText(string myText)
        {
            var mediaPlayer = new MediaElement();
            using (var speech = new SpeechSynthesizer())
            {
                // Fall back to the default voice if "Susan" is not installed
                // (First() would throw InvalidOperationException).
                speech.Voice = SpeechSynthesizer.AllVoices.FirstOrDefault(voice => voice.Id.Contains(PreferredVoice))
                               ?? SpeechSynthesizer.DefaultVoice;

                // List the installed voices in the Output window, to see what is available.
                foreach (var item in SpeechSynthesizer.AllVoices)
                {
                    Debug.WriteLine($"{item.DisplayName} | {item.Description} | {item.Gender} | {item.Language} | {item.Id}");
                }

                SpeechSynthesisStream stream = await speech.SynthesizeTextToStreamAsync(myText);
                mediaPlayer.SetSource(stream, stream.ContentType);
                mediaPlayer.Play();
            }
        }
    }
}
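On the point above about rendering needing MediaElement: as far as I know, later Windows 10 SDKs (I believe from version 1511 onward; check the one you target) also offer Windows.Media.Playback.MediaPlayer, which can play the synthesis stream without any XAML control. The sketch below assumes that API is available; the class name HeadlessSpeaker is hypothetical, and this is not what the code above uses.

```csharp
using System;                       // needed to await WinRT async operations
using System.Threading.Tasks;
using Windows.Media.Core;
using Windows.Media.Playback;
using Windows.Media.SpeechSynthesis;

namespace TextToSpeech
{
    // Hypothetical class, for illustration only; not part of the original post.
    public static class HeadlessSpeaker
    {
        // Held in a static field so the player is not garbage-collected mid-playback.
        private static readonly MediaPlayer Player = new MediaPlayer();

        public static async Task SpeakAsync(string text)
        {
            using (var synthesizer = new SpeechSynthesizer())
            {
                SpeechSynthesisStream stream = await synthesizer.SynthesizeTextToStreamAsync(text);
                // Wrap the synthesis stream as a media source; no XAML page is involved.
                Player.Source = MediaSource.CreateFromStream(stream, stream.ContentType);
                Player.Play();
            }
        }
    }
}
```

If this works on your SDK, it keeps speech creation and rendering separate while dropping the dependency on a XAML page.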

This is a useful MVA (Microsoft Virtual Academy) video on 8.1 speech, which as far as I can see applies equally to 10.

Here are the screenshots I took while trying to get a result. Many of them are very close to those from 1.5 years back, the only difference being Windows 10 now versus Windows 8.1 then: