Vikas Goyal

How to build an Alexa skill for the Amazon Echo device

Artificial Intelligence has become one of the hottest topics in technology, and the Amazon Echo is one of the best-known examples of it in action. This smart home device listens to a user's commands and performs operations in response.

It's popular and very easy to use. For example, say "Alexa, book a cab for me" and the device will start booking the cab. Have you wondered how it works? How is it able to book a cab via Uber? Initially it surprised me too, but once I started exploring, I was inspired to dig further and see how it actually works.
Uber has built an Alexa skill, which is available on the Amazon Echo device, and Alexa responds to the user's commands through that skill.

You might be wondering: what is an Alexa skill? How do you integrate one with an Echo device? In this blog we are going to do two things: first, we will understand what an Alexa skill is, and then we will build and integrate one step by step. Let's start with the basic workflow of Alexa skills.

Workflow

The Amazon Echo is an input/output device that responds to a user's commands. It records the user's speech and streams it to the Alexa service in the Amazon cloud, which converts the audio into text.

The Alexa service then maps that text to an action using your Alexa skill configuration (we will get to the configuration later in this blog). After that, the Alexa service interacts with your skill's server or Lambda function to get a result for the action, and passes that result back to the Echo device as speech. Later on, we will also talk about what a Lambda function and a skill-hosting server are.
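The flow above can be pictured as a chain of transformations. The JavaScript sketch below is purely conceptual; every function is an illustrative placeholder for a stage that really runs in the Amazon cloud (the example values come from the JIRA skill built later in this post):

```javascript
// Conceptual sketch of the Alexa skill workflow. These functions are
// placeholders, not real Alexa APIs.

// 1. The Alexa service converts the recorded audio to text.
function speechToText(audio) {
  return "ask StoryBoard how many tickets are open in StudentNest";
}

// 2. The skill's interaction model maps the text to an intent + slots.
function mapToIntent(text) {
  return { name: "ProjectStatus", slots: { Project: "StudentNest" } };
}

// 3. The skill's server (or Lambda function) produces a speech response.
function skillServer(intent) {
  return "<speak>There are 20 open tickets in " +
         intent.slots.Project + "</speak>";
}

// End-to-end: audio in, speech markup out (spoken back by the Echo).
const response = skillServer(mapToIntent(speechToText("<audio bytes>")));
console.log(response);
```

In a real skill, you only implement stage 3; stages 1 and 2 are handled for you by the Alexa service.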

What is an Amazon Alexa skill?

 

So the first thing we need to understand here is what an Amazon Alexa skill is. In technical terms, a skill is an interface between the user's commands and our server: it converts voice input into text-format actions, and our server provides a speech-object response for those actions.

In plain language, it is an app for the Amazon Echo device, just like the apps we have on our smartphones. The only difference is that mobile apps work with touch, while Amazon Echo skills work with the user's voice commands.

Let's understand this with an example:

Command: "Alexa, how is the weather today?"

The answer from the Alexa device might be: "Rain and storms will continue to move across North Carolina this evening. We don't have anything severe to worry about, but don't be surprised if you hear any rumbles of thunder. Temperatures tonight will stay mild, only dropping into the 60s and 70s."

Don't you think this is interesting? To hear the weather forecast, I don't even have to touch my smartphone or turn on the TV; it is communicated to me through a simple voice command.

But this raises some questions: how does the Echo device come up with an answer? How is it able to understand my question? How does it respond to my questions like a human?

It's because the Alexa device has an integrated skill for "weather forecast" which understands the user's commands and provides answers to them.
The skill extracts the action from the command and uses its server to fetch the result, perform the action, or convert the result into a speech object, whatever is required.

Now let’s start building an Alexa skill, step-by-step.

Building an Alexa skill step by step

After this introduction to Alexa skills, our main job now is to build one. Start by signing in to your Amazon developer account. After login you will find a tab for Alexa; click on it to start the process of creating an Alexa skill on the Amazon developer console.

 

Alexa skill configuration is divided into the following 6 steps:

  1. Skill information
  2. Interaction Model
  3. Configuration
  4. Test
  5. Publishing Information
  6. Privacy & Compliance

Skill Information

We need to provide basic skill information to create an Alexa skill, such as the skill name and the invocation name. Keep the Audio Player option set to "No" for your first skill.

Interaction Model

The interaction model holds all the configuration the Alexa skill uses to understand voice input, i.e., the user commands that the Echo device can possibly answer.

Recently, Amazon launched a Skill Builder interface where you can easily set up the interaction model configuration. It is currently in beta, so I am not discussing it here.

Before configuring the interaction model, we need to understand what utterances are (the schema for user commands) and how they bind to intents and slots. An intent here refers to an action, and slots are the parameters for that action.

Utterances, and how they convert into actions

As I already mentioned, utterances are like the general questions or requests a user might ask. For example, if we are developing a skill for a restaurant system, the commands might be:

  • Order a Pizza for me.
  • What is the delivery time for my order number xxx?
  • What is the total cost for my order number xxx?

We need to define these utterances on the Amazon console, mapped to an action.

In this blog, I am going to take the example of building a skill for a JIRA server. It will fetch results for project status and JIRA ticket status.

Utterance examples:
  • ProjectStatus how many tickets are open in {Project}
  • ProjectStatus what is the status for ticket {TicketNumber} in {Project}
  • ProjectStatus what is the current status for ticket {TicketNumber} in {Project}

Above we have defined three utterances, which are basically commands to get the status of JIRA tickets.

Let's understand the utterance: ProjectStatus how many tickets are open in {Project}

In the utterance above, ProjectStatus is an intent (in other words, an action), and {Project} is a slot, i.e., a parameter for the intent. Every slot maps to a data type, or slot type; we can define our own slot types to map a slot.

So now the question is: how do these commands produce the actual result for the asked ticket number? How will the Alexa device identify what the request object in the Lambda function call should be?

It's because the Alexa service maps the command to the intent and slots.

Intents are actions that define the command type, and slots are the values or parameters of that intent.
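To build intuition for how an utterance template with slots maps a spoken command to an intent's parameters, here is a toy JavaScript sketch. This is only an illustration of the idea; Alexa's actual matching runs in the Amazon cloud and is far more sophisticated:

```javascript
// Toy illustration: match a spoken command against an utterance
// template and extract slot values. This only demonstrates the
// concept; it is not how Alexa is implemented.
function matchUtterance(template, command) {
  // Turn "{Project}"-style placeholders into regex capture groups.
  const slotNames = [];
  const pattern = template.replace(/\{(\w+)\}/g, (_, name) => {
    slotNames.push(name);
    return "(.+)";
  });
  const match = command.match(new RegExp("^" + pattern + "$", "i"));
  if (!match) return null;
  // Bind each captured value to its slot name.
  const slots = {};
  slotNames.forEach((name, i) => { slots[name] = match[i + 1]; });
  return slots;
}

const slots = matchUtterance(
  "how many tickets are open in {Project}",
  "how many tickets are open in StudentNest"
);
console.log(slots); // { Project: 'StudentNest' }
```

Real skills never implement this matching themselves; the Alexa service does it and hands your server the already-extracted intent and slot values.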

 

How to define an intent and its slots?

First, define an intent schema; a sample intent schema is shown below.

{
  "intents": [
    {
      "slots": [
        {
          "name": "Project",
          "type": "PROJECT"
        },
        {
          "name": "TicketNumber",
          "type": "AMAZON.NUMBER"
        }
      ],
      "intent": "ProjectStatus"
    },
    {
      "intent": "AMAZON.HelpIntent"
    }
  ]
}

 

In the intent schema above, we have defined the ProjectStatus intent with the slots Project and TicketNumber. The TicketNumber slot type is AMAZON.NUMBER and the Project slot type is PROJECT.

AMAZON.NUMBER is a built-in slot type provided by Amazon, but PROJECT is a custom slot type that we need to define ourselves.

In the Interaction Model tab we can define our own data types, the "Custom Slot Types", so the skill can find and map the values in a command against the defined slot types.

How to define Custom Slot type?

Under Custom Slot Types, click Add Slot Type to open the window for defining a new type.

In Enter Type, provide the slot type name, and in Enter Values provide the values of this slot type that your server will accept.
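For example, the PROJECT slot type could list one accepted project name per line. StudentNest comes from this blog's example; the other names are hypothetical placeholders:

```text
StudentNest
DemoProject
SampleApp
```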

For example, the command for my utterance "ProjectStatus how many tickets are open in {Project}" will be: "Alexa, ask StoryBoard how many tickets are open in StudentNest."

The command has two important parts: the first is StoryBoard and the second is StudentNest.

StoryBoard is the invocation name of the skill. The Echo device maps the command to the defined utterances and extracts the data from it. In our example it will match the command to the ProjectStatus intent with the PROJECT value StudentNest.

How does it map StudentNest to the Project slot? Because we have defined the possible values of the slot, it picks and maps the project name from there. If you want to add more than one value for PROJECT, just enter each additional value on a new line of the slot values.

For example, I have also defined a DeveloperStatus intent, for which I defined a USERNAME slot type with multiple values.

When you are ready with Intent and slot, you can move to the next step.

Configuration

For configuration, you need to provide the server code that understands the intent and slots in the request object and returns a speech object to Alexa.

There are two options for the server interaction: the first is a Lambda function, and the second is your own hosted server address.

You can create a Lambda function on the Amazon AWS console and upload a Node.js project. Once you provide its entry point in the Alexa skill, the Lambda function will be called by the Alexa skill with a request object, and the function will respond with a speech object.

You can also provide your own hosted server, which behaves the same way as a Lambda function.

I have used a Lambda function, as it can be easily configured with the skill just by providing the Lambda function's ARN, after which the Alexa skill can start interacting with it. You can find the sample code for the Lambda function on GitHub, which I have used in the example skill.

Now we need to understand two simple things to create the Lambda function: the request and the response. As the request object, our Lambda function receives the intent as a JSON object, with the slot values already extracted from the user's command by the Alexa service using our skill configuration. For this request object, we need to return SSML speech as the response.

To understand how to build the speech object, you can refer directly to the AlexaSkill.js class in the GitHub repo.

For Example-

Command- “Alexa, ask StoryBoard how many tickets are open in StudentNest”

Request Object-

{
  "intent": {
    "name": "ProjectStatus",
    "slots": {
      "Project": {
        "name": "Project",
        "value": "StudentNest"
      }
    }
  }
}

After processing this request object, your Lambda function should return SSML speech as the response, like:

{
  "version": "1.0",
  "response": {
    "outputSpeech": {
      "type": "SSML",
      "ssml": "<speak>There are <break time='0.5s'/> <say-as interpret-as='digits'>20</say-as> open tickets in StudentNest</speak>"
    },
    "shouldEndSession": true
  }
}

After providing the skill server in the Alexa configuration, we can move to the next step.

Test

Once you are done with the skill configuration, you can enable testing mode for the skill. This allows you to test the skill with your developer account without publishing it. You can also add a co-developer account from the developer console settings.

Recently, Amazon also added a Skill Beta Testing option in the developer console, where you can add testers' email addresses. It is another good way to provide a testing environment for a skill, one that was not available earlier, and it also lets you test your skill server.

In the Service Simulator, you can enter the command in text format and check the response from the skill server to validate that you are getting the desired result. You can try all possible combinations of the defined utterances.

For example, if I enter the command "Alexa, ask JiraTracker how many open tickets are in project StudentNest", the Service Simulator shows the skill server's response.

You can also listen to the response output by pressing the "Listen" button on the bottom-right. You can do the same on an Echo device by logging in with the testing account and asking all possible combinations of the defined utterances one by one.

Publishing Information

First you need to select the skill category and sub-category. After that, you should provide testing instructions to Amazon so their reviewers can test your skill before approval. These instructions will not be visible to users.

Select the countries and regions where the skill will be available. After that, provide short and long descriptions of the skill.

You also need to provide example phrases, which are basically sample commands that can be asked of the skill, like:

  • Alexa, ask StoryBoard what is the status of project StudentNest
  • Alexa, ask StoryBoard how many open tickets are in StudentNest 1001

Next, you can also configure the search keywords for the skill. Finally, in this tab you need to provide the skill logo in two sizes: 108×108 and 512×512.

Privacy & Compliance

Read and agree to the terms and conditions so you can publish the skill. Make sure your skill does not break any Amazon terms and conditions, and that you select the right option for each question.

Amazon's publishing process is similar to Apple's: it usually takes 3-4 working days, and your skill can be rejected for breaking any terms and conditions. So make sure you read all the policies carefully and create the skill accordingly.

I hope you can now develop some exciting Amazon Alexa skills. For any further queries, please drop a comment below. You can also refer to Amazon's official documentation on developing Alexa skills.

 
