How to build your own Python Voice Assistant | thecodingpie

Are you ready to build your own voice assistant like Siri, Alexa, or like Jarvis? In this tutorial, you will learn how to code your own voice assistant using Python.

thecodingpie . . 10 min read . 7.3K Hits

Are you interested in building your own virtual voice assistant like Jarvis in the movie Iron Man? If you are interested in building one, then you have come to the right place.

Howdy folks! In this tutorial, you will learn how to build your own personal voice assistant like Jarvis using Python.

You can download the finished project code from my GitHub repo - Final Version.

Now before getting started, let's understand what we are going to build...

Understanding what we are going to build

The speech recognition program which we are going to build will be able to recognize these commands:

  • name - tells its name.
  • date - tells the date.
  • time - tells the current time.
  • how are you? - will say "I am fine...".
  • search - will search using Google.
  • and finally, if we say "quit" or "exit", it will terminate.

To achieve all of this, we are going to use three main Python modules:

  • SpeechRecognition - to recognize our speech and convert it into text using Google's Web Speech API.
  • PyAudio - for accessing and working with the microphone.
  • pyttsx3 - for converting the given text to speech (i.e. for generating the computer voice).

How are we going to build this?

It's basically very simple. We need to create only 3 functions and that's it!

  • The first function, recognize_voice(), will be responsible for capturing our voice (which we input through the Microphone), recognizing it, and returning the "text"  version of it.
  • Then we will take that "text" version of our voice and give it to another function called reply(), which will be responsible for replying back to us and doing all sorts of other crazy things (like searching google, telling the current time, etc.).
  • Finally, a function called speak(), which will take whatever text we give it and convert it into speech.

We will call these functions in an infinite loop until the user says "quit" or "exit".
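The flow described above can be tried without any audio at all. This is only a minimal sketch: run_assistant() and demo_reply() are hypothetical stand-ins that are not part of the final program, typed or canned text replaces the microphone, and print() replaces the spoken reply.

```python
def run_assistant(listen, reply):
    """Keep listening and replying until reply() signals that we should stop.

    `listen` returns the recognized text and `reply` returns False to stop.
    Both are passed in so the loop can be tried without a microphone.
    """
    turns = 0
    while True:
        text = listen().lower()
        turns += 1
        if not reply(text):
            return turns


def demo_reply(text):
    """Hypothetical stand-in for reply(): print instead of speaking."""
    if "quit" in text or "exit" in text:
        print("Ok, I am going to take a nap...")
        return False
    print("You said: " + text)
    return True
```

Calling run_assistant(input, demo_reply) reads lines from the keyboard until you type "quit" or "exit". The real program built below follows the same shape, with the microphone and the speech engine plugged in.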

Requirements

  • You should be comfortable with Python 3.
  • You should have Python 3.3 or a higher version installed on your computer.
  • You should have venv installed. If you are using Python 3.3 or newer, venv is already included in the Python standard library and requires no additional installation.
  • You should have a microphone (your laptop's built-in one or the one on your earphones will do the job).
  • You need an Internet connection, because the speech is recognized through Google's Web Speech API.
  • Finally, you should have a modern code editor like Visual Studio Code.

With these things in place, let's get started.

Initial Setups

  • First, create a folder named voice_assistant anywhere on your computer.
  • Then open it inside Visual Studio Code.

Now let's make a new virtual environment using venv and activate it. To do that:

  • Open Terminal > New Terminal.
  • Then type:
python3 -m venv venv

This command will create a virtual environment named venv for us. (On Windows, the interpreter is usually called python or py rather than python3.)

  • To activate it on Windows (Command Prompt), type the following:
venv\Scripts\activate.bat
  • If your Windows terminal is PowerShell, run venv\Scripts\Activate.ps1 instead.
  • If you are on Linux/Mac, then:
source venv/bin/activate

Now you should see (venv) at the start of your terminal prompt. This means you have successfully activated your virtual environment.

Note: Virtual environments like venv keep all the dependencies of the current project in their own environment, isolated from the rest of your computer. That's one of the main reasons why we are using one.
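If you are ever unsure whether a script is really running inside the virtual environment (for example, when an editor's "run" button picks a different interpreter than your terminal), you can ask Python itself. A minimal sketch using only the standard library; the helper name in_virtual_env() is my own:

```python
import sys


def in_virtual_env():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv folder, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtual_env())
```

If the printed interpreter path does not live inside your venv folder, packages installed with pip in the activated terminal will not be visible to that run.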

  • Finally, create a new file named "main.py" directly inside the voice_assistant folder by clicking the New File icon in Visual Studio Code.
  • You should now have an empty main.py file open in the editor.

That's it, now let's install those required modules.

Installing the requirements

To recognize our voice and convert it into text, we need the SpeechRecognition module, so let's install it. Type the following command in the terminal:

pip install SpeechRecognition

Since we are using the microphone as the input source, we also need to install the PyAudio package.

The process for installing PyAudio will vary depending on your operating system.

For Linux (Debian/Ubuntu), pip builds PyAudio from source, so install the PortAudio development headers first:

sudo apt-get install portaudio19-dev python3-dev
pip install pyaudio

If you are on Mac:

brew install portaudio
pip install pyaudio

If you are on Windows:

pip install pyaudio

If you get any errors installing PyAudio on Windows, refer to this StackOverflow solution. On other systems, try searching for the error message. If you are still stuck, feel free to comment below.

Once you’ve got PyAudio installed, you can test the installation from the terminal by typing this:

python -m speech_recognition

Make sure your default microphone is on and unmuted. If the installation worked, the module asks for a moment of silence, prints "Say something!", and then prints back what it recognized.

If you are using Ubuntu, then you may get some errors of the form "ALSA lib [...] Unknown PCM". To suppress those errors, see this StackOverflow answer.

Now to give the program the ability to talk, we have to install the pyttsx3 module:

pip install pyttsx3

pyttsx3 is a Text to Speech (TTS) library for Python 2 and 3. It works without an internet connection or delay. It also supports multiple TTS engines, including Sapi5, nsss, and espeak.

That's it, we have installed and set up all the prerequisites. Now it's time to write the program itself, so let's do that.
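Before writing the program, it can help to confirm that the interpreter you are about to use can actually find the three modules. A small sketch using the standard library's importlib; the helper name missing_modules() is hypothetical, and note that the import names differ from the pip package names:

```python
from importlib.util import find_spec

# Import names differ from the pip package names:
# SpeechRecognition -> speech_recognition, PyAudio -> pyaudio
REQUIRED_MODULES = ["speech_recognition", "pyaudio", "pyttsx3"]


def missing_modules(names):
    """Return the module names that the current interpreter cannot find."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Not installed in this environment:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

Run it with the same python command you will use for main.py. If something is reported as missing even though pip says it is installed, the script is most likely running under a different interpreter than the one in your venv.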

recognize_voice()

First of all, let's import all the necessary modules.

Type the following code inside the main.py file:

# all our imports
import speech_recognition as sr
from time import sleep
from datetime import datetime
import webbrowser
import pyttsx3


  • First, we are importing the speech_recognition module as sr.
  • Then we are importing the sleep() function from the time module. We will use this in a bit to make a fake delay.
  • Then for knowing the current date and time, we need that datetime module.
  • Then to open up a browser and do a google search, we need the help of the webbrowser module.
  • Then as I said earlier, to convert text to speech, we need pyttsx3.

All of the magic in SpeechRecognition happens with the Recognizer class. So let's instantiate it next:

# make an instance of Recognizer class
r = sr.Recognizer()


Now configure the pyttsx3:

# confs for pyttsx3
engine = pyttsx3.init()


  • pyttsx3 will be responsible for generating the computer voice. To inspect or change the gender, age, speed, etc. of the generated voice, read this description.
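As a rough illustration of that kind of tweaking, the sketch below sets the speaking rate, the volume, and the voice through pyttsx3's getProperty() and setProperty() calls. The helper name configure_voice() and the default values are my own assumptions, not part of the tutorial's code, and which voices exist depends on your operating system.

```python
def configure_voice(engine, rate=150, volume=0.9, voice_index=0):
    """Apply speaking rate, volume and voice choice to a pyttsx3 engine."""
    engine.setProperty("rate", rate)      # speaking rate in words per minute
    engine.setProperty("volume", volume)  # 0.0 (silent) to 1.0 (loudest)

    voices = engine.getProperty("voices")  # voices installed on this machine
    # Only switch voices if the requested index actually exists
    if 0 <= voice_index < len(voices):
        engine.setProperty("voice", voices[voice_index].id)
    return engine
```

In main.py this would be called once, right after engine = pyttsx3.init(), for example configure_voice(engine, rate=140).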

Now let's create that recognize_voice() function. It will do the following:

  • Listen to our microphone.
  • Recognize our voice with the help of the recognize_google() function.
  • Convert it into text format.
  • Return that text version of our voice.

Create the recognize_voice() function like below:

def recognize_voice():
  """Recognize our voice and return the text version of it."""
  text = ''

  # create an instance of the Microphone class
  with sr.Microphone() as source:
    # adjust for ambient noise
    r.adjust_for_ambient_noise(source)

    # capture the voice
    voice = r.listen(source)

    # let's recognize it
    try:
      text = r.recognize_google(voice)
    except sr.RequestError:
      speak("Sorry, I can't access the Google API...")
    except sr.UnknownValueError:
      speak("Sorry, unable to recognize your speech...")
  return text.lower()


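  • If an error occurs (a bad Internet connection raises sr.RequestError, and unintelligible audio raises sr.UnknownValueError), the function will speak() the appropriate message and return an empty string.
  • adjust_for_ambient_noise() samples the microphone for about one second (its default duration) before listening starts, so pause briefly after the prompt before you speak.

Remember that speak() is not a built-in function. We have to create it ourselves, and we will do that at the end because it is a small function.

Also remember that speak() will convert the given text to speech (the computer-generated voice).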
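One case recognize_voice() does not cover is silence: by default r.listen(source) keeps waiting until it hears a phrase, so the assistant can look frozen. A minimal sketch of one way around this, assuming you want to give up after a few seconds. The helper name listen_with_timeout() and the default numbers are hypothetical; timeout, phrase_time_limit and WaitTimeoutError come from the SpeechRecognition library.

```python
def listen_with_timeout(recognizer, source, timeout=5, phrase_time_limit=10):
    """Capture one phrase, or return None if nobody starts speaking in time."""
    # Imported here so this sketch stands on its own; in main.py the
    # module is already imported at the top of the file as `sr`.
    import speech_recognition as sr

    try:
        # timeout: maximum seconds to wait for speech to start
        # phrase_time_limit: maximum seconds a single phrase may last
        return recognizer.listen(
            source, timeout=timeout, phrase_time_limit=phrase_time_limit
        )
    except sr.WaitTimeoutError:
        return None
```

Inside recognize_voice() you could then write voice = listen_with_timeout(r, source) and return the empty string when voice is None, so the main loop simply prompts again.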

Now at the very bottom of the file, type the following:

# short pause before the first prompt
# (adjust_for_ambient_noise() takes its own 1-second sample inside recognize_voice())
sleep(1)

while True:
  speak("Start speaking...")
  # listen for voice and convert it into text format
  text_version = recognize_voice()

  # give "text_version" to reply() fn
  reply(text_version)

  • After a delay of 1 second, we start an infinite loop.
  • Then we speak() the message "Start speaking...", which acts as a prompt for the end-user.
  • Then we listen for the voice and convert it into text format using the recognize_voice() function which we just created.
  • Now we have the text_version of our spoken input. We can use it to generate responses such as telling the date or the current time, or searching Google, depending on what was asked.
  • That's what the reply() function is going to do.

Now let's create that reply() function.

reply()

This function will accept text_version as an argument and then act accordingly. Type the following code below the recognize_voice() function which we created earlier:

def reply(text_version):
  """Respond to the recognized text."""
  # name
  if "name" in text_version:
    speak("My name is JARVIS")

  # how are you?
  if "how are you" in text_version:
    speak("I am fine...")

  # date
  if "date" in text_version:
    # get today's date and format it - 9 November 2020
    # (%-d is not supported on Windows, so the day number is added separately)
    now = datetime.now()
    date = str(now.day) + " " + now.strftime("%B %Y")
    speak(date)

  # time
  if "time" in text_version:
    # get current time and format it like - 02 28
    time = datetime.now().strftime("%H %M")
    speak("The time is " + time)

  # search google
  if "search" in text_version:
    speak("What do you want me to search for?")
    keyword = recognize_voice()

    # if "keyword" is not empty
    if keyword != '':
      url = "https://google.com/search?q=" + keyword

      # webbrowser module to work with the webbrowser
      speak("Here are the search results for " + keyword)
      webbrowser.open(url)
      sleep(3)

  # quit/exit
  if "quit" in text_version or "exit" in text_version:
    speak("Ok, I am going to take a nap...")
    exit()

  • See, it's very simple. All we are doing is checking whether a certain keyword is present in the given text_version. If we find one of the keywords we are looking for, we act accordingly: speak() the current time or date, open the web browser for a Google search, and so on.
  • Because these are independent if statements rather than an if/elif chain, a sentence that contains several keywords triggers several responses.
  • Again, we are using the speak() function but haven't created it yet. That's what we are going to do next.
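If the list of commands grows, the chain of if statements can be swapped for a lookup table. Python had no switch statement when this was written (the match statement only arrived in Python 3.10), so a dictionary that maps each keyword to a small function is a common alternative. This is only a sketch: the handler names and find_replies() are hypothetical, and the handlers return text instead of speaking so they are easy to try on their own.

```python
from datetime import datetime


def tell_name():
    return "My name is JARVIS"


def tell_mood():
    return "I am fine..."


def tell_date():
    now = datetime.now()
    return str(now.day) + " " + now.strftime("%B %Y")


def tell_time():
    return "The time is " + datetime.now().strftime("%H %M")


# keyword -> handler, checked in insertion order
HANDLERS = {
    "name": tell_name,
    "how are you": tell_mood,
    "date": tell_date,
    "time": tell_time,
}


def find_replies(text_version):
    """Return the reply for every keyword found in text_version."""
    return [handler() for keyword, handler in HANDLERS.items()
            if keyword in text_version]
```

reply() could then loop over find_replies(text_version) and speak() each sentence, keeping the search and quit commands as they are. Adding a new command becomes one function plus one dictionary entry.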

speak()

Type the following code above or below the reply() function:

def speak(text):
  """Speak the given text (text to speech)."""
  engine.say(text)
  engine.runAndWait()

  • Pretty straightforward, isn't it? Here we use the engine we instantiated earlier to say() the text we give it, and runAndWait() blocks until the speech has finished. That's all the speak() function does.

That's it! You have successfully created your own Python voice assistant in no time!

Now let's test it. Run the following command in the terminal window at the bottom:

python main.py

Go on, ask a few questions like "What is your name?" or "What is the date today?", or say "Search Google".
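When you try the search command, keep in mind that reply() pastes the spoken keyword into the URL unchanged, so characters such as & or # would change the meaning of the URL. A minimal sketch of encoding the keyword first with the standard library; the helper name build_search_url() is hypothetical:

```python
from urllib.parse import urlencode


def build_search_url(keyword):
    """Return a Google search URL with the keyword safely percent-encoded."""
    return "https://google.com/search?" + urlencode({"q": keyword})


print(build_search_url("python voice assistant"))
# https://google.com/search?q=python+voice+assistant
```

Inside reply(), the line that builds url could then become url = build_search_url(keyword).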

Have fun with it...

Final Code

Here is the final version of the main.py file. If you get any errors, cross-check your code with the following:

# all our imports
import speech_recognition as sr
from time import sleep
from datetime import datetime
import webbrowser
import pyttsx3


# make an instance of Recognizer class
r = sr.Recognizer()


# confs for pyttsx3
engine = pyttsx3.init()


def speak(text):
  """Speak the given text (text to speech)."""
  engine.say(text)
  engine.runAndWait()


def recognize_voice():
  """Recognize our voice and return the text version of it."""
  text = ''

  # create an instance of the Microphone class
  with sr.Microphone() as source:
    # adjust for ambient noise
    r.adjust_for_ambient_noise(source)

    # capture the voice
    voice = r.listen(source)

    # let's recognize it
    try:
      text = r.recognize_google(voice)
    except sr.RequestError:
      speak("Sorry, I can't access the Google API...")
    except sr.UnknownValueError:
      speak("Sorry, unable to recognize your speech...")
  return text.lower()


def reply(text_version):
  """Respond to the recognized text."""
  # name
  if "name" in text_version:
    speak("My name is JARVIS")

  # how are you?
  if "how are you" in text_version:
    speak("I am fine...")

  # date
  if "date" in text_version:
    # get today's date and format it - 9 November 2020
    # (%-d is not supported on Windows, so the day number is added separately)
    now = datetime.now()
    date = str(now.day) + " " + now.strftime("%B %Y")
    speak(date)

  # time
  if "time" in text_version:
    # get current time and format it like - 02 28
    time = datetime.now().strftime("%H %M")
    speak("The time is " + time)

  # search google
  if "search" in text_version:
    speak("What do you want me to search for?")
    keyword = recognize_voice()

    # if "keyword" is not empty
    if keyword != '':
      url = "https://google.com/search?q=" + keyword

      # webbrowser module to work with the webbrowser
      speak("Here are the search results for " + keyword)
      webbrowser.open(url)
      sleep(3)

  # quit/exit
  if "quit" in text_version or "exit" in text_version:
    speak("Ok, I am going to take a nap...")
    exit()


# short pause before the first prompt
# (adjust_for_ambient_noise() takes its own 1-second sample inside recognize_voice())
sleep(1)

while True:
  speak("Start speaking...")
  # listen for voice and convert it into text format
  text_version = recognize_voice()

  # give "text_version" to reply() fn
  reply(text_version)

 

Wrapping Up

I hope you enjoyed this tutorial. In some places, I intentionally skipped the explanation because the code was simple and self-explanatory, and I left it to you to work through on your own.

True learning takes place when you try things on your own. Simply following a tutorial won't make you a better programmer. You have to use your own brain.

If you still have an error, first try to solve it on your own by googling it.

Only if you can't find a solution should you comment below. Finding and fixing a bug on your own is a skill that every programmer should have!

And that's it. Thank you ;)

About Me

Hey folks, my name is Aravind, and I am the man behind this website. To know more about me, check out the About Me page. If you like and enjoy my content, then please consider supporting what I do through - Buy Me a coffee.

Comments(19)
Wiktor on Nov 11, 2020
Wow, great, I'm looking forward to do this tomorrow!
SH4DOWM3CHA on Nov 12, 2020
Couldn't you use a switch/case function instead of doing an if function for every keywords, or an array, or something a bit smaller? Or is it not possible?
thecodingpie on Nov 12, 2020
@SH4DOWM3CHA, hey it's totally possible to use switch case. Yes, you can do that.
Edouard on Nov 12, 2020
Hello, It seems that I have some compatibility issues with python 3.7.6 and pyttsx3. : TypeError: item 1 in _argtypes_ passes a union by value, which is unsupported. Ed
thecodingpie on Nov 12, 2020
Hey @Edouard, why shouldn't you just try upgrading/downgrading your python's version?
Uwe on Nov 14, 2020
Hi, great work :) editing in VSC and running in the mac terminal it works. But not in the VSC terminal. It cannot find the speech_recognition module at import. Do you have an idea why? Thanks a lot. Uwe
thecodingpie on Nov 15, 2020
Hey @Uwe, did you forgot to activate the virtual environment while working in the VSC terminal?
Uwe on Nov 15, 2020
Hi @thecodingpie, I did activate it in both the mac and VSC terminal and it also shows it in the VSC terminal with (venv) (base) ...... If I comment this import it out, it fails with pyttsx3 too -strange
thecodingpie on Nov 15, 2020
Hey @Uwe, can you please paste the whole error here (or at least the important piece) ? Then, it will be easy to debug it...
Uwe on Nov 16, 2020
thanks a lot in advance. here we go:
[Running] python -u "/Users/uwe/Python-Projects/voice_assistant/main.py"
Traceback (most recent call last):
  File "/Users/uwe/Python-Projects/voice_assistant/main.py", line 6, in <module>
    import speech_recognition as sr
ModuleNotFoundError: No module named 'speech_recognition'
[Done] exited with code=1 in 0.069 seconds
thecodingpie on Nov 16, 2020
Hey @Uwe, after activating the virtual environment try again installing SpeechRecognition using - pip install SpeechRecognition
Uwe Sommer on Nov 16, 2020
Hi @thecodingpie, did try that all ready and it (not surprisingly :) says: Requirement already satisfied: SpeechRecognition in ./venv/lib/python3.7/site-packages (3.8.1) I can't figure out why it can't find it (same with pyttsx3 the others work) but ok in Mac terminal it works so it be :) Thanks a lot
thecodingpie on Nov 17, 2020
Hey @Uwe, please make sure if you have anaconda installed, then you 'deactivate' d it. And also make sure you are creating the main.py file outside venv folder. If you still have the problem, then try this link - https://github.com/Uberi/speech_recognition/issues/294
Anubhav on Nov 18, 2020
Hello and Thanks much for the code. In my case, the program has no problem but when he says "Start Speaking ", I say but he doesn't respond. My microphone is also open at that time
thecodingpie on Nov 18, 2020
Hey @Anubhav, that may be because of maybe your network problem or it may be because of the noise in the audio. Check your code and keep trying.
MichielDL on Nov 20, 2020
Hello, (almost) totally new to coding here. I'm using python 3.8.3 and it seems there isn't a compatible "wheel" for PyAudio. Is there something i can do about this? Or where do I find a compatible package
thecodingpie on Nov 20, 2020
Hey @MichielDL, download - PyAudio‑0.2.11‑cp38‑cp38‑win_amd64.whl from here - https://www.lfd.uci.edu/~gohlke/pythonlibs/#pyaudio. Then do - pip install PyAudio‑0.2.11‑cp38‑cp38‑win_amd64.whl - from the directory where you downloaded with the virtual environment activated. I hope you are on a 64bit computer, otherwise, find suitable version of PyAudio.whl from the website then install it using pip.
Michiel De Ley on Nov 20, 2020
Thanks, it's fixed! by the way it's really awesome that you're doing this and thanks for replying so fast