How to Script Interactive Programs or TUIs in Python with pexpect

If I have an interactive program or shell like bash, how do I automate it and script what I want to do with Python?

There’s a useful package called pexpect that you can install with pip.

The idea of pexpect comes from Expect, a Tcl-based tool that automates interactions with programs that expose a text terminal interface, like ftp, ssh, etc.

Think of any program that opens a session that you have to exit!

Prerequisites

  1. Python 2.7 or Python 3.3 or above
  2. pip install pexpect
  3. If you get lost, read the documentation.

Now that you have installed pexpect, we can write a simple program as an example.

 

bash example. Sending one command.

Let’s say that I wanted to automate a new session of bash and use ls.

bash works a little differently than other interactive programs: its -c flag lets you pass a command directly when you start it.

  1. You spawn the interactive program with pexpect.spawn("/bin/bash -c ls").
  2. You call child.expect() with whatever comes after the output, like \r\n (a line break on the terminal) or pexpect.EOF once the program exits.
  3. child.before holds all the output that came before the matched pattern. We can print it or use it in an if statement.
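The three steps above can be sketched as a small function. This is a minimal sketch, assuming pexpect 4.x (for the encoding argument) on a POSIX system, since pexpect.spawn needs a pseudo-terminal. The helper name run_in_bash is hypothetical, and it waits for pexpect.EOF rather than \r\n because bash exits right after running the command, so EOF captures every line of output.

```python
import pexpect

def run_in_bash(command):
    """Run one command through `bash -c` and return everything it printed."""
    # Passing the arguments as a list sends the whole command string
    # to bash as a single -c argument.
    child = pexpect.spawn("/bin/bash", ["-c", command], encoding="utf-8")
    # bash exits after the command finishes, so wait for end-of-file.
    child.expect(pexpect.EOF)
    # child.before holds all output printed before the matched pattern.
    return child.before

if __name__ == "__main__":
    print(run_in_bash("ls"))
```

Lines come back with \r\n endings because the output passes through a terminal.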

 

ssh example. Knowing what exactly we’re expecting. Sending multiple commands.

bash was a special case because it accepts a command through its -c flag.

What if the program doesn’t have such a flag?

Let’s say that I wanted to automate a command on another machine with ssh.

Not just one command. Multiple commands.

Manual
  1. ssh into the machine.
  2. Create a file called slothparadise with touch.
  3. ls the current folder.

Automated

When we use pexpect to automate the manual process, we take a look at what we expect to happen.

We pinpoint what we will see after the command.

For ssh, it’s easy to see that after each command, we get the shell prompt, which looks something like user@hostname:~.

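We could use a regex, but in this case it’s easy to use the prompt text as it is. Keep in mind that child.expect() compiles a string pattern as a regular expression; if your literal text contains special characters like . or $, use child.expect_exact() instead.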
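The automated ssh steps can be sketched with a generic helper that sends each command and waits for the prompt again. This is a minimal sketch: run_commands is a hypothetical helper, user@hostname is a placeholder for your real login and prompt, and it assumes key-based login so no password prompt appears. Each entry in the result holds the echoed command plus its output.

```python
import pexpect

def run_commands(child, prompt, commands, timeout=10):
    """Send each command to an interactive session and collect its output.

    `prompt` is what appears after every command finishes. Note that
    child.expect() treats it as a regular expression.
    """
    outputs = []
    child.expect(prompt, timeout=timeout)  # wait for the first prompt
    for command in commands:
        child.sendline(command)
        child.expect(prompt, timeout=timeout)
        # child.before holds the echoed command plus its output.
        outputs.append(child.before)
    return outputs

if __name__ == "__main__":
    # Hypothetical host and prompt; key-based login is assumed.
    child = pexpect.spawn("ssh user@hostname", encoding="utf-8")
    for out in run_commands(child, "user@hostname:~", ["touch slothparadise", "ls"]):
        print(out)
    child.sendline("exit")
    child.expect(pexpect.EOF)
```

Keeping the session object outside the helper means the same function works for any program with a predictable prompt, not just ssh.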

 

regex example

You can match output that you can’t predict exactly by using a regex.

If you don’t know what a regex is, it’s short for regular expression: a pattern that matches ranges of characters or words.

Let’s use regex with the ssh example.

Instead of expecting one exact prompt, we leave room for other users, like root@hostname:~.

Who knows? The user might be root!

If you’re not familiar with regex, I recommend double-checking your pattern with an online regex tester.


  1. We create slothparadise2.
  2. We list the contents of the directory with ls.

Instead of the literal prompt, I used the regular expression, and it works the same as before.
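Here is one way the regex version might look. The PROMPT pattern is my assumption of a typical bash prompt (any username, @, a hostname that may contain dots or dashes, then :~); it only matches while the session is in the home directory. The host is a placeholder, and key-based login is assumed.

```python
import re
import pexpect

# Hypothetical prompt pattern: \w+ covers users like ubuntu or root,
# [\w.-]+ covers hostnames with dots or dashes, then :~ for the home dir.
PROMPT = r"\w+@[\w.-]+:~"

def prompt_matches(text):
    """Return True if text contains something shaped like user@host:~."""
    return re.search(PROMPT, text) is not None

if __name__ == "__main__":
    # Hypothetical host; key-based login assumed.
    child = pexpect.spawn("ssh user@hostname", encoding="utf-8")
    child.expect(PROMPT)            # expect() compiles the string as a regex
    child.sendline("touch slothparadise2")
    child.expect(PROMPT)
    child.sendline("ls")
    child.expect(PROMPT)
    print(child.before)             # the echoed "ls" plus the listing
    child.sendline("exit")
    child.expect(pexpect.EOF)
```

prompt_matches lets you check the pattern against sample prompts offline before pointing it at a live session.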

Those are the basics of pexpect! You can expand on these examples, and you’ll be on your way to scripting those tiresome interactive programs with an easy-to-use language like Python!

Largest Prime Factor

Project Euler is a website with a series of math problems that can be solved with programming. Problem 3 is about finding the largest prime factor of a number.

My first impression was that the problem looks simple, since I can iterate through the possibilities one by one and find the largest prime factor.

The problem is that we’re finding the largest prime factor!

The largest prime factor of a very large number can be as big as the number itself, and I could run out of time (or memory) if I test every number one by one!

I need to do something more.

What can we do to speed up the process of finding the largest prime factor?

 

A decent solution

I like to use Python, so I’ll show a decent answer in Python.

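def lpf(a):
    """Print and return the largest prime factor of an integer a >= 2."""
    b = 2
    while a > b:
        if a % b == 0:
            a = a // b  # integer division; a / b gives a float in Python 3
            b = 2       # start over to check the smaller a
        else:
            b += 1
    print("Largest Prime Factor: %d" % b)
    return b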

I start b at 2 because 1 divides every number and isn’t prime, so we don’t consider it. If the number has no factors other than 1 and itself, then the largest prime factor is the number itself.

The largest prime factor of a prime number is the number itself.

We have to consider composite numbers though.

I know that for a composite number, the largest prime factor has to be smaller than the number we’re testing.

a = the number
b = largest prime factor
a > b if we're dealing with a composite number.

To check whether a big number is divisible by a smaller number, you can check whether the remainder is 0 with the modulo operator.

a % b == 0

If a is divisible by b, then we can break down a by dividing by b.

Every time we divide a by b, the new, smaller a is still a factor of the original a.

We keep dividing a by b while a is composite, because eventually we reach a point where a cannot be broken down any further.

When we reach an a that no smaller b divides cleanly, that a is the prime number we’re looking for.

Why does this work? I’ll try to explain as best as I can in text.

We reset b to 2 after every division because b is just a tool for checking a’s primality.

If the current a is divisible by some b, we divide a by it, and then we have to check the new a’s primality.

Let’s say we have 10.

10 % 2 == 0.

Now we break 10 down to 5 by dividing a by its divisor b, which happens to be 2.

5 % 2 != 0
5 % 3 != 0
5 % 4 != 0
5 == 5, so here we stop.

5 is the largest prime factor.
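To replay the walkthrough for 10 step by step, here is a traced copy of the same reset-to-2 loop used in lpf. The name lpf_trace is hypothetical; it returns the final b together with a list describing every check, in the same wording as the steps above.

```python
def lpf_trace(a):
    """Run the lpf loop on a >= 2 and record each check it makes."""
    steps = []
    b = 2
    while a > b:
        if a % b == 0:
            steps.append("%d %% %d == 0, so a becomes %d" % (a, b, a // b))
            a //= b
            b = 2  # restart the search on the smaller a
        else:
            steps.append("%d %% %d != 0" % (a, b))
            b += 1
    steps.append("%d == %d, so here we stop" % (a, b))
    return b, steps

if __name__ == "__main__":
    largest, steps = lpf_trace(10)
    for step in steps:
        print(step)
    print("Largest Prime Factor: %d" % largest)
```

For 10 this prints the same five lines as the walkthrough and then reports 5.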

If a is not prime, then we break a down by b. To reiterate, b is a tool for us to check primality.

If a happens to be prime, then it is the largest prime factor. Because b restarts at 2, the first b that divides a is always a’s smallest prime factor, so each division strips off the smallest remaining prime factor, and the prime left at the end is the largest one.

We break the original a down into smaller a’s by dividing by b until a is a prime number.

Every a on each division will certainly be a factor of the original a.

Since the while loop checks each a for primality, a will eventually be prime. We’re going from the original, largest a to smaller and smaller a’s!

We will eventually reach the largest prime factor when a is no longer divisible by any b.

What is the largest prime factor of the number 600851475143?

Answer: 6857

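Time complexity: a is divided at most log2(n) times, but b restarts at 2 after every division, so the total work is roughly the sum of n’s prime factors (counted with repetition). In the worst case, when n is prime, b climbs all the way to n, which is O(n).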
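For comparison, a common alternative (not the method above) is plain trial division that never resets the divisor and stops once d * d exceeds the remaining number. Whatever remains above 1 at that point must be prime, so this sketch needs roughly √n iterations in the worst case instead of n. The function name largest_prime_factor is my own, and it assumes an integer n >= 2.

```python
def largest_prime_factor(n):
    """Trial division up to sqrt(n); assumes an integer n >= 2."""
    largest = None
    d = 2
    while d * d <= n:
        # Divide out every copy of d before moving on, so d never
        # needs to restart at 2.
        while n % d == 0:
            largest = d
            n //= d
        d += 1
    # Anything left above 1 has no divisor <= its square root, so it is prime.
    if n > 1:
        largest = n
    return largest

if __name__ == "__main__":
    print(largest_prime_factor(600851475143))
```

For 600851475143 = 71 × 839 × 1471 × 6857, the loop stops after d passes 1471, and the leftover 6857 is the answer.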

How to install Pip as a User

I don’t have root access on this Macintosh that I’m using. How am I supposed to get the pip package manager for Python?

Python 2.7 came preinstalled on the macOS versions of this era.

python --version
Python 2.7.10

Step 1) Get easy_install

easy_install is a package installer that ships with setuptools.

You might already have easy_install! To check:

easy_install

If you don’t have easy_install, you can install it for your user only.

wget --no-check-certificate https://bootstrap.pypa.io/ez_setup.py -O - | python - --user
Installing easy_install script to /Users/.../Library/Python/2.7/bin
Installing easy_install-2.7 script to /Users/.../Library/Python/2.7/bin

Step 2) Install pip as user

easy_install --user pip
Searching for pip
Best match: pip 8.1.2
Adding pip 8.1.2 to easy-install.pth file
Installing pip script to /Users/.../Library/Python/2.7/bin
Installing pip3.5 script to /Users/.../Library/Python/2.7/bin
Installing pip3 script to /Users/.../Library/Python/2.7/bin
Using /Users/.../Library/Python/2.7/lib/python/site-packages
Processing dependencies for pip
Finished processing dependencies for pip

Step 3) pip install --user

Now you can install packages with pip as your user. If your shell can’t find pip, add the bin directory from the output above to your PATH.

pip install --user xlsx2Csv
Collecting xlsx2Csv
Installing collected packages: xlsx2Csv
Successfully installed xlsx2Csv-0.7.2
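The --user installs put scripts such as pip into a per-user bin directory (the .../Library/Python/2.7/bin path in the output above). Here is a minimal sketch to find that directory from Python and check whether it is on your PATH. It assumes macOS or Linux, where scripts go in a bin folder under site.getuserbase(); the helper names are hypothetical.

```python
import os
import site

def user_scripts_dir():
    """Return the bin directory used by `--user` installs (macOS/Linux layout)."""
    # site.getuserbase() is ~/Library/Python/X.Y for macOS framework builds
    # and usually ~/.local on Linux; console scripts go in its bin/ folder.
    return os.path.join(site.getuserbase(), "bin")

def on_path(directory):
    """Check whether a directory appears in the PATH environment variable."""
    entries = os.environ.get("PATH", "").split(os.pathsep)
    return os.path.normpath(directory) in [os.path.normpath(p) for p in entries if p]

if __name__ == "__main__":
    scripts = user_scripts_dir()
    print(scripts)
    if not on_path(scripts):
        print('Add it to PATH, e.g. export PATH="%s:$PATH"' % scripts)
```

Running `python -m site --user-base` from the shell prints the same base directory.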

Let me know if you encounter any problems while installing pip as a user.

Why do you need Unicode? Encoding in Python

Why do you need Unicode? This article explains what Unicode is for and, briefly, how it is used in Python.

Let’s start from square one. As you know, everything on the machine, including characters, strings, and other values, is stored as bytes. When you use a computer, you see the world at a much higher level, and in a language like Python you normally don’t have to worry about every single byte or about translating words into the right bytes.

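BUT, English isn’t enough. Much of the world writes in scripts other than the Latin alphabet. ASCII, the older standard, covers only 128 characters: English letters, digits, punctuation, and control codes. Unicode came along to give every character in every language a unique number called a code point.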
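To see code points and ASCII’s limit directly, here is a small sketch for Python 3, where ord() returns a character’s code point. The helper names code_point and fits_in_ascii are hypothetical.

```python
def code_point(ch):
    """Return the U+XXXX label of a single character."""
    return "U+%04X" % ord(ch)

def fits_in_ascii(text):
    """True if every character of text is one of ASCII's 128 characters."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

if __name__ == "__main__":
    for ch in ["A", "é", "狗", "∑"]:
        print(ch, ord(ch), code_point(ch))
    print(fits_in_ascii("dog"), fits_in_ascii("café"))
```

"A" is code point 65 (U+0041), the same value ASCII gives it, while "é" is U+00E9, which ASCII cannot represent.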

Unicode is the set of all characters used in the world, and its two most common encodings are UTF-8 and UTF-16. Think of Unicode as the alphabet and encodings as translation tables between characters and bytes: you see and use the readable end, like the string “dog,” and behind the scenes a language like Python uses an encoding to translate it to and from bytes.
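Here is the “same string, different translation tables” idea as a Python 3 sketch. The helper encode_all is hypothetical. It uses "utf-16-le" rather than plain "utf-16" because Python’s "utf-16" codec prepends a byte order mark, which would make the output depend on that extra detail.

```python
def encode_all(text, encodings=("utf-8", "utf-16-le")):
    """Map each encoding name to the bytes it produces for text."""
    return {name: text.encode(name) for name in encodings}

if __name__ == "__main__":
    for word in ["dog", "café"]:
        for name, data in encode_all(word).items():
            # Decoding with the same encoding gets the original string back.
            print(word, name, data, data.decode(name) == word)
```

"dog" is 3 bytes in UTF-8 but 6 bytes in UTF-16-LE, and "é" takes 2 bytes in either encoding (b'\xc3\xa9' in UTF-8).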

These encodings can represent mathematical symbols and Chinese characters, which is why you should come to appreciate Unicode. Consider the range of characters that exist beyond the English language!

Most people use UTF-8 for encoding and decoding values, especially character and string data: it can represent every Unicode character, and plain ASCII text is already valid UTF-8.

 

Now to Python

Two types of strings exist in Python: byte strings and Unicode strings.

A byte string is a sequence in which every element is a byte, whereas a Unicode string is a sequence in which every element is a character, identified by its unique number, its code point.
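To see the difference between the two element types, here is a Python 3 sketch. Indexing a str gives back a character, while indexing bytes gives back an integer from 0 to 255.

```python
text = "café"                  # Unicode string: elements are characters
data = text.encode("utf-8")    # byte string: elements are bytes

print(len(text), len(data))    # 4 5  (é needs two bytes in UTF-8)
print(text[3], data[3])        # é 195  (195 is 0xC3, the first byte of é)
```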

 

Why the two?

Byte strings are what you write to files and send over networks, while Unicode strings are what you manipulate as text, in any character that exists on the planet. Data always travels as bytes, but inside your program you usually make your changes to the Unicode string.

In a language like Python, you usually operate on Unicode strings, and when you print, Python encodes the result using the encoding your terminal application expects.

 

Great, it’s automatic?

Not quite. There are specific scenarios where Python can’t encode or decode automatically. For example, output sent to a pipe may need to be encoded manually: in Python 2, printing non-ASCII Unicode to a pipe typically raises a UnicodeEncodeError because Python falls back to ASCII. Another big example is data from the network. If you make HTTP requests to foreign-language webpages, you receive raw bytes, so you need to decode them with the page’s encoding. If that encoding is UTF-8, a popular encoding of Unicode, decoding with UTF-8 turns the foreign byte string into a Unicode string that you can actually manipulate and understand.
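Here is a Python 3 sketch of decoding bytes from the network by hand. decode_page is a hypothetical helper. It assumes UTF-8 when the server does not declare a charset, and it uses errors="replace" so undecodable bytes become U+FFFD instead of raising an exception. The example.com URL is only a placeholder.

```python
from urllib.request import urlopen

def decode_page(raw, charset="utf-8"):
    """Turn raw bytes (from a socket, HTTP body, or pipe) into a str."""
    # errors="replace" swaps undecodable bytes for U+FFFD instead of raising.
    return raw.decode(charset, errors="replace")

if __name__ == "__main__":
    # Placeholder URL; the server may or may not declare a charset.
    with urlopen("https://example.com/") as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        print(decode_page(resp.read(), charset)[:200])
```

Reading the charset from the Content-Type header, when present, avoids guessing the wrong encoding.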

 

Let’s review.

You manipulate Unicode strings. Consider Unicode as the entirety of every character existing on the planet, each with a unique number called a code point. Unicode has encodings: formats like UTF-8 that convert those code points into bytes, making the text sendable and readable.

Strings in Python can be very confusing when it comes to encoding! Even more confusing, the type names differ between versions:

Python 2.x:

str is byte string
unicode is Unicode string

Python 3.x:

bytes is byte string
str is Unicode string
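Here is a sketch of code that labels a value according to the table above on either version. It relies on bytes being an alias of str in Python 2, and it looks up the Python 2 name unicode, falling back to str on Python 3. The names text_type and describe are hypothetical.

```python
try:
    text_type = unicode  # Python 2: the Unicode string type
except NameError:
    text_type = str      # Python 3: str is the Unicode string type

def describe(value):
    """Label a value as a byte string, a Unicode string, or neither."""
    if isinstance(value, bytes):  # bytes is an alias of str on Python 2
        return "byte string"
    if isinstance(value, text_type):
        return "Unicode string"
    return "not a string"

if __name__ == "__main__":
    for value in [b"foo", u"foo", "foo"]:
        print(repr(value), describe(value))
```

A bare "foo" literal is a byte string on Python 2 but a Unicode string on Python 3, which is exactly the confusion the table describes.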

Luckily, in many everyday cases, such as printing to a terminal or reading and writing text files, Python handles the encoding and decoding for you in both Python 2 and Python 3. But not always, so it’s good to understand the difference. Python 2 silently converts between str and unicode using ASCII, which breaks on non-ASCII text, while Python 3 refuses to mix bytes and str at all. When the automatic handling fails or guesses the wrong encoding, you need to understand what’s actually happening so you can call .encode() and .decode() yourself.
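Here is a Python 3 sketch of doing the conversion yourself. join_text is a hypothetical helper that decodes bytes before combining them with text. The last line shows what happens when you decode with the wrong encoding: the call succeeds but produces garbled characters.

```python
def join_text(text, data, encoding="utf-8"):
    """Combine a Unicode string with a byte string by decoding the bytes first."""
    return text + data.decode(encoding)

if __name__ == "__main__":
    try:
        "foo" + b"bar"  # Python 3 refuses to mix str and bytes
    except TypeError as err:
        print("TypeError:", err)
    print(join_text("foo", b"bar"))                   # foobar
    print("café".encode("utf-8").decode("latin-1"))  # cafÃ© (wrong encoding)
```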