How-To

Running Applications in Docker, Part 2: Table Joins in PostgreSQL

Tom demos table joins, shell scripts for repetitive tasks and using "exec" to run commands inside a container from the host system.

In my previous article on running applications on Docker, I deployed a PostgreSQL database and ran various psql commands to show its viability. I deployed my database on Docker on a local x64 system running Linux, but I could have used the same commands to deploy the image in the cloud or on the edge. Not only is this good for testing, but you can also create your own Docker images with your own applications and code and deploy them when and where needed.

In my upcoming articles, I will show you how multiple Docker instances can connect and work together. Before showing you how to do that, however, I will cover a few more PostgreSQL features in this article.

Table Joins
Thus far, our PostgreSQL queries have only been accessing one table. While this is fine for a limited number of use cases, databases prove especially powerful when they are allowed to combine or join data from multiple tables. In theory, you could store all of the data about an object in a single table, but this would not necessarily prove effective. For example, a library could store the information about a book and the borrower in the same table; however, this method would not only waste a lot of storage space with duplicate information, it would also waste compute resources as every time the personal information of the borrower changed, rows of data would need to be updated.

[Click on image for larger view.]

A basic purpose of databases is to store unique information in a single table. As such, the above table could be broken down into two different tables: one with book information, and another with personal information on the library patrons.

[Click on image for larger view.]

The acute observer will notice the borrower unique identifier (Borrower UID) appears in both the tables. By using this column of data, you can join the tables so that every time a row of data from the book table is displayed, it is combined with the information from the personal information table. In SQL parlance, this is called a "join." I do need to apologize to all the database pros at the gross simplification I used in this example.

Table Join Example
To demonstrate a join, I will show how I combined the table (apps) that I created in my previous article, which contains the applications on a set of computers (devices), with additional information about those devices.

The device information was stored in a CSV file (/tmp/DeviceNames.csv) that I copied from the host machine running Docker to the PostgreSQL instance by entering the following command on the host machine:

  docker cp /tmp/DeviceNames.csv postgresql01:/tmp/hw_devices.csv

This CSV file contained the following columns of data: _device_id, architecture, device_name, name, and version.

I then logged in to my PostgreSQL instance, launched psql, connected to my database (appsdata), created a new table (hw_devices), and copied the CSV data into it by entering:


  docker exec -it postgresql01 bash
  psql -U PostUser01
  \c
  CREATE TABLE hw_devices("_device_id" VARCHAR(40),"architecture" VARCHAR(40),"device_name" VARCHAR(40),"name" VARCHAR(120),"version" VARCHAR(120));
  COPY hw_devices FROM '/tmp/hw_devices.csv'  DELIMITER ','  CSV HEADER;

I displayed the columns in the table with the applications (apps05) and the table with the hardware information (hw_devices) by entering:


  \d apps05
  \d hw_devices
[Click on image for larger view.]
[Click on image for larger view.]

I joined the information about the devices with the information about the application by entering:

  SELECT * FROM apps05 JOIN hw_devices ON apps05._device_id = hw_devices._device_id;

Note how I prefaced the columns that I wanted to use for the joins (_device_id) with the table that I wanted to select from.

To simplify the query, I gave the tables aliases (hw and app), and only selected certain columns to display by entering:

  SELECT hw.device_name, hw. Architecture, app.name, app.version FROM apps05 app JOIN hw_devices hw ON app._device_id = hw._device_id;

A partial output of the join is shown in the screenshot below.

[Click on image for larger view.]

The SQL statement ordered the applications by the device architecture and then the device ID.

  SELECT hw.device_name, hw.version, hw.Architecture, app.name, app.version FROM apps05 app JOIN hw_devices hw ON app._device_id = hw._device_id ORDER BY hw._device_id, hw.architecture;
[Click on image for larger view.]

Shell Scripts with PostgreSQL
I have been entering and using psql directly from the command line to execute commands to manipulate my PostgreSQL databases, tables, and data. However, PostgreSQL has many different ways that programing languages can interact with PostgreSQL, and you can use bash shell scripts by using the shell’s "here" feature. To use the here feature, mark the commands that you want to run with a "<<" and a unique identifier (usually EOF), and it will execute those commands until the identifier is encountered again.

The script below will perform actions from earlier in the article and display the first five lines of the CSV file. In this script, I also show how you can use variables (dbname and username) in the shell script.

  #!/bin/bash
  dbname="appsdata"
  username="PostUser01"
  psql $dbname $username << EOF
  \d apps05
  \d hw_devices
  COPY (SELECT hw.device_name, hw.version, hw.Architecture, app.name, app.version FROM apps05 app JOIN hw_devices hw ON app._device_id = hw._device_id ORDER BY hw._device_id, hw.architecture)  to '/tmp/Apps_by_Device.txt' with csv;
  \q
  EOF
  ls -l /tmp/Apps_by_Device.txt
  head -5 /tmp/Apps_by_Device.txt

Note: The PostgreSQL Docker image does not come with an editor, and I had to install an editor (vim) in the image by entering:

  apt-get update
  apt-get install vim

I made the script (PostQuery01.sh) executable and ran it by entering:


  chmod 755 PostQuery01.s
  ./PostQuery01.sh
[Click on image for larger view.]

Running a Non-Interactive Command in a Docker Container
So far, I have run all the commands interactively from within the Docker container, but you can also run them non-interactively from the host system by using the exec command.

For example, to list the files contained in the /tmp directory of the postslq01 container, I entered:

  docker exec postgresql01 ls -l /tmp
[Click on image for larger view.]

To run the script from the above host, I entered:

docker exec postgresql01 /PostQuery01.sh
[Click on image for larger view.]

Running psql Commands from a File
Using a shell script, you can combine shell and psql commands, but if you only need to run a psql command, you can us the \i command.

To demonstrate this, I created the following file that I called psqlQuery01.psql.

  \c appsdata
  \d apps05
  \d hw_devices
  COPY (SELECT hw.device_name, hw.version, hw.Architecture, app.name, app.version FROM apps05 app JOIN hw_devices hw ON app._device_id = hw._device_id ORDER BY hw._device_id, hw.architecture)  to '/tmp/Apps_by_Device.txt' with csv;
  \q

I entered psql and executed the command by entering:

  psql -U PostUser01
  \i ./psqlQuery01.psql
[Click on image for larger view.]

Conclusion
Databases provide powerful ways to store data, and by using joins you can combine the output from various tables. By using shell scripts with databases, you can repeat common repetitive tasks. Finally, Docker makes it easy, by using the exec command, to run commands inside a container from the host system.

So far, I have only used psql to work with the database. In my next article, I will deploy a Docker image that has a web-based PostgreSQL management tool.

Featured

Subscribe on YouTube