Blog O' Matty


Exporting Wordpress Posts To Markdown

This article was posted by Matty on 2017-11-24 10:22:36 -0500 EST

I’ve been running my technology blog on top of Wordpress for the past 12 years. It was a great choice when I started, but the core product has morphed into more than I need. When you combine that with a constant stream of security vulnerabilities, I decided last month that it was time to move to a static website generation tool. As with any new venture, I sat down one Saturday morning and jotted down the requirements for my new website generator.

I experimented with Jekyll, Pelican, and Hugo, and after several weeks of testing I fell in love with Hugo. Not only was it super easy to install (it’s a single binary written in Go), but I had the bulk of my website converted after watching the Hugo video series from Giraffe Academy.

The biggest challenge I faced was getting all of my old posts (1200+) out of my existing Wordpress installation. Pelican comes with the pelican-import utility, which can take a Wordpress XML export file and convert each post to markdown. Even though I decided to use Hugo to create my content, I figured I would use the best tool for the job to perform the conversion:

$ pelican-import -m markdown --wpfile -o posts blogomatty.xml

In the example above I’m passing a file that I exported through the Wordpress UI and generating one markdown file in the posts directory for each blog post. The output files had the following structure:

Title: Real world uses for OpenSSL
Date: 2005-02-13 23:42
Author: admin
Category: Articles, Presentations and Certifications
Slug: real-world-uses-for-openssl
Status: published

If you are interested in learning more about all the cool things you can
do with OpenSSL, you might be interested in my article [Real world uses
for OpenSSL](/articles/realworldssl.html). The article covers
encryption, decryption, digital signatures, and provides an overview of 
[ssl-site-check](/code) and [ssl-cert-check](/code).

These files didn’t work correctly out of the gate since Hugo requires you to encapsulate the front matter (the metadata describing the post) with “---” for YAML formatting or “+++” for TOML formatting. To add the necessary formatting I threw together a bit of shell:

#!/bin/sh

# Wrap the front matter produced by pelican-import in the "---" delimiters
# that Hugo expects, and normalize the Title line along the way.
for post in `ls posts_to_process`; do
   echo "Processing post ${post}"
   echo "---" > "posts_processed/${post}.md"
   header=0
   cat "posts_to_process/${post}" | while read line; do
       if echo "$line" | egrep -i "^Status:" > /dev/null; then
            # The Status line is the last front matter field, so close out the block.
            echo "$line" >> "posts_processed/${post}.md"
            echo "---" >> "posts_processed/${post}.md"
            header=1
       elif [ ${header} -eq 1 ]; then
           # Everything after the front matter is the post body; copy it verbatim.
           echo "$line" >> "posts_processed/${post}.md"
       elif echo "$line" | egrep -i "^Title:" > /dev/null; then
            # Quote the title and escape embedded quotes so titles containing a ":" parse cleanly.
            echo "$line" | awk -F':' '{print $2$3}' | sed 's/^ *//g' | sed 's/"/\\"/g' | \
                 awk '{ print "title:", "\""$0"\"" }' >> "posts_processed/${post}.md"
       else
           echo "$line" >> "posts_processed/${post}.md"
       fi
   done
done
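
With the front matter wrapped, the processed version of the example post shown earlier should look roughly like this (a sketch; the exact title escaping depends on the input):

---
title: "Real world uses for OpenSSL"
Date: 2005-02-13 23:42
Author: admin
Category: Articles, Presentations and Certifications
Slug: real-world-uses-for-openssl
Status: published
---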

The script takes the existing post and appends a “---” before and after the front matter. It also escapes quotes and handles titles that contain a “:”. My posts still had issues with the date format, and the author field wasn’t consistent. To clean up the date I used my good buddy sed:

$ sed -i 's/Date: \(.*\) \(.*\)/Date: \1T\2:00-04:00/g' posts_processed/*.md
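
As an example, a front matter line like this:

Date: 2005-02-13 23:42

becomes:

Date: 2005-02-13T23:42:00-04:00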

To fix the issue with the author I once again turned to sed:

$ sed -i 's/^[Aa]uthor.*/author: matty/' posts_processed/*.md
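
That turns the “Author: admin” line from the example above into:

author: matty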

I had to create a bunch of additional hacks to work around some content consistency issues (NB: content consistency is my biggest takeaway from this project), but the end product is a blog that runs from statically generated content. In a future post I will dive into Hugo and the gotchas I encountered while converting my site. It was a painful process, but luckily the worst is behind me. Now I just need to finish automating a couple of manual processes and blogging will be fun again.

Using docker volumes on SELinux-enabled servers

This article was posted by Matty on 2017-09-30 12:21:00 -0400 EDT

I was doing some testing this week and received the following error when I tried to access a volume inside a container:

$ touch /haproxy/i

touch: cannot touch 'i': Permission denied

When I checked the system logs I saw the following error:

Sep 28 18:40:23 kub1 audit[8881]: AVC avc: denied { write } for pid=8881 comm="touch" name="haproxy" dev="sda1" ino=655362 context=system_u:system_r:container_t:s0:c324,c837 tcontext=unconfined_u:object_r:default_t:s0 tclass=dir permissive=0

The docker container was started with the “-v” option to bind mount a directory from the host:

$ docker run -d -v /haproxy:/haproxy --restart unless-stopped

The error shown above was generated because I didn’t tell my orchestration tool to apply an SELinux label to the volume I was trying to map into the container. In the SELinux world, processes and file system objects are given contexts to describe their purpose. These contexts are then used by the kernel to decide whether a process is allowed to access a file object. To allow a docker container to access a volume on an SELinux-enabled host you need to append the “:z” or “:Z” suffix to the volume mount. These suffixes are thoroughly described in the docker-run manual page:

“To change a label in the container context, you can add either of two suffixes :z or :Z to the volume mount. These suffixes tell Docker to relabel file objects on the shared volumes. The z option tells Docker that two containers share the volume content. As a result, Docker labels the content with a shared content label. Shared volume labels allow all containers to read/write content. The Z option tells Docker to label the content with a private unshared label. Only the current container can use a private volume.”
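
As an aside, if you would rather label the host directory yourself instead of letting Docker relabel it, something along these lines should also satisfy the policy (a sketch; it applies the container_file_t type shown later in this post, but without the per-container MCS categories, so it behaves more like “:z” than “:Z”):

$ sudo chcon -Rt container_file_t /haproxy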

When I added the “Z” suffix to the volume everything worked as expected:

$ docker run -d -v /haproxy:/haproxy:Z --restart unless-stopped

My haproxy instances fired up, life was grand, and the haproxy containers were distributing connections to my back-end servers. Then a question hit me: how does this work under the covers? I started reading and came across two (one & two) excellent posts by Dan Walsh. When a container starts, the processes comprising that container are labeled with an SELinux context. You can run ‘ps -eZ’ or ‘docker inspect …’ to view the context of a container:

$ docker run --name gorp --rm -it -v /foo:/foo fedora:26 /bin/sh

$ docker inspect -f '{{ .ProcessLabel }}' gorp

system_u:system_r:container_t:s0:c31,c878

$ ps -eZ | grep $(docker inspect -f '{{ .State.Pid }}' gorp)

system_u:system_r:container_t:s0:c31,c878 20197 pts/5 00:00:00 sh

In order for the process to be able to write to a volume, the volume needs to be labeled with an SELinux context that the process context has access to. This is the purpose of the ‘[zZ]’ suffixes. If you start a container without the z flag you will receive a permission denied error because the SELinux volume level and the process level don’t match (you can read more about levels here). This may be easier to illustrate with an example. If I start a docker command and mount a volume without the “z” flag we can see that the SELinux levels are different:

$ docker run --name gorp --rm -it -v /foo:/foo fedora:26 /bin/sh

$ docker inspect -f '{{ .ProcessLabel }}' gorp

system_u:system_r:container_t:s0:c21,c30

$ ls -ladZ /foo

drwxr-xr-x. 2 root root system_u:object_r:container_file_t:s0:c135,c579 4096 Sep 29 12:22 /foo

If we tell docker to label the volume with the correct SELinux context prior to performing the bind mount, the levels are updated to allow the container process to access the volume. Here is another example:

$ docker run --name gorp --rm -it -v /foo:/foo:Z fedora:26 /bin/sh

$ docker inspect -f '{{ .ProcessLabel }}' gorp

system_u:system_r:container_t:s0:c126,c135

$ ls -ladZ /foo

drwxr-xr-x. 2 root root system_u:object_r:container_file_t:s0:c126,c135 4096 Sep 30 10:42 /foo

The contexts that apply to docker are defined in the lxc_contexts file:

$ cat /etc/selinux/targeted/contexts/lxc_contexts

process = "system_u:system_r:container_t:s0"
content = "system_u:object_r:virt_var_lib_t:s0"
file = "system_u:object_r:container_file_t:s0"
ro_file="system_u:object_r:container_ro_file_t:s0"
sandbox_kvm_process = "system_u:system_r:svirt_qemu_net_t:s0"
sandbox_kvm_process = "system_u:system_r:svirt_qemu_net_t:s0"
sandbox_lxc_process = "system_u:system_r:container_t:s0"

It’s really interesting how these items are stitched together. You can read more about how this works here and here. You can also read Dan’s article describing why it’s important to leave SELinux enabled.

Which file descriptor (STDOUT, STDERR, etc.) is my application writing to?

This article was posted by Matty on 2017-09-29 09:07:00 -0400 EDT

When developing ansible playbooks, a common pattern is to run a command and use the output in a later task. Here is a simple example:

---
- hosts: localhost
  connection: local
  tasks:
  - name: Check if mlocate is installed
    command: dnf info mlocate
    register: mlocate_output

  - name: Update the locate database
    command: updatedb
    when: '"No matching Packages to list" in mlocate_output.stderr'

In the first task dnf will run, and the output from the command will be written to either STDOUT or STDERR. But how do you know which one? One way is to add a debug statement to your playbook:

---
- hosts: localhost
  connection: local
  tasks:
  - name: Check if mlocate is installed
    command: dnf info mlocate
    register: mlocate_output

  - name: Print the contents of mlocate_output
    debug:
      var: mlocate_output

Once the task runs you can view the stderr and stdout fields to see which of the two is populated:

TASK [Print the contents of mlocate_output]
ok: [localhost] => {
    "mlocate_output": {
        "changed": true,
        "cmd": [
            "dnf",
            "info",
            "mlocate"
        ],
        "delta": "0:00:31.239145",
        "end": "2017-09-27 16:39:46.919038",
        "rc": 0,
        "start": "2017-09-27 16:39:15.679893",
        "stderr": "",
        "stderr_lines": [],
        "stdout": "Last metadata expiration check: 0:43:16 ago on Wed 27 Sep 2017 03:56:05 PM EDT.\nInstalled Packages\nName : mlocate\nVersion : 0.26\nRelease : 16.fc26\nArch : armv7hl\nSize : 366 k\nSource : mlocate-0.26-16.fc26.src.rpm\nRepo : @System\nFrom repo : fedora\nSummary : An utility for finding files by name\nURL : https://fedorahosted.org/mlocate/\nLicense : GPLv2\nDescription : mlocate is a locate/updatedb implementation. It keeps a database\n : of all existing files and allows you to lookup files by name.\n : \n : The 'm' stands for \"merging\": updatedb reuses the existing\n : database to avoid rereading most of the file system, which makes\n : updatedb faster and does not trash the system caches as much as\n : traditional locate implementations.",
.....

In the output above we can see that stderr is empty and stdout contains the output from the command. While this works fine, it requires you to write a playbook and wait for it to run to get feedback. Strace can provide the same information and in most cases is much quicker. To get the same information we can pass the command as an argument to strace and limit the output to just write(2) system calls:

$ strace -yy -s 8192 -e trace=write dnf info mlocate

.....
write(1, "Description : mlocate is a locate/updatedb implementation. It keeps a database of\n : all existing files and allows you to lookup files by name.\n : \n : The 'm' stands for \"merging\": updatedb reuses the existing database to avoid\n : rereading most of the file system, which makes updatedb faster and does not\n : trash the system caches as much as traditional locate implementations.", 442Description : mlocate is a locate/updatedb implementation. It keeps a database of
.....

The first argument to write(2) is the file descriptor being written to; in this case that’s STDOUT (file descriptor 1). This took less than 2 seconds to run, and by observing the first argument to write you know which file descriptor the application is writing to.
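
For comparison, if dnf had written its output to standard error (for example, when the package isn’t installed), the same strace invocation would show writes to file descriptor 2. The output would look roughly like this (illustrative, not captured from a real run):

write(2, "Error: No matching Packages to list\n", 36) = 36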

Working around the ansible "python2 yum module is needed for this module" error

This article was posted by Matty on 2017-09-27 15:59:00 -0400 EDT

During a playbook run I was presented with the following error:

failed: [localhost] (item=[u'yum']) => {"failed": true, "item": ["yum"], "msg": "python2 yum module is needed for this module"}

The role that was executing had a task similar to the following:

- name: Install rsyslog packages
  yum: pkg={{item}} state=installed update_cache=false
  with_items:
    - rsyslog
  notify: Restart rsyslog service

The OS on the system I was trying to update was Fedora 26, which uses the dnf package manager. Dnf is built on top of Python 3, and Fedora 26 no longer includes the yum Python 2 bindings by default (if you want to use the ansible yum module you can create a task to install the yum package). Switching the task to use package instead of yum remedied this issue. Here is the updated task:

- name: Install rsyslog packages
  package: pkg={{item}} state=installed
  with_items:
    - rsyslog
  notify: Restart rsyslog service
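
If you would rather keep using the yum module, installing the yum package (which provides the Python 2 bindings) before the task runs should also work. This is a sketch of the manual equivalent on Fedora 26:

$ sudo dnf install -y yum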

The issue was easy to recognize after reading through the yum module source code. Posting this here in case it helps others.

The subtle differences between the docker ADD and COPY commands

This article was posted by Matty on 2017-09-24 08:48:00 -0400 EDT

This weekend I spent some time cleaning up a number of Dockerfiles and getting them integrated into my build system. Docker provides the ADD and COPY commands to take the contents from a given source and copy them into your container. On the surface both commands appear to do the same thing, but there is one slight difference. The COPY command works solely on files and directories:

The COPY instruction copies new files or directories from <src> and adds them to the file system of the image at the path <dest>.

While the ADD command supports files, directories AND remote URLs:

The ADD instruction copies new files, directories or remote file URLs from <src> and adds them to the file system of the image at the path <dest>.

The additional feature provided by ADD allows you to retrieve remote resources and stash them in your container for use by your applications:

ADD http://prefetch.net/path/to/stuff /stuff
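
A quick way to sanity check the remote retrieval is to build an image containing the ADD instruction above and list the destination (a sketch; addtest is just a placeholder tag, and the URL needs to be reachable at build time):

$ docker build -t addtest .
$ docker run --rm addtest ls -l /stuff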

Some of the Dockerfiles I’ve read through on GitHub have done some extremely interesting things with ADD and remote resource retrieval. Noting this here for future reference.