Containerising Calibre-Server To Read And Annotate Ebooks Via Web Browser
I'm going to an in-person event later this year and, in preparation, attendees have been asked to read (and make notes on) a specific book.
For me, this presents something of a challenge: although I _used to be_ an avid reader, nowadays I really struggle to be able to sit and read. I _think_ it's a combination of a busy life and having two dogs who'll happily find something (usually ball related) for me to do if I seem even _vaguely_ idle for more than a few seconds.
I didn't _particularly_ want to arrive at the event with nothing but excuses, so I decided that the solution was probably to find a way to increase the amount of opportunity that I get to dip into the book.
The easiest way to do that seemed to be to make the book (and any annotations that I might make) available across all of my various devices. That way it would always be to hand if I found myself briefly unoccupied.
The book in question _is_ available on Kindle but, aside from Kindle not properly supporting the majority of my devices, I didn't want to build a workflow around Amazon given that they've stopped users from keeping backups of their purchases.
I needed something different. Although the solution would ideally not be tied to one provider's ecosystem, finding something open and multi-platform is apparently not that easy.
After searching around a bit, I decided that my best hope probably lay with Calibre's content server (it helped that I've previously used Calibre to manage an old Kindle). The catch, though, was that Calibre's quite GUI heavy and I wanted to be able to run this headless on a server rather than on my machine.
In this post, I talk about the customisations that I made to meet my needs, as well as a few that make the eventual solution more secure.
* * *
### Requirements
I had a few requirements that I wanted to meet:
* The solution must be FOSS (which Calibre is)
* Needs to be separate from my "main" library: on the off-chance I end up sending someone a link, I don't need/want them having access to my full library
* Should run as a server rather than needing to be on my personal laptop
* Should come back up automatically after a restart
* **Must** allow annotations and highlighting etc
* * *
### Calibre Image
A quick hunt around revealed that LinuxServer.io have a Calibre image, meaning that running Calibre in Docker should be quite easy.
Using someone else's post as a reference, I spun up a container to play with it:
```bash
docker run \
    --name calibre-server \
    -p 48080:8080 \
    -p 48081:8081 \
    -e PASSWORD='password' \
    -e TZ='Europe/London' \
    -v calibre-test-config:/config \
    linuxserver/calibre:latest
```
This worked but it quickly became clear that the image was heavier and more complex than I really wanted or needed.
By default, the container runs KasmVNC (a web-native VNC server), allowing access to Calibre's GUI from a browser.
This is, undoubtedly, a clever and cool use of technology but it's not _particularly_ convenient for use on mobile devices.
In the top right, though, there's a `Connect/share` button, which reveals a menu that allowed me to start Calibre's content server on port `8081` (exposed by the container as `48081`).
This was _much_ more convenient.
The simple UI allowed me to browse, read and annotate books.
Importantly, it worked just the same on Firefox Mobile.
* * *
#### Shortcomings
The functionality that the container offered was _almost_ ideal, but there were a couple of shortcomings.
The Content Server needed to be manually enabled whenever the container restarted, which meant that I'd need to continue to expose the web-to-VNC service.
My biggest concern, though, came when I `exec`'d into the container to see what processes were running:
```
ps -x
PID TTY STAT TIME COMMAND
1 ? Ss 0:00 /package/admin/s6/command/s6-svscan -d4 -- /run/service
17 ? S 0:00 s6-supervise s6-linux-init-shutdownd
20 ? Ss 0:00 /package/admin/s6-linux-init/command/s6-linux-init-shutdownd -d3 -c /run/s6/basedir -g 3000 -C -B
36 ? S 0:00 s6-supervise svc-pulseaudio
37 ? S 0:00 s6-supervise s6rc-fdholder
38 ? S 0:00 s6-supervise svc-docker
39 ? S 0:00 s6-supervise svc-nginx
40 ? S 0:00 s6-supervise svc-kasmvnc
41 ? S 0:00 s6-supervise svc-kclient
42 ? S 0:00 s6-supervise s6rc-oneshot-runner
43 ? S 0:00 s6-supervise svc-cron
44 ? S 0:00 s6-supervise svc-de
52 ? Ss 0:00 /package/admin/s6/command/s6-ipcserverd -1 -- /package/admin/s6/command/s6-ipcserver-access -v0 -E -l0 -i data/rules -- /package/
214 ? Ss 0:00 bash ./run svc-cron
232 ? Ss 0:00 nginx: master process /usr/sbin/nginx -g daemon off;
242 ? S 0:00 sleep infinity
248 ? Ss 0:00 bash ./run svc-docker
373 ? S 0:00 sleep infinity
374 pts/0 Ss 0:00 bash
584 pts/0 R+ 0:00 ps -x
```
The container was running `docker`, `pulseaudio` and various other services. Some of these were even running with `root` privileges.
This simply wouldn't do.
* * *
### Customising The Container
The Docker image had _what_ I needed; the problem was that it wasn't running it in the _way_ that I needed.
Calibre's GUI isn't the only way to start the content server: there's also the standalone `calibre-server` command. So, I decided to experiment with bypassing the container's entrypoint so that it would run only the bit that I needed.
Invoking `calibre-server` worked OK within the _existing_ container, but failed when I tried to use it with a fresh/empty volume because it expects that some setup will have occurred first.
Although one option was to run the container "normally" on first run, it would have felt a bit defeatist, so I set about figuring out what was needed and wrote a script that could act as the replacement entrypoint:
```bash
#!/bin/bash
#
# Custom entrypoint to configure and launch the
# Calibre content-server

CALIBRE_LIBRARY_PATH=${CALIBRE_LIBRARY_PATH:-"/config/Calibre_Library/"}
CALIBRE_USER=${CALIBRE_USER:-"abc"}

# Preconfigure user auth if it doesn't exist
if [ ! -f /config/.config/calibre/server-users.sqlite ]
then
    calibre-server --userdb /config/.config/calibre/server-users.sqlite --manage-users -- add "$CALIBRE_USER" "$PASSWORD" 2> /dev/null
fi

# Create a library if one doesn't exist
if [ ! -d "$CALIBRE_LIBRARY_PATH" ]
then
    # Create the library dir
    mkdir -p "$CALIBRE_LIBRARY_PATH"

    # It won't be considered a library by Calibre yet, we need to add a book
    # so that the DB gets created.
    # The quoted heredoc delimiter includes the indentation, so the closing
    # EOM line must be indented identically; sed strips it from the output
    cat << "    EOM" | sed -e 's/^    //' > /tmp/welcome.md
    # Welcome

    Welcome to Calibre-Server, preconfigured by Ben Tasker's hacky bash script.

    You should now be able to upload books to your library for reading and annotation.
    EOM

    # Add the book
    calibredb --library-path "$CALIBRE_LIBRARY_PATH" add /tmp/welcome.md
fi

# Start the server
#
# We use basic auth mode here because Calibre will
# use digest by default. We're going to want our SSL
# reverse proxy to send its own creds, which is much
# easier to configure with basic auth
#
calibre-server \
    --listen-on 0.0.0.0 \
    --port 8081 \
    --access-log /dev/stdout \
    --disable-allow-socket-preallocation \
    --enable-auth \
    --disable-use-bonjour \
    --enable-use-sendfile \
    --userdb /config/.config/calibre/server-users.sqlite \
    --auth-mode basic \
    "$CALIBRE_LIBRARY_PATH"
```
Note: there's a copy of this script on Codeberg.
Having written this script, I made adjustments to my docker invocation:
* Port `8080` was no longer required as the web-to-VNC stuff wasn't being run
* I mapped in the new entrypoint script
* I overrode the container's entrypoint so that it would use the script
The run command looked like this:
```bash
# Start from a clean slate
docker volume rm calibre-test-config

docker run \
    --name calibre-server \
    -p 48081:8081 \
    -e CALIBRE_USER="abc" \
    -e PASSWORD='1Ch4ng3d7h15R3411y!' \
    -e TZ='Europe/London' \
    -v calibre-test-config:/config \
    -v "$PWD/entrypoint.sh:/entrypoint.sh" \
    --entrypoint="/entrypoint.sh" \
    linuxserver/calibre:latest
```
Logs from the first run showed the preconfiguration adding my Welcome book:
```
Added book ids: 1
calibre server listening on 0.0.0.0:8081
```
Subsequent runs correctly detected that the library existed and so didn't attempt to re-insert the book.
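One caveat with that detection: the script only checks whether the library _directory_ exists. If the directory is already there but was never seeded (a pre-created bind mount, say, or a first run interrupted before `calibredb` finished), seeding would be skipped. A slightly stricter check is to look for `metadata.db`, the SQLite database Calibre keeps at the root of every library. This is a minimal sketch rather than part of the original script, and the function name `is_calibre_library` is hypothetical:

```bash
#!/bin/bash
# Hypothetical helper (not in the original entrypoint): treat a directory
# as a Calibre library only if Calibre's metadata.db exists at its root.
is_calibre_library() {
    local lib_path="$1"
    # Strip one trailing slash so the default "/config/Calibre_Library/"
    # doesn't produce a double slash in the path
    [ -f "${lib_path%/}/metadata.db" ]
}
```

In the entrypoint, `if [ ! -d "$CALIBRE_LIBRARY_PATH" ]` would become `if ! is_calibre_library "$CALIBRE_LIBRARY_PATH"`; the existing `mkdir -p` is harmless when the directory is already present.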
`calibre-server` was listening and I was able to upload, read and annotate ebooks using my web browser.
Just as importantly, the container was now running far fewer processes:
```
docker exec -it 118affff481b ps -x
PID TTY STAT TIME COMMAND
1 ? Ss 0:00 /bin/bash /entrypoint.sh
12 ? Sl 0:00 /opt/calibre/bin/calibre-server --listen-on 0.0.0.0 --port 8081 --access-log /dev/stdout --disable-allow-socket-preallocation --e
24 ? S 0:00 /opt/calibre/bin/calibre-parallel --pipe-worker from calibre.utils.safe_atexit import main; main()
33 pts/0 Rs+ 0:00 ps -x
```
Whilst this was a significant improvement, it wasn't perfect: the other stuff was _still_ in the container and so could potentially still present attack surface.
Whilst it was true that an adversary would now need to start by exploiting an issue in `calibre-server`, vulnerability chaining _is_ a thing and the existence of other tooling can sometimes help promote a minor flaw into a bad day. Ultimately, this was _still_ a container running with root privileges with tools like `docker` inside it.
I decided to look at having the entrypoint script remove unnecessary packages by doing something like:
```bash
apt remove pulseaudio nginx docker-ce cron
```
However, when I looked at the list of installed packages, I found that the image was **much** fatter than I'd originally realised:
```
dpkg-query --list | wc -l
729
```
* * *
#### Can we use Wolfi?
Rather than spending time figuring out which of those packages could safely be removed, I decided that it'd probably make sense to start over from a more secure base: enter Wolfi.
Wolfi is an (un)distro intended as a secure _and minimal_ base for container images.
Wolfi's packages are regularly rebuilt in order to remediate CVEs but, unfortunately, there wasn't a package for Calibre, so I wouldn't get the full benefit of this.
However, there not being a package wasn't a major blocker: it just meant that I needed to manually install Calibre into the image. _Ideally_ I'd have liked to build it from source, but Calibre's build process is pretty complex so I settled for pulling binaries (which is also what the linuxserver image does).
* * *
##### Calibre Dependencies
Calibre depends on `PyQt6` which, in turn, depends on a number of shared libraries, so the list of dependencies was a little longer than I'd like.
Unfortunately, I also had to build that list manually because `PyQt6` was installed with `pip`, which doesn't pay a _huge_ amount of attention to non-Python dependencies.
This involved a bit of trial and error because the only clue that something was wrong came when Calibre threw exceptions like this (though only ever one at a time):
> ImportError: cannot import name 'QWebEnginePage' from 'qt.webengine'
>
> ImportError: cannot import name 'QBrush' from 'qt.core'
This was odd, because these modules are provided by `PyQt6` which _definitely_ was installed.
Running `calibre_postinstall` threw similar exceptions, but its output also gave a clue about the underlying issue:
> Failed to import PyQt module: PyQt6.QtWebEngineCore with error: libxkbfile.so.1: cannot open shared object file: No such file or directory
Although `PyQt6` was installed, it couldn't be imported because some of _its_ dependencies weren't.
Getting this working was a case of:
1. Running `calibre_postinstall`
2. Extracting the missing library name from the output
3. Using `apk search` to identify the package which provided the library
4. `apk add`ing it (and updating the Dockerfile)
5. `goto 1` until exceptions stop
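Step 2 of that loop (pulling the library name out of the error) is mechanical enough to script. The sketch below is hypothetical, not something from the original workflow: it reads `calibre_postinstall` output on stdin and prints each shared library reported as missing, assuming the errors keep the `<soname>: cannot open shared object file` shape shown above.

```bash
#!/bin/bash
# Hypothetical helper: extract the unique shared-library names that
# calibre_postinstall reports as missing, one per line.
missing_libs() {
    # Match "<soname>: cannot open shared object file", then strip the
    # trailing message so only the soname (e.g. libxkbfile.so.1) remains
    grep -oE '[A-Za-z0-9._+-]+\.so(\.[0-9]+)*: cannot open shared object file' \
        | sed -e 's/: cannot open shared object file$//' \
        | sort -u
}
```

Used as `/opt/calibre/calibre_postinstall 2>&1 | missing_libs`, it prints the names to feed into `apk search`; picking the right package from those results is still a manual step.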
It turned out that I needed 15 additional packages.
* * *
##### The `Dockerfile`
After working through the dependencies, I arrived at the following Dockerfile:
```dockerfile
FROM cgr.dev/chainguard/wolfi-base

# Install Calibre
RUN --mount=type=cache,target=/var/cache/apk \
    apk add python3 curl jq py3-pip libxcb libnss \
        libpcre2-8-0 libglvnd libfontconfig1 libxkbcommon \
        libxcomposite libxdamage libxext libxrandr \
        libxtst libdrm alsa-lib mesa-gbm libxkbfile \
    && pip install pyqt6 \
    && mkdir /opt/calibre /config \
    && latest=$(curl -s "https://api.github.com/repos/kovidgoyal/calibre/releases/latest" | jq -r '.assets[] | select(.name | contains("x86_64")) | .browser_download_url' ) \
    && curl -L -s -o /tmp/release.txz "$latest" \
    && tar xvf /tmp/release.txz -C /opt/calibre \
    && rm -f /tmp/release.txz \
    && /opt/calibre/calibre_postinstall

COPY entrypoint.sh /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
```
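The `jq` filter above keeps every release asset whose name merely _contains_ `x86_64`. If a release ever carried more than one such asset, `$latest` would hold several URLs and the download step would fail. A stricter variant anchors on the tarball suffix and insists on exactly one match. This is a hypothetical sketch that assumes Calibre's Linux tarballs are named like `calibre-8.5.0-x86_64.txz` (the asset names in the test are illustrative); inside the Dockerfile, the same idea could be expressed as `select(.name | endswith("x86_64.txz"))`.

```bash
#!/bin/bash
# Hypothetical helper: from asset names or download URLs on stdin (one per
# line), print the single Linux x86_64 tarball, or fail if there isn't
# exactly one. The "-x86_64.txz" suffix is an assumption about naming.
pick_linux_asset() {
    local matches count
    # "--" stops grep treating the leading "-" of the pattern as an option
    matches=$(grep -E -- '-x86_64\.txz$')
    count=$(printf '%s' "$matches" | grep -c .)
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one x86_64 tarball, found $count" >&2
        return 1
    fi
    printf '%s\n' "$matches"
}
```

Because `browser_download_url` values end with the asset's file name, the suffix match works on the URLs just as well as on the bare names.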
One build later, I had a Wolfi based container running Calibre Content Server.
Despite the dependency list being longer than I'd wanted, the container still had far less in it than the linuxserver image had:
```
apk list -I | wc -l
82
```
I wasn't done yet, though, as there was still room for improvement:
* The application was still running as root
* There were a few packages that could be removed post-install
I tagged the following onto the end of the existing `RUN` statement:
```dockerfile
    && apk del curl jq py3-pip wolfi-keys wolfi-base apk-tools \
    && adduser -u 1000 -D calibre \
    && chown calibre /config
```
And then added a line to have the container drop permissions:
```dockerfile
USER calibre
```
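`USER` only sets the default, so something like `docker run --user 0` would still start the process as root. One possible belt-and-braces measure is a guard at the top of the entrypoint that refuses to continue as root. The sketch below is hypothetical: `refuse_root` is not part of the published image, and it takes an optional UID argument purely so that it can be exercised without actually switching users.

```bash
#!/bin/bash
# Hypothetical guard for entrypoint.sh: fail fast if running as root.
# Defaults to the current UID; an explicit UID can be passed for testing.
refuse_root() {
    local uid="${1:-$(id -u)}"
    if [ "$uid" -eq 0 ]; then
        echo "refusing to run calibre-server as root" >&2
        return 1
    fi
}
```

In the entrypoint it would be called as `refuse_root || exit 1` before any of the setup steps.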
My container image was ready, so I built and published it:
```bash
docker tag test codeberg.org/bentasker/calibre-content-server-docker:8.5.0
docker push codeberg.org/bentasker/calibre-content-server-docker:8.5.0
```
As a final indicator of the benefits of this, my image was 44% of the size of the linuxserver one:
```
$ docker image ls | grep calibre
codeberg.org/bentasker/calibre-content-server-docker 8.5.0 84b9667dc463 7 hours ago 1.29GB
linuxserver/calibre 8.5.0 347f2c1b5fe2 39 hours ago 2.91GB
```
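The 44% figure is just the ratio of the two sizes reported above (1.29GB / 2.91GB ≈ 0.443). Bash's `$(( ))` arithmetic is integer-only, so this sketch hands the division to `awk`; the function name `percent_of` is hypothetical.

```bash
#!/bin/bash
# Express one size as a whole-number percentage of another.
# awk is used because bash arithmetic can't handle decimals like 1.29.
percent_of() {
    awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.0f\n", part / whole * 100 }'
}
```

`percent_of 1.29 2.91` prints `44`.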
Unfortunately, both Calibre and PyQt6 are quite large so there wasn't much scope to reduce the size further.
You can see the final Dockerfile on Codeberg.
* * *
### Deploying
I was now ready to actually deploy into production, which meant:
* Standing a container up on the host box
* Acquiring an SSL cert
* Configuring my Nginx reverse proxy
Note: I could also have dropped it into my Kubernetes cluster, but decided that it was better to keep things simple in case I ended up needing to troubleshoot issues.
I don't use named volumes in prod: instead I bind mount from the host filesystem (allowing backups to be performed with `rsync`), so I started by creating the directory structure and setting the owner to the UID used by the container:
```bash
mkdir -p /home/ben/docker_files/Calibre_Web/data
sudo chown 1000 /home/ben/docker_files/Calibre_Web/data
```
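If the host directory's owner doesn't match the UID the container runs as, `calibre-server` won't be able to write its library or user database. A quick pre-flight check might look like the sketch below. The helper `owned_by_uid` is hypothetical, and it assumes GNU coreutils (or BusyBox) `stat`, since the BSD/macOS `stat` uses different flags.

```bash
#!/bin/bash
# Hypothetical pre-flight check: succeed only if the directory exists and
# is owned by the given numeric UID (1000 for the container's calibre user).
owned_by_uid() {
    local dir="$1" want_uid="$2"
    [ -d "$dir" ] && [ "$(stat -c %u "$dir")" = "$want_uid" ]
}
```

For example: `owned_by_uid /home/ben/docker_files/Calibre_Web/data 1000 || echo "fix ownership before starting the container"`.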
Next, I started the container:
```bash
CALIBRE_PASS="UseSomethingSecure"

docker run \
    -d \
    --restart=always \
    --name calibre-server \
    -p 48081:8081 \
    -e CALIBRE_USER="abc" \
    -e PASSWORD="$CALIBRE_PASS" \
    -e TZ='Europe/London' \
    -v /home/ben/docker_files/Calibre_Web/data:/config \
    codeberg.org/bentasker/calibre-content-server-docker:8.5.0
```
`calibre-server` was up and I was able to hit it directly and browse to my welcome book.
* * *
#### Fronting with Nginx
Note: because this isn't a fresh deployment of Nginx, some of this may be a little specific to me.
I created a simple server on port 80:
```nginx
server {
    listen 80;
    root /usr/share/nginx/letsencryptbase;
    index index.php index.html index.htm;
    server_name calibre.example.com;

    location / {
        return 301 https://$host$request_uri;
        add_header X-Clacks-Overhead "GNU Terry Pratchett";
    }

    # Serve ACME challenges for certbot; 404 anything missing
    location /.well-known/ {
        try_files $uri =404;
    }
}
```
I won't go into depth here, but next I created DNS records and used `certbot` to acquire an SSL certificate for that subdomain.
I was almost ready to configure the HTTPS `server` block. However, first I needed to construct the `Authorization` header that the proxy would send to Calibre.
I _could_ just have let the proxy pass through the client provided auth header, but I prefer the flexibility of managing auth within Nginx (and I still wanted auth turned on in Calibre so that random devices on the LAN couldn't so easily hit it directly).
Basic authorization is just the base64 encoding of `username:password`, so in a shell I ran:
```bash
echo -n "abc:${CALIBRE_PASS}" | base64
```
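A slightly more defensive version of that one-liner produces the whole header value in one go. It's a sketch with a hypothetical function name: `printf` sidesteps the shells where `echo -n` isn't supported, and `tr -d '\n'` removes the line breaks that GNU `base64` inserts every 76 characters, which would otherwise corrupt the header for a long password.

```bash
#!/bin/bash
# Build the value for nginx's `proxy_set_header Authorization` line
# from a username and password (HTTP Basic auth: base64 of "user:pass").
basic_auth_header() {
    local user="$1" pass="$2"
    printf 'Basic %s\n' "$(printf '%s:%s' "$user" "$pass" | base64 | tr -d '\n')"
}
```

`basic_auth_header abc password` prints `Basic YWJjOnBhc3N3b3Jk`, which can be pasted straight into the config between the quotes.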
I took note of the result and started to write the Nginx config:
```nginx
server {
    listen 443 ssl;
    root /usr/share/nginx/empty;
    index index.php index.html index.htm;
    server_name calibre.example.com;

    ssl_certificate /certs/live/calibre.example.com/fullchain.pem;
    ssl_certificate_key /certs/live/calibre.example.com/privkey.pem;

    location / {
        proxy_pass http://192.168.13.5:48081;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header Host $http_host;

        # Set this to the output of the shell call
        proxy_set_header Authorization "Basic ZGlkIHlvdSB0aGluayBJIGZvcmdvdCB0byByZWRhY3QgdGhpcz8=";

        # Bump the maximum request size up so that
        # ebooks can be uploaded
        client_max_body_size 20m;

        # Allow the LAN without a password; everyone
        # else has to pass basic auth
        satisfy any;
        allow 192.168.13.0/24;
        deny all;
        auth_basic "Authentication is a must";
        auth_basic_user_file /etc/nginx/htpasswd-files/developers;

        add_header X-Clacks-Overhead "GNU Terry Pratchett";
    }
}
```
With this live, I was able to use HTTPS to access Calibre.
* * *
#### Tangent: Obsidian Integration
I use Obsidian for a range of things and one of the aspects that I like about it is its extensibility - there's a vast range of community plugins to make it do new and wonderful things.
It turned out that there's a Calibre integration plugin for Obsidian, which communicates with the Content Server.
So, as well as being able to read books in a web browser, I can also use Obsidian.
All of the underlying functionality (annotations etc) works. That's not too surprising as, after all, Obsidian's an Electron app and so is really just a glorified web browser.
The only minor issue with the plugin is that it doesn't expose a way to provide credentials, so it'll only work from IPs that I've allow-listed (which is fine, because portable devices tend to be on my tailnet).
* * *
### Conclusion
I now have a web based instance of Calibre which allows me to read and annotate ebooks on any device with a web-browser.
Admittedly, highlighting and annotating small passages using a touch screen is a _little_ fiddly but it otherwise seems to work well enough.
On Android, using it as a web app seems to work particularly well, with support for swiping to turn the page.
Pleasingly, there's also _sort of_ an offline mode (the docs note that it's not as fully featured as it could have been).
Hopefully, all of this should enable me to pick the book up and progress a little whenever I've a few minutes spare, something that wouldn't be as possible otherwise.
Of course, arguably, the time that I've spent doing and writing about this _could_ instead have been used to, err, _read the book_.