uge_l2_rdbms_python_proto

UGE / L2 / Intro to relational databases / Python project prototype

Background and goals

The introductory course to relational databases ends with a project illustrating the use of a relational database together with a simple CRUD web application.

The Python language could be a better fit compared to PHP for this kind of assignment as both students and the teaching staff are already familiar with the former from preceding units.

To showcase Python as a platform suitable for this purpose, a small demo project specified in ./demo-project-specs.md will be implemented.

The present document aims to motivate the functional and technical choices which have been made.

Interpretation of the specifications

Undefined aspects of the specifications are clarified as follows, with the intent of keeping the project as simple as possible:

the platform only deals with one currency (simoleons, arbitrarily),
the platform only supports one language (English, arbitrarily),
each user only has one account/wallet/balance (all three terms synonymous),
there is no notion of financial order (transactions are immediate),
only the effective transactions are logged, the rejected ones are not,
fees are billed on top of internal transfers instead of taking a cut.

Database model

Entity-relationship model

The database model is fully described as types, tables and constraints in ./sql/tables.sql. The file is annotated with comments and remarks explaining the design choices and highlights some additional constraints which remain to be enforced at upper levels.

An entity-relationship diagram of the model can be generated from the SQL table descriptions using a tool like SQLFairy (note: SQLFairy crashes with recent PostgreSQL dialects, so it is not usable here).

Consistency

The consistency of the account balance and the financial transaction log can be ensured through multiple means such as executing modification and insertions in SQL transactions at the application level, or using triggers within the database itself. The former option is chosen over the latter to avoid implicit actions.

Additional constraints to prevent alterations of the financial transaction log such as deletions are omitted here as the web application will act as the sole gateway to interact with the database and will offer no way of doing this kind of operations. For the same reason, the use of stored procedures was not retained.

Web application

Platform

Interface with the database

Object-Relational Mappers (ORM) such as SQLAlchemy are sometimes used to partially or fully abstract the database and its entities behind objects in the Object-Oriented Programming (OOP) sense. With those tools, it is possible to generate tables, indexes, and constraints from annotated classes. They also allow automatic querying and implicit joins to perform information retrieval, and inserting and updating records in the database directly by setting fields on linked objects.

However, the object-relational impedance mismatch problems often render the use of ORMs impractical especially for complex queries, more often than not resulting in the developers bypassing the ORM altogether. The tools also tend to introduce performance issues due to the implicit queries they generate in situations in which developers do not always expect them to.

In the context of an database course aiming to teach SQL, such level of abstraction is also a hindrance by itself. Therefore, the use of an ORM has been excluded for this project.

More direct and less intricate forms of mappings, such as transparent result tuples to Python immutable named tuples conversion, are preferred instead. Features of this kind, which are directly provided by database drivers like psycopg, are often sufficient by themselves.

It remains desirable to segregate SQL queries into their own modules in order to follow the Single-Responsibility Principle (SRP). Fortunately, libraries such as embrace allow easily deriving Data Access Objects (DAO) from sets of annotated raw SQL queries, exposing them as callable Python functions.

CGI scripts vs. Services

The most simple and primitive way of making Python programs accessible from a web browser would be through the Common Gateway Interface (CGI). Through it, the user is able to execute the scripts by visiting a matching URL, to then visualise its output directly in their browser.

The other approach is to make the program a server application as a persistent service process, in charge of accepting, handling and replying to network requests by itself.

The former, identical to the way the oldest PHP scripts are run, has become less popular in favour of the latter, mainly for performance and maintainability concerns for real applications outside of trivial, independent, and self-contained scripts. For those reasons, this project will be implemented as a service process.

Comprehensive vs. Light-weight frameworks

While bare Python could be used to create such a web service, it is desirable to work at a higher abstraction level to focus on the application-specific features rather than the standard protocol implementation details.

Python frameworks such as Django offer comprehensive solutions for building web applications based on the Model-View-Controller (MVC or MTV/MVT) design pattern, ensuring separation of concerns while abstracting lower level logic. This kind of complete solutions provide routing logic to map requests to user-defined handlers, and are integrated with a Object-Relational Mapper as well as with a templating engine to generate HTML pages dynamically.

Other more light-weight frameworks such as Flask or the more recent FastAPI instead focus on the first part, taking care of unpacking and routing requests to the user-defined handlers, while leaving the rest to the application developers. This approach does not fully impose an entire environment, and allows better composability with libraries which can be freely chosen.

Because the use of an ORM is not desirable in this project for the reasons detailed in a previous section, the choices of frameworks is limited to these light-weight frameworks. Here, FastAPI is preferred over Flask due to its more modern architecture, using parameters and dependency injection over thread-local global variables. The chosen templating engine is Jinja for its simplicity.

Project structure overview

./sql/
- tables.sql: database initialisation statements
- queries.sql: annotated SQL queries (from which the DAO is derived)
./app/
- app_database.py: database connection pool and transaction helper
- app_sessions.py: (signed cookies) session data management helpers
- app_templating.py: template rendering helper
- app_{account,wallet}.py: page-specific request handlers
- app.py: FastAPI web application entry point
./templates/: Jinja HTML templates
./static/: static web resources (stylesheets)
./flake.nix: project runtime and development environment description

Security considerations

SQL injections are prevented by using proper query parameters substitution, automatically handled by the embrace and psycopg libraries.
Injections in rendered views are prevented by the automatic sanitisation of inserted variables by the Jinja templating engine.
Passwords are salted and hashed when stored in the database to ensure some minimal protection of the data at rest. The hashing is handled by the passlib library, which also covers algorithm migrations.
Cross-Site Request Forgery (CSRF) attacks are mitigated through the conjunctive use of POST requests for user actions and SameSite restrictions for session cookies. (note: this will become a sufficient protection only when support in browsers will become ubiquitous).
In its current state, the application does not implement any kind of rate limiting. Such restriction would be needed for real world applications in order to mitigate account password brute-force attacks, but also to prevent users from avoiding transfer fees by sending many small unbilled transactions.

Going further

Technical platform improvements

Form validation could be refactored to make constraints more easily composable and to improve error reporting to users with a per-field granularity.
Error handling could be improved by the use of exceptions to be turned into user-friendly error messages in a generic way, either through decorators or with a middleware.
Request handlers could be made fully asynchronous after migrating to psycopg3 to improve performance under a large amount of requests.
Database schema migration will need to be implemented to ease the deployment of subsequent versions of the application.

Functional project improvements

A minimum transaction amount needs to be introduced to prevent users from dodging the transaction fees by sending small unbillable amounts.
The financial transaction log on the wallet page gets longer over time and needs to be properly paginated.
Some stronger form of authentication could be used, either by delegating the authentication to some other provider, or by adding multiple-factor authentication.
Users should be offered the possibility to modify their account data, delete their account, and to download all the data concerning them.
The creation of new user accounts could require some email validation. Users could be notified via email of incoming money transfers.
The user interface could be translated into multiple languages with the help of an internationalisation/localisation library such as Babel.

Development environment

The development and execution environment is fully described using the Nix language and package manager as a Nix Flake in ./flake.nix. This allows creating reproducible environments containing all the software dependencies of the program itself, as well as optional development tools.

(Note: this projects requires Nix version 2.4 or higher).

This environment can be used on NixOS, MacOS, or any Linux system having Nix installed. In principle, Nix should also be usable on Windows through the Windows Subsystem for Linux (WSL) compatibility layer. Full virtual machines and containers can also be derived from the same description file.

All the commands in this section have to be run within the provided development shell, which can be entered by running the following command at the root directory of the project:

nix develop

Local database

The Nix Flake development shell provides its own self-contained PostgreSQL server, configured to operate independently of any other instances running on the same system.

All data are written to the ./development_database/pgadata directory. The database server can be initialised by running the following command:

initdb --no-locale --encoding UTF8 --auth-host reject --auth-local peer

The local development PostgreSQL server can then be started by running the following command, with the $PGHOST environment variable automatically set by the development shell:

postgres -h "" -k "$PGHOST" -d 2

This server listens to local requests through a UNIX domain socket located at ./development_database/.s.PGSQL.5432, to which programs run in the development shell will implicitly automatically connect to.

The development shell ships with both the psql and pgcli tools to interact directly with the database. The latter provides additional features such as syntax highlighting and better auto-completion.

A new local database for the application can be created and its table can be initialised with:

createdb app
psql app < ./sql/tables.sql

Should the need arise, this database can be deleted with the following command before being created and initialised again:

dropdb app

Local application server

The server application can be started in development mode with:

uvicorn \
  --reload-dir app \
  --reload-dir templates \
  --reload \
  --app-dir app \
  app:main

This server will listen to incoming requests to a locally bound port. It will automatically reload itself when its files are edited, and display logs about type checking and runtime errors.

Production deployment

While the deployment phase is way out of the scope of a database introductory course, asking students to deploy their application on the Internet could be a motivational factor. It would also facilitate evaluating and grading projects, avoiding deployment and testing hassles to the teaching staff.

Standard daemon

The Nix Flake provides a Nix package which can be used to run the web application in a production context as a daemon, managed by a standard init system.

Docker container

A Docker container can also be derived from this package to be deployed on popular cloud hosting services.

Example of container deployment

An example of deployment procedure using the free tier provided by the Heroku hosting service is given below. This makes use of the skopeo tool to upload the Docker container to the service.

# Log in to an Heroku account.
heroku login

# Create a new project with a random name, hosted somewhere in Europe.
heroku create --region eu

# Set a local environment variable with the assigned name of the project so
# that the next commands operate on it.
export HEROKU_APP='name of the created app'

# Set a randomly-generated signed cookie secret key for our application.
heroku config:set COOKIE_SECRET_KEY=$(pwgen --secure 128 1)

# Attach a PostgreSQL database to the newly created app.
# This sets a connection URL in "DATABASE_URL" in the server's environment,
# containing the confidential database username and password.
heroku addons:create heroku-postgresql:hobby-dev --version=13

# Create the tables in the database.
heroku psql < ./sql/tables.sql

# Prepare a Docker container.
# This creates a Docker archive streaming script as `./result`.
nix build .#docker

# Log in to the Heroku container registry to upload our container.
skopeo login --username _ --password $(heroku auth:token) registry.heroku.com

# Upload the Docker image to Heroku (uploading about ~200MB).
./result \
| gzip --fast \
| skopeo --insecure-policy copy \
    docker-archive:/dev/stdin \
    docker://registry.heroku.com/$HEROKU_APP/web

# Deploy and launch the uploaded container.
heroku container:release web

# If all went well, the app should now be deployed and accessible on
# https://$HEROKU_APP.herokuapp.com
heroku open

# If not, logs can be remotely inspected for debugging.
heroku logs --tail

Conclusion

It took approximatively three days of work to the author to architecture and implement the demo project, with prior basic knowledge of SQL and Python, but with no prior familiarity with the used libraries (FastAPI, Jinja, psycopg, nor embrace). The extensive documentation of those dependencies has provided examples to quickly obtain a working application. The tooling and development environment allowed a quick, friction-less setup phase.

The level of abstraction provided by the framework and libraries allowed focusing on the database schema and queries aspects, without needing to develop a deep understanding of the intricate protocols and standards linked to web systems. Their APIs did not impose, but did encourage the developer to follow some best-practices regarding the structure (separation of concerns) and the security (proper query and template parameterisation, session handling) aspects of the application.

Copyright and licensing

This project is distributed under the terms of European Union Public Licence version 1.2, a copy of which is provided in ./licence.txt.