A brief look at this self-developed blog system

This Blog is completely native 64-bit x86 code (implemented in ObjectPascal with some x86 inline assembler), because I don't like script-based Blog and CMS systems, as they often have countless security holes, and that's not what I want on my servers. The same applies to SQL databases, where SQL injection and similar attacks are still often a problem.

The Blog system also has its own epoll-based HTTPD with several worker threads and its own small but capable in-memory key/value database engine with a JSON on-disk data format. In other words, the HTTPD is the Blog itself.

It runs on its own dedicated virtual server, where nothing else is running except SSH and the like, with Debian Stretch, two permanently dedicated Intel Xeon E5-2680 v4 CPU cores, 6 GB of permanently allocated RAM, 40 GB of permanently allocated SSD storage and a 1 Gbit/s Ethernet connection to the Internet, at least at the time of writing.

While it is running, all database operations are processed entirely in memory. The database is loaded completely from the JSON file into memory at daemon startup, and it is written back to the JSON file at daemon shutdown or whenever you explicitly request it in the admin interface. The indexes are rebuilt completely on every database load and on every content update rather than being stored in the file, which saves disk space. On writeback, the old database JSON file is rotated, as a safety mechanism in case something goes wrong during the write. As a result it's pretty fast. Admittedly, this approach is not suitable for larger datasets, but it's sufficient for a small Blog like this one.
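The load, reindex and writeback cycle described above can be sketched roughly as follows. This is only an illustration in Python, not the ObjectPascal code of the Blog; the class name `TinyStore`, the `tags` index, and the `.tmp` and `.old` file suffixes are all assumptions made for the example.

```python
import json
import os


class TinyStore:
    """Hypothetical in-memory key/value store persisted as one JSON file."""

    def __init__(self, path):
        self.path = path
        self.records = {}   # key -> record (a dict)
        self.by_tag = {}    # index: tag -> set of keys, never written to disk

    def load(self):
        # Read the whole file once at startup, then rebuild the index.
        if os.path.exists(self.path):
            with open(self.path, "r", encoding="utf-8") as f:
                self.records = json.load(f)
        self.reindex()

    def reindex(self):
        # Full rebuild: simple, and fine for a small dataset.
        self.by_tag = {}
        for key, record in self.records.items():
            for tag in record.get("tags", []):
                self.by_tag.setdefault(tag, set()).add(key)

    def put(self, key, record):
        self.records[key] = record
        self.reindex()

    def save(self):
        # Write to a temporary file first, keep the previous file as a
        # backup (the "rotation"), then move the new file into place.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.records, f)
        if os.path.exists(self.path):
            os.replace(self.path, self.path + ".old")
        os.replace(tmp, self.path)
```

If the process dies while writing the temporary file, the previous database file is still untouched, which is the point of rotating instead of overwriting in place.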

As said before, the Blog's own HTTPD uses epoll. On a target system with Linux kernel 4.5 or newer it uses EPOLLEXCLUSIVE; otherwise it falls back to EPOLLONESHOT, which has to be set again after each event, in contrast to EPOLLEXCLUSIVE, which needs to be set only once.
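A minimal sketch of such a fallback decision, in Python for illustration only. The flag values are the ones from the Linux header `<linux/eventpoll.h>`; deciding by parsing the kernel release string is an assumption (the HTTPD might just as well try EPOLLEXCLUSIVE and fall back when `epoll_ctl` rejects it), and the function names are hypothetical.

```python
import platform
import re

# Flag values as defined in the Linux UAPI header <linux/eventpoll.h>.
EPOLLIN = 0x001
EPOLLEXCLUSIVE = 1 << 28
EPOLLONESHOT = 1 << 30


def kernel_version(release):
    """Parse a release string such as '4.9.0-8-amd64' into (4, 9)."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if not m:
        raise ValueError("unrecognized kernel release: %r" % release)
    return int(m.group(1)), int(m.group(2))


def base_events(release=None):
    """Return (event mask, needs_rearm) for registering a descriptor."""
    if release is None:
        release = platform.release()
    if kernel_version(release) >= (4, 5):
        # Set once at registration time; nothing to do after each event.
        return EPOLLIN | EPOLLEXCLUSIVE, False
    # One-shot: the descriptor is disabled after each event and must be
    # re-armed with EPOLL_CTL_MOD before it reports again.
    return EPOLLIN | EPOLLONESHOT, True
```

With `needs_rearm` set, a worker would have to issue an extra `epoll_ctl` call after handling every event, which is the overhead the EPOLLEXCLUSIVE path avoids.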

When the HTTPD is in daemon mode, after the daemon fork call, it consists of two system processes: a guarding process and the actual process with its many worker threads, where each worker thread calls epoll_wait on the same epoll object. It creates twice as many worker threads as it finds logical CPU cores. Each worker thread can do everything: accept clients, serve clients, handle client timeouts, and so on, in a continuous non-blocking way. There are no restrictions or distinctions between the threads.
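The "identical workers on one shared event source" idea can be illustrated roughly like this. It is only a sketch in Python: a `queue.Queue` stands in for the shared epoll object, and `worker_count`, `run_workers` and the `None` shutdown sentinel are hypothetical names and conventions chosen for the example.

```python
import os
import queue
import threading


def worker_count(logical_cores=None):
    # Twice as many workers as logical CPU cores, as described above.
    cores = logical_cores or os.cpu_count() or 1
    return 2 * cores


def run_workers(events, handle, count):
    """Start `count` identical workers that all wait on the same source."""
    def loop():
        while True:
            event = events.get()   # stand-in for epoll_wait on a shared fd
            if event is None:      # sentinel: shut this worker down
                return
            handle(event)          # accept, read, write, timeout: anything

    threads = [threading.Thread(target=loop) for _ in range(count)]
    for t in threads:
        t.start()
    return threads
```

Because no thread has a special role, whichever worker wakes up first handles the event, and adding cores only means starting more copies of the same loop.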

The HTTP request parser is a DFA state machine that also runs asynchronously or, to put it more precisely, in a continuous non-blocking way, so that a worker thread can serve several clients at the same time without blocking for too long. Each byte entry in the DFA state transition array also carries an action-flag bitmask, which says, for example: this character belongs to the HTTP request method, this character belongs to the URI, this character belongs to the HTTP version, this character belongs to the key of a key/value pair, this character belongs to the value of a key/value pair, this character belongs to the final CRLF-CRLF sequence, and so on.
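A heavily simplified sketch of a table-driven parser of this kind, in Python for illustration. The states, the flag names and the extra `F_COMMIT` flag are assumptions made for this example; it handles only an HTTP 1.x request line and simple headers, and is not the Blog's actual table.

```python
# Action flags: what the current input byte belongs to (hypothetical names).
F_METHOD, F_URI, F_VERSION, F_KEY, F_VALUE, F_COMMIT, F_END = (
    1, 2, 4, 8, 16, 32, 64)

# Parser states.
(S_METHOD, S_URI, S_VERSION, S_LINE_LF, S_LINE_START,
 S_KEY, S_VALUE, S_END_LF, S_DONE, S_ERROR) = range(10)


def build_table():
    """Build table[state][byte] = (next_state, action_flags)."""
    err = (S_ERROR, 0)
    table = [[err] * 256 for _ in range(10)]

    def fill(state, default, special):
        table[state] = [default] * 256
        for ch, entry in special.items():
            table[state][ord(ch)] = entry

    fill(S_METHOD, (S_METHOD, F_METHOD),
         {" ": (S_URI, 0), "\r": err, "\n": err})
    fill(S_URI, (S_URI, F_URI),
         {" ": (S_VERSION, 0), "\r": err, "\n": err})
    fill(S_VERSION, (S_VERSION, F_VERSION),
         {"\r": (S_LINE_LF, 0), "\n": err})
    fill(S_LINE_LF, err, {"\n": (S_LINE_START, 0)})
    fill(S_LINE_START, (S_KEY, F_KEY),
         {"\r": (S_END_LF, F_END), "\n": err, ":": err})
    fill(S_KEY, (S_KEY, F_KEY),
         {":": (S_VALUE, 0), "\r": err, "\n": err})
    fill(S_VALUE, (S_VALUE, F_VALUE),
         {"\r": (S_LINE_LF, F_COMMIT), "\n": err})
    fill(S_END_LF, err, {"\n": (S_DONE, F_END)})
    return table


TABLE = build_table()


class RequestParser:
    def __init__(self):
        self.state = S_METHOD
        self.method = bytearray()
        self.uri = bytearray()
        self.version = bytearray()
        self.headers = {}
        self._key = bytearray()
        self._value = bytearray()

    def feed(self, data):
        """Consume the bytes available right now; return how many were used."""
        for i, byte in enumerate(data):
            if self.state in (S_DONE, S_ERROR):
                return i
            self.state, flags = TABLE[self.state][byte]
            if flags & F_METHOD:
                self.method.append(byte)
            if flags & F_URI:
                self.uri.append(byte)
            if flags & F_VERSION:
                self.version.append(byte)
            if flags & F_KEY:
                self._key.append(byte)
            if flags & F_VALUE:
                self._value.append(byte)
            if flags & F_COMMIT:
                # End of one header line: store the pair, reset the buffers.
                key = bytes(self._key).lower()
                self.headers[key] = bytes(self._value).strip()
                self._key.clear()
                self._value.clear()
        return len(data)
```

Since all parser state lives in the object, `feed` can be called with whatever fragment a non-blocking read returned, and the worker can move on to another client until more bytes arrive.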

The HTTPD also has two LRU-based caches, a general page request cache and a search query cache, and it has two ports open: 80 for normal HTTP and 443 for HTTPS. It supports HTTP 0.9, 1.0 and 1.1, including gzip compression, chunked transfer, keep-alive, and so on. I plan to support HTTP/2 later, as soon as I have enough time and motivation to implement an HTTP/2 support layer in the Blog's own HTTPD.
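For readers unfamiliar with the term, an LRU cache of the kind mentioned above can be sketched in a few lines. This Python version is only an illustration; the class name, the capacity parameter and the single lock are assumptions, not details of the Blog's caches.

```python
import threading
from collections import OrderedDict


class LRUCache:
    """Small least-recently-used cache, safe to share between threads."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # oldest entry first, newest last
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            if key not in self.items:
                return None
            self.items.move_to_end(key)   # mark as most recently used
            return self.items[key]

    def put(self, key, value):
        with self.lock:
            self.items[key] = value
            self.items.move_to_end(key)
            while len(self.items) > self.capacity:
                self.items.popitem(last=False)   # evict least recently used
```

A page cache would use the request path as the key and the rendered response as the value; a search cache would use the normalized query string.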

Otherwise the HTTPD is also a normal HTTPD with access to a default htdocs directory containing the CSS files, JS files, images and so on.

The Blog HTML code is currently hardcoded in the HTTPD itself, which I will eventually replace with my own template system, again as soon as I have enough time and motivation to implement such a template system.

As the JS-based HTML5 WYSIWYG editor I wrote something of my own, named HypraEditor (demo here), because third-party solutions like TinyMCE, CKEditor, etc. were not acceptable to me: they are either not free of charge (even if they supposedly also have open source community variants) or have external third-party JS framework dependencies (such as jQuery).

The password for the admin interface is not transmitted as plaintext via HTTPS but as a SHA-512 hash with two salts: one is generated by the HTTPD itself, and the other is generated by the client via JavaScript. The second salt is time-dependent (based on UTC) and is only valid for about 6 minutes, or more precisely, for about 2 minutes with a tolerance of ±2 minutes, so that the HTTPD can generate the same second salt for the hash check during the login attempt. In other words, the clocks of server and client must agree to within ±2 minutes for a login attempt to succeed, but regardless of the time zone, since both sides convert to UTC.
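One way such a time-dependent second salt could work is sketched below, in Python for illustration. The 120-second slot length, the way the three inputs are joined, the helper names, and what exactly the server keeps as `secret` are all assumptions; the text above does not specify them.

```python
import hashlib
import hmac

WINDOW_SECONDS = 120   # assumed length of one time slot


def time_salt(unix_time, offset=0):
    # Unix time counts from a UTC epoch, so local time zones do not matter.
    return str(int(unix_time) // WINDOW_SECONDS + offset)


def login_hash(secret, server_salt, slot_salt):
    # SHA-512 over both salts and the secret (the join format is assumed).
    data = "\n".join((server_salt, slot_salt, secret)).encode("utf-8")
    return hashlib.sha512(data).hexdigest()


def check_login(received, secret, server_salt, now):
    # Accept the current slot plus one slot on either side, which gives
    # the roughly six-minute validity described above.
    for offset in (-1, 0, 1):
        expected = login_hash(secret, server_salt, time_salt(now, offset))
        if hmac.compare_digest(received, expected):
            return True
    return False
```

The client would compute `login_hash` with its own clock, and the server recomputes it for the neighbouring slots, so a captured hash stops being accepted a few minutes later.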

The post and page UUIDs are generated with my own implementation of the arc4random cryptographically secure pseudo-random number generator (CSPRNG), but with ChaCha20 instead of RC4 as its basic building block. It draws on many entropy sources when filling the entropy pool, including the time, /dev/random, /dev/srandom, /dev/urandom, /proc/sys/kernel/random/uuid, environment variables, the x86 rdrand instruction (if available), the x86 rdseed instruction (if available), and so on.
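Turning 16 bytes of CSPRNG output into a UUID string can look roughly like the following Python sketch. Here `os.urandom` is only a stand-in for the ChaCha20-based generator described above, and the assumption that the UUIDs follow the RFC 4122 version 4 layout is mine, not something the text states.

```python
import os


def uuid_from_random(random_bytes=os.urandom):
    """Format 16 random bytes as a version 4 (random) UUID string."""
    raw = bytearray(random_bytes(16))
    raw[6] = (raw[6] & 0x0F) | 0x40   # set the version field to 4
    raw[8] = (raw[8] & 0x3F) | 0x80   # set the RFC 4122 variant bits
    h = raw.hex()
    return "-".join((h[0:8], h[8:12], h[12:16], h[16:20], h[20:32]))
```

Passing the byte source in as a parameter makes it easy to swap in a different generator, or a fixed one for testing.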

Critical errors are written to the syslog API in a systemd-compatible way, and incoming HTTP requests are written to an access.log file in the usual extended HTTPD access log format.
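Assuming that the "extended" format means the widely used Combined Log Format known from Apache, which is a guess, one log line could be produced like this. The Python function below is a hypothetical illustration and does not escape quotes inside the logged fields.

```python
from datetime import datetime, timezone


def access_log_line(client, when, request_line, status, size,
                    referer="-", user_agent="-"):
    """Format one request in Combined Log Format (identity and user unset)."""
    stamp = when.strftime("%d/%b/%Y:%H:%M:%S %z")
    return '%s - - [%s] "%s" %d %d "%s" "%s"' % (
        client, stamp, request_line, status, size, referer, user_agent)
```

The two extra quoted fields at the end (referer and user agent) are what distinguish this format from the shorter Common Log Format.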

And it uses only IPv6 sockets, but in such a way that IPv4 clients can also connect.
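The usual way to get this behaviour is a dual-stack socket: an IPv6 listener with the IPV6_V6ONLY option switched off, so that IPv4 clients show up as IPv4-mapped IPv6 addresses. A minimal Python sketch follows; the function names are hypothetical, and that the Blog does it exactly this way is an assumption.

```python
import ipaddress
import socket


def open_dual_stack_listener(port):
    """Listen on all IPv6 and IPv4 addresses with a single socket."""
    sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # 0 = also accept IPv4 clients, which then appear as ::ffff:a.b.c.d
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
    sock.bind(("::", port))
    sock.listen()
    return sock


def display_address(peer):
    """Show IPv4-mapped peers in plain IPv4 notation, e.g. for logging."""
    addr = ipaddress.IPv6Address(peer)
    return str(addr.ipv4_mapped or addr)
```

For example, `display_address("::ffff:192.0.2.1")` gives `192.0.2.1`, while a native IPv6 peer address is returned unchanged.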

And here you can also have a short look at the admin interface of this blog: 

A small side note: the admin interface session ID visible in the URL in the YouTube video is only the first part of the entire session ID, because the other part is transferred as a short-lived CSRF token cookie. So the session ID visible in the URL in the video is useless on its own without the second part.
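The split-session idea can be sketched as follows, in Python for illustration. The token lengths, the function names and the way the two parts are compared are assumptions made for this example, not details of the Blog's admin interface.

```python
import hmac
import secrets


def new_session():
    """Create a session ID in two halves: one for the URL, one for a cookie."""
    url_part = secrets.token_hex(16)
    cookie_part = secrets.token_hex(16)
    return url_part, cookie_part


def session_matches(stored, url_part, cookie_part):
    # Both halves must be present and correct; compare in constant time.
    candidate = (url_part or "") + (cookie_part or "")
    return hmac.compare_digest(candidate, stored[0] + stored[1])
```

Someone who only sees the URL half, for example in a screenshot or a video, fails the check because the cookie half is missing.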

And finally, here are some lines-of-code statistics:
  • The main Pascal source code file has about 16022 lines of code
  • The PasMP.pas Pascal source code file has about 13909 lines of code
  • The PasDblStrUtils.pas Pascal source code file has about 3969 lines of code
  • The PasJSON.pas Pascal source code file has about 2205 lines of code
  • The PUCU.pas Pascal source code file has about 45687 lines of code
  • So about 81792 lines of code in total.