By Francesco in C — 14 Nov 2024

Building an HTTP server in C

This article will be updated during the process...

Ok, let's start this loooong journey! Remember, this is not a tutorial on how to build it, just my journey.

I'm not a native English speaker, so sorry for any mistakes I might make.

The goal is simple: build a semi-functional HTTP server. You can find all of the docs and code here. The name is simple but effective: cws. Yes, C Web Server.
However, before typing random keys on my keyboard and hoping that everything works out of the box, I needed to learn some basic Internet stuff. That's why I read Beej's Guide to Network Programming and learned about the TCP/IP stack. You can find all the code on my GitHub profile.

Why C?

I mean, why not? Jokes aside, I chose C because I want to improve my low-level coding skills. This could open a debate, as some people argue that C is a high-level language, but compared to Python, isn't it low-level? Anyway, let's skip that for now.
Additionally, easy projects often don't push you enough to learn something valuable. Sure, they might be quick wins, but they don't offer the depth that comes from more challenging tasks.

Setting up the project

Let’s start by choosing the right build system. Makefile? Nah, let’s go with the newer and easier Meson Build System! Meson itself is written in Python, and the project is configured through a short, readable meson.build file (which is why it’s so easy). While this wasn’t my first time using Meson, I definitely learned more about it during this project.

Documentation

I wrote it. Why? For me. The future me should be able to understand what happened in the code that the "previous me" wrote months (or years) ago. I can’t remember everything! I'm putting a lot of effort into writing it.

First steps

I began by writing some basic network code: a simple TCP client and server. The server listens for incoming connections, and when a client connects, it sends a "Hello, there!" message. It’s a straightforward example to get started.
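
Here's a minimal sketch of that first server, assuming IPv4 and blocking sockets. It is not the actual cws code: the names hello_listen, hello_greet and hello_serve are made up for illustration.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a TCP socket listening on every IPv4 interface.
 * Returns the listening descriptor, or -1 on error. */
int hello_listen(uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    int yes = 1; /* lets the server restart quickly while developing */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Send the greeting to one connected client. send() may write fewer
 * bytes than requested, so keep going until everything is out.
 * Returns 0 on success, -1 on error. */
int hello_greet(int client_fd) {
    static const char msg[] = "Hello, there!";
    size_t len = sizeof msg - 1, sent = 0;

    assert(client_fd >= 0);
    while (sent < len) {
        ssize_t n = send(client_fd, msg + sent, len - sent, 0);
        if (n < 0) return -1;
        sent += (size_t)n;
    }
    return 0;
}

/* Accept clients one at a time, greet each of them, then hang up. */
void hello_serve(int listen_fd) {
    for (;;) {
        int client = accept(listen_fd, NULL, NULL);
        if (client < 0) continue;
        hello_greet(client);
        close(client);
    }
}
```

Calling hello_serve(hello_listen(8080)) from main and connecting with a tool such as nc should show the greeting (the port number is just an example). This version handles one client at a time.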

Next, I extended the project by implementing similar functionality on the client side, creating the foundation for a basic chatroom. This setup allowed multiple clients to connect and exchange messages, simulating a simple chat environment.

Epoll
I thought to myself, "How can I handle a large number of clients without blocking my server?" The answer: non-blocking sockets and epoll. epoll is an I/O event notification mechanism that helps efficiently manage multiple client connections. Learn more here.
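
The two ingredients mentioned above can be sketched with a few helpers. This is a Linux-only sketch under my own assumptions (level-triggered epoll, read events only), and the helper names set_nonblocking, watch_readable and wait_one_ready are hypothetical, not taken from cws.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Put a descriptor in non-blocking mode: read/accept on it will
 * return -1 with EAGAIN instead of waiting.
 * Returns 0 on success, -1 on error. */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Ask the epoll instance to report when fd has data to read.
 * Returns 0 on success, -1 on error. */
int watch_readable(int epfd, int fd) {
    struct epoll_event ev;
    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN;   /* level-triggered "readable" events */
    ev.data.fd = fd;       /* handed back to us by epoll_wait */
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Wait up to timeout_ms for one event.
 * Returns the ready descriptor, or -1 on timeout or error. */
int wait_one_ready(int epfd, int timeout_ms) {
    struct epoll_event ev;
    assert(epfd >= 0);
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);
    return n == 1 ? ev.data.fd : -1;
}
```

In a server loop, the listening socket and every accepted client would be made non-blocking and registered with watch_readable on an instance created by epoll_create1(0). When the ready descriptor is the listening socket, the server calls accept; otherwise it reads from that client. A real loop would ask epoll_wait for a whole array of events at once instead of one.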

Hash table
This was one of my favorite parts of the journey. I decided to write a hash map from scratch, even though I had no prior experience with them. It turned out to be a lot of fun! In about a day, I managed to create what I think is a decent implementation, thanks to Wikipedia and the help of some folks online. (Yes, I got stuck on collision resolution and had to ask for help!)

Here’s a quick summary: a hash map uses a hash function to compute an index for storing values in a bucket array. However, sometimes different keys can produce the same index, leading to collisions. To handle this, you need a collision resolution strategy. I went with the chaining method, using a linked list for each bucket.

Was it overengineered? Absolutely. But I saw it as an opportunity to learn and explore something new, and it was totally worth it!
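
For reference, the chaining idea described above fits in a small sketch. This is not the cws hash map: it assumes string keys, int values, a fixed number of buckets and the djb2 hash, and all the hm_ names are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HM_BUCKETS 64 /* assumption: fixed bucket count, no resizing */

typedef struct hm_node {
    char *key;
    int value;
    struct hm_node *next; /* next entry that landed in the same bucket */
} hm_node;

typedef struct {
    hm_node *buckets[HM_BUCKETS];
} hashmap;

void hm_init(hashmap *m) {
    for (size_t i = 0; i < HM_BUCKETS; i++) m->buckets[i] = NULL;
}

/* djb2 string hash, reduced to a bucket index. */
static size_t hm_index(const char *key) {
    unsigned long h = 5381;
    for (; *key; key++) h = h * 33 + (unsigned char)*key;
    return h % HM_BUCKETS;
}

/* Insert a key or update its value. Returns 0, or -1 if out of memory. */
int hm_put(hashmap *m, const char *key, int value) {
    assert(m && key);
    size_t i = hm_index(key);
    for (hm_node *n = m->buckets[i]; n; n = n->next)
        if (strcmp(n->key, key) == 0) { n->value = value; return 0; }

    hm_node *n = malloc(sizeof *n);
    if (!n) return -1;
    size_t len = strlen(key) + 1;
    n->key = malloc(len);
    if (!n->key) { free(n); return -1; }
    memcpy(n->key, key, len);
    n->value = value;
    n->next = m->buckets[i]; /* collision: push onto the bucket's list */
    m->buckets[i] = n;
    return 0;
}

/* Returns 1 and stores the value in *out if key exists, else 0. */
int hm_get(const hashmap *m, const char *key, int *out) {
    assert(m && key && out);
    for (const hm_node *n = m->buckets[hm_index(key)]; n; n = n->next)
        if (strcmp(n->key, key) == 0) { *out = n->value; return 1; }
    return 0;
}

/* Release every node and key; the map is empty afterwards. */
void hm_free(hashmap *m) {
    for (size_t i = 0; i < HM_BUCKETS; i++) {
        hm_node *n = m->buckets[i];
        while (n) {
            hm_node *next = n->next;
            free(n->key);
            free(n);
            n = next;
        }
        m->buckets[i] = NULL;
    }
}
```

With 64 buckets, storing more than 64 keys guarantees that some of them share a bucket, and lookups then walk that bucket's linked list comparing keys.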

HTTP requests
Now I have to understand how HTTP requests work.
What have I learned? A client sends a request, and the server parses it and sends back a response. Easy...? It wasn't easy for me to implement the server-side code. Here's an example of an HTTP request:

GET / HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Host: www.tutorialspoint.com
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
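
The first line of a request like this one has three space-separated parts: the method, the requested resource and the version. Here's a rough sketch of pulling them apart. The request_line struct, the buffer sizes and the function name are my own assumptions, and a real parser would need to be stricter about the input.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the three parts of the request line. */
typedef struct {
    char method[16];
    char target[256];
    char version[16];
} request_line;

/* Parse "METHOD target VERSION" from the first line of raw.
 * Returns 0 on success, -1 if the line is malformed or too long. */
int parse_request_line(const char *raw, request_line *out) {
    char line[512];
    char extra;

    assert(raw && out);

    /* Copy only the first line, so headers cannot leak into the parse. */
    size_t len = strcspn(raw, "\r\n");
    if (len >= sizeof line) return -1;
    memcpy(line, raw, len);
    line[len] = '\0';

    /* Field widths leave room for the terminating NUL. The trailing %c
     * only matches if a fourth token is present, which is an error. */
    if (sscanf(line, "%15s %255s %15s %c",
               out->method, out->target, out->version, &extra) != 3)
        return -1;

    /* The last token must look like an HTTP version, e.g. "HTTP/1.1". */
    if (strncmp(out->version, "HTTP/", 5) != 0) return -1;
    return 0;
}
```

For the request above, this should fill in "GET", "/" and "HTTP/1.1".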

The first line is the request line: it has the method (GET, POST, HEAD, etc.), the location (the requested resource) and the version (the examples here use HTTP/1.1; the latest HTTP version is 3). The following lines are headers. Now an HTTP response example:

HTTP/1.1 200 OK\r\n
Content-Type: text/html\r\n
Content-Length: 88\r\n
Connection: close\r\n
\r\n
<HTML>

Yes, every line of an HTTP request/response ends with \r\n. Why? Because it's written in RFC 2616, section 4 (since superseded by RFC 9112): it's the standard. The first line is the status line, with the version and the status code. Then come the content type of the resource and its length in bytes. The <HTML> part is where the body, the resource itself, goes.
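
To make the \r\n layout concrete, here is a small sketch that assembles a response like the one above into a buffer. The function name and its parameters are hypothetical, and it always sends "Connection: close", which is a simplification on my part.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the status line, the headers, the empty line and the body
 * into buf. Returns the total number of bytes written, or -1 if buf
 * is too small. The result is NOT NUL-terminated after the body. */
int build_response(char *buf, size_t cap, int status, const char *reason,
                   const char *content_type,
                   const char *body, size_t body_len) {
    assert(buf && reason && content_type && body);

    int head = snprintf(buf, cap,
                        "HTTP/1.1 %d %s\r\n"
                        "Content-Type: %s\r\n"
                        "Content-Length: %zu\r\n"
                        "Connection: close\r\n"
                        "\r\n", /* empty line: end of the headers */
                        status, reason, content_type, body_len);

    /* snprintf returns the length it wanted; >= cap means truncation. */
    if (head < 0 || (size_t)head >= cap) return -1;
    if (body_len > cap - (size_t)head) return -1;

    /* memcpy rather than strcpy: a body may contain zero bytes. */
    memcpy(buf + head, body, body_len);
    return head + (int)body_len;
}
```

The returned length, not strlen, is what should be passed to send, since the body is not guaranteed to be text.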

Just an FYI, I spent a lot of time trying to understand why some outputs were just newlines, and then I discovered that in CLion, you need to check the 'Emulate terminal in the output console' option for the print statements to work.

Static files
A basic function of a web server is to serve static files, such as index.html or style.css. This can be achieved by searching for the requested file in a specific root directory (e.g., /var/www/html on Ubuntu) and attempting to open it. If the file exists, the server reads its contents and includes them in the response body.

When a browser requests an HTML file that references other resources, such as CSS files, it will send additional GET requests for those files. The server should respond by simply sending the requested files. Each file type requires an appropriate Content-Type header, such as text/html for HTML files, text/css for CSS files, or application/json for JSON files. More here.
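
One simple way to pick the header value is a lookup on the file extension. The sketch below is only an illustration: the function name, the table contents beyond the three types mentioned above, and the application/octet-stream fallback are my assumptions.

```c
#include <assert.h>
#include <string.h>

/* Map a file name to a Content-Type value based on its extension.
 * Unknown or missing extensions fall back to a generic binary type. */
const char *content_type_for(const char *path) {
    static const struct { const char *ext; const char *type; } table[] = {
        { ".html", "text/html" },
        { ".css",  "text/css" },
        { ".json", "application/json" },
        { ".js",   "text/javascript" },
        { ".png",  "image/png" },
        { ".jpg",  "image/jpeg" },
    };

    assert(path);
    const char *dot = strrchr(path, '.'); /* last dot in the path */
    if (dot) {
        for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
            if (strcmp(dot, table[i].ext) == 0) return table[i].type;
    }
    return "application/octet-stream";
}
```

The comparison is case-sensitive, so a file named INDEX.HTML would get the fallback type in this sketch.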

Images and videos!

It was a nice journey until this moment. I had to rewrite all the response code: you need to open the file in "rb" mode, so you process the file byte by byte, without any interpretation. Then you get the file size using fseek and ftell. Malloc everything, use fread and send the response...
Extra: BUGS! Yes, this code currently makes the browser loop if a resource does not exist in the folder.
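
The "rb", fseek/ftell, malloc and fread steps described above can be sketched as a single helper. This is an illustration under my own assumptions, not the cws code: the name read_file is made up, and it loads the whole file into memory, which will not scale to large videos.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read a whole file into a malloc'd buffer without interpreting it.
 * On success returns the buffer (the caller frees it) and stores the
 * length in *size_out. Returns NULL if the file cannot be read. */
unsigned char *read_file(const char *path, size_t *size_out) {
    assert(path && size_out);

    FILE *f = fopen(path, "rb");  /* "rb": raw bytes, no translation */
    if (!f) return NULL;          /* e.g. the file does not exist */

    /* Seek to the end to learn the size, then rewind. */
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0) size = ftell(f);
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return NULL;
    }

    /* Allocate at least one byte so an empty file is not an error. */
    unsigned char *data = malloc(size > 0 ? (size_t)size : 1);
    if (data && fread(data, 1, (size_t)size, f) != (size_t)size) {
        free(data);
        data = NULL;
    }
    fclose(f);

    if (data) *size_out = (size_t)size;
    return data;
}
```

Because the helper returns NULL for a missing file, a caller can tell that case apart and answer with an error response instead of a body.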

Path Traversal Vulnerability

TODO
