Hosting a static HTTP website on Google Cloud Storage

Benefits: an always-free tier, access to server logs for web analytics, and configurable MIME types (XHTML). Limitation: HTTPS is not free and starts at $18.25/month. Every step is documented below; it is a little verbose but surprisingly simple.

Let's start.

Domain ownership verification

Google Search Console's Domain verification method requires a TXT record, which cannot coexist with the CNAME record added in the next step. So I have to use the URL prefix method instead: put a small verification file in the site root.

Serve static website using HTTP

In the domain registrar's console, set a CNAME record to connect the domain to Cloud Storage:

NAME            TYPE   DATA
sergeykish.com  CNAME  c.storage.googleapis.com.

Install gsutil (part of the Google Cloud SDK) and log in:

$ yay -S google-cloud-sdk
$ gcloud init
You must log in to continue. Would you like to log in (Y/n)?  Y

Create a bucket whose name matches the CNAME above, grant public read access, assign the specialty pages, and upload them:

$ gsutil mb -b on gs://sergeykish.com
$ gsutil iam ch allUsers:objectViewer gs://sergeykish.com
$ gsutil web set -m index.html -e 404.html gs://sergeykish.com
$ gsutil cp index.html gs://sergeykish.com/index.html
$ gsutil cp 404.html gs://sergeykish.com/404.html
Copying file://404.html [Content-Type=text/html]...
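
To confirm that the objects are publicly served, and to see which Content-Type the server actually returns, the response headers can be inspected with curl. The sketch below is a small helper of my own naming (content_type is hypothetical, not part of gsutil or curl) that pulls the Content-Type value out of raw response headers:

```shell
#!/bin/sh
# content_type: hypothetical helper, reads HTTP response headers on stdin
# and prints the value of the first Content-Type header.
content_type() {
    # Strip carriage returns, then match the header name case-insensitively.
    tr -d '\r' | awk -F': *' 'tolower($1) == "content-type" { print $2; exit }'
}

# Usage (requires network access):
#   curl -sI http://sergeykish.com/404.html | content_type
```

The helper only parses text, so it can be tried on saved headers as well as on live curl output.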

By default gsutil derives the Content-Type from the file extension. To set it explicitly, I store the MIME type in the extended attribute user.mime_type:

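$ cat getmime
#!/bin/sh
# Print the MIME type stored on the file given as $1.
getfattr -n user.mime_type --only-values "$1"

$ cat setmime
#!/bin/sh
# Store MIME type $1 on file $2.
setfattr -n user.mime_type -v "$1" "$2"

$ cat cp
#!/bin/sh
# Upload file $1 with the Content-Type taken from its xattr.
gsutil -h "Content-Type:$(./getmime "$1")" cp "$1" "gs://sergeykish.com/$1"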

$ touch foo
$ ./setmime 'application/xhtml+xml' foo
$ ./cp foo
Copying file://foo [Content-Type=application/xhtml+xml]...
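
If a file has no user.mime_type attribute, getfattr fails and the cp script passes an empty Content-Type value. A minimal sketch of a fallback, assuming a small extension table is acceptable for such files; mime_fallback and mime_of are hypothetical names, and the table is only an illustration:

```shell
#!/bin/sh
# mime_fallback: hypothetical extension table, extend as needed.
mime_fallback() {
    case "$1" in
        *.xhtml) echo 'application/xhtml+xml' ;;
        *.html)  echo 'text/html' ;;
        *.css)   echo 'text/css' ;;
        *)       echo 'application/octet-stream' ;;
    esac
}

# mime_of: prefer the stored attribute, fall back to the table when
# getfattr fails (attribute missing, or xattrs unsupported).
mime_of() {
    getfattr -n user.mime_type --only-values "$1" 2>/dev/null || mime_fallback "$1"
}

# Usage, mirroring the cp script above:
#   gsutil -h "Content-Type:$(mime_of "$1")" cp "$1" "gs://sergeykish.com/$1"
```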

Fetch Logs

Create another bucket to store logs, give GCS permission to write logs to it, and enable logging:

$ gsutil mb gs://sergeykish-logs
$ gsutil acl ch -g cloud-storage-analytics@google.com:W gs://sergeykish-logs
$ gsutil logging set on -b gs://sergeykish-logs gs://sergeykish.com

Download the logs:

$ gsutil rsync -r gs://sergeykish-logs logs

GoAccess

$ pacman -S goaccess

Use Google Cloud Storage format:

$ cat ~/.goaccessrc
# Google Cloud Storage
time-format %f
date-format %f
log-format "%x","%h",%^,%^,"%m","%U","%s",%^,"%b","%D",%^,"%R","%u"
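
For reference, the format string follows the column order of the usage logs. The mapping in the comments below is my reading of the Cloud Storage usage log columns, not something GoAccess prints. The first column, time_micros, is a Unix timestamp in microseconds, which can be converted by hand when inspecting a raw log line; micros_to_utc is a hypothetical helper and assumes GNU date:

```shell
#!/bin/sh
# Column -> specifier, in log order (skipped columns use %^):
#   time_micros        %x   timestamp, microseconds since the epoch
#   c_ip               %h   client IP
#   c_ip_type          %^
#   c_ip_region        %^
#   cs_method          %m   request method
#   cs_uri             %U   requested URI
#   sc_status          %s   HTTP status
#   cs_bytes           %^
#   sc_bytes           %b   response size
#   time_taken_micros  %D   time taken, microseconds
#   cs_host            %^
#   cs_referer         %R   referrer
#   cs_user_agent      %u   user agent

# micros_to_utc: hypothetical helper, prints a time_micros value as UTC.
micros_to_utc() {
    date -u -d "@$(( $1 / 1000000 ))" '+%Y-%m-%dT%H:%M:%SZ'
}

micros_to_utc 1588291200000000   # prints 2020-05-01T00:00:00Z
```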

Each log file starts with a header line, which GoAccess cannot parse:

$ goaccess logs/sergeykish.com_usage_2020_05_*
Parsed 1 lines producing the following errors:
Token 'time_micros' doesn't match specifier '%x'
Format Errors - Verify your log/date/time format

A few strategies to remove it from every file:

$ tail -q -n +2 logs/sergeykish.com_usage_* | goaccess
$ sed -s '1d' logs/sergeykish.com_usage_* | goaccess

sed has an advantage: it can also filter out requests from my own IP, as well as HEAD and POST requests:

$ cat report
#!/bin/sh
# By default sed treats multiple input files as one continuous stream,
# so '1d' would delete only the first file's header.
# -s, --separate (GNU sed): treat each file separately.
sed -s '1d;/109\.86\.170\.113/d;/,"HEAD",/d;/,"POST",/d' logs/sergeykish.com_usage_* | goaccess -o report.html
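
The effect of -s is easy to check on two throwaway files. A small sketch, assuming GNU sed; the file names and the sample addresses (from a documentation-only IP range) are made up for the demonstration:

```shell
#!/bin/sh
# Two throwaway files that mimic usage logs: a header line plus one record.
demo_dir=$(mktemp -d)
printf '"time_micros","c_ip"\n"1","198.51.100.1"\n' > "$demo_dir/one"
printf '"time_micros","c_ip"\n"2","198.51.100.2"\n' > "$demo_dir/two"

# Count the header lines that survive each variant.
plain=$(sed '1d' "$demo_dir/one" "$demo_dir/two" | grep -c time_micros)
separate=$(sed -s '1d' "$demo_dir/one" "$demo_dir/two" | grep -c time_micros)

echo "plain=$plain separate=$separate"   # prints plain=1 separate=0
rm -r "$demo_dir"
```

Without -s the second file's header stays in the stream, which is exactly the line GoAccess rejects.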

Report on today's data only:

$ cat report-today
#!/bin/sh
sed -s '1d;/109\.86\.170\.113/d;/,"HEAD",/d;/,"POST",/d' logs/sergeykish.com_usage_$(date +%Y_%m_%d)_* | goaccess -o report-today.html
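
The same report can be produced for any day by turning the date into an argument. A sketch assuming GNU date and the log file naming used above (sergeykish.com_usage_YYYY_MM_DD_...); log_prefix and report_for are hypothetical names, and the own-IP filter is left out for brevity:

```shell
#!/bin/sh
# log_prefix: hypothetical helper, maps a date that GNU date understands
# (default: today) to the matching log file prefix.
log_prefix() {
    printf 'logs/sergeykish.com_usage_%s_' "$(date -d "${1:-today}" +%Y_%m_%d)"
}

# report_for: drops headers, HEAD and POST requests for one day only.
report_for() {
    sed -s '1d;/,"HEAD",/d;/,"POST",/d' "$(log_prefix "$1")"* |
        goaccess -o "report-${1:-today}.html"
}

# Usage:
#   report_for 2020-05-01
#   report_for yesterday
```

As in report-today, date works in the local time zone here, so a day's boundary may not line up exactly with the timestamps in the log file names.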

Done.