Sunday, May 21, 2017

Project Alphaberry - Part 5 - Checkpoint

So here we are, four months into Project Alphaberry. At this point, it is time to look back, reflect, and figure out what the next steps are. I have a running system, it’s pretty cool, and I learned a lot of things along the way, but now we need to rebuild.

Yes, that’s right, rebuild. I turned everything off the other day, getting ready to reformat all the drives, reinstall the Alpha OS, and start from scratch. Why, you ask? Well, sometimes simple is good, but I didn’t go simple, I went really complex. On top of that, a fatal flaw was developing: I mixed my “lab” environment in with my “home” environment. This started to have a bleeding effect. I wanted to try something new, but it didn’t feel “safe enough” to try out: I would lose things, things I cared about, things I needed to save would be lost, and my feelings would get hurt. OK, maybe not all those things, but I needed to rebuild this thing outside my home infrastructure, which is already pretty solid and just what I want.

The core problem: I needed a separate network. When building everything, I didn’t realize how important this would be until I remembered that I wanted the Alphaberry project to be portable. Disconnect the one network cable and one power cable, pick up the “case”, and plug it in somewhere else. This started becoming impossible when I used my home router’s DHCP to statically assign the Pi IP addresses, then dual-purposed the DNS, and finally used the core infrastructure of my home network with the Intel NUCs. Also, I don’t need HA at home for most things, but I was building it to survive and keep 100% uptime. That’s almost impossible at my house: I have power brownouts, and I have kids who like to unplug things. But I have backups and can restore things when I need to. With that HA setup, which for most of my software required three nodes, I started getting confused about what needed to go where.

The secondary problem: hitting a rabbit hole. Well, if I want to run Elasticsearch, I want to do it in a container, and I also want Kibana, but the Elastic containers are like… overly secure and take forever to spin up, so I want to optimize the containers, but… I need a container registry, because I don’t want to put all my trash containers on the internet, but… I remember something about secure registries, I think I need SSL certs, does that mean I need an SSL cert provider at home? Crap, how am I going to publish these containers, I could probably do it from my desktop, but I use my laptop too, oh, GitLab has some cool … that’s an entire CI pipeline, I should play with that, that is pretty awesome, but I need to do it for real, instead of on my sandbox, so I need a registry to…. Wait, what was I doing….

Yes, that was my last mindset. I got stuck in this super nasty rabbit-hole loop. I got so concerned about doing it the right way that I forgot the entire point of this project, which was to build out something cool and awesome and play around with technologies in a cluster.


I bought a Netgate SG-1000 Microfirewall (pfSense). The thing is tiny and only has two network ports: I will plug one into my home network switch (WAN) and one into the Alphaberry network switch (LAN). This is the first step toward the fully segregated environment.
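For the two sides of the firewall to stay cleanly separated, the Alphaberry LAN subnet should not overlap the home subnet on the WAN side. Here is a minimal sketch of that sanity check using Python’s standard ipaddress module. Both subnets are made-up placeholders for illustration, not the real addressing.

```python
import ipaddress

# Hypothetical subnets, chosen only for illustration.
HOME_WAN = ipaddress.ip_network("192.168.1.0/24")       # home network (pfSense WAN side)
ALPHABERRY_LAN = ipaddress.ip_network("10.10.0.0/24")   # cluster network (pfSense LAN side)


def networks_are_segregated(a, b):
    """Return True when the two subnets share no addresses."""
    return not a.overlaps(b)


if __name__ == "__main__":
    print(networks_are_segregated(HOME_WAN, ALPHABERRY_LAN))
```

If the cluster ever gets plugged into a different upstream network, the same check can be rerun with that network’s subnet in place of HOME_WAN.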

Once that is complete, the Alpha will be reformatted with a simpler OS (I am looking at probably Ubuntu Core or CoreOS). This will serve as a Docker Swarm manager and host the non-ARM containers. CentOS is awesome, and I know it fairly well from using it extensively at work, but it’s way overkill for what I need at home. Finally, the Alpha will also be the server that gets hit with the traffic from the external HTTP ports (80/443).

Next, the Pi builds. I will re-flash everything with Raspbian Jessie Lite + Docker CE only. This will drastically simplify these boxes. Of course, there will be some basic setup on them (SSH, NTP, etc.), but the idea is to keep them simple to set up, so I don’t need to run Ansible on them, just flash and go. I have a separate Pi that I will use to build the images, meaning I will keep it up to date manually, then occasionally burn the image from that, and spend a bit of time re-flashing the cluster nodes (would be nice to automate, but I’m staying away from rabbit holes).

As mentioned, I will be using Docker Swarm for all of the container management, and Portainer to view and control it. For monitoring, I will still use ELK with Beats, but will run Metricbeat inside containers, and most likely run the “all in one” ELK container on the Alpha.

The Alpha will contain a simple Git server, meaning plain Git over SSH. It will also have Ansible installed. This will allow me to run some common tasks across the servers (cleanup, restarts, etc.). I won’t be doing any significant system administration here, since “technically” I only need to configure two nodes: the Alpha and the seeder Pi. However, hostnames might get set, because it would just be annoying to have everything named the same.

DHCP and DNS will be handled by the pfSense software. Since the hardware is so damn small, it can be part of the cluster, which means that all of the settings will move with it. I will, however, give the internal DNS its own search domain, which will not overlap with my home DNS ( vs. alphaberry).
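As a rough illustration of what a search domain buys you, here is a simplified Python sketch of how a short hostname gets expanded before lookup. The node name pi-01 is hypothetical, and real resolvers apply extra rules (such as the ndots option) that this ignores.

```python
def candidate_names(hostname, search_domains):
    """Expand a hostname roughly the way a resolver search list would.

    Simplified: a trailing dot marks the name as fully qualified, so it
    is used as-is; otherwise each search domain is tried first, then the
    bare name.
    """
    if hostname.endswith("."):
        return [hostname.rstrip(".")]
    candidates = [f"{hostname}.{domain}" for domain in search_domains]
    candidates.append(hostname)
    return candidates


if __name__ == "__main__":
    # "pi-01" is a made-up node name; "alphaberry" is the cluster's search domain.
    print(candidate_names("pi-01", ["alphaberry"]))
```

With a search domain that differs from the home one, a short name typed inside the cluster network expands to a cluster host rather than something on the home network.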

I will run my own private Docker Registry, but this will be as simple as I can make it, and it will run out of a container. If I do in fact require an SSL cert, I will use a self-signed one internally.

Lastly, I will run Traefik as the ingress for all HTTP access to the cluster. This should require minimal configuration, which can be done through the Dockerfile for the custom container that runs it.

In conclusion, I learned Ansible, Raspbian, a lot more about CentOS, Nomad, more about Consul, Metricbeat, DNS, DHCP, networking, and a whole bunch more system administration. The next phase of this project is building out some simple network diagrams, and then standing something up that I can actually use, without getting lost in the details because I over-engineered for learning.
