
Hi, I'm CyberWatchDoug

Husband, father, fixer.

This is where I share musings on information technology and security.

You can also find me on YouTube, GitHub, Bluesky, and LinkedIn.

Recent Posts:

Advanced Vector Extensions (AVX) and MongoDB

As part of routine homelab maintenance, I investigated upgrading containers that run MongoDB and ran into AVX-related compatibility issues.

Well, this turned out to be anything but routine, as I quickly learned quite a bit about MongoDB's requirements. Specifically, the Advanced Vector Extensions instruction set, or AVX for short.

What is AVX?

AVX is a family of CPU instruction-set extensions that provide SIMD (Single Instruction, Multiple Data) vector operations. AVX exposes wider vector registers and instructions for floating-point and integer math, which allows the CPU to process multiple values in parallel with a single instruction. That makes workloads like compression, cryptography, and numeric computation much faster when developers choose to compile their code to take advantage of it.

In short, AVX is a hardware feature implemented in the processor microarchitecture; software only uses it if the compiler and runtime explicitly enable and emit AVX instructions. There are multiple generations (AVX, AVX2, AVX-512) with increasing register width and capabilities.
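
To make this concrete, here is a minimal sketch (assuming gcc and objdump on an x86-64 Linux box; whether the loop actually vectorizes depends on the compiler version) showing that the same C source only contains AVX instructions when the compiler is allowed to emit them:

cat > sum.c <<'EOF'
int sum(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++) s += a[i];
    return s;
}
EOF

# Allow AVX2: at -O3 the loop can auto-vectorize using 256-bit ymm registers
gcc -O3 -mavx2 -c sum.c -o sum_avx.o
objdump -d sum_avx.o | grep -c ymm    # non-zero: AVX instructions emitted

# Forbid AVX: the compiler falls back to SSE (128-bit xmm) or scalar code
gcc -O3 -mno-avx -c sum.c -o sum_noavx.o
objdump -d sum_noavx.o | grep -c ymm  # 0: no AVX instructions

Running the AVX build on a CPU without AVX support is exactly what produces the illegal-instruction crash discussed below.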

With that technical jargon out of the way, let's turn back to MongoDB's requirements.

What does MongoDB require and why?

Modern MongoDB binaries are frequently built and optimized with modern compiler toolchains that may emit vectorized instructions when available. If a MongoDB binary is compiled with AVX-enabled optimizations, attempting to run it on a CPU that does not support AVX can cause immediate failures (such as illegal-instruction errors) or force slower fallback code paths where the binary provides them. In my limited experience, MongoDB releases from the last couple of years simply fail immediately rather than fall back.

Functionally, AVX affects not only binary compatibility but also performance characteristics. When AVX is present, you often see improved throughput and lower CPU time for the same workload. I've never benchmarked or run significant workloads to notice the difference, but a quick search can easily surface those enlightening details (should you choose to go down that rabbit hole).

Checking CPU Flags

Okay, so the latest and greatest MongoDB requires AVX and will immediately fail if the CPU doesn't support it. Thankfully, there is an easy way to check for this beforehand (on Linux, anyway!).

For Linux, the simplest checks are lscpu and examining flags in /proc/cpuinfo:

# Quick check via lscpu
lscpu | grep -i avx
# Or inspect the raw flags line for the first CPU
grep -m1 '^flags' /proc/cpuinfo

Look for tokens like avx, avx2 (and related tokens such as fma or sse4_2) in the flags string. Those tokens indicate the CPU advertises support for the corresponding instruction sets.
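
If you want a scriptable yes/no rather than eyeballing the flags string, here is a minimal sketch (grep's -w matches whole tokens only, so avx won't also match avx2):

# Exit status is 0 when the token appears in the flags line
grep -qw avx /proc/cpuinfo && echo "avx: yes" || echo "avx: no"
grep -qw avx2 /proc/cpuinfo && echo "avx2: yes" || echo "avx2: no"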

Inside a container, the visible CPU flags are generally the host's flags. You can probe them from a short-lived container as well:

# Docker (reading /proc/cpuinfo does not require a privileged container):
docker run --rm alpine:3.23 grep -m1 '^flags' /proc/cpuinfo

# Podman:
podman run --rm alpine:3.23 grep -m1 '^flags' /proc/cpuinfo

# Kubernetes (one-off pod, deleted when the command exits):
kubectl run -it --rm test-pod --image=alpine --restart=Never -- grep -m1 '^flags' /proc/cpuinfo

For Windows, you can use the Sysinternals Coreinfo utility (coreinfo -f) to enumerate supported instruction sets, or run the Linux checks from WSL if it's available.

For cloud VMs, it is best to consult the provider's VM SKU documentation, as many public cloud providers list the CPU generation and whether the instance exposes AVX/AVX2. If the documentation doesn't say, you'll need to check the CPU flags on a running instance yourself.

In any case (Linux, Windows, or cloud), I have found the QEMU/KVM CPU model configuration documentation to be a helpful reference, assuming you know your CPU model/family.

Confirming CPU flags was never on my bingo card until now. It matters both as a compatibility check (will the binary run at all?) and as an operational one (are you getting the expected performance?).

Note that some hypervisor configurations or nested-virtualization setups can hide or alter the exposed CPU features. This was the case for my Proxmox VE servers: I was still using the default x86-64-v2-AES CPU type for my VMs, which does not include the AVX-relevant flags. I'll detail those fixes in another post, so stay tuned!

Mitigations if AVX is missing

If you find a host without AVX support, practical mitigations include:

  • Selecting a different VM or host SKU whose CPUs advertise AVX/AVX2.
  • Selecting a different CPU type for your VM that advertises AVX/AVX2 (see the sketch after this list).
  • Building MongoDB from source with conservative compiler flags that avoid AVX, producing a compatible binary for older CPUs (not something I recommend).
  • Using an earlier MongoDB binary that targets older instruction sets (weighing the security/support implications; also not recommended).
  • Running MongoDB on managed services (e.g., Atlas) or on dedicated nodes provisioned with compatible CPUs.
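
For the second option, here is a minimal sketch of what that looks like on Proxmox VE (assuming a hypothetical VM ID of 100; x86-64-v3 is the first of the generic x86-64 CPU model levels to include AVX/AVX2):

# Switch the VM's CPU type from the x86-64-v2-AES default (no AVX) to x86-64-v3
qm set 100 --cpu x86-64-v3

# After a full VM stop/start, verify from inside the guest
grep -qw avx2 /proc/cpuinfo && echo "AVX2 now visible"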

General Recommendations

So, in addition to the usual sysadmin duties I perform on my homelab, I now have a few more items to build out and add to my routine:

  • Use or add a pre-install check that inspects CPU flags on each VM, container, or node intended to run MongoDB, to catch incompatibilities early (the scriptable check from earlier is a good starting point).
  • If using Kubernetes, label nodes with a capability marker (for example cpu.avx=true) and use node selectors/affinity for MongoDB pods so they only schedule onto compatible hosts (see the sketch after this list).
  • Ensure CI and test runners either match the minimum CPU feature set of production or build multiple artifacts targeting different instruction sets.
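
For the node-labeling idea, here is a minimal sketch (worker-01 is a hypothetical node name, and cpu.avx=true is just the example marker from above, not a Kubernetes convention; the Node Feature Discovery add-on can automate this kind of labeling):

# Label each node whose CPU advertises AVX
kubectl label node worker-01 cpu.avx=true

# Then constrain MongoDB pods to those nodes with a nodeSelector in the pod spec:
#   nodeSelector:
#     cpu.avx: "true"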

That last one is more of a sanity check to keep me thinking in terms of production vs. testing. Because let's be honest: nothing in my homelab is technically production, even though I rely on it. It's all just under development, all the time!

Taken together, these steps reduce surprise failures and let you take advantage of vectorized performance when software developers make it available.

Perspective Shift - SME to Advisor

Early in my cybersecurity journey, I believed my value lay in finding vulnerabilities, maintaining detection capabilities, and keeping adversaries at bay. The work was enjoyable, and it let me focus on technical skills while building a strong foundation for problem-solving and analytical thinking.

Building Confidence in Uncomfortable Conversations

I've never been great at uncomfortable conversations. Not at work, and not at home.

There was always something in me that zeroed in on the uncomfortable aspects and amplified my anxiety. I'm sure there are plenty of explanations I could research to understand why, but that's not the aim of this post (maybe another time).

Every tough discussion felt like a minefield, and I couldn't help but want to avoid them altogether. As I got older and wiser, I learned that the difficult, challenging parts of life are where real growth happens. I began exploring ways to practice and overcome this tendency.

Quick Wins by Reading API Reference

I know it can be tedious, and more of a time sink, to read through an API reference or schema. But it is absolutely worth it, especially if you also take notes for yourself.

Let me explain. I have been wrapping my head around how to get up and running with CloudNativePG for my databases. I'm starting with Linkding, but I have plans to expand to a half dozen other self-hosted apps.

I didn't want to just copy someone else's manifest, or ask GenAI to create one for me. I took the time to read what each field was for, and went down the rabbit hole: first the Cluster reference, then ClusterSpec, and several deeper fields within it.

From Proxmox to Kubernetes - Evolving My Homelab (part 3)

This is part 3 of my series detailing a transition of my Homelab architecture to using Kubernetes with Proxmox. You can check out the previous parts here:

In part 3, I’ll explore my implementation strategy for Kubernetes and how it has transformed my homelab.

The Strategic Shift to Kubernetes

Kubernetes has become the logical next step for my homelab. I've pointed this out briefly in parts 1 and 2. I enjoy the break-and-fix cycle of learning. However, over the years I have found more joy in making things work consistently, from the very start, every time.

Updating Linkding Deployment With Cloudflare Tunnels

This is an update to my previous post regarding my process for creating a Linkding self-hosted service with FluxCD in my homelab Kubernetes cluster.

You can read the original post here: Creating Linkding Deployment with FluxCD


From Proxmox to Kubernetes - Evolving My Homelab (part 2)

This is part 2 of my series detailing a transition of my Homelab architecture to using Kubernetes with Proxmox. You can check out part one here: From Proxmox to Kubernetes - Evolving My Homelab (part 1)

In this post, I continue my journey evolving my homelab from a simple Proxmox setup to a more robust Kubernetes-based architecture. Building on part one, I’ll share what worked, what didn’t, and how my approach to self-hosting and automation has changed over time.

Secrets Management With External Secrets Operator and 1Password (part 2)

In this second part of my Secrets Management with External Secrets Operator (ESO) and 1Password series, I will detail how I configured my ESO deployment through GitOps using Flux, Kustomization, and Secrets resources. You can read the first part here: Secrets Management With External Secrets Operator and 1Password (part 1).

A recap on why ESO: the goal of the operator is to synchronize secrets from external providers into Kubernetes Secrets, so they can be more easily accessed and used throughout the cluster.

All of these configuration files can be found in my homelab GitHub repository located here: https://github.com/cyberwatchdoug/homelab/tree/main

Updated Linkding Deployment with FluxCD

This is an update to my previous post regarding my process for creating a Linkding self-hosted service with FluxCD in my homelab Kubernetes cluster.

You can read the original post here: Creating Linkding Deployment with FluxCD

After getting my Linkding deployment working through my FluxCD GitOps lifecycle, I quickly realized that I was missing some key functionality and configuration steps.

The first one being that my Linkding app wasn't being exposed for me to access locally on my network. It was only accessible from a node in my cluster that could reach the cluster IP addresses. This is a problem if I'm planning to use Linkding for all my bookmarks!

The next one being that the Deployment does not declare any superuser account. In the original version of my Deployment, I had to run an exec command inside the container to create my superuser name and password before I could ever log in. This used a Python script and was very tedious! Not what I want if my aim is a declarative, stateful Deployment that could be applied to a brand new Kubernetes cluster with a superuser already set up and configured. I have the PersistentVolumeClaim set up for the data directory to persist within the cluster, but an initial or bootstrap deploy to a brand new cluster would not result in any superuser account getting created. This relates to the idea of idempotency: I want the Deployment to be applied the first time, and any number of times after that, without changing the outcome beyond the initial deployment.

These updates support declarative, repeatable deployments of Linkding and improve security by not hardcoding credentials.
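
For reference, linkding can bootstrap its initial superuser from environment variables (LD_SUPERUSER_NAME and LD_SUPERUSER_PASSWORD), which is what makes the declarative approach possible. A minimal sketch of the idea, using a hypothetical secret name (the actual manifests live in the repo linked below):

# Keep credentials out of the manifest by sourcing them from a Kubernetes Secret
# (in my setup the Secret would come from ESO/1Password rather than --from-literal)
kubectl create secret generic linkding-superuser \
  --from-literal=LD_SUPERUSER_NAME=admin \
  --from-literal=LD_SUPERUSER_PASSWORD='change-me'
kubectl set env deployment/linkding --from=secret/linkding-superuser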

For a full breakdown of this updated structure to my Linkding Deployment you can check out my homelab GitHub repository at https://github.com/cyberwatchdoug/homelab/tree/main

Secrets Management With External Secrets Operator and 1Password (part 1)

In this first part of my Secrets Management with External Secrets Operator (ESO) and 1Password series, I'm going to detail how to get ESO deployed through GitOps using Flux, Kustomization resources, and Helm resources. All of these configuration files can be found in my homelab GitHub repository located here: https://github.com/cyberwatchdoug/homelab/tree/main

What exactly is External Secrets Operator, and why should we use it? Great question. ESO is a Kubernetes operator that solves the problem of managing Kubernetes secrets that live in external sources. The list of providers is lengthy, but it includes important players like AWS, Google, Azure, HashiCorp, CyberArk, and 1Password. The goal of the operator is to synchronize secrets from these external sources into Kubernetes secrets, so they can be more easily accessed and used throughout the cluster.