cross-posted from: https://discuss.online/post/39090314

Full layout on my forum here. The basic idea is 3 thin clients that fit within:

  • 1U space requirement
  • Possibly adding a 2.5 GbE NIC to the PCIe low-profile slot of each HP t740 for Ceph shared storage.
  • Kubernetes and/or Proxmox experiments
    • Adding some sort of remote management via JetKVM, PiKVM, or similar.
    • Additional expansion no doubt needed for connecting to the three nodes.
+----------------------------------------------------------+
|                    CLUSTER TOTALS                        |
+----------------------+-----------------------------------+
| TOTAL CORES          | 12 physical / 24 threads          |
| TOTAL RAM            | 192 GB DDR4                       |
| TOTAL STORAGE        | 1.5 TB NVMe (local only)          |
| TOTAL NODES          | 3                                 |
| TOTAL NICs           | 3-7 (depending on 4-port NIC)     |
| GPU                  | None                              |
| KVM/IPMI             | None                              |
+----------------------+-----------------------------------+

All suggestions greatly appreciated! Never done this before, but hoping to save myself from the future hassle of having to drive to the data center… would much rather log in over WireGuard and address things remotely by rebooting, etc.
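For the remote-access piece, here's the rough WireGuard shape I have in mind (keys, addresses, and the wg0 name are all placeholders, not a working config):

  # On the cluster side; fill in real keys, then bring the tunnel up.
  cat <<'EOF' > /etc/wireguard/wg0.conf
  [Interface]
  Address = 10.9.0.1/24
  ListenPort = 51820
  PrivateKey = <server-private-key>

  [Peer]
  # the laptop I'd log in from
  PublicKey = <laptop-public-key>
  AllowedIPs = 10.9.0.2/32
  EOF
  wg-quick up wg0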

  • fruitycoder@sh.itjust.works · 14 hours ago

    I would recommend 4-5 nodes: 5 if you want true high availability (a 5-node etcd quorum survives two failures), while 4 still requires some intervention in case of failure.

    Just because it's bare metal, you've got to think about your Mean Time To Repair (MTTR), which is to say: if a whole node goes bust, how long will it take to order and install a new one?

    If you go Kubernetes (k8s) I would recommend rke2 or k3s. They are really straightforward to set up and pretty enterprise-ready out of the box.
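    To give a sense of how light the setup is, the quick-start is basically one command per node (pin versions before you rely on it; <first-node> and <token> are placeholders):

      # First server, with embedded etcd so the snapshot tooling below applies:
      curl -sfL https://get.k3s.io | sh -s - server --cluster-init
      # The join token lives here once it's up:
      cat /var/lib/rancher/k3s/server/node-token
      # Each additional server node:
      curl -sfL https://get.k3s.io | K3S_TOKEN=<token> sh -s - server --server https://<first-node>:6443
      # rke2's installer works the same way: curl -sfL https://get.rke2.io | sh -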

    If you have a hard requirement for Ceph I would recommend Rook-Ceph, which makes deployment and management a lot easier by letting k8s handle it. For simpler but (in my testing) less performant persistent volumes (PVs) than Ceph, Longhorn is really easy to deploy and manage.
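    Both ship official Helm charts, so the install is roughly this (chart and repo names are the upstream ones; you'll want to tune values for real use):

      # Rook: operator first, then a CephCluster via the companion chart:
      helm repo add rook-release https://charts.rook.io/release
      helm install rook-ceph rook-release/rook-ceph -n rook-ceph --create-namespace
      helm install rook-ceph-cluster rook-release/rook-ceph-cluster -n rook-ceph
      # Longhorn:
      helm repo add longhorn https://charts.longhorn.io
      helm install longhorn longhorn/longhorn -n longhorn-system --create-namespace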

    For backups, Velero is really nice for apps in your cluster, since it can be done per namespace and include PV data too. Rke2/k3s both have nice snapshotting and backup tools for etcd (the backing database for k8s) for full disaster recovery.
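    A sketch of both, assuming Velero is already installed with a storage location and its node agent, and that you're on embedded etcd ('myapp' is a made-up namespace):

      # One namespace, PV data included via file-system backup:
      velero backup create myapp-backup --include-namespaces myapp --default-volumes-to-fs-backup
      # Built-in etcd snapshots for full disaster recovery:
      k3s etcd-snapshot save --name before-upgrade
      k3s etcd-snapshot ls    # rke2 has the same subcommands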

    Rke2/k3s both have ways to auto-deploy Helm charts from the filesystem too: https://docs.rke2.io/add-ons/helm
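    For example, a file dropped into the manifests dir gets applied automatically; grafana here is just an arbitrary chart to show the shape (on k3s the path is /var/lib/rancher/k3s/server/manifests/ instead):

      cat <<'EOF' > /var/lib/rancher/rke2/server/manifests/grafana.yaml
      apiVersion: helm.cattle.io/v1
      kind: HelmChart
      metadata:
        name: grafana
        namespace: kube-system
      spec:
        repo: https://grafana.github.io/helm-charts
        chart: grafana
        targetNamespace: monitoring
        createNamespace: true
      EOF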

    This is a good stepping stone for GitOps imho, if that matters to you at all: start with just having a git dir for these files, then later move to something like ArgoCD.

    Also, since you are looking at hyperconverged storage, dedicated network links for it are generally recommended. So create a bond of two ports per node just for storage, tag them with their own VLAN, and in your setup of Rook or Longhorn specify that VLAN interface as the device for data to flow.
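    With iproute2 that's roughly this per node (interface names, VLAN 100, and the addressing are placeholders, and the LACP bond assumes a switch that supports it):

      ip link add bond0 type bond mode 802.3ad            # LACP bond of the two storage ports
      ip link set enp1s0f0 down; ip link set enp1s0f0 master bond0
      ip link set enp1s0f1 down; ip link set enp1s0f1 master bond0
      ip link add link bond0 name bond0.100 type vlan id 100
      ip addr add 10.0.100.11/24 dev bond0.100            # this node's storage address
      ip link set bond0 up; ip link set bond0.100 up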

    PXE boot is also nice at this scale: either set it up on your router (OpenWrt has decent support) or your maintenance laptop/machine, and/or do something like Tinkerbell (cloud-native PXE from your k8s cluster!). It's just nice to be able to blow away a node and rebuild if you are tinkering a lot.
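    On OpenWrt the built-in dnsmasq covers it with a few uci settings (the netboot.xyz image is just one convenient payload; tftp_root is wherever you keep it):

      uci set dhcp.@dnsmasq[0].enable_tftp='1'
      uci set dhcp.@dnsmasq[0].tftp_root='/srv/tftp'      # boot image lives here
      uci set dhcp.@dnsmasq[0].dhcp_boot='netboot.xyz.kpxe'
      uci commit dhcp && /etc/init.d/dnsmasq restart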

    Remember, cattle not pets, and welcome to the range, cowpoke!

    • kiol@discuss.online (OP) · 13 hours ago

      Seems the Mellanox ConnectX-3 Pro Dual Port 10G SFP+ Low Profile MCX312C-XCCT is a decent choice, which I can use for a 10 GbE triangle between the current 3 nodes. I was thinking of using JetKVM + an RS-232 expansion serial cable with a 4-port HDMI/USB switcher for controlling the nodes. Financially, a future expansion would be moving from the triangle to a 10 GbE switch, allowing for NAS or other node additions. Also, each node currently has an empty SATA SSD port. Updated the forum thread.
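      For the triangle I'm picturing a point-to-point /30 per link, something like this from node 1's side (interface names and the 10.10.x.x addressing are placeholders):

        ip addr add 10.10.12.1/30 dev enp2s0f0    # direct link to node 2 (10.10.12.2)
        ip addr add 10.10.13.1/30 dev enp2s0f1    # direct link to node 3 (10.10.13.2)
        # nodes 2 and 3 mirror this, so every pair gets its own dedicated 10 GbE path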

      • fruitycoder@sh.itjust.works · 10 hours ago

        I really enjoy the PiKVM and the switcher for my home lab. Redfish support gets fishy with a switcher if that is a concern though.

        I do love a good mesh for a cluster block though. My next next next project is using KubeOVN to turn my cluster block into a switch with "out" connections for connecting other devices (wifi, laptop, cameras, etc.) to it as my network router, with of course the modem and hotspot upstream for the Internet connection.

  • Decronym@lemmy.decronym.xyz (bot) · edited · 10 hours ago

    Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I’ve seen in this thread:

    Fewer Letters    More Letters
    Git              Popular version control system, primarily for code
    IP               Internet Protocol
    NAS              Network-Attached Storage
    SSD              Solid State Drive mass storage
    k8s              Kubernetes container management package

    5 acronyms in this thread; the most compressed thread commented on today has 8 acronyms.

    [Thread #268 for this comm, first seen 1st May 2026, 16:40]

  • non_burglar@lemmy.world · 17 hours ago

    I'm not sure what data center will allow you to hodgepodge a 1U cluster of consumer-grade hardware; heat and power management alone will be a problem.

  • frongt@lemmy.zip · 1 day ago

    What's your plan for when you reboot and it doesn't come back up? I strongly recommend having some kind of IPMI and virtual console, whether that's PiKVM or whatever. Far too many times I've had a server go down, and having one has saved me from driving down to the datacenter to stand in a loud, frigid room troubleshooting the issue.