Linux From Scratch and Linux Internals

Devashish Sood
May 2, 2019
Tux — Mascot of Linux

History and Overview

The story of Linux is fascinating: a student's hobby kernel became so ubiquitous that it now runs every Android device, most cloud servers, and nearly all of the world's fastest supercomputers. (iOS and macOS are built on Darwin, a separate UNIX-like system, and Windows has its own kernel, so those do not run Linux.)

Android Tux

Before Linux was created, the most common operating systems were UNIX (originally named UNICS) and DOS (the command-line predecessor to Windows). Linus Torvalds was a computer science student at the University of Helsinki when he started Linux as a hobby project in 1991 and released its source code publicly; his later master's thesis was titled "Linux: A Portable Operating System". Here is a timeline of the order in which operating systems were created. Note that Linux was written from scratch as a UNIX-like kernel and does not reuse UNIX source code.

This article should help you understand how to use Linux in a professional setting and maximize the key advantages it provides. It should also help you build your own custom Operating System if you are interested in that.

Linux From Scratch(LFS)

The Linux From Scratch project walks you through building a Linux operating system from source by running commands from the book, which you can copy and paste into the terminal. The general pattern of these commands involves the following steps:

  1. Create a partition in your current operating system or in a VirtualBox VM and format it with a Linux filesystem such as ext4 (the Linux kernel itself is compiled near the end of the build). Create the basic directories: bin for essential binaries, usr for shared programs, libraries and documentation (not users' personal files, which live under home), etc for configuration, proc as the mount point for the kernel's virtual filesystem of process information (you can read it, but its entries are generated by the kernel and cannot be deleted to kill a process), mnt for temporarily mounted drives and volumes, and so on.
  2. Download the compressed source code from each project's website or a mirror, in the form of .tar.gz, .tar.xz or .tar files.
  3. Verify the integrity of each tarball against its published hash to check for corruption or tampering during the download.
  4. Extract the tarball to get the source code for a particular package.
  5. Configure the sources (usually with ‘./configure’), compile them with ‘make’, run the test suite with ‘make check’, and then run ‘make install’ to install the result into directories such as bin. LFS builds with GCC, which compiles the sources to native machine code.
  6. Repeat Steps 2–5 for the next package.
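
Steps 3 and 4 can be sketched in a few lines of Python. This is only an illustration, not how the LFS book does it (the book uses shell commands): the function names `file_digest` and `verify_and_extract` are made up for this example, and the default of SHA-256 is an assumption. Pass `algorithm="md5"` if the list of hashes you are checking against uses MD5.

```python
import hashlib
import tarfile
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 16):
    """Hash a file in chunks so large tarballs need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_and_extract(tarball, expected_hex, dest, algorithm="sha256"):
    """Extract `tarball` into `dest` only if its digest matches `expected_hex`.

    Returns the sorted names of the top-level entries found in `dest`.
    """
    actual = file_digest(tarball, algorithm)
    if actual != expected_hex.lower():
        raise ValueError(f"checksum mismatch for {tarball}: got {actual}")
    # tarfile detects gzip/bzip2/xz compression automatically in read mode.
    with tarfile.open(tarball) as tar:
        tar.extractall(dest)
    return sorted(p.name for p in Path(dest).iterdir())
```

A call would look like `verify_and_extract("hello-1.0.tar.gz", published_hash, "sources")`, where the file name and `published_hash` are placeholders. Refusing to extract on a mismatch is the point of the exercise: a tarball that fails the check should be downloaded again, not built.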

Benefits of LFS

The benefit of building your own operating system in this way is that you can read the source code for each of the several dozen packages that make up the OS, and you can visit each project's website or repository for more information on its purpose. By doing this you will be able to spot processes that are acting strangely, by checking the open ports and the processes that are running, and then read their source code to verify them or block them from resources they do not need, like the Internet. You will also be able to detect and kill processes that are not part of the vanilla OS and could be spyware installed by some third-party service. You could also choose not to install a particular package if you consider it irrelevant for your needs, or tweak it for your use case. This way the final operating system you build can range from a few MB to around 1GB in size.
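
As a rough illustration of "checking the active ports", the sketch below lists the TCP ports that something is listening on by parsing the kernel's `/proc/net/tcp` table. The function name `listening_ports` and the sample rows are invented for this example; the layout assumed here is the IPv4 table, where the second column is `address:port` in hexadecimal and state `0A` means LISTEN. It ignores IPv6 (`/proc/net/tcp6`) and UDP.

```python
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:0277 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 20001 1 0000000000000000 100 0 0 10 0
   1: 00000000:0016 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 20002 1 0000000000000000 100 0 0 10 0
   2: 0A00020F:0016 0A000202:C350 01 00000000:00000000 02:000A1B2C 00000000     0        0 20003 2 0000000000000000 20 4 30 10 -1
"""

TCP_LISTEN = "0A"  # hexadecimal state code the kernel prints for LISTEN


def listening_ports(proc_net_tcp_text):
    """Return the sorted TCP port numbers that are in the LISTEN state."""
    ports = set()
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue
        local_address, state = fields[1], fields[3]
        if state == TCP_LISTEN:
            # local_address looks like "0100007F:0277"; the port is hex.
            ports.add(int(local_address.rsplit(":", 1)[1], 16))
    return sorted(ports)


if __name__ == "__main__":
    try:
        with open("/proc/net/tcp") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # not on Linux, fall back to the made-up sample
    print(listening_ports(text))
```

On the sample rows this returns ports 22 and 631; the third row is an established connection and is skipped. A port you do not recognize is a starting point for finding the process behind it and reading its source.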

CPU Usage of Processes

Here are some examples of important packages and commands that are recommended by LFS authors.

make, mount (mount volumes), autoconf, automake, sed (stream editor, most commonly used to replace all occurrences of a word in a file without opening it), awk (pattern-based text processing language), gawk (the GNU implementation of awk), iana-etc, grep (print lines matching a pattern), Intltool, XML::Parser, kmod (kernel module tools), e2fsprogs (ext2/3/4 filesystem utilities), floating-point math libraries such as MPFR, fstab (the filesystem table, a configuration file rather than a program), du (size of a folder), df (free space left on a disk), lsof (list open files and sockets, for example with lsof -i), top (live CPU and memory usage per process), htop, md5sum, tar (archive/extract), wget (download files over HTTP or FTP from the command line), openssl (TLS and crypto functions; SSH is provided separately by OpenSSH), sudo, man (manual pages), gcc, perl, etc.

Video Playlist of LFS on YouTube

If you want a fast-forwarded video showing the installation of a Linux OS following the LFS instructions, you can watch this playlist on YouTube.

The Beyond LFS (BLFS) book helps you extend a finished LFS system with things like a DHCP client to obtain an IP address automatically, the various drivers needed if you're building a PC, security and networking software, and the X Window System, which is the foundation that graphical desktops run on. LFS itself does not ship a package manager such as apt or yum, so installing and tracking packages is left to you.

The Automated LFS (ALFS) project provides tooling that extracts the build commands from the book and runs them for you, where you can toggle things on/off once you are familiar enough with the packages. Cross LFS (CLFS) helps you set up custom Linux operating systems for other architectures through cross-compilation.

FileSystem and RAM

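The Linux kernel is the core that every process depends on: it manages the CPU, memory, devices and filesystems. Most of the command-line utilities come from separate packages, typically GNU Coreutils and Util-Linux on desktops and servers, or BusyBox on small embedded systems.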

The filesystems available for Linux include ext4, XFS, ext3, etc., and you can attach them to the directory tree using the ‘mount’ command. In fact, you can install the entire operating system on a USB flash drive and boot from it, and some live systems copy themselves into RAM at boot so they keep running even after the drive is removed. There was a period when the Linux Router Project built operating systems for routers that had only a few MB of storage on them, so a lot of lightweight skeletons of Linux exist.
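
To see which filesystems are currently mounted, you can read `/proc/mounts`, where each line holds the device, the mount point and the filesystem type as its first three fields. The sketch below is a minimal parser under that assumption; `mounted_filesystems` and the sample lines are made up for illustration.

```python
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
/dev/sdb1 /mnt/usb xfs rw,relatime 0 0
tmpfs /run tmpfs rw,nosuid,nodev 0 0
"""


def mounted_filesystems(mounts_text, wanted=("ext4", "ext3", "xfs")):
    """Return (device, mount_point, fs_type) for mounts whose type is in `wanted`."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type = fields[:3]
        if fs_type in wanted:
            result.append((device, mount_point, fs_type))
    return result


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            text = f.read()
    except OSError:
        text = SAMPLE_MOUNTS  # not on Linux, fall back to the made-up sample
    for device, mount_point, fs_type in mounted_filesystems(text):
        print(f"{device} on {mount_point} type {fs_type}")
```

Virtual filesystems such as proc and tmpfs are filtered out here only because they are not in the `wanted` tuple; pass a different tuple to see them.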

If you had 4 flash USB drives of 64GB each, you could combine them into a single 256 GB storage volume (for example with LVM), or even use them as swap space to extend virtual memory, although swap on flash is far slower than real RAM and does not add physical memory. So Linux is quite versatile and runs on everything from small devices to supercomputers.
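
The kernel reports physical RAM and swap as separate totals in `/proc/meminfo`, which makes it easy to check what a machine really has. The sketch below parses that file; the helper names `parse_meminfo` and `ram_and_swap_kib` and the sample numbers are invented for this example.

```python
SAMPLE_MEMINFO = """\
MemTotal:       16316412 kB
MemFree:         8123456 kB
SwapTotal:       2097148 kB
SwapFree:        2097148 kB
"""


def parse_meminfo(text):
    """Map each /proc/meminfo key to its integer value (kB for sized fields)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            info[key.strip()] = int(parts[0])
    return info


def ram_and_swap_kib(text):
    """Return (physical RAM, swap) totals, treating missing keys as 0."""
    info = parse_meminfo(text)
    return info.get("MemTotal", 0), info.get("SwapTotal", 0)


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE_MEMINFO  # not on Linux, fall back to the made-up sample
    ram, swap = ram_and_swap_kib(text)
    print(f"RAM: {ram / 1024**2:.1f} GiB, swap: {swap / 1024**2:.1f} GiB")
```

Adding a swap device raises only the second number; the first one changes only when memory modules are added or removed.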

Disks show up as /dev/sda, /dev/sdb and so on, and their partitions as /dev/sda1, /dev/sda2, etc. If the disk is presented by the Xen hypervisor, as on many AWS EC2 instances, you will see it as /dev/xvda instead (the ‘sd’ changes to ‘xvd’).
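
The naming scheme can be made concrete with a small sketch that splits a device path into its parts. It assumes only the two prefixes mentioned here, `sd` and `xvd`; other schemes exist and are deliberately not handled. The function name `describe_device` and the labels it returns are made up for this example.

```python
import re

# prefix (sd or xvd), disk letter(s), optional partition number
_DEVICE = re.compile(r"^/dev/(?P<prefix>sd|xvd)(?P<disk>[a-z]+)(?P<part>\d+)?$")


def describe_device(path):
    """Split a path like /dev/sda1 into bus, whole-disk path and partition number.

    Returns None for paths that do not follow the sd/xvd naming scheme.
    """
    m = _DEVICE.match(path)
    if m is None:
        return None
    bus = "SCSI/SATA/USB" if m["prefix"] == "sd" else "Xen virtual"
    return {
        "bus": bus,
        "disk": f"/dev/{m['prefix']}{m['disk']}",
        "partition": int(m["part"]) if m["part"] else None,
    }


if __name__ == "__main__":
    for name in ("/dev/sda", "/dev/sda1", "/dev/sdb", "/dev/xvda"):
        print(name, describe_device(name))
```

A path without a trailing number, such as /dev/sda, refers to the whole disk, so its `partition` value is `None`.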

Distros of Linux

There are various distributions of Linux under various licenses, developed for different purposes. Despite their differences, they share the same kernel and largely follow the POSIX standards, so most software can be built and run on any of them with little or no change.

Licenses

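The major licenses you will meet in the popular distros of Linux are:

  1. MIT License: A short, permissive license that originated at MIT. It allows the code to be reused almost anywhere, including in closed-source products, as long as the copyright notice is kept. It is a license for individual pieces of software (the X Window System is a well-known example), not a separate version of Linux.
  2. GNU GPL: GNU stands for GNU's Not UNIX. The GNU project was written from scratch as a free alternative to UNIX along the same ideology. The GPL is a copyleft license: anyone who distributes a modified version must release it under the same terms. The Linux kernel is licensed under GPL version 2.
  3. BSD Licence: It comes from the Berkeley Software Distribution, a UNIX derivative developed at the University of California, Berkeley. Like MIT it is permissive. It covers the BSD family of operating systems (FreeBSD, OpenBSD, NetBSD), which are separate from Linux.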

Distros

  • Red Hat Linux: Red Hat packages the GPL-licensed Linux kernel and GNU tools into Red Hat Enterprise Linux (RHEL), a commercial distro sold with support. It finds itself in airplanes, banks and other high security applications. CentOS is a free community rebuild of RHEL. Recent RHEL releases default to the XFS filesystem, a journaling filesystem designed for large files and parallel I/O that has historically been slower at metadata-heavy work such as deleting many small files. Red Hat also sponsors the Fedora project and is a major contributor to GNOME, which was one of the first full desktop environments for Linux.
  • Ubuntu: Ubuntu is built on Debian, an independent community-run GNU/Linux distro. Ubuntu is backed by Canonical, which publishes a regular release every six months and an LTS (Long Term Support) release every two years. It is quite popular on desktops as an alternative to Windows, including among non-tech-savvy users, and is also widely used on servers.
  • MacOS: Strictly speaking this is not a Linux distro. Apple created Darwin, a UNIX-like system that combines the Mach-based XNU kernel with BSD code and other open source components, and macOS is built on top of it. It is quite popular among developers and among people who want a commercial counterpart to Ubuntu (in the ideological sense, since Ubuntu is Debian and macOS is Darwin) for the extra benefits, stability, customer support and a commercial alternative to Windows.
  • OpenSUSE: openSUSE is a community project sponsored by the company SUSE, one of the oldest commercial Linux vendors and a long-time contributor to GUI software like GNOME and KDE. Its main distros are openSUSE Leap and the rolling-release openSUSE Tumbleweed. (Mandriva, which is sometimes confused with it, was a separate French distro.)
  • Arch Linux: It was built to be a Linux OS with a minimal base install and a rolling release model (packages are updated continuously instead of in big versioned releases). It is used by advanced Linux users who like lean operating systems and in setups with tight memory requirements.
  • Android: Android uses the GPL-licensed Linux kernel and builds its own userspace on top of it instead of the GNU tools found in desktop distros.

So why did open sourcing Linux lead to so many distros?

Business Model for Open Source Software like Linux

  • Software can be too big for a single person to read, but a team of developers can read it in a few weeks.
  • Good software developers are always in demand/limited supply
  • Allow good software developers to contribute to code for academic, personal distinction reasons.
  • Get more eyes on software so that people can contribute, someone independent can verify/approve changes
  • Follow POSIX design to create sources
  • Allow businesses to compile sources themselves, tweak sources to match their business needs.
  • Ask for donations from a large community, donations from advertisers to the large community to earn a sizeable revenue. Popularity => Extract Revenue for upkeep operations and salary.
  • Make software customizable for different hardware and allow writing your own drivers increasing popularity. Smaller devices like routers need trimming on Operating System to fit in memory, so transparency in OS allows smaller OS to be created.
  • Faster updates to software than closed source for active community, easy replaceability of developers, viable if sufficient developers contribute to open source.
  • Security concerns on the software end are addressed by looking at sources.
  • Have regular stable releases every few months that are checked for stability and vulnerabilities.
  • Allow creation of 3rd party libraries, Concept of App Store.
  • Keep Kernel and certain code Free and Open Source, but allow commercial usage and support.

Additional Context on reasons for growth and Popularity

Windows OS is closed source: it ships only as compiled binaries (.exe and .dll files) without the source code, so certain vendors will be concerned about how their information is being handled by the OS.

Windows was not the first OS, but it became the dominant desktop OS early on and gained a lot of popularity. Because its code remains closed source, it is harder for outside developers to learn its internals. This is part of why Microsoft now collaborates with the Linux community and ships the Windows Subsystem for Linux, which brings bash and other Linux commands to Windows.

No operating system is immune to external attacks. With closed-source code, only the vendor's own developers can audit and fix the source, so flaws may linger unnoticed or be abused by insiders. Linux vulnerabilities are detected and reported by the community when they are found, and this helps the code be more secure. Android uses the Linux kernel and its core is open source, but many apps shipped as .apk files are obfuscated to make them harder to reverse engineer.

Windows also has few lightweight versions for small hardware such as the Raspberry Pi, routers and custom embedded boards, whereas Linux can be trimmed down to fit them. (Very small microcontrollers such as the Arduino usually run no full operating system at all.)

Linux Live USB — Operating System in a few MB

Security concerns extend beyond software. Recent hardware issues such as Meltdown, Spectre, Rowhammer and flaws in the Intel Management Engine went unnoticed for years partly because the designs involved are proprietary, so fewer people can study them and those who do need special training. The worst part is that these can only be mitigated by microcode, firmware and operating system patches, often at a performance cost; removing them completely requires new hardware.

Even AMD suffers from some of these problems, although it strives for more transparency and open-source support, which is part of why it is catching up to Intel with Ryzen.

Open hardware architectures built on RISC-V, along with open-source Linux, could help us to finally be free from vendor lock-in.

Network Architecture and the Internet

Networking on Linux is built on the TCP/IP stack in the kernel. DHCP is the protocol a machine uses to obtain its IP address and related settings automatically when it joins a network. The Network Interface Card drivers need to be installed, and services like the Network File System and Network Information System can be added for sharing files and account information. You can check here for a detailed overview of Networking in Linux, Internet internals and the OSI model, and Denial Of Service attacks in the next article by me.

Thanks!
