Merikanto

一簫一劍平生意,負盡狂名十五年

LPIC - 103 Configure Hardware


# Configure Firmware


Provides config tools & initiates OS booting process

Firmware settings can control onboard devices (hard disk controllers, USB ports, etc.)

The most important firmware is installed on the motherboard

Motherboard’s firmware resides in flash memory
i.e. EEPROM (Electrically Erasable Programmable Read-Only Memory)

  • Initialize motherboard’s hardware & control boot process

  • Provide fundamental I/O services (at boot time)

  • Types:

    • BIOS (Basic Input / Output System)

    • EFI (Extensible Firmware Interface)

    • UEFI (Unified EFI, EFI 2.0)

  • Enable / disable on-board hardware


1 - Virtual Filesystem

/dev, /sys, /proc are all virtual filesystems

  • /dev : virtual fs to represent hotplug devices
  • /proc: virtual fs to represent kernel & hardware data (access hardware info that isn’t accessible via /dev)
  • /sys : virtual fs to represent device info

/dev

Device file:

After the Linux kernel communicates with a device on an interface,
it must be able to transfer data to & from the device

Use /dev to interface with hardware devices


When a hardware device is added (USB, NIC, hard drive)

  • Linux creates a file in /dev representing the device
  • Applications then interact directly with the file to receive & send data

Design:
Much easier than requiring each application to know how to directly interact with a device.


Device data transfer

  • Receive data from device: read Linux device file associated with the device
  • Send data to device: write to Linux device file

Device file types

  • Character device (c) : Transfers data one character at a time

    For serial devices, e.g. terminals, USB

  • Block device (b) : Transfers large blocks of data

    For high-speed data transfer, e.g. hard drives, optical drives
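The two device file types can be told apart from the shell. The sketch below is a minimal illustration that relies on the standard `test` operators `-c` and `-b`; the function name `dev_type` is made up for this example.

```shell
# Classify a path by its device file type, using the POSIX test operators
# -c (character device) and -b (block device).
dev_type() {
    if [ -c "$1" ]; then
        echo "character"
    elif [ -b "$1" ]; then
        echo "block"
    else
        echo "not a device file"
    fi
}

dev_type /dev/null   # /dev/null is a character device on Linux
```

The same information is visible as the first character (c or b) of each line in the output of ls -l /dev.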


Device mapper

  • Creates files in /dev/mapper, which link to physical block device files in /dev
  • Maps physical block devices to virtual block devices
  • Virtual block device : allows system to intercept device IO, and perform certain operations
  • Mapped devices are used by:
    • LVM (to create logical volumes)
    • LUKS (encrypt data on hard drives)

/sys

Notes

  • Created by kernel in the sysfs filesystem format
  • Obtain information about
    • system bus
    • devices
    • kernel
    • installed kernel modules

/proc

Use it to troubleshoot hardware issues

Linux kernel changes files & data in /proc, as it monitors system hardware status


# install
sudo apt install -y procinfo

# show info about all connected hardware devices
lsdev

lsdev retrieves info from /proc/interrupts, /proc/ioports, /proc/dma
and combines it in one output


More details on /proc are in the section below.


2 - proc

  • IRQ - /proc/interrupts
  • I/O - /proc/ioports
  • DMA - /proc/dma

IRQ (Interrupt Request)

  • Signal sent to CPU, instructing it to suspend current activity, and handle external event (e.g. keyboard input)
  • Allow hardware devices to indicate when they have data to send to CPU

On the classic x86 platform, IRQs are numbered from 0 to 15
Newer platforms (x86-64) provide more than 16 IRQs

  • IRQ 1 - reserved for keyboard use only
  • IRQ 8 - real-time clock (reserved for system clock)

IRQ conflicts: reconfigure devices to use different IRQs


Buses:

  • ISA bus (Industry Standard Architecture)

    Sharing an IRQ between 2 devices is tricky. ISA has become rare since 2001

  • PCI bus (Peripheral Component Interconnect)

    PCI devices can share IRQs more easily


Explore IRQs

sudo cat /proc/interrupts

Linux doesn’t begin to use an IRQ until the relevant driver is loaded

The /proc filesystem is a virtual filesystem

  • Refer to kernel data that’s convenient to represent using a filesystem
  • Files in /proc provide info about hardware, running processes, etc.
  • Many Linux utilities use /proc behind the scenes
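As an illustration of how a utility might use /proc behind the scenes, the sketch below sums the per-CPU counters for each numbered IRQ. It assumes the usual /proc/interrupts layout (a header line with one column per CPU, then one line per IRQ); the function name `irq_totals` and the sample values are invented for the example.

```shell
# Read /proc/interrupts-formatted text on stdin and print
# "<irq> <total count across CPUs> <last field>" for each numbered IRQ.
irq_totals() {
    awk '
        NR == 1 { ncpu = NF; next }          # header: one column per CPU
        $1 ~ /^[0-9]+:$/ {                   # numbered IRQ lines only
            total = 0
            for (i = 2; i <= ncpu + 1; i++) total += $i
            irq = $1; sub(/:$/, "", irq)
            print irq, total, $NF            # last field names the device / driver
        }
    '
}

# Demo with made-up sample data in the same layout:
irq_totals <<'EOF'
           CPU0       CPU1
  1:          9          3   IO-APIC   1-edge      i8042
  8:          1          0   IO-APIC   8-edge      rtc0
NMI:          0          0   Non-maskable interrupts
EOF
# prints:
# 1 12 i8042
# 8 1 rtc0
```

On a live system the same function can be fed with irq_totals < /proc/interrupts.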

I/O Addresses (I/O Ports)

Unique locations in memory, reserved for communications between CPU & specific physical hardware devices

I/O addresses are associated with specific devices, and normally should not be shared

sudo cat /proc/ioports

With PnP, I/O port conflicts aren’t very common. When there is a conflict, use setpci


DMA Addresses

DMA: Direct Memory Access

An alternative method of communication to I/O ports

  • Rather than have CPU mediate data transfer between device & memory,
    DMA permits device to transfer data directly
  • Send data from hardware device directly to memory, without waiting for CPU.
    Then CPU can read those memory locations to access data
  • Lowers CPU requirements for I/O activity, improving overall system performance

# show DMA channels
sudo cat /proc/dma


# Configure Hardware


1 - Geometry Settings

Traditional hard disk layout

Hard disk’s Cylinder / Head / Sector (CHS) geometry

  • Fixed number of read / write heads
  • Any sector on a hard disk can be uniquely identified by 3 numbers :
    Cylinder number + Head number + Sector number
  • Hard disks are built from platters, each of which is broken into tracks, which are broken into sectors

Problems with CHS

  • All but the earliest hard disks use a variable number of sectors per cylinder, so the CHS geometry a disk reports doesn’t match its physical layout

  • CHS translation: Moving disks between computers can result in problems

    Mismatched CHS geometries claimed in disk structures & by BIOS


Solution:

Logical Block Addressing (LBA) mode (or Linear Block Addressing)

  • Single unique number assigned to each sector on the disk
    Given sector number, disk firmware can read from correct head & cylinder
  • Modern BIOSes provide an option to use LBA / CHS translation mode
    EFI uses LBA mode exclusively (no CHS translation)
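The relationship between the two addressing schemes can be written as LBA = (C × heads + H) × sectors_per_track + (S − 1), since sector numbers start at 1. The sketch below is only a worked example of that conversion; the function name and the 16-head, 63-sector geometry are assumptions picked for illustration.

```shell
# Convert a CHS triple to an LBA sector number.
# $1 = cylinder, $2 = head, $3 = sector (1-based),
# $4 = heads per cylinder, $5 = sectors per track
chs_to_lba() {
    echo $(( ($1 * $4 + $2) * $5 + ($3 - 1) ))
}

chs_to_lba 0 0 1 16 63   # first sector of the disk: prints 0
chs_to_lba 1 0 1 16 63   # first sector of cylinder 1: prints 1008
```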

2 - Coldplug & Hotplug Devices

Difference

  • Coldplug: attach / detach only when powered off

  • Hotplug: attach / detach even when powered on

Coldplug devices are designed to be physically connected
and disconnected only when the power is off

  • Old external devices (parallel & RS-232 ports) are coldplug devices

Kernel & user space

  • User space program: run as ordinary program, communicate with external devices

  • Only kernel can communicate directly with hardware

  • /dev : interface between user-space programs & hardware

    e.g. Link to optical drive: /dev/cdrom


Utilities to manage hotplug devices

  • Sysfs : virtual filesystem, mounted as /sys

    Export device info, for user space program to access

  • HAL Daemon (hald) : Hardware Abstraction Layer (HAL)

    User space daemon. Provide other user space programs with available hardware info

  • D-Bus : Desktop Bus

    Daemon. Enables processes to communicate with each other & to register to be notified of process / hardware events

    e.g. New USB device is available

  • udev: device manager that maintains the virtual filesystem mounted at /dev (for hardware devices)

    • Runs in the background, auto-detects new hardware connected to Linux (auto loads required kernel modules)

    • Creates dynamic device files & assigns each a unique device filename in /dev,
      as drivers are loaded & unloaded

      # configure udev
      sudo cat /etc/udev/udev.conf

      # debug
      /lib/systemd/systemd-udevd --debug &
    • Also creates persistent device files for storage devices

      Creates links in /dev/disk that point to /dev device files

      e.g. /dev/disk/by-path : Links storage devices by the physical hardware port they’re connected to

With udev device links, a storage device can be referenced by a permanent identifier



# Interface, Kernel, Hard Disks

Device Interface (3 popular standards)

  • PCI
  • USB Interface
  • GPIO Interface

1 - Device Interface

PCI    : lspci, setpci, lsdev

Module : lsmod, insmod, modprobe, rmmod

USB    : lsusb, usbmgr, hotplug

Disks  : lsblk, blkid


PCI

PCI Standard (1993) : connecting hardware boards to PC motherboards


Tweak how PCI devices are detected

  • Kernel configuration screens under Bus Options

  • Most firmware implementations have PCI options

  • Linux drivers support options


Commands:

  • setpci : directly query & adjust low-level PCI device configuration

  • lspci : show current PCI configuration

    # lspci options
    lspci -vv   # more verbose
    lspci -t    # tree view
    lspci -M    # scan in bus-mapping mode
                # (reveals devices hidden behind a misconfigured PCI bridge)
    lspci -x    # hex dump of configuration space
    lspci -k    # show the kernel driver module for each installed PCI card

In case of conflicts on the PCI board, use setpci


PCI Boards

PCI bus: Plug-and-Play (PnP style configuration — auto-config)


PCIe

  • PCI Express
  • successor to PCI and the common interface for add-in hardware devices (much faster)

Client devices use PCI boards:

  • Internal hard drives - SATA / SCSI

    Linux automatically recognizes SATA & SCSI hard drives connected to PCI boards

  • External hard drives - Network hard drive

    Communicate over a Fibre Channel network: HBA (Host Bus Adapter)

  • Network Interface Controllers (NIC)

  • Wireless cards ( IEEE 802.11 )

  • Bluetooth

  • Audio cards

  • Video accelerators

    Advanced graphics often use video accelerator cards


USB

  • Linux uses drivers for USB controllers (/proc/bus/usb)

  • The USB interface uses serial communication, so it needs fewer connectors to the motherboard


USB Basics

Protocol & hardware port for transferring data

  • USB 1.0 : up to 127 devices, 12 Mbps data transfer
  • USB 2.0 : 480 Mbps data transfer
  • USB 3.0 : 5 Gbps data transfer

# lsusb options
lsusb -v                   # verbose
lsusb -t                   # tree view
lsusb -d vendor:product    # restrict to a vendor & product (the codes after "ID" in lsusb output)
Most systems include a standard USB hub to connect multiple USB devices to the USB controller


Early Linux USB implementation:

USB disk storage devices use USB storage drivers that interface with Linux’s SCSI support

This makes USB disks look like SCSI devices


Linux provides a USB filesystem, which in turn provides access to USB devices

  • Filesystem appears as part of /proc virtual fs
  • USB device info: /proc/bus/usb

Two steps to get Linux to interact with USB

  • The Linux kernel must have the proper module installed to recognize the USB controller

    Controller provides communication between Linux kernel & USB bus on the system

  • Linux must have a kernel module installed for the individual device type plugged into the USB bus

    This lets Linux recognize the specific device,
    once communication is established through the controller module

Software can access files in /proc to control USB devices,
rather than using device files in /dev


USB Manager Applications

USB is designed as hot-pluggable


usbmgr :

  • Runs in the background, detects changes on the USB bus
  • When it detects changes, loads / unloads the kernel modules that are required to handle the devices
  • Global configuration: /etc/usbmgr/usbmgr.conf

hotplug :

  • Config of specific USB device: /etc/hotplug

  • /etc/hotplug/usb.usermap contains a database of USB device IDs & pointers to scripts in /etc/hotplug/usb

    Scripts run, when devices are plugged / unplugged


GPIO

GPIO: General Purpose IO


Notes

  • Example : Raspberry Pi
  • Purpose : control external devices for automation
  • Provides multiple digital IO lines to control individually (Down to single-bit level)
  • Handled by special IC chip (Integrated Circuit), mapped into memory

Feature

  • Ideal for supporting communications to external devices

    e.g. lights, sensors, motors, robot operations

  • Possibility to use Linux to control objects & environments


2 - Kernel


Overview

Linux kernel needs device drivers to communicate with installed hardware devices.

  • Compiling device drivers for all known devices into the kernel makes a large kernel binary file

  • Kernel modules avoid this

    The system links only the modules needed for its hardware

  • When compiling a new Linux kernel:
    Also compile any hardware modules along with the new kernel


Module file type

  • As source code (need to compile)
  • As binary object files ( .ko )

Create separate modules for each kernel version: /lib/modules/5.16-xx


Notes

  • Modules to load at boot time : /etc/modules

  • kernel module config file

    cat /etc/modules-load.d/modules.conf
  • modules dependencies

    cat /lib/modules/[version]/modules.dep

Installing modules with make modules_install calls the depmod utility

After modifying / adding modules by hand, run depmod manually to update modules.dep
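To make the role of modules.dep concrete, the sketch below looks up the direct dependencies of one module. It assumes the common line format `path/to/module.ko: path/to/dep1.ko path/to/dep2.ko` (possibly with a compression suffix after `.ko`); the function name `module_deps` and the sample paths are hypothetical.

```shell
# Print the direct dependencies of module $1, reading
# modules.dep-formatted text from stdin.
module_deps() {
    awk -v mod="$1" '
        function base(p) {              # strip directory and .ko[.xz|.gz|.zst] suffix
            sub(/.*\//, "", p)
            sub(/\.ko(\..*)?$/, "", p)
            return p
        }
        {
            target = $1; sub(/:$/, "", target)
            if (base(target) == mod)
                for (i = 2; i <= NF; i++) print base($i)
        }
    '
}

# Demo with made-up entries:
module_deps foo <<'EOF'
kernel/drivers/a/foo.ko: kernel/lib/bar.ko kernel/lib/baz.ko
kernel/lib/bar.ko:
EOF
# prints:
# bar
# baz
```

The comparison is on file names, so a module whose file name uses hyphens has to be looked up with hyphens. In practice modprobe --show-depends does this lookup and resolves dependencies recursively.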


Kernel Modules

Hardware in Linux is handled by kernel drivers,
many in the form of kernel modules - /lib/modules

Stand-alone driver files can be loaded to provide access to hardware
i.e. can be linked into kernel at runtime


lsmod: currently loaded drivers

  • Info only about kernel modules, NOT about drivers that are compiled directly into the Linux kernel
  • Used by column: number of other modules / processes that are using the module

# more info about module
modinfo nouveau

Load Kernel Modules

Load kernel modules with 2 programs: insmod & modprobe

  • insmod: inserts single module into the kernel
  • modprobe: also loads any depended-on modules automatically

When insmod fails because of missing dependencies: manually load the depended-on modules, or use modprobe

View config files:

sudo cat /etc/modprobe.d/xx

The Linux kernel has a module auto-loader feature (loads modules automatically),
which must be compiled into the kernel and relies on various config files.


Commands

modprobe handles modules by module name; no need to give the full filename

# load using insmod (full path required)
insmod /lib/modules/5.16/kernel/drivers/pci/pci-stub.ko

# load a module (very verbose)
modprobe -vv xx

# use a different config file instead of the default /etc/modprobe.d
modprobe -C /etc/modprobe.d/xx.conf xx

# dry run
modprobe -n xx    # or --dry-run

# show deps
modprobe --show-depends xx

Remove Kernel Modules

rmmod : unload single kernel module

# unload
rmmod -v xx

# wait until unused
rmmod -w xx

# force
rmmod -f xx

# unload entire module stack (with deps)
modprobe -r xx

# dry run remove
modprobe -nvr btusb

3 - Hard Disks & Storage

  • PATA / ATA (Parallel Advanced Technology Attachment)
  • SATA (Serial ATA)
  • SCSI (Small Computer System Interface)
  • External Disks (USB, SCSI, IEEE-1394)

Storage type

  • HDD (hard disk drive)

    Store data magnetically on disk platters,
    with movable read / write head to write / retrieve magnetic images on the platters

  • SSD (solid state drive)

    Store data electronically using integrated circuits

    No moving parts in SSD : Faster & more resilient


Summary of 3 Standard Interfaces

  • PATA
    • Parallel interface
    • Wide cable
    • 2 devices per adapter
  • SATA
    • Serial interface
    • Thin cable
    • 4 devices per adapter
  • SCSI
    • Parallel interface
    • Faster than SATA
    • 8 devices per adapter (More disks together in a single interface)

PATA

Configure PATA Disks

  • PATA disks use parallel interface (several bits of data are transferred over cable at once)
  • Wide cable: support 40 / 80 lines
  • Connect up to 2 devices to each PATA connector on a motherboard
  • PATA cables have 3 connectors:
    • 1 for motherboard
    • 2 for disks

PATA disks must be configured as master / slave. This can be done via jumpers on the disks.

  • Master device at the end of the cable
  • Slave device on the middle connector

All modern PATA disks support cable select

The drive attempts to configure itself automatically (based on its position on the PATA cable)

Easiest way to configure:

  • Set all PATA devices to use cable select option

For best performance: Disks should be placed on separate controllers


PATA disks & partitions naming scheme:

  • /dev/hda - Master drive on controller 1
    • /dev/hda1 - Partition 1 on disk 1
    • /dev/hda2 - Partition 2 on disk 1
  • /dev/hdb - Slave drive on controller 1
  • ......

For instance, if there’s a master disk on controllers 1 & 2, but no slave disk on controller 1:

  • /dev/hda
  • /dev/hdc
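The naming rule above is positional: the letter depends on the controller and the master / slave role, not on how many disks are actually present. A minimal sketch of that mapping follows; the function name `pata_name` is invented, and it assumes the traditional two-devices-per-controller layout described in this section.

```shell
# Map (controller number, role) to the traditional PATA device name.
# $1 = controller number (1-based), $2 = master | slave
pata_name() (
    controller=$1
    case $2 in
        master) pos=0 ;;
        slave)  pos=1 ;;
        *) echo "role must be master or slave" >&2; exit 1 ;;
    esac
    index=$(( (controller - 1) * 2 + pos + 1 ))
    letter=$(printf 'abcdefgh' | cut -c "$index")
    printf '/dev/hd%s\n' "$letter"
)

pata_name 1 master   # prints /dev/hda
pata_name 2 master   # prints /dev/hdc
```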

SATA

SATA is a replacement for PATA:

Newer motherboards often have 4+ SATA interfaces, and no PATA interface


Connect SATA disks

  • Connect to motherboard / controllers on a 1-to-1 basis

    Cannot connect more than one disk to a single cable (Simplify config)

  • SATA is a serial bus

    • Only 1 bit of data can be transferred at a time
    • SATA transfers more bits per unit of time on the data line
    • SATA is faster than PATA, cable is thinner (serial)

Modern PATA drivers treat PATA disks as SCSI disks

Most SATA drivers treat SATA disks as SCSI disks

Some old drivers treat SATA disks as PATA disks


SCSI

History

  • Traditionally a parallel bus (like PATA)

  • A newer variant is a serial bus (like SATA)

    SAS (Serial Attached SCSI)

  • Cost is very high (used on high end systems)


Configuration

  • Supports up to 8 / 16 devices per bus

    • One of these is the SCSI host adapter, either built into motherboard, or come as plug-in card
    • In reality, number of attached devices is limited, due to cable length limit
  • Each device has unique ID, assigned from a jumper on the device

  • If the motherboard lacks built-in SCSI ports, its firmware may not detect SCSI devices
    Can still boot from a SCSI hard disk, if the SCSI host adapter has its own firmware to support booting


Naming

  • /dev/sda
  • /dev/sdb
  • ......

Best practice:

Give hard disks the lowest possible SCSI IDs

Linux assigns device filenames in ascending SCSI ID order, so disks added later can take higher IDs without changing the device filenames of existing disks


Problems

  • Multiple SCSI host adapters

    Linux assigns device filenames to all disks on the first adapter, then to all disks on the second

  • Some non-SCSI devices (e.g. USB, SATA) are mapped to Linux SCSI subsystem

    This can cause a true SCSI hard disk to be assigned a higher device ID than expected

  • SCSI bus is logically one-dimensional (all devices on a single line)

    • Special resistor pack: prevent signal from bouncing around along SCSI chain
    • Each end of SCSI bus must be terminated, but device mid-chain must NOT be terminated
    • Incorrect termination will result in bizarre SCSI problems

List only SCSI block devices

1
lsblk -S

External Disks

Common types: USB, SCSI, IEEE-1394

SCSI

  • Direct support for external disks

  • Many SCSI host adapters have both internal & external connectors

    Can configure external SCSI disks just like internal disks

  • Linux treats USB & IEEE-1394 as SCSI devices

Remove external drives:

  • umount
  • unplug device

Direct unplug may result in damage to filesystem

Most SCSI buses are NOT hot-pluggable



# Partition

Advantages for Disk Partition

  • Multi-OS support, use different filesystems

  • Disk error protection

    Errors only affect the files on that partition


Partition System

Partitions are defined by data structures that are written to specified parts of the hard disk.

  • MBR (Master Boot Record, old)

    • Stores data in the first sector of the disk
    • Limited to partitions of 2 TB max

  • GPT (GUID Partition Table, new)

    • Higher limits

1 - Partition Schema

MBR & GPT: a way of indexing partitions

Linux creates /dev files for each separate disk partition

MBR

Stored in the first sector of the hard disk


The original x86 partitioning scheme allows only 4 partitions.

One of them can be turned into an extended partition, which is split into multiple logical partitions


The extended scheme: (with backward compatibility)

  • Primary partition : same as original partition type

  • Extended partition : special type of primary partition. Placeholder for logical partition

  • Logical partition : resides in a single extended partition

    All logical partitions must be contiguous


Numbering

  • Many OSes must boot from a primary partition

  • 4 primary partitions, or 3 primary + 1 extended

  • Primary : numbered 1 - 4

  • Logical : numbered 5 +

  • Gaps can appear in MBR numbering (among primary partitions only, not among logical partitions)

    • Example: 1, 3, 5, 6, 7

MBR & Boot

  • MBR data structures hold both partition table & primary BIOS boot loader

  • MBR exists only in the first sector of the disk:

    • easy to damage
    • erasure of MBR will make entire disk unusable

MBR Backup

# back up the MBR partition table
sfdisk -d /dev/nvme0n1 > backup.txt

# restore
sfdisk -f /dev/nvme0n1 < backup.txt

MBR Type codes

Type code: 1 byte number (2-digit hex)

0x0c    FAT
0x07    NTFS
0x82    SWAP
0x83    Linux filesystem
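The type code of a primary partition can be read straight out of the MBR: the partition table starts at byte 446, each of the 4 entries is 16 bytes long, and the type code is the byte at offset 4 of an entry. The sketch below reads it with od; the function name `mbr_type_code` is made up, and the demo works on a scratch image file rather than a real disk.

```shell
# Print the type code (two hex digits) of a primary partition in an MBR.
# $1 = disk image or device file, $2 = primary partition number (1-4)
mbr_type_code() (
    offset=$(( 446 + ($2 - 1) * 16 + 4 ))
    printf '%s\n' "$(od -An -tx1 -j "$offset" -N 1 "$1" | tr -d ' \n')"
)

# Demo on a scratch 512-byte image (hypothetical, not a real disk):
img=$(mktemp)
dd if=/dev/zero of="$img" bs=512 count=1 2>/dev/null
# write 0x83 (octal 203) into the type byte of entry 1
printf '\203' | dd of="$img" bs=1 seek=450 conv=notrunc 2>/dev/null
mbr_type_code "$img" 1   # prints 83
rm -f "$img"
```

Reading a real disk this way (e.g. a /dev/sdX device) needs root privileges; fdisk -l shows the same codes in its Id column.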

GPT

Overview

  • Part of Intel’s EFI specification

  • GPT uses a protective MBR; additional data structures define the true GPT partitions

    The legal MBR definition makes MBR-only utilities think the disk holds a single MBR partition spanning the entire disk

    (Just like protected mode flat memory model)

  • Defines 128 partitions max (by default)

    Gaps can occur in partition numbering (e.g. 3, 5, 104)

  • Type codes : 16-byte GUID values


2 - Partition Alternatives

More dynamic & fault-tolerant

  • Multipath
  • LVM (Logical Volume Manager)
  • RAID (Redundant Array of Inexpensive Disks)

Multipath

DM Multipathing (Device Mapper)

  • Utilize dynamic /dev/mapper device file directory

  • For each new multipath device: /dev/mapper/mpathN

    N is the number of the multipath drive

  • This device file is a normal device file, so partitions & filesystems can be created on it


Configure multiple paths between Linux & network storage devices

  • All paths active: increased throughput
  • One path inactive: fault tolerance

Commands

# install
sudo apt install multipath-tools -y

# load the kernel module
sudo modprobe dm-multipath

# show multipath devices
sudo multipath -ll

# multipathd: background daemon that monitors paths
# and activates / deactivates them

# kpartx: creates device entries for partitions on a multipath device

LVM

Usage

  • Set aside one or more partitions

  • Assign them MBR partition type code 0x8e (or GPT equivalent)

  • Access logical volumes: /dev/mapper/

    Logical volumes create entries in /dev/mapper, which represent the LVM devices

Each physical partition needs its partition type marked in fdisk / gdisk


Commands

# install
sudo apt install lvm2 -y

# create physical volume
pvcreate

# group physical volumes into a volume group
vgcreate

# create a logical volume from free space in a volume group
lvcreate

# list all logical volumes in all volume groups
lvscan

Advantage

  • Easily resize logical volume

    Logical volumes in a volume group are like files in a filesystem

    The volume group manages allocation of space when logical volumes are resized

  • Easily add disk space (add new physical disk, then expand existing volume group)

    Aggregates multiple physical drive partitions into virtual volumes,
    which are then treated as single partitions in the system

  • Create an installation with many specialized filesystems, retaining the option to resize in the future


Disadvantage

  • Complicates disaster recovery
  • If an LVM config spans multiple disks, failure of one disk puts all files in the volume group at risk

Solution:

Configure at least 1 filesystem in conventional partition (dedicate to /boot)

Reserve LVM for /home , /usr , etc.


RAID

Advantage

  • Striping: Improves data access performance
  • Mirroring: Fault tolerance; combines multiple drives into one virtual drive

Disadvantage

  • Can be expensive

RAID versions

  • RAID 0 : Disk striping

    Stripe: spread data across multiple disks for faster access

  • RAID 1 : Disk mirroring

    Mirror: duplicate data across 2 drives

  • RAID 10 : Disk mirroring + striping

    Stripe for performance, mirror for fault tolerance

  • RAID 4 : Disk striping with parity

    Add a parity bit stored on a separate disk, so data on a failed disk can be recovered

  • RAID 5 : Disk striping with distributed parity

    Parity is spread across all disks along with the data stripes, so any single failed disk can be recovered

  • RAID 6 : Disk striping with double parity

    Stripes data & two independent sets of parity, so two failed disks can be recovered
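The recovery idea behind single-parity RAID levels can be shown with XOR: the parity is the XOR of all data blocks, so XOR-ing the parity with the surviving blocks gives back the missing one. The sketch below uses three single-byte "blocks" as a toy example; the values and the function name `xor_all` are arbitrary, and real implementations work on whole stripes rather than single numbers.

```shell
# XOR all arguments together and print the result.
xor_all() (
    acc=0
    for v in "$@"; do
        acc=$(( acc ^ v ))
    done
    echo "$acc"
)

d1=165; d2=60; d3=15                 # data blocks on three disks
parity=$(xor_all "$d1" "$d2" "$d3")  # stored on the parity disk
echo "$parity"                       # prints 150

# The disk holding d2 fails: rebuild it from the parity and the survivors
xor_all "$parity" "$d1" "$d3"        # prints 60
```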


Linux’s software implementation

Software RAID system that can implement RAID features on any disk system

# install
sudo apt install mdadm

mdadm allows multiple partitions to be combined in any type of RAID environment.

The RAID device appears as a single device, e.g. /dev/md0


3 - Hard Disk Layout

Mount Point

Provide OS access to data on partitioned disks

Linux OS uses a unified directory tree

  • Each partition is mounted at a mount point in the tree

    Mount point: A directory (a way to access filesystem on the partition)

  • Mount the filesystem: Link filesystem to the mount point (an empty dir)

    e.g. root partition / , and /home, /usr

    If /home is unmounted & remounted at /hello, then all its subdirectories will have /hello as their new parent path


Partition & Filesystem Layout

Steps to add new filesystem

  • Create Partition (fdisk, gdisk, parted)
  • Format partition with a Linux fs (mkfs)
  • Mount (mount xx /mnt)

Linux FHS (Filesystem Hierarchy Standard)


| Mount point | Typical size | Comments |
| --- | --- | --- |
| Swap | 1x ~ 2x RAM size | Memory extension. Slower than RAM |
| /home | 200 M ~ 3 T | Isolate on a separate partition, to preserve user data during system upgrade |
| /boot | 100 M ~ 500 M | Critical boot files. Can be on a separate partition |
| /usr | 0.5 G ~ 25 G | Linux standard program & data files |
| /usr/local | 0.1 G ~ 3 G | Data files that are unique to this installation (installed locally, safe from OS upgrade) 📌 |
| /opt | 0.1 G ~ 5 G | Data files associated with 3rd party packages |
| /var | 0.1 G ~ 3 T | System & app logs, transient. Can be on a separate partition |
| /tmp | 0.1 G ~ 20 G | User-created temp files |
| /mnt, /media | / | Mount points for removable media |

/etc, /bin, /sbin, /lib, /dev should never be placed on separate partitions (Must reside on root partition)


Second Table (for other FHS directories)

| Directory | Description |
| --- | --- |
| /etc | System & app config files. Executable files shouldn’t reside in /etc |
| /bin | Critical executable files. e.g. ls, cp, mount |
| /sbin | Run only by system admin. e.g. fdisk |
| /opt | Optional 3rd party programs. (ready-made pkgs that don’t ship with the OS) |
| /usr/bin | Local user programs & data |
| /usr/sbin | System programs & data |
| /usr/lib | Libraries for software packages |

Distinctions made by FHS:

  • Sharable & unsharable files: Sharable files can be shared via an NFS server
  • Static (executables) & variable (logs) files

| | Sharable | Unsharable |
| --- | --- | --- |
| Static | /usr, /opt | /etc, /boot |
| Variable | /home | /var/log, /var/run, /var/lock |

4 - Create Partition

Disk Partition

Partition Tools

  • fdisk / cfdisk : fixed disk, traditionally MBR only (recent util-linux versions also handle GPT)
  • gdisk / cgdisk : handles GPT
  • parted : GNU parted command line tool
  • gparted : MBR, GPT, etc ( Gnome Partition Editor )

sfdisk / sgdisk : Useful for writing scripts to handle disk partitioning

Neither fdisk nor gdisk allows altering the size of an existing partition
To modify it, delete the existing partition & rebuild it from scratch.


fdisk

# show partition scheme on disk
fdisk -l /dev/nvme0n1

# show partition detail
fdisk -l /dev/nvme0n1p1

# partition
fdisk /dev/nvme0n1

# inside fdisk prompt
p show current partition
n create
d delete
a mark as bootable

t change partition type code
l list type code
v verify partition table

q quit
w write & quit

In the past, partitions were aligned on CHS cylinders.

Modern disks require partition alignment on 8+ sector boundaries for optimal performance.
e.g. Boundary of 1 MiB (2048 sectors of 512 bytes)

Failure to align partitions properly can result in severe performance degradation
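Whether a partition is aligned comes down to a modulo check on its starting sector. The sketch below is a minimal version of that check; the function name `is_aligned` is invented, and the 2048-sector default corresponds to the 1 MiB boundary mentioned above with 512-byte sectors.

```shell
# Succeed if start sector $1 is a multiple of boundary $2 (default 2048 sectors).
is_aligned() (
    start=$1
    boundary=${2:-2048}
    [ $(( start % boundary )) -eq 0 ]
)

if is_aligned 2048; then echo "2048: aligned"; fi
if ! is_aligned 63; then echo "63: not aligned"; fi   # old CHS-style start sector
```

On a running system the start sector of a partition can be read from /sys/block/<disk>/<partition>/start, or from the Start column of fdisk -l.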


gdisk

If the hard drive isn’t currently using GPT, gdisk offers options to convert it to GPT

# inside gdisk prompt
i show detailed info on a partition
v verify disk

parted

Allows modifying existing partition size

# enter
parted

# inside parted prompt
p print

gparted

Resizing / moving filesystem can be dangerous.

  • If the resizing code has a bug, or power fails during the operation, data will be lost.

  • Resizing / moving the boot partition on a BIOS-based machine can make the system unbootable, until the boot loader is reinstalled.


Disk Formatting

  • Low-level formatting: create a structure of sectors & tracks on the disk
  • High-level formatting: create filesystem

Prepare partition for use:

Format partition / Make filesystem (write low-level data structures to disk)

Hard disks are low-level formatted at the factory, should never need to be low-level formatted again.

# low-level format a floppy disk (fdformat does not work on hard disks)
fdformat /dev/xx


# Filesystem

Handles: read & write data to raw device

Filesystem in essence: big data structures (Store data on disk in an indexed method)

Link: ext4 black magic

Filesystem: Win & Linux

  • Windows assigns drive letters; the file path tells exactly which physical device the file is stored on

  • Linux uses a virtual directory, which contains file paths from all storage devices installed on the system

    • Single root / directory
    • Doesn’t show physical device that contains the file
    • Place physical device in virtual filesystem with mount points :
      Empty directory points to a specific physical device

1 - Make Filesystem

Filesystem Type

Linux Filesystem

| Filesystem | Type Code | Comments |
| --- | --- | --- |
| Ext2fs | ext2 | traditional Linux native fs (2 TB max) |
| Ext3fs | ext3 | ext2fs with a journal |
| Ext4fs | ext4 | Default. work on large disks (16+ TB) |
| ReiserFS | reiserfs | best at handling large number of small files. This feature is also found in ext4 |
| JFS | jfs | Journaled. created by IBM (for AIX OS) |
| Btrfs | btrfs | advanced filesystem inspired by ZFS |
| eCryptfs | ecryptfs | POSIX-compliant encryption protocol applies to data before storing on device. Only OS that creates the FS can read data from it. |
| Swap | swap | Create Virtual memory using physical drive space |

eCryptfs on a partition: Appear in /etc/crypttab

Btrfs

  • Improved fault tolerance & High performance (up to 16 EB)
  • Built-in RAID & LVM-like subvolumes
  • Built-in snapshots for backups
  • Automatic data compression
  • Create filesystem across multiple devices

Non-Linux Filesystem

| Filesystem | Type Code | Comments |
| --- | --- | --- |
| CIFS | cifs | Common Internet Filesystem (Microsoft). Read / Write data across network using network storage device. |
| NFS | nfs | Network FS. Open source standard for read / write data across network |
| SMB | / | Server Message Block (Microsoft). For network storage & devices (e.g. printers). SMB support allows Linux clients & servers interact with Microsoft clients & servers. |
| FAT | msdos / vfat | File Allocation Table. DOS & Win only support FAT |
| VFAT | vfat | Virtual FAT. Format USB & SD cards |
| exFAT | exfat | Format USB & SD cards |
| NTFS | / | New Technology FS. preferred for win7 |
| HFS / HFS+ | / | Hierarchical FS (by Apple) |
| XFS | xfs | created by SGI (Silicon Graphics, for IRIX OS) |
| ZFS | zfs | Zettabyte FS (Sun, now Oracle). For Unix servers, inspires Btrfs. |
| ISO-9660 | iso9660 | standard for CD-ROM. also works with Rock Ridge extensions |
| UDF | / | Universal Disc Format. common for DVD-ROM |

FAT

  • Every major OS understands FAT, making FAT excellent for exchanging data on removable media
  • Also for cross-platform disk (e.g. Linux & Windows)

If using non x86 platform, make sure to check filesystem development on that platform.

A fast & reliable filesystem on one CPU might be slow & unreliable on another.


Create Filesystem

  • mkfs
  • mkdosfs (for FAT)

mkfs creates all index files & tables necessary for the specific filesystem

# make fs on a PARTITION, pass type code
mkfs -t ext4 /dev/sda2

# set reserved block percentage
mkfs -t ext4 -m 1 /dev/sda2

# help
man mkfs.ext4

# bad block check
# every sector in the partition will be checked
mkfs -t ext4 -c /dev/sda2

Reserved block percentage

A share of the blocks (5% by default on ext filesystems) is reserved for root, so Linux reports the disk as full to ordinary users before it actually gets full.

If the bad block check reports several bad sectors, chances are the entire hard disk doesn’t have long to live.


Create Swap Space

Swap Space

  • Linux can use swap partition or swap file for memory extension
  • Identify: MBR partition type code 0x82
  • Linux uses /etc/fstab to hold swap space definitions

Commands

# make swap space
mkswap /dev/sda6

# activate swap
swapon /dev/sda6

Activate swap space permanently: Create entry in /etc/fstab
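A swap entry in /etc/fstab conventionally uses `none` as the mount point, `swap` as the type and `sw` as the option, with both trailing numeric fields set to 0. The sketch below only prints such a line for a given device; the function name `swap_fstab_line` is made up, and /dev/sda6 is simply the example partition used above.

```shell
# Print an /etc/fstab line for a swap partition.
# $1 = swap device (or UUID=... specifier)
swap_fstab_line() {
    printf '%s none swap sw 0 0\n' "$1"
}

swap_fstab_line /dev/sda6   # prints: /dev/sda6 none swap sw 0 0
```

The printed line would be appended to /etc/fstab by hand (as root); after that, swapon -a activates every swap entry listed there.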


2 - Maintain Filesystem Health

  • Tuning : dumpe2fs, tune2fs, debugfs
  • Monitor : df, du, iostat

Additionally, /proc & /sys are used for recording system stats

# partitions
cat /proc/partitions

# mount points
cat /proc/mounts

# partitions & kernel-level stats
ls /sys/block
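/proc/partitions is plain text (major, minor, #blocks, name), so it is easy to post-process. A minimal sketch, assuming the #blocks column counts 1 KiB blocks; partition_sizes is a name made up for this example.

```shell
# List each partition with its size in MiB.
# Input: text in /proc/partitions format (major minor #blocks name).
# partition_sizes is a hypothetical helper.
partition_sizes() {
    awk '$1 ~ /^[0-9]+$/ && NF == 4 { printf "%s %d MiB\n", $4, $3 / 1024 }'
}

if [ -r /proc/partitions ]; then
    partition_sizes < /proc/partitions
fi
```

The header and blank line are skipped because their first field is not a number.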

Many Linux filesystem maintenance tools should only be run while the filesystem is unmounted.

Changes made by maintenance tools while the filesystem is mounted may confuse the kernel's filesystem drivers.


Filesystem Tools

Linux

# install
sudo apt install -y e2fsprogs

# change file attributes
chattr

# change label
e2label

# resize fs
resize2fs

# show block & superblock info
dumpe2fs

# tuning
tune2fs
debugfs

XFS

# show / edit fs params (e.g. UUID)
xfs_admin

# info
xfs_info

# debug
xfs_db

# repair
xfs_repair

# improve organization of mounted fs
xfs_fsr

Filesystem Check: fsck

  • A frontend to filesystem-specific tools (e2fsck , fsck.jfs , etc.)
  • Examine filesystem’s major data structures for internal consistency (e.g. check that the index matches the actual files)

# check all filesystems listed in /etc/fstab
fsck -A

# check only filesystems of the specified type (used with -A)
fsck -A -t ext4

# indicate progress, verbose
fsck -CV

Linux runs fsck automatically at startup on partitions marked in /etc/fstab :

  • Performs a quick, cursory examination of a partition to confirm it was unmounted cleanly
  • The boot process is not delayed by a full filesystem check, unless the system wasn’t shut down properly

If problem with filesystem:

  • Run fsck in recovery mode on the specific partition (e.g. fsck /dev/part1)
  • If the first run fails, run it again a few times
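Whether another run is needed can be read from fsck's exit status, which fsck(8) documents as a sum of bit flags. A minimal sketch for decoding it; explain_fsck_status is a name made up for this example.

```shell
# Decode fsck's exit status (a sum of the bit flags listed in fsck(8)).
# explain_fsck_status is a hypothetical helper.
explain_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    for pair in "1:errors corrected" "2:system should be rebooted" \
                "4:errors left uncorrected" "8:operational error" \
                "16:usage or syntax error" "32:canceled by user request" \
                "128:shared-library error"; do
        bit=${pair%%:*}      # number before the colon
        text=${pair#*:}      # description after the colon
        if [ $(( status & bit )) -ne 0 ]; then
            echo "$text"
        fi
    done
}

explain_fsck_status 0
explain_fsck_status 5    # 1 + 4: some errors fixed, some left
```

Typical use as root: run fsck /dev/part1, then explain_fsck_status $? on the next line. A status containing 4 means errors remain, so another run is worthwhile.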

Filesystem Tuning

  • Provide info (mounted) : dumpe2fs
  • Change tuning options (un-mounted) : tune2fs, debugfs

dumpe2fs : Obtain FS Info

ext2 / ext3 / ext4 ONLY

# -h: omit info about group descriptors
dumpe2fs -h /dev/nvme0n1p1

# XFS equivalent
xfs_info [mount point]
xfs_metadump

An inode is an entry in the index table that tracks files stored on the filesystem.

Each inode contains info for one file.
The number of inodes limits the number of files.
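Since every file and directory consumes one inode, counting unique inode numbers shows how much of the inode budget a directory tree uses. A minimal sketch, assuming GNU (or BusyBox) stat with the -c option; inodes_used and the scratch directory are made up for this example.

```shell
# Count the inodes a directory tree consumes: one per file or directory.
# sort -u makes hard links (several names, one inode) count once.
# inodes_used is a hypothetical helper.
inodes_used() {
    find "$1" -xdev -exec stat -c '%i' {} + | sort -u | wc -l
}

demo=$(mktemp -d)                        # scratch directory
touch "$demo/a" "$demo/b" "$demo/c"
inodes_used "$demo"                      # the directory itself plus three files
```

Many small files can exhaust inodes long before the data blocks run out.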


tune2fs : Adjust Tunable FS Parameters

Change filesystem parameters reported by dumpe2fs

ext2 / ext3 / ext4 ONLY

# unmount first
umount /dev/sda6

# adjust max mount count (fsck is forced after this many mounts)
-c [num]

# set current mount count
# trick system into thinking filesystem has been mounted x times
-C [num x]

# adjust periodic disk check interval
-i [1d / 1w / 1m]

# add journal (version 2 log format, convert ext2 to ext3)
# will create a file called .journal if the fs is mounted
-j

# set journal parameters (size and device are alternatives)
-J size=xx
-J device=xx

# set reserved blocks
-m [percent]

# change filesystem UUID
-U xx

Note:

  • ext2, ext3, ext4 require periodic disk check with fsck
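The tune2fs options above can be tried on a scratch image rather than a real partition. This is a sketch assuming e2fsprogs is installed; the image file is a temporary name chosen for illustration.

```shell
# Convert a scratch ext2 image to ext3 by adding a journal, then set check limits.
PATH="$PATH:/sbin:/usr/sbin"            # e2fsprogs tools often live in sbin

fsimg=$(mktemp)                          # hypothetical scratch image file
dd if=/dev/zero of="$fsimg" bs=1024 count=16384 2>/dev/null   # 16 MiB

if command -v mke2fs >/dev/null 2>&1 && command -v tune2fs >/dev/null 2>&1; then
    mke2fs -q -F -t ext2 "$fsimg"                   # plain ext2, no journal
    tune2fs -j "$fsimg" >/dev/null 2>&1             # add a journal (ext2 -> ext3)
    tune2fs -c 30 -i 1w "$fsimg" >/dev/null 2>&1    # check every 30 mounts or weekly
    # has_journal should now appear in the feature list
    tune2fs -l "$fsimg" | grep -i -e 'features' -e 'mount count'
else
    echo "e2fsprogs not found, skipping"
fi
```

tune2fs -l prints the same superblock fields as dumpe2fs -h, which makes it handy for confirming a change.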

debugfs : Debug FS Interactively


# unmount first
umount /dev/sda6

# enter interactive mode
debugfs /dev/sda6

# superblock info
stats -h

# inode info
stat [filename]

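# list deleted inodes (to get the inode number)
lsdel

# undelete file (inode number goes in angle brackets)
undelete <inode> [pathname]

# extract file from the fs to the native filesystem
dump [source] [dest]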

Use debugfs to undelete a file (deleted using rm):

inode is the inode number of the deleted file (e.g. found with lsdel; ls -li shows the inode number of an existing file)


Maintain a Journal

Journaling

Problem: After power failure / system crash, ext2 filesystem could be in an inconsistent state. Only way to safely mount is to do a full disk check before mount.

Solution: Convert to a journaling filesystem (ext3), track data not yet written to the drive in a log file (journal)

  • Journal : data structure that describes pending operations

  • Prior to writing data to disk’s main data structures, Linux describes what it’s going to do in the journal

    • When operation complete, entries are removed from journal
  • In case system crashes, only need to examine the journal, and only check data structures mentioned in the journal

    • Inconsistency: roll back or complete changes

    • Return disk to consistent state, without checking every data structure in the filesystem

  • Greatly speeds up disk check process
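The journaling steps above can be imitated with plain files. This is a toy sketch only: the directory stands in for the disk, all names are invented for the example, and real journals (e.g. ext3's) work at the block level with commit records.

```shell
# Toy write-ahead journal (illustration only, not how ext3 stores its journal).
store=$(mktemp -d)            # hypothetical stand-in for the disk
journal="$store/.journal"
: > "$journal"

# usage: journaled_write NAME CONTENT [crash]
journaled_write() {
    printf '%s %s\n' "$1" "$2" >> "$journal"   # 1. describe the pending operation
    if [ "${3:-}" = "crash" ]; then
        return 1                               # simulate power loss before step 2
    fi
    printf '%s\n' "$2" > "$store/$1"           # 2. update the main data structure
    : > "$journal"                             # 3. done: remove the journal entry
}

# After a crash, look only at what the journal mentions and complete it.
recover() {
    while read -r name content; do
        printf '%s\n' "$content" > "$store/$name"
    done < "$journal"
    : > "$journal"
}

journaled_write a.txt hello
journaled_write b.txt world crash || echo "crashed mid-write"
recover
cat "$store/b.txt"
```

Recovery reads a single journal entry instead of scanning every file in the store, which is the speed-up journaling provides.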


Linux filesystem with a journal

  • ext3
  • ext4
  • reiserfs
  • xfs
  • jfs

To use the journal, the filesystem must be mounted with the correct type code (e.g. ext3, not ext2)


Monitor Disk Usage

  • df (by partition / mounted filesystem)
  • du (by directory)

df

Helpful for finding out which partitions are in danger of filling up

# e.g  /proc, /sys, /proc/bus/usb
-a include all fs (virtual fs size = 0)

# partition with small files can deplete inodes soon
-i available & used inodes

-h human readable
-l omit network fs (only show local fs)
-T show fs type
-t type limit by fs type

df -i works well for filesystems that create a fixed number of inodes.

Other filesystems (e.g. reiserfs, btrfs) create inodes dynamically.
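To spot partitions that are close to full, df output can be filtered by usage. A minimal sketch that reads df -P (POSIX format) output; over_threshold is a name made up for this example, and mount points containing spaces are not handled.

```shell
# Print mount points whose usage is at or above LIMIT percent.
# Input: `df -P` output (Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on).
# over_threshold is a hypothetical helper.
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)                     # strip the percent sign
        if (pct + 0 >= limit + 0) print $6, $5
    }'
}

df -P 2>/dev/null | over_threshold 90
```

On GNU df, the same filter should also work on df -Pi output to catch partitions running out of inodes, since the use percentage sits in the same column.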


du

Adds up disk space used by all files in a specified directory

du -shc ~/Downloads/*

-h human readable
-a report of individual files
-c grand total
-s summary of subdirectory

-l count hard links
-x limit report to one fs

Normally du counts files that appear multiple times as hard links only once.
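The effect of -l is easy to see with one file and one hard link to it. A minimal sketch using a scratch directory; the file names are arbitrary.

```shell
# Show how hard links affect du totals.
dudir=$(mktemp -d)                               # scratch directory
head -c 1048576 /dev/urandom > "$dudir/data"     # 1 MiB of incompressible data
ln "$dudir/data" "$dudir/link"                   # second name, same inode

once=$(du -sk "$dudir" | cut -f1)     # default: shared inode counted once
twice=$(du -skl "$dudir" | cut -f1)   # -l: counted once per link
echo "default: ${once}K   with -l: ${twice}K"
```

The -l total is roughly 1 MiB larger, even though the data exists on disk only once.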


3 - Mount & Unmount Filesystem

Filesystems are most often used by being mounted - associated with a directory

  • Temporary : mount / umount
  • Permanent : edit /etc/fstab

Temp - Mount

Ties a filesystem to a Linux directory


Commands

-a	mount all fs in /etc/fstab
-r mount as read only
-w mount as read / write

-v verbose output
-L label
-U UUID

# if no -t, Linux will auto detect fs type
-t type specify fs type

# Example
sudo mount -t ext4 /dev/sdb1 /mnt

Options

# use loopback device
# mount a file as if it were a partition
mount -t vfat -o loop 1.img /mnt/img

# allow / forbid normal users to mount fs
# with user, only the user who mounted the fs may unmount it
-o user / nouser

Notes

  • If /etc/fstab specifies user, users, or owner ,
    an ordinary user may mount the fs by specifying either the device or the mount point, but not both.

  • Most Linux distros ship with auto-mounter support,
    which let OS auto mount removable media when inserted

  • Filesystems mounted with mount are recorded in /etc/mtab. It is not a config file to edit
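The mount table can be queried without root. On many current distributions /etc/mtab is a symlink to /proc/self/mounts, so both share the same format. A minimal sketch; mount_info is a name made up for this example.

```shell
# Look up device, type, and options for a mount point.
# Input: text in /proc/mounts (or /etc/mtab) format:
#   device mountpoint fstype options dump pass
# mount_info is a hypothetical helper.
mount_info() {
    awk -v mp="$1" '$2 == mp { print $1, $3, $4 }'
}

if [ -r /proc/mounts ]; then
    mount_info / < /proc/mounts
fi
```

This is handy for confirming which type Linux auto-detected when mount was run without -t.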


Temp - Umount

Commands

-a	unmount all in /etc/mtab
-r fallback to read-only

Notes

  • umount -r
    If Linux can’t unmount a filesystem, it attempts to remount it in read-only mode
  • Specify only device or mount point, no need to specify both

Linux caches access to most filesystems, so data may not be written to disk until some time after the write command.
It is therefore possible to corrupt a disk by unplugging it, even when the disk appears inactive.

# write cache to disk manually
sync

Permanent

/etc/fstab

  • fstab : filesystem table, mount at boot time
  • Describes permanent mappings of filesystems to mount points, controls how Linux provides access to disk partitions & removable media drives
  • Can manually add device to /etc/fstab

Content of /etc/fstab

Most distros now specify partitions by labels / UUID
Ensuring correct drive partition is accessed, despite the order it appears in the raw device table

  • dump : equals 1, if dump utility should back up a partition
  • fsck : filesystem check order
    • Lower numbers are checked first
    • 0 : fsck should not check fs
    • 1 : root partition
    • 2 : other partition

# Example: local partition
UUID=xx /mnt/data ext4 users 0 0

# Example: SMB share (credentials is a cifs option; //server/share is a placeholder)
//server/share /mnt/share cifs users,credentials=/etc/creds 0 0

# /etc/creds
username=kk
password=hello

  • If a new hard disk is added, or a disk is repartitioned, /etc/fstab needs to be modified
  • If devices in /etc/fstab don’t exist at boot time, a boot error is generated
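The sixth fstab field decides what fsck -A checks and in which order. A minimal sketch that lists entries by that field; fsck_order is a name made up for this example, and lines with fewer than six fields are ignored.

```shell
# List fstab entries in the order fsck -A would check them.
# Input: /etc/fstab-style text; field 6 is the fsck pass (0 = never check).
# fsck_order is a hypothetical helper.
fsck_order() {
    awk '!/^[[:space:]]*#/ && NF >= 6 && $6 > 0 { print $6, $2, $1 }' | sort -n
}

if [ -r /etc/fstab ]; then
    fsck_order < /etc/fstab
fi
```

The root partition (pass 1) comes first, followed by the pass 2 entries. Entries with pass 0, such as swap or network shares, do not appear.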