Embedded Linux - Managing Flash Memory

Most embedded Linux systems lack a traditional PC hard drive. Instead, the Linux kernel, associated programs and user data reside on flash devices. The Memory Technology Device (MTD) project provides flash support for Linux. This document answers some common questions about the device driver, application and file system aspects of flash devices in embedded Linux.

See http://www.linux-mtd.infradead.org/faq/general.html for the official MTD Frequently Asked Questions.

Common Questions about Embedded Linux flash drivers (MTD):
How is the flash partition layout specified?
What filesystem types are suitable for flash?
What is the purpose of the MTD character device?
How do I access the flash from user-space applications?
How do I upgrade the boot loader or kernel or root filesystem?
How much redundancy or error recovery do I need for my upgrade procedure?
How should I create an empty JFFS2 partition?
How much overhead space does JFFS2 use for its own book-keeping?
Should I provide any options when I mount a JFFS2 filesystem?
Is flash chip X supported?
How do I add support for my platform?

How is the flash partition layout specified?

Most programmers are used to dealing with hard disks. The hard disk partition table resides in a reserved sector on the hard disk. On an IBM PC, all system software reads the partition layout from this table.

If a flash device emulates a hard disk, the partition table method may still be used. For example, CompactFlash modules look like an IDE hard drive to the Linux kernel.

In most embedded systems, the flash device is used directly with no hard disk emulation. The MTD mapping driver provides accessor functions to read and write flash memory. The mapping driver can specify a hard-coded partition layout, read the partition layout from the kernel command line passed in by the boot loader (e.g. U-Boot), or read the partition layout from flash storage (e.g. as written by the RedBoot boot loader).
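As an illustration of the command-line method, the sketch below assembles an mtdparts= argument of the kind read by the kernel's command-line partition parser (CONFIG_MTD_CMDLINE_PARTS). The helper name build_mtdparts, the MTD id physmap-flash.0 and the three partitions are assumptions chosen for the example; the id must match the name your mapping driver registers.

```shell
#!/bin/sh
# Build an "mtdparts=" kernel argument from an MTD id and partition specs.
# Each spec is size(name)[ro]; a size of "-" means "the rest of the device".
build_mtdparts() {
    id=$1
    shift
    list=
    for part in "$@"; do
        # Join the specs with commas, without a leading comma.
        list="${list:+$list,}$part"
    done
    printf 'mtdparts=%s:%s\n' "$id" "$list"
}

# Hypothetical layout: 256 KiB read-only boot loader, 2 MiB kernel, rest rootfs.
build_mtdparts physmap-flash.0 '256k(u-boot)ro' '2m(kernel)' '-(rootfs)'
# prints: mtdparts=physmap-flash.0:256k(u-boot)ro,2m(kernel),-(rootfs)
```

The resulting string would be appended to the kernel command line by the boot loader, after which each named partition appears as its own MTD device.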

Even without partition support, the MTD layer provides access to the entire flash chip as an MTD device. With partition support, each MTD partition is exported as a separate MTD device. Each device has a descriptive name, which can be viewed using the following command:

cat /proc/mtd
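Scripts often need to find a partition by its name rather than its number. The following is a minimal sketch that parses the /proc/mtd table format (device, size and erase size in hex, then the quoted name). The function name mtd_by_name and the sample table contents are hypothetical; on a real target you would omit the second argument so that /proc/mtd is read.

```shell
#!/bin/sh
# Print the MTD device (e.g. "mtd1") whose quoted name matches $1.
# $2 is the table to read; it defaults to /proc/mtd.
mtd_by_name() {
    # Splitting on the double quote puts the partition name in field 2
    # and "mtdN: size erasesize " in field 1.
    awk -F'"' -v want="$1" '
        NR > 1 && $2 == want { split($1, f, ":"); print f[1] }
    ' "${2:-/proc/mtd}"
}

# Sample table in the /proc/mtd format, so the sketch runs on any machine.
sample=$(mktemp)
cat > "$sample" <<'EOF'
dev:    size   erasesize  name
mtd0: 00040000 00020000 "u-boot"
mtd1: 00200000 00020000 "kernel"
mtd2: 01dc0000 00020000 "root filesystem"
EOF

mtd_by_name kernel "$sample"              # prints: mtd1
mtd_by_name "root filesystem" "$sample"   # prints: mtd2
rm -f "$sample"
```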

What filesystem types are suitable for flash?

For a conventional NOR flash, the MTD block device provides a crude block device similar to a hard disk. Traditional hard drives use a 512-byte sector. The MTD block device emulates this sector layout, but there is a severe performance penalty for writes. Since most flash sectors are 64 KiB or larger, updating a 512-byte sector requires a read-modify-write sequence for the entire flash sector. This kind of write is slow and causes many extra erase cycles on the flash. A flash sector is typically rated for 100,000 to 1 million erase cycles over the device lifetime, so it is wise to limit erase cycles.
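To put numbers on the read-modify-write penalty, the arithmetic below assumes a 64 KiB erase sector and the low end of the endurance range quoted above. Both figures, and the one-update-per-minute workload, are illustrative assumptions; substitute the values from your flash data sheet.

```shell
#!/bin/sh
erase_size=$((64 * 1024))   # bytes per flash erase sector (assumed)
sector_size=512             # bytes per emulated disk sector
rated_cycles=100000         # rated erase cycles per flash sector (assumed)

# Every 512-byte update rewrites the whole erase sector.
amplification=$((erase_size / sector_size))
echo "bytes rewritten per 512-byte update: $erase_size (${amplification}x)"

# Each update costs one erase cycle. If the same sector is updated once a
# minute, it reaches its rated cycle count after this many whole days:
days=$((rated_cycles / (60 * 24)))
echo "days until the sector reaches its rating: $days"
```

With these inputs the script reports a 128x write amplification and 69 days, which is why a frequently updated file on the MTD block device can wear out a sector quickly.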

Because of this write performance issue, the MTD block device is best suited to read-only filesystems. Some typical read-only filesystems for embedded use are CRAMFS and ROMFS.

CRAMFS has the advantage of compressing each 4 KiB block, typically giving about 2:1 compression. Read-write capability is possible using flash-oriented filesystems such as JFFS2.

JFFS2 is a journaling flash filesystem (hence the name) – the ‘2’ distinguishes JFFS2 from the JFFS filesystem, a largely defunct predecessor. JFFS2 bypasses the block device layer (with its associated buffer cache) and writes directly to the underlying flash device. Naturally, JFFS2 supports both read-write and read-only modes of operation.

For a NAND flash, the filesystem MUST be NAND-aware because both reads and writes must implement ECC error detection/correction. The JFFS2 and YAFFS filesystems are NAND-compatible.

What is the purpose of the MTD character device?

MTD devices come in two flavors: MTD block devices and MTD character devices. The block devices provide a 512-byte-per-sector layout for use by filesystems (see "What filesystem types are suitable for flash?"). The character device provides a linear view of an MTD device or an MTD partition. You can read this device as you would any file, so standard UNIX utilities may be used to read the flash. Assuming MTD device 0 is the entire flash, the following command will dump the entire flash image to a file:

cat /dev/mtdchar0 > /tmp/flash.bin

Writing the flash is different. What happens if you run the following commands on a flash partition that already contains valid data?

cat new.bin > /dev/mtdchar0
cmp /dev/mtdchar0 new.bin
/dev/mtdchar0 new.bin differ: char n, line x

The MTD character device will write the data to the flash, but it will not perform a flash erase command. On a NOR flash device, the write command can only change 1 bits into 0 bits. Changing a bit from 0 to 1 requires an erase command. The MTD character device provides ioctls to facilitate erasing. The flash sector geometry may be determined approximately using the MEMGETINFO command: it returns the ‘least common denominator’ erase size (usually 64 KiB or 128 KiB), ignoring the smaller boot blocks if present. The exact flash layout may be determined using the MEMGETREGIONCOUNT and MEMGETREGIONINFO commands. Once the flash sector geometry is determined, the MEMERASE command may be issued to erase the desired blocks.
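The cmp failure above follows from the 1-to-0 rule: programming NOR flash without an erase leaves each byte holding the bitwise AND of its old and new values. The sketch below models that for single bytes. The function name nor_program and the byte values are arbitrary examples, and this is only a model of the bit behavior, not a way to access flash.

```shell
#!/bin/sh
# Model of programming one NOR flash byte without erasing it first:
# bits can only be cleared, so the stored result is (old AND new).
# Usage: nor_program OLD_BYTE NEW_BYTE
nor_program() {
    printf '0x%02X\n' $(( $1 & $2 ))
}

nor_program 0xFF 0x3C   # erased byte: the new value is stored, prints 0x3C
nor_program 0xF0 0x3C   # stale data: prints 0x30, not the intended 0x3C
```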

MTD provides user-space utilities to automate the erasing process. The following commands will correctly write the new image to flash. We assume that the flash does not support locking, or that the sectors are already unlocked; otherwise the flash_unlock utility can be used to unlock the appropriate sectors.

flash_eraseall /dev/mtdchar0
cat new.bin > /dev/mtdchar0
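The erase and write steps can be wrapped in a small helper that also reads the image back for verification. This is a sketch under the same assumptions as above (unlocked NOR flash, flash_eraseall available on the target); the function name flash_image is hypothetical.

```shell
#!/bin/sh
# Erase an MTD character device, write an image, and read it back to verify.
# Usage: flash_image DEVICE IMAGE
flash_image() {
    dev=$1
    img=$2
    [ -r "$img" ] || return 1
    size=$(( $(wc -c < "$img") ))       # image length in bytes

    flash_eraseall "$dev" || return 1   # erase every block first
    cat "$img" > "$dev" || return 1     # then program the new image

    # The device is usually larger than the image, so compare only the
    # first $size bytes of the device against the image file.
    head -c "$size" "$dev" | cmp -s - "$img"
}
```

A typical call would be: flash_image /dev/mtdchar0 new.bin && echo "flash verified". The function returns non-zero if the erase, the write or the read-back comparison fails.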

Table 1: MTD IOCTL commands

Name               Description                               Argument
MEMGETINFO         Get layout and capabilities               struct mtd_info_user *
MEMERASE           Erase flash blocks                        struct erase_info_user *
MEMLOCK            Lock flash blocks to disallow changes     struct erase_info_user *
MEMUNLOCK          Unlock flash to allow changes             struct erase_info_user *
MEMGETREGIONCOUNT  Return number of erase block regions      int *
MEMGETREGIONINFO   Return layout of one erase block region   struct region_info_user *

MEMWRITEOOB        NAND only: write out-of-band info (ECC)   struct mtd_oob_buf *
MEMREADOOB         NAND only: read out-of-band info (ECC)    struct mtd_oob_buf *
MEMSETOOBSEL       NAND only: set default OOB info           struct nand_oobinfo *

How do I access the flash from user-space applications?

If the flash is mounted as a filesystem, the normal open/close/read/write system calls will work (obviously write() will not function on a read-only filesystem).

Otherwise, the flash may be accessed using the MTD character device (see "What is the purpose of the MTD character device?").

How do I upgrade the boot loader or kernel or root filesystem?

Each component (boot loader, kernel, root filesystem) usually has its own MTD partition, which can be accessed through the MTD character device. The kernel usually executes from RAM, although some handheld computers execute it directly from flash, a technique known as XIP (execute-in-place). When the kernel is executing from RAM, the kernel flash partition may be updated freely. The root filesystem is a special case: if files in the root filesystem (e.g. executables) are open while the partition is rewritten, running programs may read inconsistent data. Even without open files, a mounted root JFFS2 filesystem would get its internal data structures out of sync with the flash contents.

Upgrading the root filesystem usually is done on a file-by-file basis. Sometimes it is convenient to package the upgrade as a .tar or .tar.gz archive.
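A file-by-file upgrade packaged as a .tar.gz archive can be applied with a short script. The sketch below is one possible approach, not a complete upgrade procedure: it lists the archive first so that an unreadable or truncated file is rejected, then unpacks it over the target directory. The function name apply_upgrade and the example paths are hypothetical.

```shell
#!/bin/sh
# Unpack an upgrade archive over a root directory.
# Usage: apply_upgrade ARCHIVE.tar.gz TARGET_DIR
apply_upgrade() {
    archive=$1
    root=$2

    # Listing the archive forces tar to read through it, so an unreadable
    # or truncated archive is rejected before any file is replaced.
    tar -tzf "$archive" > /dev/null || return 1

    tar -xzf "$archive" -C "$root"
}
```

For example, apply_upgrade /tmp/upgrade.tar.gz / would unpack the archive over the live root filesystem. Replacing executables that are currently running needs extra care, as discussed above.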

How much redundancy or error recovery do I need for my upgrade procedure?

Most redundancy schemes require some support from your boot loader. At a minimum, you should store the image in RAM or a ramdisk and verify the image before writing it into flash. The amount of redundancy needed depends on your application's reliability and cost requirements. Many inexpensive Linux devices, such as the Linksys WRT54G, do not have redundant images due to cost concerns.
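The minimum check described above, verifying the image while it is still in RAM, might look like the following sketch. It assumes that the sha256sum utility (from coreutils or BusyBox) is present on the target and that the expected digest is delivered alongside the image; the function name verify_image is hypothetical.

```shell
#!/bin/sh
# Succeed only if IMAGE is readable and has the expected SHA-256 digest.
# Usage: verify_image IMAGE EXPECTED_HEX_DIGEST
verify_image() {
    [ -r "$1" ] || return 1
    # sha256sum prints "<digest>  <name>"; keep only the digest.
    actual=$(sha256sum < "$1" | cut -d' ' -f1)
    [ -n "$actual" ] && [ "$actual" = "$2" ]
}
```

An upgrade script would call verify_image on the downloaded file, for example one stored under /tmp on a RAM-backed filesystem, and refuse to erase anything if it returns non-zero.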

How should I create an empty JFFS2 partition?

As noted in the JFFS2 FAQ, the JFFS2 filesystem marks erased blocks with ‘cleanmarkers’. The cleanmarker was introduced to address the scenario where the device powers down during a flash block erase. If neither the cleanmarker nor another node type is present in a block, JFFS2 will redo the erase operation and write the cleanmarker at the beginning of the block.

The ‘-j’ option to the flash_eraseall command writes the cleanmarker at the beginning of each block, so that JFFS2 does not redo the erase operation.
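Put together, creating and mounting an empty JFFS2 partition could be scripted as in the sketch below. The function name make_empty_jffs2, the partition number 3 and the mount point are assumptions for illustration; the character device and block device passed in must refer to the same MTD partition.

```shell
#!/bin/sh
# Erase a partition, write JFFS2 cleanmarkers, then mount it.
# Usage: make_empty_jffs2 CHAR_DEVICE BLOCK_DEVICE MOUNT_POINT
make_empty_jffs2() {
    # -j writes a cleanmarker into every erased block.
    flash_eraseall -j "$1" || return 1
    mount -t jffs2 "$2" "$3"
}
```

For example: make_empty_jffs2 /dev/mtdchar3 /dev/mtdblock3 /mnt/data. This destroys any data already stored in the partition.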

How much overhead space does JFFS2 use for its own book-keeping?

JFFS2 requires five spare erase blocks to implement garbage collection. On a two-bit-per-cell device such as Intel StrataFlash or Spansion MirrorBit, the erase block size is 128 KiB, so the reserved space is 5 x 128 KiB = 640 KiB, more than half a megabyte.
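The reserved space scales with the erase block size, so it is worth computing for your own device. The helper below is a small sketch; the name jffs2_reserved_bytes is hypothetical, and the default of five blocks is the figure stated above.

```shell
#!/bin/sh
# Bytes JFFS2 holds back for garbage collection.
# Usage: jffs2_reserved_bytes ERASE_BLOCK_BYTES [SPARE_BLOCKS]
jffs2_reserved_bytes() {
    echo $(( $1 * ${2:-5} ))
}

jffs2_reserved_bytes $((128 * 1024))   # prints 655360 (640 KiB)
jffs2_reserved_bytes $((64 * 1024))    # prints 327680 (320 KiB)
```

On a small partition this overhead can be a large fraction of the total, which is worth checking before choosing JFFS2 for a partition of only a few erase blocks.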

The spare erase block requirements are defined in fs/jffs2/nodelist.h. The JFFS2_RESERVED_BLOCK_BASE parameter is 3 by default. If you change this value to 1, you will save two erase blocks. If you change this value, you should do some stress testing to verify that nothing was broken; the default has been left at 3 to maximize reliability.

Should I provide any options when I mount a JFFS2 filesystem?

A useful mount option for a read-write JFFS2 filesystem is ‘noatime’. The ‘noatime’ option turns off the updating of file access times, which would cause a flash write every time a file is read. If the filesystem is the root filesystem, the option can be supplied one of two ways:

1) Pass the following parameter on the kernel command line:
rootflags=noatime

2) Remount the root filesystem with the noatime option:
mount -t jffs2 -o remount,noatime /dev/mtdblock3 /

Is flash chip X supported?

Most NOR flash chips are supported. In the old JEDEC drivers, you had to add an entry for each new flash to specify the sector layout and programming algorithms. The entry was indexed by the Manufacturer and Device ID numbers.

The MTD CFI driver uses the Common Flash Interface (CFI). The following description of CFI is excerpted from the CFI 2.0 standard:

The Common Flash Interface (CFI) specification outlines device and host system software interrogation handshake that allows specific vendor-specified software algorithms to be used for entire families of devices. This allows device-independent, JEDEC ID-independent and forward- and backward-compatible software support for the specified flash device families. It allows flash vendors to standardize their existing interfaces for long-term compatibility.

The MTD CFI support probes the hardware for the CFI data. The CFI data includes the chip ID, command set ID, flash geometry and supported command types. The MTD CFI code supports three command sets: Intel (0001), AMD (0002) and ST Advanced Architecture (0020). You can even compile in support for all of these command sets in the same kernel. Once the command set support is present, you can use any CFI-compliant chip, assuming your low-level chip select timings and address range are compatible with the new flash device.

How do I add support for my platform?

In the kernel source, the drivers/mtd/maps directory contains the mapping drivers. You may be able to use the generic physmap.c driver. Specify the base address, chip size and bus width in the kernel configuration, and the physmap.c driver will probe the flash type. Generic memory accesses are used to read the flash. The physmap.c driver can even handle several flash chips in a contiguous memory area.

Some flashes do not use straightforward memory mappings, due to external bus addressing limitations. Or you may have more than one flash with non-adjacent memory mappings. In this case, you should write your own mapping driver. You can use physmap.c as a reference.
