Carving ELF Files
Recently, I created a simple tool, Carve Exe, to carve executables from other files (e.g. memory dumps or network traffic). Carving executables from binary blobs is a common task in digital forensics and reverse engineering, for example when analyzing how a malware sample unpacks and deobfuscates itself.
The problem of carving executables from binary blobs boils down to finding the beginning and the end of an executable file. Finding the beginning is easy, as most file types have well-defined magic bytes [1]. Determining the end of an executable file, given its beginning, is a bit harder. This post discusses how to determine the end of an ELF executable by computing its size from the headers in the file alone.
The Layout of an ELF File
ELF executables consist of four parts:
- The ELF file header,
- The program headers table,
- The section headers table,
- The data sections.
The ELF File Header
Each ELF file starts with an ELF file header (52 bytes for 32-bit files and 64 bytes for 64-bit files). This header contains general information about the executable, such as the architecture and the entry point of the code.
The ELF file header structure is defined in the elf.h file from the Linux kernel:
32-bit:

    typedef struct elf32_hdr {
        unsigned char e_ident[EI_NIDENT];
        Elf32_Half    e_type;
        Elf32_Half    e_machine;
        Elf32_Word    e_version;
        Elf32_Addr    e_entry;
        Elf32_Off     e_phoff;
        Elf32_Off     e_shoff;
        Elf32_Word    e_flags;
        Elf32_Half    e_ehsize;
        Elf32_Half    e_phentsize;
        Elf32_Half    e_phnum;
        Elf32_Half    e_shentsize;
        Elf32_Half    e_shnum;
        Elf32_Half    e_shstrndx;
    } Elf32_Ehdr;

64-bit:

    typedef struct elf64_hdr {
        unsigned char e_ident[EI_NIDENT];
        Elf64_Half    e_type;
        Elf64_Half    e_machine;
        Elf64_Word    e_version;
        Elf64_Addr    e_entry;
        Elf64_Off     e_phoff;
        Elf64_Off     e_shoff;
        Elf64_Word    e_flags;
        Elf64_Half    e_ehsize;
        Elf64_Half    e_phentsize;
        Elf64_Half    e_phnum;
        Elf64_Half    e_shentsize;
        Elf64_Half    e_shnum;
        Elf64_Half    e_shstrndx;
    } Elf64_Ehdr;
An ELF file always starts with `\x7F\x45\x4C\x46` (the first four bytes of `e_ident`), making it easy to identify.
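As a minimal sketch of this first step, the following Python snippet lists every offset in a blob where the magic bytes occur. The function name `find_elf_offsets` is made up for this illustration, and each hit is only a candidate: the same four bytes can also appear by chance inside unrelated data.

```python
ELF_MAGIC = b"\x7fELF"  # the same four bytes as \x7F\x45\x4C\x46


def find_elf_offsets(blob: bytes) -> list[int]:
    """Return every offset in blob where the ELF magic bytes occur."""
    offsets = []
    pos = blob.find(ELF_MAGIC)
    while pos != -1:
        offsets.append(pos)
        # Continue searching one byte further so no occurrence is skipped.
        pos = blob.find(ELF_MAGIC, pos + 1)
    return offsets


# A small synthetic blob with two occurrences of the magic bytes.
blob = b"junk" + ELF_MAGIC + b"\x02\x01\x01" + b"\x00" * 9 + b"more junk" + ELF_MAGIC
print(find_elf_offsets(blob))  # [4, 29]
```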
To parse the other parts in the ELF file, we are interested in the following fields:
- Program headers table: located at offset `e_phoff`, with `e_phnum` entries, each of size `e_phentsize`.
- Section headers table: located at offset `e_shoff`, with `e_shnum` entries, each of size `e_shentsize`.
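These fields can be read with Python's `struct` module. The sketch below is one possible way to do it, not necessarily how Carve Exe does it; the function name `parse_elf_header` is hypothetical. It relies on two bytes of `e_ident`: byte 4 (1 for 32-bit, 2 for 64-bit) and byte 5 (1 for little endian, 2 for big endian).

```python
import struct

# Field names in the order they appear after the 16-byte e_ident array.
EHDR_FIELDS = (
    "e_type", "e_machine", "e_version", "e_entry", "e_phoff", "e_shoff",
    "e_flags", "e_ehsize", "e_phentsize", "e_phnum", "e_shentsize",
    "e_shnum", "e_shstrndx",
)


def parse_elf_header(data: bytes) -> dict:
    """Parse the ELF file header at the start of data into a dict."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unknown ELF class or byte order")
    byte_order = "<" if ei_data == 1 else ">"
    # Addresses and offsets are 4 bytes wide in 32-bit files, 8 in 64-bit files.
    layout = "HHIIIIIHHHHHH" if ei_class == 1 else "HHIQQQIHHHHHH"
    values = struct.unpack_from(byte_order + layout, data, 16)
    return dict(zip(EHDR_FIELDS, values))


# Synthetic 64-bit little-endian header using the values of the /bin/ls example.
header = b"\x7fELF\x02\x01\x01" + bytes(9) + struct.pack(
    "<HHIQQQIHHHHHH", 3, 62, 1, 0x4F80, 64, 127936, 0, 64, 56, 13, 64, 28, 27
)
fields = parse_elf_header(header)
print(fields["e_phoff"], fields["e_shoff"], fields["e_shnum"])  # 64 127936 28
```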
The Program Headers Table
The program headers table tells the operating system how to load the executable into memory. Each entry describes a segment. The header of each segment defines where the segment starts in the file, how big it is and how it should be loaded into memory.
When the operating system loads an ELF file into memory for execution, it only looks at the program headers table and ignores the section headers table.
32-bit:

    typedef struct elf32_phdr {
        Elf32_Word p_type;
        Elf32_Off  p_offset;
        Elf32_Addr p_vaddr;
        Elf32_Addr p_paddr;
        Elf32_Word p_filesz;
        Elf32_Word p_memsz;
        Elf32_Word p_flags;
        Elf32_Word p_align;
    } Elf32_Phdr;

64-bit:

    typedef struct elf64_phdr {
        Elf64_Word  p_type;
        Elf64_Word  p_flags;
        Elf64_Off   p_offset;
        Elf64_Addr  p_vaddr;
        Elf64_Addr  p_paddr;
        Elf64_Xword p_filesz;
        Elf64_Xword p_memsz;
        Elf64_Xword p_align;
    } Elf64_Phdr;
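Note that `p_flags` sits in a different position in the two variants, so a parser cannot simply widen the fields of the 32-bit layout. A small sketch of how this could be handled in Python follows; `PHDR_LAYOUTS` and `parse_program_header` are names invented for this example.

```python
import struct

# struct format and field order per ELF class (1 = 32-bit, 2 = 64-bit).
PHDR_LAYOUTS = {
    1: ("IIIIIIII", ("p_type", "p_offset", "p_vaddr", "p_paddr",
                     "p_filesz", "p_memsz", "p_flags", "p_align")),
    2: ("IIQQQQQQ", ("p_type", "p_flags", "p_offset", "p_vaddr",
                     "p_paddr", "p_filesz", "p_memsz", "p_align")),
}


def parse_program_header(data: bytes, offset: int, ei_class: int,
                         little_endian: bool = True) -> dict:
    """Parse one program header entry located at data[offset]."""
    layout, names = PHDR_LAYOUTS[ei_class]
    byte_order = "<" if little_endian else ">"
    return dict(zip(names, struct.unpack_from(byte_order + layout, data, offset)))


# Synthetic entry mirroring the last LOAD segment of the /bin/ls example.
raw = struct.pack("<IIQQQQQQ", 1, 6, 0x1DF70, 0x1EF70, 0x1EF70,
                  0x1308, 0x25D0, 0x1000)
seg = parse_program_header(raw, 0, ei_class=2)
print(seg["p_offset"] + seg["p_filesz"])  # 127608
```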
The Section Headers Table
The section headers table describes how the executable file’s data is stored. Each section contains specific data necessary for the executable, like code, constants, or debug information. The section header table defines their properties.
32-bit:

    typedef struct elf32_shdr {
        Elf32_Word sh_name;
        Elf32_Word sh_type;
        Elf32_Word sh_flags;
        Elf32_Addr sh_addr;
        Elf32_Off  sh_offset;
        Elf32_Word sh_size;
        Elf32_Word sh_link;
        Elf32_Word sh_info;
        Elf32_Word sh_addralign;
        Elf32_Word sh_entsize;
    } Elf32_Shdr;

64-bit:

    typedef struct elf64_shdr {
        Elf64_Word  sh_name;
        Elf64_Word  sh_type;
        Elf64_Xword sh_flags;
        Elf64_Addr  sh_addr;
        Elf64_Off   sh_offset;
        Elf64_Xword sh_size;
        Elf64_Word  sh_link;
        Elf64_Word  sh_info;
        Elf64_Xword sh_addralign;
        Elf64_Xword sh_entsize;
    } Elf64_Shdr;
The Sections in an ELF File
The sections contain the actual data of the executable (e.g. the code and any static values). Their relevant properties are defined in the section headers table.
Computing the Size of an ELF File
Usually, the layout of an ELF file follows this order:
- The ELF file header,
- The program header table,
- The sections,
- The section header table.
For this layout, we can compute the file size by finding the end of the section headers table: `e_shoff + e_shentsize * e_shnum`. This formula gives the end offset of the section headers table: its start offset plus its total size (the number of entries times the size of each entry).
However, the ELF format does not enforce a strict order for its parts (except that the ELF file header must be at the start). The program header table, section header table, and sections can be in any order, as long as their pointers are correct. Thus, we need to find the end of the last part in the file to determine the file size.
The ELF format also does not require a section headers table (or splitting the file up into multiple sections). Only a program headers table is required for an executable. Therefore, we must also consider the end offset of each segment in the file.
We compute the end of each part using the following formulas:
- The ELF file header end: `0x0 + e_ehsize`
- Program headers table end: `e_phoff + e_phnum * e_phentsize`
- Each section's end: `sh_offset + sh_size` (sections of type `NOBITS`, such as `.bss`, occupy no space in the file and count as 0)
- Section headers table end: `e_shoff + e_shnum * e_shentsize`
- Each segment's end: `p_offset + p_filesz`
The largest result among these will be the file size.
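The formulas above can be combined into a single function. What follows is a sketch under a few assumptions rather than a definitive implementation: the name `elf_size` is invented here, the header values are trusted as-is, and the extended numbering scheme that ELF uses for very large section or program header counts is not handled. If a table points outside the data, `struct.unpack_from` raises `struct.error`.

```python
import struct

SHT_NOBITS = 8  # section type that occupies no bytes in the file (e.g. .bss)


def elf_size(data: bytes, start: int = 0) -> int:
    """Estimate the size of the ELF file that begins at data[start]."""
    if data[start:start + 4] != b"\x7fELF":
        raise ValueError("no ELF magic at this offset")
    ei_class, ei_data = data[start + 4], data[start + 5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unknown ELF class or byte order")
    is64 = ei_class == 2
    bo = "<" if ei_data == 1 else ">"  # byte order prefix for struct

    # ELF file header fields that follow the 16-byte e_ident array.
    ehdr = struct.unpack_from(
        bo + ("HHIQQQIHHHHHH" if is64 else "HHIIIIIHHHHHH"), data, start + 16)
    e_phoff, e_shoff = ehdr[4], ehdr[5]
    e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum = ehdr[7:12]

    # End of the file header and of both tables (when present).
    ends = [e_ehsize]
    if e_phnum:
        ends.append(e_phoff + e_phnum * e_phentsize)
    if e_shnum:
        ends.append(e_shoff + e_shnum * e_shentsize)

    # End of each segment: p_offset + p_filesz.
    phdr_fmt = bo + ("IIQQQQQQ" if is64 else "IIIIIIII")
    off_i, size_i = (2, 5) if is64 else (1, 4)
    for i in range(e_phnum):
        ph = struct.unpack_from(phdr_fmt, data, start + e_phoff + i * e_phentsize)
        ends.append(ph[off_i] + ph[size_i])

    # End of each section: sh_offset + sh_size, skipping NOBITS sections.
    shdr_fmt = bo + ("IIQQQQIIQQ" if is64 else "IIIIIIIIII")
    for i in range(e_shnum):
        sh = struct.unpack_from(shdr_fmt, data, start + e_shoff + i * e_shentsize)
        sh_type, sh_offset, sh_size = sh[1], sh[4], sh[5]
        if sh_type != SHT_NOBITS:
            ends.append(sh_offset + sh_size)

    return max(ends)
```

All offsets inside an ELF file are relative to the start of that file, which is why `start` is added only when reading from `data` and never to the computed ends. For a file read from disk, `elf_size(open(path, "rb").read())` would be expected to match the size on disk as long as the headers are intact and nothing is appended after the last part.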
Example Computation
Let’s test this methodology on a real example: `/bin/ls`.
Using the `readelf` command, we can parse ELF files.
Headers
First, let’s look at the ELF file header:
$ readelf --file-header /bin/ls
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: DYN (Position-Independent Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x4f80
Start of program headers: 64 (bytes into file)
Start of section headers: 127936 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 13
Size of section headers: 64 (bytes)
Number of section headers: 28
Section header string table index: 27
Using the values in the ELF file header, we compute the start, size and end of each header:
Header | Offset start | # of entries | Entry size | Size | Offset end |
---|---|---|---|---|---|
ELF File Header | 0 | 1 | 64 | 1 * 64 = 64 | 0 + 64 = 64 |
Program Headers Table | 64 | 13 | 56 | 13 * 56 = 728 | 64 + 728 = 792 |
Section Headers Table | 127936 | 28 | 64 | 28 * 64 = 1792 | 127936 + 1792 = 129728 |
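The arithmetic in this table is easy to double-check in Python. The values below are copied from the `readelf` output above; the variable names simply mirror the ELF header fields.

```python
# Values copied from the readelf --file-header output for /bin/ls.
e_ehsize = 64
e_phoff, e_phentsize, e_phnum = 64, 56, 13
e_shoff, e_shentsize, e_shnum = 127936, 64, 28

header_end = 0 + e_ehsize
ph_table_end = e_phoff + e_phnum * e_phentsize
sh_table_end = e_shoff + e_shnum * e_shentsize

print(header_end, ph_table_end, sh_table_end)  # 64 792 129728
```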
Sections
We do the same for the sections, by parsing the section header table:
$ readelf --section-headers /bin/ls
There are 28 section headers, starting at offset 0x1f3c0:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .interp PROGBITS 0000000000000318 00000318
000000000000001c 0000000000000000 A 0 0 1
[ 2] .note.gnu.pr[...] NOTE 0000000000000338 00000338
0000000000000050 0000000000000000 A 0 0 8
[ 3] .note.gnu.bu[...] NOTE 0000000000000388 00000388
0000000000000024 0000000000000000 A 0 0 4
[ 4] .note.ABI-tag NOTE 00000000000003ac 000003ac
0000000000000020 0000000000000000 A 0 0 4
[ 5] .gnu.hash GNU_HASH 00000000000003d0 000003d0
0000000000000024 0000000000000000 A 6 0 8
[ 6] .dynsym DYNSYM 00000000000003f8 000003f8
0000000000000ac8 0000000000000018 A 7 1 8
[ 7] .dynstr STRTAB 0000000000000ec0 00000ec0
0000000000000561 0000000000000000 A 0 0 1
[ 8] .gnu.version VERSYM 0000000000001422 00001422
00000000000000e6 0000000000000002 A 6 0 2
[ 9] .gnu.version_r VERNEED 0000000000001508 00001508
00000000000000e0 0000000000000000 A 7 1 8
[10] .rela.dyn RELA 00000000000015e8 000015e8
0000000000000a68 0000000000000018 A 6 0 8
[11] .relr.dyn RELR 0000000000002050 00002050
0000000000000050 0000000000000008 A 0 0 8
[12] .init PROGBITS 0000000000003000 00003000
000000000000001b 0000000000000000 AX 0 0 4
[13] .text PROGBITS 0000000000003020 00003020
0000000000012db3 0000000000000000 AX 0 0 16
[14] .fini PROGBITS 0000000000015dd4 00015dd4
000000000000000d 0000000000000000 AX 0 0 4
[15] .rodata PROGBITS 0000000000016000 00016000
00000000000051a0 0000000000000000 A 0 0 32
[16] .eh_frame_hdr PROGBITS 000000000001b1a0 0001b1a0
0000000000000594 0000000000000000 A 0 0 4
[17] .eh_frame PROGBITS 000000000001b738 0001b738
00000000000020f8 0000000000000000 A 0 0 8
[18] .init_array INIT_ARRAY 000000000001ef70 0001df70
0000000000000008 0000000000000008 WA 0 0 8
[19] .fini_array FINI_ARRAY 000000000001ef78 0001df78
0000000000000008 0000000000000008 WA 0 0 8
[20] .data.rel.ro PROGBITS 000000000001ef80 0001df80
0000000000000af8 0000000000000000 WA 0 0 32
[21] .dynamic DYNAMIC 000000000001fa78 0001ea78
00000000000001f0 0000000000000010 WA 7 0 8
[22] .got PROGBITS 000000000001fc68 0001ec68
0000000000000390 0000000000000008 WA 0 0 8
[23] .data PROGBITS 0000000000020000 0001f000
0000000000000278 0000000000000000 WA 0 0 32
[24] .bss NOBITS 0000000000020280 0001f278
00000000000012c0 0000000000000000 WA 0 0 32
[25] .comment PROGBITS 0000000000000000 0001f278
000000000000001b 0000000000000001 MS 0 0 1
[26] .gnu_debuglink PROGBITS 0000000000000000 0001f294
0000000000000010 0000000000000000 0 0 4
[27] .shstrtab STRTAB 0000000000000000 0001f2a4
0000000000000119 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
D (mbind), l (large), p (processor specific)
Using the values in the section header table, we compute the start, size and end of each section:
Section | Start | Size | End |
---|---|---|---|
(NULL section) | 0x0 | 0x0 | 0x0 + 0x0 = 0 |
.interp | 0x318 | 0x1c | 0x318 + 0x1c = 820 |
.note.gnu.property | 0x338 | 0x50 | 0x338 + 0x50 = 904 |
.note.gnu.build-id | 0x388 | 0x24 | 0x388 + 0x24 = 940 |
.note.ABI-tag | 0x3ac | 0x20 | 0x3ac + 0x20 = 972 |
.gnu.hash | 0x3d0 | 0x24 | 0x3d0 + 0x24 = 1012 |
.dynsym | 0x3f8 | 0xac8 | 0x3f8 + 0xac8 = 3776 |
.dynstr | 0xec0 | 0x561 | 0xec0 + 0x561 = 5153 |
.gnu.version | 0x1422 | 0xe6 | 0x1422 + 0xe6 = 5384 |
.gnu.version_r | 0x1508 | 0xe0 | 0x1508 + 0xe0 = 5608 |
.rela.dyn | 0x15e8 | 0xa68 | 0x15e8 + 0xa68 = 8272 |
.relr.dyn | 0x2050 | 0x50 | 0x2050 + 0x50 = 8352 |
.init | 0x3000 | 0x1b | 0x3000 + 0x1b = 12315 |
.text | 0x3020 | 0x12db3 | 0x3020 + 0x12db3 = 89555 |
.fini | 0x15dd4 | 0xd | 0x15dd4 + 0xd = 89569 |
.rodata | 0x16000 | 0x51a0 | 0x16000 + 0x51a0 = 111008 |
.eh_frame_hdr | 0x1b1a0 | 0x594 | 0x1b1a0 + 0x594 = 112436 |
.eh_frame | 0x1b738 | 0x20f8 | 0x1b738 + 0x20f8 = 120880 |
.init_array | 0x1df70 | 0x8 | 0x1df70 + 0x8 = 122744 |
.fini_array | 0x1df78 | 0x8 | 0x1df78 + 0x8 = 122752 |
.data.rel.ro | 0x1df80 | 0xaf8 | 0x1df80 + 0xaf8 = 125560 |
.dynamic | 0x1ea78 | 0x1f0 | 0x1ea78 + 0x1f0 = 126056 |
.got | 0x1ec68 | 0x390 | 0x1ec68 + 0x390 = 126968 |
.data | 0x1f000 | 0x278 | 0x1f000 + 0x278 = 127608 |
.bss (NOBITS) | 0 | 0 | 0 + 0 = 0 |
.comment | 0x1f278 | 0x1b | 0x1f278 + 0x1b = 127635 |
.gnu_debuglink | 0x1f294 | 0x10 | 0x1f294 + 0x10 = 127652 |
.shstrtab | 0x1f2a4 | 0x119 | 0x1f2a4 + 0x119 = 127933 |
Segments
For the segments, we parse the program header table and look at the offset and size of each segment to compute its end.
$ readelf --program-header /bin/ls
Elf file type is DYN (Position-Independent Executable file)
Entry point 0x4f80
There are 13 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000000040 0x0000000000000040
0x00000000000002d8 0x00000000000002d8 R 0x8
INTERP 0x0000000000000318 0x0000000000000318 0x0000000000000318
0x000000000000001c 0x000000000000001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x00000000000020a0 0x00000000000020a0 R 0x1000
LOAD 0x0000000000003000 0x0000000000003000 0x0000000000003000
0x0000000000012de1 0x0000000000012de1 R E 0x1000
LOAD 0x0000000000016000 0x0000000000016000 0x0000000000016000
0x0000000000007830 0x0000000000007830 R 0x1000
LOAD 0x000000000001df70 0x000000000001ef70 0x000000000001ef70
0x0000000000001308 0x00000000000025d0 RW 0x1000
DYNAMIC 0x000000000001ea78 0x000000000001fa78 0x000000000001fa78
0x00000000000001f0 0x00000000000001f0 RW 0x8
NOTE 0x0000000000000338 0x0000000000000338 0x0000000000000338
0x0000000000000050 0x0000000000000050 R 0x8
NOTE 0x0000000000000388 0x0000000000000388 0x0000000000000388
0x0000000000000044 0x0000000000000044 R 0x4
GNU_PROPERTY 0x0000000000000338 0x0000000000000338 0x0000000000000338
0x0000000000000050 0x0000000000000050 R 0x8
GNU_EH_FRAME 0x000000000001b1a0 0x000000000001b1a0 0x000000000001b1a0
0x0000000000000594 0x0000000000000594 R 0x4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 0x10
GNU_RELRO 0x000000000001df70 0x000000000001ef70 0x000000000001ef70
0x0000000000001090 0x0000000000001090 R 0x1
Using the values in the program header table, we compute the start, size and end of each segment:
Segment # | Start | Size | End |
---|---|---|---|
0 | 0x40 | 0x2d8 | 0x40 + 0x2d8 = 792 |
1 | 0x318 | 0x1c | 0x318 + 0x1c = 820 |
2 | 0x0 | 0x20a0 | 0x0 + 0x20a0 = 8352 |
3 | 0x3000 | 0x12de1 | 0x3000 + 0x12de1 = 89569 |
4 | 0x16000 | 0x7830 | 0x16000 + 0x7830 = 120880 |
5 | 0x1df70 | 0x1308 | 0x1df70 + 0x1308 = 127608 |
6 | 0x1ea78 | 0x1f0 | 0x1ea78 + 0x1f0 = 126056 |
7 | 0x338 | 0x50 | 0x338 + 0x50 = 904 |
8 | 0x388 | 0x44 | 0x388 + 0x44 = 972 |
9 | 0x338 | 0x50 | 0x338 + 0x50 = 904 |
10 | 0x1b1a0 | 0x594 | 0x1b1a0 + 0x594 = 112436 |
11 | 0x0 | 0x0 | 0x0 + 0x0 = 0 |
12 | 0x1df70 | 0x1090 | 0x1df70 + 0x1090 = 126976 |
Putting It All Together
Now that we have the start, size and end of all headers, sections and segments, we can compute the file size by taking the largest end offset:
$ python
>>> max([64, 792, 129728, 0, 820, 904, 940, 972, 1012, 3776, 5153, 5384, 5608, 8272, 8352, 12315, 89555, 89569, 111008, 112436, 120880, 122744, 122752, 125560, 126056, 126968, 127608, 0, 127635, 127652, 127933, 792, 820, 8352, 89569, 120880, 127608, 126056, 904, 972, 904, 112436, 0, 126976])
129728
The section headers table is apparently the last part of `/bin/ls`, and as such, its end (at offset 129728) should be equal to the file size.
Let’s check our answer:
$ ls -l /bin/ls
-rwxr-xr-x 1 root root 129728 28 mrt 20:09 /bin/ls
It works!
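Once the size is known, carving itself is just slicing the blob. The sketch below shows one way this could look and is not a description of how Carve Exe is implemented: `carve_elfs` is a made-up name, and `size_of` is a hypothetical callable that takes the blob and a start offset and returns the computed file size, using the method from this post.

```python
from typing import Callable, Iterator

ELF_MAGIC = b"\x7fELF"


def carve_elfs(blob: bytes,
               size_of: Callable[[bytes, int], int]) -> Iterator[tuple[int, bytes]]:
    """Yield (offset, carved bytes) for every plausible ELF file in blob."""
    pos = blob.find(ELF_MAGIC)
    while pos != -1:
        try:
            size = size_of(blob, pos)
        except Exception:
            # Any parse failure is treated as a false positive for the magic.
            size = 0
        # Only accept candidates that fit entirely inside the blob.
        if 0 < size <= len(blob) - pos:
            yield pos, blob[pos:pos + size]
        pos = blob.find(ELF_MAGIC, pos + 1)


# Demonstration with a stand-in size function that always reports 8 bytes.
demo = b"xx" + ELF_MAGIC + b"ABCD" + b"yy"
print(list(carve_elfs(demo, lambda blob, pos: 8)))
```

Rejecting candidates whose computed size runs past the end of the blob filters out truncated files and many accidental occurrences of the magic bytes, at the cost of missing files that are only partially present.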
Discussion
In this post, we discussed carving ELF files from other files or data streams (e.g. memory dumps and network traffic). Finding the beginning of an ELF file is simple: just look for the magic bytes `\x7F\x45\x4C\x46`. The end of the file, however, does not have such a marker. We saw that it is possible to determine the end of an ELF file (given its beginning) by parsing the file and computing which part of the file is the last.
It should be noted that the headers in an ELF file do not have to contain the correct values for the ELF file to work. For example, the section headers table is completely ignored during loading and execution of an ELF file. If the section headers table were to contain bad values (e.g. wrong offsets or wrong section sizes), the ELF file would execute just fine, but linking against the file would not work. Such bad values would also make the analysis in this post impossible. Similarly, if the ELF file has some sort of appended overlay that is not properly part of a section, this type of analysis will not work. Garbage in, garbage out.
[1] `\x4D\x5A` (the MZ header) for DOS and Windows executable files and `\x7F\x45\x4C\x46` (the ELF header) for Linux ELF files.