MangoPI MQ Dual boot speed
Here we discuss getting the MangoPI MQ Dual to boot to a prompt in 1.5 seconds
The MangoPI MQ Dual is a tiny board based
around the AllWinner T113-S3
This is an interesting processor because of its cost
(less than $5 USD in medium quantities at time of writing), integrated memory (128MB), and relatively full Linux support.
Because of the 'small' nature of this CPU, it lends itself to more deeply embedded projects.
However, when used in this manner the relatively 'heavy' nature of Linux can become an
issue, as it has impacts on build sizes, boot time etc...
In the instructions below we're building a completely open source system from a combination
of U-Boot, Linux kernel & Buildroot. This is all based on mainline upstream sources, which
gives a huge amount of assurance about on-going support and the quality of development. The
configuration of these pieces has been adjusted to minimise size (and functionality to some
degree as a consequence) to demonstrate a plausible minimal Linux system as a starting point
for building something. The focus is on boot speed.
Timing breakdown
The unit boots in the following rough phases/durations:
- 250ms: Loading U-Boot from the SD card, initialising DDR, running U-Boot
-
600ms: Loading Linux and root filesystem from the SD card and starting the Linux kernel
-
400ms: Linux kernel initialising all drivers and mounting the initramfs root filesystem
- 200ms: Userspace execution
The biggest aspects of this are really reading content from the SD card. Unfortunately the
U-Boot driver for the SunXI MMC device inside the AllWinner T113-S3 does not give us great
performance. We are currently seeing ~6MiB per second, on a card that should be capable of
many times that. If further improvements were needed, this would be the area to optimise.
Boot speed tips & tricks
There isn't anything particularly special about the image that is constructed here - the
fundamental C source code has not been change in any way. What we have done is tweak the
configurations in such a way to remove features we don't want, and adjust some default
values. Here are the main areas that have been adjusted:
- Baudrates
-
Embedded Linux systems typically use a serial port as their primary console for
development. This is normally not exposed on the final product. For various historic
reasons, the default for this is set to 115200 baud. This is horribly slow. Modern systems
have no problems with much higher baudrates on UARTs, some up to 12 Megabaud. For our
purposes, 1.5 Megabaud is a good balance between speed, compatibility and standardisation.
We also add the quiet
parameter to the Linux kernel
bootargs
to reduce its console output during boot. See
ArchLinux wiki for more
details. Full kernel logs are still available via dmesg
if required
- Configuration
-
U-Boot by default uses a 3 second boot delay to give you the opportunity to stop it at the
prompt to run adhoc boot commands. While retaining the ability to do this is helpful,
there is no need to have a delay at all. U-Boot will look for characters sent in the time
between power on and the completion of the board initialisation. Any character sent then
will interrupt the default boot if the boot delay value is set to 0. On the Mango Pi MQ
this still gives us a few hundred milliseconds, which is sufficient.
This has been adjusted by setting CONFIG_BOOTDELAY=0
in the U-Boot
configuration.
- Storage access
-
This covers two things - don't hit the storage more frequently than you need, and don't
store more than you need. To avoid hitting the storage too frequently we have dropped the
U-Boot programmable environment. So you cannot make non-volatile changes to U-Boot without
rebuilding U-Boot itself. This can actually be a benefit to reliability on embedded
systems as any issues in the U-Boot environment can render the unit unbootable. So
removing the ability to change it reduces some risk. It also means we don't have to look
to storage to find a U-Boot environment file.
We have combined the kernel, device tree and filesystem into a single image which is
loaded into memory in a continuous read.
We have reduced the functionality in the filesystem & kernel to the bare minimum, thus
reducing image size (3.5MB at this stage).
- Reduced functionality
-
We have removed as much as possible from the Linux kernel & Buildroot filesystem to bring
the image size down.
In Buildroot we are using a statically built
musl based busybox executable. This removes the
need for libc.so
, by integrating it with busybox. However if more binaries
are added, this would get duplicated, eventually no longer being worth while.
In Linux we have turned off module support, and removed various subsystems that we're
not using (wifi, sound etc...)
Other options
If the non-volatile root filesystem makes development awkward, moving over the using the
EXT4 partition directly would be an option. This could also have some benefits to boot speed
as only the needed files would be accessed during boot.
Hardware Details
Build Instructions
Pre-built binaries can be obtained from
AndreRenaud/buildroot-mangopi-mini/releases.
-
Clone the
Buildroot repository
$ git clone https://github.com/AndreRenaud/buildroot-mangopi-mini.git
-
Build the image
$ cd buildroot-mangopi-mini
buildroot-mangopi-mini$ make mangopi_mini_defconfig
buildroot-mangopi-mini$ make
... [verbose output removed] ...
buildroot-mangopi-mini$ $ ls -l output/images/
total 18308
-rwxr-xr-x 1 andre andre 17823 Sep 17 09:46 mango-mini.dtb
-rw-r--r-- 1 andre andre 3482347 Sep 17 09:47 mango-mini.ub
-rw-r--r-- 1 andre andre 1039872 Sep 17 09:47 rootfs.cpio
-rw-r--r-- 1 andre andre 493039 Sep 17 09:47 rootfs.cpio.zst
-rw-r--r-- 1 andre andre 268435456 Sep 17 09:47 rootfs.ext4
-rw-r--r-- 1 andre andre 1556480 Sep 17 09:47 rootfs.tar
-rw-r--r-- 1 andre andre 269484032 Sep 17 09:47 sdcard.img
-rw-r--r-- 1 andre andre 521812 Sep 17 09:46 u-boot.bin
-rw-r--r-- 1 andre andre 554644 Sep 17 09:46 u-boot-sunxi-with-spl.bin
-rw-r--r-- 1 andre andre 2969648 Sep 17 09:46 zImage
-
Flash a blank microSD card. Note: be careful that you have the correct device. Use
Balena Etcher if you're not sure
$ sudo dd if=output/images/sdcard.img of=/dev/sda bs=1M
-
Insert the uSD card into the slot on the MangoPI MQ, and power it on via the USB OTG
header
-
A new USB serial device should show up on your computer now. Connect to this using a
serial program (picocom, minicom putty etc...), at 115200 baud and you should see a
login prompt (username: "root", password: "" - blank)
$ picocom -b 115200 /dev/ttyACM0
picocom v3.1
... [verbose output removed] ...
Type [C-a] [C-h] to see available commands
Terminal ready
Welcome to Buildroot
buildroot login:
Timed Boot Log
This boot log is obtained by monitoring the hardware serial port of the Mango Pi on P8. As
we need the Linux kernel to initialise the USB peripheral, we cannot get the full boot
console over the USB port. This bootlog is generated using
GrabSerial. The first two column shows
the absolute time from boot at which the line of text started. The second column shows the
delta from the previous line. All values are in floating point seconds.
[0.000002 0.000002]
[0.000218 0.000216] U-Boot SPL 2024.10-rc1 (Sep 17 2024 - 11:21:52 +1200)
[0.000802 0.000584] DRAM: 128 MiB
[0.003019 0.002217] Trying to boot from MMC1
[0.199349 0.196330]
[0.199475 0.000126]
[0.199490 0.000015] U-Boot 2024.10-rc1 (Sep 17 2024 - 11:21:52 +1200) Allwinner Technology
[0.200530 0.001040]
[0.200546 0.000016] CPU: Allwinner RModel: MangoPi MQ-R-T113
[0.201054 0.000508] DRAM:
[0.224077 0.023023] Core: 39 devices, 17 uclasses, devicetree: separate
[0.224673 0.000596] WDT: Not starting watchdog@20500a0
[0.225035 0.000362] MMC: mmc@4020000: 0, mmc@4021000: 1
[0.231221 0.006186] Loading Environment from nowhere... OK
[0.231650 0.000429] In: serial@2500c00
[0.231819 0.000169] Out: serial@2500c00
[0.231979 0.000160] Err: serial@2500c00
[0.233635 0.001656] Net: No ethernet found.
[0.235225 0.001590] Hit any key to stop autoboot: 0
[0.837554 0.602329] 3482343 bytes read in 578 ms (5.7 MiB/s)
[0.838717 0.001163] ## Loading kernel from FIT Image at 45000000 ...
[0.839962 0.001245] Using 'config' configuration
[0.840907 0.000945] Trying 'kernel' kernel subimage
[0.841317 0.000410] Description: Linux kernel
[0.841675 0.000358] Type: Kernel Image
[0.842026 0.000351] Compression: uncompressed
[0.842383 0.000357] Data Start: 0x450000c8
[0.842728 0.000345] Data Size: 2969640 Bytes = 2.8 MiB
[0.843210 0.000482] Architecture: ARM
[0.843477 0.000267] OS: Linux
[0.843765 0.000288] Load Address: 0x42000000
[0.844101 0.000336] Entry Point: 0x42000000
[0.844451 0.000350] Verifying Hash Integrity ... OK
[0.844864 0.000413] ## Loading ramdisk from FIT Image at 45000000 ...
[0.845434 0.000570] Using 'config' configuration
[0.845811 0.000377] Trying 'ramdisk' ramdisk subimage
[0.846217 0.000406] Description: buildroot
[0.846515 0.000298] Type: RAMDisk Image
[0.846792 0.000277] Compression: uncompressed
[0.847060 0.000268] Data Start: 0x452d97a8
[0.847311 0.000251] Data Size: 493041 Bytes = 481.5 KiB
[0.847678 0.000367] Architecture: ARM
[0.847873 0.000195] OS: Linux
[0.848084 0.000211] Load Address: unavailable
[0.848452 0.000368] Entry Point: unavailable
[0.848744 0.000292] Verifying Hash Integrity ... OK
[0.849067 0.000323] ## Loading fdt from FIT Image at 45000000 ...
[0.849483 0.000416] Using 'config' configuration
[0.849766 0.000283] Trying 'fdt' fdt subimage
[0.850030 0.000264] Description: Flattened Device Tree blob
[0.850456 0.000426] Type: Flat Device Tree
[0.850762 0.000306] Compression: uncompressed
[0.851083 0.000321] Data Start: 0x452d519c
[0.851346 0.000263] Data Size: 17823 Bytes = 17.4 KiB
[0.851700 0.000354] Architecture: ARM
[0.851899 0.000199] Verifying Hash Integrity ... OK
[0.852197 0.000298] Booting using the fdt blob at 0x452d519c
[0.852549 0.000352] Loading Kernel Image to 42000000
[0.852805 0.000256] Loading Ramdisk to 47cf2000, end 47d6a5f1 ... OK
[0.853170 0.000365] Loading Device Tree to 47cea000, end 47cf159e ... OK
[0.860799 0.007629]
[0.860817 0.000018] Starting kernel ...
[0.860983 0.000166]
[1.188615 0.327632] [ 0.001454] /cpus/cpu@0 missing clock-frequency property
[1.209622 0.021007] [ 0.001513] /cpus/cpu@1 missing lock-frequency property
[1.364434 0.154812] Saving 256 bits of non-creditable seed for next boot
[1.369073 0.004639] Starting syslogd: OK
[1.379926 0.010853] Starting klogd: OK
[1.383684 0.003758] Running sysctl: OK
[1.398598 0.014914] Starting network: OK
[1.416731 0.018133] Starting crond: OK
[1.521718 0.104987]
[1.521754 0.000036] Welcome to Buildroot
[1.521923 0.000169] buildroot login:
Benchmarks & Hardware Testing
Mhz
# mhz
count=394821 us50=19992 us250=98404 diff=78412 cpu_MHz=1007.042
Dhrystone
# dhrystone 50000000
Dhrystone Benchmark, Version 2.1 (Language: C)
Program compiled without 'register' attribute
Execution starts, 50000000 runs through Dhrystone
Execution ends
Final values of the variables used in the benchmark:
Int_Glob: 5
should be: 5
Bool_Glob: 1
should be: 1
Ch_1_Glob: A
should be: A
Ch_2_Glob: B
should be: B
Arr_1_Glob[8]: 7
should be: 7
Arr_2_Glob[8][7]: 50000010
should be: Number_Of_Runs + 10
Ptr_Glob->
Ptr_Comp: -1224957888
should be: (implementation-dependent)
Discr: 0
should be: 0
Enum_Comp: 2
should be: 2
Int_Comp: 17
should be: 17
Str_Comp: DHRYSTONE PROGRAM, SOME STRING
should be: DHRYSTONE PROGRAM, SOME STRING
Next_Ptr_Glob->
Ptr_Comp: -1224957888
should be: (implementation-dependent), same as above
Discr: 0
should be: 0
Enum_Comp: 1
should be: 1
Int_Comp: 18
should be: 18
Str_Comp: DHRYSTONE PROGRAM, SOME STRING
should be: DHRYSTONE PROGRAM, SOME STRING
Int_1_Loc: 5
should be: 5
Int_2_Loc: 13
should be: 13
Int_3_Loc: 7
should be: 7
Enum_Loc: 1
should be: 1
Str_1_Loc: DHRYSTONE PROGRAM, 1'ST STRING
should be: DHRYSTONE PROGRAM, 1'ST STRING
Str_2_Loc: DHRYSTONE PROGRAM, 2'ND STRING
should be: DHRYSTONE PROGRAM, 2'ND STRING
Microseconds for one run through Dhrystone: 0.3
Dhrystones per Second: 3302510.0
Whetstone
# whetstone 500000
Loops: 500000, Iterations: 1, Duration: 24 sec.
C Converted Double Precision Whetstones: 2083.3 MIPS
RamSMP
Note: These tests take quite a while to run. Also, due to the limited RAM available on this
cpu, the kernel out-of-memory (OOM) killer will kill off these tests if run at full memory
usage. So we have shrunk the usage a bit in some tests.
# ramsmp -b 1
RAMspeed/SMP (GENERIC) v3.5.0 by Rhett M. Hollander and Paul V. Bolotoff, 2002-09
8Gb per pass mode, 2 processes
INTEGER & WRITING 1 Kb block: 10015.62 MB/s
INTEGER & WRITING 2 Kb block: 9667.79 MB/s
INTEGER & WRITING 4 Kb block: 6742.15 MB/s
INTEGER & WRITING 8 Kb block: 7311.32 MB/s
INTEGER & WRITING 16 Kb block: 7025.77 MB/s
INTEGER & WRITING 32 Kb block: 7796.73 MB/s
INTEGER & WRITING 64 Kb block: 3464.04 MB/s
INTEGER & WRITING 128 Kb block: 3688.21 MB/s
INTEGER & WRITING 256 Kb block: 3245.94 MB/s
INTEGER & WRITING 512 Kb block: 3031.65 MB/s
INTEGER & WRITING 1024 Kb block: 2927.86 MB/s
INTEGER & WRITING 2048 Kb block: 2905.54 MB/s
INTEGER & WRITING 4096 Kb block: 2892.17 MB/s
INTEGER & WRITING 8192 Kb block: 2885.02 MB/s
INTEGER & WRITING 16384 Kb block: 2874.86 MB/s
INTEGER & WRITING 32768 Kb block: 2858.14 MB/s
# ramsmp -b 2
RAMspeed/SMP (GENERIC) v3.5.0 by Rhett M. Hollander and Paul V. Bolotoff, 2002-09
8Gb per pass mode, 2 processes
INTEGER & READING 1 Kb block: 7583.74 MB/s
INTEGER & READING 2 Kb block: 7639.73 MB/s
INTEGER & READING 4 Kb block: 7668.49 MB/s
INTEGER & READING 8 Kb block: 7681.87 MB/s
INTEGER & READING 16 Kb block: 7665.73 MB/s
INTEGER & READING 32 Kb block: 6562.60 MB/s
INTEGER & READING 64 Kb block: 5887.65 MB/s
INTEGER & READING 128 Kb block: 4137.91 MB/s
INTEGER & READING 256 Kb block: 2374.20 MB/s
INTEGER & READING 512 Kb block: 2507.72 MB/s
INTEGER & READING 1024 Kb block: 2589.00 MB/s
INTEGER & READING 2048 Kb block: 2544.78 MB/s
INTEGER & READING 4096 Kb block: 2529.61 MB/s
INTEGER & READING 8192 Kb block: 2528.81 MB/s
INTEGER & READING 16384 Kb block: 2525.78 MB/s
INTEGER & READING 32768 Kb block: 2523.67 MB/s
# ramsmp -b 3 -m 16 -g 1
RAMspeed/SMP (GENERIC) v3.5.0 by Rhett M. Hollander and Paul V. Bolotoff, 2002-09
1Gb per pass mode, 2 processes
INTEGER Copy: 1308.69 MB/s
INTEGER Scale: 1632.49 MB/s
INTEGER Add: 956.67 MB/s
INTEGER Triad: 914.97 MB/s
---
INTEGER AVERAGE: 1203.21 MB/s
# ramsmp -b 4 -m 16 -g 1
RAMspeed/SMP (GENERIC) v3.5.0 by Rhett M. Hollander and Paul V. Bolotoff, 2002-09
1Gb per pass mode, 2 processes
FL-POINT & WRITING 1 Kb block: 10963.46 MB/s
FL-POINT & WRITING 2 Kb block: 10784.90 MB/s
FL-POINT & WRITING 4 Kb block: 7000.56 MB/s
FL-POINT & WRITING 8 Kb block: 8262.41 MB/s
FL-POINT & WRITING 16 Kb block: 7400.96 MB/s
FL-POINT & WRITING 32 Kb block: 7745.64 MB/s
FL-POINT & WRITING 64 Kb block: 3309.73 MB/s
FL-POINT & WRITING 128 Kb block: 3616.63 MB/s
FL-POINT & WRITING 256 Kb block: 3220.54 MB/s
FL-POINT & WRITING 512 Kb block: 3073.26 MB/s
FL-POINT & WRITING 1024 Kb block: 2971.77 MB/s
FL-POINT & WRITING 2048 Kb block: 2961.22 MB/s
FL-POINT & WRITING 4096 Kb block: 2934.71 MB/s
FL-POINT & WRITING 8192 Kb block: 2914.26 MB/s
FL-POINT & WRITING 16384 Kb block: 2868.20 MB/s
# ramsmp -b 5 -m 16 -g 1
RAMspeed/SMP (GENERIC) v3.5.0 by Rhett M. Hollander and Paul V. Bolotoff, 2002-09
1Gb per pass mode, 2 processes
FL-POINT & READING 1 Kb block: 6505.88 MB/s
FL-POINT & READING 2 Kb block: 6536.87 MB/s
FL-POINT & READING 4 Kb block: 6552.30 MB/s
FL-POINT & READING 8 Kb block: 6552.54 MB/s
FL-POINT & READING 16 Kb block: 6537.00 MB/s
FL-POINT & READING 32 Kb block: 5605.81 MB/s
FL-POINT & READING 64 Kb block: 3599.00 MB/s
FL-POINT & READING 128 Kb block: 2706.26 MB/s
FL-POINT & READING 256 Kb block: 2836.77 MB/s
FL-POINT & READING 512 Kb block: 2459.71 MB/s
FL-POINT & READING 1024 Kb block: 2506.30 MB/s
FL-POINT & READING 2048 Kb block: 2490.05 MB/s
FL-POINT & READING 4096 Kb block: 2498.23 MB/s
FL-POINT & READING 8192 Kb block: 2514.75 MB/s
FL-POINT & READING 16384 Kb block: 2535.22 MB/s
# ramsmp -b 6 -m 16 -g 1
RAMspeed/SMP (GENERIC) v3.5.0 by Rhett M. Hollander and Paul V. Bolotoff, 2002-09
1Gb per pass mode, 2 processes
FL-POINT Copy: 1022.98 MB/s
FL-POINT Scale: 858.64 MB/s
FL-POINT Add: 1579.09 MB/s
FL-POINT Triad: 929.05 MB/s
---
FL-POINT AVERAGE: 1097.44 MB/s
Wifi
This assumes that the iperf3
server is running elsewhere on the network, and
that the MangoPi wifi has been configured appropriately.
# iperf3 -c 192.168.2.140
Connecting to host 192.168.2.140, port 5201
[ 5] local 192.168.2.38 port 43774 connected to 192.168.2.140 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 2.12 MBytes 17.8 Mbits/sec 0 137 KBytes
[ 5] 1.00-2.00 sec 4.12 MBytes 34.6 Mbits/sec 0 303 KBytes
[ 5] 2.00-3.00 sec 4.38 MBytes 36.7 Mbits/sec 0 342 KBytes
[ 5] 3.00-4.00 sec 4.50 MBytes 37.7 Mbits/sec 0 342 KBytes
[ 5] 4.00-5.00 sec 4.00 MBytes 33.6 Mbits/sec 0 342 KBytes
[ 5] 5.00-6.00 sec 4.38 MBytes 36.7 Mbits/sec 0 361 KBytes
[ 5] 6.00-7.00 sec 4.50 MBytes 37.7 Mbits/sec 0 361 KBytes
[ 5] 7.00-8.00 sec 4.38 MBytes 36.7 Mbits/sec 0 380 KBytes
[ 5] 8.00-9.00 sec 4.00 MBytes 33.6 Mbits/sec 0 380 KBytes
[ 5] 9.00-10.00 sec 4.12 MBytes 34.6 Mbits/sec 0 380 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 40.5 MBytes 34.0 Mbits/sec 0 sender
[ 5] 0.00-10.09 sec 40.0 MBytes 33.2 Mbits/sec receiver
iperf Done.