Issue
I was testing a .NET application on a RaspberryPi and whereas each iteration of that program took 500 milliseconds on a Windows laptop, the same took 5 seconds on a RaspberryPi. After some debugging, I found that majority of that time was being spent on a foreach
loop concatenating strings.
Edit 1: To clarify, that 500 ms and 5 s time I mentioned was the time of the entire loop. I placed a timer before the loop, and stopped the timer after the loop had finished. And, the number of iterations are the same in both, 1000.
Edit 2: To time the loop, I used the answer mentioned here.
private static string ComposeRegs(List<list_of_bytes> registers)
{
string ret = string.Empty;
foreach (list_of_bytes register in registers)
{
ret += Convert.ToString(register.RegisterValue) + ",";
}
return ret;
}
Out of the blue I replaced the foreach
with a for
loop, and suddenly it starts taking almost the same time as it did on that laptop. 500 to 600 milliseconds.
private static string ComposeRegs(List<list_of_bytes> registers)
{
string ret = string.Empty;
for (UInt16 i = 0; i < 1000; i++)
{
ret += Convert.ToString(registers[i].RegisterValue) + ",";
}
return ret;
}
Should I always use for
loops instead of foreach
? Or was this just a scenario in which a for
loop is way faster than a foreach
loop?
Solution
The actual problem is concatenating strings not a difference between for
vs foreach
. The reported timings are excruciatingly slow even on a Raspberry Pi. 1000 items is so little data it can fit in either machine's CPU cache. An RPi has a 1+ GHZ CPU which means each concatenation takes at leas 1000 cycles.
The problem is the concatenation. Strings are immutable. Modifying or concatenating strings creates a new string. Your loops created 2000 temporary objects that need to be garbage collected. That process is expensive. Use a StringBuilder instead, preferably with a capacity
roughly equal to the size of the expected string.
[Benchmark]
public string StringBuilder()
{
var sb = new StringBuilder(registers.Count * 3);
foreach (list_of_bytes register in registers)
{
sb.AppendFormat("{0}",register.RegisterValue);
}
return sb.ToString();
}
Simply measuring a single execution, or even averaging 10 executions, won't produce valid numbers. It's quite possible the GC run to collect those 2000 objects during one of the tests. It's also quite possible that one of the tests was delayed by JIT compilation or any other number of reasons. A test should run long enough to produce stable numbers.
The defacto standard for .NET benchmarking is BenchmarkDotNet. That library will run each benchmark long enough to eliminate startup and cooldown effect and account for memory allocations and GC collections. You'll see not only how much each test takes but how much RAM is used and how many GCs are caused
To actually measure your code try using this benchmark using BenchmarkDotNet :
[MemoryDiagnoser]
[MarkdownExporterAttribute.StackOverflow]
public class ConcatTest
{
private readonly List<list_of_bytes> registers;
public ConcatTest()
{
registers = Enumerable.Range(0,1000).Select(i=>new list_of_bytes(i)).ToList();
}
[Benchmark]
public string StringBuilder()
{
var sb = new StringBuilder(registers.Count*3);
foreach (var register in registers)
{
sb.AppendFormat("{0}",register.RegisterValue);
}
return sb.ToString();
}
[Benchmark]
public string ForEach()
{
string ret = string.Empty;
foreach (list_of_bytes register in registers)
{
ret += Convert.ToString(register.RegisterValue) + ",";
}
return ret;
}
[Benchmark]
public string For()
{
string ret = string.Empty;
for (UInt16 i = 0; i < registers.Count; i++)
{
ret += Convert.ToString(registers[i].RegisterValue) + ",";
}
return ret;
}
}
The tests are run by calling BenchmarkRunner.Run<ConcatTest>()
using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Linq;
public class Program
{
public static void Main(string[] args)
{
var summary = BenchmarkRunner.Run<ConcatTest>();
Console.WriteLine(summary);
}
}
Results
Running this on a Macbook produced the following results. Note that BenchmarkDotNet produced results ready to use in StackOverflow, and the runtime information is included in the results :
BenchmarkDotNet=v0.13.1, OS=macOS Big Sur 11.5.2 (20G95) [Darwin 20.6.0]
Intel Core i7-8750H CPU 2.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK=6.0.100
[Host] : .NET 6.0.0 (6.0.21.52210), X64 RyuJIT
DefaultJob : .NET 6.0.0 (6.0.21.52210), X64 RyuJIT
Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Allocated |
-------------- |----------:|---------:|---------:|---------:|--------:|----------:|
StringBuilder | 34.56 μs | 0.682 μs | 0.729 μs | 7.5684 | 0.3052 | 35 KB |
ForEach | 278.36 μs | 5.509 μs | 5.894 μs | 818.8477 | 24.4141 | 3,763 KB |
For | 268.72 μs | 3.611 μs | 3.015 μs | 818.8477 | 24.4141 | 3,763 KB |
Both For
and ForEach
took almost 10 times more than StringBuilder
and used 100 times as much RAM
Answered By - Panagiotis Kanavos