Monday, November 29, 2021

[SOLVED] C# foreach loop comically slower than for loop on a RaspberryPi

November 29, 2021 .net, c#, raspberry-pi, raspberry-pi4

Issue

I was testing a .NET application on a Raspberry Pi, and whereas each iteration of that program took 500 milliseconds on a Windows laptop, the same took 5 seconds on the Raspberry Pi. After some debugging, I found that the majority of that time was being spent in a foreach loop concatenating strings.

Edit 1: To clarify, the 500 ms and 5 s times I mentioned were for the entire loop. I started a timer before the loop and stopped it after the loop had finished. The number of iterations is the same in both cases: 1000.

Edit 2: To time the loop, I used the answer mentioned here.

private static string ComposeRegs(List<list_of_bytes> registers)
{
    string ret = string.Empty;
    foreach (list_of_bytes register in registers)
    {
        ret += Convert.ToString(register.RegisterValue) + ",";
    }
    return ret;
}

Out of the blue, I replaced the foreach with a for loop, and suddenly it took almost the same time as it did on the laptop: 500 to 600 milliseconds.

private static string ComposeRegs(List<list_of_bytes> registers)
{
    string ret = string.Empty;
    for (UInt16 i = 0; i < 1000; i++)
    {
        ret += Convert.ToString(registers[i].RegisterValue) + ",";
    }
    return ret;
}

Should I always use for loops instead of foreach? Or was this just a scenario in which a for loop is way faster than a foreach loop?

Solution

The actual problem is string concatenation, not a difference between for and foreach. The reported timings are excruciatingly slow, even for a Raspberry Pi. 1000 items is so little data that it fits in either machine's CPU cache. An RPi has a 1+ GHz CPU, so 5 seconds for 1000 iterations means each concatenation takes about 5 ms, which is at least 5 million cycles.

The problem is the concatenation. Strings are immutable, so modifying or concatenating strings creates a new string. Every ret += ... copies the entire string built so far into a new one, so the total work grows quadratically with the number of items. Your loops also created about 2000 temporary strings that need to be garbage collected, and that process is expensive. Use a StringBuilder instead, preferably with a capacity roughly equal to the size of the expected string.
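To see why the cost explodes, here is a rough sketch that counts how many characters repeated concatenation copies. It assumes, as in the benchmark setup further down, that the register values are 0, 1, 2, ..., and that every concatenation writes the whole accumulated string into a new one. CopyCost, CharsCopiedByConcat and FinalLength are hypothetical names used only for illustration.

```csharp
using System;

public static class CopyCost
{
    // Hypothetical helper: estimates the characters copied when building
    // "0,1,2,...," with repeated string concatenation. Assumes the values are
    // 0..count-1 and that each concatenation writes the whole accumulated
    // string into a brand new string (strings are immutable).
    public static long CharsCopiedByConcat(int count)
    {
        long copied = 0;
        long length = 0;
        for (int i = 0; i < count; i++)
        {
            length += i.ToString().Length + 1; // digits of the value plus ","
            copied += length;                  // the new string is written in full
        }
        return copied;
    }

    // Length of the final string, i.e. roughly what a StringBuilder writes
    // (ignoring its occasional internal buffer growth).
    public static long FinalLength(int count)
    {
        long length = 0;
        for (int i = 0; i < count; i++)
        {
            length += i.ToString().Length + 1;
        }
        return length;
    }
}
```

For 1000 values, CharsCopiedByConcat(1000) returns 1,896,995 while FinalLength(1000) is 3,890. At 2 bytes per .NET char that is roughly 3.7 MB copied versus under 8 KB of final text, which is roughly consistent with the Allocated column in the results below.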

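// Requires: using System.Collections.Generic; using System.Text;
private static string ComposeRegs(List<list_of_bytes> registers)
{
    // Pre-size the buffer (roughly 3 characters per register) to reduce resizing.
    var sb = new StringBuilder(registers.Count * 3);
    foreach (list_of_bytes register in registers)
    {
        // Append the value and its separator; the whole string is not rebuilt each time.
        sb.Append(register.RegisterValue).Append(',');
    }
    return sb.ToString();
}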

Simply measuring a single execution, or even averaging 10 executions, won't produce valid numbers. It's quite possible that the GC ran to collect those 2000 objects during one of the tests. It's also quite possible that one of the tests was delayed by JIT compilation or any number of other reasons. A test should run long enough to produce stable numbers.
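BenchmarkDotNet is the right tool here, but the effects above can already be glimpsed with a crude hand-rolled loop that warms up once, times many runs and counts gen 0 collections with GC.CollectionCount. This is a minimal sketch, not a substitute for a real benchmark: it assumes the values 0..count-1 stand in for the register values, and CrudeTiming, Concat and Measure are hypothetical names.

```csharp
using System;
using System.Diagnostics;

public static class CrudeTiming
{
    // The concatenation loop from the question, with values 0..count-1
    // standing in for the register values (an assumption for illustration).
    public static string Concat(int count)
    {
        string ret = string.Empty;
        for (int i = 0; i < count; i++)
        {
            ret += Convert.ToString(i) + ",";
        }
        return ret;
    }

    // Runs the action once, then times 'runs' executions and reports the
    // average time per run and the number of gen 0 collections during timing.
    public static (double AvgMs, int Gen0Collections) Measure(Func<string> action, int runs)
    {
        // Warm-up: keeps the first-call JIT compilation out of the timed runs.
        // Tiered compilation may still recompile the method later, which is one
        // reason BenchmarkDotNet runs far more iterations than this.
        action();

        int gen0Before = GC.CollectionCount(0);
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++)
        {
            action();
        }
        sw.Stop();

        return (sw.Elapsed.TotalMilliseconds / runs, GC.CollectionCount(0) - gen0Before);
    }
}
```

Calling CrudeTiming.Measure(() => CrudeTiming.Concat(1000), 100) typically reports a non-zero number of gen 0 collections, which is exactly the kind of hidden cost a single stopwatch reading around one loop cannot separate out.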

The de facto standard for .NET benchmarking is BenchmarkDotNet. That library runs each benchmark long enough to eliminate startup and warm-up effects, and it accounts for memory allocations and GC collections. You'll see not only how long each test takes but also how much memory it allocates and how many garbage collections it causes.

To actually measure your code, try this BenchmarkDotNet benchmark:

[MemoryDiagnoser]
[MarkdownExporterAttribute.StackOverflow]
public class ConcatTest
{

    private readonly List<list_of_bytes> registers;


    public ConcatTest()
    {
        registers = Enumerable.Range(0,1000).Select(i=>new list_of_bytes(i)).ToList();
    }

    [Benchmark]
    public string StringBuilder()
    {
        var sb = new StringBuilder(registers.Count*3);
        foreach (var register in registers)
        {
            // Note: unlike ForEach and For, this version (as measured below) omits the "," separator.
            sb.AppendFormat("{0}",register.RegisterValue);
        }
        return sb.ToString();
    }

    [Benchmark]
    public string ForEach()
    {
        string ret = string.Empty;
        foreach (list_of_bytes register in registers)
        {
            ret += Convert.ToString(register.RegisterValue) + ",";
        }
        return ret;
    }

    [Benchmark]
    public string For()
    {
        string ret = string.Empty;
        for (UInt16 i = 0; i < registers.Count; i++)
        {
            ret += Convert.ToString(registers[i].RegisterValue) + ",";
        }
        return ret;
    }

}

The benchmarks are run by calling BenchmarkRunner.Run<ConcatTest>() from a Release build (BenchmarkDotNet refuses to run non-optimized Debug builds by default):

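using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class Program
{
    public static void Main(string[] args)
    {
        // Runs every [Benchmark] method in ConcatTest and returns a Summary of the results.
        var summary = BenchmarkRunner.Run<ConcatTest>();
        Console.WriteLine(summary);
    }
}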
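The question never shows the list_of_bytes type, so the benchmark above does not compile on its own. The benchmark only needs a constructor that takes an int and a RegisterValue property, so a hypothetical minimal stand-in, assuming RegisterValue is an integer, could look like this:

```csharp
// Hypothetical stand-in for the question's list_of_bytes type, which is not shown.
// Assumes RegisterValue is an int; the real type may differ.
public class list_of_bytes
{
    public list_of_bytes(int value)
    {
        RegisterValue = value;
    }

    // The value the benchmarks format and concatenate.
    public int RegisterValue { get; }
}
```

Placed in the same project as ConcatTest and Program, it lets all three benchmarks format the same numbers 0 to 999 produced by Enumerable.Range(0,1000).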

Results

Running this on a MacBook produced the following results. Note that BenchmarkDotNet produced results ready to paste into a Stack Overflow post, and the runtime information is included in the results:

BenchmarkDotNet=v0.13.1, OS=macOS Big Sur 11.5.2 (20G95) [Darwin 20.6.0]
Intel Core i7-8750H CPU 2.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK=6.0.100
  [Host]     : .NET 6.0.0 (6.0.21.52210), X64 RyuJIT
  DefaultJob : .NET 6.0.0 (6.0.21.52210), X64 RyuJIT


        Method |      Mean |    Error |   StdDev |    Gen 0 |   Gen 1 | Allocated |
-------------- |----------:|---------:|---------:|---------:|--------:|----------:|
 StringBuilder |  34.56 μs | 0.682 μs | 0.729 μs |   7.5684 |  0.3052 |     35 KB |
       ForEach | 278.36 μs | 5.509 μs | 5.894 μs | 818.8477 | 24.4141 |  3,763 KB |
           For | 268.72 μs | 3.611 μs | 3.015 μs | 818.8477 | 24.4141 |  3,763 KB |

Both For and ForEach took about 8 times as long as StringBuilder and allocated about 100 times as much memory.

Answered By - Panagiotis Kanavos

This answer, collected from Stack Overflow and tested by PythonFixing community admins, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
