CopyToAsync Throughput Benchmark
In-memory streams rarely need asynchronous operations, because all of their operations complete synchronously in memory. However, there are times when the data in the stream needs to be copied to another stream instance with actual asynchronous behavior, such as a FileStream. While this may not be an everyday use case, it is a real-world situation. This is where the CopyToAsync method comes into play.
This benchmark scenario uses stream instances that are instantiated as expandable (dynamic growth) streams. The stream is filled with random data, similar to the Bulk Fill and Read scenario.
In each operation, the CopyToAsync method copies data from the stream to another stream designed to mock an actual file I/O-based stream with asynchronous behavior.
Summary
The benchmark results show that for smaller streams, all the stream classes perform similarly in terms of throughput. MemoryStreamSlim and RecyclableMemoryStream perform better than MemoryStream in terms of memory allocation. However, as the stream size increases, MemoryStreamSlim performance stays consistent and deterministic, even though its internal chained memory segments make copying the entire stream contents to the destination stream slightly slower than with MemoryStream.
By far, the MemoryStream class performs the worst in terms of memory allocation in this scenario under all conditions, but it does deliver better throughput than RecyclableMemoryStream and MemoryStreamSlim for larger stream sizes.
Once the stream size approaches 1 MB, the RecyclableMemoryStream class starts to perform very poorly in terms of throughput, and its performance deteriorates rapidly as the stream size increases further. Using the UseExponentialLargeBuffer option did not affect throughput in this scenario.
Given that file systems managed by the OS and related drivers employ a series of buffering and caching mechanisms, the emulation approach used in this benchmark is not a perfect representation of the actual performance of the CopyToAsync method in a real-world local file-based scenario. However, it does provide a means to compare the performance of the different stream classes in a consistent and deterministic way for asynchronous I/O operations that do incur regular asynchronous latencies, such as network-based files on file servers.
Benchmark Operation
A single benchmark operation consists of a loop that performs the following steps:
- Create a new stream instance, with a capacity set to the operation data size when the CapacityOnCreate parameter is true.
- Write the test data synchronously to the stream (either in a single write or segmented based on the BulkInitialFill parameter).
- Call CopyToAsync() on the stream, passing a mock asynchronous file I/O stream as the destination.
- Dispose of the stream instance.
The number of loops in each operation is determined by the DataSize parameter to keep each benchmark reasonably consistent in duration, but the loop count is always the same for all classes being compared for any given DataSize parameter value.
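The steps above can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual code: the standard MemoryStream stands in for whichever stream class is being measured, a second MemoryStream stands in for the mock asynchronous destination, and the class name, method name, and 4096-byte segment size are hypothetical. Rewinding the source before the copy is also an assumption.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical sketch of one benchmark operation; names are illustrative only.
public static class CopyToAsyncOperationSketch
{
    // Runs the four steps loopCount times and returns the total bytes copied.
    public static async Task<long> RunAsync(
        int dataSize, int loopCount, bool capacityOnCreate, bool bulkInitialFill)
    {
        byte[] data = new byte[dataSize];
        new Random(42).NextBytes(data);  // random test data, fixed seed for repeatability
        const int segmentSize = 4096;    // assumed segment size for the non-bulk fill
        long totalCopied = 0;

        for (int loop = 0; loop < loopCount; loop++)
        {
            // Step 1: create the stream, optionally with an initial capacity.
            using MemoryStream source =
                capacityOnCreate ? new MemoryStream(dataSize) : new MemoryStream();

            // Step 2: fill synchronously, in a single write or in segments.
            if (bulkInitialFill)
            {
                source.Write(data, 0, data.Length);
            }
            else
            {
                for (int offset = 0; offset < data.Length; offset += segmentSize)
                {
                    source.Write(data, offset, Math.Min(segmentSize, data.Length - offset));
                }
            }

            // Step 3: rewind (assumed) and copy to the destination stream.
            source.Position = 0;
            using MemoryStream destination = new MemoryStream(); // stand-in for the mock
            await source.CopyToAsync(destination);
            totalCopied += destination.Length;

            // Step 4: both streams are disposed when this loop iteration ends.
        }

        return totalCopied;
    }
}
```

Calling `await CopyToAsyncOperationSketch.RunAsync(10_000, 3, true, false)` would run three loops over a 10,000-byte payload using the segmented fill.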
The MemoryStreamSlim and RecyclableMemoryStream instances are created with the option to zero out memory buffers when they are no longer used turned off, which keeps the benchmark focused on the CopyToAsync() call. The MemoryStream class has no option to zero out memory buffers (its memory is always cleared, because internal buffers are allocated with new byte[]), so this parameter does not apply to that class.
Asynchronous Stream Emulation
This section describes how the destination stream is used in the CopyToAsync call.
The destination stream used in the CopyToAsync call is a simple mock stream that emulates the behavior of an asynchronous I/O-based stream. It uses a MemoryStream instance internally to manage the stream contents. Each asynchronous operation on the mock stream class (ReadAsync, WriteAsync, CopyToAsync) is counted, and on every 8th operation (or once at least 10 MB have been processed), an asynchronous delay is introduced to simulate the latency of an actual I/O operation. The delay time is calculated to emulate a data throughput of 2 GB/sec with a 0.5 ms latency for every separate asynchronous operation.
This approach provides a consistent and deterministic performance comparison between the different stream classes, and it accentuates the impact of the number of internal asynchronous operations each class performs. Results using different I/O-based streams will vary based on the actual I/O performance characteristics of the underlying system.
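The delay arithmetic described above might look something like the following sketch. It is one possible reading of that description, not the mock stream's actual implementation: the class and member names are hypothetical, the units are assumed to be decimal (10 MB = 10,000,000 bytes, 2 GB = 2,000,000,000 bytes), and the delay is assumed to cover all operations and bytes accumulated since the previous delay.

```csharp
using System;

// Hypothetical helper illustrating the emulated I/O delay; not the benchmark's code.
public static class EmulatedIoDelay
{
    public const int OperationsPerDelay = 8;              // delay on every 8th operation
    public const long BytesPerDelay = 10_000_000;         // assumes 10 MB = 10^7 bytes
    public const double LatencyMsPerOperation = 0.5;      // 0.5 ms per async operation
    public const double BytesPerSecond = 2_000_000_000.0; // assumes 2 GB = 2 * 10^9 bytes

    // True when the work accumulated since the last delay warrants a new delay.
    public static bool ShouldDelay(int operations, long bytes) =>
        operations >= OperationsPerDelay || bytes >= BytesPerDelay;

    // Delay covering the accumulated operations and bytes.
    public static TimeSpan Compute(int operations, long bytes)
    {
        double latencyMs = operations * LatencyMsPerOperation;
        double transferMs = bytes / BytesPerSecond * 1000.0;

        // Convert through ticks so that fractional milliseconds are preserved.
        double ticks = (latencyMs + transferMs) * TimeSpan.TicksPerMillisecond;
        return TimeSpan.FromTicks((long)Math.Round(ticks));
    }
}
```

Under these assumptions, eight writes of 81,920 bytes each (655,360 bytes in total) accumulate 4 ms of latency plus roughly 0.33 ms of transfer time. A mock stream could then pass the result of `EmulatedIoDelay.Compute(8, 655_360)` to `Task.Delay`.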
Benchmark Parameters
The following parameters were used in the benchmarks. These will appear as columns in the benchmark results along with the standard BenchmarkDotNet columns.
DataSize
The amount of data to write to the stream in each operation loop. The data is a byte array of the specified size.
CapacityOnCreate
When true, the stream is instantiated with the current loop iteration data size as the initial capacity. When false, the stream is created with the default capacity (no initial capacity specified). The results show no notable difference in performance between the two options, but the parameter is included in this benchmark to demonstrate that fact.
BulkInitialFill
When true, the stream is initially filled with random data in a single bulk write operation. When false, the stream is filled with random data in a loop of write operations. The initial stream data fill operation is similar to the operations used in the Bulk Fill and Read (BulkInitialFill is true) and Segmented Fill and Read (BulkInitialFill is false) benchmarks. The results show no notable difference in performance between the two options, but the parameter is included in this benchmark to demonstrate that fact.
Benchmark Results
The results of the benchmarks are found in the CopyToAsync() benchmark output.
HTML Report
Because the benchmark results can produce rather large tables, and Markdown tables that scroll both horizontally and vertically can be hard to read, the results are also provided in a separate HTML file.
This can be found here.