
perf(storage): use google_crc32c.value for checksums #16719

Draft
zhixiangli wants to merge 2 commits into googleapis:main from zhixiangli:zhixiangli/refactor/asyncio-crc32c-value


zhixiangli commented Apr 20, 2026

Updates _ReadResumptionStrategy to use google_crc32c.value(data) instead of manually converting Checksum(data).digest() to an integer.

Updates the unit tests to mock google_crc32c.value accordingly.
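
To illustrate why the two forms are interchangeable: a CRC32C digest is the 32-bit checksum serialized as four big-endian bytes, so int.from_bytes(digest, "big") recovers the same integer that google_crc32c.value returns directly. The sketch below is a slow pure-Python stand-in so it runs without the C extension. The helper names crc32c_value and crc32c_digest are hypothetical and not part of google_crc32c.

```python
import struct

_POLY = 0x82F63B78  # reflected Castagnoli polynomial used by CRC32C


def _make_table():
    """Build the 256-entry lookup table for byte-at-a-time CRC32C."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ _POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c_value(data: bytes) -> int:
    """Hypothetical stand-in for google_crc32c.value: the checksum as an int."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def crc32c_digest(data: bytes) -> bytes:
    """Hypothetical stand-in for Checksum(data).digest(): 4 big-endian bytes."""
    return struct.pack(">L", crc32c_value(data))


data = b"123456789"  # standard CRC check input
assert crc32c_value(data) == 0xE3069283  # published CRC-32C check value
# Decoding the digest as a big-endian integer yields the same number.
assert int.from_bytes(crc32c_digest(data), "big") == crc32c_value(data)
```

Returning the integer directly skips the intermediate Checksum object and the bytes-to-int conversion, which is the per-call work this change removes.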

Test code

import timeit
import os
import google_crc32c
from google_crc32c import Checksum

def method1(data):
    return int.from_bytes(Checksum(data).digest(), "big")

def method2(data):
    return google_crc32c.value(data)

def benchmark():
    # Testing larger sizes and more iterations
    # 1KB, 1MB, 10MB, 100MB
    data_sizes = [1024, 1024 * 1024, 10 * 1024 * 1024, 100 * 1024 * 1024] 
    
    for size in data_sizes:
        data = os.urandom(size)
        # Report sizes of 1 MB and above in MB, smaller ones in KB
        if size >= 1024 * 1024:
            label = f"{size / (1024 * 1024):.2f} MB"
        else:
            label = f"{size / 1024:.2f} KB"
        print(f"\nData size: {label}")
        
        # Correctness check
        res1 = method1(data)
        res2 = method2(data)
        assert res1 == res2, f"Failed for size {size}: {res1} != {res2}"
        print("Assertion passed: results are equal.")
        
        # Increase iterations for more stable results, with a minimum of 1000
        if size <= 1024:
            number = 100000
        elif size <= 1024 * 1024:
            number = 10000
        else:
            number = 1000
        
        t1 = timeit.timeit(lambda: method1(data), number=number)
        t2 = timeit.timeit(lambda: method2(data), number=number)
        
        avg1 = t1 / number
        avg2 = t2 / number
        
        print(f"Method 1 (Checksum(data).digest()): {t1:.6f} s total ({avg1:.8f} s/call)")
        print(f"Method 2 (google_crc32c.value(data)): {t2:.6f} s total ({avg2:.8f} s/call)")
        print(f"Improvement: {(t1 - t2) / t1 * 100:.2f}%")

if __name__ == "__main__":
    print(f"google_crc32c implementation: {google_crc32c.implementation}")
    benchmark()

Output

google_crc32c implementation: c

Data size: 1.00 KB
Assertion passed: results are equal.
Method 1 (Checksum(data).digest()): 0.046088 s total (0.00000046 s/call)
Method 2 (google_crc32c.value(data)): 0.016062 s total (0.00000016 s/call)
Improvement: 65.15%

Data size: 1.00 MB
Assertion passed: results are equal.
Method 1 (Checksum(data).digest()): 0.464044 s total (0.00004640 s/call)
Method 2 (google_crc32c.value(data)): 0.439121 s total (0.00004391 s/call)
Improvement: 5.37%

Data size: 10.00 MB
Assertion passed: results are equal.
Method 1 (Checksum(data).digest()): 0.450953 s total (0.00045095 s/call)
Method 2 (google_crc32c.value(data)): 0.445793 s total (0.00044579 s/call)
Improvement: 1.14%

Data size: 100.00 MB
Assertion passed: results are equal.
Method 1 (Checksum(data).digest()): 6.287893 s total (0.00628789 s/call)
Method 2 (google_crc32c.value(data)): 6.095833 s total (0.00609583 s/call)
Improvement: 3.05%
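
One way to read these numbers is to look at the absolute time saved per call alongside the percentage. The sketch below uses only the totals printed above and the iteration counts from the benchmark script. The REPORTED table and the summarize helper are illustrative names, not part of the PR.

```python
# (label, iterations, method 1 total seconds, method 2 total seconds),
# copied from the benchmark output above.
REPORTED = [
    ("1 KB", 100_000, 0.046088, 0.016062),
    ("1 MB", 10_000, 0.464044, 0.439121),
    ("10 MB", 1_000, 0.450953, 0.445793),
    ("100 MB", 1_000, 6.287893, 6.095833),
]


def summarize(rows):
    """Return (label, microseconds saved per call, percent improvement)."""
    out = []
    for label, calls, t1, t2 in rows:
        saved_us = (t1 - t2) / calls * 1e6
        improvement = (t1 - t2) / t1 * 100
        out.append((label, saved_us, improvement))
    return out


for label, saved_us, improvement in summarize(REPORTED):
    print(f"{label:>6}: saves {saved_us:.2f} us/call ({improvement:.2f}%)")
```

The relative gain is largest for the 1 KB payload. For 1 MB and above the difference is a few percent of a single run, so it may be within run-to-run noise.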

zhixiangli requested a review from a team as a code owner April 20, 2026 11:41
zhixiangli marked this pull request as draft April 20, 2026 11:41
zhixiangli changed the title from "refactor(asyncio): use google_crc32c.value for checksums" to "refactor(storage): use google_crc32c.value for checksums" Apr 20, 2026

gemini-code-assist bot left a comment

Code Review

This pull request refactors the CRC32C checksum calculation in the asyncio reads resumption strategy by replacing the Checksum class usage with the more direct google_crc32c.value function. The associated unit tests have been updated to mock this new function and validate integer results instead of byte digests. I have no feedback to provide.
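
As a rough illustration of the test change described here, the sketch below mocks a value function that returns an integer, so the assertion compares ints rather than digest bytes. This is not the PR's actual test code. crc_module is a stand-in for the google_crc32c module so the sketch runs without the package installed, and checksum_matches is a hypothetical helper, not a method of _ReadResumptionStrategy.

```python
import types
from unittest import mock

# Stand-in for the google_crc32c module (assumption, for runnability only).
# A real test would patch google_crc32c.value where the code looks it up.
crc_module = types.SimpleNamespace(value=lambda data: 0)


def checksum_matches(data: bytes, expected: int) -> bool:
    """Hypothetical check: compare an integer CRC32C with the expected int."""
    return crc_module.value(data) == expected


def test_checksum_matches_uses_integer_value():
    with mock.patch.object(crc_module, "value", return_value=12345) as fake_value:
        assert checksum_matches(b"payload", 12345)
        assert not checksum_matches(b"payload", 54321)
    # The mock returns an int directly, so no digest bytes are involved.
    assert fake_value.call_count == 2
    fake_value.assert_called_with(b"payload")


test_checksum_matches_uses_integer_value()
```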

zhixiangli changed the title from "refactor(storage): use google_crc32c.value for checksums" to "perf(storage): use google_crc32c.value for checksums" Apr 20, 2026