Description of the issue
As of March 8th, running the default Mastr download() fails while processing certain files:
- Error processing file 'Netze.xml': 'could not convert string to float: '334, 335''
- Error processing file 'EinheitenVerbrennung.xml': 'could not convert string to float: '2442, 2442''
As a result, the corresponding sqlite tables are empty.
The issue seems to be caused by the dtype check for "O" in replace_mastr_katalogeintraege. This check does not match the pandas string dtype, which pandas>=3.0 uses by default for string columns.
If the suggested solution outlined below aligns with your expectations, I would be happy to prepare a pull request.
Please let me know if there are any additional implications or edge cases that I may have overlooked.
Steps to Reproduce
- Run the default db.download().
- Check whether the grids or combustion_extended tables in the sqlite database are empty.
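The second step can be sketched as a small row-count check. This is only an illustration: the helper name empty_tables is hypothetical, and an in-memory database stands in for the real sqlite file, whose location depends on the local setup.

```python
import sqlite3


def empty_tables(conn, table_names):
    """Return the names of the given tables that contain no rows."""
    empty = []
    for name in table_names:
        # Table names cannot be bound as SQL parameters, so they are quoted inline.
        count = conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        if count == 0:
            empty.append(name)
    return empty


# Stand-in database (assumption): replace with a connection to the real sqlite file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grids (id INTEGER)")
conn.execute("CREATE TABLE combustion_extended (id INTEGER)")
conn.execute("INSERT INTO combustion_extended VALUES (1)")

print(empty_tables(conn, ["grids", "combustion_extended"]))
```

With the bug present, both grids and combustion_extended would be expected to show up in the returned list.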
Ideas of solution
- Adjust the if statement in replace_mastr_katalogeintraege (utils_cleansing_bulk.py) to include the pandas string dtype, i.e. instead of
  if df[column_name].dtype == "O":
- use
  if (pd.api.types.is_string_dtype(df[column_name]) or pd.api.types.is_object_dtype(df[column_name])):
- As the object dtype check is still included, it should work the same as before with older pandas versions.
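A minimal sketch of why the proposed condition behaves differently from the current one. The helper name is_text_column and the sample series are made up for illustration; the string column is created with an explicit dtype="string" so the difference shows up regardless of the installed pandas version.

```python
import pandas as pd


def is_text_column(series: pd.Series) -> bool:
    """Proposed condition: accept both the string dtype and the object dtype."""
    return bool(
        pd.api.types.is_string_dtype(series)
        or pd.api.types.is_object_dtype(series)
    )


# Hypothetical sample columns, not taken from the actual MaStR files.
object_col = pd.Series(["334", "335"], dtype=object)    # older pandas default for strings
string_col = pd.Series(["334", "335"], dtype="string")  # dedicated string dtype
float_col = pd.Series([1.0, 2.0])

for name, col in [("object", object_col), ("string", string_col), ("float", float_col)]:
    old_check = col.dtype == "O"        # current condition
    new_check = is_text_column(col)     # proposed condition
    print(f"{name}: old={old_check}, new={new_check}")
```

The current comparison against "O" is only true for the object column, while the proposed condition also accepts the string column and still rejects numeric columns.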
Context and Environment
- Version used: v0.16.1 and latest commit on develop 297cd59 - specifically with pandas 3.0.1
- Operating system: Unix/macOS
- Environment setup and (python) version: python 3.12, pandas 3.0.1
Workflow checklist