Ministry of Environment of Québec (2011-2014) web archive collection derivatives

Nick Ruest
Web archive derivatives of the Ministry of Environment of Québec (2011-2014) collection from the Bibliothèque et Archives nationales du Québec (BAnQ). The derivatives were created with the Archives Unleashed Toolkit. Many thanks to BAnQ!

These derivatives are in Apache Parquet, a columnar storage format. They are generally small enough to work with on a local machine, and can be easily converted to Pandas DataFrames. See this notebook for examples.

Domains

.webpages().groupBy(ExtractDomainDF($"url").alias("url")).count().sort($"count".desc)

...
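
The Pandas conversion and the domain count described above can be sketched in Python. This is a minimal illustration, not the toolkit's own code. The function names, the sample URLs, and the Parquet path argument are hypothetical, and the "count" column name is an assumption inferred from the Scala snippet. The count_domains helper is a simplified stand-in for ExtractDomainDF that uses only the hostname of each URL.

```python
from collections import Counter
from urllib.parse import urlparse


def count_domains(urls):
    """Count URLs per hostname, most frequent first.

    Simplified stand-in for groupBy(ExtractDomainDF(...)).count().sort(desc);
    the toolkit's own domain extraction may normalize hosts differently.
    """
    hosts = (urlparse(u).hostname for u in urls)
    counts = Counter(h for h in hosts if h)  # skip URLs with no hostname
    return counts.most_common()


def load_domain_counts(parquet_path):
    """Read a domains derivative into a Pandas DataFrame.

    Requires pandas plus a Parquet engine such as pyarrow. The path is
    supplied by the caller, and the "count" column name is an assumption
    based on the Scala snippet.
    """
    import pandas as pd  # imported here so the rest runs without pandas

    df = pd.read_parquet(parquet_path)
    return df.sort_values("count", ascending=False)


# Hypothetical sample URLs, used only to show the shape of the result.
sample_urls = [
    "http://example.org/a",
    "http://example.org/b",
    "http://example.org/c",
    "http://example.com/x",
    "http://example.com/y",
    "http://example.net/",
]

domain_counts = count_domains(sample_urls)
for host, n in domain_counts:
    print(host, n)
```

With a downloaded derivative, load_domain_counts would be called with the local path to the domains Parquet file. The resulting DataFrame can then be filtered or plotted like any other.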