There are four main fields in the food.gov.uk search results.
Each entry could be a single product or multiple products; it is not always obvious from the description which is which.
Clicking into a result presents an opportunity to scrape more product information. However, there is no consistent format in how products are presented: some have product content in paragraphs with line breaks, and some have product content in HTML tables.
This means alerts with multiple products will have to be edited by hand.
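
As an illustration of the two layouts described above, the following is a minimal sketch that reads either a paragraph with `<br>` breaks or an HTML table into a list of lines, using only the standard library. The class name `ProductTextParser`, the function `extract_product_lines` and the sample HTML are hypothetical and are not taken from the food.gov.uk pages; real pages will likely need extra handling.

```python
from html.parser import HTMLParser


class ProductTextParser(HTMLParser):
    """Collect visible text as lines; each line is a list of cell strings."""

    LINE_BREAK_TAGS = {"br", "p", "tr"}   # tags that start or end a line
    CELL_TAGS = {"td", "th"}              # tags that end a table cell

    def __init__(self):
        super().__init__()
        self.lines = []     # finished lines
        self._cells = []    # cells collected for the current line
        self._buffer = []   # text fragments for the current cell

    def _end_cell(self):
        # Collapse runs of whitespace and drop empty cells.
        text = " ".join("".join(self._buffer).split())
        self._buffer = []
        if text:
            self._cells.append(text)

    def _end_line(self):
        self._end_cell()
        if self._cells:
            self.lines.append(self._cells)
        self._cells = []

    def handle_starttag(self, tag, attrs):
        if tag in self.LINE_BREAK_TAGS:
            self._end_line()

    def handle_endtag(self, tag):
        if tag in self.CELL_TAGS:
            self._end_cell()
        elif tag in self.LINE_BREAK_TAGS or tag == "table":
            self._end_line()

    def handle_data(self, data):
        self._buffer.append(data)

    def close(self):
        super().close()
        self._end_line()  # flush any trailing text


def extract_product_lines(html):
    """Return the text of an HTML fragment as a list of lines of cells."""
    parser = ProductTextParser()
    parser.feed(html)
    parser.close()
    return parser.lines


# Hypothetical fragments imitating the two layouts.
paragraph_html = "<p>Brand A Soup 400g<br>Best before: 01 May 2020<br/>Batch 123</p>"
table_html = (
    "<table><tr><th>Product</th><th>Pack size</th></tr>"
    "<tr><td>Brand A Soup</td><td>400g</td></tr></table>"
)

paragraph_lines = extract_product_lines(paragraph_html)
table_lines = extract_product_lines(table_html)
```

Both layouts end up in the same shape (a list of lines, each a list of strings), which makes it easier to review multi-product alerts by hand afterwards.
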

| Column | Non-Null Count | Dtype | Notes |
|---|---|---|---|
| Source | 748 non-null | object | Where data was sourced, food.gov.uk or nationalarchives.gov.uk |
| Date | 748 non-null | object | Date the notice was issued |
| Alert_Type | 748 non-null | object | Allergen or Safety Alert |
| Title | 748 non-null | object | Alert Title Text |
| BodyText | 748 non-null | object | Alert Body text (usually has more information) |

Memory usage: 44.0+ KB

Following on from Exploratory Data Analysis, Data Cleansing and Feature Engineering, the data structure has been supplemented with additional fields for analysis.

| Column | Non-Null Count | Dtype | Engineered Feature Notes |
|---|---|---|---|
| Date | 751 non-null | datetime64[ns] | Datetime conversion |
| datetime | 751 non-null | datetime64[ns] | None |
| year | 751 non-null | int64 | Datetime conversion |
| month | 751 non-null | object | Datetime conversion |
| Alert_Type | 751 non-null | object | None |
| Product_category | 751 non-null | object | Edited by hand |
| Product_Type | 751 non-null | object | Edited by hand |
| Title | 751 non-null | object | None |
| BodyText | 751 non-null | object | None |
| Supplier | 751 non-null | object | Extract from Title or Body text |
| Product | 751 non-null | object | Extract from Title or Body text |
| Risk | 751 non-null | object | Extract from Title or Body text |
| Pathogen | 191 non-null | object | Extract from Title or Body text |
| Allergen | 269 non-null | object | Extract from Title or Body text |
| Foreign_Material | 159 non-null | object | Extract from Title or Body text |
| Other | 131 non-null | object | Extract from Title or Body text |
| month_num | 751 non-null | int32 | Datetime conversion |
| month_name | 751 non-null | object | Datetime conversion |
| year_month | 751 non-null | object | Datetime conversion |

Dtypes: datetime64[ns](2), int32(1), int64(1), object(15)

Memory usage: 108.7+ KB
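
To make the "Datetime conversion" and "Extract from Title or Body text" notes more concrete, here is a rough sketch of how such fields could be derived with pandas. It is not the method actually used for this dataset: the sample rows, the raw date format (`%d %B %Y`), the `YYYY-MM` layout of `year_month`, the keyword lists and the helper `first_keyword` are all assumptions made for illustration.

```python
import re

import pandas as pd

# Hypothetical sample rows; the real Date strings may use a different format.
alerts = pd.DataFrame({
    "Date": ["12 March 2019", "03 January 2020"],
    "Title": [
        "Example Co recalls soup because of undeclared milk",
        "Example Co recalls chicken because of salmonella",
    ],
    "BodyText": [
        "Milk is not mentioned on the label.",
        "Salmonella was found in one batch.",
    ],
})

# Datetime conversion: parse the strings, then derive the calendar fields.
# errors="coerce" turns unparseable values into NaT instead of raising.
alerts["datetime"] = pd.to_datetime(alerts["Date"], format="%d %B %Y", errors="coerce")
alerts["year"] = alerts["datetime"].dt.year
alerts["month_num"] = alerts["datetime"].dt.month
alerts["month_name"] = alerts["datetime"].dt.month_name()
alerts["year_month"] = alerts["datetime"].dt.strftime("%Y-%m")  # assumed layout

# Assumed, deliberately short keyword lists; a real list would be longer.
PATHOGENS = ["salmonella", "listeria", "e. coli"]
ALLERGENS = ["milk", "peanut", "soya"]


def first_keyword(text, keywords):
    """Return the first keyword found as a whole word in text, else None."""
    for keyword in keywords:
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            return keyword
    return None


# Search the title and body together, since either may name the hazard.
combined = alerts["Title"] + " " + alerts["BodyText"]
alerts["Pathogen"] = combined.apply(lambda text: first_keyword(text, PATHOGENS))
alerts["Allergen"] = combined.apply(lambda text: first_keyword(text, ALLERGENS))
```

Rows with no matching keyword are left empty, which would be consistent with the partial non-null counts for `Pathogen` and `Allergen` in the table above. Simple whole-word matching like this misses plurals and misspellings, so the results would still need checking by hand.
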