Maintenance of the data is carried out through the Web Scrape front end.
The Web Scrape front end is an app developed in Python that handles the scraping, cleansing and feature engineering of the incoming data from the food.gov.uk web site.
While the identification of new alerts and push-button scraping are fully automated, a human needs to review and edit the incoming data. This manual intervention
improves the quality of the dataset considerably, adding features that the FSA does not currently maintain.
For example:
Tesco are recalling Tesco Finest 6 All Butter Pastry Mince Pies because
they may contain pieces of dried glue from packaging which makes them unsafe to eat.
Additional features that can be deduced or inferred from this information:
Brand : Tesco
Supplier : Unknown
Supplier Type : Unknown. Someone evidently makes these for Tesco, but the notice does not say who.
Outlet : Tesco (the outlet the consumer would buy from)
Outlet Type : Grocer
Product Category : Bread or Baked Goods
Product Type : Bread or Baked Goods
Other Contaminant : Bits of Glue
Another example from a recent alert:
FGS Ingredients Ltd is recalling a number of products containing mustard powder because they may contain peanuts. This means these products
are a possible health risk for anyone with an allergy to peanuts. These products are sold under several different brand names at several
different retail stores.
As part of the scrape process, additional details are recorded for each product, for example:
Frozen Iceland Takeaway Chinese Style Chicken Curry - 375g
Brand : Iceland
Supplier : FGS Ingredients
Supplier Type : Manufacturer
Outlet : Iceland
Outlet Type : Grocer
Product Category : Ready Meal / Ready to Eat
Product Type : Ready Meal / Ready to Eat
Allergen Contaminant : nuts
Entering these details is simplified through the Web Scrape front end.
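As a rough illustration, one edited product row could be modelled as a small record type. This is only a sketch: the class name `AlertProduct` and its field names are assumptions chosen to mirror the feature list above, not the names used in the app itself.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class AlertProduct:
    """One product from an alert plus its hand-edited features (hypothetical layout)."""
    product: str
    brand: Optional[str] = None
    supplier: Optional[str] = None
    supplier_type: Optional[str] = None
    outlet: Optional[str] = None
    outlet_type: Optional[str] = None
    product_category: Optional[str] = None
    product_type: Optional[str] = None
    allergen_contaminant: Optional[str] = None
    other_contaminant: Optional[str] = None


# The Iceland curry example from the text, expressed as a record.
curry = AlertProduct(
    product="Frozen Iceland Takeaway Chinese Style Chicken Curry - 375g",
    brand="Iceland",
    supplier="FGS Ingredients",
    supplier_type="Manufacturer",
    outlet="Iceland",
    outlet_type="Grocer",
    product_category="Ready Meal / Ready to Eat",
    product_type="Ready Meal / Ready to Eat",
    allergen_contaminant="nuts",
)

# A plain dict is convenient for appending to a tabular dataset.
row = asdict(curry)
```

Fields that do not apply, such as `other_contaminant` here, simply stay `None`.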
Once the data has been collated and added to the FSA Safety Alert dataset, the analysis, the production of the
visualisations and infographic, and publishing to the dev and public servers are all automatic.
The master list
The master list contains all the data we hold: the front-page search results plus the edited product information.
Web Scrape New Data
The master list is updated by initiating a fresh scrape of the search results from the food.gov.uk web site. Any new alerts not already in the master
list are added to it and identified as new.
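The update step can be sketched as a simple merge keyed on a unique identifier. The function name `merge_new_alerts`, the `alert_id` key, the `is_new` flag and the sample rows are all hypothetical; the app may well key on the alert URL or another field.

```python
def merge_new_alerts(master, scraped, key="alert_id"):
    """Append scraped rows whose key is not yet in master; return the rows added."""
    known = {row[key] for row in master}
    added = []
    for row in scraped:
        if row[key] not in known:
            known.add(row[key])
            new_row = dict(row, is_new=True)  # flag so the reviewer can spot it
            master.append(new_row)
            added.append(new_row)
    return added


# Made-up rows purely for illustration.
master_list = [{"alert_id": "A-001", "title": "Example alert one"}]
fresh_scrape = [
    {"alert_id": "A-001", "title": "Example alert one"},
    {"alert_id": "A-002", "title": "Example alert two"},
]
added_rows = merge_new_alerts(master_list, fresh_scrape)
```

After the call, `master_list` holds both alerts and only `A-002` carries the `is_new` flag.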
Scrape Alert Notice Detail
The user initiates a web scrape of the alert notice which populates as much of the data set as possible, leaving the remaining fields to be hand edited.
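One way to picture the hand-off between the scrape and the editor is a check that lists which fields are still empty. This is a minimal sketch; `EDITABLE_FIELDS` and `fields_to_edit` are hypothetical names, and the partially filled record is invented for the example.

```python
# Field names taken from the feature lists in this document.
EDITABLE_FIELDS = [
    "Brand",
    "Supplier",
    "Supplier Type",
    "Outlet",
    "Outlet Type",
    "Product Category",
    "Product Type",
]


def fields_to_edit(record):
    """Return the fields the scrape left empty, in display order."""
    return [name for name in EDITABLE_FIELDS if not record.get(name)]


# Suppose the scrape could only infer the brand and outlet.
scraped_record = {"Brand": "Tesco", "Outlet": "Tesco", "Supplier": ""}
pending = fields_to_edit(scraped_record)
```

Here `pending` lists every field except `Brand` and `Outlet`, which is what the reviewer would then complete by hand.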
Search FSA Data
Searching the FSA data also produces a time series plot showing each month and year a notice was issued for the search term.
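The data behind such a plot amounts to a count of matching notices per month and year. The sketch below assumes each alert is a dict with a `title` and a `date`; those keys, the function name `notices_per_month` and the sample alerts are illustrative only.

```python
from collections import Counter
from datetime import date


def notices_per_month(alerts, term):
    """Count alerts whose title mentions term, grouped by (year, month)."""
    term = term.lower()
    counts = Counter(
        (alert["date"].year, alert["date"].month)
        for alert in alerts
        if term in alert["title"].lower()
    )
    return sorted(counts.items())


# Invented alerts, not real FSA notices.
sample_alerts = [
    {"title": "Example recall: undeclared peanuts", "date": date(2023, 1, 5)},
    {"title": "Example recall: glass fragments", "date": date(2023, 1, 20)},
    {"title": "Example recall: Peanuts in sauce", "date": date(2023, 3, 2)},
    {"title": "Example recall: peanuts in spice mix", "date": date(2023, 3, 28)},
]
series = notices_per_month(sample_alerts, "peanuts")
```

The resulting list of `((year, month), count)` pairs is already in chronological order, so it can be passed straight to a plotting library.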
Visualisations
A suite of visualisations has been developed that updates automatically whenever new data is added to the FSA Safety Alert dataset.
Infographic
The infographic is automatically maintained and published whenever the dataset is updated. The main SVG file is automatically amended with new values
using the lxml Python library, converted to PNG format using the cairosvg library, then published to the local dev and public servers. A human still
needs to review the changes whenever there are significant new alerts.
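The amend-and-convert step might look roughly like the following. It assumes the values live directly in SVG `text` elements identified by `id`; the id `total_alerts`, both function names and the sample SVG are hypothetical. If lxml is not installed the sketch falls back to the standard library's ElementTree, which offers the same calls used here.

```python
try:
    from lxml import etree  # the library named in the text
except ImportError:
    # Fallback so the sketch still runs without lxml installed.
    import xml.etree.ElementTree as etree
    etree.register_namespace("", "http://www.w3.org/2000/svg")

SVG_NS = "http://www.w3.org/2000/svg"


def update_svg_values(svg_bytes, values):
    """Replace the text of every <text> element whose id is a key in values."""
    root = etree.fromstring(svg_bytes)
    for element in root.iter("{%s}text" % SVG_NS):
        element_id = element.get("id")
        if element_id in values:
            element.text = str(values[element_id])
    return etree.tostring(root)


def svg_to_png(svg_bytes, png_path):
    """Render the SVG to a PNG file; requires the cairosvg package."""
    import cairosvg  # imported lazily so the rest works without it

    cairosvg.svg2png(bytestring=svg_bytes, write_to=png_path)


# Tiny stand-in for the real infographic file.
template = (
    b'<svg xmlns="http://www.w3.org/2000/svg">'
    b'<text id="total_alerts">0</text>'
    b'<text id="caption">Alerts to date</text>'
    b"</svg>"
)
updated = update_svg_values(template, {"total_alerts": 128})
```

Calling `svg_to_png(updated, "infographic.png")` would then write the image, with the output path again being a placeholder. Infographics exported from drawing tools often nest the value in a `tspan` inside the `text` element, in which case the loop would need to target that child instead.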