![]()
Duplicate Rows - There are no duplicated Rows
Strip - Strip leading / trailing spaces from df
Clean up mistyped accent characters, inconsistent punctuation, rename single use of country names, rename long country names to shorter versions to improve visualisation readability.
| The Netherlands | Netherlands | |
| Republic of Korea | South Korea | |
| United Arab Emirates | UAE | |
| Russian Federation | Russia | |
| United Kingdom in respect of Northern Ireland | UK | |
| United Kingdom | UK | |
| United States | USA | Türkiye | Turkey |
| Democratic People's Republic of Korea | North Korea | |
| People's Republic of China | China |
Economic Area| Bulgaria | EU Countries | |
| Romania | EU Countries | |
| Czechia | EU Countries | |
| Denmark | EU Countries | |
| Hungary | EU Countries | |
| Poland | EU Countries | |
| Sweden | EU Countries | |
| Croatia | EU Countries | |
| Cyprus | EU Countries | |
| Ireland | EU Countries | |
| Austria | EU Countries | |
| Belgium | EU Countries | |
| Estonia | EU Countries | |
| Finland | EU Countries | |
| France | EU Countries | |
| Germany | EU Countries | |
| Greece | EU Countries | |
| Italy | EU Countries | |
| Latvia | EU Countries | |
| Lithuania | EU Countries | |
| Luxembourg | EU Countries | |
| Malta | EU Countries | |
| The Netherlands | EU Countries | |
| Portugal | EU Countries | |
| Slovakia | EU Countries | |
| Slovenia | EU Countries | |
| Spain | EU Countries | |
| China | China |
It's worthwhile to improve the quality of relevant data where possible for larger product categories.
risk_category feature from Risk type columnThere are 218 variations in the `Risk type` column
Each entry is a comma separated list of risks.
The first entry in a list is chosen as the main category, i.e `Burns, Chemical, Damage to hearing` is mapped to `Burns`
A Future task is to split the comma separated variables into binary columns
Risk type field is feature engineered to risk_category, the result is 14 risk categories
| risk_category | perc_of_total | |
|---|---|---|
| Injuries | 27.28 | |
| Chemical | 26.93 | |
| Breathing Obstruction | 19.08 | |
| Electric Shock | 11.74 | |
| Burns | 5.02 | |
| Fire | 4.37 | |
| Environment | 1.81 | |
| Eyesight Damage | 1.23 | |
| Microbiological | 0.99 | |
| Hearing Damage | 0.88 | |
| Entrapment | 0.32 | |
| Drowning | 0.31 | |
| Health Other | 0.03 |
Category to product_category OECD portal category : there are 33 categories, however there are a lot of missing values
Category is complete with no missing values and has 80 categories.
The top 12 entries in category represent > 90% of the total number of alerts issued.
product_category will detail the top12 + others listed as other.
product_category values (12 x named + other)| Toys | 8793 | |
| Clothing & Fashion | 5337 | |
| Automotive | 5317 | |
| Electrical Appliances | 3443 | |
| other | 3092 | |
| Cosmetics | 2525 | |
| Lighting | 1831 | |
| Childcare Equipment | 1295 | |
| Chemicals | 979 | |
| Jewellery | 907 | |
| Protective equipment | 826 | |
| Hobby \ Sports | 628 | |
| Machinery | 535 |
country_top15| Idx | country_top15 | count | perc_of_total | |
|---|---|---|---|---|
| 0 | China | 13866 | 39.05 | |
| 1 | Unknown | 9173 | 25.83 | |
| 2 | other | 2994 | 8.43 | |
| 3 | Germany | 2333 | 6.57 | |
| 4 | Italy | 1263 | 3.56 | |
| 5 | France | 1014 | 2.86 | |
| 6 | Turkey | 937 | 2.64 | |
| 7 | United States of America | 859 | 2.42 | |
| 8 | Japan | 851 | 2.40 | |
| 9 | United Kingdom | 648 | 1.82 | |
| 10 | Poland | 492 | 1.39 | |
| 11 | India | 329 | 0.93 | |
| 12 | Spain | 315 | 0.89 | |
| 13 | Taiwan | 222 | 0.63 | |
| 14 | Sweden | 212 | 0.60 |
Compulsory measures / Voluntary measures columnsThe Compulsory measures / Voluntary measures columns have detailed not delimited data,
sometimes with multiple entries, indicating an escalation in measures.
To simplify the analysis and encoding, the encoding was matched in order of severity.
(I'd make a recommendation to the EU team to re-analyse the requirements of the database and improve this area of their record keeping.)
e.g.
Retailer: Withdrawal of the product from the market:27/05/2020 Retailer:Recall of the product from end users:27/05/2020 Distributor:Ban on the marketing of the product and any accompanying measures:27/05/2020
This example would be split into {compulsory/voluntary}{responsible_party} {action required}
e.g.
{compulsory/voluntary responsible party} options : unknown, retailer, distributor, importer, manufacturer, other
{compulsory/voluntary action required} options : Ban, recall, withdrawal, destruction, removal, criminal, borderstop, warnings
compulsory_withdrawal |
||
compulsory_recall |
||
compulsory_ban |
||
compulsory_destruction |
||
compulsory_removal |
||
compulsory_criminal |
||
compulsory_borderstop |
||
compulsory_warnings |
||
compulsory_responsible_unknown |
||
compulsory_responsible_retail |
||
compulsory_responsible_disti |
||
compulsory_responsible_importer |
||
compulsory_responsible_manufacturer |
||
compulsory_responsible_other |
Compulsory measures -> Binary columns encodingSet binary values in column if true else false
Since multiple values can exist in each field, the order is relevant, we want to indicate severity and the simple contains search will match any instance, therefore, most import is the last search since it will overwrite previous matches
| Severity (1 low) |
Action |
% of Alerts |
Qty |
|
|---|---|---|---|---|
| 8 | Criminal | 0.01% | 2 | |
| 7 | Removal from sale | 0.31% | 111 | |
| 6 | destruction | 1.38% | 490 | |
| 5 | ban | 11.71% | 4,157 | |
| 4 | recall | 11.17% | 3966 | |
| 3 | withdrawal | 29.08% | 10,324 | |
| 2 | borderstop | 4.32% | 1,533 | |
| 1 | warnings | 1.41% | 502 |
| idx | Column | Total | Percentage of Total | |
|---|---|---|---|---|
| 1 | voluntary_recall | 8475 | 51.56 | |
| 0 | voluntary_withdrawal | 5860 | 35.65 | |
| 4 | voluntary_removal | 791 | 4.81 | |
| 3 | voluntary_destruction | 656 | 3.99 | |
| 7 | voluntary_warnings | 367 | 2.23 | |
| 2 | voluntary_ban | 259 | 1.58 | |
| 6 | voluntary_borderstop | 29 | 0.18 | |
| 5 | voluntary_criminal | 0 | 0.00 |
| idx | Column | Total | Percentage of Total | |
|---|---|---|---|---|
| 0 | voluntary_responsible_unknown | 20178 | 52.83 | |
| 5 | voluntary_responsible_other | 6825 | 17.87 | |
| 4 | voluntary_responsible_manufacturer | 4317 | 11.30 | |
| 3 | voluntary_responsible_importer | 3448 | 9.03 | |
| 2 | voluntary_responsible_disti | 2469 | 6.46 | |
| 1 | voluntary_responsible_retail | 954 | 2.50 |
The reuse policy of European Commission documents is implemented by the Commission Decision 2011/833/EU of 12 December 2011 on the reuse of Commission documents (OJ L 330, 14.12.2011, p. 39). Except otherwise noted, the reuse of this document is authorised under a Creative Commons Attribution 4.0 International (CC-BY 4.0) licence.
This means that reuse is allowed provided appropriate credit is given and any changes are indicated.
Since this entire analysis is based on data from the EU, and I have used visualisations from the official annual summary as inspiration for the programmatically created version of these visualisations, the creative commons license applies this work hosted on rodders.me, this means that reuse of this analysis and its visualisations is allowed provided appropriate credit is given and any changes are indicated.