diff --git a/README.md b/README.md
index a2fad11..748a578 100644
--- a/README.md
+++ b/README.md
@@ -12,7 +12,7 @@
 - Hiigh Precision / Recall
 - Worldwide Names
-
+```
 ```
 echo -e "$(python main.py 'Brian is in the kitchen while Amanda is watching the TV on the sofa.\nThey are both waiting for Alfred to come.')"
@@ -37,82 +37,44 @@ Here is an example on a (old) text: [ALI BABA AND THE FORTY THIEVES](http://text
-## License
+## Dataset Generation
-I don't own the data obviously. It's fetched from the websites listed in:
+[generate.sh](name-dataset/blob/master/generation/generate.sh)
-https://github.com/philipperemy/name-dataset/blob/master/generation/generate.sh
-
-So I guess the most strict software license should apply here.
-
-## Sources and References
-
-Exhaustive list of all the possible websites. Not all are used since there is a lot of garbage in the lists.
-
-- Generator: http://listofrandomnames.com/index.cfm?generated
-- https://www.sajari.com/public-data: 5000 names (First Names CSV)
-- http://www.20000-names.com/ names around the world
-- https://catalogue.data.gov.bc.ca/dataset/most-popular-boys-names-for-the-past-100-years UK
-- https://catalogue.data.gov.bc.ca/dataset/most-popular-girl-names-for-the-past-100-years UK
-- https://www.nrscotland.gov.uk/statistics-and-data/statistics/statistics-by-theme/vital-events/names/babies-first-names/full-lists-of-babies-first-names-2010-to-2014 Scotland
-
-- https://gender-api.com/en/pricing
-
-- https://github.com/OpenGenderTracking/globalnamedata/tree/master/assets
-- From https://bocoup.com/blog/global-name-data
-
-- https://github.com/MatthiasWinkelmann/firstname-database
-
-- http://www.namepedia.org/en/firstname/Nabil/
-
-- https://datasets.imdbws.com/
-- https://www.imdb.com/interfaces/
-
-- https://opendata.stackexchange.com/questions/46/multinational-list-of-popular-first-names-and-surnames
-- ftp://ftp.heise.de/pub/ct/listings/0717-182.zip
-
-- https://data.world/howarder/gender-by-name
-
-- https://statbel.fgov.be/en/open-data/first-names-total-population-municipality
-
-- https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/livebirths/bulletins/babynamesenglandandwales/previousReleases
-
-- http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/corpora/names/
-
-- https://www.ssa.gov/oact/babynames/limits.html
-
-- https://www.ssa.gov/OACT/babynames/
-
-- https://www.ssa.gov/cgi-bin/popularnames.cgi
-
-- https://github.com/hadley/data-baby-names/blob/master/baby-names.csv
-
-- http://www.quietaffiliate.com/free-first-name-and-last-name-databases-csv-and-sql/
-
-- https://stackoverflow.com/questions/1452003/plain-computer-parseable-lists-of-common-first-names
-
-- http://mbejda.github.io/
-
-- https://www2.census.gov/topics/genealogy/1990surnames/dist.all.last
-
-- https://opendata.stackexchange.com/questions/1108/database-of-names-of-japanese-and-non-japanese-people
-
-- https://opendata.stackexchange.com/questions/12234/name-and-gender-dataset
-
-- https://opendata.stackexchange.com/questions/7071/people-names-by-country
-
-- http://www.randomnames.com/all-boys-names.asp
-
-- https://en.wikipedia.org/wiki/List_of_most_popular_given_names#cite_note-ahram2004-2
-
-- http://www.avss.ucsb.edu/NameFema.HTM
-
-- http://www.oxfordreference.com/view/10.1093/acref/9780198610601.001.0001/acref-9780198610601?btog=chap&hide=true&page=248&pageSize=10&skipEditions=true&sort=titlesort&source=%2F10.1093%2Facref%2F9780198610601.001.0001%2Facref-9780198610601
-
-- https://github.com/dominictarr/random-name/blob/master/first-names.txt
-
-- https://github.com/smashew/NameDatabases/tree/master/NamesDatabases/first%20names
-
-- https://www.behindthename.com/names
-
-- https://incompetech.com/named/multi.pl
+- [listofrandomnames.com](http://listofrandomnames.com/index.cfm?generated)
+- [sajari.com 5000 Names around the Globe](https://www.sajari.com/public-data)
+- [20000-names.com](http://www.20000-names.com)
+- [UK Gov Boys Names 100 Years](https://catalogue.data.gov.bc.ca/dataset/most-popular-boys-names-for-the-past-100-years)
+- [UK Gov Girls Names 100 Years](https://catalogue.data.gov.bc.ca/dataset/most-popular-girl-names-for-the-past-100-years)
+- [Scotland Baby Names](https://www.nrscotland.gov.uk/statistics-and-data/statistics/statistics-by-theme/vital-events/names/babies-first-names/full-lists-of-babies-first-names-2010-to-2014)
+- [Open Gender Tracking](https://github.com/OpenGenderTracking/globalnamedata/tree/master/assets)
+- [bocoup.com Global Names](https://bocoup.com/blog/global-name-data)
+- [MatthiasWinkelmann's Repo](https://github.com/MatthiasWinkelmann/firstname-database)
+- [Namepedia](http://www.namepedia.org/en/firstname/Nabil)
+- [Imdb Datasets](https://datasets.imdbws.com)
+- [Imdb Interfaces](https://www.imdb.com/interfaces)
+- [Stackexchange OpenData](https://opendata.stackexchange.com/questions/46/multinational-list-of-popular-first-names-and-surnames)
+- [heise.de Listings](ftp://ftp.heise.de/pub/ct/listings/0717-182.zip)
+- [Data World](https://data.world/howarder/gender-by-name)
+- [Belgium Gov](https://statbel.fgov.be/en/open-data/first-names-total-population-municipality)
+- [UK Gov Birth](https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/livebirths/bulletins/babynamesenglandandwales/previousReleases)
+- [CMU AI Repo Corpora](http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/corpora/names)
+- [US Social Security Data Baby Names I](https://www.ssa.gov/oact/babynames/limits.html)
+- [US Social Security Data Baby Names II](https://www.ssa.gov/OACT/babynames/)
+- [US Social Security Data Popular Names](https://www.ssa.gov/cgi-bin/popularnames.cgi)
+- [Hadley Repo Baby Names](https://github.com/hadley/data-baby-names/blob/master/baby-names.csv)
+- [QuietAffiliate.com](http://www.quietaffiliate.com/free-first-name-and-last-name-databases-csv-and-sql)
+- [Stackoverflow](https://stackoverflow.com/questions/1452003/plain-computer-parseable-lists-of-common-first-names)
+- [Mbejda Repo](http://mbejda.github.io)
+- [US Gov Census](https://www2.census.gov/topics/genealogy/1990surnames/dist.all.last)
+- [Stackexchange Opendata Japanese](https://opendata.stackexchange.com/questions/1108/database-of-names-of-japanese-and-non-japanese-people)
+- [Stackexchange Opendata Gender](https://opendata.stackexchange.com/questions/12234/name-and-gender-dataset)
+- [Stackexchange Opendata Country](https://opendata.stackexchange.com/questions/7071/people-names-by-country)
+- [Randomnames.com Boys](http://www.randomnames.com/all-boys-names.asp)
+- [Wikipedia Popular Names](https://en.wikipedia.org/wiki/List_of_most_popular_given_names#cite_note-ahram2004-2)
+- [UCSB Female Names](http://www.avss.ucsb.edu/NameFema.HTM)
+- [Oxford Reference](http://www.oxfordreference.com/view/10.1093/acref/9780198610601.001.0001/acref-9780198610601?btog=chap&hide=true&page=248&pageSize=10&skipEditions=true&sort=titlesort&source=%2F10.1093%2Facref%2F9780198610601.001.0001%2Facref-9780198610601)
+- [dominictarr Repo](https://github.com/dominictarr/random-name/blob/master/first-names.txt)
+- [smashew Repo](https://github.com/smashew/NameDatabases/tree/master/NamesDatabases/first%20names)
+- [Behind The Name](https://www.behindthename.com/names)
+- [Incompetech](https://incompetech.com/named/multi.pl)