Update README.md

master
Debdut Karmakar 2020-08-19 12:21:50 +05:30 committed by GitHub
parent 6f361fa122
commit 26bf6d4bb5
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 41 additions and 79 deletions

120
README.md

@@ -12,7 +12,7 @@
- High Precision / Recall
- Worldwide Names
<!-- ## Installation
## Installation
```
npm install name-dataset
```
@@ -21,7 +21,7 @@ npm install name-dataset
```js
const Names = require('name-dataset')
``` -->
```
```
echo -e "$(python main.py 'Brian is in the kitchen while Amanda is watching the TV on the sofa.\nThey are both waiting for Alfred to come.')"
@@ -37,82 +37,44 @@ Here is an example on a (old) text: [ALI BABA AND THE FORTY THIEVES](http://text
<img src='assets/img_2.png'/>
## License
## Dataset Generation
I don't own the data obviously. It's fetched from the websites listed in:
[generate.sh](name-dataset/blob/master/generation/generate.sh)
https://github.com/philipperemy/name-dataset/blob/master/generation/generate.sh
So I guess the most strict software license should apply here.
## Sources and References
Exhaustive list of all the possible websites. Not all are used since there is a lot of garbage in the lists.
- Generator: http://listofrandomnames.com/index.cfm?generated
- https://www.sajari.com/public-data: 5000 names (First Names CSV)
- http://www.20000-names.com/ names around the world
- https://catalogue.data.gov.bc.ca/dataset/most-popular-boys-names-for-the-past-100-years British Columbia
- https://catalogue.data.gov.bc.ca/dataset/most-popular-girl-names-for-the-past-100-years British Columbia
- https://www.nrscotland.gov.uk/statistics-and-data/statistics/statistics-by-theme/vital-events/names/babies-first-names/full-lists-of-babies-first-names-2010-to-2014 Scotland
- https://gender-api.com/en/pricing
- https://github.com/OpenGenderTracking/globalnamedata/tree/master/assets
- From https://bocoup.com/blog/global-name-data
- https://github.com/MatthiasWinkelmann/firstname-database
- http://www.namepedia.org/en/firstname/Nabil/
- https://datasets.imdbws.com/
- https://www.imdb.com/interfaces/
- https://opendata.stackexchange.com/questions/46/multinational-list-of-popular-first-names-and-surnames
- ftp://ftp.heise.de/pub/ct/listings/0717-182.zip
- https://data.world/howarder/gender-by-name
- https://statbel.fgov.be/en/open-data/first-names-total-population-municipality
- https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/livebirths/bulletins/babynamesenglandandwales/previousReleases
- http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/corpora/names/
- https://www.ssa.gov/oact/babynames/limits.html
- https://www.ssa.gov/OACT/babynames/
- https://www.ssa.gov/cgi-bin/popularnames.cgi
- https://github.com/hadley/data-baby-names/blob/master/baby-names.csv
- http://www.quietaffiliate.com/free-first-name-and-last-name-databases-csv-and-sql/
- https://stackoverflow.com/questions/1452003/plain-computer-parseable-lists-of-common-first-names
- http://mbejda.github.io/
- https://www2.census.gov/topics/genealogy/1990surnames/dist.all.last
- https://opendata.stackexchange.com/questions/1108/database-of-names-of-japanese-and-non-japanese-people
- https://opendata.stackexchange.com/questions/12234/name-and-gender-dataset
- https://opendata.stackexchange.com/questions/7071/people-names-by-country
- http://www.randomnames.com/all-boys-names.asp
- https://en.wikipedia.org/wiki/List_of_most_popular_given_names#cite_note-ahram2004-2
- http://www.avss.ucsb.edu/NameFema.HTM
- http://www.oxfordreference.com/view/10.1093/acref/9780198610601.001.0001/acref-9780198610601?btog=chap&hide=true&page=248&pageSize=10&skipEditions=true&sort=titlesort&source=%2F10.1093%2Facref%2F9780198610601.001.0001%2Facref-9780198610601
- https://github.com/dominictarr/random-name/blob/master/first-names.txt
- https://github.com/smashew/NameDatabases/tree/master/NamesDatabases/first%20names
- https://www.behindthename.com/names
- https://incompetech.com/named/multi.pl
- [listofrandomnames.com](http://listofrandomnames.com/index.cfm?generated)
- [sajari.com 5000 Names around the Globe](https://www.sajari.com/public-data)
- [20000-names.com](http://www.20000-names.com)
- [British Columbia Gov Boys Names 100 Years](https://catalogue.data.gov.bc.ca/dataset/most-popular-boys-names-for-the-past-100-years)
- [British Columbia Gov Girls Names 100 Years](https://catalogue.data.gov.bc.ca/dataset/most-popular-girl-names-for-the-past-100-years)
- [Scotland Baby Names](https://www.nrscotland.gov.uk/statistics-and-data/statistics/statistics-by-theme/vital-events/names/babies-first-names/full-lists-of-babies-first-names-2010-to-2014)
- [Open Gender Tracking](https://github.com/OpenGenderTracking/globalnamedata/tree/master/assets)
- [bocoup.com Global Names](https://bocoup.com/blog/global-name-data)
- [MatthiasWinkelmann's Repo](https://github.com/MatthiasWinkelmann/firstname-database)
- [Namepedia](http://www.namepedia.org/en/firstname/Nabil)
- [Imdb Datasets](https://datasets.imdbws.com)
- [Imdb Interfaces](https://www.imdb.com/interfaces)
- [Stackexchange Opendata](https://opendata.stackexchange.com/questions/46/multinational-list-of-popular-first-names-and-surnames)
- [heise.de Listings](ftp://ftp.heise.de/pub/ct/listings/0717-182.zip)
- [Data World](https://data.world/howarder/gender-by-name)
- [Belgium Gov](https://statbel.fgov.be/en/open-data/first-names-total-population-municipality)
- [UK Gov Birth](https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/livebirths/bulletins/babynamesenglandandwales/previousReleases)
- [CMU AI Repo Corpora](http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/corpora/names)
- [US Social Security Data Baby Names I](https://www.ssa.gov/oact/babynames/limits.html)
- [US Social Security Data Baby Names II](https://www.ssa.gov/OACT/babynames/)
- [US Social Security Data Popular Names](https://www.ssa.gov/cgi-bin/popularnames.cgi)
- [Hadley Repo Baby Names](https://github.com/hadley/data-baby-names/blob/master/baby-names.csv)
- [QuietAffiliate.com](http://www.quietaffiliate.com/free-first-name-and-last-name-databases-csv-and-sql)
- [Stackoverflow](https://stackoverflow.com/questions/1452003/plain-computer-parseable-lists-of-common-first-names)
- [Mbejda Repo](http://mbejda.github.io)
- [US Gov Census](https://www2.census.gov/topics/genealogy/1990surnames/dist.all.last)
- [Stackexchange Opendata Japanese](https://opendata.stackexchange.com/questions/1108/database-of-names-of-japanese-and-non-japanese-people)
- [Stackexchange Opendata Gender](https://opendata.stackexchange.com/questions/12234/name-and-gender-dataset)
- [Stackexchange Opendata Country](https://opendata.stackexchange.com/questions/7071/people-names-by-country)
- [Randomnames.com Boys](http://www.randomnames.com/all-boys-names.asp)
- [Wikipedia Popular Names](https://en.wikipedia.org/wiki/List_of_most_popular_given_names#cite_note-ahram2004-2)
- [UCSB Female Names](http://www.avss.ucsb.edu/NameFema.HTM)
- [Oxford Reference](http://www.oxfordreference.com/view/10.1093/acref/9780198610601.001.0001/acref-9780198610601?btog=chap&hide=true&page=248&pageSize=10&skipEditions=true&sort=titlesort&source=%2F10.1093%2Facref%2F9780198610601.001.0001%2Facref-9780198610601)
- [dominictarr Repo](https://github.com/dominictarr/random-name/blob/master/first-names.txt)
- [smashew Repo](https://github.com/smashew/NameDatabases/tree/master/NamesDatabases/first%20names)
- [Behind The Name](https://www.behindthename.com/names)
- [Incompetech](https://incompetech.com/named/multi.pl)