# name-dataset
> _A First and Last Name Dataset_
<img src='assets/logo.png'/>
## Features
- ~160k first names
- ~100k last names
- Find Names in Texts
- High Precision / Recall
- Worldwide Names
## Installation
```
npm install name-dataset
```
## Usage
```js
const Names = require('name-dataset')
```
```
echo -e "$(python main.py 'Brian is in the kitchen while Amanda is watching the TV on the sofa.\nThey are both waiting for Alfred to come.')"
```
<img src='assets/img_1.png'/>
## How reliable is it?
It depends on whether you are looking for high recall or high precision. For example, the word Rose can be either a name or a noun. If we include it in the list, recall goes up (more real names are found) but precision goes down (the flower is sometimes flagged as a name), and vice versa if we leave it out. The library also checks that the word starts with a capital letter. In our case, we put more emphasis on precision. So I would say the best use case here is to check whether a word is a name, based on prior knowledge that the customer has submitted a name.
Here is an example on an older text: [ALI BABA AND THE FORTY THIEVES](http://textfiles.com/stories/ab40thv.txt).
<img src='assets/img_2.png'/>
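
The precision/recall trade-off described above can be illustrated with a small sketch. This is not the library's API: the name lists, `findNames`, and `score` are hypothetical and exist only for this example. It assumes a capital-letter check followed by a simple set lookup, and compares a strict list (ambiguous words left out) with a broad one (ambiguous words included).

```js
// Hypothetical name lists, for illustration only (not the real dataset).
const STRICT_NAMES = new Set(['Brian', 'Alfred'])
const BROAD_NAMES = new Set(['Brian', 'Alfred', 'Rose', 'Will'])

// Return the words of `text` that start with a capital letter
// and are present in the `names` set.
function findNames(text, names) {
  const words = text.match(/[A-Za-z]+/g) || []
  return words.filter(w => /^[A-Z]/.test(w) && names.has(w))
}

// Precision and recall of `found` against hand-labelled `expected` names.
function score(found, expected) {
  const truePositives = found.filter(w => expected.includes(w)).length
  return {
    precision: found.length ? truePositives / found.length : 1,
    recall: expected.length ? truePositives / expected.length : 1
  }
}

// "Rose" is a person here, "Will" is a verb.
const text = 'Rose met Brian in the kitchen. Will they wait for Alfred?'
const expected = ['Rose', 'Brian', 'Alfred']

const strictScore = score(findNames(text, STRICT_NAMES), expected)
const broadScore = score(findNames(text, BROAD_NAMES), expected)

console.log('strict:', strictScore) // precision 1, recall 2/3 (misses Rose)
console.log('broad:', broadScore)   // precision 0.75, recall 1 (flags Will)
```

Leaving ambiguous words out never flags a false name in this sample but misses one real name. Including them finds every name at the cost of one false positive.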
## Dataset Generation
[generate.sh](name-dataset/blob/master/generation/generate.sh)
- [listofrandomnames.com](http://listofrandomnames.com/index.cfm?generated)
- [sajari.com 5000 Names around the Globe](https://www.sajari.com/public-data)
- [20000-names.com](http://www.20000-names.com)
- [BC Gov Boys Names 100 Years](https://catalogue.data.gov.bc.ca/dataset/most-popular-boys-names-for-the-past-100-years)
- [BC Gov Girls Names 100 Years](https://catalogue.data.gov.bc.ca/dataset/most-popular-girl-names-for-the-past-100-years)
- [Scotland Baby Names](https://www.nrscotland.gov.uk/statistics-and-data/statistics/statistics-by-theme/vital-events/names/babies-first-names/full-lists-of-babies-first-names-2010-to-2014)
- [Open Gender Tracking](https://github.com/OpenGenderTracking/globalnamedata/tree/master/assets)
- [bocoup.com Global Names](https://bocoup.com/blog/global-name-data)
- [MatthiasWinkelmann's Repo](https://github.com/MatthiasWinkelmann/firstname-database)
- [Namepedia](http://www.namepedia.org/en/firstname/Nabil)
- [Imdb Datasets](https://datasets.imdbws.com)
- [Imdb Interfaces](https://www.imdb.com/interfaces)
- [Stackexchange Opendata](https://opendata.stackexchange.com/questions/46/multinational-list-of-popular-first-names-and-surnames)
- [heise.de Listings](ftp://ftp.heise.de/pub/ct/listings/0717-182.zip)
- [Data World](https://data.world/howarder/gender-by-name)
- [Belgium Gov](https://statbel.fgov.be/en/open-data/first-names-total-population-municipality)
- [UK Gov Birth](https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/livebirths/bulletins/babynamesenglandandwales/previousReleases)
- [CMU AI Repo Corpora](http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/corpora/names)
- [US Social Security Data Baby Names I](https://www.ssa.gov/oact/babynames/limits.html)
- [US Social Security Data Baby Names II](https://www.ssa.gov/OACT/babynames/)
- [US Social Security Data Popular Names](https://www.ssa.gov/cgi-bin/popularnames.cgi)
- [Hadley Repo Baby Names](https://github.com/hadley/data-baby-names/blob/master/baby-names.csv)
- [QuietAffiliate.com](http://www.quietaffiliate.com/free-first-name-and-last-name-databases-csv-and-sql)
- [Stackoverflow](https://stackoverflow.com/questions/1452003/plain-computer-parseable-lists-of-common-first-names)
- [Mbejda Repo](http://mbejda.github.io)
- [US Gov Census](https://www2.census.gov/topics/genealogy/1990surnames/dist.all.last)
- [Stackexchange Opendata Japanese](https://opendata.stackexchange.com/questions/1108/database-of-names-of-japanese-and-non-japanese-people)
- [Stackexchange Opendata Gender](https://opendata.stackexchange.com/questions/12234/name-and-gender-dataset)
- [Stackexchange Opendata Country](https://opendata.stackexchange.com/questions/7071/people-names-by-country)
- [Randomnames.com Boys](http://www.randomnames.com/all-boys-names.asp)
- [Wikipedia Popular Names](https://en.wikipedia.org/wiki/List_of_most_popular_given_names#cite_note-ahram2004-2)
- [UCSB Female Names](http://www.avss.ucsb.edu/NameFema.HTM)
- [Oxford Reference](http://www.oxfordreference.com/view/10.1093/acref/9780198610601.001.0001/acref-9780198610601?btog=chap&hide=true&page=248&pageSize=10&skipEditions=true&sort=titlesort&source=%2F10.1093%2Facref%2F9780198610601.001.0001%2Facref-9780198610601)
- [dominictarr Repo](https://github.com/dominictarr/random-name/blob/master/first-names.txt)
- [smashew Repo](https://github.com/smashew/NameDatabases/tree/master/NamesDatabases/first%20names)
- [Behind The Name](https://www.behindthename.com/names)
- [Incompetech](https://incompetech.com/named/multi.pl)