Posts by Ryan Dubnicek

HTRC BookNLP Dataset for English-Language Fiction - Documentation - HathiTrust Research Center

Guessing that running BookNLP is part of the fun, but if you want to start w/ output files, your friends at HTRC have already run ~200k English-lang fiction vols through the pipeline and released all non-expressive data: htrc.atlassian.net/wiki/spaces/.... Unsure if any BSC vols are included, though!

1 year ago