Data Scientist - NLP - Software Cornwall

  • Full Time
  • Anywhere

Creating the next bestseller

Work Location: Cornwall / Remote (UK-based). We do not provide visa sponsorship.

Schedule: Full-time

Type: 3-month contract (with the possibility of converting to a full-time role afterwards)


Reread is looking to disrupt the publishing space. We use predictive technology to create the next bestseller. As our Data Scientist and NLP expert, you will join as the first non-founder team member, so we are looking for someone who is passionate about building something innovative and loves a fast-paced start-up environment.


What you will do:

1.       Collect and build a data corpus consisting of thousands of books

2.       Identify and test thousands of potential data points/features from the book data, and present the final set of features that mark a bestseller

3.       Collect sales data and metadata for every book in the corpus, and cross-reference them to identify which features contribute to predicting a bestseller

4.       Assess the validity and possible contribution of book reviews in detecting bestselling potential

5.       Create an algorithm engine that can assess books and predict their probability of becoming bestsellers

6.       Define the ranges of book features within which a bestseller lies

7.       Create a recommender system that can process books and, based on the collected data, provide insights and recommendations on how they could become bestsellers



What you will need:

•    3+ years of professional experience as a data scientist or in a related role
•    Experience in setting up supervised & unsupervised learning ML/NLP models, including data cleaning, data analytics, feature creation, model selection & ensemble methods, performance metrics & visualization
•    Strong experience in prediction using Machine Learning and Deep Learning
•    Hands-on experience with machine learning techniques such as deep neural nets (DNN, CNN, LSTM-RNN)
•    Good understanding of the complexity of developing and productizing real-world AI/ML applications such as prediction, recommendation, computer vision, bots, NLP, sentiment, knowledge and content intelligence, etc.
•    Knowledge of Text Analytics with a strong understanding of ML & NLP algorithms and models (GLMs, SVM, PCA, NB, Clustering, DTs) and their underlying computational and probabilistic statistics
•    Experience designing and documenting data architecture at multiple levels (high-level to detailed) and across multiple views (conceptual, logical, physical, data flow and sequence diagrams)
•    At least 3 years’ experience building Machine Learning & NLP solutions on open source platforms such as scikit-learn, TensorFlow, Spark ML, Torch, Caffe, H2O
•    Excellent knowledge of, and demonstrable experience in using, open source NLP packages such as NLTK, Word2Vec, spaCy, Gensim, Stanford CoreNLP

To apply for this job email your details to