Senior Data Scientist - NLP Specialist
Overview
At Ascent we are building an intelligent compliance platform that enables compliance professionals to easily track and understand their compliance obligations and related regulation. To support that platform, we are also building a full data science platform and team to help improve our efficiency at processing regulations and unlock new features using fundamentally creative approaches.
We are looking for experienced, passionate data scientists to help us build and maintain models that solve a wide variety of problems. As a Senior Data Scientist and NLP Specialist at Ascent, you will focus on creatively applying truly cutting-edge research, including a variety of deep learning approaches, to our hardest problems, as well as mentoring less experienced data scientists. You may be thinking about novel ways of applying different language models to the regulatory domain, or better ways of representing additional context and knowledge alongside text in sequential models. You’ll be expected to understand NLP at a fundamental level and use that understanding to create original and effective solutions. You’ll also help us think about the ethics of the models we build and how we can lead by example in the way we do our work.
Your day-to-day work will include working closely with business users, our regulation content team, and the rest of our data scientists and engineers to: a) use state-of-the-art NLP to automate significant sections of the workflow involved in onboarding and managing regulations; b) uncover novel features and information in text to enable new product features (e.g. finding related groups of regulations across regulators); c) deploy, monitor, and maintain models that solve real problems in production; and d) experiment with creative solutions that draw on new research and tools, and share what you learn with the team.
We primarily use the Python data ecosystem, including both scikit-learn and Keras + TensorFlow, but we are open to all tools. We use all kinds of models, both deep learning and otherwise, and we prefer the simplest tool that accomplishes the goal. We have a strong bias towards containerization, internal transparency, and simplicity to facilitate maintainable systems. We also place a high premium on our culture and values, both within the tech team and the company as a whole. We believe a diversity of opinions and perspectives creates a stronger team and product, and we are committed to an equal opportunity hiring process.
Responsibilities
- Work with non-technical colleagues to design and build machine learning models that accomplish specific tasks, with a heavy focus on NLP: splitting, summarization, classification, entity recognition, similarity scoring, recommendation, etc.
- Cross-validate and test models prior to deployment, and monitor model accuracy in production
- Employ production-quality coding standards and best practices even during model training and prototyping
- Help educate others in the company about machine learning and data science so that they can think productively about possible solutions to their business problems
- Stay current on machine learning research and tools
- Prioritize minimum viable models / solutions over complicated models / solutions
- Think about NLP fundamentals and use your intuition to apply cutting-edge NLP research to our most difficult problems
- Mentor less experienced data scientists
Minimum Skills and Experience
- 3+ years solving business problems using data science
- 2+ years building NLP models that were used to transform text in production pipelines
- 1+ year using sequential deep learning models or similar on text problems
- Experience with large data sets and modern tools for handling them such as Apache Spark
- Experience with both supervised and unsupervised learning approaches
- A thorough understanding of the mathematical underpinnings of common models
- Experience developing creative modeling solutions given a set of business requirements and delivering that value to the business
- Ability to work productively on small teams and manage workstreams independently if needed
- Proficient in SQL, *nix CLI tools (grep/sed/awk/bash, etc.), and Python
- Experience deploying and maintaining code using git-based tools and operating in a continuous integration/continuous deployment (CI/CD) environment
- Experience writing thorough tests and documentation for maintainable code-bases
Preferred Skills and Experience
- 3+ years building NLP models that were used to transform text in production pipelines
- 2+ years of experience using NLP to drive process automation
- Uncommonly strong statistical and mathematical background
- Educational background in Computational Linguistics or similar
- Experience with models in the context of “fast data” tools like DC/OS or other similar architectures and tools