Data Scientist
Overview
At Ascent we are building an intelligent compliance platform that enables compliance professionals to easily track and understand their compliance obligations and related regulation. To support that platform, we are also building a full data science platform and team to help improve our efficiency at processing regulations and unlock new features using creative, data-driven approaches.
We are looking for experienced, passionate data scientists to help us build and maintain models that solve a wide variety of problems. As a Data Scientist at Ascent, you will work closely with business users, our regulation content team, and data engineers to: a) use state-of-the-art NLP to automate significant portions of the workflow involved in onboarding and managing regulations; b) uncover novel features and information from text to enable new product capabilities (e.g. finding related groups of regulations across regulators); c) deploy, monitor, and maintain models that solve real problems in production; and d) experiment with creative solutions that draw on new research and tools, and disseminate that knowledge to the team.
We primarily use the Python data ecosystem, including both scikit-learn and Keras + TensorFlow. We use all kinds of models, deep learning and otherwise; we prefer the simplest tool that accomplishes the goal. We have a strong bias toward containerization, internal transparency, and simplicity to keep our systems maintainable. We also place a high premium on our culture and values, both within the tech team and the company as a whole. We believe a diversity of opinions and perspectives creates a stronger team and product, and we are committed to an equal opportunity hiring process.
Responsibilities
- Build machine learning models that accomplish specific tasks, with a heavy focus on NLP: splitting, summarization, classification, entity recognition, similarity scoring, recommendation, etc.
- Cross-validate and test models prior to deployment, and monitor model accuracy in production
- Work closely with data engineers to design and build infrastructure that facilitates more efficient model training and deployment
- Use creativity and independent thinking to come up with novel data science solutions to complicated problems
- Employ production-quality coding standards and best practices, even during model training and prototyping
- Help educate others in the company about machine learning and data science so that they can think productively about possible data-driven solutions to their business problems
- Quickly pick up the latest machine learning research and tools
- Prioritize minimum viable models and solutions over complicated ones
Minimum Skills and Experience
- 2+ years solving business problems using data science
- 1+ year working with text data
- 1+ year monitoring and maintaining models in production (possibly in conjunction with engineers)
- Experience with large data sets and modern tools for handling them such as Apache Spark
- Experience with models that handle unstructured data well (particularly deep learning)
- Experience with both supervised and unsupervised learning approaches
- A demonstrated understanding of the mathematical underpinnings of common models
- Proven ability to develop creative modeling solutions given a set of business requirements
- Demonstrated ability to work productively on small teams and maintain workstreams independently if needed
- Proficient in SQL, *nix CLI tools (grep, sed, awk, Bash, etc.), and Python
- Some understanding of the JVM ecosystem
- Experience deploying and maintaining code using git-based tools and operating in a continuous deployment/integration environment
- Experience writing thorough tests and documentation for maintainable code-bases
Preferred Skills and Experience
- 3+ years building NLP models that were used to transform text in production pipelines
- 1+ year of experience using machine learning to drive process automation
- An especially strong statistical and mathematical background
- Experience with models in the context of “fast data” tools like DC/OS or other similar architectures and tools
- Experience working with and storing large amounts of text data and text transformations