Data Platform Software Lead
Cognichip Inc. — Toronto, CA
About the Role
We are seeking a skilled and pragmatic Data Platform Engineer to architect and scale intelligent data systems that support our AI and ML pipelines, with a specific focus on code-based text datasets. You will play a central role in building the infrastructure that powers data ingestion, transformation, and delivery for our models. This includes developing systems for web-scale data discovery and crawling, designing robust data pipelines, and enabling our scientists to experiment and iterate with confidence.
Core Responsibilities
- Design and implement scalable data infrastructure to ingest, transform, and manage large-scale code datasets, ensuring high reliability and modularity.
- Build systems and tools for automated web crawling, parsing, deduplication, and metadata extraction from open-source and public code repositories.
- Develop robust data pipelines for ingesting and processing structured text datasets using distributed compute frameworks.
- Monitor data quality, pipeline throughput, and system performance.
- Collaborate across research, infrastructure, and compliance teams to meet technical, operational, and regulatory requirements.
Required Skills
- 5 years of software engineering experience in data-intensive environments.
- Proven experience building and maintaining scalable data systems and infrastructure.
- Experience with web crawling, scraping frameworks, and large-scale data ingest.
- Comfort working in AWS or other cloud environments, including storage, containerized compute, and security.
- Working experience with a data-centric tech stack including Python, Go, or Scala; Spark or Ray; Airflow or Prefect; Kafka; Redis; PostgreSQL or ClickHouse; and GitHub APIs.
Preferred Qualifications
- Experience curating and preparing code-based datasets for language models or code intelligence applications.
- Familiarity with code parsing, tokenization, embedding, and static analysis.
- Prior experience in a startup or fast-paced, high-ownership engineering environment.
- Strong written and verbal communication skills.
What We Offer
- Opportunity to shape the technical direction of a disruptive AI startup.
- Work with cutting-edge technologies in AI/ML and cloud computing.
- Competitive compensation package including equity.
- High-caliber, talented collaborators from diverse disciplines.
- Collaborative and innovative startup culture.