A Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, Software Engineering, or a related quantitative field.
4+ years of experience in designing, building, and deploying end-to-end machine learning pipelines in production environments.
Proficiency in programming languages such as Python and/or Scala, including distributed data processing with PySpark, for building scalable systems.
Strong expertise in machine learning frameworks such as scikit-learn, XGBoost, and PyTorch, with hands-on experience in training, tuning, and deploying machine learning models.
Practical knowledge of data preprocessing and feature engineering, with experience in tools like Pandas, NumPy, and Dask for handling large datasets.
Proven experience deploying models in production environments, using tools like Docker, Kubernetes, and cloud services (AWS, Azure).
Expertise in MLOps practices, including CI/CD pipelines, model versioning, and monitoring, using tools like MLflow, Kubeflow, or TensorFlow Extended (TFX); see the brief sketch after this list.
Familiarity with database technologies, including SQL, NoSQL (e.g., MongoDB, Cassandra), and time-series databases (e.g., InfluxDB).
Knowledge of APIs and integration, including building and consuming RESTful APIs for model serving.
Strong understanding of cloud platforms (AWS, GCP, Azure) and orchestration tools (e.g., Airflow) for workflow automation.
Solid foundation in data structures and software engineering best practices, including version control with Git.
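
For illustration only, a minimal sketch of the training-and-tracking workflow named above: a scikit-learn model is fit, then its hyperparameters, evaluation metric, and artifact are logged with MLflow for versioning. The dataset, parameters, and experiment name are placeholders, not part of this posting.

import mlflow
import mlflow.sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Illustrative data and hyperparameters (placeholders for the example).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
params = {"n_estimators": 100, "learning_rate": 0.1}

mlflow.set_experiment("demo-training-run")  # hypothetical experiment name
with mlflow.start_run():
    model = GradientBoostingClassifier(**params).fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    # Log hyperparameters, the evaluation metric, and the model artifact
    # so the run is reproducible and the model is versioned.
    mlflow.log_params(params)
    mlflow.log_metric("accuracy", accuracy)
    mlflow.sklearn.log_model(model, "model")
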
Nice-to-Have:
Experience with feature stores (e.g., Feast, Hopsworks) to manage and reuse machine learning features.
Hands-on experience with LLMOps tools and deploying large language models (e.g., GPT, LLaMA) in production.
Familiarity with graph databases (e.g., Neo4j) or vector databases and similarity-search libraries (e.g., Pinecone, FAISS) for advanced search and retrieval tasks; a brief retrieval sketch follows.
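
As a small illustration of the vector-search item above, the sketch below builds an exact FAISS index over random placeholder embeddings and retrieves the five nearest neighbors of a query vector. The dimensionality and data are assumptions made for the example.

import faiss
import numpy as np

d = 128  # illustrative embedding dimensionality
rng = np.random.default_rng(0)
corpus = rng.random((1000, d), dtype=np.float32)  # placeholder embeddings

index = faiss.IndexFlatL2(d)  # exact L2 nearest-neighbor index
index.add(corpus)

query = rng.random((1, d), dtype=np.float32)  # placeholder query embedding
distances, ids = index.search(query, 5)  # top-5 nearest vectors
print(ids[0], distances[0])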