About Me

I'm Soumik (pronounced Shou-mik), a Senior Data Scientist currently working on Generative AI and MLOps. I have about 5 years of experience building end-to-end machine learning and deep learning systems. I also have working skills in back-end (server-side) software engineering and cloud computing.

Expertise

  • Data Science, AI/ML
  • Systems Architecture
  • Software Engineering (Back-End)
  • Cloud Infrastructure (GCP, AWS)
  • Statistical Modeling
  • Natural Language Processing
  • Computer Vision
  • MLOps
  • ETL Pipelines
  • Data Scraping

Work Experience

Optimizely
HQ: New York, USA | Full-time | Remote

Senior Data Scientist

Jan 2024 - Present

  • Design and develop cutting-edge Generative AI models.
  • Fine-tune and adapt pre-trained LLMs for specific use cases.
  • Evaluate the performance of Generative AI models.
  • Collaborate with data engineers and product teams to integrate Generative AI models into production.
  • Automate model training, testing, and deployment processes using MLOps pipelines.

Venturas Ltd.
HQ: Tokyo, Japan | Full-time | Remote

Sr. AI Engineer (Team Lead)

Aug 2023 - Dec 2023

  • Create the overall structure and design of AI/ML systems, ensuring alignment with business goals, scalability, and performance requirements.
  • Evaluate and choose technologies and frameworks that best fit the system's requirements.
  • Provide technical leadership and guidance to development teams, facilitating knowledge sharing and best practices in AI/ML development.
  • Ensure seamless integration of AI/ML components with existing systems and infrastructure, including cloud platforms like GCP/AWS.
  • Conduct regular code reviews to ensure that code meets quality standards, follows architectural guidelines, and adheres to best practices.

Sr. Artificial Intelligence Engineer

Jan 2023 - Jul 2023

  • Design, develop, and implement comprehensive machine learning systems, including data preprocessing, model training, evaluation, and deployment in a production environment.
  • Construct MLOps pipelines to streamline the development, experimentation, continuous integration (CI), continuous delivery (CD), validation, and monitoring of AI/ML models.
  • Analyze large datasets to identify trends, anomalies, and meaningful patterns that can inform decision-making and model development.
  • Assist junior team members by providing guidance, conducting code reviews, and sharing expertise to foster a collaborative and continuous learning environment.

Artificial Intelligence Engineer

Jun 2021 - Dec 2022

  • Design, develop, validate, deploy, and maintain statistical, machine learning, and deep learning models for NLP and Computer Vision systems.
  • Build and deploy MLOps infrastructure on Google Cloud Platform for continuous data validation, model deployment, periodic model retraining, versioning, and performance monitoring.
  • Create ETL/data pipelines to collect, clean, transform, and aggregate data into data warehouses.
  • Optimize data pipelines to minimize data extraction, training time, and model prediction serving latency.
  • Design and implement cloud infrastructure on Google Cloud Platform to automate processes.

Chowa Giken Corporation
HQ: Hokkaido, Japan | Full-time | Remote

Machine Learning Engineer

Sep 2019 - Jun 2021

  • Preprocess data, including cleaning, normalization, and transformation for various machine learning tasks.
  • Design, develop, and train custom machine learning models for both NLP and computer vision applications.
  • Fine-tune state-of-the-art models to get the best results for specific applications and tasks.
  • Analyze and interpret ML model outcomes to derive insights and assess performance.
  • Deploy machine learning models into production using tools and platforms like Flask/Django and GCP.
  • Regularly experiment with new techniques, update models based on performance, and drive continuous improvement and innovation in machine learning systems.

Technical Skills

Programming Languages

  Python ++     JavaScript   SQL

Machine Learning & Deep Learning

TensorFlow Keras PyTorch Apache Spark (PySpark) Transformers Scikit-learn

MLOps & DevOps

Apache Airflow MLflow TFX Docker GitHub Actions

GCP - Google Cloud Platform

Compute Engine Cloud Storage Vertex AI BigQuery Cloud Functions Cloud Run App Engine Pub/Sub VPC networks Cloud Composer

AWS - Amazon Web Services

SageMaker EC2 Simple Storage Service (S3)

Back-End Engineering

Django FastAPI Flask MySQL Firebase SQLAlchemy Elastic Stack Celery Redis

Miscellaneous

Git UNIX/Linux Shell Scripting LaTeX Selenium Scrapy BeautifulSoup

Education

North South University
Bachelor of Science in Computer Science & Engineering
Dhaka, Bangladesh
January 2015 - April 2019

My Blog

Contact