The DataVolt Project, Diffusion Models Course, Feature Selection in Machine Learning
This week's agenda:
VS Code Extensions
Over the past few weeks, I shared a weekend edition every Saturday with VS Code extension recommendations by theme, from extensions for working with Git to extensions for AI code generation. Here are the links to those editions:
Open Source of the Week
This week's focus is on a new open-source project, DataVolt, by Allan Wandia. DataVolt is a Python library that provides a set of tools for building and maintaining data engineering pipelines.
The library has the following core functionality:
Installation:
pip install datavolt
In addition, the library tracks pipeline performance and visualizes key performance metrics. For example, the screenshot below shows the data load from a CSV file:
License: MIT
New Learning Resources
Here are some new learning resources that I came across this week.
Diffusion Models and their Applications
The Diffusion Models and their Applications course from Korea Advanced Institute of Science & Technology provides an introduction to diffusion models.
Forecasting with LSTM
The following tutorial by Code with Josh provides an introduction to time series forecasting with LSTM models using Python and TensorFlow.
Fast API Course
This one-hour tutorial by NeuralNine provides an introduction to FastAPI, one of Python's main frameworks for building APIs.
Book of the Week
This week's focus is on a new ML book, Feature Selection in Machine Learning (2nd edition), by Soledad Galli. The book, as its name implies, focuses on feature selection, which, in my opinion, is one of the foundations of data science. The book provides an introduction to the topic and dives into different feature selection methods, such as:
The book is currently available only as a PDF and can be purchased on the following website:
Have any questions? Please comment below!
See you next Tuesday!
Thanks,
Rami