How to run data science projects

In this article, I will outline my mental model for running a science project. Specifically, I’m referring to data or applied science projects, drawing from my experience of over 9 years at AWS and Amazon. You might argue that in agile environments like startups or smaller companies, the approach could differ, but aside from an additional layer of hierarchy, I don’t anticipate significant deviations.

Causal inference as a blind spot of data scientists

Throughout much of the 20th century, frequentist statistics dominated both the field of statistics and scientific research. Frequentist statistics primarily focuses on analysing data in terms of probabilities and observed frequencies. Causal inference, on the other hand, involves drawing conclusions about cause-and-effect relationships, which often goes beyond the scope of traditional frequentist methods.

Navigating the Future of AI: Strategies for Survival

Lately, reading the news and following advancements in AI, specifically in Generative AI and ChatGPT, has given me mixed feelings. On one hand, we are onto something big and impactful; on the other, it feels like a potential threat to the future. And I’m not alone: NLP students lost their field of research overnight, while some orgs at FAANG became obsolete. It is old news that ChatGPT can pass a software developer test at FAANG, pass the exam to become a lawyer, or generate inspirational phrases for your YouTube Shorts. Still, I’m sceptical that we will experience a radical transformation within just a few years; rather, it will be an iterative change that can take a decade or more. But as the story goes, a slowly boiled frog was too comfortable to jump out of the pot, a fate we should avoid.

Is serverless architecture cheap? Yes, but it depends on your use case

If you are interested in serverless architecture, you have probably read many contradictory articles and might wonder whether serverless architectures are cost-effective or expensive. I would like to clear the air around the cost-effectiveness of serverless architectures by analysing a web scraping solution. The use case is fairly simple: at certain times during the day, let’s say every hour from 6am to 11pm, I want to run a Python script that scrapes a website. The execution of the script takes less than 15 minutes, an important consideration to which we will come back later. The project can be considered an ETL process without a user interface, and it can be packed into a self-contained function or library.
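
To make the shape of this workload concrete, here is a minimal sketch that counts the scheduled runs and an upper bound on daily compute time. It assumes the 11pm run is included in the schedule and that every run takes the full 15 minutes; the function and constant names are hypothetical and not taken from the original project.

```python
from datetime import time

FIRST_HOUR = 6            # 6am, from the use case above
LAST_HOUR = 23            # 11pm, assumed to be inclusive
MAX_RUNTIME_MINUTES = 15  # upper bound on a single run, from the use case above


def daily_run_times(first_hour=FIRST_HOUR, last_hour=LAST_HOUR):
    """Return the start time of every hourly run, both ends inclusive."""
    return [time(hour=h) for h in range(first_hour, last_hour + 1)]


def max_daily_compute_minutes(runs, max_runtime_minutes=MAX_RUNTIME_MINUTES):
    """Worst case: every run uses its full runtime budget."""
    return len(runs) * max_runtime_minutes


runs = daily_run_times()
busy = max_daily_compute_minutes(runs)
idle_share = 1 - busy / (24 * 60)

print(f"runs per day: {len(runs)}")
print(f"max compute minutes per day: {busy}")
print(f"share of the day with nothing running: {idle_share:.1%}")
```

Under these assumptions the script is busy for only a small part of the day, which is the kind of workload where paying per execution rather than for an always-on server is worth comparing.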

Applying Machine Learning to Peer to Peer lending

Peer to peer lending allows individuals to lend money to unrelated individuals without going through a traditional financial institution such as a bank or credit union. Nevertheless, there is an intermediary: the service and platform provider. The provider verifies the borrower's identity and income status, processes the payments, promotes its platform, and deals with bad loans, up to initiating bankruptcy proceedings against the borrower.
The advantage of peer to peer lending is a lower interest rate for borrowers and a higher rate of return for lenders. However, the higher rate comes with higher risk: the return is more volatile than that of a bank deposit.