My Blog

Some texts about stuff that I like

May 8, 2025

DynamoDB

Hash Tables

NoSQL

AWS

Why is DynamoDB So Blazing Fast? A Deep Dive into Hash Tables and NoSQL Performance

Have you ever asked yourself why GET operations in DynamoDB are so fast?

April 9, 2025

Rust

GenerativeAI

Concurrency

LLMs

Rust for Blazing-Fast LLM Applications

Python's great for prototyping and has amazing libraries like Hugging Face's Transformers, but when performance matters, Rust shines.

April 3, 2025

Parallel Programming

I/O Bound

CPU Bound

Python ThreadPoolExecutor and Concurrency: The Complete Guide

ThreadPoolExecutor is part of Python's `concurrent.futures` module, which was introduced in Python 3.2. It provides a high-level interface for asynchronously executing callables.
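
As a quick taste, here is a minimal sketch of that interface. The `fetch` function and its sleep-based delay are hypothetical stand-ins for a real I/O-bound call (an HTTP request, a database query), and `run_all` is just an illustrative helper name, not something taken from the post itself.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(item: int) -> int:
    """Hypothetical I/O-bound task: sleep stands in for network latency."""
    time.sleep(0.05)
    return item * 2


def run_all(items, max_workers=4):
    """Run fetch() over all items on a pool of worker threads."""
    # The context manager waits for pending work and shuts the pool down.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the inputs,
        # regardless of which thread finishes first.
        return list(pool.map(fetch, items))


results = run_all([1, 2, 3, 4])
print(results)
```

Because the threads spend their time waiting rather than computing, the four calls overlap instead of running back to back, which is why thread pools tend to suit I/O-bound work better than CPU-bound work.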

March 10, 2025

AI Engineering

LLM Accuracy

Needle In a Haystack

Machine Learning

Generative AI

Benchmarks

Uncovering The Needle

In the LLM world, 'needle in a haystack' refers to the challenge of finding specific, accurate information within the vast amount of data these models have been trained on.

October 23, 2024

RAG Performance

Accuracy

Prompt Engineering

Benchmarks

How to improve your RAG performance?

When building a RAG application, it's very common to face low accuracy at the beginning. To overcome this, techniques such as prompt engineering, filtered search, and score extraction are usually needed to push accuracy above 80%.

July 5, 2024

Context Caching

Gemini

Tokens

Prompt

Context Caching in LLMs

Cache input tokens and reuse them in subsequent requests.

June 17, 2024

Parallel Programming

Cuda

Multi-core

Computing Architecture

Processors

What is Parallel Programming

Parallel programming is an approach to computing in which multiple computations run simultaneously. It can significantly speed up workloads by spreading them across the cores of a multi-core processor.