The aggregation framework is a pipeline of stages that runs on your data to transform the output into the format you need. It provides a set of stages that you can chain together to create pipelines that process your data in a series of steps.

Here are some of the common aggregation stages:
  • $match: Filters documents based on specified criteria.
  • $project: Selects or excludes fields from documents.
  • $group: Groups documents by a specified field and calculates aggregate values.
  • $sort: Sorts documents by a specified field.
  • $limit: Limits the number of documents returned.
  • $skip: Skips a specified number of documents.
  • $unwind: Unwinds an array field, creating a new document for each element in the array.
  • $lookup: Joins two collections based on a specified field.
  • $redact: Restricts the contents of documents based on information stored in the documents themselves.
  • $bucket: Groups documents into buckets based on a specified expression and boundaries.
  • $sample: Samples a random subset of documents.
  • $geoNear: Returns documents sorted by proximity to a specified geospatial point.
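
For instance, a minimal pipeline chaining a few of these stages might look like this (a sketch assuming a hypothetical orders collection with status and price fields):
db.orders.aggregate([
  { $match: { status: "shipped" } }, // keep only shipped orders
  { $sort: { price: -1 } },          // most expensive first
  { $limit: 5 }                      // return the top 5
])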


Example: Calculating Average Order Value
Scenario: Let's assume we have a collection named orders with documents representing individual orders. Each document has fields like order_id, customer_id, product_name, and price. We want to calculate the average order value for each customer.
db.orders.aggregate([
  {
    $group: {
      _id: "$customer_id",
      total_spent: { $sum: "$price" },
      total_orders: { $count: {} }
    }
  },
  {
    $project: {
      average_order_value: { $divide: ["$total_spent", "$total_orders"] }
    }
  }
])

Explanation:
  1. $group:
    • Groups the documents by customer_id.
    • Calculates the total spent for each customer using $sum.
    • Counts the total number of orders for each customer using $count.
  2. $project:
    • Calculates the average order value by dividing total_spent by total_orders.

Result:
The aggregation pipeline will return a list of documents, each containing the customer_id (as _id) and the calculated average_order_value.
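For illustration, each result document would look something like this (values are hypothetical):
{ _id: "customer_123", average_order_value: 42.5 }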

When you open a new `mongosh` connection from your terminal, you will find the following commands really useful:

First of all, connect to a server:
mongosh "mongodb+srv://{MONGO_COLLECTION_USER}:{MONGO_COLLECTION_PASSWORD}@{MONGO_APP_NAME}.yng1j.mongodb.net/?appName={MONGO_APP_NAME}"

Then the basic commands to start interacting (most of them are fairly obvious):
// show databases
show dbs
// use database
use <db_name>
// show collections
show collections
// finally interact with them, for example
db.users.findOne()
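
A few more everyday commands, sketched here assuming a users collection exists:
// count the documents in a collection
db.users.countDocuments()
// list the first few documents
db.users.find().limit(5)
// print the database you are currently using
db
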
Stored Procedures are precompiled code blocks that reside within a database. They provide a way to encapsulate and reuse frequently executed SQL statements, improving performance, maintainability, and security.

Benefits of Stored Procedures:
  • Performance: Precompiled code executes faster than executing statements directly.
  • Modularity: Encapsulate complex logic, making code more organized and reusable.
  • Security: Centralize security rules and permissions.
  • Data validation: Enforce data integrity and consistency.

Example:
CREATE PROCEDURE GetCustomers
AS
BEGIN
    SELECT CustomerID, CustomerName, City
    FROM Customers;
END;

-- How to use?

EXEC GetCustomers;

Example using parameters:
CREATE PROCEDURE GetCustomersByCity
    @City nvarchar(50)
AS
BEGIN
    SELECT CustomerID, CustomerName
    FROM Customers
    WHERE City = @City;
END;

-- How to use?

EXEC GetCustomersByCity @City = 'London';
Time to Live (TTL) is a feature in MongoDB that allows you to automatically expire documents after a specified amount of time. This is useful for scenarios where you want to keep data for a limited duration, such as temporary data, session data, or cache entries.

How TTL works:
  1. Index creation: To enable TTL for a collection, you create an index on a field that represents the expiration time. This field must be of type Date.
  2. Expiration time setting: When creating the index, you specify the TTL value in seconds.
  3. Document expiration: MongoDB periodically scans the collection and deletes documents whose expiration time has passed.

Example:
db.sessions.createIndex({ expiresAt: 1 }, { expireAfterSeconds: 3600 });
This creates an index on the expiresAt field and sets the TTL to 1 hour (3600 seconds). Any documents in the sessions collection with an expiresAt value that is older than 1 hour will be automatically deleted.
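For example, a session document inserted like this would be removed roughly one hour after its expiresAt timestamp (the sessionId value is hypothetical):
db.sessions.insertOne({ sessionId: "abc123", expiresAt: new Date() })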

Use cases for TTL:
  • Session management: Store session data with a TTL to automatically expire inactive sessions.
  • Temporary data: Keep temporary data for a limited time, such as cached results or temporary files.
  • Data retention policies: Implement data retention policies by setting appropriate TTL values for different types of data.
The explain("executionStats") command in MongoDB provides detailed information about the execution plan and performance metrics of a query.
When used with the find() method, it returns a document containing statistics about how MongoDB executed the query.

Example:
db.products.explain("executionStats").find({ price: { $gt: 10 } });
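A quick sketch of pulling a few of the most useful metrics out of that output (assuming the products collection exists):
const stats = db.products.find({ price: { $gt: 10 } }).explain("executionStats");
printjson({
  nReturned: stats.executionStats.nReturned,
  totalKeysExamined: stats.executionStats.totalKeysExamined,
  totalDocsExamined: stats.executionStats.totalDocsExamined,
  executionTimeMillis: stats.executionStats.executionTimeMillis
});
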
MongoDB Semantic Search refers to the ability to search for documents in a MongoDB collection based on the meaning or context of the data, rather than just exact keyword matches. Traditional database searches are often keyword-based, which means they only return documents that contain an exact match to the search query. Semantic search, on the other hand, aims to understand the intent behind the query and return results that are contextually similar, even if the exact keywords aren't present.
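
In MongoDB Atlas this is typically implemented with Atlas Vector Search, where the query and the documents are compared through vector embeddings. A rough sketch, assuming an articles collection with an embedding field and an Atlas Vector Search index already configured (the index name, field name, and embedding values below are hypothetical):
db.articles.aggregate([
  {
    $vectorSearch: {
      index: "articles_vector_index",    // hypothetical index name
      path: "embedding",                 // hypothetical field holding the document embedding
      queryVector: [0.12, -0.08, 0.33],  // embedding of the user's query (hypothetical values)
      numCandidates: 100,
      limit: 5
    }
  },
  { $project: { title: 1, score: { $meta: "vectorSearchScore" } } }
])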


Single-Threaded Execution: JavaScript operates on a single thread, meaning it executes one task at a time using the call stack, where functions are processed sequentially.

Call Stack: Picture the call stack as a stack of plates. Each time a function is invoked, a new plate (function) is added to the stack. Once a function completes, the plate is removed.

Web APIs: Asynchronous tasks like setTimeout, DOM events, and HTTP requests are managed by the browser’s Web APIs, operating outside the call stack.

Callback Queue: After an asynchronous task finishes, its callback is placed in the callback queue, which waits for the call stack to clear before moving forward.

Event Loop: The event loop constantly monitors the call stack. When it's empty, the loop pushes the next callback from the queue onto the stack.

Microtasks Queue: Tasks like promises are placed in a microtasks queue, which has higher priority than the callback queue. The event loop checks the microtasks queue first to ensure critical tasks are handled immediately.

Priority Handling: To sum up, the event loop prioritizes microtasks before handling other callbacks, ensuring efficient execution.
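
A small sketch that makes this ordering visible (runs in any modern browser console or in Node.js):
console.log("1: synchronous");
setTimeout(() => console.log("4: setTimeout callback (callback queue)"), 0);
Promise.resolve().then(() => console.log("3: promise callback (microtask queue)"));
console.log("2: synchronous");
// Prints 1, 2, 3, 4: the microtask runs before the setTimeout callback.
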
Page to compare different LLMs:

https://arena.lmsys.org/
We can delete a key from a dictionary in Python with `.pop('property_name', None)`; the None default avoids a KeyError if the key does not exist.

Source: https://www.javatpoint.com/difference-between-del-and-pop-in-python#:~:text=In%20Python%2C%20%22del%22%20can,removes%20an%20object%20from%20memory
This aggregation stage performs a left outer join to a collection in the same database.

There are four required fields:
  • from: The collection to use for lookup in the same database
  • localField: The field in the primary (input) collection whose values are matched against foreignField in the from collection.
  • foreignField: The field in the from collection whose values are matched against localField in the primary collection.
  • as: The name of the new field that will contain the matching documents from the from collection.

Example:
db.comments.aggregate([
  {
    $lookup: {
      from: "movies",
      localField: "movie_id",
      foreignField: "_id",
      as: "movie_details",
    },
  },
  {
    $limit: 1
  }
])
Within the breakpoint() debugger, type dir(object) to list all available attributes and methods of the object.
This will give you an overview of what you can access.
Shallow copy
A shallow copy duplicates only the top-level properties. If those properties are references (like objects or arrays), the copy will reference the same objects.
let original = { a: 1, b: { c: 2 } }
let copy = { ...original }
copy.b.c = 3 // Changes "original.b.c"
When to use:
- Small objects with primitive data types.
- Situations where performance is critical.
- Cases where changes to nested objects should reflect in all copies.

Deep copy
A deep copy creates a complete clone of the original object, duplicating all nested objects and arrays.
let original = { a: 1, b: { c: 2 } }
let copy = JSON.parse(JSON.stringify(original));
copy.b.c = 3 // The "original.b.c" remains 2
When to use:
- Complex objects with nested structures.
- Scenarios where complete independence from the original object is needed.
- Preventing unintended side-effects from shared references.
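
Note that the JSON round-trip above drops functions and undefined values and turns Date objects into strings. Where it is available (modern browsers and Node.js 17+), structuredClone is a more robust alternative:
let original = { a: 1, b: { c: 2 } }
let copy = structuredClone(original)
copy.b.c = 3 // The "original.b.c" remains 2
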
To debug a Node.js application, one can use the built-in debugger:

(1) Insert a debugger; statement where you want to set a breakpoint
(2) Run the file with the command $ node inspect <file name>
(3) Use a key, for example c, to continue to the next breakpoint

You can even inspect the values of variables at that breakpoint by typing repl. For more information, please check the official guide.
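
A minimal sketch (the file name debug-example.js is hypothetical):
// debug-example.js
function add(a, b) {
  debugger; // execution pauses here when run with node inspect
  return a + b;
}
console.log(add(2, 3));
// Run it with: node inspect debug-example.js
// Press c to continue to the breakpoint, or type repl to inspect a and b.
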
Amazon Simple Email Service (SES) is a cost-effective email service built on the reliable and scalable infrastructure that Amazon.com developed to serve its own customer base. With Amazon SES, you can send transactional email, marketing messages, or any other type of high-quality content to your customers.
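
Sending a message through SES with the AWS SDK for JavaScript v3 might look roughly like this (the region and both email addresses are hypothetical, and addresses must be verified while the account is in the SES sandbox):
import { SESClient, SendEmailCommand } from "@aws-sdk/client-ses";

const ses = new SESClient({ region: "us-east-1" }); // hypothetical region

await ses.send(new SendEmailCommand({
  Source: "sender@example.com",                            // hypothetical verified sender
  Destination: { ToAddresses: ["recipient@example.com"] }, // hypothetical recipient
  Message: {
    Subject: { Data: "Hello from SES" },
    Body: { Text: { Data: "This is a test message." } }
  }
}));
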
AWS service to extract text information with precision:

Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, design elements, and data from scanned documents. It goes beyond simple optical character recognition (OCR) to identify, understand, and extract specific data from documents.

Source: https://aws.amazon.com/es/textract/
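
A rough sketch of calling Textract with the AWS SDK for JavaScript v3 (the region, bucket, and file name are hypothetical):
import { TextractClient, DetectDocumentTextCommand } from "@aws-sdk/client-textract";

const textract = new TextractClient({ region: "us-east-1" }); // hypothetical region

const result = await textract.send(new DetectDocumentTextCommand({
  Document: { S3Object: { Bucket: "my-bucket", Name: "scanned-invoice.png" } } // hypothetical S3 object
}));

// Print every detected line of text
for (const block of result.Blocks ?? []) {
  if (block.BlockType === "LINE") console.log(block.Text);
}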