Data Engineering and Computer Science
The role of data engineering is to ensure an uninterrupted flow of data between servers and applications
# Resources
- https://github.com/ossu/computer-science
- What is Data Engineering and Why Is It So Important?
- ETL (extract, transform, load)
- Have we bridged the gap between Data Science and DevOps?
- Codelabs
- Google Developers Codelabs provide a guided, tutorial, hands-on coding experience. Most codelabs will step you through the process of building a small application, or adding a new feature to an existing application
# Python
# Julia
# Javascript
- https://www.w3schools.com/js/
- https://codesandbox.io
- https://developer.mozilla.org/en-US/docs/Learn/Getting_started_with_the_web/JavaScript_basics
- https://dtabio.gitbooks.io/data-science-with-javascript/content/links_and_resources.html
- http://www.kdnuggets.com/2016/06/top-machine-learning-libraries-javascript.html
# Bash
# CUDA
- https://developer.nvidia.com/cuda-education
- https://dragan.rocks/articles/18/Interactive-GPU-Programming-1-Hello-CUDA
# Books
See “Books” section in AI/DS and DataEng/Python
- #BOOK Mining of Massive Datasets (Leskovec, 2014 CAMBRIDGE)
- #BOOK Advanced Analytics with Spark (Ryza, 2017 OREILLY)
- #BOOK The Big Book of Data Engineering (Databricks)
# R
- #BOOK R para profesionales de los datos: una introducción
- #BOOK Geocomputation with R
- #BOOK Efficient R programming
- #BOOK Engineering Production-Grade Shiny Apps
- #BOOK Advanced R
- #BOOK Hands-On Programming with R
- #BOOK R Packages (Wickham 2020)
# Courses
- See “Courses” section in AI/DS and DataEng/Python
- #COURSE Intro to Hadoop and MapReduce
- #COURSE Mining Massive Data Sets (CS246 Stanford)
- #COURSE Getting and Cleaning Data (Coursera)
- SQL:
- Tutorial and exercises
- SQL (basic, intermediate, advanced / pet problems)
# Code
- See AI/DS and DataEng/ML Ops
- #CODE ABSL.flags - Defines a distributed command line system, replacing getopt-style and manual argument parsing
- #CODE Memray - A memory profiler for Python
- #CODE mmap.ninja - Memory-mapped numpy arrays of varying shapes
    - You can use mmap_ninja with any training framework (such as TensorFlow, PyTorch, MXNet), as it stores your dataset as a memory-mapped numpy array
    - A memory-mapped file is a file that is physically present on disk in a way that the correlation between the file and the memory space permits applications to treat the mapped portions as if they were primary memory, allowing very fast I/O
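A minimal sketch of the memory-mapping idea using numpy's built-in `np.memmap` (not mmap_ninja's own API; file name and shape are made up for illustration):

```python
import numpy as np

# Create a file-backed array on disk; it behaves like a regular ndarray
arr = np.memmap("data.bin", dtype=np.float32, mode="w+", shape=(1000, 128))
arr[0] = np.arange(128, dtype=np.float32)
arr.flush()  # persist pending writes to disk

# Reopen read-only: pages are loaded lazily, so huge datasets
# can be indexed without reading the whole file into RAM
ro = np.memmap("data.bin", dtype=np.float32, mode="r", shape=(1000, 128))
print(ro[0, :5])
```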
- #CODE Polars - Fast multi-threaded, hybrid-out-of-core DataFrame library in Rust | Python | Node.js
- #CODE Pandas AI/DS and DataEng/Pandas
- #CODE Modin - Scale your pandas workflows by changing one line of code
- #CODE Xarray AI/DS and DataEng/Xarray
- #CODE Dedupe - A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution
- #CODE PyTables
- #CODE H5py
- #CODE Singer - Simple, Composable Open Source ETL
- #CODE Docker
- #CODE Kubernetes - K8s is an open-source system for automating deployment, scaling, and management of containerized applications.
# Business Intelligence
- #CODE kuwala
# Big data, distributed computing
- #CODE Dask
- #CODE Ray - A system for parallel and distributed Python that unifies the ML ecosystem
    - https://ray.readthedocs.io/en/latest/
    - https://ray-project.github.io/
    - #TALK Ray: A Distributed Execution Framework for AI | SciPy 2018 | Robert Nishihara
    - #TALK Ray: A System for Scalable Python and ML | SciPy 2020 | Robert Nishihara
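A minimal Ray sketch (local mode) showing how remote tasks parallelize plain Python functions:

```python
import ray

ray.init()  # start a local Ray runtime

@ray.remote
def square(x):
    # Runs as a task on a worker process
    return x * x

# .remote() returns futures immediately; ray.get() blocks for the results
futures = [square.remote(i) for i in range(8)]
print(ray.get(futures))  # [0, 1, 4, 9, 16, 25, 36, 49]
```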
- #CODE PyGDF - GPU Data Frame (now cuDF, part of NVIDIA RAPIDS)
- #CODE Apache Hadoop - A framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models
    - https://www.quora.com/What-is-the-difference-between-Apache-Spark-and-Apache-Hadoop-Map-Reduce
    - Intro to Hadoop and MapReduce (Udacity)
    - https://datawanderings.com/2017/01/15/your-first-diy-hadoop-cluster/
    - http://ruhanixedu.com/blog/interview-question-and-answers/big-data/
- #CODE Apache Spark - A unified analytics engine for large-scale data processing
    - http://cacm.acm.org/magazines/2016/11/209116-apache-spark/fulltext
    - http://www.kdnuggets.com/2015/11/introduction-spark-python.html
    - https://databricks.com/blog/2018/05/03/benchmarking-apache-spark-on-a-single-node-machine.html
    - #TALK A brief introduction to Distributed Computing with PySpark (PyData)
    - #TALK Connecting Python To The Spark Ecosystem
    - http://tech.marksblogg.com/billion-nyc-taxi-rides-spark-2-1-0-emr.html
    - http://ruhanixedu.com/blog/interview-question-and-answers/apache-spark-interview-questions-answers/
    - Text Normalization with Spark
    - Spark ML
        - MLlib: http://spark.apache.org/mllib/ and https://spark.apache.org/docs/latest/ml-guide.html
    - PySpark
    - Optimus
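A minimal PySpark word-count sketch (the canonical MapReduce-style example, run locally):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount").getOrCreate()

# flatMap splits lines into words, map emits (word, 1) pairs,
# reduceByKey sums the counts per word across partitions
lines = spark.sparkContext.parallelize(["a b a", "b c"])
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda w: (w, 1))
               .reduceByKey(lambda a, b: a + b))
print(counts.collect())  # e.g. [('a', 2), ('b', 2), ('c', 1)]
spark.stop()
```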
- #CODE Apache Storm
- #CODE Apache Arrow
- #CODE Blaze
# Databases
- SQL:
- #CODE SQLAlchemy
- #CODE Pyodbc
- #CODE ClickHouse
- NoSQL:
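A minimal SQLAlchemy sketch for the SQL entries above (2.0-style API; in-memory SQLite chosen only for illustration):

```python
from sqlalchemy import create_engine, text

# Swap the URL for your actual DBMS (PostgreSQL, ClickHouse via a driver, etc.)
engine = create_engine("sqlite:///:memory:")

with engine.begin() as conn:  # begin() commits automatically on success
    conn.execute(text("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"))
    conn.execute(text("INSERT INTO users (name) VALUES (:n)"),
                 [{"n": "ada"}, {"n": "grace"}])
    rows = conn.execute(text("SELECT id, name FROM users")).fetchall()
    print(rows)  # [(1, 'ada'), (2, 'grace')]
```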
# Subtopics
# Open datasets (for ML, DL and DS)
See AI/DS and DataEng/Open ML data
# MLOps
# Feature engineering
- https://en.wikipedia.org/wiki/Feature_engineering
- Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work. It is fundamental to the application of ML, and is both difficult and expensive. The need for manual feature engineering can be obviated by automated feature learning
- http://machinelearningmastery.com/discover-feature-engineering-how-to-engineer-features-and-how-to-get-good-at-it/
- https://tech.zalando.com/blog/feature-extraction-science-or-engineering/
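A small pandas sketch of typical manual feature engineering (columns are made up; calendar parts, one-hot encoding, and a log transform):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2021-01-01 08:00", "2021-01-02 23:30"]),
    "city": ["Madrid", "Paris"],
    "price": [100.0, 250.0],
})

# Calendar features extracted from a raw timestamp
df["hour"] = df["timestamp"].dt.hour
df["dayofweek"] = df["timestamp"].dt.dayofweek

# One-hot encode the categorical variable
df = pd.get_dummies(df, columns=["city"])

# Domain-driven transform: prices are often modeled on a log scale
df["log_price"] = np.log(df["price"])
```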
# Feature extraction
See AI/Feature learning techniques in AI/Computer Vision/Computer vision
# Data mining
- http://nbviewer.jupyter.org/github/ptwobrussell/Mining-the-Social-Web-2nd-Edition/tree/master/ipynb/
- https://www.dataquest.io/course/apis-and-scraping
# Web scraping
- https://www.dataquest.io/blog/web-scraping-tutorial-python/
- http://thiagomarzagao.com/2013/11/12/webscraping-with-selenium-part-1/
- https://medium.com/@hoppy/how-to-test-or-scrape-javascript-rendered-websites-with-python-selenium-a-beginner-step-by-c137892216aa#.hrjljvffd
- https://antonio-maiolo.com/2016/12/01/web-crawler-scrapy-crawl-spider-tutorial/
- http://stackoverflow.com/questions/19021541/scrapy-scrapping-data-inside-a-javascript
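A minimal scraping sketch with requests + BeautifulSoup (static HTML only; JavaScript-rendered pages need Selenium, as the links above discuss):

```python
import requests
from bs4 import BeautifulSoup

resp = requests.get("https://example.com", timeout=10)
resp.raise_for_status()

# Parse the HTML and pull out every link target
soup = BeautifulSoup(resp.text, "html.parser")
links = [a.get("href") for a in soup.find_all("a")]
print(links)
```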
# API
- A categorized public list of APIs from around the web
- A collective list of public JSON APIs for use in web development
- Public APIs
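A minimal sketch of calling a public JSON API (GitHub's REST API used here as an example endpoint):

```python
import requests

resp = requests.get("https://api.github.com/users/octocat", timeout=10)
resp.raise_for_status()

data = resp.json()  # the API returns a JSON document, parsed to a dict
print(data["login"], data["public_repos"])
```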
# Databases
- https://en.wikipedia.org/wiki/Distributed_database
- ACID (Atomicity, Consistency, Isolation, Durability)
- SQL vs NoSQL
# SQL
- https://en.wikipedia.org/wiki/SQL
- https://en.wikipedia.org/wiki/Relational_database
- A relational database is a digital database whose organization is based on the relational model of data.
- https://www.analyticsvidhya.com/blog/2017/01/46-questions-on-sql-to-test-a-data-science-professional-skilltest-solution/
- Tutorial and exercises
- SQL (basic, intermediate, advanced / pet problems)
- List of SQL Commands
- JOIN
- A SQL join clause combines columns from one or more tables in a relational database. It creates a set that can be saved as a table or used as it is. A JOIN is a means for combining columns from one (self-join) or more tables by using values common to each. ANSI-standard SQL specifies five types of JOIN: INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER and CROSS.
- https://periscopedata.com/blog//how-joins-work.html
- https://www.digitalocean.com/community/tutorials/sqlite-vs-mysql-vs-postgresql-a-comparison-of-relational-database-management-systems
- Python interface
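A minimal sketch of a JOIN through Python's stdlib sqlite3 interface (made-up tables; LEFT JOIN keeps customers with no orders, INNER JOIN would drop them):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'ada'), (2, 'grace');
    INSERT INTO orders VALUES (10, 1, 9.5), (11, 1, 20.0);
""")

# Combine columns from both tables using the values common to each
for row in cur.execute("""
    SELECT c.name, o.amount
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
"""):
    print(row)  # ('ada', 9.5), ('ada', 20.0), ('grace', None)
```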
# NoSQL
- https://en.wikipedia.org/wiki/NoSQL
- Not only SQL: A NoSQL database provides a mechanism for storage and retrieval of data which is modeled by means other than the tabular relations used in relational databases. NoSQL databases are increasingly used in big data and real-time web applications. Many NoSQL stores compromise consistency (in the sense of the CAP theorem) in favor of availability, partition tolerance, and speed.
- Column: Accumulo, Cassandra, Druid, HBase, Vertica, SAP HANA
- #TALK GOTO 2012 - Introduction to NoSQL - Martin Fowler
- Graph:
- A graph database is a database that uses graph structures for semantic queries with nodes, edges and properties to represent and store data. A key concept of the system is the graph (or edge or relationship), which directly relates data items in the store. The relationships allow data in the store to be linked together directly, and in many cases retrieved with a single operation.
- Graph databases employ nodes, edges and properties.
- Nodes represent entities/items you might want to keep track of (people, businesses, accounts).
- Edges, also known as graphs or relationships, are the lines that connect nodes to other nodes; they represent the relationship between them.
- Properties are pertinent information that relates to nodes (similar to keywords).
- AllegroGraph, ArangoDB, InfiniteGraph, Apache Giraph, MarkLogic, Neo4J, OrientDB, Virtuoso, Stardog
- https://neo4j.com/developer/graph-database/
- Key-value:
- https://en.wikipedia.org/wiki/Key-value_database
- A key-value store, or key-value database, is a data storage paradigm designed for storing, retrieving, and managing associative arrays, a data structure more commonly known today as a dictionary or hash.
- Dictionaries contain a collection of objects, or records, which in turn have many different fields within them, each containing data. These records are stored and retrieved using a key that uniquely identifies the record, and is used to quickly find the data within the database.
- Document-oriented database
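A minimal illustration of the key-value paradigm above using Python's stdlib shelve, a persistent dictionary-like store (real stores such as Redis follow the same get/put-by-key model; since values here are arbitrary objects, it also hints at the document-oriented style):

```python
import shelve

# Records are stored and retrieved by a key that uniquely identifies them
with shelve.open("kvstore") as db:
    db["user:1"] = {"name": "ada", "interests": ["math", "computing"]}
    record = db["user:1"]
    print(record["name"])  # ada
```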
# Data munging
# Data preparation
- Data cleansing: Missing data
- Variable encoding
- Normalisation, scaling
- Outlier detection
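A compact pandas sketch covering the four steps above (data is made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 40],
                   "country": ["ES", "FR", None],
                   "income": [30000.0, 52000.0, 61000.0]})

# Missing data: impute numeric with the median, categorical with a sentinel
df["age"] = df["age"].fillna(df["age"].median())
df["country"] = df["country"].fillna("unknown")

# Variable encoding: one-hot encode the categorical column
df = pd.get_dummies(df, columns=["country"])

# Normalisation/scaling: standardize income to zero mean, unit variance
df["income"] = (df["income"] - df["income"].mean()) / df["income"].std()

# Outlier detection: flag values beyond 3 standard deviations
df["age_outlier"] = (df["age"] - df["age"].mean()).abs() > 3 * df["age"].std()
```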
# Exploratory data analysis
- https://www.codementor.io/jadianes/data-science-python-r-exploratory-data-analysis-visualization-du107jjms
- http://blog.districtdatalabs.com/data-exploration-with-python-2
- https://www.analyticsvidhya.com/blog/2016/01/guide-data-exploration/
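A first-pass EDA checklist in pandas (input file name is hypothetical):

```python
import pandas as pd

df = pd.read_csv("data.csv")  # hypothetical input

print(df.shape)                    # rows x columns
print(df.dtypes)                   # column types
print(df.isna().sum())             # missing values per column
print(df.describe(include="all"))  # summary statistics
print(df.corr(numeric_only=True))  # pairwise numeric correlations
```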
# Big data
- http://www.datasciencecentral.com/profiles/blogs/25-big-data-terms-you-must-know-to-impress-your-date-or-whoever
- Architecture of Giants: Data Stacks at Facebook, Netflix, Airbnb, and Pinterest
# MapReduce
- https://en.wikipedia.org/wiki/MapReduce
- MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster
- A MapReduce program is composed of a Map() procedure (method) that performs filtering and sorting (such as sorting students by first name into queues, one queue for each name) and a Reduce() method that performs a summary operation (such as counting the number of students in each queue, yielding name frequencies)
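A pure-Python sketch of the two phases using the student example above (a real framework distributes the map and reduce steps across the cluster and handles the shuffle between them):

```python
from collections import defaultdict

students = ["Alice", "Bob", "Alice", "Carol", "Bob", "Alice"]

# Map: emit a (key, value) pair per record -- here, (name, 1)
mapped = [(name, 1) for name in students]

# Shuffle: group values by key (done by the framework between phases)
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce: summarize each group -- here, the count per name
counts = {key: sum(values) for key, values in groups.items()}
print(counts)  # {'Alice': 3, 'Bob': 2, 'Carol': 1}
```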