What is NoSQL#
NoSQL ("Not Only SQL") refers to non-relational databases that store data as key-value pairs, documents, graphs, or column families rather than in fixed-schema tables. Compared to relational databases, NoSQL databases are generally easier to scale horizontally and can deliver higher throughput for simple access patterns, often in exchange for weaker consistency guarantees or less expressive queries.
NoSQL databases emerged to serve large-scale data storage and access workloads that are hard to handle with a traditional relational database. Relational databases are designed around ACID transactions (Atomicity, Consistency, Isolation, Durability), and enforcing these guarantees across many machines is costly, which makes very large datasets and very high request rates difficult to handle. Many NoSQL databases relax some of these guarantees so that data can be partitioned across nodes, which lets them handle high concurrency and massive data volumes more easily.
Common types of NoSQL#
NoSQL databases are usually divided into four types: key-value databases, document databases, column family databases, and graph databases. Let's introduce these types one by one.
Key-value database#
A key-value database is a simple NoSQL database in which data is stored in the form of key-value pairs, similar to a hash table. Key-value databases usually have fast read and write speeds and can be easily horizontally scaled. Redis and Memcached are two well-known key-value databases.
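The hash-table analogy also hints at why key-value stores scale horizontally: each key can be assigned to a node independently of every other key. The following is a minimal sketch of that idea, assuming simple modulo hashing over in-process dictionaries. The names `shard_for` and `ShardedStore` are hypothetical, and real systems use more elaborate partitioning schemes.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Toy key-value store that spreads keys over several dictionaries."""

    def __init__(self, shard_count: int):
        # Each dictionary stands in for one storage node.
        self._shards = [dict() for _ in range(shard_count)]

    def set(self, key: str, value) -> None:
        self._shards[shard_for(key, len(self._shards))][key] = value

    def get(self, key: str, default=None):
        return self._shards[shard_for(key, len(self._shards))].get(key, default)


store = ShardedStore(shard_count=3)
store.set("name", "John")
print(store.get("name"))
```

Because a lookup only needs the key to find the right shard, reads and writes never have to consult more than one node in this sketch.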
Document database#
A document database is a NoSQL database in which data is stored as documents, typically in JSON or XML format. Unlike key-value databases, document databases understand the nested, hierarchical structure of each document, which makes them better suited to representing complex data structures. MongoDB and Couchbase are two well-known document databases. The main advantage of document databases is that they can store and query semi-structured data such as logs, blog posts, and comments. Document databases also support dynamic queries, allowing expressions and functions to be used in query conditions.
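To make the nesting concrete, here is a small sketch that represents a blog post as a nested Python dictionary and reads fields by a dotted path. The document contents and the helper `get_path` are illustrative assumptions, loosely modeled on the dot notation that document databases commonly offer, not any particular database's API.

```python
from typing import Any

# A hypothetical blog post with an embedded author and embedded comments.
post = {
    "title": "Intro to NoSQL",
    "author": {"name": "John", "email": "john@example.com"},
    "comments": [
        {"user": "Mike", "text": "Nice post"},
        {"user": "David", "text": "Thanks"},
    ],
}


def get_path(document: dict, path: str, default: Any = None) -> Any:
    """Follow a dotted path through nested dicts and lists."""
    current: Any = document
    for part in path.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        elif isinstance(current, list) and part.isdigit() and int(part) < len(current):
            current = current[int(part)]
        else:
            return default
    return current


print(get_path(post, "author.name"))
print(get_path(post, "comments.1.user"))
```

In a relational schema the same data would usually be split across posts, authors, and comments tables and joined at query time.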
Column family database#
A column family database is a NoSQL database in which the columns of each row are grouped into column families, giving a layout that resembles a table. Each column family contains multiple columns, and in some systems, such as HBase, each column value can be kept in multiple timestamped versions. Column family databases are usually used to store large amounts of structured data and can be easily horizontally scaled. HBase and Cassandra are two well-known column family databases.
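The layout described above can be pictured as nested maps: row key, then column family, then column, then timestamped versions. The class below is a minimal in-memory sketch under that assumption. `ColumnFamilyTable` and its methods are hypothetical names and do not mirror the HBase or Cassandra client APIs.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Toy model: row key -> column family -> column -> {timestamp: value}."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

    def put(self, row_key, family, column, value, timestamp):
        # Writing with a new timestamp adds a version instead of overwriting.
        self._rows[row_key][family][column][timestamp] = value

    def get(self, row_key, family, column, timestamp=None):
        """Return the newest version, or the newest one at or before timestamp."""
        versions = self._rows.get(row_key, {}).get(family, {}).get(column, {})
        candidates = [ts for ts in versions if timestamp is None or ts <= timestamp]
        if not candidates:
            return None
        return versions[max(candidates)]


table = ColumnFamilyTable()
table.put("user:1001", "profile", "name", "John", timestamp=1)
table.put("user:1001", "profile", "name", "Johnny", timestamp=2)
print(table.get("user:1001", "profile", "name"))
print(table.get("user:1001", "profile", "name", timestamp=1))
```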
Graph database#
A graph database is a NoSQL database in which data is stored in the form of graphs, similar to a network topology. Graph databases are usually used to store complex relationship data and have fast query speeds. Neo4j and OrientDB are two well-known graph databases. The advantages of graph databases are that they can store and query complex relationship data, such as social networks, knowledge graphs, etc.
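As a rough illustration of the kind of relationship query a graph database is built for, the sketch below stores a tiny social graph as an adjacency map and finds everyone reachable within a given number of hops using breadth-first search. The graph data and the function `within_hops` are made up for this example. A real graph database would express this in its own query language.

```python
from collections import deque

# Hypothetical "follows" relationships: each person maps to the people they follow.
follows = {
    "John": {"Mike", "David"},
    "Mike": {"David", "Anna"},
    "David": {"Anna"},
    "Anna": set(),
}


def within_hops(graph, start, max_hops):
    """Return {node: hop count} for nodes reachable from start in at most max_hops edges."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]
    return distances


print(within_hops(follows, "John", 1))
print(within_hops(follows, "John", 2))
```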
Redis#
Redis is an open-source, in-memory key-value database. It executes commands on a single main thread (newer versions can use additional threads for network I/O) and supports data types such as strings, lists, hashes, sets, and sorted sets. Redis keeps data in memory and can also persist it to disk. The main features of Redis include:
- Fast: Redis stores data in memory, so it can quickly read and write data.
- High availability: Redis supports master-slave replication and Sentinel. When the master node fails, Sentinel can automatically promote a standby node and direct clients to it.
- Multiple data types: Redis supports operations on multiple data types such as strings, hashes, lists, sets, sorted sets, etc.
- Rich features: Redis supports transactions, Lua scripts, automatic expiration, and other rich features.
Installation and configuration of Redis#
On an Ubuntu system, you can use the following commands to install Redis.
sudo apt-get update
sudo apt-get install redis-server
After installation, the Ubuntu package normally starts Redis as a system service. If the service is not running, you can start a server manually with the following command.
redis-server
By default, Redis listens on the address 127.0.0.1 and port 6379. You can modify the listening address and port number of Redis by modifying the configuration file /etc/redis/redis.conf.
Basic operations of Redis#
You can use the Redis command-line client tool redis-cli to connect to the Redis server and perform various operations. Here are some common examples of Redis operations:
# Connect to the Redis server
redis-cli
# Set key-value pair
set name "John"
# Get value by key
get name
# Set expiration time
setex name 10 "John"
# Add elements to a list
lpush colors "red"
lpush colors "yellow"
lpush colors "blue"
# Get list
lrange colors 0 -1
# Add elements to a set
sadd fruits "apple"
sadd fruits "banana"
sadd fruits "orange"
# Get set
smembers fruits
# Add elements to a sorted set
zadd scores 60 "John"
zadd scores 70 "Mike"
zadd scores 80 "David"
# Get sorted set
zrange scores 0 -1 withscores
Operating Redis in Python programs#
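You can use the third-party redis library (redis-py) to work with Redis from Python. By default, values read from Redis are returned as bytes; pass decode_responses=True to redis.Redis to get str values instead. Here are some examples of using the redis library: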
import redis
# Connect to the Redis server
r = redis.Redis(host='localhost', port=6379, db=0)
# Set key-value pair
r.set('name', 'John')
# Get value by key
r.get('name')
# Set expiration time
r.setex('name', 10, 'John')
# Add elements to a list
r.lpush('colors', 'red')
r.lpush('colors', 'yellow')
r.lpush('colors', 'blue')
# Get list
r.lrange('colors', 0, -1)
# Add elements to a set
r.sadd('fruits', 'apple')
r.sadd('fruits', 'banana')
r.sadd('fruits', 'orange')
# Get set
r.smembers('fruits')
# Add elements to a sorted set
r.zadd('scores', {'John': 60, 'Mike': 70, 'David': 80})
# Get sorted set
r.zrange('scores', 0, -1, withscores=True)
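The get and setex calls above are often combined into a "cache-aside" helper: return the cached value if present, otherwise compute it and store it with an expiration time. The sketch below is one possible shape for such a helper. `get_or_compute`, `load_name`, and `InMemoryClient` are hypothetical names, and `InMemoryClient` is only a stand-in that mimics the two methods used, so that the example runs without a Redis server.

```python
def get_or_compute(client, key, ttl_seconds, compute):
    """Return the cached value for key, computing and caching it on a miss."""
    cached = client.get(key)
    if cached is not None:
        # With a real redis.Redis client this is bytes unless
        # decode_responses=True was passed to the constructor.
        return cached
    value = compute()
    client.setex(key, ttl_seconds, value)
    return value


class InMemoryClient:
    """Stand-in exposing only get/setex; it ignores the expiration time."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def setex(self, key, ttl_seconds, value):
        self._data[key] = value


calls = []


def load_name():
    # Pretend this is an expensive lookup; record each invocation.
    calls.append(1)
    return "John"


client = InMemoryClient()
first = get_or_compute(client, "name", 10, load_name)
second = get_or_compute(client, "name", 10, load_name)
print(first, second, len(calls))
```

Against a real server, the same function should work with `redis.Redis(host='localhost', port=6379, db=0, decode_responses=True)` in place of the stand-in, since it only relies on `get` and `setex`.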
MongoDB#
MongoDB is a popular open-source document database known for its flexibility and scalability. It supports dynamic schemas, which makes it easy to store dynamic and semi-structured data. MongoDB also supports horizontal scaling, so a deployment can grow to handle large amounts of data.
Basic concepts of MongoDB#
In MongoDB, a database consists of multiple collections. Each collection consists of multiple documents, and each document is a JSON-like object (stored internally in a binary format called BSON). Unlike traditional relational databases, MongoDB does not enforce a fixed schema by default, so it can easily handle dynamic and semi-structured data. This flexibility is one reason MongoDB is widely used in web and mobile applications.
Operating MongoDB in Python programs#
You can use the pymongo library to operate MongoDB in Python programs. First, you need to install the pymongo library:
pip3 install pymongo
Then, you can use the following code to connect to the MongoDB server and operate the database:
from pymongo import MongoClient
# Connect to the MongoDB server
client = MongoClient('mongodb://127.0.0.1:27017')
# Get the database object
db = client.school
# Get the collection object
coll = db.students
# Insert a document
coll.insert_one({'stuid': 1001, 'name': '骆昊', 'gender': True})
# Query documents
for student in coll.find({'gender': True}):
    print('Student ID:', student['stuid'])
    print('Name:', student['name'])
    print('Gender:', 'Male' if student['gender'] else 'Female')
Query syntax of MongoDB#
The query syntax of MongoDB is different from SQL. Here are some common MongoDB query operations:
SQL | MongoDB | Explanation (SQL/MongoDB) |
---|---|---|
SELECT * FROM table | db.collection.find() | Select all documents from a collection |
SELECT * FROM table WHERE condition | db.collection.find({field: value}) | Select documents that meet the condition from a collection |
SELECT * FROM table WHERE condition ORDER BY field | db.collection.find({field: value}).sort({field: 1}) | Sort matching documents by a specified field (1 for ascending, -1 for descending) |
SELECT * FROM table WHERE condition LIMIT n | db.collection.find({field: value}).limit(n) | Limit the number of returned documents |
SELECT DISTINCT field FROM table | db.collection.distinct(field) | Return a list of unique values for a specified field |
SELECT COUNT(*) FROM table WHERE condition | db.collection.countDocuments({field: value}) | Count the number of documents that meet the condition |
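
The table uses MongoDB shell syntax. From Python, the pymongo equivalents differ slightly: `sort` takes a field name and a direction rather than a document, and counting uses `count_documents`. The helper below is a sketch that runs each of the table's queries against a pymongo-style collection object. The function name `query_examples` and its parameters are assumptions made for illustration.

```python
def query_examples(coll, condition, field, n):
    """Run the queries from the table against a pymongo-style collection.

    coll is expected to behave like pymongo.collection.Collection;
    condition is a filter document such as {'gender': True}.
    """
    return {
        # SELECT * FROM table
        "all": list(coll.find()),
        # SELECT * FROM table WHERE condition
        "matching": list(coll.find(condition)),
        # ... ORDER BY field (1 = ascending, -1 = descending)
        "sorted": list(coll.find(condition).sort(field, 1)),
        # ... LIMIT n
        "limited": list(coll.find(condition).limit(n)),
        # SELECT DISTINCT field FROM table
        "distinct": coll.distinct(field),
        # SELECT COUNT(*) FROM table WHERE condition
        "count": coll.count_documents(condition),
    }
```

With a running server, this could be called as `query_examples(db.students, {'gender': True}, 'stuid', 5)` using the `db` object from the earlier connection example.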
Aggregation operations of MongoDB#
MongoDB also supports aggregation operations for processing and analyzing data collections. Here are some common aggregation operations:
- $match operation: Used to filter documents in a data collection that meet certain conditions.
- $group operation: Used to group and calculate data collections.
- $sort operation: Used to sort by a specified field.
- $limit operation: Used to limit the number of returned documents.
- $skip operation: Used to skip a specified number of documents.
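These stages are chained into a pipeline, which in Python is simply a list of dictionaries. The sketch below builds a pipeline that counts students per gender for student IDs at or above a threshold. The field names follow the earlier students example, and the function name `build_gender_count_pipeline` and its parameters are assumptions made for illustration.

```python
def build_gender_count_pipeline(min_stuid, skip, limit):
    """Build an aggregation pipeline as a plain list of stage documents."""
    return [
        # Keep only documents whose stuid is at least min_stuid.
        {"$match": {"stuid": {"$gte": min_stuid}}},
        # Group by gender and count the documents in each group.
        {"$group": {"_id": "$gender", "count": {"$sum": 1}}},
        # Largest groups first; _id breaks ties for a stable order.
        {"$sort": {"count": -1, "_id": 1}},
        # Paging: skip some groups, then cap the number returned.
        {"$skip": skip},
        {"$limit": limit},
    ]


pipeline = build_gender_count_pipeline(min_stuid=1000, skip=0, limit=10)
for stage in pipeline:
    print(stage)
```

With pymongo, a pipeline like this would be passed to `coll.aggregate(pipeline)`, which returns a cursor over the resulting group documents.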
Performance optimization of MongoDB#
Performance optimization of MongoDB includes the following aspects:
- Determine appropriate indexes: Indexes can improve query performance, but too many indexes can reduce write performance.
- Distributed deployment: MongoDB supports horizontal scaling through sharding, so read and write load can be spread across multiple nodes.
- Avoid full collection scans: A query that cannot use an index has to examine every document in the collection (the equivalent of a full table scan), which is inefficient and should be avoided as much as possible.
- Choose appropriate data types: Choosing appropriate data types can reduce data size and improve performance.
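One way to check whether a query avoids a full scan is to inspect its explain output, where a collection scan typically appears as a `COLLSCAN` stage and an index scan as `IXSCAN`. The sketch below walks a plan and reports whether it contains a collection scan. The two sample plans are simplified, hand-written illustrations rather than real server output, and the helper names are hypothetical. The exact explain format varies between MongoDB versions.

```python
def plan_stages(plan):
    """Collect every "stage" name found anywhere in a nested plan."""
    stages = []
    if isinstance(plan, dict):
        if "stage" in plan:
            stages.append(plan["stage"])
        for value in plan.values():
            stages.extend(plan_stages(value))
    elif isinstance(plan, list):
        for item in plan:
            stages.extend(plan_stages(item))
    return stages


def uses_collection_scan(explain_output):
    """Return True if the winning plan contains a COLLSCAN stage."""
    winning = explain_output.get("queryPlanner", {}).get("winningPlan", {})
    return "COLLSCAN" in plan_stages(winning)


# Simplified, hand-written samples (assumed shapes, not real output).
without_index = {
    "queryPlanner": {
        "winningPlan": {"stage": "COLLSCAN", "filter": {"stuid": {"$eq": 1001}}}
    }
}
with_index = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {"stage": "IXSCAN", "keyPattern": {"stuid": 1}},
        }
    }
}

print(uses_collection_scan(without_index))
print(uses_collection_scan(with_index))
```

With pymongo, an index on the field could be created with `coll.create_index([('stuid', 1)])`, and the plan for a query obtained with `coll.find({'stuid': 1001}).explain()`.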