Merikanto

一簫一劍平生意,負盡狂名十五年

Database Resources & Knowledge

Some scattered resources and knowledge regarding databases.

Reference:

Databse System


ORM

Python: the Django ORM or SQLAlchemy

ORM can prevent SQL injection.

ORM (Object-Relational-Mapping) is a technique that lets you query and manipulate data from a database using an object-oriented paradigm.

  • The Object part is the one you use with your programming language (e.g. Python)

  • The Relational part is a Relational Database Manager System. There are other types of databases but the most popular is relational (e.g. MySQL)

  • The Mapping part is where you do a bridge between your objects and your tables.

An ORM library is a completely ordinary library written in your language of choice that encapsulates the code needed to manipulate the data, so you don’t use SQL anymore; you interact directly with an object in the same language you’re using.

Using an ORM framework would allow you to map that object with a database record automatically. For instance, retrieve a list of books from author name “Linus”:

1
book_list = BookTable.query(author="Linus");

The mechanical part is taken care of automatically via the ORM library.


Pros and Cons:

  • ORM saves a lot of time:

    • DRY: You write your data model in only one place, and it’s easier to update, maintain, and reuse the code.
    • A lot of stuff is done automatically, from database handling to I18N.
    • It forces you to write MVC code, which, in the end, makes your code a little cleaner.
    • You don’t have to write poorly-formed SQL (most Web programmers really suck at it, because SQL is treated like a “sub” language, when in reality it’s a very powerful and complex one).
    • Sanitizing; using prepared statements or transactions are as easy as calling a method.

  • ORM library is more flexible:

    • It fits in your natural way of coding (it’s your language!).
    • It abstracts the DB system, so you can change it whenever you want.
    • The model is weakly bound to the rest of the application, so you can change it or use it anywhere else.
    • It lets you use OOP goodness like data inheritance without a headache.

  • Disadvantages of ORM:

    • You have to learn it, and ORM libraries are not lightweight tools;
    • You have to set it up. Same problem.
    • Performance is OK for usual queries, but a SQL master will always do better with his own SQL for big projects.
    • It abstracts the DB. While it’s OK if you know what’s happening behind the scene, it’s a trap for new programmers that can write very greedy statements, like a heavy hit in a for loop.

NoSQL & My SQL

General points about NoSQL:

  • NoSQL is typically good for unstructured/“schemaless” data - usually, you don’t need to explicitly define your schema up front and can just include new fields without any ceremony

  • NoSQL typically favours a denormalised schema due to no support for JOINs per the RDBMS world. So you would usually have a flattened, denormalized representation of your data.

  • Using NoSQL doesn’t mean you could lose data. Different DBs have different strategies. e.g. MongoDB - you can essentially choose what level to trade off performance vs potential for data loss - best performance = greater scope for data loss.

  • It’s often very easy to scale out NoSQL solutions. Adding more nodes to replicate data to is one way to a) offer more scalability and b) offer more protection against data loss if one node goes down. But again, depends on the NoSQL DB/configuration. NoSQL does not necessarily mean “data loss” like you infer.

  • Complex/dynamic queries/reporting are best served from an Relational Database. Often the query functionality for a NoSQL DB is limited (NoSQL DBs often lack the ability to perform atomic operations across multiple “tables”).

  • It doesn’t have to be a 1 or the other choice. My experience has been using RDBMS in conjunction with NoSQL for certain use cases.


NoSQL is more suitable for:

  • Easy to scale by just adding more nodes.

  • Query on large data set

    Imagine tons of tweets posted on twitter every day. In RDMS, there could be tables with billions of rows, and you don’t want to do query on those tables directly, not even mentioning, most of time, table joins are also needed for complex queries (MySQL breaks when it gets to billions).

  • Disk I/O bottleneck

    If a website needs to send results to different users based on users’ real-time info, we are probably talking about tens or hundreds of thousands of SQL read/write requests per second. Then disk i/o will be a serious bottleneck.


PostgreSQL

Using explain

Query Planning