Design Ideas for Implementing Flashback in MySQL (MySQL Flashback Feature)

September 9th, 2012 | Filed under Uncategorized

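This article is published under a CC license. It may be freely reposted, provided that the original source, the author information, and this copyright notice are indicated with a hyperlink.
URL: http://www.penglixun.com/database/mysql_flashback_feature.html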

Anyone who has used Oracle knows that it has a Flash Recovery Area, to which changed blocks can be written. When a data operation goes wrong and has to be undone, the DBA can overwrite the affected blocks with the copies stored in the flashback area, or rebuild the rollback segments, to restore the database to the required consistent point.

MySQL/InnoDB does not provide this capability yet. Much of InnoDB's design was modeled on Oracle, however, so I believe InnoDB can implement Flashback as well.

My first idea was to imitate Oracle and flash back through the undo log: mark COMMITTED transactions as UNCOMMITTED so that InnoDB treats them as never committed and rolls them back.
The plan was as follows:

1. Add an InnoDB_Flashback_Trx_ID option to my.cnf. It identifies the trx_id whose consistent state InnoDB should roll back to.

2. When InnoDB starts up and reads the undo segments to build the list of transactions to roll back, mark every transaction with trx_id > InnoDB_Flashback_Trx_ID as UNCOMMITTED.

3. InnoDB then treats these committed transactions as uncommitted and rebuilds them as uncommitted transactions. Using its own recovery mechanism, it rolls them back before the database is opened.
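The three steps above can be modeled in a few lines of Python. This is only a toy sketch of the idea, not InnoDB code: the `Trx` class, the state strings, and both function names are hypothetical, and rolling back newest-first is an assumption of the model.

```python
from dataclasses import dataclass

COMMITTED = "COMMITTED"
UNCOMMITTED = "UNCOMMITTED"


@dataclass
class Trx:
    """Hypothetical stand-in for a transaction rebuilt from the undo segments."""
    trx_id: int
    state: str


def mark_for_flashback(trxs, flashback_trx_id):
    """Step 2: treat every transaction newer than the flashback point as uncommitted."""
    marked = []
    for t in trxs:
        state = UNCOMMITTED if t.trx_id > flashback_trx_id else t.state
        marked.append(Trx(t.trx_id, state))
    return marked


def trxs_to_roll_back(trxs):
    """Step 3: ids of the transactions startup recovery would roll back.

    Newest first is an assumption of this toy model.
    """
    return sorted((t.trx_id for t in trxs if t.state == UNCOMMITTED), reverse=True)


# Transactions as found at startup; 103 really was uncommitted at shutdown.
recovered = [
    Trx(100, COMMITTED),
    Trx(101, COMMITTED),
    Trx(102, COMMITTED),
    Trx(103, UNCOMMITTED),
]
marked = mark_for_flashback(recovered, flashback_trx_id=101)
rollback_order = trxs_to_roll_back(marked)
```

With InnoDB_Flashback_Trx_ID set to 101, transaction 102 is rolled back together with the genuinely uncommitted 103, while 100 and 101 are left alone.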

This approach has obvious drawbacks. First, it only works for InnoDB. Second, a flashback requires restarting MySQL. And when I actually coded and tested it, I found that if a DDL statement had been executed and I then flashed back to a TRX_ID from before the DDL, InnoDB crashed and could not be started again. The data files were presumably corrupted, because InnoDB's undo is a logical record rather than a physical one.

That led me to a second approach: use the binlog. A ROW-format binlog records the complete information for every modified row. An INSERT contains the value of every column, so does a DELETE, and an UPDATE contains all column values in both its SET and WHERE parts. The binlog is therefore a complete logical redo log, and inverting its operations yields exactly the "undo" that is needed.
The plan is as follows:

1. Modify the output of Row_log_event's print so that the event type is reversed: WRITE_ROWS_EVENT becomes DELETE_ROWS_EVENT, and DELETE_ROWS_EVENT becomes WRITE_ROWS_EVENT. This only requires changing a single byte, the event-type byte ptr[4].

2. For UPDATE_ROWS_EVENT, the SET and WHERE parts have to be swapped. This is the only slightly troublesome part, and I added an exchange_update_rows function to handle it. It uses print_verbose_one_row to work out the lengths of the SET and WHERE parts, derives the split point between them, and then swaps the two parts with memcpy.

3. Once the events are inverted, their output order has to be reversed as well. To do that I intercept the output in memory: I modified the Write_on_release_cache class and added a buffer (buff) to Log_event into which the event's print result is written. mysqlbinlog can then capture the output of each event and keep it in memory.

4. In mysqlbinlog I store the output of every event in a dynamic array (DYNAMIC_ARRAY) and then emit the events from the last one back to the first. The result is a file of inverse operations. Importing that file into the target database completes the flashback.
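The logic of these four steps can be sketched in Python at the level of row images rather than raw event bytes. This is not the patch itself: `RowEvent`, `invert`, `flashback`, and `apply` are hypothetical names, rows are plain dicts keyed by an assumed `id` column, and the in-memory `table` stands in for the target database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RowEvent:
    kind: str                      # "WRITE", "DELETE" or "UPDATE"
    before: Optional[dict] = None  # row image matched by the WHERE part
    after: Optional[dict] = None   # row image written by INSERT / the SET part


def invert(ev):
    """Steps 1 and 2: flip the event type, or swap the two images of an UPDATE."""
    if ev.kind == "WRITE":
        return RowEvent("DELETE", before=ev.after)
    if ev.kind == "DELETE":
        return RowEvent("WRITE", after=ev.before)
    if ev.kind == "UPDATE":
        return RowEvent("UPDATE", before=ev.after, after=ev.before)
    raise ValueError(f"unsupported event kind: {ev.kind}")


def flashback(events):
    """Steps 3 and 4: buffer every inverted event, then emit them last to first."""
    return [invert(ev) for ev in reversed(events)]


def apply(table, events):
    """Replay row events against a toy table (a dict keyed by row['id'])."""
    for ev in events:
        if ev.kind == "WRITE":
            table[ev.after["id"]] = dict(ev.after)
            continue
        # DELETE and UPDATE must find exactly the row image they expect.
        if table.get(ev.before["id"]) != ev.before:
            raise LookupError(f"row image not found: {ev.before}")
        del table[ev.before["id"]]
        if ev.kind == "UPDATE":
            table[ev.after["id"]] = dict(ev.after)
    return table


original = {1: {"id": 1, "name": "a"}}
log = [
    RowEvent("WRITE", after={"id": 2, "name": "b"}),
    RowEvent("UPDATE", before={"id": 1, "name": "a"}, after={"id": 1, "name": "z"}),
    RowEvent("DELETE", before={"id": 2, "name": "b"}),
]

changed = apply({k: dict(v) for k, v in original.items()}, log)
restored = apply(changed, flashback(log))
```

Replaying `flashback(log)` on the changed table brings it back to `original`. Reversing the order matters: the inverse of the last DELETE has to run first, otherwise the inverse of the first INSERT would try to delete a row that does not exist yet.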

The advantage of this approach is clear: it works for every storage engine, because the binlog belongs to the Server layer. In addition, the filters mysqlbinlog already has (start-position, start-datetime, and so on) can be used to select just part of the log as the rollback log, so you can flexibly flash back a particular range of operations, the operations on one database, the operations within one time window, and so on.
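How filtering combines with reversal can be sketched as follows. The `LoggedEvent` class and `select_for_flashback` function are hypothetical, and the half-open range (start included, stop excluded) is an assumption of this sketch rather than a statement about how mysqlbinlog treats its position options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoggedEvent:
    position: int   # offset of the event in the binlog file
    database: str   # schema the event belongs to
    undo: str       # already-inverted output for this event


def select_for_flashback(events, start_position=0, stop_position=None, database=None):
    """Keep only the events that pass the filters, then reverse their order."""
    chosen = []
    for ev in events:
        if ev.position < start_position:
            continue
        if stop_position is not None and ev.position >= stop_position:
            continue
        if database is not None and ev.database != database:
            continue
        chosen.append(ev)
    chosen.reverse()  # flashback output goes from the last event back to the first
    return chosen


log = [
    LoggedEvent(100, "test", "undo of event at 100"),
    LoggedEvent(200, "other", "undo of event at 200"),
    LoggedEvent(300, "test", "undo of event at 300"),
    LoggedEvent(400, "test", "undo of event at 400"),
]
selected = select_for_flashback(log, start_position=100, stop_position=400, database="test")
positions = [ev.position for ev in selected]
```

Only the events at positions 300 and 100 survive the filters, in that order, so the rest of the log is left untouched by the flashback.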

The patch is available here: http://mysql.taobao.org/index.php/Patch_source_code#Add_flashback_feature_for_mysqlbinlog

  1. zhh001
    October 16th, 2012 20:28

    Hahaha... finally, someone whose English is worse than mine.

  2. 刀尖红叶
    November 30th, 2013 11:25

    I hope this patch gets accepted upstream by MySQL or MariaDB!

  3. satiini
    April 1st, 2014 12:11

    Flashback works for INSERT and DELETE operations, but it has no effect at all for UPDATE operations. What could be the reason?

  4. earl86
    September 28th, 2014 17:41

    I got an error today when rolling back a MariaDB 10.0.12 binlog. I hope this can be fixed.
    ./mysqlbinlog -B -v --base64-output=decode-rows --start-position=156811370 --stop-position=161648150 --database=test mysql-bin.001234 > t4.sql
    ERROR: Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 40, event_type: -94
    Segmentation fault (core dumped)

  5. xuanye
    April 23rd, 2015 11:29

    When will you be able to release a version for 5.6?

  6. 友哥
    March 9th, 2016 19:07

    @satiini
    Take a look here. It is based on version 5.6 and supports UPDATE operations.

  7. 友哥
    March 9th, 2016 19:08

    Take a look here. It is based on version 5.6 and supports UPDATE operations. @satiini @xuanye
    http://www.cnblogs.com/youge-OneSQL/p/5249736.html

  8. July 8th, 2016 12:06

    @友哥 UPDATE does not seem to work.