MySQL下实现闪回的设计思路 (MySQL Flashback Feature)

9 月 9th, 2012 | Posted by | Filed under 未分类

用过Oracle数据库的同学都知道,Oracle有一个Flash Recovery Area,可以把变更的块写入这块区域,当数据操作错误,需要恢复的时候,可以利用闪回空间中存储的数据块覆盖回去,也可以重构回滚段,恢复到需要的一致点。
As we know, There has a Flash Recovery Area in Oracle DB, Which allows the modified blocks been written into. So that, if there’s any incorrect deletion of data, and need to recover, DBA can use the data blocks which were stored in the Flash Recovery Area ,or reconstructed rollback segments, to restore the data to the consistent point.

而MySQL/InnoDB暂时没有提供这些功能,但是InnoDB很多设计都参考了Oracle,因此我觉得InnoDB也可以实现Flashback功能。
MySQL / InnoDB haven’t performed this great and useful function before I worked on it , though many designs of InnoDB are referred to Oracle. In this case, I think InnoDB should implement Flashback as well.

最开始我是想仿照Oracle,利用undo log来闪回,通过把COMMITTED的TRX标记为UNCOMMITTED,让InnoDB认为已经提交的事务没有提交,从而进行回滚。
具体方案是这样:
At first, I want to implement this feature, Oracle of reference. I can set COMMITTED transactions to UNCOMMITTED status during InnoDB starting with processing undo log. Then InnoDB will regard these committed transactions as uncommitted one, and rollback it.
Here are the details:

1. 在my.cnf中配置一个InnoDB_Flashback_Trx_ID的参数,标识回滚到这个trx_id的一致状态。
1. Add an option on my.cnf named InnoDB_Flashback_Trx_ID. It mean InnoDB need rollback to this trx snapshot.

2. 在InnoDB启动读取回滚段构造回滚事务时,凡是比InnoDB_Flashback_Trx_ID大的事务,都标记为UNCOMMITTED。
2. When InnoDB starting, and reading undo segments, I will set all transactions that trx_id > InnoDB_Flashback_Trx_ID to UNCOMMITTED.

3. InnoDB会把这些提交的事务认为没有提交,进而构造未提交事务,利用InnoDB自己的机制,将会在打开数据库前回滚这些事务。
3. InnoDB will consider these committed transactions are uncommitted, so construction the trx, and after construction all uncommitted transactions, InnoDB will rollback these transactions.

但这个方案有明显的弊端,首先只能适用于InnoDB,然后闪回操作需要重启,并且在实际编码实现这个方案的测试中发现,如果发生了DDL,再做一次闪回到DDL之前的TRX_ID,那么InnoDB会崩溃,并且无法再启动,应该是数据文件已经损坏,因为InnoDB的undo是逻辑记录,而非物理记录。
But this way have an Obvious disadvantages, it can only used by InnoDB. And flashback need restart MySQL. In the actual coding I found that if InnoDB did DDL, and I will rollback to the TRX_ID before DDL, InnoDB will crash, and can’t start again. I think the datafiles is corrupted, because InnoDB undo is logical records, not physical records.

因此想到了第二个方案,就是利用binlog,因为如果是ROW格式的binlog,其中记录了每个ROW的完整信息,INSERT会包含每个字段的值,DELETE也会包含每个字段的值,UPDATE会在SET和WHERE部分包含所有的字段值。因此binlog就是个完整的逻辑redo,把它的操作逆过来,就是需要的“undo”。
具体方案是这样:
So I think another way that use binlog. Because the ROW format binlog will record whole information about modified rows. INSERT/DELETE will contain all columns’ values. UPDATE will contain all columns’ on SET/WHERE part. So binlog like a whole logical redo log, reversed them can get the “undo” I need. Detail:

1. 修改Row_log_event的print的结果,将Event_type逆转:WRITE_ROWS_EVENT转为DELETE_ROWS_EVENT / DELETE_ROWS_EVENT转为WRITE_ROWS_EVENT,这只要改一个标记位即可,就是第4个字节ptr[4]。
1. Modifying the result of Row_log_event::print that reversed Event_type: Modifying WRITE_ROWS_EVENT to DELETE_ROWS_EVENT / DELETE_ROWS_EVENT to WRITE_ROWS_EVENT, this change need only modify a byte, that’s ptr[4].

2. 对于UPDATE_ROWS_EVENT,需要对调SET和WHERE部分,这是唯一相对有点麻烦的地方,我增加了个exchange_update_rows函数来完成。主要是利用print_verbose_one_row函数来解析出SET和WHERE部分的长度,以此来推断SET和WHERE的分割点,然后用memcpy交换。
2. With UPDATE_ROWS_EVENT, it need swap SET/WHERE parts. This is the only place has little trouble, I added an exchange_update_rows() function to do it. It will use print_verbose_one_row() to parse the length of SET/WHERE parts, so I can get the cut-point of SET/WHERE parts, and then swap it with memcpy().

3. 得到了逆转后的Event,就需要逆转输出。因此我在内存中拦截输出,我修改了Write_on_release_cache类,并且在Log_event中增加了一个buff,可以把Event的print结果打印在buff中,因此mysqlbinlog可以得到每个event的输出,并且存在内存中。
3. After get the reversed Event, it need reverse the sequence of Events. So I intercepted event output in memory by modifying Write_on_release_cache class, and I added a buff member on Log_event to save the print output. So mysqlbinlog can get all events’ output, and store in memory.

4. mysqlbinlog中我用动态数组存下所有的event输出,然后就从末尾向前逆向输出所有的事件,这样就可以获得闪回的逆操作文件,把这个文件导入目标库既可以完成闪回。
4. I used DYNAMIC_ARRAY to cache all events’ output in mysqlbinlog. and then I print the events’ output from end to begin, so I get the flashback file. You can import this file to MYSQL, data can flashback.

这个方案的好处很明显,通用于所有的存储引擎,因为binlog是Server层的。另外可以利用mysqlbinlog已有的各种filter来筛选部分日志输出为回滚日志,这样可以灵活选择闪回某一段操作,闪回某一个库的操作,某一个时间段的操作等等。
The advantage of this way is that all store engines can use it, because binlog is the log of Server. And then, mysqlbinlog have many filters, such as start-position/start-datatime and so on.

补丁可以看这里(Patch here):http://mysql.taobao.org/index.php/Patch_source_code#Add_flashback_feature_for_mysqlbinlog

百度AStar2008的一道题:成语纠错

6 月 17th, 2012 | Posted by | Filed under 程序设计

有个小盆友正好问我这个问题,当年这个题在Astar我是满分pass了,贴出来参考下,无技术含量。

问题背景
成语是中华民族的文化瑰宝,作为历史的缩影、智慧的结晶、汉语言的精华,闪烁着睿智的光芒。
你的任务是给一个错误的四字成语进行纠错,找到它的正确写法。具体来说,你只允许修改四个汉字中的其中一个,使得修改后的成语在给定的成语列表中出现。原先的错误成语保证不在成语列表中出现。

有时,这样的“纠错”结果并不惟一。例如“一糯千金”可以改为“一字千金”也可以改成“一诺千金”。但由于“糯”和“诺”是同音字,“一糯千金”实为“一诺千金”的可能性比较大。
因此,我们还将提供一个汉字分类表,要求修改前后的两个字必须属于同一个分类。
在这样的限制下,我们保证成语纠错的结果惟一。
注意
1、汉字均采用GBK编码(参见FAQ)
2、每个汉字分类至少包含两个汉字,同一个汉字可能出现在多个类别中。
3、成语列表中的成语都是真实存在的四字成语。成语列表和待纠错成语中的所有汉字均在汉字分类表中的至少一个分类中出现。
输入格式
输入第一行包含两个整数n, m(1<=n<=200, 1<=m<=20000)。n表示汉字类别的个数,m表示成语的个数。 以下n行每行用一个无空白分隔符(空格、TAB)的汉字串表示一个分类中的所有汉字。注意,该汉字串最多可能包含200个汉字。 以下m行为成语列表,每行一个成语,恰好四个汉字。 最后一行为待纠错的成语,恰好四个汉字,且不在成语列表中出现。 输出格式 仅一行,为一个四字成语。在“修改必须在同一分类中进行”的限制下,输入数据保证纠错结果惟一。 样例输入 7 3 糯诺挪喏懦 字自子紫籽 前钱千牵浅 进近今仅紧金斤尽劲 完万 水睡税 山闪衫善扇杉 一诺千金 一字千金 万水千山 一糯千金 样例输出 一诺千金

#include 
#include 
#include 
using namespace std;

int hashkey(char ch1,char ch2)
{
    return ((unsigned char)ch1-129)*190 + ((unsigned char)ch2-64) – (unsigned char)ch2/128;
}

int checksame(string str1,string str2,string &strn,string &strm)
{
    int count = 0;
    for(int i = 0; i < 4; ++i)
        if((str1[i*2]==str2[i*2])&&(str1[i*2+1]==str2[i*2+1]))
            ++count;
            else{
                strn.resize(2);
                strm.resize(2);
                strn[0] = str1[i*2];
                strn[1] = str1[i*2+1];
                strm[0] = str2[i*2];
                strm[1] = str2[i*2+1];
            }
    //cout << "Check:" << str1 << "," << str2
    //<< ":" << count
    //<< "--" << strn[0] << strn[1]
    //<< "|" << strm[0] << strm[1] < > hashn;
    vector< vector < vector > > hashm;
    string str,tmp1,tmp2;
    string kind[200],word[20000];

    cin >> n >> m;

    hashn.resize(25000);
    for(int i = 0; i < 25000; ++i)
        hashn[i].resize(1,0);
    for(int i = 0; i < n; ++i){
        cin >> kind[i];
        //cout << kind[i].size() << endl;
        for(int j = 0; j < (kind[i].size()/2); ++j){
            GBKindex = hashkey(kind[i][j*2],kind[i][j*2+1]);
            count = ++hashn[GBKindex][0];
            hashn[GBKindex].resize(count+1);
            hashn[GBKindex][count] = i;
            //cout << GBKindex << ":" << hashn[GBKindex][0] << ":"
            //<< i << ":" <> word[i];
        for(int j = 0; j < 4; ++j){
            GBKindex = hashkey(word[i][j*2],word[i][j*2+1]);
            count = ++hashm[j][GBKindex][0];
            hashm[j][GBKindex].resize(count+1);
            hashm[j][GBKindex][count] = i;
            //cout << GBKindex << ":" << hashm[j][GBKindex][0] << ":"
            //<< i << ":" << word[i][j*2] << word[i][j*2+1] <> str;

    for(int i = 0; i < 4; ++i ){
        GBKindex = hashkey(str[i*2],str[i*2+1]);
        //cout << GBKindex <		
标签:

为MySQL增加线程内存监控 (MySQL Thread Memory Usage Monitor)

4 月 27th, 2012 | Posted by | Filed under 未分类

为了国际友人看得懂,以后我的博客都同时提供中英文版。:)
For foreign friends can understand, all of my blog at the same time in English in the future.

使用MySQL中我经常发现mysqld的内存使用会涨的很快(Buffer Pool是大页分配的),以至于使用SWAP,到底Server层用了多少内存,没有一个监控机制,所以第一步我编写了个patch(基于5.6.6)来监控每个线程用了多少内存,一旦mysqld进程使用太多内存,就去看哪些线程用的多,杀掉这些线程。
I often found mysqld process use memory will grow up very fast(InnoDB Buffer Pool used large page), lead to mysqld use SWAP. How many memory MySQL Server(Threads) used? no monitor now! So I write a patch based on MySQL 5.6.6 first, it can monitor how many memory used each threads. If I found mysqld process used too many memory, I can watch which threads used more memory, and kill them.

打上补丁后的效果像这样:
This is the effect after patched:
MySQL Thread Memory Usage

代码可以看patch
This is the patch:

  5.6_thread_mem_usage.patch (4.1 KiB, 3,722 hits)

基本方法就是在my_malloc和my_free中增加回调函数(@淘宝丁奇 提供的思路,太帅了),获取调用my_malloc和my_free函数的THD描述符,用THD中新加的malloc_size字段去记录申请和释放内存,其实my_realloc也应该去更新malloc_size,暂时还没加进去。
The method is add callback function on my_malloc/my_free(Xiaobin Lin give me this Callback idea) to get the THD which call my_malloc/my_free. And use a variable named “malloc_size” on THD to record how many memory malloc/free. In fact, my_realloc is also need calc malloc_size, but I have not add it on this version.

然后使用malloc_usable_size函数在free时判断指针申请了多少内存,在GCC 4.2以上可以使用malloc_size(pointor)去判断。
And then, I use malloc_usable_size function to get the size of pointor which will be free. After GCC 4.2, we can use malloc_size to get it.

下一步我会分类监控,把每个线程sort_buffer/join_buffer/net_buffer等线程级内存都分类统计出来占用多少,方便更直观的监控。
Next step, I will monitor the size of sort_buffer/join_buffer/net_buffer in each threads, not only total size each threads.

这是新版补丁,计算了my_realloc重分配的内存:
This is the new patch, calc my_realloc size:

  5.6_thread_mem_usage_ver2.patch (5.0 KiB, 3,082 hits)

(基于mysql-5.6.6)

  5.5_thread_mem_usage.patch (5.2 KiB, 2,932 hits)

(基于percona-5.5.22)

跳跃表的实现和测试

3 月 20th, 2012 | Posted by | Filed under 程序设计

LevelDB中一个核心的数据结构就是跳跃表,它是一个类似单向链表的结构但增加了多层指针进行跳跃,可以获得近似平衡树的效率,但是代码远远没有AVL等平衡二叉树实现复杂,所以尽管理论上跳跃表不是一个好算法,但是实现简单令他很多地方都很实用。
这面是一个跳跃表的结构。

Skip List

这是实现代码和测试代码,非常简单,相比平衡树那是简单了多了去了。
发现一些内存泄露和内存越界,补上fix。

阅读全文…

广度搜索的各种写法

3 月 19th, 2012 | Posted by | Filed under 程序设计

前些天被人问到BFS的递归怎么写,思维定式给想成DFS了,今天晚上正好有空练练手,多写代码,防止老年痴呆。
用递归,QUEUE,多线程都写了一遍。

阅读全文…

标签: