Setting the initial InnoDB datafile size when creating a table

December 3rd, 2012 | Filed under Uncategorized

When InnoDB is under a write-heavy workload, B-tree growth allocates new pages, so the datafiles have to be extended. InnoDB holds a mutex to protect the datafile while it is being extended, and this causes performance jitter. Xiaobin Lin (丁奇) describes the problem on his blog: http://dinglin.iteye.com/blog/1317874

The fix is simple: if we know roughly how large the datafile will grow, we can extend it in advance. Reading the source shows that the size InnoDB gives a new table's datafile is controlled by the constant FIL_IBD_FILE_INITIAL_SIZE, and that the datafile itself is created by fil_create_new_single_table_tablespace(). So to change the initial size, we only need to change the size argument passed to fil_create_new_single_table_tablespace(), which defaults to FIL_IBD_FILE_INITIAL_SIZE.

So I added a new CREATE TABLE option named DATAFILE_INITIAL_SIZE. For example:
CREATE TABLE test (

) ENGINE = InnoDB DATAFILE_INITIAL_SIZE=100000;
If the DATAFILE_INITIAL_SIZE value is smaller than FIL_IBD_FILE_INITIAL_SIZE, FIL_IBD_FILE_INITIAL_SIZE is still passed to fil_create_new_single_table_tablespace(); otherwise the DATAFILE_INITIAL_SIZE value is passed for initialization.
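To make the size selection concrete, here is a minimal Python sketch of the rule just described. It is an illustration, not InnoDB code. It assumes the option is a page count (as the size argument of fil_create_new_single_table_tablespace() is), that FIL_IBD_FILE_INITIAL_SIZE is 4 pages, and that the page size is the 16 KiB default. The function names are hypothetical.

```python
# Sketch of the size selection described above. Both constants are
# assumptions for illustration: 4 pages for FIL_IBD_FILE_INITIAL_SIZE and
# a 16 KiB page size; check them against your own source tree and config.
FIL_IBD_FILE_INITIAL_SIZE = 4   # pages (assumed value)
PAGE_SIZE_BYTES = 16 * 1024     # bytes per page (assumed default)


def effective_initial_pages(requested_pages: int) -> int:
    """Return the page count that would be passed to tablespace creation."""
    if requested_pages < FIL_IBD_FILE_INITIAL_SIZE:
        # Too small (or option not given): fall back to the built-in default.
        return FIL_IBD_FILE_INITIAL_SIZE
    return requested_pages


def initial_file_bytes(requested_pages: int) -> int:
    """Size in bytes of the freshly created datafile under the assumptions."""
    return effective_initial_pages(requested_pages) * PAGE_SIZE_BYTES


if __name__ == "__main__":
    for pages in (0, 4, 100000):
        print(pages, effective_initial_pages(pages), initial_file_bytes(pages))
```

Under these assumptions, DATAFILE_INITIAL_SIZE=100000 would preallocate 1,638,400,000 bytes, about 1.5 GiB, so the value is worth choosing deliberately.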

The result is this simple, safe patch. Follow http://bugs.mysql.com/bug.php?id=67792 for upstream progress:

Index: storage/innobase/dict/dict0crea.c
===================================================================
--- storage/innobase/dict/dict0crea.c	(revision 3063)
+++ storage/innobase/dict/dict0crea.c	(working copy)
@@ -294,7 +294,8 @@
 		error = fil_create_new_single_table_tablespace(
 			space, path_or_name, is_path,
 			flags == DICT_TF_COMPACT ? 0 : flags,
-			FIL_IBD_FILE_INITIAL_SIZE);
+			table->datafile_initial_size < FIL_IBD_FILE_INITIAL_SIZE ? 
+        FIL_IBD_FILE_INITIAL_SIZE : table->datafile_initial_size);
 		table->space = (unsigned int) space;
 
 		if (error != DB_SUCCESS) {
Index: storage/innobase/handler/ha_innodb.cc
===================================================================
--- storage/innobase/handler/ha_innodb.cc	(revision 3063)
+++ storage/innobase/handler/ha_innodb.cc	(working copy)
@@ -7155,6 +7155,7 @@
 			col_len);
 	}
 
+  table->datafile_initial_size= form->datafile_initial_size;
 	error = row_create_table_for_mysql(table, trx);
 
 	if (error == DB_DUPLICATE_KEY) {
@@ -7760,6 +7761,7 @@
 
 	row_mysql_lock_data_dictionary(trx);
 
+  form->datafile_initial_size= create_info->datafile_initial_size;
 	error = create_table_def(trx, form, norm_name,
 		create_info->options & HA_LEX_CREATE_TMP_TABLE ? name2 : NULL,
 		flags);
Index: storage/innobase/include/dict0mem.h
===================================================================
--- storage/innobase/include/dict0mem.h	(revision 3063)
+++ storage/innobase/include/dict0mem.h	(working copy)
@@ -678,6 +678,7 @@
 /** Value of dict_table_struct::magic_n */
 # define DICT_TABLE_MAGIC_N	76333786
 #endif /* UNIV_DEBUG */
+  uint datafile_initial_size; /* the initial size of the datafile */
 };
 
 #ifndef UNIV_NONINL
Index: support-files/mysql.5.5.18.spec
===================================================================
--- support-files/mysql.5.5.18.spec	(revision 3063)
+++ support-files/mysql.5.5.18.spec	(working copy)
@@ -244,7 +244,7 @@
 Version:        5.5.18
 Release:        %{release}%{?distro_releasetag:.%{distro_releasetag}}
 Distribution:   %{distro_description}
-License:        Copyright (c) 2000, 2011, %{mysql_vendor}. All rights reserved. Under %{license_type} license as shown in the Description field.
+License:        Copyright (c) 2000, 2012, %{mysql_vendor}. All rights reserved. Under %{license_type} license as shown in the Description field.
 Source:         http://www.mysql.com/Downloads/MySQL-5.5/%{src_dir}.tar.gz
 URL:            http://www.mysql.com/
 Packager:       MySQL Release Engineering 
Index: sql/table.h
===================================================================
--- sql/table.h	(revision 3063)
+++ sql/table.h	(working copy)
@@ -596,6 +596,7 @@
   */
   key_map keys_in_use;
   key_map keys_for_keyread;
+  uint datafile_initial_size; /* the initial size of the datafile */
   ha_rows min_rows, max_rows;		/* create information */
   ulong   avg_row_length;		/* create information */
   ulong   version, mysql_version;
@@ -1094,6 +1095,8 @@
 #endif
   MDL_ticket *mdl_ticket;
 
+  uint datafile_initial_size;
+
   void init(THD *thd, TABLE_LIST *tl);
   bool fill_item_list(List<Item> *item_list) const;
   void reset_item_list(List<Item> *item_list) const;
Index: sql/sql_yacc.yy
===================================================================
--- sql/sql_yacc.yy	(revision 3063)
+++ sql/sql_yacc.yy	(working copy)
@@ -906,6 +906,7 @@
 %token  DATABASE
 %token  DATABASES
 %token  DATAFILE_SYM
+%token  DATAFILE_INITIAL_SIZE_SYM
 %token  DATA_SYM                      /* SQL-2003-N */
 %token  DATETIME
 %token  DATE_ADD_INTERVAL             /* MYSQL-FUNC */
@@ -5046,6 +5047,18 @@
             Lex->create_info.db_type= $3;
             Lex->create_info.used_fields|= HA_CREATE_USED_ENGINE;
           }
+        | DATAFILE_INITIAL_SIZE_SYM opt_equal ulonglong_num
+          {
+            if ($3 > UINT_MAX32)
+            {
+              Lex->create_info.datafile_initial_size= UINT_MAX32;
+            }
+            else
+            {
+              Lex->create_info.datafile_initial_size= $3;
+            }
+            Lex->create_info.used_fields|= HA_CREATE_USED_DATAFILE_INITIAL_SIZE;
+          }
         | MAX_ROWS opt_equal ulonglong_num
           {
             Lex->create_info.max_rows= $3;
@@ -12585,6 +12598,7 @@
         | CURSOR_NAME_SYM          {}
         | DATA_SYM                 {}
         | DATAFILE_SYM             {}
+        | DATAFILE_INITIAL_SIZE_SYM{}
         | DATETIME                 {}
         | DATE_SYM                 {}
         | DAY_SYM                  {}
Index: sql/handler.h
===================================================================
--- sql/handler.h	(revision 3063)
+++ sql/handler.h	(working copy)
@@ -387,6 +387,8 @@
 #define HA_CREATE_USED_TRANSACTIONAL    (1L << 20)
 /** Unused. Reserved for future versions. */
 #define HA_CREATE_USED_PAGE_CHECKSUM    (1L << 21)
+/** Used for InnoDB initial table size. */
+#define HA_CREATE_USED_DATAFILE_INITIAL_SIZE (1L << 22)
 
 typedef ulonglong my_xid; // this line is the same as in log_event.h
 #define MYSQL_XID_PREFIX "MySQLXid"
@@ -1053,6 +1055,7 @@
   LEX_STRING comment;
   const char *data_file_name, *index_file_name;
   const char *alias;
+  uint datafile_initial_size; /* the initial size of the datafile */
   ulonglong max_rows,min_rows;
   ulonglong auto_increment_value;
   ulong table_options;
Index: sql/lex.h
===================================================================
--- sql/lex.h	(revision 3063)
+++ sql/lex.h	(working copy)
@@ -153,6 +153,7 @@
   { "DATABASE",		SYM(DATABASE)},
   { "DATABASES",	SYM(DATABASES)},
   { "DATAFILE", 	SYM(DATAFILE_SYM)},
+  { "DATAFILE_INITIAL_SIZE",   SYM(DATAFILE_INITIAL_SIZE_SYM)},
   { "DATE",		SYM(DATE_SYM)},
   { "DATETIME",		SYM(DATETIME)},
   { "DAY",		SYM(DAY_SYM)},

MariaDB 10.x will include multi-source replication

October 17th, 2012 | Filed under Uncategorized

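Over the National Day holiday I worked with Monty to merge the multi-source replication feature I developed into the MariaDB trunk. It will appear in the 10.x releases.

Monty wrote a blog post introducing the multi-source replication patch: http://monty-says.blogspot.com/2012/10/multi-source-replication-for-mariadb-is.html

MariaDB 10.x has not been officially released yet, but you can already download the latest source tree and build it: https://code.launchpad.net/~maria-captains/maria/10.0-base

The known issue so far is that semi-synchronous replication (semi-sync) cannot be used once multi-source replication is enabled, and fixing this will probably take some time. If you do not use semi-sync and urgently need multi-source replication, you can use the code from the source tree directly, and you no longer need to apply my patch to MySQL and build it yourself. Multi-source replication is usually used to aggregate data for analysis, and MariaDB's optimizer is, needless to say, the strongest among the MySQL branches, so it is a good fit for OLAP.

See the documentation for usage details: https://kb.askmonty.org/en/multi-source-replication/

Notably, the merge adds SHOW ALL SLAVES STATUS, which shows the replication status of every channel, and START/STOP ALL SLAVES, which starts or stops all channels at once. The long-standing problem of not being able to skip an error on a specific channel is fixed as well. A new variable, set @@default_master_connection='connection_name', selects a channel, after which the single-channel Sql_slave_skip_counter works as usual.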
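The skip procedure can be sketched as a small Python helper that only builds the statement sequence and does not talk to a server. The helper name is hypothetical, and restricting connection names to letters, digits and underscores is a conservative choice of this sketch, not a MariaDB rule. It also assumes that STOP SLAVE and START SLAVE without a connection name act on the channel selected by @@default_master_connection.

```python
import re
from typing import List

# Conservative whitelist so the name can be embedded in a quoted literal.
# This restriction belongs to the sketch, not to MariaDB.
_NAME_RE = re.compile(r"[A-Za-z0-9_]+")


def skip_event_statements(connection_name: str, events: int = 1) -> List[str]:
    """Build the statements that skip `events` events on one channel."""
    if not _NAME_RE.fullmatch(connection_name):
        raise ValueError("unsupported connection name: %r" % connection_name)
    if events < 1:
        raise ValueError("events must be at least 1")
    return [
        # Select the channel that the following statements apply to.
        "SET @@default_master_connection='%s';" % connection_name,
        "STOP SLAVE;",
        "SET GLOBAL sql_slave_skip_counter=%d;" % events,
        "START SLAVE;",
    ]


if __name__ == "__main__":
    for statement in skip_event_statements("master1"):
        print(statement)
```

Review the generated statements before running them, since skipping events can leave a replica inconsistent with its master.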

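Thanks also to Monty for reviewing my patch, finding so many hidden problems, and giving me commit access. I hope to do more for open source and to make more improvements to MySQL.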

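SVN: Merging a branch into trunk

September 21st, 2012 | Filed under Programming

The original is here; I only translated it: http://www.sepcot.com/blog/2007/04/SVN-Merge-Branch-Trunk

This post was written as a note for my own reference, but writing it up may prove useful to others too.

Recently I was given more responsibilities at work, including branch management for part of our website. It took me a while to work out how to handle everything, and most of what I found online was not much help, so I am writing it up here.

We use SVN for version control, and the code lives on a server that is accessible over SSH.

Read more…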


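Does InnoDB always append the primary key to an index?

September 20th, 2012 | Filed under Uncategorized

A DBA chat group was discussing whether InnoDB appends the primary key to the end of a secondary index, and if so, when.

From reading the code earlier, I remembered the rule as follows: if the index already ends with the primary key, InnoDB does not append it again, and otherwise it does. But that does not match this test: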

CREATE TABLE t (
  a char(32) not null primary key,
  b char(32) not null,
  KEY idx1 (a,b),
  KEY idx2 (b,a)
) Engine=InnoDB;

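After inserting some data, the idx1 and idx2 indexes turn out to be the same size. This shows that their internal structures are the same, so idx1 cannot be stored internally as (a,b,a).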
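One rule consistent with this observation is that InnoDB appends only the primary-key columns that the index does not already contain, wherever they sit in the key. The Python sketch below models that rule on column names alone. It ignores column prefixes and is not a transcription of the InnoDB source, and the function name is hypothetical.

```python
from typing import List


def internal_index_columns(index_cols: List[str], pk_cols: List[str]) -> List[str]:
    """Return the columns a secondary index would hold internally.

    Only primary-key columns that are not already part of the index are
    appended, in primary-key order.
    """
    result = list(index_cols)
    for col in pk_cols:
        if col not in result:
            result.append(col)
    return result


if __name__ == "__main__":
    pk = ["a"]
    print(internal_index_columns(["a", "b"], pk))  # idx1 from the test table
    print(internal_index_columns(["b", "a"], pk))  # idx2 from the test table
    print(internal_index_columns(["b"], pk))       # an index without the PK
```

With this rule, idx1 and idx2 both keep exactly two columns, which matches their equal size, while an index on (b) alone would become (b,a).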

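With 登博's guidance I read dict0dict.cc:dict_index_build_internal_non_clust(), the function that builds the data-dictionary entry for a secondary index. Once that process is understood, the answer is clear, so next we walk through the function (based on a recent 5.6 trunk):

Read more…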

InnoDB multiple datafiles per single-table tablespace

September 12th, 2012 | Filed under Uncategorized

As we know, an Oracle tablespace can consist of many datafiles, so file I/O can be spread over multiple storage paths. MySQL servers are usually not configured with multiple storage paths, but many older filesystems, such as EXT3, handle I/O on large files poorly and perform badly. So keeping MySQL/InnoDB datafiles relatively small is beneficial too.

In shared tablespace mode InnoDB does support multiple datafiles, configured with innodb_data_file_path:

innodb_data_file_path = /disk1/ibdata1:2G;/disk2/ibdata2:2G:autoextend

This configuration spreads the datafiles across disk1 and disk2. The first file has a fixed size of 2GB. The second file starts at 2GB and auto-extends.
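How such a value breaks down can be shown with a small Python parser. This is a sketch under stated assumptions and not MySQL's own parser: it handles only the path:size[:autoextend] form shown above, it treats a bare number as bytes, and it ignores the optional :max: clause and paths that contain colons. All names in it are hypothetical.

```python
from typing import List, NamedTuple

_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


class DataFileSpec(NamedTuple):
    path: str
    size_bytes: int
    autoextend: bool


def parse_size(text: str) -> int:
    """Convert '2G', '512M', '64K' or a bare number (assumed bytes) to bytes."""
    unit = text[-1].upper()
    if unit in _UNITS:
        return int(text[:-1]) * _UNITS[unit]
    return int(text)


def parse_data_file_path(value: str) -> List[DataFileSpec]:
    """Split an innodb_data_file_path value into one spec per datafile."""
    specs = []
    for entry in value.split(";"):
        fields = entry.strip().split(":")
        path, size = fields[0], fields[1]
        autoextend = len(fields) > 2 and fields[2].lower() == "autoextend"
        specs.append(DataFileSpec(path, parse_size(size), autoextend))
    return specs


if __name__ == "__main__":
    setting = "/disk1/ibdata1:2G;/disk2/ibdata2:2G:autoextend"
    for spec in parse_data_file_path(setting):
        print(spec)
```

For the setting above this yields two specs of 2 GiB each, with only the second marked as auto-extending.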

But when innodb_file_per_table is enabled, every table gets its own tablespace file. innodb_data_file_path then applies only to the system tablespace, and a per-table tablespace cannot use multiple datafiles. Even with one file per table, some tables can still grow very large. We have tables of several hundred GB, for example. That is no problem on XFS, but some systems still use EXT3 for the sake of "safety", and performance on large files then becomes a concern.

Of course, datafiles can be kept small by splitting databases and tables or by partitioning. But most small companies have no middleware to do the splitting, big tables are everywhere, and the business changes so quickly that partitioning is not a good fit either. So adding multi-datafile support to per-table tablespaces is a good option.

How can multi-datafile support be added to InnoDB per-table tablespaces with as few changes as possible? After some research I found that in most places InnoDB does not treat per-table and shared tablespaces specially, and that both use the same tablespace descriptor, fil_space_t. Its fil_space_t->chain member records all the files that belong to the tablespace, each described by a fil_node_t.
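The idea of one tablespace backed by a chain of files can be illustrated with a toy Python model. FileNode stands in for fil_node_t and a plain list stands in for fil_space_t->chain. The field names, the locate() helper and the file names are all hypothetical, and the lookup simply walks the chain in order, which is an assumption about how a page number would map to a file.

```python
from typing import List, NamedTuple, Tuple


class FileNode(NamedTuple):
    """Toy stand-in for fil_node_t: one file of a tablespace."""
    name: str
    size_pages: int


def locate(chain: List[FileNode], page_no: int) -> Tuple[FileNode, int]:
    """Map a tablespace-wide page number to (file, page offset in that file)."""
    if page_no < 0:
        raise IndexError("page number must not be negative")
    remaining = page_no
    for node in chain:
        if remaining < node.size_pages:
            return node, remaining
        # The page lies beyond this file, so move on to the next one.
        remaining -= node.size_pages
    raise IndexError("page %d is beyond the end of the tablespace" % page_no)


if __name__ == "__main__":
    chain = [FileNode("t#1.ibd", 100), FileNode("t#2.ibd", 50)]
    print(locate(chain, 0))
    print(locate(chain, 120))
```

Because the lookup depends only on the chain, the same logic serves a shared tablespace and a per-table one, which is why reusing the existing descriptor looks attractive.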

Especially when I saw this comment:

  /* TODO: The following code must change when InnoDB supports
  multiple datafiles per tablespace. */ 

I think the InnoDB team already had future multi-file tablespace support in mind during development, which made me more confident that this can be implemented.

So I modified and tested code on the 5.6 source tree. I believe the following approach is practical, and I am reorganizing the code along these lines:

User interface:

CREATE TABLE gains two new options, DATAFILE_INITIAL_SIZE and DATAFILE_NUM, which give the initial size of each datafile and the number of datafiles.

CREATE TABLE table_name (...) ENGINE=InnoDB 
  DATAFILE_INITIAL_SIZE=1000000, DATAFILE_NUM=100;

This creates a table with 100 datafiles of 1000000 pages each. The files are named "table_name#num.ibd" and are all created in the default data directory. At most 255 files can be initialized this way, and each has a fixed size. To add more files afterwards, use the ALTER TABLESPACE command.
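A quick Python sketch shows what this statement implies on disk. It assumes the default 16 KiB page size and that file numbering starts at 1; the post gives only the "table_name#num.ibd" pattern, so the start index is a guess. The helper names are hypothetical.

```python
from typing import List

PAGE_SIZE_BYTES = 16 * 1024   # assumed default InnoDB page size
MAX_INITIAL_FILES = 255       # limit stated in the post


def planned_datafiles(table_name: str, datafile_num: int) -> List[str]:
    """List the datafile names CREATE TABLE would produce."""
    if not 1 <= datafile_num <= MAX_INITIAL_FILES:
        raise ValueError("DATAFILE_NUM must be between 1 and 255")
    # Numbering from 1 is an assumption of this sketch.
    return ["%s#%d.ibd" % (table_name, n) for n in range(1, datafile_num + 1)]


def total_initial_bytes(pages_per_file: int, datafile_num: int) -> int:
    """Total space preallocated across all datafiles, in bytes."""
    return pages_per_file * datafile_num * PAGE_SIZE_BYTES


if __name__ == "__main__":
    names = planned_datafiles("table_name", 100)
    print(names[0], names[-1], len(names))
    print(total_initial_bytes(1000000, 100))
```

Under these assumptions the example statement preallocates 1,638,400,000,000 bytes, roughly 1.5 TiB, so both values should be sized against the space actually available.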

ALTER TABLESPACE `db_name/table_name` 
  ADD DATAFILE '/diskN/table_name#256' 
  INITIAL_SIZE = 5000 AUTOEXTEND_SIZE=1000 ENGINE=InnoDB;

This statement adds a datafile to the table table_name in db_name. The file is located at "/diskN/table_name#256.ibd" (the .ibd suffix is added automatically), has an initial size of 5000 pages, and auto-extends by 1000 pages at a time.

Read more…