Best practices for MySQL performance optimization and MySQL use indexes

Pankaj kushwaha
16 min readJan 11, 2021

--

Enormously, database operations have become the performance bottleneck of the entire application today, which is particularly evident for Web applications. This isn’t just something that DBAs need to think about with regard to database efficiency, but it is something we programmers need to pay attention to. When designing the configuration of the database table, we need to pay attention to data operation efficiency when running the database (especially the SQL statement when looking up the table). We’re not going to talk about improving too many SQL statements here, but just about MySQL, the database that has the most web applications. Hope you find the following optimization strategies useful.

1. Optimize the Query Cache queries

Most MySQL servers have cache enabled for queries. This is one of the most productive ways to boost performance, and the database engine of MySQL handles it. These query results would be put in a cache as several of the same queries are executed several times, such that subsequent equivalent queries can directly access the cached results without the table working.

The key concern here is that this issue is easily ignored by programmers. Since some of our query statements won’t let the cache be used by MySQL. Please see the example below:

The difference between the two SQL statements above is CURDATE(). This feature does not operate with the MySQL query cache. Therefore, SQL functions such as NOW() and RAND() or other SQL functions, since the return of these functions is a variable, will not trigger query caching. So, all you need is to substitute the MySQL feature with a caching allow variable.

$r = mysql_query( "SELECT username FROM user WHERE signup_date >= CURDATE()" );// Turn on query cache$today = date ( "Ymd" );$r = mysql_query( "SELECT username FROM user WHERE signup_date >='$today'" );

2. EXPLAIN your SELECT query

To let you know how your SQL statement is treated by MySQL, use the EXPLAIN keyword. This will help you analyze your question or table structure’s output bottleneck.

The results of the EXPLAIN question will also tell you how your index primary key is used, how to scan and sort your data table… etc. etc. etc.

Choose your own SELECT statement (it is suggested to choose the most complicated one with several table joins) and add the Clarify keyword to the front. For this, you can use phpmyadmin. You’ll see a shape, then. We forgot to add the group id index in the following example, and there is a tab join:

When an index is applied to the group-id field:

We can see that 7883 rows were searched for in the previous result, while the latter only searched for two rows 9 and 16 of the table. Looking at the column rows helps one to identify possible problems with results.

3. Using Cap 1/Use LIMIT 1 when only one row of data is required

When only one row of data is needed. Sometimes, when you query a table, you know that only one outcome will occur, but because you may need to fetch the cursor, or you might be able to check the number of records returned.
Adding LIMIT 1 will improve output in this case. In this way, after discovering a piece of data, the MySQL database engine will stop looking, instead of trying to find the next piece of data matching the record.
The example below is only to find out if “Chinese” users exist. The latter would obviously be more successful than the former. (Please note that Select * is the first item and Select 1 is the second item)

$r = mysql_query( "SELECT * FROM user WHERE country ='China'" );
if (mysql_num_rows( $r )> 0) {
// ...
}

// efficient:
$r = mysql_query( "SELECT 1 FROM user WHERE country ='China' LIMIT 1" );
if (mysql_num_rows( $r )> 0) {
// ...
}

4. Create an index for the field of quest.

Indexes do not generally refer to primary keys or fields that are unique. If your table contains a field that you will sometimes use to search, then please create an index for it.

The search string “last name LIKE’a percent ‘“ can be seen from the image above. One is indexed and there is no indexing of the other. The standard is about 4 times worse.

Moreover, you may also need to know what kinds of searches are not capable of using usual indexes. For instance, the index can be meaningless if you need to search for a word in a broad article, such as: “WHERE post content LIKE’ percent apple percent ‘“ You may need to use the full-text MySQL database or create an index yourself (for example: search keywords or tags)

5. When joining tables, use a similar type of example and index them.

If there are a number of For JOIN queries in your application, you should make sure that the Join fields in both tables are indexed. MySQL can enable the mechanism in this way to customize Join’s SQL statement for you.

In addition, these fields used in Join should be of the same kind. If you want to enter the DECIMAL and INT fields, for example, MySQL cannot use their indexes. You need to use the same character set for certain STRING forms as well. (Perhaps the character sets of the two tables differ)

$r = mysql_query("SELECT company_name FROM usersLEFT JOIN companies ON (users.state = companies.state)WHERE users.id = $user_id ");// The two state fields should have been indexed, and they should be of the same type and the same character set.

6. Don’t RAND BY ORDER ()

Are you trying to interrupt the returned data rows? Choosing random data? I don’t know who invented this usage, but it’s used like this by many novices. But you just don’t understand the awful problem of doing so with results.

You have N methods to accomplish this goal if you just want to interrupt the returned data rows. Such usage just makes the output of your database drop exponentially. The problem here is: the RAND() function (which consumes CPU time) would have to be executed by MySQL and this is to be registered and then sorted for each row of records. It will not benefit even if you use Limit 1 (because of sorting).

The following example includes choosing a record randomly.

$r = mysql_query( "SELECT username FROM user ORDER BY RAND() LIMIT 1" );// This would be better:$r = mysql_query( "SELECT count(*) FROM user" );$d = mysql_fetch_row( $r );$rand = mt_rand(0, $d [0]-1);$r = mysql_query( "SELECT username FROM user LIMIT $rand, 1" );

7. Avoid Select * / SELECT Stop *

The more information that is read from the database, the slower the query becomes. And, if two separate servers are your database server and your WEB server, this would also raise the network transmission load.

You should, therefore, grow a good habit of taking whatever you need.

$r = mysql_query( "SELECT * FROM user WHERE user_id = 1" );$d = mysql_fetch_assoc( $r );echo "Welcome {$d['username']}" ;// recommended$r = mysql_query( "SELECT username FROM user WHERE user_id = 1" );$d = mysql_fetch_assoc( $r );echo "Welcome {$d['username']}" ;

8. Always set each table with an ID.

For each table in the database, we should set an ID as its primary key, and the best is an INT type (UNSIGNED is recommended), and set the automatic AUTO INCREMENT Sign increase.

Even if your user table has a field called “email” with a primary key, do not make it the primary key. It will degrade output by using the VARCHAR sort as the primary key. Furthermore, you can use the table’s ID in your software to build your data structure.

In addition, there are still some operations under the MySQL data engine which require the use of primary keys. In these instances, the primary key’s output and configurations become very significant, such as clusters, partitions…

“There is only one exception here, i.e. “The “foreign key” of the “associated table,” i.e. the primary key of this table, consists of the primary keys of several individual tables. We call this case a “foreign key.” For example, if there is a “student table” with student ID and a “course table” with course ID, then the “result table” is the “associated table” that connects the student ID with the student table.

9. Use ENUM rather than VARCHAR

The type of ENUM is quite fast and compact. It actually saves TINYINT, but on the outside it appears as a string. Using this field to create some lists of options becomes very ideal in this way.

If you have fields such as “gender”, “country”, “ethnic”, “status” or “department” and you know that these fields have restricted and fixed values, then instead of VARCHAR, you can use ENUM.

To tell you how to reorganize your table layout, MySQL also has a “recommendation” (see Article 10). This suggestion will tell you to convert it to ENUM when you have a VARCHAR region. You can get relevant suggestions using PROCEDURE ANALYSE().

10. Get PROCEDURE ANALYSE Suggestions ()

PROCEDURE ANALYSE() will enable MySQL to help you evaluate your fields and their actual data, and will offer you some useful suggestions. These suggestions will only be useful if there are real data in the table, since it needs data as a basis to make some major decisions.

For example, if you create an INT field as your primary key, but there is not a lot of data, then PROCEDURE ANALYSE() would recommend that you adjust the form of this field to MEDIUMINT. Or you have a VARCHAR field, you might get a recommendation to change it to ENUM because there is not a lot of data. Such recommendations may be due to inadequate data, because the decision-making is not sufficiently accurate.

“When viewing the table, you can click on “Propose table structure” in phpmyadmin to view these suggestions. Please click on “Propose table structure

Note that these are just recommendations. Only when there is more and more data in your table can these suggestions become valid. Remember, you are the one who makes the final decision.

11. Use NOT NULL to the fullest degree possible

You should still keep the fields NOT NULL unless you have a very clear purpose for using NULL values. Please read on, this seems a little controversial.

First, ask yourself how big is the gap between “Empty” and “NULL” (if it’s INT, it’s 0 and NULL)? You shouldn’t use NULL if you think there’s no difference between them. (Do you know? NULL and Empty strings are the same in Oracle!)

Don’t assume NULL doesn’t need space, it takes extra space, and when you compare it, the software will be more complicated. This is not to suggest, of course, that you can’t use NULL. The fact is very complicated. There are still several situations where NULL needs to be used.

The following is taken from MySQL’s own documentation:

“NULL columns require additional space in the row to record whether their values ​​are NULL. For MyISAM tables, each NULL column takes one bit extra, rounded up to the nearest byte.”

12. Statements prepared

Prepared statements are somewhat similar to stored procedures. They are a set of SQL statements running in the background. Whether it is performance or security problems, we can get many benefits from using prepared statements.

Any of your bound variables can be checked by Prepared Statements, which will protect your software from “SQL injection” attacks. You can still manually check your variables, of course. Manual tests, however, are prone to problems and are often ignored by programmers. This issue would be better when we use any system or ORM.

In terms of efficiency, this can offer you substantial performance benefits when the same query is used several times. For these prepared statements, you may specify certain parameters, and MySQL will only parse it once.

Since the binary format is used by the latest version of MySQL to transmit prepared statements, this will render network transmission quite effective.

Of course, since they do not accept query caching, there are certain cases where we need to stop using Prepared Statements. But it is said that it is supported after version 5.1.

You should review the manual: mysqli extension to use prepared statements in PHP or use a database abstraction layer, such as: PDO.

if ( $stmt = $mysqli ->prepare( "SELECT username FROM user WHERE state=?" )) {

// bind parameters
$stmt ->bind_param( "s" , $state );

// carried out
$stmt ->execute();

// binding result
$stmt ->bind_result( $username );

// move the cursor
$stmt ->fetch();

printf( "%s is from %s\n" , $username , $state );

$stmt ->close();
}

13. Queries that are unbuffered under

In normal circumstances, when you run a SQL statement in your script, your program will stop there until no such SQL statement returns, and then your program will continue to run. Unbuffered queries can be used to modify this behaviour.

Mysql unbuffered query() sends a SQL statement to MySQL instead of automatically caching the result like mysql query() (). This will save a lot of significant memory, particularly those query statements that will generate a large number of results, and you don’t have to wait until all the results are returned, you only need to return the first row of data, you can immediately start working on the query result.

There are some drawbacks, however. Because either you read all the rows, or you have to call mysql free result() before the next question to clear the output. Also, there will not be mysql num rows() or mysql data seek(). You need to think carefully, therefore, whether to use unbuffered queries.

"Mysql_unbuffered_query() sends the SQL query query to MySQL without automatically fetching and buffering the result rows as mysql_query() does. This saves a considerable amount of memory with SQL queries that produce large result sets, and you can start working on the result set immediately after the first row has been retrieved as you don't have to wait until the complete SQL query has been performed."

14. Save the IP address as the INT as UNSIGNED

A VARCHAR(15) field will be generated by many programmers to store the IP in the form of a string instead of an integer IP. Only 4 bytes are required if you use plastic to store, and you can have fields of fixed length. In addition, this will provide you with query benefits, especially when you need to use these WHERE conditions: IP between ip1 and ip2.

We must use UNSIGNED INT, so the whole 32-bit unsigned integer will be used by the IP address.

You can use INET ATON() to convert an IP string into an integer for your query, and use INET NTOA() to convert an integer into an IP string. In PHP, such functions as ip2long() and long2ipip also exist ().

$r = "UPDATE users SET ip = INET_ATON('{$_SERVER['REMOTE_ADDR']}') WHERE user_id = $user_id" ;

15. Fixed-length tables would be quicker.

The whole table would be called “fixed-length” or “static” if all fields in the table are “fixed-length”. In the table, for example, there are no fields of the following types: VARCHAR, TEXT, BLOB. As long as you have one of these fields, the table is not a “fixed-length static table” In this way, another approach would be used by the MySQL engine to process it.

Since MySQL searches faster, since these fixed lengths are easy to calculate the offset of the next data, fixed-length tables will boost efficiency, so the reading will naturally be fast. And if the field is not fixed-length, then the program needs to find the primary key each time to find the next one.

Fixed-length tables are much simpler for caching and restoring. However, the only side effect is that some space is lost by fixed-length fields, so fixed-length fields need to allocate too much space, whether or not you use them.

You can break your table into two using the “vertical split” technique (see next item), one with a fixed length and one with a variable length.

16. Segmentation vertically

‘Vertical segmentation’ is a method of converting database tables into several tables by column, which can reduce the size of the table and the number of fields, thereby achieving the optimization objective. (I’ve done a bank project previously, and I’ve seen a table with more than 100 fields, which is really scary.)

Example 1: There is an area for the home address in the Users Table. This field is a field that is optional. In contrast, you do not need to read or rewrite this field regularly, apart from personal data during database operations. So, why not place it on a different table? If there are several occasions, I only have user ID, user name, and password for the user table, this will make your table have better results, think about it. User functions, etc., can also be used. A smaller watch still works well.

Example 2: You have a “last login” area, which is modified every time a user logs in. Each update, however, will cause the table’s query cache to be vacated. You may then position this field in another table, so that your continuous reading of user ID, user name, user function will not be affected, because the query cache can help you improve a lot of output.

Furthermore, you need to pay attention to the table created by these separated fields, you will not often join them, otherwise the performance will be worse than if it is not divided, and there will be a very large amount. The degree of decay.

17. Break a broad statement from DELETE or INSERT

If you need a broad DELETE or INSERT question to be executed on an online website, you need to be very careful to prevent your acts from reacting to your entire website. Since these two operations will lock the table, no other operations will join until the table is locked.

Apache is going to have several processes or threads for girls. It works very effectively, therefore, and our server doesn’t want to have too many child processes, threads, and ties to the database. This is a huge resource on a server, especially memory.

If you lock your table for a period of time, such as 30 seconds, then the cumulative access processes/threads, database links, and the amount of open files in those 30 seconds for a site with a high traffic volume may not only allow you to park the Crash WEB service, but may also cause your entire server to go down immediately.

So, you have to break it up if you have a big deal. A good approach is to use Cap words. An example is here:

while (1) {//Only make 1000 each timemysql_query( "DELETE FROM logs WHERE log_date <= '2009-11-01' LIMIT 1000" );if (mysql_affected_rows() == 0) {// Nothing to delete, exit!break ;}// take a break every timeusleep(50000);}

18. The smaller, the quicker the column.

The most critical bottleneck might be hard disk operations for most database engines. So, in this case, keeping the data compact would be very useful, since it decreases access to the hard drive.

For all data forms, see the MySQL document Storage Specifications.

If there are just a few columns in a table (such as dictionary tables, setup tables), then we have no need to use INT as the primary key. Using MEDIUMINT, SMALLINT or a smaller TINYINT is more economical. Using DATE is a lot better than DATETIME, if you don’t need to log time.

Of course, you will need to leave some space for expansion, or you will die and be hideous if you do this later. See an instance of Slashdot (November 06, 2009). A simple ALTER TABLE statement cost 3 Because there are 16 million pieces of data, more than an hour.

19. Choose the best engine for storage

Two MyISAM and InnoDB storage engines are available for MySQL. There are advantages and drawbacks to each engine. This matter was discussed in Kushell’s previous article’MySQL: InnoDB or MyISAM?.’

For certain applications that require a large number of queries, MyISAM is appropriate, but it is not very good for a large number of writing operations. Even if you only need to update a field, the whole table will be locked, and once the read operation is done, all processes, even the read process, will not function. Furthermore, for calculations such as SELECT COUNT(*), MyISAM is extremely fast.

The pattern for InnoDB will be a very complex storage engine, which will be slower than MyISAM for some small applications. The “row lock” is sponsored, so when there are more writing operations, it will be easier. Moreover, he also supports more advanced applications, such as transactions.

Below is the manual for MySQL

target=”_blank” MyISAM Storage Engine

InnoDB Storage Engine

20. Use an Object Relational Mapper and

ORM (Object Relational Mapper), you can get reliable performance increase. All the things an ORM can do can also be written manually. However, this requires a senior expert.

The most important thing of ORM is “Lazy Loading”, which means that it will only do it when it needs to get a value. But you also need to be careful of the side effects of this mechanism, because it is very likely to create a lot of small queries and reduce performance.

ORM can also package your SQL statements into a transaction, which will be much faster than executing them individually.

Currently, my favorite PHP ORM is: Doctrine.

21. Be careful with “permanent links”

The purpose of “permanent links” is to reduce the number of re-creation of MySQL links. When a link is created, it will always be in a connected state, even if the database operation has ended. Moreover, since our Apache started to reuse its child processes-that is, the next HTTP request will reuse the Apache child process and reuse the same MySQL link.

PHP Manual: mysql_pconnect()

In theory, this sounds very good. But from personal experience (and most people’s), this function creates more troubles. Because, you only have a limited number of links, memory issues, file handles, etc.

Moreover, Apache runs in an extremely parallel environment, which creates many, many processes. This is why this “permanent link” mechanism does not work well. Before you decide to use “permalink”, you need to think about the architecture of your entire system.

Supplement:

1. MySQL forces the use of indexes: force index (index name or primary key PRI)E.g:select * from table force index(PRI) limit 2; (mandatory use of primary key)select * from table force index(pankaj1_index) limit 2; (mandatory use of index "pankaj1_index")select * from table force index(PRI,pankaj1_index) limit 2; (mandatory use of index "PRI and pankaj1_index")2. MySQL prohibits an index: ignore index (index name or primary key PRI)E.g:select * from table ignore index(PRI) limit 2; (primary key is prohibited)select * from table ignore index(pankaj1_index) limit 2; (prohibit the use of index "pankaj1_index")select * from table ignore index(PRI,pankaj1_index) limit 2;(Prohibit the use of index "PRI,pankaj1_index")

I like to learn new and better ways of doing things when working on a scale, and feel free to ask questions and make suggestions.
Also, check out another story on this.
Thanks for reading this.

--

--

Pankaj kushwaha

Database/System Administrator | DevOPS | Cloud Specialist | DevOPS