A guide to rewriting website URLs for SEO with regular expressions

Posted by: Code.Elite.Vn

I. What Are the Advantages of Using Regular Expressions for SEO?

1. You can turn hard-to-remember URLs into user-friendly URLs.
2. You can redirect URLs precisely.
3. You can stop others from hotlinking the images on your site.
4. You can block spam bots from your website.
5. You can prevent URL-related issues.
6. You can prevent duplicate content issues.
7. You can serve content based on the visitor's IP address or region.

II. What Are the Advantages of Using Regular Expressions in Google Analytics?

1. Set up complex goals in Google Analytics, such as a goal which matches multiple goal pages.
2. Set up complex funnel pages in Google Analytics, such as a funnel step which matches multiple web pages.
3. Exclude traffic from an IP address range via Google Analytics filters.
4. Set up complex advanced segments, such as segments which filter out branded keywords.
5. Understand the commercial value of long tail keywords.
6. Rewrite URLs in Google Analytics reports
7. Filter data based on complex patterns within the GA report interface.

III. What Is a Regular Expression?
A regular expression is a pattern used to check and search string data (and arrays of strings), for example in PHP programming. It is built from special search symbols.

Examples of these symbols: [], ^, (), {}, $, +, *.

Regular Expression

Caret  ^

‘^’ – This is known as the ‘caret’. It anchors a pattern to the beginning of the string (or line) being tested. For example:
^Colou?r => Check for a string which starts with ‘Color’ or ‘Colour’
^Nov(ember)? => Check for a string which starts with ‘Nov’ or ‘November’
^/elearning\.html => Check for a string which starts with ‘/elearning.html’
^.*\.php => Check for a string which starts with any sequence of characters followed by ‘.php’.
^/product-price\.php => Check for a string which starts with ‘/product-price.php’
The caret also means NOT when used right after the opening square bracket. For example:
[^a] => Check for any single character other than the lowercase letter ‘a’.
[^B] => Check for any single character other than the uppercase letter ‘B’.
[^1] => Check for any single character other than the number ‘1’
[^ab] => Check for any single character other than the lower case letters ‘a’ and ‘b’
[^aB] => Check for any single character other than the lower case letter ‘a’ and uppercase letter ‘B’
[^1B] => Check for any single character other than the number ‘1’ and uppercase letter ‘B’
[^Dog] => Check for any single character other than the following: uppercase letter ‘D’, lowercase letter ‘o’ and lowercase letter ‘g’.
[^123b] => Check for any single character other than the following characters: number ‘1’, number ‘2’, number ‘3’ and lowercase letter ‘b’.
[^1-3] => Check for any single character other than the following: number ‘1’, number ‘2’ and number ‘3’
[^0-9] => Check for any single character which is not a digit.
[^a-z] => Check for any single character which is not a lowercase letter.
[^A-Z] => Check for any single character which is not an uppercase letter.
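
The two roles of the caret can be tried out with a short Python sketch. Python's `re` module is used here only as a convenient test bench (an assumption of this example; Google Analytics and Apache use their own regex engines, although these basic constructs behave the same way), and the sample strings are made up.

```python
import re

# Caret as an anchor: the match must begin at the start of the string.
starts_with_colour = re.compile(r"^Colou?r")
samples = ["Color chart", "Colour chart", "My Color"]
anchored_hits = [s for s in samples if starts_with_colour.search(s)]

# Caret right after "[": the character set is negated.
non_digits = re.findall(r"[^0-9]", "a1b22c")

print(anchored_hits)  # ['Color chart', 'Colour chart']
print(non_digits)     # ['a', 'b', 'c']
```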

Dollar  $

‘$’ – The dollar sign anchors a pattern to the end of the string (or line) being tested. For example:
Colou?r$ => Check for a pattern which ends with ‘Color’ or ‘Colour’
Nov(ember)?$ => Check for a pattern which ends with ‘Nov’ or ‘November’
elearning\.html$ => Check for a pattern which ends with ‘elearning.html’
\.php$ => Check for a pattern which ends with .php
product-price\.php$ => Check for a pattern which ends with ‘product-price.php’
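
A minimal sketch of the end anchor, again using Python's `re` module as a stand-in engine; the sample paths are invented for illustration.

```python
import re

# "$" requires the match to finish at the end of the string.
ends_with_php = re.compile(r"\.php$")
paths = ["/product-price.php", "/product-price.php5", "/index.php?id=2"]
php_pages = [p for p in paths if ends_with_php.search(p)]

print(php_pages)  # ['/product-price.php']
```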

Square Bracket  []

‘[]’ – Square brackets are used to check for any single character from the character set listed inside them. For example:
[a] => Check for a single character which is a lowercase letter ‘a’.
[ab] => Check for a single character which is either a lower case letter ‘a’ or ‘b’.
[aB] => Check for a single character which is either a lower case letter ‘a’ or uppercase letter ‘B’
[1B] => Check for a single character which is either a number ‘1’ or an uppercase letter ‘B’.
[Dog] => Check for a single character which can be any one of the following: uppercase letter ‘D’, lowercase letter ‘o’ or lowercase letter ‘g’.
[123b] => Check for a single character which can be any one of the following: number ‘1’, number ‘2’, number ‘3’ or lowercase letter ‘b’.
[1-3] => Check for a single character which can be any one of the digits 1, 2 or 3.
[0-9] => Check for a single character which is a number.
[a-d] => Check for a single character which can be any one of the following lower case letter: ‘a’, ‘b’, ‘c’ or ‘d’.
[a-z] => Check for a single character which is a lower case letter.
[A-Z] => Check for a single character which is an uppercase letter.
[A-T] => Check for a single character which can be any uppercase letter from ‘A’ to ‘T’.
[home.php] => Check for a single character which can be any one of the following: lowercase letter ‘h’, ‘o’, ‘m’, ‘e’ or ‘p’, or the character ‘.’ (inside square brackets the dot is a literal dot, and repeating a letter adds nothing).
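
The point that a character set always stands for one character, never for a word, can be seen in a small Python sketch (Python's `re` is an assumed stand-in engine; the test strings are made up).

```python
import re

# [home.php] is a set of single characters, not the word "home.php".
single_chars = re.findall(r"[home.php]", "help.me")

# A range such as [a-d] stands for every character between its endpoints.
in_range = re.findall(r"[a-d]", "abcdef")

print(single_chars)  # ['h', 'e', 'p', '.', 'm', 'e']
print(in_range)      # ['a', 'b', 'c', 'd']
```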

Parenthesis ()

‘()’ – These are known as parentheses and are used to group characters into a string which is checked as one unit. For example:
(a) => Check for string ‘a’
(ab) => Check for string ‘ab’
(dog) => Check for string ‘dog’
(dog123) => Check for string ‘dog123’
(0-9) => Check for string ‘0-9’
(A-Z) => Check for string ‘A-Z’
(a-z) => Check for string ‘a-z’
(123dog588) => Check for string ‘123dog588’
Note: () is also used to capture the matched text and store it in a variable (a back reference). For example: ^(.*)$
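
The difference between parentheses and square brackets, and the "store in a variable" behaviour, can be sketched in Python as follows. This is only an illustration with made-up strings; `group(1)` is Python's way of reading what mod_rewrite later calls `$1`.

```python
import re

# (0-9) is the literal string "0-9"; [0-9] is any one digit.
literal = re.search(r"(0-9)", "pages 0-9")
digit = re.search(r"[0-9]", "pages 0-9")
no_literal = re.search(r"(0-9)", "7")

# Parentheses also capture: the text matched inside them becomes group 1.
captured = re.search(r"^(.*)$", "dog123")

print(literal.group(0))   # 0-9
print(digit.group(0))     # 0
print(no_literal)         # None
print(captured.group(1))  # dog123
```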

Question mark  ?

‘?’ is used to check for zero or one occurrence of the preceding character, character set or group. For example:
[a]? => Check for zero or one occurrence of lowercase letter ‘a’.
[dog]? => Check for zero or one occurrence of lowercase letter ‘d’, ‘o’ or ‘g’.
[^dog]? => Check for zero or one occurrence of a character which is not the lowercase letter ‘d’, ‘o’ or ‘g’.
[0-9]? => Check for zero or one occurrence of a number
[^a-z]? => Check for zero or one occurrence of a character which is not a lower case letter.
^colou?r$ => Check for ‘color’ or ‘colour’.
^Nov(ember)? 28(th)?$ => Check for ‘Nov 28’, ‘November 28’, ‘Nov 28th’ and ‘November 28th’.
Note: ? when used inside a regular expression makes the preceding letter or group of letters optional. That is why ^colou?r$ matches both ‘color’ and ‘colour’.
Matching is case-sensitive by default, so ‘nov 28’ is matched only when case is ignored (for example with the [NC] flag in mod_rewrite).
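
A quick way to check the date example is the Python sketch below. It assumes the pattern is written as `^Nov(ember)? 28(th)?$`, with a `?` after the `(ember)` group and a literal space before `28`; without those two details the pattern would only accept ‘November28’ and ‘November28th’.

```python
import re

date_pattern = re.compile(r"^Nov(ember)? 28(th)?$")
samples = ["Nov 28", "November 28", "Nov 28th", "November 28th",
           "nov 28", "Nov28"]
matched = [s for s in samples if date_pattern.search(s)]

# "nov 28" fails because matching is case-sensitive,
# "Nov28" fails because the space is part of the pattern.
print(matched)  # ['Nov 28', 'November 28', 'Nov 28th', 'November 28th']
```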

Plus  +

‘+’ is used to check for one or more occurrences of the preceding character, character set or group. For example:
[a]+ => Check for one or more occurrences of lowercase letter ‘a’.
[dog]+ => Check for one or more occurrences of letters ‘d’, ‘o’ or ‘g’.
[548]+ => Check for one or more occurrences of numbers ‘5’, ‘4’ or ‘8’.
[0-9]+ => Check for one or more digits.
[a-z]+ => Check for one or more lowercase letters.
[^a-z]+ => Check for one or more characters which are not lowercase letters.
[a-zA-Z]+ => Check for any combination of uppercase and lowercase letters.
[a-z0-9]+ => Check for any combination of lowercase letters and numbers.
[A-Z0-9]+ => Check for any combination of uppercase letters and numbers.
[^9]+ => Check for one or more characters which are not the number 9.
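
The sketch below (Python's `re` as an assumed stand-in engine, invented sample strings) shows ‘+’ collecting runs of digits, and also why the range in a letters-only set should end in an uppercase ‘Z’: the range `A-z` also covers the ASCII punctuation characters that sit between ‘Z’ and ‘a’, such as the underscore.

```python
import re

# "+" matches runs of one or more characters from the set.
digit_runs = re.findall(r"[0-9]+", "order 66 of 1200")

# Pitfall: [a-zA-z] accepts "_" because "_" lies between "Z" and "a" in ASCII.
loose = re.fullmatch(r"[a-zA-z]+", "snake_case")
strict = re.fullmatch(r"[a-zA-Z]+", "snake_case")

print(digit_runs)         # ['66', '1200']
print(loose is not None)  # True
print(strict)             # None
```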

Asterisk *

‘*’ is used to check for any number of occurrences (including zero occurrences) of the preceding character. For example, 31* would match 3, 31, 311, 3111, 31111 etc.
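
A minimal Python check of the 31* example. The anchors ^ and $ are added here (an assumption of this sketch) so that each sample has to match as a whole.

```python
import re

star = re.compile(r"^31*$")
samples = ["3", "31", "311", "3111", "1", "313"]
star_hits = [s for s in samples if star.search(s)]

print(star_hits)  # ['3', '31', '311', '3111']
```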

Dot .

‘.’ is used to check for a single character (any character that can be typed via the keyboard, other than the line break character (\n)). For example, the regular expression Action ., Scene2 would match:
  • Action 1, Scene2
  • Action A, Scene2
  • Action 9, Scene2
  • Action &, Scene2
but not
  • Action 10, Scene2
  • Action AB, Scene2
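
The same example as a Python sketch. The pattern is anchored with ^ and $ here (an assumption of this sketch), and the samples keep the space after the comma so that the only difference between them is how many characters stand in for the dot.

```python
import re

scene = re.compile(r"^Action ., Scene2$")
samples = ["Action 1, Scene2", "Action &, Scene2",
           "Action 10, Scene2", "Action AB, Scene2"]
one_char_hits = [s for s in samples if scene.search(s)]

print(one_char_hits)  # ['Action 1, Scene2', 'Action &, Scene2']
```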

Pipe |

‘|’ is the logical OR. For example:
(His|Her) => Check for the string ‘His’ or ‘Her’.

Escaping Character \

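‘\’ is the escape character. It removes the special meaning of the character that follows it, so that the character is matched literally. For example:
the regular expression ^www\.abc\.com$ matches www.abc.com, because each \. stands for a literal dot rather than for ‘any single character’.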
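
What goes wrong without the backslash can be seen in this Python sketch. The string ‘wwwXabcYcom’ is a made-up counterexample, and `re.escape` is Python's own helper for escaping a literal string; it is not part of Apache or Google Analytics.

```python
import re

unescaped = re.compile(r"^www.abc.com$")   # each "." matches any character
escaped = re.compile(r"^www\.abc\.com$")   # each "\." matches a literal dot

loose_hit = bool(unescaped.search("wwwXabcYcom"))
strict_hit = bool(escaped.search("wwwXabcYcom"))
real_hit = bool(escaped.search("www.abc.com"))

# Python can build the escaped form of a literal string for you.
auto_escaped = re.escape("www.abc.com")

print(loose_hit, strict_hit, real_hit)  # True False True
```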

Exclamation !

‘!’ – It is the logical NOT in mod_rewrite. It is not a regular expression metacharacter: unlike ^ (caret), it is placed only in front of the whole pattern of a rule or a condition, where it reverses the result of the match. For example:
  1. !^abc$ => True when the tested string is not the string ‘abc’.
  2. !^[0-9]$ => True when the tested string is not a single digit.
  3. !^[a-z]$ => True when the tested string is not a single lowercase letter.

Curly Brackets {}

{} is used to repeat the preceding character, character set or group a given number of times. For example:
1{2} => Check for 11
1{3} => Check for 111
1{4} => Check for 1111
1{2,4} => Check for 11, 111 or 1111
[0-9]{2} => Check for a two-digit number like 12
[0-9]{3} => Check for a three-digit number like 123
[0-9]{4} => Check for a four-digit number like 1234
[0-9]{1,4} => Check for a number with 1 to 4 digits.
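
A short Python sketch of the counted repetitions. The patterns are anchored with ^ and $ (an assumption of this sketch) so that longer numbers are rejected rather than partially matched.

```python
import re

samples = ["7", "12", "123", "1234", "12345"]

exactly_four = [s for s in samples if re.search(r"^[0-9]{4}$", s)]
one_to_four = [s for s in samples if re.search(r"^[0-9]{1,4}$", s)]

print(exactly_four)  # ['1234']
print(one_to_four)   # ['7', '12', '123', '1234']
```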

Other Meta characters in Regex

White Spaces

To create a white space in a regular expression, just use the white space. For e.g.
(Himanshu Sharma) => Check for the string ‘Himanshu Sharma’

More Regex Examples

^(.*)\.html$ => Check for any number of characters before .html and store them in a variable.
^dog$ => Check for the string ‘dog’
^a+$ => Check for one or more occurrences of a lower case letter ‘a’
^(abc)+$ => Check for one or more occurrences of the string ‘abc’.
^[a-z]+$ => Check for one or more occurrences of a lower case letter.
^(abc)*$ => Check for any number of occurrences of the string ‘abc’.
^a*$ => Check for any number of occurrences of the lowercase letter ‘a’

#. Find all the files which start with ‘elearning’ and which have the ‘.html’ file extension
^elearning.*\.html$
#. Find all the PHP files
^.*\.php$
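
The file-matching patterns can be checked with the Python sketch below. It assumes the patterns are written with ‘.*’ (a dot followed by an asterisk) for "any number of characters"; the file names are made up.

```python
import re

files = ["elearning.html", "elearning-seo.html", "elearning.php",
         "my-elearning.html"]

# Files which start with "elearning" and end with ".html".
elearning_html = [f for f in files if re.search(r"^elearning.*\.html$", f)]

# Everything before ".html" is captured in group 1.
stem = re.search(r"^(.*)\.html$", "elearning-seo.html").group(1)

print(elearning_html)  # ['elearning.html', 'elearning-seo.html']
print(stem)            # elearning-seo
```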

mod_rewrite

It is an Apache module written in the ‘C’ programming language (‘mod_rewrite.c’). This module works only with Apache server 1.2 or later and is typically configured through the .htaccess file (a plain ASCII file which contains configuration directives and rules for files and folders). Through this module you can:
  1. Rewrite URLs
  2. Redirect URLs
  3. Solve canonical URL issues
  4. Solve hotlinking issues
  5. Block visitors from accessing a particular folder, file or the whole website.
  6. Create custom 403 and 404 pages.
  7. Deliver content on the basis of the visitor’s IP address.
The benefits are endless.

Types of Configuration Directives

There are 9 types of configuration directives (as of Apache 2.2; Apache 2.4 removed RewriteLog, RewriteLogLevel and RewriteLock):
  1. RewriteEngine
  2. RewriteOptions
  3. RewriteLog
  4. RewriteLogLevel
  5. RewriteLock
  6. RewriteMap
  7. RewriteBase
  8. RewriteRule
  9. RewriteCond
But here we will talk about only three directives: RewriteEngine, RewriteRule and RewriteCond. I have not found any use for the other directives so far.

RewriteEngine

This configuration directive is used to enable or disable the mod_rewrite module.
Syntax: RewriteEngine on|off
Default value: RewriteEngine off
That’s why in the .htaccess file we first enable the mod_rewrite module by adding the following code:
Options +FollowSymLinks
RewriteEngine on

RewriteRule

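This configuration directive tells the server to interpret the given statement as a rule.
Syntax: RewriteRule <pattern> <substitution> [FLAGS]
Here pattern is a regular expression and substitution is a URL (or a hyphen, which means ‘no substitution’).
FLAGS can be [R], [F], [NC], [QSA], [L], [OR] etc. Several flags are combined inside one pair of brackets, separated by commas without spaces, for example [R=301,L].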

[R] => Redirect. Its default value is 302. It can be assigned any redirect status code from 300 to 399. For example:
RewriteRule ^index\.html$ /index.php [R=301]

[F] => Forbidden. It is generally used with a hyphen (-), which tells the server not to perform any substitution. This flag tells the server not to fulfill the request and to return the ‘403’ response code. For example:
RewriteRule ^product-price\.php$ - [F]

[NC] => No case. It tells the server to ignore the difference between uppercase and lowercase when checking for patterns. For example:
RewriteRule ^him.*\.php$ - [NC]

[QSA] => Query string append. It tells the server to append the query string of the original URL to the query string of the new URL instead of discarding it.
[L] => Last rule. This flag tells the server not to process any more rules.
[OR] => Logical OR. This flag is used as a logical OR between RewriteCond statements.

RewriteCond

This configuration directive tells the server to interpret the given statement as a condition for the rule which immediately follows it.
Syntax: RewriteCond <test string> <condition pattern> [FLAGS]
Here mod_rewrite first matches each URL against the pattern of the RewriteRule. If the URL does not match the pattern, then mod_rewrite processes the next rule. If the URL matches the pattern, then mod_rewrite looks for the corresponding RewriteCond statements. If no corresponding RewriteCond exists, then the matched URL is replaced by the substitution.
If corresponding RewriteCond statements exist, then they are processed in the order they appear, from top to bottom. Each RewriteCond is processed by matching its test string against its condition pattern. If the test string does not match its condition pattern, then mod_rewrite processes the next rule; otherwise it processes the next RewriteCond. When all RewriteConds are successfully processed, the matched URL is replaced by the substitution. A test string can be:
1. A simple text
2. RewriteRule back reference
3. RewriteCond back reference
4. Server Variable
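
The processing order described above can be sketched as a toy model in Python. This is only an illustration of the order of checks, not how mod_rewrite is implemented: the function `apply_rule`, its parameters and the sample values are hypothetical, flags such as [OR] and the ‘!’ prefix are ignored, and Python writes the back reference as `\1` where mod_rewrite writes `$1`.

```python
import re

def apply_rule(path, variables, pattern, substitution, conditions=()):
    """Toy model of one RewriteRule preceded by optional RewriteCond lines.

    conditions is a sequence of (variable_name, condition_pattern) pairs
    which must all match. Returns the rewritten path, or None when the
    rule does not apply and the next rule should be tried.
    """
    rule_match = re.search(pattern, path)
    if rule_match is None:
        return None  # the URL does not match the rule pattern
    for name, condition_pattern in conditions:
        test_string = variables.get(name, "")
        if re.search(condition_pattern, test_string) is None:
            return None  # one condition failed
    # All conditions passed: build the substitution from the captured groups.
    return rule_match.expand(substitution)

only_this_ip = [("REMOTE_ADDR", r"^12\.34\.56\.78$")]
rewritten = apply_rule("blog/post", {"REMOTE_ADDR": "12.34.56.78"},
                       r"^(.*)$", r"/index.php/\1", only_this_ip)
skipped = apply_rule("blog/post", {"REMOTE_ADDR": "10.0.0.1"},
                     r"^(.*)$", r"/index.php/\1", only_this_ip)

print(rewritten)  # /index.php/blog/post
print(skipped)    # None
```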

RewriteRule Back Reference
It is of the form $N, where N can be any number from 0 to 9. It refers to the variable created by the Nth pair of parentheses in the RewriteRule pattern ($0 is the whole match). For example:
RewriteRule ^(.*)$ /index.php/$1 [L]
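
The effect of this back reference can be imitated in Python, where the captured group is written `\1` instead of `$1`. The function name `rewrite` and the sample path are hypothetical; the sketch only shows how the captured text is reused in the substitution.

```python
import re

def rewrite(path):
    # (.*) captures the whole requested path; \1 re-inserts it.
    return re.sub(r"^(.*)$", r"/index.php/\1", path, count=1)

front_controller_url = rewrite("products/42")
print(front_controller_url)  # /index.php/products/42
```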

RewriteCond Back Reference
It is of the form %N, where N can be any number from 1 to 9. It refers to the variable created in the condition pattern (‘condpattern’) of the last matched ‘RewriteCond’. For example:
RewriteCond %{HTTP_HOST} ^(123\.42\.162\.7)$
RewriteCond %1 ^123\.42\.162\.7$
RewriteRule ……………..

Server Variable
Syntax: %{Variable_Name}
E.g.
1. %{HTTP_HOST} – This variable returns the host name requested by the visitor (the Host request header).
2. %{HTTP_USER_AGENT} – This variable returns the User-Agent request header, which usually identifies the visitor’s browser and operating system.
3. %{QUERY_STRING} – This variable returns the query string.
4. %{HTTP_REFERER} – This variable returns the URL of the referrer.
5. %{REMOTE_ADDR} – This variable returns the IP address of the visitor.

Examples

Example-1: Redirect all requests for pages in the media folder to a new page ‘media.html’.
RewriteRule ^media/ /media.html [R=301,L]
Example-2: Redirect oldaddress.html page to newaddress.html page
RewriteRule ^oldaddress\.html$ /newaddress.html [r=301,l]
Example-3: Redirect one website to another
Redirect 301 / https://www.anotherwebsite.com/
Example-4: Redirect abc.com/index.html to www.abc.com
RewriteCond %{HTTP_HOST} ^abc\.com$ [NC]
RewriteRule ^index\.html$ https://www.abc.com/ [R=301,L]
Example-5: Block a visitor with the IP address 12.34.56.78 from viewing your file product-prices.html
RewriteCond %{REMOTE_ADDR} ^12\.34\.56\.78$
RewriteRule ^product-prices\.html$ - [F]
Example-6: Block a visitor with the IP address 12.34.56.78 from viewing your folder ‘sales-demo’
RewriteCond %{REMOTE_ADDR} ^12\.34\.56\.78$
RewriteRule ^sales-demo/ - [F]
Example-7: Block a visitor with the IP address 12.34.56.78 from viewing your website www.abc.com
RewriteCond %{REMOTE_ADDR} ^12\.34\.56\.78$
RewriteRule ^.*$ - [F]
Example-8: Apply 301 from one file to another file
Redirect 301 /file1.html https://www.mywebsite.com/file2.html
The above code will permanently redirect file1.html to file2.html. So whenever a search engine or a visitor requests file1.html, they will automatically be redirected to file2.html.
Example-9: Convert Dynamic URL into Static Looking SEO friendly URL
RewriteCond %{QUERY_STRING} ^keyval=25&keyval2=62$ [NC]
RewriteRule ^productdescription\.php$ https://www.example.com/whiteboard-accessories.php? [R=301,L]
This code will redirect https://www.example.com/productdescription.php?keyval=25&keyval2=62 to https://www.example.com/whiteboard-accessories.php
Note: You need to put a question mark (?) at the end of the substitution URL, otherwise the original query string will be appended to the substitution URL.
Example-10: Redirect non-www to www
RewriteCond %{HTTP_HOST} ^mywebsite\.com$ [NC]
RewriteRule ^(.*)$ https://www.mywebsite.com/$1 [R=301,L]
Note: Replace ‘mywebsite’ with your website name
Example-11: Create Custom 404 page
Create the web page which you want to display as your custom 404 page, say custom404.php, and upload it to the root directory. Now add the following code to your .htaccess file. Use a local path rather than a full URL: with a full URL the server answers with a redirect, so the visitor does not receive the 404 status code.
Options +FollowSymLinks
RewriteEngine on
ErrorDocument 404 /custom404.php
Example-12: Block an IP address from accessing your website
Add the following code to your .htaccess file (there must be no space after the comma in ‘Deny,Allow’):
Options +FollowSymLinks
RewriteEngine on
Order Deny,Allow
Deny from 61.16.153.67
If you want to block two or more IP addresses:
Options +FollowSymLinks
RewriteEngine on
Order Deny,Allow
Deny from 61.16.153.67
Deny from 124.202.86.42
Example-13: Resolve the Hotlinking Issue
Hotlinking means direct linking to your website’s files (images, videos etc.) from another site. By preventing hotlinking, you can save your server bandwidth. Add the following code to your .htaccess file:
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^https://(.+\.)?mywebsite\.com/ [NC]
RewriteCond %{HTTP_REFERER} !^$
RewriteRule .*\.(jpg|jpeg|gif|bmp|png|swf)$ - [F]
Replace ‘mywebsite’ with your website name and then use a hotlinking checker tool to find out whether your files (images, videos etc.) can be hotlinked or not.
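
The logic of the two conditions and the rule can be imitated in a small Python sketch. It is only a model of the decision, not of Apache itself: the function `is_blocked` and the sample referrers are hypothetical, and ‘mywebsite.com’ is the placeholder domain from the example.

```python
import re

OWN_SITE = re.compile(r"^https://(.+\.)?mywebsite\.com/", re.IGNORECASE)
MEDIA_FILE = re.compile(r".*\.(jpg|jpeg|gif|bmp|png|swf)$")

def is_blocked(referer, path):
    """Return True when the request would receive the 403 response."""
    foreign_referer = not OWN_SITE.search(referer)  # first condition
    has_referer = referer != ""                     # second condition
    is_media = MEDIA_FILE.search(path) is not None  # rule pattern
    return foreign_referer and has_referer and is_media

print(is_blocked("https://other.example/page", "img/logo.png"))     # True
print(is_blocked("https://www.mywebsite.com/about", "img/logo.png"))  # False
print(is_blocked("", "img/logo.png"))                                # False
```

Requests with an empty referrer are let through on purpose, because many browsers and proxies do not send the header at all.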
Example-14: Enable proxy caching for static resources
Add the following code to your .htaccess file (the Header directive requires mod_headers):
<FilesMatch "\.(gif|jpe?g|png)$">
Header set Cache-Control "public"
</FilesMatch>

Regular Expressions and Google Analytics

There are many cases where regular expressions are very useful in Google Analytics. Some of these cases are:
1. Setting up a goal which should match multiple goal pages instead of one.
2. Setting up a funnel in which a step should match multiple pages instead of one. In fact, when you set up a funnel, all URLs are treated as regular expressions.
3. Excluding traffic from an IP address range via filters. In fact, there are many filters which require regular expressions. Big organizations generally own a range of IP addresses. Therefore, to exclude an organization’s internal traffic you need to specify an IP range using regex.
4. Setting up advanced segments. For example, the following regex can segment all the traffic coming from social media sites:
twitter\.com|facebook\.com|linkedin\.com|plus\.google\.com|t\.co|bit\.ly|reddit\.com
Note: You can use regex-equipped advanced segments to unleash the power of long tail keywords and determine whether these keywords are worth chasing. You can also use regex to segment important data through advanced segments.
5. Rewriting URLs in Google Analytics reports.
You can rewrite URLs in Google Analytics reports with the ‘Search and Replace’ advanced filter. This comes in handy when your website has very long, ugly dynamic URLs and you can’t figure out what a page is about just by looking at its URL. For example, with the ‘Search and Replace’ advanced filter you can ask GA to report the following URL:
https://www.abc.com/fder/?catg=2341&pid=428
as
https://www.abc.com/outdoor/fleeces
6. Filtering data within the GA report interface.
You can use the following regular expressions to filter keywords in the Google Analytics reporting interface:
^[^\.\s\-]+([\.\s\-]+[^\.\s\-]+){0}$ => Filter 1 word keyword phrases
^[^\.\s\-]+([\.\s\-]+[^\.\s\-]+){1}$ => Filter 2 word keyword phrases
^[^\.\s\-]+([\.\s\-]+[^\.\s\-]+){2}$ => Filter 3 word keyword phrases
^[^\.\s\-]+([\.\s\-]+[^\.\s\-]+){3}$ => Filter 4 word keyword phrases
^[^\.\s\-]+([\.\s\-]+[^\.\s\-]+){4}$ => Filter 5 word keyword phrases
^[^\.\s\-]+([\.\s\-]+[^\.\s\-]+){5}$ => Filter 6 word keyword phrases
^[^\.\s\-]+([\.\s\-]+[^\.\s\-]+){6}$ => Filter 7 word keyword phrases
^[^\.\s\-]+([\.\s\-]+[^\.\s\-]+){7}$ => Filter 8 word keyword phrases
^[^\.\s\-]+([\.\s\-]+[^\.\s\-]+){8}$ => Filter 9 word keyword phrases
^[^\.\s\-]+([\.\s\-]+[^\.\s\-]+){9}$ => Filter 10 word keyword phrases
^([^ ]+ ){4,10}[^ ]+$ => Filter keywords that have between 4 and 10 spaces in them. This regex can help you in determining long tail keywords on your website.
^/([^/]+/){3}[^/]*$ => Filter landing pages that have 4 slashes in their URL. This regex can help you in identifying low quality pages on your website.
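
Before pasting such patterns into Google Analytics, it can help to try them on a few sample values first. The Python sketch below does this for the word-count filter, the long tail filter and the social media segment. Python's `re` module is only an assumed stand-in for the GA regex engine, and the helper `word_count_filter`, the keywords and the traffic sources are all made up.

```python
import re

def word_count_filter(words):
    """Build the N-word keyword filter; `words` must be 1 or more."""
    repeats = words - 1
    return re.compile(r"^[^\.\s\-]+([\.\s\-]+[^\.\s\-]+){%d}$" % repeats)

keywords = ["shoes", "red shoes", "buy red shoes", "buy red-shoes",
            "how to rewrite urls with htaccess"]

# Dots, spaces and hyphens all count as word separators.
three_words = [k for k in keywords if word_count_filter(3).search(k)]

# 4 to 10 spaces means 5 to 11 words.
long_tail = re.compile(r"^([^ ]+ ){4,10}[^ ]+$")
long_tail_hits = [k for k in keywords if long_tail.search(k)]

social = re.compile(r"twitter\.com|facebook\.com|linkedin\.com"
                    r"|plus\.google\.com|t\.co|bit\.ly|reddit\.com")
sources = ["facebook.com", "m.facebook.com", "t.co", "google",
           "microsoft.com"]
social_hits = [s for s in sources if social.search(s)]

print(three_words)     # ['buy red shoes', 'buy red-shoes']
print(long_tail_hits)  # ['how to rewrite urls with htaccess']
print(social_hits)     # ['facebook.com', 'm.facebook.com', 't.co', 'microsoft.com']
```

Note the last result: because the segment pattern is not anchored, ‘t\.co’ also matches inside ‘microsoft.com’. Anchoring the short alternatives (for example ^t\.co$) is one way to avoid such false positives.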




