Learning Regex

I am trying to learn more about the basics of regex because creating a good markdown parser kind of requires it. Here are some notes on the subject.

DOWNLOAD MARKDOWN

2 541

Learning Regex

Why Try to Learn Regex Better

One of my main uses of ChatGPT is generating regex expressions.
This typically goes pretty well, though sometimes the generator misses on edge cases which are hard for me to think about when I am focused on just getting the Regex working
I think it would be a lot faster for me tro just learn Regex and be able to quickly write simple expressions myself.
- Most of the time, the regex expressions that I need to write are pretty simple, so I don't think I need to be a master at it.
Parsing markdown (for converting from markdown to HTML) relies heavly on Regex pattern matching, so I wanted to learn this before completing the markdown to HTML functionality that I am working on with markedjs

How I am Attempting to Learn Regex Better

I am using this youtube video to learn regex better at first.
I created an account on regexr.com which I might use to save regex expressions, or I might just create a Table for it in the database
This is the MDN resource that talks about Regex

Notes

Regular expression goes /(PATTERN)/(FLAGS)
A string is a regex, just an exact search regex, except in the case of the special characters seen below

  const a = /html/ // Regex searches for exact matches on html

Special (Meta) Characters
- .: Matches on a single instance of any characters
- *: Look for zero or more instances of whatever the preceding regular expression is

  const a = /.*/ // Regex  matches every instance of all characters

?: For any pattern that is followed by a question mark, make the pattern option

  const a = /&lt;\/?html&gt;/ // Regex matches opening and closing &lt;html&gt; tags

= |: Means "OR" -> match the pattern before the pipe or after the pipe, where | = "pipe"

  const a = /&lt;(h1|h2)&gt;/ // Regex matches opening h1 or h2 tags

+: Look for one or more
(): Called a capture group -> Allow us to create a sub regex and store whatever is captured in that sub regex in memory so that we can reference it later
- Note that the ?: in the below example prevents the capture group from being stored in memory
- The ?<innerText> in the below example names the capture group $innerText, whereas the capture groups are typically given names (references) 1, 2, and so on...

  const a = /&lt;(?:h1|p)&gt;(?&lt;innerText&gt;.*?)&lt;\/(?:h1|p)&gt;/ // Regex matches everything inside &lt;h1&gt; and &lt;p&gt; tags

[]: Called Character Classes: anything that falls within the bracket the regex will attempt to match on
- [^PATTERN]: Negated Character Class: Matches anything except whatever is the PATTERN

  const a = /[\d\w]/ // Matches on any letter or digit
  const b = /[\d\w]+/ // Matches on any number of characters or digits
  const c = /[0-5]+/ // Matches on any number of instances of digits from 0 to 5
  const d = /[a-eX-Z]+/ // Matches on any number of instances of characters a-e or X, Y, and Z 
  const e = /&gt;[^&lt;]+&lt;/ // Matches on everything between the right arrow and the left arrow
  const f = /(\A.*?)&lt;a/ // matches all HTML up the first anchor tag

-: denotes range. See examples above of matching on a range of digits . a range of characters
\: You can use \index to reference previous matches of subgroups with Back References
- In the example below, the \1 is a reference to the subgroup (\w{3}-)

  const a = /(\w{3}-).*?\1/ // Look for three letters followed by a hyphen followed by three letters followed by a hyphen with any characters in between the two matches of the pattern - this first match will be the only match of the entire pattern.

Shorthands
- \w: matches on all letters
- \d: matches on all digits
- \s: matches on all spaces
Anchors
- ^: Denotes the start of a line
- $: Denotes the end of a line
- \A: Denotes the start of the entire text/document
  - ^ will match the start of every line in the document/text
- \Z: Denotes the end of the entire text/document
Repetition / Range
/\d{2}/ will match 2 consecutive digits
/a{2}/ will match two consecutive a's
/\d{3,5}/ will match numbers that are between 3 and 5 characters in length
Will match 123 and 12345, but not 12 or 1 or 123456

Flags

i: case insensitive

const a = /[a-z]/gi; // Match all characters a-z (case insensitive)

g global

Search And Replace

  const a = /(\w+)\s+(\w+)\s+(\w+)/ // Get the first three words
  /* $3 $2 $1 -&gt; rearrange the first three words using capture groups */

Positive Look Ahead - Find a regular expression that is immediately followed by another expression but only include the initial regular expression in the matching. The positive look ahead is denoted ?= in the Regex expression below

  const a = /\w(?=\s)/ // Look for characters that are immediately followed by a space

Negative Look Around - denoted by the ?!

  const a = /&gt;(?!\s)/ // Look for a "&gt;" character that is not immediately followed by a space

  const a = /^(?!.*(?:html|body)).*?$/ // Matches all lines that do not contain the word HTML or body

Email Example

const a = /^.*?@.*?\.\w+/ // (starts with and contains any characters)@(any letters).any characters

Javascript
javascript javascript "string".match() returns an array of matches
javascript javascript /\w+/.test("string") returns a boolean whether the string contains any matches of the regular expression

var pattern = new RegExp(/\w+/,"gi");
var user = "Tim";
var pattern2  new RegExp("name: ".concat(user),"gi");

Python

import re
pattern = re.compile('[a-z]+')
pattern.findall("This is my test sentence") # -&gt; ["his", "is", "my", "test", "sentence"]
pattern = re.compile('[a-zA-Z]+')
pattern.findall("This is my test sentence") # -&gt; ["This", "is", "my", "test", "sentence"]

User Comments

There are currently no comments for this article.