Converting a Python script to a Sed script

Problem description

I am migrating from Pelican to Hugo, and I need to rewrite some parts of my Markdown files.

The article header, from this:

Title: Threads
Date: 2017-03-08 04:30:25
Modified: 2017-03-08 03:40:17
Category: Unix
Tags: c,
Slug: an-overwiew-on-threads
Authors: Nsukami
Summary: A long, thin strand of cotton, nylon, or other fibres used in sewing or weaving.
Lang: en

to this:

---
title: Threads
date: 2017-03-08 04:30:25
lastmod: 2017-03-08 03:40:17
categories:  ['Unix']
tags:  ['c',]
slug: an-overwiew-on-threads
summary: A long, thin strand of cotton, nylon, or other fibres used in sewing or weaving.
---

And the internal links, from this:

[processes]({filename}/on-processes.md)
[Threads]({filename}/images/threads-example.gif)

to this:

[processes]({{< ref on-processes >}})
[Threads](/images/threads-example.gif)

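I already have an Awk script that converts the header. I also have this Python script, which I would like to rewrite with Sed or Awk (mainly for learning purposes):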

import os
import re
import sys
import glob
import shutil
from tempfile import mkstemp

def replace(file_path):
    #Create temp file
    fh, abs_path = mkstemp()
    with open(fh,'w') as new_file:
        with open(file_path, 'r') as old_file:
            for line in old_file:
                # search for something like
                # this: [processes]({filename}/on-processes.md)
                # or this: [Threads]({filename}/images/threads-example.gif)
                match = re.search( r'(?:!| )?((\[.*\])\((\{filename\})/(.*)\.(.*)\)(?:.|,| )?)', line, re.M|re.I)
                if match:
                    old_link = match.group(1)
                    if match.group(5).startswith("md"):
                        # if md file, replace by 
                        # [processes]({{< ref on-processes >}})
                        new_link = match.group(2)+'({{< ref '+match.group(4)+' >}})'
                    else:
                        # if image, replace by 
                        # [Threads](/images/threads-example.gif)
                        new_link = match.group(2)+'(/'+match.group(4)+'.'+match.group(5)+')'
                    line = line.replace(old_link, new_link)
                new_file.write(line)

        #Remove original file
        os.remove(file_path)
        #Move new file to old file
        shutil.move(abs_path, file_path)

if __name__ == '__main__':
    # for all md files in content folder
    for i, article in enumerate(glob.iglob("./content/post/*.*", recursive=True)):
        # replace all internals links
        replace(article)

I tried using Sed, just to capture and print all the internal links that should be converted:

~/nskm
>> sed -n /^.*\[[[:alnum:]]\]\({filename}.*\)/p on-threads.md
I've written something about [processes]({filename}/on-processes.md). Let's write some notes about threads. This is not really an introduction
to threads. It's more like a little bit of introspection, so we can have an interesting perspective of what are threads.<br><br>              
Processes each have their own address space. Threads exist as subsets of a process. Threads are just multiple workers in the same [virtual address space]({filename}/on-processes.md#vas), all threads in a process share the same memory. They can also share open files and other resources. Within that VAS, each thread has its own ID, its own stack, its own program counter, its own independent flow of control, its own registers set. A thread is just a **context of execution**.<br>
![Threads]({filename}/images/threads-example.gif)
[Processes]({filename}/on-processes.md) are created with the [fork()](http://man7.org/linux/man-pages/man2/fork.2.html) system call. However, there is a separate system call, named [clone()](http://man7.org/linux/man-pages/man2/clone.2.html) which is used for creating threads. It works like fork(), but it accepts a number of flags for adjusting its behavior so the child can share some parts of the parent's execution context.
Our c script made a call to clone(), twice. And looking at _some_ of the flags that have been passed, we can see that:                        
![insert breakpoints]({filename}/images/insert-breakpoints.png)
![info proc mappings]({filename}/images/info-proc-map1.png)
![info proc mappings]({filename}/images/info-proc-map2.png)
![thread's stack]({filename}/images/stacks.png)

I also tried to capture all the internal links to be converted, this time using Awk:

~/nskm
>> awk '/.*(filename}.*\..*)/' on-threads.md  # regex not precise enough
I've written something about [processes]({filename}/on-processes.md). Let's write some notes about threads. This is not really an introduction
to threads. It's more like a little bit of introspection, so we can have an interesting perspective of what are threads.<br><br>              
Processes each have their own address space. Threads exist as subsets of a process. Threads are just multiple workers in the same [virtual address space]({filename}/on-processes.md#vas), all threads in a process share the same memory. They can also share open files and other resources. Within that VAS, each thread has its own ID, its own stack, its own program counter, its own independent flow of control, its own registers set. A thread is just a **context of execution**.<br>
![Threads]({filename}/images/threads-example.gif)
[Processes]({filename}/on-processes.md) are created with the [fork()](http://man7.org/linux/man-pages/man2/fork.2.html) system call. However, there is a separate system call, named [clone()](http://man7.org/linux/man-pages/man2/clone.2.html) which is used for creating threads. It works like fork(), but it accepts a number of flags for adjusting its behavior so the child can share some parts of the parent's execution context.
Our c script made a call to clone(), twice. And looking at _some_ of the flags that have been passed, we can see that:                        
![insert breakpoints]({filename}/images/insert-breakpoints.png)
![info proc mappings]({filename}/images/info-proc-map1.png)
![info proc mappings]({filename}/images/info-proc-map2.png)
![thread's stack]({filename}/images/stacks.png)

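I am stuck on how to tell sed/awk not only to print those internal links, but to replace them. If possible, I would also like to make my sed regex and my awk regex more precise. Thanks for any comments and suggestions on how to continue from this point.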

Tags: python, awk, sed

Solution
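
One possible way to do the replacement with sed, offered as a sketch rather than a definitive answer. It uses two `s///g` substitutions: the first rewrites `.md` targets into the `ref` shortcode, the second strips `{filename}` from whatever internal links remain (images and other files). The function name `convert_links` is made up for illustration, and `-E` (extended regular expressions) is assumed to be available, as it is in GNU and BSD sed. Like the Python script in the question, the first expression drops any `#fragment` after `.md`. Using `[^)]*` instead of `.*` keeps each match inside its own parentheses, so several links on one line are converted independently.

```shell
#!/bin/sh
# convert_links (hypothetical helper name): reads Markdown on stdin and
# writes it to stdout with Pelican {filename} links rewritten for Hugo.
convert_links() {
    sed -E \
        -e 's/\]\(\{filename\}\/([^)#]*)\.md(#[^)]*)?\)/]({{< ref \1 >}})/g' \
        -e 's/\]\(\{filename\}(\/[^)]*)\)/](\1)/g'
}

# Example: an article link and an image link on the same line.
printf '%s\n' 'See [processes]({filename}/on-processes.md) and ![Threads]({filename}/images/threads-example.gif).' | convert_links
# prints: See [processes]({{< ref on-processes >}}) and ![Threads](/images/threads-example.gif).
```

To rewrite the files themselves, the same two `-e` expressions can be given to `sed -E -i.bak` followed by the file names; the `-i.bak` spelling, with the suffix attached, keeps a backup copy and is accepted by both GNU and BSD sed. To only print the lines that would change, an address such as `sed -n -E '/\]\(\{filename\}\/[^)]*\)/p'` is a tighter filter than the one tried in the question.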

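For an Awk version of the link rewrite asked about in the question, here is a minimal sketch under some assumptions. POSIX awk's `sub` and `gsub` cannot refer back to capture groups, so this walks each line with `match()`, `RSTART`, and `RLENGTH` and rebuilds it piece by piece. The function name `convert_links_awk` is hypothetical, and the `#fragment` after `.md` is dropped, mirroring what the Python script does.

```shell
#!/bin/sh
# convert_links_awk (hypothetical helper name): reads Markdown on stdin and
# writes it to stdout with Pelican {filename} links rewritten for Hugo,
# using only POSIX awk features.
convert_links_awk() {
    awk '
    {
        rest = $0
        out = ""
        # Handle one ]({filename}/...) target per iteration.
        while (match(rest, /\]\([{]filename[}]\/[^)]*\)/)) {
            # 13 is the length of the prefix "]({filename}/"; the final ")" is dropped too.
            target = substr(rest, RSTART + 13, RLENGTH - 14)
            if (target ~ /\.md(#.*)?$/) {
                sub(/\.md(#.*)?$/, "", target)      # drop ".md" and any #fragment
                link = "]({{< ref " target " >}})"
            } else {
                link = "](/" target ")"             # images and other files
            }
            out = out substr(rest, 1, RSTART - 1) link
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }'
}
```

Lines without a `{filename}` link never enter the loop and are printed unchanged, so the function can be run over a whole article, for example `convert_links_awk < on-threads.md > on-threads.new.md`.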
