君有此病何年瘳

默寫，高考有八分，如果手寫是一種噁心，這個，無疑就是罪惡了。畢竟，從馴化的意義上，沒有比默寫更高效的玩意兒了。

所以，趁五一假日，勞之動之，整之合之。搭建完高考必背古诗文網站：

1、不做任何注释，仅解决高考默写需背诵的全部文字。

2、新课标与课本要求背诵的文字，取合集。

3、课标规定具体范围课本无要求者，取合集；同时将课标规定者标蓝。

4、课标未规定具体范围课本标注者，取合集；同时将课本规定者标红。

5、文字以现行部编本教材为准，课标有但现行部编本无的文字，以此前人教版教材为准。文本直接复制自现行官方教材，由GPT-4删除注释符号。

6、因默写为手写，故网页使用简体书法字体。

7、网站使用Googlesites，网址：高考古诗文 2 ，（需翻墙）
自定义域名：https://mf.bdfzer.com 2 （无需翻墙）

1.0版本今日發布，此后逐步更新完善，对已考内容将逐步做标记，再附历年默写真题。

各個網頁是一個個做的體力活，做完後順手爬蟲下自己，以便轉場。小爬蟲代碼公開：

import requests

from bs4 import BeautifulSoup

from urllib.parse import urljoin, urlparse

import os

import time

import html2text

def get_all_website_links(url, domain_name, visited, max_depth=3):

if max_depth <= 0 or url in visited:

return set()

visited.add(url)

urls = set()

try:

response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=10)

response.raise_for_status()

# time.sleep(1) # 可以调整或删除这行，根据目标网站的承载能力

except requests.RequestException as e:

print(f"请求错误 {url}: {e}")

return urls

soup = BeautifulSoup(response.text, 'html.parser')

for a_tag in soup.findAll("a"):

href = a_tag.attrs.get("href")

if href == "" or href is None:

continue

href = urljoin(url, href)

parsed_href = urlparse(href)

href = parsed_href.scheme + "://" + parsed_href.netloc + parsed_href.path

if href not in visited and domain_name in href:

urls.add(href)

urls.update(get_all_website_links(href, domain_name, visited, max_depth-1))

return urls

def scrape_website(url):

domain_name = urlparse(url).netloc

visited = set()

visited_urls = get_all_website_links(url, domain_name, visited)

converter = html2text.HTML2Text()

converter.ignore_links = False

all_content = ""

folder_path = 'mf'

if not os.path.exists(folder_path):

os.makedirs(folder_path)

counter = 0

for link in visited_urls:

print(f"正在处理：{link}")

counter += 1

if counter % 10 == 0:

print(f"已处理 {counter} 个链接...")

try:

response = requests.get(link, headers={'User-Agent': 'Mozilla/5.0'}, timeout=10)

response.raise_for_status()

soup = BeautifulSoup(response.text, 'html.parser')

title = soup.find('title').text.replace('/', '|')

markdown_content = converter.handle(response.text)

filename = f"{folder_path}/{title}.md"

with open(filename, 'w', encoding='utf-8') as file:

file.write(markdown_content)

all_content += f"# {title}\n\n{markdown_content}\n\n"

except requests.RequestException as e:

print(f"处理链接时出错 {link}: {e}")

with open(f'{folder_path}/all_content.md', 'w', encoding='utf-8') as file:

file.write(all_content)

start_url = "https://mf.bdfzer.com/"

scrape_website(start_url)

Search This Blog

君有此病何年瘳

Comments

Post a Comment