Converting textbooks to JSON: too labor-intensive, giving up for now.

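Hmm. About Lu Yu's talk, 《生成式人工智能驱动的教育创新与实践》 ("Generative AI-Driven Educational Innovation and Practice"): peddling secondhand, somewhat dated popular science is one thing, but going straight into selling his own so-called lesson-preparation system under false colors really is too much. A whole room of people sat there watching this... sigh. You think it's about AI, but he's actually hawking a product. That's just the norm in the domestic ecosystem, fair enough.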

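The talk had little value, so I tinkered with the textbook data instead. GPT has an official app, and I fed it the K12 Chinese-language txt to see whether it could produce a better textbook JSON file.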

The data generated online could not be downloaded, so I switched to working on it locally.

import json
import re

import nltk
from nltk.tokenize import sent_tokenize

nltk.download('punkt')

# Path of the input text file
text_file_path = '/Users/ylsuen/Desktop/txt/k12chinese.txt'

# Path of the output JSON file
json_file_path = '/Users/ylsuen/Desktop/txt/k12chinese.json'

# Top-level dictionary that holds the JSON structure
data = {
    "title": "一年级语文上册",
    "content": []
}

# Read the text file
with open(text_file_path, 'r', encoding='utf-8') as file:
    lines = file.readlines()

current_subsections = []
current_title = None

for line in lines:
    line = line.strip()

    # Skip blank lines
    if not line:
        continue

    # Detect the start of a new section (adjust this logic to the actual content)
    if re.match(r'^\d+|◎|第.+?单元', line) or len(line) < 10:
        # Save the previous section
        if current_title:
            data['content'].append({
                "title": current_title,
                "subsections": current_subsections
            })
            current_subsections = []
        # Update the current section title
        current_title = line
    else:
        # Use NLTK to split the paragraph into sentences
        sentences = sent_tokenize(line)
        current_subsections.extend(sentences)

# Don't forget to add the last section
if current_title:
    data['content'].append({
        "title": current_title,
        "subsections": current_subsections
    })

# Write the dictionary to the JSON file
with open(json_file_path, 'w', encoding='utf-8') as json_file:
    json.dump(data, json_file, ensure_ascii=False, indent=4)

print(f"Text file converted to JSON and saved as {json_file_path}")

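The result... unusable. Refining this file still seems to need a lot of manual work. Since the plain txt can also be used in the AI backend, I'm giving up for now.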
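One likely weak spot, noted here as an untested guess: `sent_tokenize` defaults to an English Punkt model, which as far as I know does not treat full-width marks such as 。！？ as sentence boundaries, so Chinese paragraphs probably come through unsplit. Below is a minimal sketch of a regex-based alternative. The helper name `split_chinese_sentences` and the choice of terminators and closing quotes are assumptions for illustration, not part of the original script.

```python
import re

# One sentence = a run of non-terminator characters, then any terminators,
# then any closing quotes or brackets that directly follow them.
# The terminator and closer sets are assumptions; extend them as needed.
_SENTENCE = re.compile(r'[^。！？!?]+[。！？!?]*[”’」』）)]*')


def split_chinese_sentences(text):
    """Hypothetical helper: split text at Chinese or ASCII sentence terminators."""
    return [m.strip() for m in _SENTENCE.findall(text) if m.strip()]


sample = '小鸟说：“你好！”我也说好。天亮了'
print(split_chinese_sentences(sample))
# → ['小鸟说：“你好！”', '我也说好。', '天亮了']
```

In the script, `sentences = split_chinese_sentences(line)` could stand in for the `sent_tokenize(line)` call. It does nothing for the crude title detection (any line under 10 characters counts as a heading), which is probably where most of the manual work would still go.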
