File Parsing

Opening files:

open(path, mode, encoding=...) returns a file object
Always use a with block — it closes the file automatically, even on error
Modes: r read, w write (truncate), a append, x create-only, b binary (e.g. rb), + read/write

with open("data.txt", "r", encoding="utf-8") as f:
    content = f.read()
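
A minimal sketch of how the w, a and x modes differ. The scratch directory and the file name log.txt are made up for illustration, so nothing real is touched.

```python
import os
import tempfile

# Hypothetical scratch location for this sketch.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "log.txt")

# "x" creates the file and refuses to overwrite an existing one.
with open(path, "x", encoding="utf-8") as f:
    f.write("first\n")

try:
    with open(path, "x", encoding="utf-8") as f:
        f.write("never written\n")
    created_twice = True
except FileExistsError:
    created_twice = False

# "a" appends to the end instead of truncating.
with open(path, "a", encoding="utf-8") as f:
    f.write("second\n")

with open(path, "r", encoding="utf-8") as f:
    after_append = f.read()

# "w" truncates: the earlier contents are gone.
with open(path, "w", encoding="utf-8") as f:
    f.write("replaced\n")

with open(path, "r", encoding="utf-8") as f:
    after_write = f.read()
```

Reach for x when clobbering an existing file would be a bug; w silently discards whatever was there.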

Reading text:

read() loads the whole file as one string; readlines() gives a list of lines (keeping \n)
Prefer iterating the file object for large files — it streams line by line instead of loading everything

with open("data.txt", encoding="utf-8") as f:
    whole = f.read()            # entire file as one string
    # or, instead of read() (a second full read on the same handle returns nothing):
    lines = f.readlines()       # list of lines, each keeping its "\n"

with open("data.txt", encoding="utf-8") as f:     # memory-friendly: stream line by line
    for line in f:
        process(line.rstrip("\n"))
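
The streaming pattern can be wrapped in a small helper. This is only a sketch: count_matching, the demo file and its contents are hypothetical names chosen for the example.

```python
import os
import tempfile

def count_matching(path, needle):
    """Count lines containing needle, holding only one line in memory at a time."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line in f:              # the file object yields one line per iteration
            if needle in line:
                count += 1
    return count

# Hypothetical demo file so the sketch is runnable on its own.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.log")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write("ok start\nERROR disk full\nok retry\nERROR timeout\n")

matches = count_matching(demo_path, "ERROR")
```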

Writing text:

with open("out.txt", "w", encoding="utf-8") as f:
    f.write("one line\n")
    f.writelines(["a\n", "b\n"])
    print("via print", file=f)
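
One detail worth seeing in action: writelines() does not add line endings, while print() does. A small sketch, with a made-up file name and item list:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "out.txt")  # hypothetical scratch file
items = ["alpha", "beta", "gamma"]

with open(path, "w", encoding="utf-8") as f:
    f.writelines(items)                           # no separators: "alphabetagamma"
    f.write("\n")
    f.writelines(item + "\n" for item in items)   # add the newlines yourself
    print("total", len(items), file=f)            # print appends "\n" for you

with open(path, encoding="utf-8") as f:
    written = f.read()
```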

Parsing common formats:

CSV → stdlib csv module (handles quoting / embedded commas correctly)

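import csv
with open("data.csv", newline="", encoding="utf-8") as f:
    for row in csv.reader(f):       # row is a list of strings
        ...

# with a header row (re-open: the first loop already exhausted f):
with open("data.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):   # row is a dict keyed by column name
        ...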
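
Writing goes through the same module. The sketch below round-trips a couple of made-up records with csv.DictWriter and csv.DictReader to show that a field containing a comma gets quoted on the way out and survives the way back in.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "people.csv")  # hypothetical scratch file
records = [
    {"name": "Smith, Jane", "city": "Oslo"},   # embedded comma on purpose
    {"name": "Lee", "city": "Hanoi"},
]

with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "city"])
    writer.writeheader()
    writer.writerows(records)

with open(path, newline="", encoding="utf-8") as f:
    loaded = list(csv.DictReader(f))

with open(path, newline="", encoding="utf-8") as f:
    raw = f.read()   # the text on disk, to inspect the quoting
```

Splitting each line on "," by hand would break the first record in two, which is the main reason to use the module.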

JSON → json module

import json
with open("data.json", encoding="utf-8") as f:
    data = json.load(f)         # file -> Python object
with open("out.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2)

data = json.loads(text)         # string -> object ; json.dumps(obj) for the reverse
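
Malformed input raises json.JSONDecodeError, which carries the position of the problem. A hedged sketch of one way to surface it; parse_json is a hypothetical helper, not part of the json module.

```python
import json

def parse_json(text):
    """Return (data, None) on success or (None, message) on malformed input."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        # lineno, colno and msg are attributes of JSONDecodeError
        return None, f"line {exc.lineno}, column {exc.colno}: {exc.msg}"

good, good_err = parse_json('{"a": [1, 2, 3]}')
bad, bad_err = parse_json('{"a": [1, 2,')      # truncated on purpose
```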

Whitespace / columns → str.split() and str.strip()

with open("nums.txt", encoding="utf-8") as f:
    rows = [list(map(int, line.split())) for line in f]
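
The one-liner above turns blank lines into empty rows and fails without context on a bad token. A slightly more defensive sketch follows; parse_int_rows is a hypothetical helper, and treating "#" lines as comments is an assumption, not something the input format guarantees.

```python
def parse_int_rows(lines):
    """Parse whitespace-separated integers, skipping blank and '#' comment lines."""
    rows = []
    for lineno, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        try:
            # split() with no argument handles runs of spaces and tabs
            rows.append([int(tok) for tok in stripped.split()])
        except ValueError as exc:
            raise ValueError(f"line {lineno}: {exc}") from exc
    return rows

# Any iterable of lines works, including an open file object.
sample = ["1 2 3\n", "\n", "# comment\n", "  4\t5   6  \n"]
rows = parse_int_rows(sample)
```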

Paths (pathlib):

from pathlib import Path
p = Path("data") / "file.txt"
text = p.read_text(encoding="utf-8")   # one-shot read
p.write_text("hello", encoding="utf-8")  # one-shot write
p.exists(); p.suffix; p.stem
list(p.parent.glob("*.csv"))           # find files by pattern
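
Putting these pieces together, a minimal sketch that lists CSV files in a directory and writes a summary next to them. The temporary directory, the file names and the reports folder are all invented for the example.

```python
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())   # hypothetical scratch directory
(root / "a.csv").write_text("x,y\n1,2\n", encoding="utf-8")
(root / "b.csv").write_text("x,y\n3,4\n", encoding="utf-8")
(root / "notes.txt").write_text("ignore me\n", encoding="utf-8")

csv_files = sorted(root.glob("*.csv"))      # glob order is not guaranteed, so sort
names = [p.name for p in csv_files]
stems = [p.stem for p in csv_files]
suffixes = {p.suffix for p in root.iterdir()}

out = root / "reports" / "summary.txt"
out.parent.mkdir(parents=True, exist_ok=True)   # create missing folders first
out.write_text("\n".join(names) + "\n", encoding="utf-8")
summary = out.read_text(encoding="utf-8")
```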

Gotchas:

Always pass encoding="utf-8" for text: the default comes from the locale and differs between platforms (often cp1252 on Windows)
Use newline="" when opening files for the csv module (avoids blank rows when writing on Windows and keeps newlines inside quoted fields intact)
read() / readlines() load the whole file into memory — iterate for large files
Binary mode (rb) returns bytes, not str
A file object is exhausted after one full read — re-open() or f.seek(0) to read again
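
Two of these gotchas are easy to check directly. A small sketch with a made-up scratch file, showing the exhausted handle, the seek(0) fix, and bytes versus str:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt")  # hypothetical scratch file
with open(path, "w", encoding="utf-8") as f:
    f.write("héllo\nworld\n")

with open(path, encoding="utf-8") as f:
    first = f.read()
    second = f.read()    # handle is at end of file, so this is ""
    f.seek(0)            # rewind to the start
    third = f.read()     # same content as the first read

with open(path, "rb") as f:
    raw = f.read()       # bytes: "é" is stored as two bytes in UTF-8

decoded = raw.decode("utf-8")   # explicit decode turns bytes back into str
```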

Python · PyTest

StrixTheKiet Notes
