“I expected a clean table, but my CSV file wrote a novel in cell C3.”
– Me, one dark evening
We’ve all been there. You crack open a CSV file expecting a tidy, tabular layout, and boom – cell contents are suddenly poets, breaking into multiple lines like they’re performing at open mic night.
Imagine this beauty:
Col1;col2;"col3 l1
Col3 l2
Col3 l3";col4;col5
Or even worse:
Col1;col2;"col3 l1"
"Col3 l2"
"Col3 l3";col4;col5
It’s like Excel (or whatever poisonous program you use) got drunk and started mixing Shakespeare into your sales data.
So what’s going on?
CSV files can handle newlines inside fields – if you wrap those fields in quotes! The problem? Not every tool handles that gracefully. Sometimes the quotes are inconsistent. Sometimes every line of the multi-line value has quotes. Sometimes, you wonder if JSON was really the better choice all along.
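For reference, a spec-compliant parser like Python's built-in csv module will happily read a properly quoted multi-line field as a single logical row. A minimal sketch, using the semicolon-delimited layout from the examples above:

import csv
import io

# A well-formed CSV: the line breaks live inside a quoted field, so a
# compliant parser treats the three physical lines as one logical row.
raw = 'Col1;col2;"col3 l1\nCol3 l2\nCol3 l3";col4;col5\n'

rows = list(csv.reader(io.StringIO(raw), delimiter=';'))
print(len(rows))          # 1 -- one logical row despite three physical lines
print(repr(rows[0][2]))   # 'col3 l1\nCol3 l2\nCol3 l3'

The trouble starts when the file isn't that well-formed, or when the tool reading it insists on treating every physical line as a new row.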
But fear not: I’ve got a Python script that’ll whip those multi-line rebels back into shape.
Python to the rescue
Here’s a little Python script that does the dirty work; it has saved me in more than one situation in my daily work. The script is really simple and will:
- Detect when a field spans multiple lines
- Merge it all into one line
- Make sure there’s only one pair of quotes around the cleaned-up string
- Leave your CSV nice, parseable, and drama-free
def clean_multiline_csv(input_file, output_file):
    cleaned_rows = []
    current_row = ""
    inside_multiline = False

    with open(input_file, encoding='utf-8') as infile:
        for line in infile:
            line = line.rstrip('\n')
            if not inside_multiline:
                current_row = line
                # An odd number of quotes means a quoted field was opened
                # but not closed on this line: the row continues below.
                if current_row.count('"') % 2 == 1:
                    inside_multiline = True
            else:
                # Glue the continuation line onto the open row, inserting a
                # literal backslash-n marker in place of the swallowed newline.
                current_row += "\\n" + line.strip()
                if current_row.count('"') % 2 == 0:
                    inside_multiline = False
                    # If the merged field collected more than one pair of quotes
                    # (each original line was quoted on its own), strip them all,
                    # turn the markers into spaces, and re-wrap the cleaned-up
                    # value in a single pair of quotes.
                    parts = current_row.split(';')
                    for i, part in enumerate(parts):
                        if part.count('"') > 2:
                            parts[i] = '"' + part.replace('"', '').replace("\\n", " ") + '"'
                            break
                    current_row = ';'.join(parts)
            if not inside_multiline:
                cleaned_rows.append(current_row)

    with open(output_file, 'w', encoding='utf-8', newline='') as outfile:
        for row in cleaned_rows:
            outfile.write(row + '\n')


if __name__ == "__main__":
    clean_multiline_csv('input.csv', 'output.csv')
Stick that in a .py file, run it on your messy CSV, and rejoice as the chaos is compressed back into order.
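If you want to double-check the result, a quick sketch like this (assuming the output.csv name and semicolon delimiter from the script above) will show whether every line now parses as a single row with the expected column count:

import csv

# Sanity check: every physical line should now parse as exactly one row.
with open('output.csv', encoding='utf-8', newline='') as f:
    for row in csv.reader(f, delimiter=';'):
        print(len(row), row)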
Conclusion
Parsing CSV files should be simple—but when multi-line values sneak in, things can spiral fast. This Python script gives you a quick fix, so your columns stop getting all emotional mid-row.
Feel free to fork it, modify it, or just scream at it in cathartic frustration. Either way—happy data wrangling!
It’s also available here: https://gist.github.com/kimusan/db7b74ec9f413afd990ed58c2394011b