Overwrite row in a text file

Fede · 22 October 2022 22:03

Good evening group. I am trying to manage a text file. So far I have managed to write in append mode and read one line at a time. All right then. Now I was wondering: what if I wanted to overwrite, for example, the second line?
Is there a particular function for that, or do I have to implement a loop, figure out when I am on the second line, and write / overwrite it?

bgrommes · 22 October 2022 22:19

The latter.

I assume you’re talking about a standard variable-length record text file with an end-of-line sequence at the end of each row / record? Those aren’t designed to be updated in-place since the length of each row can change.

You have to either literally or effectively rewrite the entire file in order to update a single row.

If you load each row in-memory in an array of strings or similar structure, you can address the line you want to update by array element and just change the one line, but to reflect the change in the physical file you’d still have to overwrite the entire file. For probably up to a few hundred rows on modern hardware this won’t take more than a second, but it’s a lot of I/O just the same.
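A minimal sketch of that approach, in Python purely for illustration (the thread is about Xojo, and `replace_line` is a made-up name): read every row into an array, change one element, then write the whole file back.

```python
from pathlib import Path


def replace_line(path: str, line_number: int, new_text: str) -> None:
    """Replace one line (1-based) by rewriting the whole file."""
    file = Path(path)
    # Load every row into memory; splitlines() drops the line endings.
    lines = file.read_text(encoding="utf-8").splitlines()
    if not 1 <= line_number <= len(lines):
        raise IndexError(f"file has {len(lines)} lines, no line {line_number}")
    # Changing one array element is cheap ...
    lines[line_number - 1] = new_text
    # ... but the physical file still has to be rewritten in full.
    file.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
```

Calling `replace_line("notes.txt", 2, "new second line")` (file name hypothetical) would rewrite all rows even though only one changed, which is the I/O cost mentioned above.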

There are other issues with this if you have a present or likely future need to share the file between app instances or processes, so if this is something that needs to happen against thousands of rows or to be shared, you probably need to look for other ways to manage the data, most likely using some kind of database.

Back in the early 1980s I still occasionally used fixed width text files to store data and access them by record length and offset, but once databases became a thing, I never looked back. You quickly run into the need for indexing and sorting and locking that are far easier to manage through a database engine.
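For anyone curious what access by record length and offset looks like, here is a rough sketch in Python (not Xojo). The 16-byte record size and the function names are assumptions chosen only for the example.

```python
RECORD_LEN = 16  # hypothetical fixed record size in bytes, no line endings


def write_record(path: str, index: int, text: str) -> None:
    """Overwrite record `index` (0-based) in place in an existing file."""
    data = text.encode("ascii")
    if len(data) > RECORD_LEN:
        raise ValueError(f"record longer than {RECORD_LEN} bytes")
    # Pad with spaces so every record occupies exactly RECORD_LEN bytes.
    data = data.ljust(RECORD_LEN, b" ")
    with open(path, "r+b") as f:
        # The offset is computed, not searched for: index * record length.
        f.seek(index * RECORD_LEN)
        f.write(data)


def read_record(path: str, index: int) -> str:
    """Read record `index` (0-based) and strip the padding."""
    with open(path, "rb") as f:
        f.seek(index * RECORD_LEN)
        return f.read(RECORD_LEN).decode("ascii").rstrip(" ")
```

Because every record has the same length, an update never shifts its neighbours, which is what makes in-place updates safe here and unsafe for variable-length lines.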

npalardy · 22 October 2022 22:20

Pretty much #2, since text files aren’t indexed in any way
Plus, if you write less data than was on that line, you have issues (the old stuff you didn’t overwrite still exists)
Or if you write more, you have issues, as you will overwrite the end-of-line characters

Text files are just a long sequence of bytes that you and I and certain programs like text editors happen to read as a sequence of “lines”

imagine it like this (where • is an end of line)

abcdef•ghij•klmno•pqrstuvxyz

if you overwrite “line 2” - ghij - with 1234567 you end up with

abcdef•1234567mno•pqrstuvxyz

because you don’t “insert” a line, you just write some bytes
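The same effect can be reproduced with a small Python sketch (illustrative only, not Xojo). An in-memory `io.BytesIO` buffer stands in for a file opened for update, `"\n"` plays the role of the • above, and `overwrite_bytes` is a made-up helper name.

```python
import io


def overwrite_bytes(original: bytes, offset: int, new_bytes: bytes) -> bytes:
    """Write `new_bytes` at `offset`, as a write to a file opened for update would."""
    f = io.BytesIO(original)  # in-memory stand-in for a file opened in "r+b" mode
    f.seek(offset)
    f.write(new_bytes)        # replaces bytes in place; nothing is inserted or shifted
    return f.getvalue()


data = b"abcdef\nghij\nklmno\npqrstuvxyz"
# "line 2" starts at offset 7, right after "abcdef" and its line ending.
too_long = overwrite_bytes(data, 7, b"1234567")
too_short = overwrite_bytes(data, 7, b"12")
```

Here `too_long` comes out as `b"abcdef\n1234567mno\npqrstuvxyz"` (the line ending and the start of the next line are gone), and `too_short` as `b"abcdef\n12ij\nklmno\npqrstuvxyz"` (the old `ij` is still there).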

you do need to do something different if you want to manage the file as “a bunch of lines”

hope that helps

thorstenstueker · 22 October 2022 23:36

Mostly I go this way:

I define a String array and split my lines (on Chr(10)) into the array, and now I have many possibilities. I can exchange a line for another one, I can remove a line, and so on.

In the end I concatenate the String in a loop, taking care not to concatenate the lines I wanted to delete. That’s the way I go.
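Roughly, in Python rather than Xojo (a sketch only; `edit_lines` and its parameters are invented for the example, and `"\n".join` stands in for the concatenation loop):

```python
def edit_lines(text: str, replacements: dict[int, str], deletions: set[int]) -> str:
    """Split on chr(10), swap or drop lines by 0-based index, then rebuild the text."""
    lines = text.split("\n")  # "\n" is chr(10)
    kept = []
    for index, line in enumerate(lines):
        if index in deletions:
            continue  # skip the lines we want to delete
        # Use the replacement if one was given, otherwise keep the original line.
        kept.append(replacements.get(index, line))
    return "\n".join(kept)
```

For example, `edit_lines("a\nb\nc\nd", {1: "B"}, {2})` gives `"a\nB\nd"`.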

Bad_Wolf · 23 October 2022 06:05

To prevent data loss, first write the altered file under another name. Then, once that has succeeded, delete the old file and rename the new one to the old one’s name. That way, if an error occurs, you do not lose the original file.
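A sketch of that pattern in Python (illustrative, not Xojo; `safe_rewrite` is a made-up name). One deliberate difference from the steps above: instead of a separate delete and rename, it uses `os.replace`, which swaps the new file over the old name in a single call.

```python
import os
import tempfile


def safe_rewrite(path: str, lines) -> None:
    """Write `lines` to a temporary file, then swap it in place of `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so the rename stays on one volume.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as tmp:
            for line in lines:
                tmp.write(line + "\n")
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        # Only reached if every write succeeded; the original is untouched until now.
        os.replace(tmp_path, path)
    except BaseException:
        # Something failed: discard the partial copy and keep the original file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If writing fails halfway, the original file is still there under its own name; only the temporary copy is thrown away.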

Emile · 23 October 2022 08:09

Or use a SQLite database…
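To give an idea of what that could look like, here is a small sketch using Python’s built-in `sqlite3` module (not Xojo’s database classes). The table layout, one row per line keyed by line number, and all function names are assumptions made for the example.

```python
import sqlite3


def open_store(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a database with one row per text line."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lines ("
        "line_no INTEGER PRIMARY KEY, content TEXT NOT NULL)"
    )
    return conn


def load_lines(conn: sqlite3.Connection, lines) -> None:
    """Store the lines, numbered from 1."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO lines (line_no, content) VALUES (?, ?)",
            enumerate(lines, start=1),
        )


def update_line(conn: sqlite3.Connection, line_no: int, content: str) -> bool:
    """Overwrite a single line; returns False if that line does not exist."""
    with conn:
        cur = conn.execute(
            "UPDATE lines SET content = ? WHERE line_no = ?", (content, line_no)
        )
    return cur.rowcount == 1


def read_lines(conn: sqlite3.Connection) -> list[str]:
    """Return all lines in order."""
    return [row[0] for row in conn.execute("SELECT content FROM lines ORDER BY line_no")]
```

With this layout, overwriting the second line is a single `update_line(conn, 2, "new text")` and the other rows are not rewritten.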

thorstenstueker · 23 October 2022 08:13

That looks a bit over-engineered. Only a bit. I mean: what should come out of it? To store the single strings in the SQLite DB you still need to split them. However you do it, you need to separate them. And then you need to… it is ten times more work than simply using the way I wrote about. What do I gain with SQLite?

bgrommes · 23 October 2022 14:47

Concatenation is expensive. I would write the array to the file line-by-line. Let the framework add the line ending characters. Anything that reduces string allocations (particularly ephemeral allocations) is a good idea, all things being equal. Otherwise you have in memory the original array, plus a giant concatenated string of all the file contents in memory, when you could simply flow the one copy to the file line-by-line. One can argue that writing a long string all at once is faster from an I/O perspective and at times it might be – I don’t pretend to know when addressing I/O bound vs allocation bound approaches would actually (1) be faster and (2) matter – but I personally prefer to keep as few copies of any set of data as possible as a general architectural principle.
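As a rough illustration in Python (not Xojo; `write_lines` is a made-up name), writing line-by-line looks like the sketch below. `print(..., file=f)` appends the line ending, so no joined copy of the whole file is built first.

```python
def write_lines(path: str, lines) -> None:
    """Stream lines to the file one at a time instead of joining them first."""
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            # print() adds the line ending; text mode maps it to the platform's own.
            print(line, file=f)
```

`lines` can be the in-memory array or any iterable, including a generator that filters out deleted rows on the fly. The file object buffers the output, so this does not mean one physical write per line.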

This is also why I wish Xojo had a firehose cursor abstraction for read-only database operations; unless I’m missing something, everything is designed to go into a RowSet (mercifully that at least is somewhat lightweight) but often that is ephemeral; you copy it to a ListBox control or maybe a Dictionary and then discard it, or iterate the RowSet directly and discard it when it would be faster generally to just process each record as it comes in, without the need to retain all records in-memory. Aside from the memory and allocation costs, you have two iterations rather than one.

thorstenstueker · 23 October 2022 15:35

Good idea. But if the String is smaller than 100 MB I would not even think about it. Concatenation is expensive… the framework even more so. Do a few benchmarks and you may find that your idea is not the best one. It is more expensive.

thorstenstueker · 23 October 2022 15:37

I could also say: giving that advice to a simple beginner is overkill and not nice at all. You can also do it differently: write the String into a memory area, get the pointers to the unwanted string, and pull the area together, so nothing is left. Ha. No. Answering a beginner’s question like that shows me you have no idea what it costs or not, but you want to say something important. Be nice. Be polite. It is a beginner.

bgrommes · 23 October 2022 16:08

It’s not nice to suggest other possibilities? Wow. Just wow. Ok well I’ll take that on board next time I consider whether to spend time putting together a response to help someone then.

Torsten_B · 23 October 2022 22:18

@bgrommes suggests a way to solve the problem the OP described. There are certainly other ways that could be explored, too, as always in coding. The OP has their own brain to decide what will work. @bgrommes was kind enough to spend time writing a response with a suggestion for how that problem could be solved.

thorstenstueker · 23 October 2022 22:51

I define a String array and split my lines (on Chr(10)) into the array, and now I have many possibilities. I can exchange a line for another one, I can remove a line, and so on.

In the end I concatenate the String in a loop, taking care not to concatenate the lines I wanted to delete. That’s the way I go.

and I was not?

npalardy · 23 October 2022 23:25

as always there are many ways to skin a cat

sometimes the criteria to decide which way to use isn’t present in the original question

if the file is 5 lines I wouldn’t get hung up about reading it all in, splitting it, manipulating it in an array and spitting it back out

5000 lines I might start to consider alternatives

5000000 lines yeah I would

but that detail isn’t in the original post

bgrommes · 24 October 2022 00:29

Thanks. In general I consider that there tend to be many lurkers for every actual poster, and when a general principle can be put forth, that is what will help the most people over time. In my experience string concatenation / allocation can be a real performance sink as an app grows (doubtless I’m influenced by the fact that I’ve worked on a lot of text processing-heavy apps in my time) and it’s easy to keep this in mind when forming habits of coding and design without making code more complicated, even in the short run.

I also try to give advice I’d have appreciated getting in the same circumstance. I still remember what it was like to learn the craft. You don’t always see the tradeoffs. That comes with experience.

You are correct that the points I raised are only going to get noticed on hot paths, but as code evolves, you can never be sure what will end up on a hot path by being called thousands of times iterating through stuff. If you can avoid multiple copies of the same data all over the heap today, it will be less to mop up tomorrow.

In any case the OP is, as always, free to do whatever they think best. If they’re manipulating some self-limiting small file like a config file, then likely no one will ever notice the implementation was brute-forced and I’m not here to dictate anyone’s priorities or esthetic sense of their code or what feels most intuitive to them. I can only share my experiences and practices for what they are worth – generally, $0.02 plus inflation.