removing the last repeated lines in the text files
I have a text file as follows:
ALIC00AUS_R_20183350000.gz -4052052.7 4212835.9 -2545104.6
ALIC00AUS_R_20183350000.gz -4052052.7 4212836.0 -2545104.6
ALIC00AUS_R_20183350000.gz -4052052.7 4212836.0 -2545104.6
ALIC00AUS_R_20183350000.gz -4052052.7 4212835.9 -2545104.6
ALIC00AUS_R_20183350000.gz -4052052.5 4212836.0 -2545104.6
ALIC00AUS_R_20183350000.gz -4052052.6 4212835.9 -2545104.6
CPVG00CPV_R_20183460000.gz 5626883.4 -2380932.3 1824483.9
CPVG00CPV_R_20183460000.gz 5626883.4 -2380932.3 1824483.9
CPVG00CPV_R_20183460000.gz 5626883.3 -2380932.2 1824483.1
In this file, ALIC00AUS_R_20183350000.gz and CPVG00CPV_R_20183460000.gz repeat six and three times, respectively. I need to remove the last line of each group of lines that share the same string in column 1, so the output should be as follows:
ALIC00AUS_R_20183350000.gz -4052052.7 4212835.9 -2545104.6
ALIC00AUS_R_20183350000.gz -4052052.7 4212836.0 -2545104.6
ALIC00AUS_R_20183350000.gz -4052052.7 4212836.0 -2545104.6
ALIC00AUS_R_20183350000.gz -4052052.7 4212835.9 -2545104.6
ALIC00AUS_R_20183350000.gz -4052052.5 4212836.0 -2545104.6
CPVG00CPV_R_20183460000.gz 5626883.4 -2380932.3 1824483.9
CPVG00CPV_R_20183460000.gz 5626883.4 -2380932.3 1824483.9
text-processing
asked 11 hours ago by deepblue_86, edited 10 hours ago
Your question has some issues: The line "ALIC00AUS_R_20183350000.gz" repeats only 4 times in your example, then switches to "ALIC00AUS_R_20183370000.gz". Within the section that is "ALIC00AUS_R_20183350000.gz" there are two identical sets of strings. Are you looking for only unique strings?
– Charles Green
11 hours ago
Yes, I'm looking for only unique strings. I edited the question.
– deepblue_86
11 hours ago
Thanks! It seems either of the answers mentioned by @RaidPinata below will work for you. If the file is large, you may want to look more deeply into the links referenced from Stack Exchange, and use the sort command rather than awk
– Charles Green
10 hours ago
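(Editorial illustration, not from the thread:) the sort suggestion above would look something like the following for plain duplicate removal. Note that it reorders the lines and only handles exact duplicates, so it is not a drop-in replacement for the group-aware approaches in the answers below.
$ sort -u file > outputfile    # keep one copy of each exact duplicate line; output is sorted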
3 Answers
Awk is a go-to program for doing this kind of replacement.
To remove lines that have a duplicate in the first column, this should do it:
awk '!seen[$1]++' filename > outputfile
If you need to remove fully duplicated lines, use this instead:
awk '!seen[$0]++' filename > outputfile
As seen in this answer: https://unix.stackexchange.com/questions/171091/remove-lines-based-on-duplicates-within-one-column-without-sort
Here is a brief explanation. awk is used for pattern scanning and text processing. For each line it checks whether the value in column 1 ($1) has already been recorded in the array seen. If it hasn't, it prints the line to the output file (or to the screen, if you don't redirect with > outputfile). The ++ then increments the count stored for that value, so later lines with the same first column are skipped.
answered 11 hours ago, edited 10 hours ago by RaidPinata
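(Editorial illustration, not part of the answer:) on the sample file above, the first one-liner keeps only the first line for each value of column 1, which is why it does not produce the requested output:
$ awk '!seen[$1]++' file
ALIC00AUS_R_20183350000.gz -4052052.7 4212835.9 -2545104.6
CPVG00CPV_R_20183460000.gz 5626883.4 -2380932.3 1824483.9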
@RaidPinata, this doesn't solve the problem. The above code removes all the repeated lines except for the first one.
– deepblue_86
11 hours ago
Oops, wrong link! fixed
– RaidPinata
11 hours ago
@deepblue_86 when I ran this with a test file it removed the duplicates as requested. Do you need it to just remove full duplicate lines, or only ones that are a duplicate in the first column?
– RaidPinata
11 hours ago
That was part of my comment on his question also: the OP seems to be asking to remove only one line from each set of lines whose first field is identical.
– Charles Green
10 hours ago
@RaidPinata, I edited the question, could you check it now?
– deepblue_86
10 hours ago
If you are sure that each $1 (first column) is duplicated at least once, then you can:
- reverse the order of lines
- only select those lines whose $1 has been seen before
- reverse the result
Ex.
$ tac file | awk 'seen[$1]++' | tac
ALIC00AUS_R_20183350000.gz -4052052.7 4212835.9 -2545104.6
ALIC00AUS_R_20183350000.gz -4052052.7 4212836.0 -2545104.6
ALIC00AUS_R_20183350000.gz -4052052.7 4212836.0 -2545104.6
ALIC00AUS_R_20183350000.gz -4052052.7 4212835.9 -2545104.6
ALIC00AUS_R_20183350000.gz -4052052.5 4212836.0 -2545104.6
CPVG00CPV_R_20183460000.gz 5626883.4 -2380932.3 1824483.9
CPVG00CPV_R_20183460000.gz 5626883.4 -2380932.3 1824483.9
answered 10 hours ago by steeldriver
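(Editorial aside, not from the answer above:) if tac is not available, a similar effect can be had by reading the file twice with awk: the first pass counts how often each first-column value occurs, and the second pass prints a line only while its running count is below that total, so the last occurrence of each value is dropped (and, as with the tac version, values that occur only once are dropped entirely):
$ awk 'NR==FNR { total[$1]++; next } ++count[$1] < total[$1]' file file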
If you can meet the guarantee in the answer provided by @steeldriver, that is the better solution; if not, use this script.
#!/usr/bin/awk -f
# Assumes lines that share a value in column 1 are grouped together, as in
# the sample file. Each line is buffered and printed only once the next line
# of the same group arrives, so the last line of every group is dropped.
# Unlike the tac-based answer, a value that occurs only once is kept, since
# it has no "last repeated line" to remove.
{
    if (!seen[$1]++) {
        # start of a new group: keep the previous group's buffered line
        # if that group consisted of a single line
        if (NR > 1 && seen[prev] == 1) print line
        line = $0
        prev = $1
    }
    else {
        if (prev == $1) print line
        line = $0
    }
}
END { if (seen[prev] == 1) print line }
answered 10 hours ago, edited 7 hours ago by RaidPinata
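(Hedged usage note; the script name remove-last.awk is only an example, not from the thread:) save the script to a file and run it over the input with awk -f, redirecting to an output file if needed:
$ awk -f remove-last.awk file > outputfile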