Linux Commands: Removing Duplicate Lines with uniq

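The uniq command removes duplicate lines from a file. It only collapses adjacent duplicates, so to remove duplicates across the whole file the input must first be sorted with sort.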

Main use cases:

Removing duplicate lines (optionally ignoring case)

Counting how many times each line occurs

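The main options of uniq are:

-c	--count	Prefix each line with the number of times it occurs
-u	--unique	Print only lines that are not repeated
-d	--repeated	Print only repeated lines, one copy per group
-D	--all-repeated	Print every copy of the repeated lines
-i	--ignore-case	Ignore case when comparing
-w N	--check-chars=N	Compare only the first N characters of each line
-s N	--skip-chars=N	Skip the first N characters and compare only what follows
-f N	--skip-fields=N	Skip the first N fields and compare only what follows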
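
The comparison options -i, -s and -f are not demonstrated elsewhere in this article, so here is a minimal sketch of how they behave. The input lines are made up for illustration and are fed in with printf, so nothing here depends on an existing file.

```shell
# Hypothetical sample input, invented for this illustration.

# -i: compare case-insensitively, so "Red" and "red" count as the same line
# and only the first one seen is kept.
ignore_case=$(printf 'Red\nred\nblue\n' | uniq -i)
printf '%s\n' "$ignore_case"

# -s 2: skip the first 2 characters of each line before comparing,
# so the "1:" and "2:" prefixes are ignored.
skip_chars=$(printf '1:apple\n2:apple\n3:pear\n' | uniq -s 2)
printf '%s\n' "$skip_chars"

# -f 1: skip the first whitespace-separated field before comparing,
# so the differing timestamps are ignored.
skip_fields=$(printf '10:01 login\n10:02 login\n10:03 logout\n' | uniq -f 1)
printf '%s\n' "$skip_fields"
```

In each case uniq keeps the first line of a group of lines it considers equal, so the output still shows the skipped prefix or the original capitalization of that first line.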

Take the following file as an example:

$ cat test_data.txt 
green
red
white
blue
yellow
black
white
blue
blue

Running uniq on an unsorted file removes only adjacent duplicate lines (the two consecutive blue lines become one):

$ uniq test_data.txt 
green
red
white
blue
yellow
black
white
blue

Sorting first and then running uniq removes duplicates across the whole file (the extra blue and white lines are gone):

$ sort test_data.txt | uniq
black
blue
green
red
white
yellow

The -c option counts how many times each line occurs:

$ sort test_data.txt | uniq -c
      1 black
      3 blue
      1 green
      1 red
      2 white
      1 yellow

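Piping the counts through sort again orders the lines by frequency (plain sort works here because the counts are right-aligned single digits; sort -n is the safer choice for larger counts):

$ sort test_data.txt | uniq -c | sort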
      1 black
      1 green
      1 red
      1 yellow
      2 white
      3 blue
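
A common variation on the frequency count is to list the most frequent lines first. The sketch below assumes the same nine colors as test_data.txt, rebuilt in a temporary file so it runs on its own, and uses sort -rn to order the counts numerically in descending order.

```shell
# Rebuild the sample data from the article in a temporary file.
data=$(mktemp)
printf '%s\n' green red white blue yellow black white blue blue > "$data"

# Count each distinct line, then sort numerically (-n) in descending
# order (-r) so that the most frequent line comes first.
ranked=$(sort "$data" | uniq -c | sort -rn)
printf '%s\n' "$ranked"

# Keep only the single most frequent entry.
top=$(printf '%s\n' "$ranked" | head -n 1)
printf '%s\n' "$top"

rm -f "$data"
```

The width of the count column can differ between uniq implementations, so a script that needs the count or the value alone is better off splitting the line on whitespace (for example with awk) than relying on fixed column positions.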

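The -w option compares only the first N characters of each line. With N=1, blue and black are treated as duplicates, and only the first of them (black) is kept: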

$ sort test_data.txt | uniq -w 1
black
green
red
white
yellow
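
The -d and -u options from the list above split the same sorted input into lines that repeat and lines that occur only once. This is a small sketch using the same nine colors as test_data.txt, passed in through printf so that it is self-contained.

```shell
# Rebuild and sort the sample data from the article.
sorted=$(printf '%s\n' green red white blue yellow black white blue blue | sort)

# -d: print one copy of each line that occurs more than once.
dups=$(printf '%s\n' "$sorted" | uniq -d)
printf '%s\n' "$dups"

# -u: print only the lines that occur exactly once.
singles=$(printf '%s\n' "$sorted" | uniq -u)
printf '%s\n' "$singles"
```

Here -d reports blue and white, while -u reports black, green, red and yellow. Together the two outputs cover every distinct line of the input exactly once.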
