Python


Dropping Duplicates

# Duplicates are judged on two columns, the dog's name and its breed, so two dogs
# with the same name but different breeds are both kept.

df.drop_duplicates(subset=["name", "breed"])

# Without keeping the original index (rows are relabelled 0, 1, 2, ...)
df.drop_duplicates(subset=["name", "breed"], ignore_index=True)
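To see what `subset` does, here is a small hypothetical dog table (the names, breeds and the `weight_kg` column are made up for illustration). Only `name` and `breed` are compared, so the second "Bella" row is dropped even though its weight differs, while both dogs called "Max" survive. This is a sketch assuming pandas 1.0 or later, which is needed for `ignore_index`.

```python
import pandas as pd

# Hypothetical sample data: a repeated Bella/Beagle pair, and two dogs
# named "Max" that differ only in breed.
dogs = pd.DataFrame({
    "name": ["Bella", "Bella", "Max", "Max"],
    "breed": ["Beagle", "Beagle", "Labrador", "Poodle"],
    "weight_kg": [10, 11, 30, 25],
})

# A row counts as a duplicate only if BOTH name and breed match an earlier row;
# weight_kg is ignored because it is not in subset.
unique_dogs = dogs.drop_duplicates(subset=["name", "breed"])

# Same filter, but the surviving rows are relabelled 0, 1, 2, ...
unique_dogs_reindexed = dogs.drop_duplicates(subset=["name", "breed"], ignore_index=True)

print(unique_dogs)
print(unique_dogs_reindexed)
```

`unique_dogs` keeps the original labels 0, 2 and 3, while `unique_dogs_reindexed` holds the same three rows labelled 0, 1 and 2.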


.drop_duplicates()

import pandas as pd

df = pd.DataFrame({"Date": ["2022", "2022", "2021", "2021", "2020", "2020"], "Time": ["20:00", "20:00", "20:00", "21:00", "22:00", "22:00"]})
df.drop_duplicates()

#output
#    Date   Time
# 0  2022  20:00
# 2  2021  20:00
# 3  2021  21:00
# 4  2020  22:00
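By default `drop_duplicates()` keeps the first row of each group of identical rows. The `keep` parameter changes which copy survives, and `duplicated()` returns the boolean mask behind it (`df.drop_duplicates(keep=k)` gives the same rows as `df[~df.duplicated(keep=k)]`). A short sketch on the same Date/Time frame:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2022", "2022", "2021", "2021", "2020", "2020"],
    "Time": ["20:00", "20:00", "20:00", "21:00", "22:00", "22:00"],
})

# keep="first" (the default): keep the first row of each duplicate group.
first = df.drop_duplicates(keep="first")

# keep="last": keep the last row of each duplicate group instead.
last = df.drop_duplicates(keep="last")

# keep=False: drop every row that has an identical copy anywhere.
only_unique = df.drop_duplicates(keep=False)

# True marks rows that repeat an earlier row (with the default keep="first").
mask = df.duplicated()

print(first, last, only_unique, mask, sep="\n\n")
```

`first` holds the same four rows as the output above (labels 0, 2, 3, 4), `last` keeps labels 1, 2, 3 and 5, and `only_unique` keeps just rows 2 and 3.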


Removing Duplicate Files

from pathlib import Path
import hashlib
import os


def remove_duplicate(path):
    # Maps content hash -> first file seen with that content
    unique = {}
    for file in Path(path).rglob('*'):
        if file.is_file():
            with open(file, 'rb') as f:
                filehash = hashlib.md5(f.read()).hexdigest()
            if filehash not in unique:
                unique[filehash] = file
            else:
                # Dry run: print the duplicate (not the original) before removing
                print(f'Removing --> {file} (duplicate of {unique[filehash]})')
                # Uncomment to actually delete the duplicate
                # os.remove(file)


if __name__ == '__main__':
    path = r'C:\foo'  # folder to scan
    remove_duplicate(path)
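The script above reads each file into memory in one go, which can be costly for large files. One possible variation, sketched below, hashes files in fixed-size chunks and makes deletion opt-in through a `dry_run` flag. `hash_file`, `find_duplicates`, `remove_duplicates`, `chunk_size` and `dry_run` are hypothetical names chosen for this sketch. Like the original it assumes that equal MD5 hashes mean equal content, which is reasonable for deduplicating your own files but not for adversarial input.

```python
import hashlib
from pathlib import Path


def hash_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files are never fully loaded."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return (duplicate, original) pairs for files with identical content under root."""
    seen = {}
    pairs = []
    # sorted() makes the kept copy deterministic: the first path in sorted order wins.
    for file in sorted(Path(root).rglob("*")):
        if file.is_file():
            digest = hash_file(file)
            if digest in seen:
                pairs.append((file, seen[digest]))
            else:
                seen[digest] = file
    return pairs


def remove_duplicates(root, dry_run=True):
    """Print each duplicate; delete it only when dry_run is False."""
    pairs = find_duplicates(root)
    for dup, original in pairs:
        print(f"Removing --> {dup} (same content as {original})")
        if not dry_run:
            dup.unlink()
    return pairs
```

Run it once with the default `dry_run=True` to review the list, then call `remove_duplicates(root, dry_run=False)` to delete.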
