go – DeDuplicate Array of Structs-ThrowExceptions

Exception or error:

I have an array of struct and I want to remove all the duplicates element but keep the last element in the array. Something similar to hashmap where I can update the last struct matched every time to the new array

I have a struct something like this

type samplestruct struct {
    value1           string    
    value2           string   
    value3           string   
    value4           string
    value5           string
}

In my array of struct if value1, value2 and value3 of any struct is same , remove all the duplicates and keep the last struct.

func unique(sample []samplestruct) []samplestruct {
    var unique []samplestruct

    for _, v := range sample {
        skip := false
        for _, u := range unique {
            if v.value1 == u.value1 && v.value2 == u.value2 && v.value3 == u.value3 {
                skip = true
                break
            }
        }
        if !skip {
            unique = append(unique, v)
        }
    }
    return unique

}

This code return me the first struct that matched the condition provided but I want the last struct that matches the condition

Given Input –

[
samplestruct{"ram","rahim","india","34","india"},
samplestruct{"ram","rahim","india","38","America"},
samplestruct{"ram","rahim","india","40","Jamica"},
samplestruct{"amit","rawat","bangladesh","35","hawai"},
samplestruct{"amit","rawat","bangladesh","36","india"}
]

ExpectedOutput –

[
samplestruct{"ram","rahim","india","40","Jamica"},
samplestruct{"amit","rawat","bangladesh","36","india"}
]
How to solve:

The code in the question is almost there. When a matching element is found in unique, overwrite the element with the current value:

func unique(sample []samplestruct) []samplestruct {
    var unique []samplestruct
sampleLoop:
    for _, v := range sample {
        for i, u := range unique {
            if v.value1 == u.value1 && v.value2 == u.value2 && v.value3 == u.value3 {
                unique[i] = v
                continue sampleLoop
            }
        }
        unique = append(unique, v)
    }
    return unique
}

The map-based approaches shown in other answers may be more appropriate depending on the size of the data set and number of surviving elements. Here’s a correct implementation of the map approach:

func unique(sample []samplestruct) []samplestruct {
    var unique []samplestruct
    type key struct{ value1, value2, value3 string }
    m := make(map[key]int)
    for _, v := range sample {
        k := key{v.value1, v.value2, v.value3}
        if i, ok := m[k]; ok {
            // Overwrite previous value per requirement in
            // question to keep last matching value.
            unique[i] = v
        } else {
            // Unique key found. Record position and collect
            // in result.
            m[k] = len(unique)
            unique = append(unique, v)
        }
    }
    return unique
}

Answer:

Probably you should use a map here, use the important values as the key, when you encounter a duplicate and check for the key, you replace the value in the map.

Currently you are adding the values to the unique array if you haven’t encountered them before, and then if you encounter one in the array after, you skip it. This is why you are only adding the first encounter of each struct which is the opposite of what you want.

You could either produce the key to the map as a concatenation of your important values (1 to 3), or use a struct of the three values as a key, and build the new key struct for each items and then search for it in the map.

Using a map will also be more performant than an array, as you can lookup much quicker in a map than iterating the unique array each time.

Answer:

Nice little exercise, here is one solution which I will explain below:

package main

import "fmt"

func main() {
    all := []person{
        {"ram", "rahim", "india", "34", "india"},
        {"ram", "rahim", "india", "38", "America"},
        {"ram", "rahim", "india", "40", "Jamica"},
        {"amit", "rawat", "bangladesh", "35", "hawai"},
        {"amit", "rawat", "bangladesh", "36", "india"},
    }

    var deduped []person
    // add the last occurrence always
    for i := len(all) - 1; i >= 0; i-- {
        if !contains(deduped, all[i]) {
            // "append" to the front of the array
            deduped = append([]person{all[i]}, deduped...)
        }
    }

    for _, x := range deduped {
        fmt.Println(x)
    }
}

type person [5]string

func eq(a, b person) bool {
    return a[0] == b[0] && a[1] == b[1] && a[2] == b[2]
}

func contains(list []person, x person) bool {
    for i := range list {
        if eq(x, list[i]) {
            return true
        }
    }
    return false
}

We step through the input array backwards in order to catch the last of multiple equal items. Then we want to append that item to the back of the deduped array. That is why we revert the append operation, creating a new temporary one-item person slice and append the previous to it.

Efficiency-wise, this solution has some drawbacks, appending to the one-item slice will use O(n²) space as it produces a new slice every time, pre-allocating an array of len(all), appending to it, and reversing it afterwards would solve that problem.

The second performance issue that might arise if you do this for a zillion persons is the contains function which is O(n²) lookups for the program. This could be solved with a map[person]bool.

Answer:

Use a map. First scan the list and set up a map with the first 3 values as the key for the map. The map value for each key will be the last found

Then walk the map it will be set to the correct values

package main

import (
    "fmt"
    "strings"
)

type samplestruct struct {
    value1 string
    value2 string
    value3 string
    value4 string
    value5 string
}


func mkey(x samplestruct) string {
    return strings.Join([]string{x.value1, x.value2, x.value3}, "-")
}

func main() {
    cm := make(map[string]samplestruct)
    exampledata := []samplestruct{samplestruct{"ram", "rahim", "india", "34", "india"},
        samplestruct{"ram", "rahim", "india", "38", "America"},
        samplestruct{"ram", "rahim", "india", "40", "Jamica"},
        samplestruct{"amit", "rawat", "bangladesh", "35", "hawai"},
        samplestruct{"amit", "rawat", "bangladesh", "36", "india"}}
    for _, x := range exampledata {
        k := mkey(x)
        cm[k] = x
    }

    for x := range cm {

        fmt.Println(cm[x])

    }

}

https://play.golang.org/p/ITD0VjhFQEk

Leave a Reply

Your email address will not be published. Required fields are marked *