All code here can be found in this repository.
It's been a while since I last shared a project. Life's been pretty busy, but I've been working on something interesting: designing a novel color embedding space. In this post, I'll take you through the process of labeling colors in plain English based on their RGB values. The twist? Labels are assigned using a color difference formula designed to track human perception, rather than raw RGB distance.
Color is a fundamental part of how we perceive the world, and assigning meaningful names to colors helps us communicate those perceptions. However, covering the whole color space densely with plain English labels isn't trivial. The goal here is an embedding space in which each color is mapped to its closest descriptive label.
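The post has three parts: rule-based labeling in RGB space, measuring color differences in CIELAB, and assigning labels to the colors the rules missed.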
The first step in building a color embedding space is defining a way to categorize colors. Using simple rules based on RGB values, I categorized colors into basic labels like "Bright Red," "Teal," and "Light Brown." The logic for this initial labeling relies on thresholds in the RGB space.
For example, if the red component is high, and both green and blue are low, the color is labeled as "Bright Red." Similarly, if red is dominant while green is moderate and blue is low, the color is classified as a shade of brown.
Here's a simplified version of the labeling function:
def rgb_to_label(r, g, b):
    if r > 230 and g < 100 and b < 100:
        return "Bright Red"
    elif r > g and r > b and 40 < g < 150 and b < 100:
        return "Brown"
    ...
    return "Unlabeled Color"
Initially, not all colors are covered, and many are left as "Unlabeled Color." That's where the second part of the process comes in: labeling those leftover colors with the help of a better distance metric.
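To make the split between labeled and unlabeled colors concrete, here is a minimal sketch that applies only the two rules shown above to a handful of sample colors. The sample colors and the `split_by_label` helper are assumptions for illustration; the full rule set is not reproduced here.

```python
def rgb_to_label(r, g, b):
    # Only the two rules quoted in the post; the real function has more.
    if r > 230 and g < 100 and b < 100:
        return "Bright Red"
    elif r > g and r > b and 40 < g < 150 and b < 100:
        return "Brown"
    return "Unlabeled Color"


def split_by_label(colors):
    """Hypothetical helper: partition RGB tuples into labeled and unlabeled."""
    labeled, unlabeled = [], []
    for rgb in colors:
        label = rgb_to_label(*rgb)
        if label == "Unlabeled Color":
            unlabeled.append(rgb)
        else:
            labeled.append((rgb, label))
    return labeled, unlabeled


# Sample colors chosen for illustration only.
samples = [(255, 0, 0), (150, 100, 50), (0, 128, 128)]
labeled, unlabeled = split_by_label(samples)
```

With only these two rules, the third sample falls through every condition and lands in the unlabeled list, which is the set the later steps work on.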
The RGB color space is not perceptually uniform, meaning that two colors can look very different to the human eye even if their RGB values are close, and vice versa. To make the color embedding space more accurate, I moved from simple Euclidean distance in RGB space to Euclidean distance in the CIELAB color space. The CIELAB space is designed to align more closely with human color perception.
I used the CIE76 color difference formula, which calculates the Euclidean distance between colors in the LAB space. Converting RGB values to LAB is non-trivial but crucial for achieving more accurate results. Here's an example of the conversion and distance calculation:
from math import sqrt

def rgb_to_lab(r, g, b):
    ...
    return (L, a, b)  # CIELAB values

def color_difference_lab(c1, c2):
    # CIE76: Euclidean distance between two (L, a, b) triples
    return sqrt((c2[0] - c1[0]) ** 2 + (c2[1] - c1[1]) ** 2 + (c2[2] - c1[2]) ** 2)
Using the LAB color space, I compute the distance between each unlabeled color and the labeled ones to find the closest match.
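Since the conversion body is elided above, here is one way it could be filled in. This is a sketch under stated assumptions, not necessarily what the repository does: it assumes the inputs are 8-bit sRGB values, uses the D65 reference white, and goes through linear RGB and XYZ before applying the CIELAB formulas. The helper names `_srgb_to_linear` and `_f` are mine.

```python
from math import sqrt

# Assumed reference white: D65, 2-degree observer, scaled so that Y = 100.
XN, YN, ZN = 95.047, 100.0, 108.883


def _srgb_to_linear(c):
    # Undo the sRGB transfer curve for one 8-bit channel.
    c = c / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _f(t):
    # CIELAB companding function: cube root with a linear toe near zero.
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29


def rgb_to_lab(r, g, b):
    rl, gl, bl = (_srgb_to_linear(c) for c in (r, g, b))
    # Linear sRGB -> CIE XYZ (D65), scaled to the 0..100 range.
    x = (0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl) * 100
    y = (0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl) * 100
    z = (0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl) * 100
    fx, fy, fz = _f(x / XN), _f(y / YN), _f(z / ZN)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def color_difference_lab(c1, c2):
    # CIE76: plain Euclidean distance between two LAB triples.
    return sqrt(sum((p - q) ** 2 for p, q in zip(c1, c2)))
```

As a sanity check, white should come out close to L = 100 with a and b near zero, and black close to L = 0.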
Now that we have a perceptually accurate way to measure color differences, the next step is assigning labels to the previously unlabeled colors. For each unlabeled color, I calculate its LAB value and find the closest labeled color based on the CIE76 formula.
The process is straightforward: for each unlabeled color, measure the distance to all labeled colors, and assign the label of the closest match. This ensures that even if a color didn't fall into the initial set of categories, it still receives a reasonable descriptive label.
Here's the code snippet that does the label assignment:
import numpy as np

def find_closest_color(row, labeled_df):
    current_color_lab = rgb_to_lab(row['R'], row['G'], row['B'])
    # iterrows() yields (index, row) pairs; iterating the DataFrame directly would yield column names
    distances = [color_difference_lab(current_color_lab, rgb_to_lab(lab['R'], lab['G'], lab['B'])) for _, lab in labeled_df.iterrows()]
    closest_idx = np.argmin(distances)
    return labeled_df.iloc[closest_idx]['Label']
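The same nearest-label idea can also be sketched without pandas, working directly on LAB triples that were converted once up front, so the labeled colors are not reconverted for every lookup. The function name `assign_nearest_label`, the "Bright Green" and "Bright Blue" labels, and the sample points are assumptions for illustration. The LAB values for the three labeled colors are approximate D65 values for the sRGB primaries.

```python
from math import dist


def assign_nearest_label(lab_color, labeled_lab):
    """Return the label whose LAB point is closest (CIE76) to lab_color."""
    # labeled_lab is a list of (label, (L, a, b)) pairs.
    label, _ = min(labeled_lab, key=lambda item: dist(lab_color, item[1]))
    return label


# Approximate CIELAB values for pure sRGB red, green, and blue.
labeled_lab = [
    ("Bright Red", (53.24, 80.09, 67.20)),
    ("Bright Green", (87.73, -86.18, 83.18)),
    ("Bright Blue", (32.30, 79.19, -107.86)),
]

# Two made-up unlabeled colors, already expressed in LAB.
unlabeled = [(50.0, 70.0, 50.0), (80.0, -60.0, 60.0)]
assigned = [assign_nearest_label(c, labeled_lab) for c in unlabeled]
```

`math.dist` computes the same Euclidean distance as the CIE76 formula, so the result should match a loop over `color_difference_lab`. This is still a linear scan per color; for large palettes a spatial index would be the natural next step.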
With this step, every color ends up with a label, so the embedding space is much denser than what the rules alone produce. Previously unlabeled colors take the label of their perceptually closest labeled neighbor.