首页 > 解决方案 > unload() 函数的问题(CS50 第 5 周:Speller)

问题描述

我正在做 CS50 的第 5 周作业,Speller。我一次构建一个函数,但我的unload函数遇到了问题(第 151 行)。现在,我只是在使用迭代释放每个节点之前以打印结果的方式测试迭代。我通过将每个节点的单词更改"FREE"为这些节点要被释放的顺序来做到这一点。

函数调用(第 60 行)返回 true,printf命令打印成功。但是,unload函数本身的所有内容都被忽略了。printf我添加的用于查看进度 ( )的行都没有DEBUG DEBUG DEBUG打印。第print()63 行的函数调用应该打印所有单词设置为的表格"FREE",并且所有字典单词位置都显示"NOT FOUND"。相反,它打印的列表和位置完全不变,并且没有DEBUG触发 for 循环(第 155 行)中的任何打印命令。

我不明白为什么会这样。单独的unload()函数调用,无论它是否返回 true,都应该至少触发printffor 循环中的第一个命令(第 157 行)。但即使这样也被跳过了。

有人可以帮我理解为什么函数返回true,但没有做任何应该做的改变吗?提前致谢。

编辑:好的,有人告诉我,我unload在第 60 行没有正确调用该函数。我已经纠正了这一点。现在它会打印出来"LOCATION 00:",但它会在它到达第while158 行的第一个循环时立即结束。我之前遇到过这个问题,我不确定它为什么会这样做。strcmp()应该看到头节点的单词不匹配"FREE",直到它从列表的末尾更改为开头。为什么while循环甚至没有触发?

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>

unsigned int HASH_MAX = 50; // Max elements in hash table
unsigned int LENGTH = 20; // Max length of word to be stored

unsigned int hash(const char *word); // assign hash code -- [(code + current letter) * 3] * string length, % HASH_MAX
bool load(FILE *dictionary); // load dictionary into memory
bool check(char *word); // check if word exists in dictionary
bool unload(void); // unload dictionary from memory, free memory (CURRENTLY DEBUGGING, CHECKING ITERATION)
void print(void); // print table contents and node locations

typedef struct _node // node structure: stored word, pointer to next node
{
    char *word[20];
    struct _node *next;
} node;

node *HASH_TABLE[50];

int main(int argc, char *argv[])
{
    FILE *dictionary = fopen("C:/Users/aaron/Desktop/Dictionary.txt", "r"); // open dictionary file, read

    if (!dictionary) // if dictionary is NULL, return error message, end program
    {
        printf("FILE NOT FOUND\n");
        return 1;
    }

    if (load(dictionary)) // if dictionary loaded successfully (function call), close dictionary and print table contents
    {
        fclose(dictionary);
        print(); // print "LIST (number): {(name, address), ...}\n
    }

    char *checkword = "Albatross"; // test check function for word that does not exist in the library
    char *checkword2 = "Riku"; // test check function for word that does exist in the library

    if (check(checkword)) // return check results for checkword, found or not found
    {
        printf("\n%s found\n", checkword);
    }
    else
    {
        printf("\n%s not found\n", checkword);
    }

    if (check(checkword2)) // return check results for checkword2, found or not found
    {
        printf("\n%s found\n", checkword2);
    }
    else
    {
        printf("\n%s not found\n", checkword2);
    }

    if (unload()) // if unloaded successfully (function call), print contents
    {
        printf("\nUNLOADED...\n\n"); // DEBUG DEBUG DEBUG (confirm unload function returned true)
        print();
    }
}

unsigned int hash(const char *word) // assign hash code -- [(code + current letter) * 3] * string length, % HASH_MAX
{
    char word_conv[LENGTH + 1]; // store converted word for uniform key
    unsigned int code = 0; // hash code

    strcpy(word_conv, word);

    for (int i = 0; i < strlen(word); i++) // set all letters in the word to lower case
    {
        word_conv[i] = tolower(word_conv[i]);
    }

    for (int j = 0; j < strlen(word_conv); j++) // for all letters in converted word, add ascii value to code and multiply by 3
    {
        code += word_conv[j];
        code = code * 3;
    }

    code = code % HASH_MAX; // set code to remainder of current code divided by maximum hash table size

    return code;
}

bool load(FILE *dictionary) // load dictionary into memory
{
    char word[LENGTH+1]; // store next word in the dictionary

    while (!feof(dictionary)) // until end of dictionary file
    {
        fscanf(dictionary, "%s", word); // scan for next word

        node *new_n = malloc(sizeof(node)); // new node
        strcpy(new_n->word, word); // store scanned word in new node
        new_n->next = NULL; // new node's next pointer set to NULL

        unsigned int code = hash(word); // retrieve and store hash code

        if (HASH_TABLE[code] == NULL) // if hash location has no head
        {
            HASH_TABLE[code] = new_n; // set new node to location head
        }
        else if (HASH_TABLE[code] != NULL) // if head already exists at hash location
        {
            node *trav = HASH_TABLE[code]; // set traversal node

            while (trav->next != NULL) // while traversal node's next pointer is not NULL
            {
                trav = trav->next; // move to next node
            }

            if (trav->next == NULL) // if traversal node's next pointer is null
            {
                trav->next = new_n; // set new node to traversal node's next pointer
            }
        }
    }

    return true; // confirm successful load
}

bool check(char *word) // check if word exists in dictionary
{
    unsigned int code = hash(word); // retrieve and store hash code

    node *check = HASH_TABLE[code]; // set traversal node to hash location head

    while (check != NULL) // while traversal node is not NULL
    {
        int check_true = strcasecmp(check->word, word); // compare traversal node's word to provided word argument

        if (check_true == 0) // if a match is found, return true
        {
            return true;
        }
        else if (check_true != 0) // if no match, move to next node
        {
            check = check->next;
        }
    }

    if (check == NULL) // if end of list is reached without a match, return false
        return false;
}

bool unload(void) // unload dictionary from memory, free memory (CURRENTLY DEBUGGING, CHECKING ITERATION)
{
    char *word = "FREE"; // DEBUG DEBUG DEBUG (changin all nodes' words to "FREE" to test iteration)

    for (int i = 0; i < HASH_MAX; i++) // for every element in the hash table, HASH_MAX (50)
    {
        printf("LOCATION %02d:\n", i); // DEBUG DEBUG DEBUG (print current hash table location)
        while (strcmp(HASH_TABLE[i]->word, word) != 0) // while the head node's word is not "FREE"
        {
            node *trav = HASH_TABLE[i]; // set traversal node to head
            printf("HEAD WORD: %s\n", HASH_TABLE[i]->word); // DEBUG DEBUG DEBUG (print head word to confirm while condition)

            while (strcmp(trav->next->word, word) != 0) // while the traversal node's word is not "FREE"
            {
                trav = trav->next; // move to next node
                printf("."); // DEBUG DEBUG DEBUG (print a dot for every location skipped)
            }

            printf("\n"); // DEBUG DEBUG DEBUG

            strcpy(trav->word, word); // set traversal node's word to "FREE"

            printf("{"); // DEBUG DEBUG DEBUG

            while (trav != NULL) // DEBUG DEBUG DEBUG (print hash location's current list of words)
            {
                printf("%s, ", trav->word); // DEBUG DEBUG DEBUG
            }

            printf("}\n\n"); // DEBUG DEBUG DEBUG
        }
    }

    return true; // freed successfully
}

void print(void) // print hash table contents and node locations
{
    for (int i = 0; i < HASH_MAX; i++) // for every element in the hash table
    {
        node *check = HASH_TABLE[i]; // set traversal node to current hash table element head

        printf("LIST %02d: {", i); // print hash table element location

        while (check != NULL) // for all nodes in the current linked list
        {
            printf("%s, ", check->word); // print traversal node's word
            check = check->next; // move to next node
        }

        printf("}\n");
    }

    printf("\n");

    FILE *dictionary = fopen("C:/Users/aaron/Desktop/Dictionary.txt", "r"); // open dictionary file

    while (!feof(dictionary)) // for all words in the dictionary
    {
        char word[LENGTH + 1]; // store next word

        fscanf(dictionary, "%s", word); // scan for next word

        unsigned int code = hash(word); // retrieve and store word's hash code

        node *search = HASH_TABLE[code]; // set traversal node to hash location head

        while (search != NULL) // for all nodes at that location, or until word is found
        {
            if (strcasecmp(search->word, word) == 0) // compare traversal node's word to scanned word (case insensitive)
            {
                printf("%s: %p\n", search->word, search); // print traversal node's word and location
                break; // break while loop
            }
            else
            {
                search = search->next; // if traversal node's word does not match scanned word, move to next node
            }
        }

        if (search == NULL) // if the scanned word matches none of the words in the hash location's linked list
            printf("\"%s\" NOT FOUND\n", word); // word not found
    }

    fclose(dictionary); // close dictionary file
}

标签: cmemory-managementcs50

解决方案


Caveat: chqrlie has pointed out many of the basic issues, but here's some refactored code.


Your main issue was that unload didn't actually remove the nodes.


One of things to note is that it's easier/faster/better to use tolower once per string.

If the lowercased version is what we store in the node, and we lowercase the search word in check, we can use strcmp instead of strcasecmp [which has to redo the lowercasing for both arguments on each loop iteration].

So, I've changed the hash function to lowercase its argument "in-place".


As I mentioned in my above comment, print was extraneously rereading the dictionary file. So, I've removed that code. If it were necessary to do this, it should go into [yet] another function, or load and/or check should be reused.

(i.e.) print should do one thing well [a programming maxim].


Personally, I dislike "sidebar" comments:

if (unload()) // if unloaded successfully (function call), print contents

I prefer the comment to go above the line:

// if unloaded successfully (function call), print contents
if (unload())

To me, this is much clearer and it helps prevent the line from going beyond 80 characters in width.


Certain fixed constants (e.g. HASH_MAX and LENGTH) are global variables. This prevents them from being used to define arrays

(e.g.) you couldn't say:

node *HASH_TABLE[HASH_MAX];

and had to "hardwire" it as:

node *HASH_TABLE[50];

If we define these with either #define or as an enum, then we can use the preferred definitions.


Doing something like:

for (int i = 0; i < strlen(word); i++)

increases the loop time from O(length) to O(length^2) because strlen is called "length" times inside the loop and it rescans the string each time.

Much better to do:

int len = strlen(word);
for (int i = 0; i < len; i++)

But even this has an extra scan of the buffer. It can be better is to do something like:

for (int chr = *word++;  chr != 0;  chr = *word++)

I've refactored the code with annotations for the bugs. Original code is bracketed inside a #if 0 block:

#if 0
// old/original code
#else
// new/refactored code
#endif

Anyway, here's the code:

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#if 1
#include <ctype.h>
#endif

// Max elements in hash table
#if 0
unsigned int HASH_MAX = 50;
#else
enum {
    HASH_MAX = 50
};
#endif

// Max length of word to be stored
#if 0
unsigned int LENGTH = 20;
#else
enum {
    LENGTH = 20
};
#endif

// assign hash code -- [(code + current letter) * 3] * string length, % HASH_MAX
#if 0
unsigned int hash(const char *word);
#else
unsigned int hash(char *word);
#endif

// load dictionary into memory
bool load(FILE *dictionary);

// check if word exists in dictionary
#if 0
bool check(char *word);
#else
bool check(const char *word);
#endif

// unload dictionary from memory, free memory (CURRENTLY DEBUGGING,
// CHECKING ITERATION)
bool unload(void);

// print table contents and node locations
void print(void);

// node structure: stored word, pointer to next node
typedef struct _node {
#if 0
    char *word[20];
#else
    char word[LENGTH + 1];
#endif
    struct _node *next;
} node;

#if 0
node *HASH_TABLE[50];
#else
node *HASH_TABLE[HASH_MAX];
#endif

int
main(int argc, char *argv[])
{
    // open dictionary file, read
#if 0
    FILE *dictionary = fopen("C:/Users/aaron/Desktop/Dictionary.txt", "r");
#else
    FILE *dictionary = fopen("Dictionary.txt", "r");
#endif

    // if dictionary is NULL, return error message, end program
    if (!dictionary) {
        printf("FILE NOT FOUND\n");
        return 1;
    }

    // if dictionary loaded successfully (function call), close dictionary and
    // print table contents
    if (load(dictionary)) {
        fclose(dictionary);
        // print "LIST (number): {(name, address), ...}\n
        print();
    }

    // test check function for word that does not exist in the library
    char *checkword = "Albatross";

    // test check function for word that does exist in the library
    char *checkword2 = "Riku";

    // return check results for checkword, found or not found
    if (check(checkword)) {
        printf("\n%s found\n", checkword);
    }
    else {
        printf("\n%s not found\n", checkword);
    }

    // return check results for checkword2, found or not found
    if (check(checkword2)) {
        printf("\n%s found\n", checkword2);
    }
    else {
        printf("\n%s not found\n", checkword2);
    }

    // if unloaded successfully (function call), print contents
    if (unload()) {
        // DEBUG DEBUG DEBUG (confirm unload function returned true)
        printf("\nUNLOADED...\n\n");
        print();
    }
}

// assign hash code -- [(code + current letter) * 3] * string length, % HASH_MAX
unsigned int
hash(char *word)
{
    // store converted word for uniform key
#if 0
    char word_conv[LENGTH + 1];
#endif

    // hash code
    unsigned int code = 0;

#if 0
    strcpy(word_conv, word);

    // set all letters in the word to lower case
    for (int i = 0; i < strlen(word); i++) {
        word_conv[i] = tolower(word_conv[i]);
    }

    // for all letters in converted word, add ascii value to code and multiply by 3
    for (int j = 0; j < strlen(word_conv); j++) {
        code += word_conv[j];
        code = code * 3;
    }
#else
    int chr;
    while (1) {
        chr = *word;

        if (chr == 0)
            break;

        chr = tolower(chr);
        *word++ = chr;

        code += chr;
        code *= 3;
    }
#endif

    // set code to remainder of current code divided by maximum hash table size
    code = code % HASH_MAX;

    return code;
}

// load dictionary into memory
bool
load(FILE * dictionary)
{
    // store next word in the dictionary
    char word[LENGTH + 1];

    // until end of dictionary file
// NOTE/BUG: don't use feof
#if 0
    while (!feof(dictionary)) {
        // scan for next word
        fscanf(dictionary, "%s", word);
#else
    // scan for next word
    while (fscanf(dictionary, "%s", word) == 1) {
#endif
        // new node
        node *new_n = malloc(sizeof(node));

        // store scanned word in new node
        strcpy(new_n->word, word);
        // new node's next pointer set to NULL
        new_n->next = NULL;

        // retrieve and store hash code
        unsigned int code = hash(new_n->word);

        // NOTE/BUG: there's no need to append to the end of the list -- pushing
        // on the front is adequate and is faster
#if 0
        // if hash location has no head
        if (HASH_TABLE[code] == NULL) {
            // set new node to location head
            HASH_TABLE[code] = new_n;
        }
        // if head already exists at hash location
        else if (HASH_TABLE[code] != NULL) {
            // set traversal node
            node *trav = HASH_TABLE[code];

            // while traversal node's next pointer is not NULL
            while (trav->next != NULL) {
                // move to next node
                trav = trav->next;
            }

            // if traversal node's next pointer is null
            if (trav->next == NULL) {
                // set new node to traversal node's next pointer
                trav->next = new_n;
            }
        }
#else
        new_n->next = HASH_TABLE[code];
        HASH_TABLE[code] = new_n;
#endif
    }

    // confirm successful load
    return true;
}

// check if word exists in dictionary
#if 0
bool
check(char *word)
#else
bool
check(const char *arg)
#endif
{
    char word[LENGTH + 1];

    // retrieve and store hash code
#if 1
    strcpy(word,arg);
#endif
    unsigned int code = hash(word);

    // set traversal node to hash location head
    node *check = HASH_TABLE[code];

    // while traversal node is not NULL
    while (check != NULL) {
        // compare traversal node's word to provided word argument
// NOTE/BUG: strcmp is faster than strcasecmp if we convert to lowercase _once_
#if 0
        int check_true = strcasecmp(check->word, word);
#else
        int check_true = strcmp(check->word, word);
#endif

#if 0
        // if a match is found, return true
        if (check_true == 0) {
            return true;
        }
        // if no match, move to next node
        else if (check_true != 0) {
            check = check->next;
        }
#else
        if (check_true == 0)
            return true;

        check = check->next;
#endif
    }

    // if end of list is reached without a match, return false
#if 0
    if (check == NULL)
        return false;
#else
    return false;
#endif
}

// unload dictionary from memory, free memory
// (CURRENTLY DEBUGGING, CHECKING ITERATION)
bool
unload(void)
{
    // DEBUG DEBUG DEBUG (changin all nodes' words to "FREE" to test iteration)
#if 0
    char *word = "FREE";
#endif

    // for every element in the hash table, HASH_MAX (50)
    for (int i = 0; i < HASH_MAX; i++) {
#if 0
        // DEBUG DEBUG DEBUG (print current hash table location)
        printf("LOCATION %02d:\n", i);
        // while the head node's word is not "FREE"
        while (strcmp(HASH_TABLE[i]->word, word) != 0) {
            // set traversal node to head
            node *trav = HASH_TABLE[i];

            // DEBUG DEBUG DEBUG (print head word to confirm while condition)
            printf("HEAD WORD: %s\n", HASH_TABLE[i]->word);

            // while the traversal node's word is not "FREE"
            while (strcmp(trav->next->word, word) != 0) {
                // move to next node
                trav = trav->next;
                // DEBUG DEBUG DEBUG (print a dot for every location skipped)
                printf(".");
            }

            // DEBUG DEBUG DEBUG
            printf("\n");

            // set traversal node's word to "FREE"
            strcpy(trav->word, word);

            // DEBUG DEBUG DEBUG
            printf("{");

            // DEBUG DEBUG DEBUG (print hash location's current list of words)
            while (trav != NULL) {
                // DEBUG DEBUG DEBUG
                printf("%s, ", trav->word);
            }

            // DEBUG DEBUG DEBUG
            printf("}\n\n");
        }
#else
        node *nxt;
        for (node *cur = HASH_TABLE[i];  cur != NULL;  cur = nxt) {
            nxt = cur->next;
            free(cur);
        }
        HASH_TABLE[i] = NULL;
#endif
    }

    // freed successfully
    return true;
}

// print hash table contents and node locations
void
print(void)
{
    // for every element in the hash table
    for (int i = 0; i < HASH_MAX; i++) {
        // set traversal node to current hash table element head
        node *check = HASH_TABLE[i];

        // print hash table element location
        printf("LIST %02d: {", i);

        // for all nodes in the current linked list
        while (check != NULL) {
            // print traversal node's word
            printf("%s, ", check->word);
            // move to next node
            check = check->next;
        }

        printf("}\n");
    }

    printf("\n");

// NOTE/BUG: why reread dictionary after printing it?
#if 0
    // open dictionary file
    FILE *dictionary = fopen("C:/Users/aaron/Desktop/Dictionary.txt", "r");

    // for all words in the dictionary
    while (!feof(dictionary)) {
        // store next word
        char word[LENGTH + 1];

        // scan for next word
        fscanf(dictionary, "%s", word);

        // retrieve and store word's hash code
        unsigned int code = hash(word);

        // set traversal node to hash location head
        node *search = HASH_TABLE[code];

        // for all nodes at that location, or until word is found
        while (search != NULL) {
            // compare traversal node's word to scanned word (case insensitive)
            if (strcasecmp(search->word, word) == 0) {
                // print traversal node's word and location
                printf("%s: %p\n", search->word, search);
                // break while loop
                break;
            }
            else {
                // if traversal node's word does not match scanned word,
                // move to next node
                search = search->next;
            }
        }

        // if the scanned word matches none of the words in the hash location's
        // linked list
        if (search == NULL)
            // word not found
            printf("\"%s\" NOT FOUND\n", word);
    }

    // close dictionary file
    fclose(dictionary);
#endif
}

Here's a version that has the #if 0 blocks removed.

Also, I've added a slight reordering in the load function, so that it inputs the data directly into the final place inside the node element (i.e. eliminates the intermediate buffer and a strcpy)

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include <ctype.h>

// Max elements in hash table
enum {
    HASH_MAX = 50
};

// Max length of word to be stored
enum {
    LENGTH = 20
};

// assign hash code -- [(code + current letter) * 3] * string length, % HASH_MAX
unsigned int hash(char *word);

// load dictionary into memory
bool load(FILE *dictionary);

// check if word exists in dictionary
bool check(const char *word);

// unload dictionary from memory, free memory (CURRENTLY DEBUGGING,
// CHECKING ITERATION)
bool unload(void);

// print table contents and node locations
void print(void);

// node structure: stored word, pointer to next node
typedef struct _node {
    char word[LENGTH + 1];
    struct _node *next;
} node;

node *HASH_TABLE[HASH_MAX];

int
main(int argc, char *argv[])
{
    // open dictionary file, read
    FILE *dictionary = fopen("Dictionary.txt", "r");

    // if dictionary is NULL, return error message, end program
    if (!dictionary) {
        printf("FILE NOT FOUND\n");
        return 1;
    }

    // if dictionary loaded successfully (function call), close dictionary and
    // print table contents
    if (load(dictionary)) {
        fclose(dictionary);
        // print "LIST (number): {(name, address), ...}\n
        print();
    }

    // test check function for word that does not exist in the library
    char *checkword = "Albatross";

    // test check function for word that does exist in the library
    char *checkword2 = "Riku";

    // return check results for checkword, found or not found
    if (check(checkword)) {
        printf("\n%s found\n", checkword);
    }
    else {
        printf("\n%s not found\n", checkword);
    }

    // return check results for checkword2, found or not found
    if (check(checkword2)) {
        printf("\n%s found\n", checkword2);
    }
    else {
        printf("\n%s not found\n", checkword2);
    }

    // if unloaded successfully (function call), print contents
    if (unload()) {
        // DEBUG DEBUG DEBUG (confirm unload function returned true)
        printf("\nUNLOADED...\n\n");
        print();
    }
}

// assign hash code -- [(code + current letter) * 3] * string length, % HASH_MAX
unsigned int
hash(char *word)
{
    // store converted word for uniform key

    // hash code
    unsigned int code = 0;

    unsigned char chr;
    while (1) {
        chr = *word;

        if (chr == 0)
            break;

        chr = tolower(chr);
        *word++ = chr;

        code += chr;
        code *= 3;
    }

    // set code to remainder of current code divided by maximum hash table size
    code = code % HASH_MAX;

    return code;
}

// load dictionary into memory
bool
load(FILE *dictionary)
{

    // scan for next word
    while (1) {
        // new node
        node *new_n = malloc(sizeof(node));

        if (fscanf(dictionary, "%s", new_n->word) != 1) {
            free(new_n);
            break;
        }

        // store scanned word in new node
        new_n->next = NULL;

        // retrieve and store hash code
        unsigned int code = hash(new_n->word);

        // pushing on the front of the list is adequate and is faster
        new_n->next = HASH_TABLE[code];
        HASH_TABLE[code] = new_n;
    }

    // confirm successful load
    return true;
}

// check if word exists in dictionary
bool
check(const char *arg)
{
    char word[LENGTH + 1];

    // retrieve and store hash code
    strcpy(word,arg);
    unsigned int code = hash(word);

    // set traversal node to hash location head
    node *check = HASH_TABLE[code];

    // while traversal node is not NULL
    while (check != NULL) {
        // compare traversal node's word to provided word argument
        int check_true = strcmp(check->word, word);

        if (check_true == 0)
            return true;

        check = check->next;
    }

    // if end of list is reached without a match, return false
    return false;
}

// unload dictionary from memory, free memory
// (CURRENTLY DEBUGGING, CHECKING ITERATION)
bool
unload(void)
{

    // for every element in the hash table, HASH_MAX (50)
    for (int i = 0; i < HASH_MAX; i++) {
        node *nxt;
        for (node *cur = HASH_TABLE[i];  cur != NULL;  cur = nxt) {
            nxt = cur->next;
            free(cur);
        }
        HASH_TABLE[i] = NULL;
    }

    // freed successfully
    return true;
}

// print hash table contents and node locations
void
print(void)
{
    // for every element in the hash table
    for (int i = 0; i < HASH_MAX; i++) {
        // set traversal node to current hash table element head
        node *check = HASH_TABLE[i];

        // print hash table element location
        printf("LIST %02d: {", i);

        // for all nodes in the current linked list
        while (check != NULL) {
            // print traversal node's word
            printf("%s, ", check->word);

            // move to next node
            check = check->next;
        }

        printf("}\n");
    }

    printf("\n");
}

UPDATE:

Could you please explain for (int chr = *word++; chr != 0; chr = *word++)? I don't know what *word++ means in this context.

Sure. With chr = *word++; it means dereference word [a char pointer]. This fetches the char value pointed to by word (i.e. fetch the value from memory). Then, set this value into chr. Then, increment word [so it points to the next character in the array.

The statement is composed of three operators: = is the assignment operator. * is a dereference operator and ++ is a post-decrement operator.

Based on the precedence [and/or binding] of the operators, * has higher precedence [tighter binding], so it is performed first. The value is placed in chr. Then, ++ is performed on the value in word. It is as the following is performed as a single statement:

chr = *word;
word += 1;

chr = tolower(chr); should be chr = tolower((unsigned char)chr); for reasons explained in my answer. Alternatively, you could define chr as unsigned char chr;

I was under the impression that tolower et. al. were "self protective" of this (e.g. they did the unsigned char cast). But, the [linux] manpage says its UB if the value is out of range. I've edited the second example to use unsigned char chr;.

Strangely, for glibc's tolower, it has a range check built it that works on the int value and returns the original value (i.e. does not index into the translation table) if the value is out of range. This appears to be part of some BSD compatibility [the BSD manpage states it does a range check, but the feature is deprecated]. I'm guessing the glibc range check as added after the manpage was written.

To me, the macro should just do the cast itself [and the global function as well]. But, I think this might break the BSD compatibility.

But, now we're all hamstrung to the old way [or add a wrapper macro] because of backward compatibility.


it is confusing for hash to have a side effect on its argument and further confusing that this side effect be necessary for the strcmp in check to work.

The side effect is [probably] no more [or, perhaps, even less] egregious than what strtok does. That is, it's not modifying a hidden/unrelated global, etc.

IMO, it wouldn't be confusing if the effect were commented [I documented it in the answer text]. Perhaps renaming hash to something a bit more descriptive would help. We could do: take_hash_of_argument_that_we_modify_to_lowercase_first.

That would make the function name "self documenting" as some (e.g. "Uncle" Bob Martin(?)) might suggest member functions should be.

But, maybe hash_and_lowercase might be better. This might be a sufficient clue to the reader that they need to consult the API documentation for the function rather than assuming they know all about it from just the name.

The linked list traversal is much faster with strcmp, so, at a minimum [architecturally] we want to store lower case strings in the nodes. We don't want to repeat the lowercasing for each node on each scan. And, we don't want strcasecmp to repeat the lowercasing on word [and the string in the node] for each loop iteration.

As you say, we could have two functions. And we could still achieve this refactoring: a string based version of tolower that lowercases its argument and leave the hash as it was done originally.

Originally, I considered this approach. I soon realized that everywhere you did a hash, you wanted it to be on the lowercased string. We could achieve this with (e.g.):

strlower(word);
value = hash(word);

But, there wasn't a use case here for doing one of these calls separately--only in pairs.

So, given that, why scan the argument string twice and slow down the operation by 2x?

From JFK [after the failed Bay of Pigs invasion]: Mistakes are not errors if we admit them.

So, I'd paraphrase that as: Side effects are not errors if we document them.


推荐阅读