string_stats

StringStatistics

StringStatistics(
    dataframe: DataFrame, column_name: str = "systime"
)

Bases: Base

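Provides class methods that calculate statistics on a string column of a pandas DataFrame. Each method takes the DataFrame as its first argument. Except for summary_as_dict and summary_as_dataframe, which require column_name, the methods read the "value_string" column by default.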

count_unique classmethod

count_unique(
    dataframe: DataFrame, column_name: str = "value_string"
) -> int

Returns the number of unique strings in the column.
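
A minimal pandas sketch of an equivalent computation, assuming nulls are not counted as a distinct value (the default behaviour of Series.nunique). This stand-alone function is an illustration, not the library's own code.

```python
import pandas as pd


def count_unique(dataframe: pd.DataFrame, column_name: str = "value_string") -> int:
    # Series.nunique() skips None/NaN by default (dropna=True)
    return int(dataframe[column_name].nunique())


df = pd.DataFrame({"value_string": ["ok", "fault", "ok", None]})
print(count_unique(df))  # 2
```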

most_frequent classmethod

most_frequent(
    dataframe: DataFrame, column_name: str = "value_string"
) -> str

Returns the most frequent string in the column.
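
The same result can be sketched with Series.value_counts, which excludes nulls and sorts by descending count. This is an assumed equivalent, not the library's implementation.

```python
import pandas as pd


def most_frequent(dataframe: pd.DataFrame, column_name: str = "value_string") -> str:
    # value_counts() is sorted by descending frequency,
    # so the first index label is the most frequent string
    return dataframe[column_name].value_counts().index[0]


df = pd.DataFrame({"value_string": ["ok", "fault", "ok", "idle", None]})
print(most_frequent(df))  # ok
```

How ties between equally frequent strings are broken is not documented here, so avoid relying on a particular winner.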

count_most_frequent classmethod

count_most_frequent(
    dataframe: DataFrame, column_name: str = "value_string"
) -> int

Returns the count of the most frequent string in the column.
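
A sketch of the count, assuming it is the largest entry of the frequency table and that nulls are not counted:

```python
import pandas as pd


def count_most_frequent(dataframe: pd.DataFrame, column_name: str = "value_string") -> int:
    # Largest value in the frequency table; value_counts() drops nulls by default
    return int(dataframe[column_name].value_counts().max())


df = pd.DataFrame({"value_string": ["ok", "fault", "ok", "idle", None]})
print(count_most_frequent(df))  # 2
```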

count_null classmethod

count_null(
    dataframe: DataFrame, column_name: str = "value_string"
) -> int

Returns the number of null (NaN) values in the column.
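
In pandas, an object column can hold nulls as either None or float NaN, and Series.isna flags both. A minimal sketch of an equivalent count (an illustration, not the library's code):

```python
import pandas as pd


def count_null(dataframe: pd.DataFrame, column_name: str = "value_string") -> int:
    # isna() is True for both None and NaN; summing booleans counts them
    return int(dataframe[column_name].isna().sum())


df = pd.DataFrame({"value_string": ["a", None, float("nan"), "b"]})
print(count_null(df))  # 2
```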

average_string_length classmethod

average_string_length(
    dataframe: DataFrame, column_name: str = "value_string"
) -> float

Returns the average length of strings in the column, excluding null values.
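
A sketch of the average using the Series.str accessor, dropping nulls first as the description states. This is an assumed equivalent, not the library's implementation.

```python
import pandas as pd


def average_string_length(dataframe: pd.DataFrame, column_name: str = "value_string") -> float:
    # Drop nulls, measure each string's character count, then average
    return float(dataframe[column_name].dropna().str.len().mean())


df = pd.DataFrame({"value_string": ["ab", "abcd", None]})
print(average_string_length(df))  # 3.0
```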

longest_string classmethod

longest_string(
    dataframe: DataFrame, column_name: str = "value_string"
) -> str

Returns the longest string in the column.
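
A minimal sketch, assuming nulls are ignored and the first string of maximal length wins a tie (the library's tie-breaking is not documented):

```python
import pandas as pd


def longest_string(dataframe: pd.DataFrame, column_name: str = "value_string") -> str:
    # max() with key=len returns the first string of maximal length;
    # it raises ValueError if the column has no non-null values
    return max(dataframe[column_name].dropna(), key=len)


df = pd.DataFrame({"value_string": ["on", "standby", "off", None]})
print(longest_string(df))  # standby
```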

shortest_string classmethod

shortest_string(
    dataframe: DataFrame, column_name: str = "value_string"
) -> str

Returns the shortest string in the column.
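
The mirror-image sketch uses min() with the same assumptions: nulls ignored, first shortest string wins a tie.

```python
import pandas as pd


def shortest_string(dataframe: pd.DataFrame, column_name: str = "value_string") -> str:
    # min() with key=len returns the first string of minimal length
    return min(dataframe[column_name].dropna(), key=len)


df = pd.DataFrame({"value_string": ["on", "standby", "off", None]})
print(shortest_string(df))  # on
```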

string_length_summary classmethod

string_length_summary(
    dataframe: DataFrame, column_name: str = "value_string"
) -> pd.DataFrame

Returns a DataFrame summarizing string lengths, including the minimum, maximum, and average length.
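
A sketch of such a summary as a one-row DataFrame. The column names min_length, max_length, and avg_length are hypothetical, and the library's output may be shaped differently.

```python
import pandas as pd


def string_length_summary(dataframe: pd.DataFrame, column_name: str = "value_string") -> pd.DataFrame:
    lengths = dataframe[column_name].dropna().str.len()
    # One-row frame; these column names are hypothetical
    return pd.DataFrame(
        {
            "min_length": [lengths.min()],
            "max_length": [lengths.max()],
            "avg_length": [lengths.mean()],
        }
    )


df = pd.DataFrame({"value_string": ["a", "abc", None]})
print(string_length_summary(df))
```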

most_common_n_strings classmethod

most_common_n_strings(
    dataframe: DataFrame,
    n: int,
    column_name: str = "value_string",
) -> pd.Series

Returns the n most frequent strings in the column.
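
Because Series.value_counts is already sorted by descending count, taking its first n rows gives a Series indexed by string with counts as values. This is a sketch of an assumed equivalent:

```python
import pandas as pd


def most_common_n_strings(dataframe: pd.DataFrame, n: int, column_name: str = "value_string") -> pd.Series:
    # Keep the first n entries of the descending frequency table
    return dataframe[column_name].value_counts().head(n)


df = pd.DataFrame({"value_string": ["ok", "ok", "ok", "fault", "fault", "idle"]})
print(most_common_n_strings(df, 2))
```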

contains_substring_count classmethod

contains_substring_count(
    dataframe: DataFrame,
    substring: str,
    column_name: str = "value_string",
) -> int

Counts how many strings contain the specified substring.
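
A sketch using Series.str.contains. It assumes the substring is matched literally (regex=False) and that nulls count as non-matches; whether the library interprets the argument as a regular expression is not stated.

```python
import pandas as pd


def contains_substring_count(dataframe: pd.DataFrame, substring: str, column_name: str = "value_string") -> int:
    # regex=False matches the substring literally; na=False treats nulls as False
    mask = dataframe[column_name].str.contains(substring, regex=False, na=False)
    return int(mask.sum())


df = pd.DataFrame({"value_string": ["motor fault", "pump ok", "fault cleared", None]})
print(contains_substring_count(df, "fault"))  # 2
```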

starts_with_count classmethod

starts_with_count(
    dataframe: DataFrame,
    prefix: str,
    column_name: str = "value_string",
) -> int

Counts how many strings start with the specified prefix.
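
An equivalent count can be sketched with Series.str.startswith, assuming nulls are treated as non-matches:

```python
import pandas as pd


def starts_with_count(dataframe: pd.DataFrame, prefix: str, column_name: str = "value_string") -> int:
    # na=False makes null entries count as "does not start with prefix"
    return int(dataframe[column_name].str.startswith(prefix, na=False).sum())


df = pd.DataFrame({"value_string": ["ERR_01", "ERR_02", "WARN_01", None]})
print(starts_with_count(df, "ERR"))  # 2
```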

ends_with_count classmethod

ends_with_count(
    dataframe: DataFrame,
    suffix: str,
    column_name: str = "value_string",
) -> int

Counts how many strings end with the specified suffix.
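
The suffix case follows the same pattern with Series.str.endswith, again assuming nulls count as non-matches:

```python
import pandas as pd


def ends_with_count(dataframe: pd.DataFrame, suffix: str, column_name: str = "value_string") -> int:
    # na=False makes null entries count as "does not end with suffix"
    return int(dataframe[column_name].str.endswith(suffix, na=False).sum())


df = pd.DataFrame({"value_string": ["a.csv", "b.txt", "c.csv", None]})
print(ends_with_count(df, ".csv"))  # 2
```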

uppercase_percentage classmethod

uppercase_percentage(
    dataframe: DataFrame, column_name: str = "value_string"
) -> float

Returns the percentage of strings that are fully uppercase.
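
A sketch under three assumptions not confirmed by this page: the denominator is the number of non-null strings, the result is on a 0 to 100 scale, and an empty column yields 0.0.

```python
import pandas as pd


def uppercase_percentage(dataframe: pd.DataFrame, column_name: str = "value_string") -> float:
    values = dataframe[column_name].dropna()
    if values.empty:
        return 0.0  # assumed behaviour for an all-null or empty column
    # str.isupper() mirrors Python's str.isupper(): at least one cased
    # character and no lowercase characters
    return float(values.str.isupper().mean() * 100)


df = pd.DataFrame({"value_string": ["OK", "FAULT", "idle", "Mixed", None]})
print(uppercase_percentage(df))  # 50.0
```

Under Python's definition, a string with no cased characters (such as "123") is neither uppercase nor lowercase.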

lowercase_percentage classmethod

lowercase_percentage(
    dataframe: DataFrame, column_name: str = "value_string"
) -> float

Returns the percentage of strings that are fully lowercase.
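
The lowercase counterpart can be sketched the same way, with the same assumed denominator (non-null strings) and 0 to 100 scale:

```python
import pandas as pd


def lowercase_percentage(dataframe: pd.DataFrame, column_name: str = "value_string") -> float:
    values = dataframe[column_name].dropna()
    if values.empty:
        return 0.0  # assumed behaviour for an all-null or empty column
    # Share of non-null strings for which str.islower() is True, times 100
    return float(values.str.islower().mean() * 100)


df = pd.DataFrame({"value_string": ["ok", "idle", "on", "FAULT", None]})
print(lowercase_percentage(df))  # 75.0
```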

contains_digit_count classmethod

contains_digit_count(
    dataframe: DataFrame, column_name: str = "value_string"
) -> int

Counts how many strings contain at least one digit.
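
A sketch using a regular-expression search for \d, assuming nulls are treated as non-matches:

```python
import pandas as pd


def contains_digit_count(dataframe: pd.DataFrame, column_name: str = "value_string") -> int:
    # r"\d" matches any single digit anywhere in the string
    mask = dataframe[column_name].str.contains(r"\d", regex=True, na=False)
    return int(mask.sum())


df = pd.DataFrame({"value_string": ["pump1", "pump", "2nd", None]})
print(contains_digit_count(df))  # 2
```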

summary_as_dict classmethod

summary_as_dict(
    dataframe: DataFrame, column_name: str
) -> Dict[str, Union[int, str, float]]

Returns a dictionary with comprehensive string statistics for the specified column.
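
A sketch of what such a dictionary could contain, combining several of the statistics above. The key names and the selection of statistics are hypothetical; only the return type follows the signature. Note that column_name has no default here.

```python
from typing import Dict, Union

import pandas as pd


def summary_as_dict(dataframe: pd.DataFrame, column_name: str) -> Dict[str, Union[int, str, float]]:
    values = dataframe[column_name]
    non_null = values.dropna()
    counts = non_null.value_counts()
    # Hypothetical key names; the library's actual keys may differ
    return {
        "unique_values": int(values.nunique()),
        "most_frequent": counts.index[0],
        "most_frequent_count": int(counts.iloc[0]),
        "null_count": int(values.isna().sum()),
        "average_length": float(non_null.str.len().mean()),
    }


df = pd.DataFrame({"status": ["ok", "ok", "fault", None]})
print(summary_as_dict(df, "status"))
```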

summary_as_dataframe classmethod

summary_as_dataframe(
    dataframe: DataFrame, column_name: str
) -> pd.DataFrame

Returns a DataFrame with comprehensive string statistics for the specified column.
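
One possible tabular layout is a long "statistic / value" frame with one row per statistic. Both the layout and the statistic names below are hypothetical; the library may instead return a single wide row.

```python
import pandas as pd


def summary_as_dataframe(dataframe: pd.DataFrame, column_name: str) -> pd.DataFrame:
    values = dataframe[column_name]
    lengths = values.dropna().str.len()
    # Hypothetical statistic names, one row per statistic
    stats = {
        "unique_values": values.nunique(),
        "null_count": values.isna().sum(),
        "min_length": lengths.min(),
        "max_length": lengths.max(),
        "average_length": lengths.mean(),
    }
    return pd.DataFrame({"statistic": list(stats), "value": list(stats.values())})


df = pd.DataFrame({"status": ["ok", "ok", "fault", None]})
print(summary_as_dataframe(df, "status"))
```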